Publication
Leveraging NLP and machine learning for English (L1) writing assessment in developmental education
| datacite.subject.sdg | 04:Quality Education | |
| datacite.subject.sdg | 10:Reduced Inequalities | |
| datacite.subject.sdg | 09:Industry, Innovation and Infrastructure | |
| dc.contributor.author | Da Corte, Miguel | |
| dc.contributor.author | Baptista, Jorge | |
| dc.date.accessioned | 2026-05-14T11:30:12Z | |
| dc.date.available | 2026-05-14T11:30:12Z | |
| dc.date.issued | 2024 | |
| dc.description.abstract | This study investigates using machine learning and linguistic features to predict placements in Developmental Education (DevEd) courses based on English (L1) writing proficiency. Placement in these courses is often performed using systems like ACCUPLACER, which automatically assesses and scores standardized writing assignments in entrance exams. Literature on ACCUPLACER’s assessment methods and the features accounted for in the scoring process is scarce. To identify the linguistic features important for placement decisions, 100 essays were randomly selected and analyzed from a pool of essays written by 290 native speakers. A total of 457 linguistic attributes were extracted using COH-METRIX (106), the Common Text Analysis Platform (CTAP) (330), plus 21 DevEd-specific features produced by the manual annotation of the corpus. Using the ORANGE Text Mining toolkit, several supervised machine-learning (ML) experiments with two classification scenarios (full and split sample essays) were conducted to determine the best linguistic features and the best-performing ML algorithm. Results revealed that Naive Bayes, with a selection of the 30 highest-ranking features (21 CTAP, 7 COH-METRIX, 2 DevEd-specific) based on the Information Gain scoring method, achieved a classification accuracy (CA) of 77.3%, improving to 81.8% with 60 features. This approach surpassed the baseline accuracy of 72.7% for the full essay scenario, demonstrating enhanced placement accuracy and providing new insights into students’ linguistic skills in DevEd. | eng |
| dc.identifier.doi | 10.5220/0012740500003693 | |
| dc.identifier.uri | http://hdl.handle.net/10400.1/28962 | |
| dc.language.iso | eng | |
| dc.peerreviewed | yes | |
| dc.publisher | SCITEPRESS - Science and Technology Publications | |
| dc.relation | Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa | |
| dc.relation.ispartof | Proceedings of the 16th International Conference on Computer Supported Education | |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | |
| dc.subject | Developmental education (DevEd) | |
| dc.subject | Automatic writing assessment systems | |
| dc.subject | Natural language processing (NLP) | |
| dc.subject | Machine-learning models | |
| dc.title | Leveraging NLP and machine learning for English (L1) writing assessment in developmental education | eng |
| dc.type | conference object | |
| dspace.entity.type | Publication | |
| oaire.awardNumber | UIDB/50021/2020 | |
| oaire.awardTitle | Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa | |
| oaire.awardURI | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F50021%2F2020/PT | |
| oaire.citation.endPage | 140 | |
| oaire.citation.startPage | 128 | |
| oaire.citation.title | Proceedings of the 16th International Conference on Computer Supported Education | |
| oaire.citation.volume | 2 | |
| oaire.fundingStream | 6817 - DCRRNI ID | |
| oaire.version | http://purl.org/coar/version/c_970fb48d4fbd8a85 | |
| person.familyName | Da Corte | |
| person.familyName | Baptista | |
| person.givenName | Miguel | |
| person.givenName | Jorge | |
| person.identifier.ciencia-id | 7010-5366-22C5 | |
| person.identifier.orcid | 0000-0001-8782-8377 | |
| person.identifier.orcid | 0000-0003-4603-4364 | |
| person.identifier.rid | H-7699-2013 | |
| person.identifier.scopus-author-id | 14035269500 | |
| project.funder.identifier | http://doi.org/10.13039/501100001871 | |
| project.funder.name | Fundação para a Ciência e a Tecnologia | |
| relation.isAuthorOfPublication | 4a524eae-b359-47fa-8978-028ac5ffb57e | |
| relation.isAuthorOfPublication | e817fa28-a005-40e2-9ba4-03fdaedd7df3 | |
| relation.isAuthorOfPublication.latestForDiscovery | 4a524eae-b359-47fa-8978-028ac5ffb57e | |
| relation.isProjectOfPublication | 0b14d63a-8f78-4e31-8a86-b72e1f07871f | |
| relation.isProjectOfPublication.latestForDiscovery | 0b14d63a-8f78-4e31-8a86-b72e1f07871f |
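
The abstract describes ranking linguistic features by Information Gain and keeping the highest-ranking ones before classification. The following is only a minimal sketch of that ranking step, not the authors' ORANGE workflow: it assumes features have already been discretized into categorical bins, and the function names (`entropy`, `information_gain`, `top_k_features`), the feature names, and the toy data are all hypothetical, invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Label entropy minus the entropy left after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_features(columns, labels, k):
    """Rank feature columns by information gain and return the k best names."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data, invented for illustration only (not from the study's corpus).
labels = ["level1", "level1", "level2", "level2"]
columns = {
    "feat_a": ["low", "low", "high", "high"],  # separates the two classes exactly
    "feat_b": ["low", "high", "low", "high"],  # carries no class information
    "feat_c": ["low", "low", "low", "high"],   # partially informative
}

ranked = top_k_features(columns, labels, k=2)
print(ranked)
```

In a setup like the one the abstract reports, the selected columns (for example the top 30 or 60) would then be passed to a classifier such as Naive Bayes; that step is omitted here.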
