Assisting European Portuguese teaching: linguistic features extraction and automatic readability classifier

Curto, Pedro; Mamede, Nuno; Baptista, Jorge

http://hdl.handle.net/10400.1/9766

Use this identifier to cite this record.

Name:	Description:	Size:	Format:
9766.pdf		193.62 KB	Adobe PDF

Abstract

This paper describes two automatic systems for European Portuguese texts: a linguistic feature extractor and a text readability classifier. Their main goal is to assist in selecting adequate reading materials to support Portuguese teaching, especially as a second language. To extract features from texts, the system uses several Natural Language Processing (NLP) tools. Currently, 52 features are extracted, covering parts of speech (POS), syllables, words, chunks and phrases, and averages and frequencies, among others. A classifier was built using these features and a corpus previously annotated with readability levels, adopting the official five-level language classification standard for Portuguese as a Second Language. In the five-level scenario (A1 to C1), the best-performing learning algorithm (LogitBoost) achieved an accuracy of 75.11% with a root mean square error (RMSE) of 0.269. In the three-level scenario (A, B and C), the best-performing learning algorithm (C4.5 grafted) achieved 81.44% accuracy, with an RMSE of 0.346.
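
The relationship between the five-level and three-level scenarios can be illustrated with a minimal Python sketch. It assumes that the three bands are obtained by grouping A1 and A2 into A, B1 and B2 into B, and C1 into C, which is the natural reading of the abstract but is not stated there explicitly. The function names and the toy labels are hypothetical and are not taken from the paper; the paper reports a different best-performing algorithm for each scenario, so this is not a reconstruction of the authors' procedure.

```python
# Toy illustration only: labels below are invented, not data from the paper.
FIVE_LEVELS = ["A1", "A2", "B1", "B2", "C1"]


def collapse_level(level: str) -> str:
    """Map a five-level label (e.g. 'B2') to its assumed three-level band ('B')."""
    if level not in FIVE_LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    return level[0]


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of texts whose predicted label equals the annotated label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical annotated levels and classifier predictions for six texts.
gold = ["A1", "A2", "B1", "B2", "C1", "B1"]
pred = ["A1", "A1", "B1", "B1", "C1", "B2"]

five_acc = accuracy(gold, pred)
three_acc = accuracy(
    [collapse_level(g) for g in gold],
    [collapse_level(p) for p in pred],
)

print(f"five-level accuracy:  {five_acc:.2f}")
print(f"three-level accuracy: {three_acc:.2f}")
```

In this toy data every error confuses two neighbouring levels inside the same band, so the errors disappear once the labels are grouped. This suggests one reason a coarser scale can yield higher accuracy, although the gap reported in the abstract comes from separately trained classifiers.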