Publication

Authorship attribution in Portuguese using character N-grams

File: 11987.pdf (163.08 KB, Adobe PDF)

Abstract

For the Authorship Attribution (AA) task, character n-grams are considered among the best predictive features. For English, it has also been shown that some types of character n-grams perform better than others. This paper tackles the AA task in Portuguese by examining the performance of different types of character n-grams and of various combinations of them. The paper also experiments with different feature representations and machine-learning algorithms. Moreover, it demonstrates that the performance of the character n-gram approach can be improved by fine-tuning the feature set and by appropriately selecting the length and type of character n-grams. This relatively simple and language-independent approach to the AA task outperforms both a bag-of-words baseline and other approaches evaluated on the same corpus.
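
As a rough illustration of what a character n-gram feature looks like, the sketch below counts overlapping character trigrams and attributes a text to the author with the most similar n-gram profile. It is not the paper's pipeline: the cosine-similarity nearest-profile rule, the function names (`char_ngrams`, `cosine`, `attribute`), and the toy training texts are all assumptions made for this example, and the typed n-gram categories and machine-learning algorithms mentioned in the abstract are not reproduced here.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, keeping spaces and punctuation."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(text: str, profiles: dict, n: int = 3) -> str:
    """Return the author whose n-gram profile is most similar to the text."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


# Hypothetical toy training texts, invented for illustration only.
training = {
    "author_a": "o gato dorme no sofá e o gato sonha com o peixe",
    "author_b": "a bolsa caiu hoje; os mercados reagiram à inflação",
}
profiles = {author: char_ngrams(text) for author, text in training.items()}

print(attribute("o gato sonha no sofá", profiles))
```

Changing `n` varies the n-gram length, which the abstract names as one of the choices that affects performance. A realistic experiment would use far more text per author and a trained classifier rather than this nearest-profile rule.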

Keywords

Language; Authorship attribution; Character n-grams; Portuguese; Stylometry; Computational linguistics; Machine learning

Publisher

Budapest University of Technology and Economics
