Authorship attribution in Portuguese using character N-grams

Abstract(s)

For the Authorship Attribution (AA) task, character n-grams are considered among the best predictive features. For English, it has also been shown that some types of character n-grams perform better than others. This paper tackles the AA task in Portuguese by examining the performance of different types of character n-grams and of various combinations of them. The paper also experiments with different feature representations and machine-learning algorithms. Moreover, the paper demonstrates that the performance of the character n-gram approach can be improved by fine-tuning the feature set and by appropriately selecting the length and type of character n-grams. This relatively simple and language-independent approach to the AA task outperforms both a bag-of-words baseline and other approaches evaluated on the same corpus.
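
As a rough illustration of what a character n-gram feature looks like, the following minimal sketch extracts overlapping character n-grams, builds a relative-frequency profile per text, and assigns an unknown text to the candidate author with the most similar profile. It is not the method of the paper: the abstract does not specify the n-gram types, feature representations, or learning algorithms used, so the cosine-similarity comparison, the default n = 3, and the function names `char_ngrams`, `profile`, `cosine`, and `attribute` are all assumptions made for illustration only.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of `text` (spaces included)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def profile(text, n=3):
    """Map each character n-gram of the lowercased text to its relative frequency."""
    counts = Counter(char_ngrams(text.lower(), n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles (dicts)."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def attribute(unknown_text, known_texts, n=3):
    """Return the author whose known text has the most similar n-gram profile.

    `known_texts` maps an author label to one sample text (a hypothetical,
    simplified setup; a real corpus would hold many texts per author).
    """
    unknown_profile = profile(unknown_text, n)
    return max(
        known_texts,
        key=lambda author: cosine(unknown_profile, profile(known_texts[author], n)),
    )
```

For example, `char_ngrams("casa", 3)` yields the two trigrams `"cas"` and `"asa"`. Varying `n`, or filtering the n-grams by where they occur in a word, is the kind of tuning of length and type that the abstract refers to, although the exact configurations studied are given only in the full paper.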

Keywords

Language; Authorship attribution; Character n-grams; Portuguese; Stylometry; Computational linguistics; Machine learning

Publisher

Budapest University of Technology and Economics
