Early experiments on automatic annotation of Portuguese medieval texts
dc.contributor.author | Bico, Maria Inês | |
dc.contributor.author | Baptista, Jorge | |
dc.contributor.author | Batista, Fernando | |
dc.contributor.author | Cardeira, Esperança | |
dc.date.accessioned | 2023-01-24T10:40:05Z | |
dc.date.available | 2023-01-24T10:40:05Z | |
dc.date.issued | 2022 | |
dc.description.abstract | This paper presents the challenges and solutions adopted for the lemmatization and part-of-speech (PoS) tagging of a corpus of Old Portuguese texts (up to 1525), to pave the way for the automatic annotation of these medieval texts. A highly granular tagset, previously devised for Modern Portuguese, was adapted to this end. A large text (approximately 155 thousand words) was manually annotated for PoS and lemmata and used to train an initial PoS-tagger model. When applied to two other texts, the resulting model attained 91.2% precision with a textual variant of the same text, and 67.4% with a new, unseen text. A second model was then trained with the data provided by the previous three texts and applied to two other unseen texts. The new model achieved a precision of 77.3% and 82.4%, respectively. | pt_PT |
dc.description.sponsorship | UIBD/152806/2022 | |
dc.description.version | info:eu-repo/semantics/submittedVersion | pt_PT |
dc.identifier.doi | 10.1007/978-3-031-16802-4_44 | pt_PT |
dc.identifier.issn | 0302-9743 | |
dc.identifier.uri | http://hdl.handle.net/10400.1/18903 | |
dc.language.iso | eng | pt_PT |
dc.peerreviewed | yes | pt_PT |
dc.publisher | Springer | pt_PT |
dc.relation | Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa | |
dc.relation | Center of Linguistics of the University of Lisbon | |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
dc.subject | Automatic annotation | pt_PT |
dc.subject | Lemmatization | pt_PT |
dc.subject | Part-of-speech | pt_PT |
dc.subject | Tagging | pt_PT |
dc.subject | Old Portuguese | pt_PT |
dc.title | Early experiments on automatic annotation of Portuguese medieval texts | pt_PT |
dc.type | journal article | |
dspace.entity.type | Publication | |
oaire.awardTitle | Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa | |
oaire.awardTitle | Center of Linguistics of the University of Lisbon | |
oaire.awardURI | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F50021%2F2020/PT | |
oaire.awardURI | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F00214%2F2020/PT | |
oaire.citation.endPage | 449 | pt_PT |
oaire.citation.startPage | 442 | pt_PT |
oaire.citation.title | Linking Theory and Practice of Digital Libraries | pt_PT |
oaire.citation.volume | 13541 | pt_PT |
oaire.fundingStream | 6817 - DCRRNI ID | |
oaire.fundingStream | 6817 - DCRRNI ID | |
person.familyName | Baptista | |
person.givenName | Jorge | |
person.identifier.ciencia-id | 7010-5366-22C5 | |
person.identifier.orcid | 0000-0003-4603-4364 | |
person.identifier.rid | H-7699-2013 | |
person.identifier.scopus-author-id | 14035269500 | |
project.funder.identifier | http://doi.org/10.13039/501100001871 | |
project.funder.identifier | http://doi.org/10.13039/501100001871 | |
project.funder.name | Fundação para a Ciência e a Tecnologia | |
project.funder.name | Fundação para a Ciência e a Tecnologia | |
rcaap.rights | openAccess | pt_PT |
rcaap.type | article | pt_PT |
relation.isAuthorOfPublication | e817fa28-a005-40e2-9ba4-03fdaedd7df3 | |
relation.isAuthorOfPublication.latestForDiscovery | e817fa28-a005-40e2-9ba4-03fdaedd7df3 | |
relation.isProjectOfPublication | 0b14d63a-8f78-4e31-8a86-b72e1f07871f | |
relation.isProjectOfPublication | 16097e28-ca04-4a9c-b4b5-70ee17a3747a | |
relation.isProjectOfPublication.latestForDiscovery | 0b14d63a-8f78-4e31-8a86-b72e1f07871f |