Publication

Early experiments on automatic annotation of Portuguese medieval texts

dc.contributor.author: Bico, Maria Inês
dc.contributor.author: Baptista, Jorge
dc.contributor.author: Batista, Fernando
dc.contributor.author: Cardeira, Esperança
dc.date.accessioned: 2023-01-24T10:40:05Z
dc.date.available: 2023-01-24T10:40:05Z
dc.date.issued: 2022
dc.description.abstract: This paper presents the challenges encountered, and the solutions adopted, in the lemmatization and part-of-speech (PoS) tagging of a corpus of Old Portuguese texts (up to 1525), paving the way for the automatic annotation of these medieval texts. A highly granular tagset, previously devised for Modern Portuguese, was adapted to this end. A large text (approximately 155 thousand words) was manually annotated for PoS and lemmata and used to train an initial PoS-tagger model. When applied to two other texts, the resulting model attained 91.2% precision on a textual variant of the same text and 67.4% on a new, unseen text. A second model was then trained on the data from the previous three texts and applied to two other unseen texts. The new model achieved a precision of 77.3% and 82.4%, respectively. [pt_PT]
dc.description.sponsorship: UIBD/152806/2022
dc.description.version: info:eu-repo/semantics/submittedVersion [pt_PT]
dc.identifier.doi: 10.1007/978-3-031-16802-4_44 [pt_PT]
dc.identifier.issn: 0302-9743
dc.identifier.uri: http://hdl.handle.net/10400.1/18903
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: Springer [pt_PT]
dc.relation: Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa
dc.relation: Center of Linguistics of the University of Lisbon
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Automatic annotation [pt_PT]
dc.subject: Lemmatization [pt_PT]
dc.subject: Part-of-speech [pt_PT]
dc.subject: Tagging [pt_PT]
dc.subject: Old portuguese [pt_PT]
dc.title: Early experiments on automatic annotation of Portuguese medieval texts [pt_PT]
dc.type: journal article
dspace.entity.type: Publication
oaire.awardTitle: Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa
oaire.awardTitle: Center of Linguistics of the University of Lisbon
oaire.awardURI: info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F50021%2F2020/PT
oaire.awardURI: info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F00214%2F2020/PT
oaire.citation.endPage: 449 [pt_PT]
oaire.citation.startPage: 442 [pt_PT]
oaire.citation.title: Linking Theory and Practice of Digital Libraries [pt_PT]
oaire.citation.volume: 13541 [pt_PT]
oaire.fundingStream: 6817 - DCRRNI ID
oaire.fundingStream: 6817 - DCRRNI ID
person.familyName: Baptista
person.givenName: Jorge
person.identifier.ciencia-id: 7010-5366-22C5
person.identifier.orcid: 0000-0003-4603-4364
person.identifier.rid: H-7699-2013
person.identifier.scopus-author-id: 14035269500
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.name: Fundação para a Ciência e a Tecnologia
project.funder.name: Fundação para a Ciência e a Tecnologia
rcaap.rights: openAccess [pt_PT]
rcaap.type: article [pt_PT]
relation.isAuthorOfPublication: e817fa28-a005-40e2-9ba4-03fdaedd7df3
relation.isAuthorOfPublication.latestForDiscovery: e817fa28-a005-40e2-9ba4-03fdaedd7df3
relation.isProjectOfPublication: 0b14d63a-8f78-4e31-8a86-b72e1f07871f
relation.isProjectOfPublication: 16097e28-ca04-4a9c-b4b5-70ee17a3747a
relation.isProjectOfPublication.latestForDiscovery: 0b14d63a-8f78-4e31-8a86-b72e1f07871f

Files

Original bundle
Name:
Lemmatization_and_POS_of_PT_Medieval_Texts.pdf
Size:
349.21 KB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
3.46 KB
Format:
Item-specific license agreed upon to submission