Please use this identifier to cite or link to this record: http://hdl.handle.net/10400.1/4884
Title: Mapping, filtering and measuring impact of ambiguous words in Portuguese
Authors: Baptista, Jorge
Faísca, Luís
Keywords: Computational natural language processing
Corpus linguistics
Date: 2007
Publisher: Presses Universitaires de Franche-Comté
Citation: Baptista, Jorge; Faísca, Luís. Mapping, filtering and measuring impact of ambiguous words in Portuguese. In Formaliser les langues avec l’ordinateur: de INTEX à Nooj, 305-324. ISBN: 978-2-84867-189-5. Besançon: Presses Universitaires de Franche-Comté, 2007.
Abstract: This paper deals with ambiguous simple words in Portuguese. The Portuguese dictionary of simple inflected words (DELAF) contains 936,215 entries, corresponding to 889,986 distinct inflected forms. From it, one can obtain the full list of ambiguous inflected forms (43,126), that is, word forms belonging to different categories and/or lemmas, e.g. capital,A/N/N (capital). We may consider A/N/N an ambiguity class; there are 137 such classes. Each ambiguity class has a certain level of ambiguity (Amb), which corresponds to the number of lexical entries associated with each ambiguous form (for class A/N/N, Amb=3). Based on this information, it is possible to map how ambiguity affects the lexicon. Using the frequency information associated with the list of tokens of a large corpus (the CETEMPÚBLICO corpus, with 200 million words), it is possible to calculate how ambiguity affects real texts. Combining the two types of information, it is possible to devise and evaluate different strategies to reduce lexical ambiguity.
Peer review: yes
URI: http://hdl.handle.net/10400.1/4884
ISBN: 978-2-84867-189-5
Appears in collections: FCH3-Livros (ou partes, com ou sem arbitragem científica)

Files in this record:
File: Mapping, filtering and measuring impact of ambiguous words in Portuguese.pdf
Size: 347.94 kB
Format: Adobe PDF



All records in the repository are protected by copyright law, with all rights reserved.