Authors
Baptista, Jorge; Faísca, Luís
Abstract(s)
This paper deals with ambiguous simple words in Portuguese. The Portuguese dictionary of simple inflected words (DELAF) contains 936,215 entries, corresponding to 889,986 distinct inflected forms. From it, one can obtain the full list of ambiguous inflected forms (43,126), that is, word forms that belong to more than one category and/or lemma, e.g. capital,A/N/N (capital). We may consider A/N/N an ambiguity class; there are 137 such classes. Each ambiguity class has a level of ambiguity (Amb), which is the number of lexical entries associated with each ambiguous form (for class A/N/N, Amb=3). This information makes it possible to map how ambiguity affects the lexicon. Using the frequency information associated with the token list of a large corpus (the CETEMPÚBLICO corpus, with 200 million words), it is also possible to calculate how ambiguity affects real texts. Combining the two types of information, it is possible to devise and evaluate different strategies to reduce lexical ambiguity.
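As a rough illustration of the notions in the abstract, the sketch below derives ambiguity classes and their Amb level from a handful of dictionary entries, then weights ambiguity by token frequency. It is not the authors' implementation: the entries, the frequencies, and the function names (`ambiguity_classes`, `ambiguity_level`, `corpus_ambiguity_rate`) are hypothetical toy values chosen for the example, and the real DELAF format carries more information than the simplified `(form, lemma, category)` triples assumed here.

```python
from collections import defaultdict

# Toy, hypothetical entries in simplified form: (inflected form, lemma, category).
# Real DELAF entries also carry inflectional features, omitted here.
ENTRIES = [
    ("capital", "capital", "A"),
    ("capital", "capital", "N"),
    ("capital", "capital", "N"),
    ("casa", "casa", "N"),
    ("casa", "casar", "V"),
    ("mesa", "mesa", "N"),
]

# Toy, hypothetical token frequencies standing in for a corpus token list.
TOKEN_FREQ = {"capital": 30, "casa": 50, "mesa": 20}


def ambiguity_classes(entries):
    """Map each ambiguous form to its class label, e.g. 'A/N/N'."""
    categories_by_form = defaultdict(list)
    for form, _lemma, category in entries:
        categories_by_form[form].append(category)
    # A form is ambiguous when more than one lexical entry is attached to it.
    return {
        form: "/".join(sorted(cats))
        for form, cats in categories_by_form.items()
        if len(cats) > 1
    }


def ambiguity_level(ambiguity_class):
    """Amb: number of lexical entries behind a class label."""
    return ambiguity_class.count("/") + 1


def corpus_ambiguity_rate(classes, token_freq):
    """Share of corpus tokens whose form is ambiguous in the lexicon."""
    total = sum(token_freq.values())
    ambiguous = sum(n for form, n in token_freq.items() if form in classes)
    return ambiguous / total if total else 0.0


classes = ambiguity_classes(ENTRIES)
rate = corpus_ambiguity_rate(classes, TOKEN_FREQ)
for form, cls in sorted(classes.items()):
    print(form, cls, ambiguity_level(cls))
print(f"ambiguous token share: {rate:.2f}")
```

Counting over forms shows how ambiguity is distributed in the lexicon, while weighting by token frequency shows how much of running text is affected. The two figures can differ considerably when a few ambiguous forms are very frequent.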
Keywords
Computational natural language processing; Corpus linguistics
Citation
Baptista, Jorge; Faísca, Luís. Mapping, filtering and measuring impact of ambiguous words in Portuguese. In Formaliser les langues avec l’ordinateur: de INTEX à Nooj, 305-324. ISBN: 978-2-84867-189-5. Besançon: Presses Universitaires de Franche-Comté, 2007.
Publisher
Presses Universitaires de Franche-Comté