Search Results
- Linguistic resources for paraphrase generation in Portuguese: a lexicon-grammar approach
  Publication. Barreiro, Anabela; Mota, Cristina; Baptista, Jorge; Chacoto, Lucília; Carvalho, Paula
  This paper presents a new linguistic resource for the generation of paraphrases in Portuguese, based on the lexicon-grammar framework. The resource comprises: (i) a lexicon-grammar based dictionary of 2100 predicate nouns co-occurring with the support verb ser de 'be of', as in ser de uma ajuda inestimável 'be of invaluable help'; (ii) a lexicon-grammar based dictionary of 6000 predicate nouns co-occurring with the support verb fazer 'do' or 'make', as in fazer uma comparação 'make a comparison'; and (iii) a lexicon-grammar based dictionary of about 5000 human intransitive adjectives co-occurring with the copula verbs ser and/or estar 'be', as in ser simpático 'be kind' or estar entusiasmado 'be enthusiastic'. A set of local grammars exploits the properties described in these linguistic resources, enabling a variety of text transformation tasks for paraphrasing applications. The paper highlights the complementary and synergistic components and the efforts to integrate them, and presents preliminary evaluation results on the inclusion of these resources in the eSPERTo paraphrase generation system.
- Authorship attribution in Portuguese using character N-grams
  Publication. Markov, Ilia; Baptista, Jorge; Pichardo-Lagunas, Obdulia
  For the Authorship Attribution (AA) task, character n-grams are considered among the best predictive features. In the English language, it has also been shown that some types of character n-grams perform better than others. This paper tackles the AA task in Portuguese by examining the performance of different types of character n-grams, and various combinations of them. The paper also experiments with different feature representations and machine-learning algorithms. Moreover, the paper demonstrates that the performance of the character n-gram approach can be improved by fine-tuning the feature set and by appropriately selecting the length and type of character n-grams. This relatively simple and language-independent approach to the AA task outperforms both a bag-of-words baseline and other approaches, using the same corpus.
- Syntax Deep Explorer
  Publication. Correia, José; Baptista, Jorge; Mamede, Nuno
  The analysis of co-occurrence patterns between words allows for a better understanding of the use (and meaning) of words, and its most straightforward applications are lexicography and linguistic description in general. Some tools already produce co-occurrence information about words taken from Portuguese corpora, but few can use lemmata or syntactic dependency information. Syntax Deep Explorer is a new tool that uses several association measures to quantify several co-occurrence types, defined on the syntactic dependencies (e.g. subject, complement, modifier) between a target word lemma and its co-locates. The resulting co-occurrence statistics are represented in lex-grams, that is, synopses of the syntactically based co-occurrence patterns of a word's distribution within a given corpus. These lex-grams are obtained from a large Portuguese corpus processed by STRING [19] and are presented in a user-friendly way through a graphical interface. Syntax Deep Explorer will allow the development of finer lexical resources and the improvement of STRING processing in general, and will provide public access to co-occurrence information derived from parsed corpora.
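
To make the idea in the Syntax Deep Explorer abstract concrete, the sketch below shows one way an association measure could be computed over dependency-typed co-occurrences. It is an illustration only: the abstract does not say which association measures the tool uses, so pointwise mutual information (PMI) is an assumption here, and the function name `pmi_by_dependency` and the `(dependency, target_lemma, collocate_lemma)` triple format are hypothetical, not taken from the tool or from STRING.

```python
import math
from collections import Counter


def pmi_by_dependency(triples):
    """Score each (dependency, target_lemma, collocate_lemma) triple with PMI.

    Counts are kept separately per dependency type, so a lemma pair seen as
    'subject' is scored independently of the same pair seen as 'modifier'.
    PMI is an assumed measure chosen for illustration.
    """
    pair_counts = Counter(triples)
    target_counts = Counter()
    collocate_counts = Counter()
    totals = Counter()

    # Accumulate marginal counts within each dependency type.
    for (dep, target, collocate), n in pair_counts.items():
        target_counts[(dep, target)] += n
        collocate_counts[(dep, collocate)] += n
        totals[dep] += n

    scores = {}
    for (dep, target, collocate), n in pair_counts.items():
        p_pair = n / totals[dep]
        p_target = target_counts[(dep, target)] / totals[dep]
        p_collocate = collocate_counts[(dep, collocate)] / totals[dep]
        # PMI = log2( P(target, collocate) / (P(target) * P(collocate)) )
        scores[(dep, target, collocate)] = math.log2(
            p_pair / (p_target * p_collocate)
        )
    return scores


# Toy input: dependency triples as a parser might emit them (invented data).
toy_triples = [
    ("subject", "gato", "miar"),
    ("subject", "gato", "miar"),
    ("subject", "pato", "nadar"),
    ("subject", "pato", "nadar"),
]
toy_scores = pmi_by_dependency(toy_triples)
```

Grouping the scored triples by target lemma and dependency type would give a rough analogue of the per-word synopsis the abstract calls a lex-gram, although the actual layout of a lex-gram is not described in the listing.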