Research Project

Untitled

Publications

ProPAM/Static: A static view of a methodology for process and project alignment
Publication . Martins, Paula Ventura; Silva, Alberto Rodrigues da
Process descriptions represent high-level plans and do not contain the information needed by concrete software projects. Processes that are unrelated to daily practices, or that are hard to map to project practices, cause misalignments between processes and projects. We argue that software processes should emerge and evolve collaboratively within an organization. In this chapter we present a Process and Project Alignment Methodology for agile software process improvement and, in particular, describe its static view.
Assisting European Portuguese teaching: linguistic features extraction and automatic readability classifier
Publication . Curto, Pedro; Mamede, Nuno; Baptista, Jorge
This paper describes two automatic systems for European Portuguese texts: a linguistic feature extractor and a text readability classifier. Their main goal is to assist the selection of adequate reading materials to support Portuguese teaching, especially as a second language. For feature extraction, the system uses several Natural Language Processing (NLP) tools. Currently, 52 features are extracted, covering parts of speech (POS), syllables, words, chunks and phrases, averages and frequencies, among others. A classifier was built from these features and a corpus previously annotated for readability level, adopting the official five-level language classification standard for Portuguese as a Second Language. In a five-level scenario (A1 to C1), the best-performing learning algorithm (LogitBoost) achieved an accuracy of 75.11% with a root mean square error (RMSE) of 0.269. In a three-level scenario (A, B and C), the best-performing learning algorithm (grafted C4.5) achieved 81.44% accuracy, with an RMSE of 0.346.
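The abstract names only the feature categories, not the 52 features themselves. As a rough illustration, here is a minimal sketch of a few shallow features of the kinds listed (word, sentence and syllable counts, averages, frequencies). POS, chunk and phrase features need a full NLP pipeline and are omitted. The vowel-group syllable count and the feature names are assumptions made for this sketch, not the authors' syllabifier or feature set.

```python
import re

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of Unicode letters
SENT_RE = re.compile(r"[.!?]+")     # naive sentence boundaries
# Crude assumption: each contiguous vowel group counts as one syllable
# (this miscounts hiatus, e.g. "saúde").
VOWEL_GROUP_RE = re.compile(r"[aeiouáàâãéêíóôõú]+", re.IGNORECASE)


def extract_features(text: str) -> dict:
    """Compute a few shallow readability features (hypothetical names)."""
    words = WORD_RE.findall(text)
    sentences = [s for s in SENT_RE.split(text) if s.strip()]
    # Every word is assumed to have at least one syllable.
    syllables = sum(len(VOWEL_GROUP_RE.findall(w)) or 1 for w in words)
    n_words = len(words)
    return {
        "n_words": n_words,
        "n_sentences": len(sentences),
        "avg_words_per_sentence": n_words / len(sentences) if sentences else 0.0,
        "avg_syllables_per_word": syllables / n_words if n_words else 0.0,
        # Lexical variety: distinct lowercase word forms over total words.
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
    }


if __name__ == "__main__":
    print(extract_features("O gato comeu. O cão dormiu!"))
```

Feature vectors like these would then be fed to a classifier (the paper reports LogitBoost and grafted C4.5); the classifier step is not sketched here.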
Parafraseamento automático de registo informal em registo formal na Língua Portuguesa (Automatic paraphrasing from informal to formal register in Portuguese)
Publication . Barreiro, Anabela Marques; Rebelo-Arnold, Ida; Baptista, Jorge; Mota, Cristina; Garcez, Isabel
This article presents the process of automating paraphrasing in Portuguese, converting constructions typical of the informal register or of spoken language into formal-register constructions used in written language. We illustrate the automation process with examples extracted from the e-PACT corpus, involving the standardized placement of clitic pronouns when they co-occur with compound verbs. The task consists of paraphrasing and normalizing, among others, constructions such as "vou-lhe/posso-lhe fazer uma surpresa" into "vou/posso fazer-lhe uma surpresa", in which the clitic pronoun "lhe" moves from an enclitic position immediately after the first verb of the compound verb to an enclitic position after the main verb, which is the verb responsible for selecting the pronominal argument. The first verb is an auxiliary or a volitional verb, e.g., "querer" ('to want'). This is a standard procedure in the revision process for European Portuguese. Cases like this represent linguistic phenomena over which students of Portuguese, and speakers in general, get confused or "stumble".
The article emphasizes the standard language in which the observed phenomena occur, describes examples of interest found in the corpus, and presents an automatic solution, based on the application of generic transformational grammars, that facilitates normalizing the syntactic inadequacies or flaws (informal registers) found in the constructions studied into the standardized constructions typical of formal or professional writing.
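The paper's solution uses generic transformational grammars, which are not reproduced here. As a minimal sketch of the clitic repositioning described above, the rule can be approximated with a single regular expression. The verb list is a small hypothetical sample of forms of "ir", "poder" and "querer", and infinitives are approximated as words ending in "-r". Accusative clitics (o, a, os, as) are deliberately excluded, because attaching them to an infinitive changes its form (fazer + o gives "fazê-lo").

```python
import re

# Hypothetical, deliberately small sample of first-verb forms (ir, poder, querer).
FIRST_VERBS = (
    r"vou|vais|vai|vamos|vão|posso|podes|pode|podemos|podem|"
    r"quero|queres|quer|queremos|querem"
)

# Dative/reflexive clitics only; accusative o/a/os/as would need a
# morphological change to the infinitive, which this sketch does not handle.
CLITICS = r"lhe|lhes|me|te|se|nos|vos"

# first verb + hyphen + clitic, followed by an infinitive (approximated as -r word)
PATTERN = re.compile(rf"\b({FIRST_VERBS})-({CLITICS})\s+(\w+r)\b", re.IGNORECASE)


def normalize_clitic(sentence: str) -> str:
    """Move an enclitic from the first verb to the following infinitive."""
    return PATTERN.sub(r"\1 \3-\2", sentence)


if __name__ == "__main__":
    print(normalize_clitic("vou-lhe fazer uma surpresa"))
```

Sentences that are already in the formal order, such as "vou fazer-lhe uma surpresa", do not match the pattern and are returned unchanged.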
Automatic generation of exercises on passive transformation in portuguese
Publication . Baptista, Jorge; Lourenço, Sandra; Mamede, Nuno J.
Technology plays a very important role in education, and Intelligent Computer-Assisted Language Learning (iCALL) has emerged as a complementary or even alternative method to conventional language teaching practices. The automatic generation (and correction) of language exercises based on real texts extracted from corpora is a non-trivial challenge for iCALL tutorial systems, and may involve sophisticated Natural Language Processing tools and large-scale linguistic resources. This paper presents the main issues related to the automatic generation of exercises on the Passive transformation, a common type of exercise in language textbooks but also a very complex topic of Portuguese grammar. The paper describes the methods used to produce a large batch of passive-active sentence pairs, in which the active sentence was automatically generated from naturally occurring passive sentences taken from a large, publicly available corpus. Sentence pairs are ranked by difficulty level. A sample of randomly selected sentence pairs (40 at the difficult level, 100 at the medium level and 100 at the easy level) was manually evaluated by an expert. Results are presented along with an error analysis. The sentence pairs can be used as stimulus (prime) and correct answer in iCALL systems.
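The abstract does not detail how active sentences are generated, so the following is only a toy sketch of the core rearrangement: the agent introduced by "por" + article becomes the subject, and the patient becomes the direct object. It assumes the passive clause is already parsed into its parts, and it replaces real verb conjugation with a tiny hypothetical lookup table (covering only 3rd-person singular agents). A real system would need a parser and a full inflection lexicon.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: (participle, finite form of "ser") -> active verb
# form agreeing with a 3rd-person singular agent.
ACTIVE_FORMS = {
    ("escrito", "foi"): "escreveu",
    ("lido", "é"): "lê",
}

# Contractions of "por" + definite article that introduce the agent.
AGENT_ARTICLES = {"pelo": "o", "pela": "a", "pelos": "os", "pelas": "as"}


@dataclass
class PassiveClause:
    patient: str     # e.g. "O livro"
    aux: str         # finite form of "ser", e.g. "foi"
    participle: str  # e.g. "escrito"
    agent: str       # e.g. "pelo autor"


def to_active(p: PassiveClause) -> str:
    """Rebuild a simple passive clause as an active one (toy sketch)."""
    prep, _, noun = p.agent.partition(" ")
    article = AGENT_ARTICLES[prep.lower()]          # KeyError if no por+article agent
    verb = ACTIVE_FORMS[(p.participle, p.aux)]      # KeyError if form not in toy lexicon
    subject = f"{article} {noun}"
    # Lowercasing the first letter assumes the patient starts with an article,
    # not a proper noun.
    obj = p.patient[0].lower() + p.patient[1:]
    return f"{subject[0].upper()}{subject[1:]} {verb} {obj}"


if __name__ == "__main__":
    print(to_active(PassiveClause("O livro", "foi", "escrito", "pelo autor")))
```

Passives without an explicit agent ("O livro foi escrito.") have no subject to promote and would need separate handling.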
Authorship attribution in portuguese using character N-grams
Publication . Markov, Ilia; Baptista, Jorge; Pichardo-Lagunas, Obdulia
For the Authorship Attribution (AA) task, character n-grams are considered among the best predictive features. For English, it has also been shown that some types of character n-grams perform better than others. This paper tackles the AA task in Portuguese by examining the performance of different types of character n-grams and various combinations of them. The paper also experiments with different feature representations and machine-learning algorithms. Moreover, it demonstrates that the performance of the character n-gram approach can be improved by fine-tuning the feature set and by appropriately selecting the length and type of the character n-grams. This relatively simple, language-independent approach to the AA task outperforms both a bag-of-words baseline and other approaches evaluated on the same corpus.
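As an illustration of the underlying feature type, here is a minimal sketch of character n-gram authorship attribution. It swaps in a simple baseline, plain (untyped) character n-gram counts compared by cosine similarity against one aggregated profile per author, rather than the typed n-grams and machine-learning classifiers the paper evaluates. The function names and the default n = 3 are assumptions for this sketch.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count all overlapping character n-grams in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(text: str, training: dict[str, list[str]], n: int = 3) -> str:
    """Return the author whose aggregated n-gram profile is most similar."""
    profiles = {
        author: sum((char_ngrams(t, n) for t in texts), Counter())
        for author, texts in training.items()
    }
    query = char_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


if __name__ == "__main__":
    training = {"A": ["o gato comeu o peixe"], "B": ["xyz qwv kkk"]}
    print(attribute("o gato dormiu", training))
```

Varying `n` and restricting which n-grams are kept are the simplest knobs corresponding to the length and type selection the abstract describes.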

Funders

Funding agency

Fundação para a Ciência e a Tecnologia

Funding programme

5876

Funding Award Number

UID/CEC/50021/2013
