Browse by author "Batista, Fernando"
- Early experiments on automatic annotation of Portuguese medieval texts
  Publication. Bico, Maria Inês; Baptista, Jorge; Batista, Fernando; Cardeira, Esperança
  This paper presents the challenges faced and the solutions adopted in the lemmatization and part-of-speech (PoS) tagging of a corpus of Old Portuguese texts (up to 1525), paving the way for the automatic annotation of these Medieval texts. A highly granular tagset, previously devised for Modern Portuguese, was adapted to this end. A large text (approximately 155 thousand words) was manually annotated for PoS and lemmata and used to train an initial PoS-tagger model. When applied to two other texts, the resulting model attained 91.2% precision on a textual variant of the same text and 67.4% on a new, unseen text. A second model was then trained on the data from these three texts and applied to two further unseen texts, achieving a precision of 77.3% and 82.4%, respectively.
- Evidências do português médio no corpus de textos antigos [Evidence of Middle Portuguese in the corpus of ancient texts]
  Publication. Bico, Maria Inês; Cardeira, Esperança; Baptista, Jorge; Batista, Fernando
  Based on a set of semi-automatically annotated data from the Corpus de Textos Antigos (CTA), this article analyses the results obtained on two phenomena: the syncope of intervocalic -d- in the 2nd person plural morpheme, with the consequent resolution of the hiatus, and the Past Participle endings -udo/-ido in verbs etymologically derived from the Latin 2nd and 3rd conjugations. The novelty of this article lies in its use of Natural Language Processing (NLP) methods to optimise the systematic retrieval and extraction of the data relevant for analysis, contributing to a study that covers a larger set of texts. The methodology adopted for annotating the data and then extracting the data relevant to the analysis is presented, and the importance of NLP methods and tools for linguistic study and for describing earlier stages of the Portuguese language is affirmed.
- Examining Airbnb guest satisfaction tendencies: a text mining approach
  Publication. Cavique, Mariana; Ribeiro, Ricardo; Batista, Fernando; Correia, Antónia
  Given Airbnb's changes since its inception and the dynamism of customer preferences, a study that sheds light on how customer satisfaction is evolving is relevant. An automated method is proposed for identifying these satisfaction tendencies at a large scale. This study follows a text mining approach to analyse 590,070 reviews posted between 2010 and 2019 on the Airbnb platform in Lisbon. Topic Modelling is employed to identify the main topics discussed in the reviews, and Sentiment Analysis to understand which topics make up guest satisfaction in the context of Airbnb services. Three major topics are extracted from Airbnb reviews: 'host's service', 'physical aspects', and 'location'. Although a positivity bias in guest reviews is confirmed, the satisfaction level seems to be decreasing over the years. The results also reveal that 'physical aspects' is the predominant topic in negative guest reviews. This research treats big data as the basis for creating knowledge; because the data span several years, they lend consistency to the findings.
- From host's descriptions to guests' reviews: semantic similarities
  Publication. Cavique, Mariana; Ribeiro, Ricardo; Batista, Fernando; Correia, Antónia
  This study investigates the semantic alignment between Airbnb property descriptions and guest reviews. Word2Vec embeddings and affinity propagation clustering are used to identify granular semantic concepts, enabling a detailed comparison of the two text types. A new metric, the concept coverage ratio, is introduced to measure the extent to which guest review content is reflected in property descriptions. Results show that a higher concept coverage ratio is generally associated with more positive sentiment in reviews, suggesting that better alignment between host and guest perspectives contributes to guest satisfaction. However, longer and more detailed descriptions may limit the potential for pleasantly surprising guests, as they reduce the chance of positive disconfirmation. These findings offer practical insights for improving communication in peer-to-peer accommodation.
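
The last abstract above describes the concept coverage ratio only in words, so the following is a minimal sketch of one plausible reading, not the authors' definition. It assumes that each text has already been reduced to a set of concept labels (for example, cluster identifiers) and that the ratio is the share of review concepts that also occur in the description. The function name `concept_coverage_ratio` and the example labels are hypothetical.

```python
def concept_coverage_ratio(review_concepts, description_concepts):
    """Share of review concepts that also appear in the description.

    Assumed definition for illustration only; the abstract does not
    give the exact formula.
    """
    review_set = set(review_concepts)
    if not review_set:
        # No review concepts to cover; return 0.0 by convention (an assumption).
        return 0.0
    covered = review_set & set(description_concepts)
    return len(covered) / len(review_set)


# Hypothetical concept labels standing in for cluster identifiers.
review = {"location", "cleanliness", "host", "view"}
description = {"location", "view", "kitchen"}

ratio = concept_coverage_ratio(review, description)
print(ratio)
```

Under this reading, a description that mentions every concept raised by guests scores 1.0, and concepts that appear only in the description (such as "kitchen" here) do not affect the score.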
