Abstract(s)
This paper relates data on lexical availability to data on the textual frequency
of proverbs in European Portuguese. Each data source should provide
a different perspective on the use of proverbs in the language. This should allow
an empirically well-motivated selection of proverbs for the development
of NLP resources, specifically for applications for learning Portuguese as a Foreign
Language and for the diagnosis and therapy of speech impairments or disabilities.
A large database (over 114,000 proverbs and their variants) was independently
classified by two annotators, according to intuitively estimated lexical availability.
Next, a random, stratified sample was selected, and its lexical availability was
then confirmed with an online survey. Frequency data was gathered from two
web search engines and a large, publicly available corpus of journalistic texts.
Results from the survey, the web and the corpus by and large confirm the initial
intuitive classification, and a core of commonly used proverbs was defined.
Keywords
European Portuguese; Proverbs; Frequency in corpus; Lexical availability
Publisher
Springer Publishing Company