Browsing by Author "Ostovar, Vahid"
- Persian idioms: collection and identification in texts. Publication. Ostovar, Vahid; Baptista, Jorge. An idiom is a string of words whose meaning differs from the meaning conveyed by its individual words. This project studies Persian idioms with the structure N0 C1 V, that is, sentences with a free subject (N0), a frozen direct object (C1) and a verb (V). The purpose of the project is to build a database of Persian idioms for use in the computational processing of this language. First, a selection of web sources was used to collect idioms; second, a database of Persian idioms was built from this collection; third, a set of finite-state tools was used to intersect the database with reference graphs and build finite-state transducers (FSTs) for corpus exploration; fourth, these FSTs were used to extract idiom candidates from a large Persian subtitle corpus; fifth, the resulting candidate lists were evaluated in order to: (a) estimate the scope of the database; (b) determine the precision of the idiom identification task using the finite-state tools; and (c) compare it with two association measures (t-test and chi-square). Results show chi-square to be an efficient association measure for retrieving idiom candidates; however, the finite-state tools allow for better precision. Attention should also be given to the idiom's main verb: full verbs tend to yield more precise results than more grammaticalized verbs, such as support verbs. The database, in its current state, contains 364 verbal idioms from a single formal class.
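
The two association measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the common textbook formulation for word pairs (here, a frozen object C1 and a verb V): a t-score that compares the observed co-occurrence count with the count expected under independence, and Pearson's chi-square on a 2x2 contingency table. The function names, parameters and example counts are hypothetical, and the abstract does not say which exact variant of either measure the authors used.

```python
import math


def t_score(pair_count: int, w1_count: int, w2_count: int, total: int) -> float:
    """Collocation t-score: (observed - expected) / sqrt(observed).

    Assumes the usual approximation in which the sample variance of the
    pair is taken to be its observed relative frequency.
    """
    if pair_count <= 0:
        raise ValueError("pair_count must be positive")
    expected = w1_count * w2_count / total  # count expected under independence
    return (pair_count - expected) / math.sqrt(pair_count)


def chi_square(pair_count: int, w1_count: int, w2_count: int, total: int) -> float:
    """Pearson's chi-square for the 2x2 table of a word pair (w1, w2)."""
    o11 = pair_count                                   # w1 followed by w2
    o12 = w1_count - pair_count                        # w1 without w2
    o21 = w2_count - pair_count                        # w2 without w1
    o22 = total - w1_count - w2_count + pair_count     # neither word
    numerator = total * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if denominator == 0:
        raise ValueError("degenerate contingency table")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for one object-verb pair; not taken from the study.
    counts = dict(pair_count=30, w1_count=120, w2_count=4000, total=1_000_000)
    print("t-score:   ", round(t_score(**counts), 3))
    print("chi-square:", round(chi_square(**counts), 3))
```

Candidate pairs would typically be ranked by either score, with the highest-scoring pairs inspected first as idiom candidates.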
