Error detection for post-editing rule-based machine translation

Valotkaite, Justina

http://hdl.handle.net/10400.1/11064

Use this identifier to reference this record.

Name:	Description:	Size:	Format:
Justina.Thesis.pdf		2.05 MB	Adobe PDF

Authors

Valotkaite, Justina

Advisor(s)

Specia, Lucia

Orasan, Constantin

Baptista, Jorge

Abstract(s)

The growing role of Post-editing (PE), both as a way of improving Machine Translation (MT) output and as a faster alternative for translators to translating from scratch, has recently attracted researchers’ attention. A number of recent studies have proposed ways to facilitate this task, especially for the output of Statistical Machine Translation (SMT). However, little attention in the field has been given to Rule-based Machine Translation (RBMT). This dissertation seeks to support the PE task through Error Detection (ED). A deep linguistic error analysis was carried out on a sample of English sentences, drawn from two text domains, that had been translated from Portuguese by two RBMT systems. The hypothesis is that automatically identifying and highlighting errors in translations can make the PE task faster, more efficient and less tedious. Because RBMT systems tend to make repetitive, systematic mistakes, translators are forced to post-edit the same mistakes repeatedly, which makes their task monotonous and frustrating. To address this problem, a set of 40 contrastive rules covering various linguistic phenomena was designed on the basis of the translation errors identified in the error analysis. With this linguistic approach, the project aimed to demonstrate that a rule-based system built on such rules can help detect and highlight translation errors in RBMT output. The rules were verified through an experimental error analysis on a new data set, which showed that their coverage was 98.21%. The implementation results showed that the system performed successfully. In addition, a psycholinguistic experiment with human translators confirmed that highlighting errors is useful: it can help translators perform the post-editing task up to 12 seconds per error faster and improve their efficiency by minimizing the number of missed errors.