A Hybrid Model for Autonomous Danish-Arabic Statistical Machine Translation
Mossab AL Hunaity, Department of Computer Information Systems, Ajman University, Ajman, United Arab Emirates.
Manuscript received on June 15, 2014. | Revised Manuscript received on June 19, 2014. | Manuscript published on June 25, 2014. | PP:24-30 | Volume-2 Issue-8, June 2014. | Retrieval Number: J11760541017
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: We present a simple and efficient method for enhancing a Danish-Arabic (DA-AR) statistical machine translation (SMT) system. The model consists of two main components: an information retrieval (IR) unit and an SMT system. We train our baseline on small DA-AR corpora. We use the Arabic translation output as a query to the Lemur information retrieval tool to search for a closely matching sentence in a much larger Arabic corpus. We then apply a Translation Error Rate (TER) filter to select the best output of the IR system. We evaluate our approach and show that it improves translation quality. We extend our experiments to measure the effect of adding more language resources to our baseline. We mine available DA-EN and EN-AR resources to produce parallel DA-AR sentences, and we use these new resources to train our baseline. We evaluate the quality of the extracted data by showing that it significantly improves the performance of our baseline.
Keywords: (DA-AR), (TER), Danish-Arabic, DA-EN and EN-AR, baseline performance
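The TER filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the score is a simplified TER (word-level edit distance divided by the length of the retrieved sentence, without the phrase-shift operation of full TER), the candidates are assumed to have been retrieved already (no Lemur call is shown), and the function names and the `max_ter` threshold of 0.3 are hypothetical choices made for illustration only.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Token-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis token
                            curr[j - 1] + 1,      # insert a reference token
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis: str, reference: str) -> float:
    """Simplified TER: edits / reference length. Assumption: no shifts."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    return word_edit_distance(hyp, ref) / len(ref)


def select_best(smt_output: str, candidates: List[str],
                max_ter: float = 0.3) -> str:
    """Return the retrieved sentence closest to the SMT output.

    Falls back to the SMT output when no candidate scores at or below
    max_ter (a hypothetical threshold, not a value from the paper).
    """
    scored = [(simplified_ter(smt_output, c), c) for c in candidates]
    if not scored:
        return smt_output
    score, best = min(scored, key=lambda pair: pair[0])
    return best if score <= max_ter else smt_output


if __name__ == "__main__":
    # Placeholder English tokens stand in for Arabic sentences.
    retrieved = ["the cat sat on a mat", "dogs bark loudly"]
    print(select_best("the cat sat on the mat", retrieved))
```

In this sketch the retrieved sentence plays the role of the TER reference and the SMT output the role of the hypothesis. Keeping the SMT output when every candidate exceeds the threshold prevents a poor retrieval result from replacing a reasonable translation.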