Click here to flash read.
While resources for English language are fairly sufficient to understand
content on social media, similar resources in Arabic are still immature. The
main reason that the resources in Arabic are insufficient is that Arabic has
many dialects in addition to the standard version (MSA). Arabs do not use MSA
in their daily communications; rather, they use dialectal versions.
Unfortunately, social users transfer this phenomenon into their use of social
media platforms, which in turn has raised an urgent need for building suitable
AI models for language-dependent applications. Existing machine translation
(MT) systems designed for MSA fail to work well with Arabic dialects. In light
of this, it is necessary to adapt to the informal nature of communication on
social networks by developing MT systems that can effectively handle the
various dialects of Arabic. Unlike for MSA that shows advanced progress in MT
systems, little effort has been exerted to utilize Arabic dialects for MT
systems. While few attempts have been made to build translation datasets for
dialectal Arabic, they are domain dependent and are not OSN cultural-language
friendly. In this work, we attempt to alleviate these limitations by proposing
an online social network-based multidialect Arabic dataset that is crafted by
contextually translating English tweets into four Arabic dialects: Gulf,
Yemeni, Iraqi, and Levantine. To perform the translation, we followed our
proposed guideline framework for content translation, which could be
universally applicable for translation between foreign languages and local
dialects. We validated the authenticity of our proposed dataset by developing
neural MT models for four Arabic dialects. Our results have shown a superior
performance of our NMT models trained using our dataset. We believe that our
dataset can reliably serve as an Arabic multidialectal translation dataset for
informal MT tasks.
No creative common's license