Please use this identifier to cite or link to this item: http://dspace.univ-mascara.dz:8080/jspui/handle/123456789/1073
Title: Arabic Machine Translation of Social Media Content
Authors: Baligh, BABAALI
Keywords: Neural Machine Translation
Large Language Model
Data augmentation
Low resource language
Issue Date: 1-Oct-2024
Abstract: Machine translation serves as a crucial tool for breaking down language barriers and facilitating communication and information access across diverse linguistic contexts. However, its efficacy heavily relies on the availability of sufficient and high-quality training data, a challenge often encountered in low-resource language settings. In this study, we explore methods to enhance Neural Machine Translation (NMT) systems by employing data augmentation techniques to address the challenges posed by such scenarios. Our experimentation involved various augmentation strategies, including Back Translation, Copied Corpus, and innovative methods such as Right Rotation Augmentation, with the aim of enriching training data and improving translation quality. Through rigorous evaluation comparing augmented NMT models with the baseline, we observed significant enhancements in translation quality, as evidenced by improved BLEU scores. Our analysis underscores the effectiveness of different augmentation techniques in bolstering NMT systems, especially in low-resource language contexts. Furthermore, our comparative analysis between Seq2Seq NMT models and GPT-based models sheds light on their architectural intricacies and performance characteristics. Evaluating their performance across diverse translation tasks, we found that the ChatGPT model consistently outperformed the Seq2Seq model, exhibiting higher COMET, BLEU, and ChrF scores. Notably, the ChatGPT model demonstrated superior performance in translating from the Algerian Arabic dialect (DZDA) to Modern Standard Arabic (MSA). Moreover, transitioning from zero-shot to few-shot scenarios led to enhanced translation performance for ChatGPT models across both language pairs. These findings contribute to a deeper understanding of the interplay between Seq2Seq and GPT-based models in machine translation, offering valuable insights for future advancements in the field.
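
Note: the Back Translation strategy named in the abstract creates synthetic parallel data by translating monolingual target-side text back into the source language with a reverse-direction model. The sketch below is a minimal illustration of that step, assuming the Hugging Face transformers library and the public Helsinki-NLP/opus-mt-en-ar checkpoint as a stand-in reverse model; the thesis's actual models, language pairs, and corpora may differ.

    from transformers import pipeline

    # Reverse-direction model: translates monolingual target-side text back
    # into the source language to manufacture synthetic parallel pairs.
    # "Helsinki-NLP/opus-mt-en-ar" is an illustrative public checkpoint,
    # not necessarily the one used in the thesis.
    reverse_mt = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ar")

    # Monolingual target-side sentences (placeholders for a real corpus).
    monolingual_targets = [
        "Machine translation helps people access information across languages.",
        "Social media posts are often short and informal.",
    ]

    # Build synthetic (source, target) pairs; these are appended to the
    # authentic parallel data before retraining the NMT system.
    augmented_pairs = [
        (reverse_mt(sentence)[0]["translation_text"], sentence)
        for sentence in monolingual_targets
    ]

    for src, tgt in augmented_pairs:
        print(src, "=>", tgt)

The augmented corpus is then used to retrain the baseline system, and gains can be measured with corpus-level metrics such as BLEU and ChrF (for example via sacrebleu.corpus_bleu and sacrebleu.corpus_chrf).
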
URI: http://dspace.univ-mascara.dz:8080/jspui/handle/123456789/1073
Appears in Collections: Thèse de Doctorat (Doctoral Thesis)

Files in This Item:
File: Thesis.pdf (3.18 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.