Please use this identifier to cite or link to this item:
http://dspace.univ-mascara.dz:8080/jspui/handle/123456789/1073
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Baligh, BABAALI | - |
dc.date.accessioned | 2024-10-01T12:08:45Z | - |
dc.date.available | 2024-10-01T12:08:45Z | - |
dc.date.issued | 2024-10-01 | - |
dc.identifier.uri | http://dspace.univ-mascara.dz:8080/jspui/handle/123456789/1073 | - |
dc.description.abstract | Machine translation serves as a crucial tool for breaking down language barriers and facilitating communication and information access across diverse linguistic contexts. However, its efficacy heavily relies on the availability of sufficient and high-quality training data, a challenge often encountered in low-resource language settings. In this study, we explore methods to enhance Neural Machine Translation (NMT) systems by employing data augmentation techniques to address the challenges posed by such scenarios. Our experimentation involved various augmentation strategies, including Back Translation, Copied Corpus, and innovative methods like Right Rotation Augmentation, with the aim of enriching training data and improving translation quality. Through rigorous evaluation comparing augmented NMT models with the baseline, we observed significant enhancements in translation quality, as evidenced by improved BLEU scores. Our analysis underscores the effectiveness of different augmentation techniques in bolstering NMT systems, especially in low-resource language contexts. Furthermore, our comparative analysis between Seq2Seq NMT models and GPT-based models sheds light on their architectural intricacies and performance characteristics. Evaluating their performance across diverse translation tasks, we found that the ChatGPT model consistently outperformed the Seq2Seq model, exhibiting higher COMET, BLEU, and ChrF scores. Notably, the ChatGPT model demonstrated superior performance in translating from the Algerian Arabic dialect (DZDA) to Modern Standard Arabic (MSA). Moreover, transitioning from zero-shot to few-shot scenarios led to enhanced translation performance for ChatGPT models across both language pairs. These findings contribute to a deeper understanding of the interplay between Seq2Seq and GPT-based models in machine translation, offering valuable insights for future advancements in the field. | en_US |
dc.subject | Neural Machine Translation | en_US |
dc.subject | Large Language Model | en_US |
dc.subject | Data augmentation | en_US |
dc.subject | Low resource language | en_US |
dc.title | Arabic Machine Translation of Social Media Content | en_US |
dc.type | Thesis | en_US |
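
For illustration only: the abstract mentions Back Translation and Copied Corpus as data augmentation strategies for low-resource NMT. The sketch below shows, under assumed interfaces, how such synthetic parallel data could be produced; the `translate_tgt_to_src` callable is a placeholder for an existing target-to-source MT model, not the model actually used in the thesis.

```python
# Minimal sketch of two augmentation strategies named in the abstract.
# The translation model is an assumption (any target->source MT system).

def back_translate(monolingual_target, translate_tgt_to_src):
    """Build synthetic (source, target) pairs from target-side monolingual text.

    monolingual_target: list of sentences in the target language
    translate_tgt_to_src: callable mapping a target sentence to a source sentence
    """
    augmented_pairs = []
    for tgt_sentence in monolingual_target:
        synthetic_src = translate_tgt_to_src(tgt_sentence)   # machine-generated source side
        augmented_pairs.append((synthetic_src, tgt_sentence))  # keep the human-written target
    return augmented_pairs

def copied_corpus(monolingual_target):
    """Copied Corpus augmentation: pair each target sentence with itself on the source side."""
    return [(sentence, sentence) for sentence in monolingual_target]
```

In practice, the synthetic pairs produced this way are concatenated with the original parallel corpus before training, which is what allows the augmented NMT models to be compared against the baseline on BLEU.
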
Appears in Collections: Thèse de Doctorat
Files in This Item:
File | Description | Size | Format
---|---|---|---
Thesis.pdf | | 3.18 MB | Adobe PDF