My PhD research is in Arabic Natural Language Processing. Its overall aim is to compare the performance of the Prediction by Partial Matching (PPM) compression scheme with that of machine learning (ML) and deep learning (DL) algorithms for the automatic classification of Arabic text, specifically text from Hadith-related websites and social media platforms.
The specific objectives of the research are as follows:
- create a corpus of Arabic Hadiths collected from Hadith websites. Paid annotators will annotate the corpus with features of each Hadith, such as the Isnad, Matan, topics and authenticity, based on Hadith expert sources, to provide a ground truth;
- create a corpus of Arabic code-switching texts using samples obtained from online sources such as Facebook, and annotate this corpus to mark occurrences of code-switching and provide a ground truth;
- create new corpora containing text samples from different Arabic dialects, such as Egyptian and Saudi, to train a model for each dialect (for example, Egyptian and Saudi models);
- adapt the PPM compression-based approach to different applications (using the corpora created for the previous objectives), such as:
  - detection of code-switching between Arabic varieties and dialects,
  - categorisation of Hadith components,
  - segmentation of Hadiths into Isnad and Matan,
  - classification of Hadith authenticity;
- compare the compression-based approach with traditional machine learning classifiers and a deep learning classifier on the automatic classification of Arabic Hadiths;
- compare the execution times of these methods to identify the fastest and slowest methods for classifying Arabic Hadiths.
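
As a minimal sketch of the idea behind compression-based classification, the following Python example trains one character model per class and assigns a text to the class whose model encodes it in the fewest bits. It is not PPM itself: it assumes a fixed-order character model with add-one smoothing in place of PPM's variable-order contexts and escape mechanism. The class name `CharNgramModel`, the function `classify`, and the toy training strings are hypothetical placeholders, not material from the corpora described above. The timing at the end only illustrates how execution time might be measured.

```python
import math
import time
from collections import defaultdict


class CharNgramModel:
    """Fixed-order character model with add-one smoothing.

    Assumption: a simplified stand-in for PPM, not a PPM implementation.
    """

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def train(self, text):
        # Pad with spaces so the first characters also have a full context.
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            symbol = padded[i]
            self.counts[context][symbol] += 1
            self.alphabet.add(symbol)

    def code_length_bits(self, text):
        """Approximate number of bits needed to encode `text` under this model."""
        vocab = len(self.alphabet) + 1  # +1 reserves some mass for unseen symbols
        padded = " " * self.order + text
        bits = 0.0
        for i in range(self.order, len(padded)):
            seen = self.counts.get(padded[i - self.order:i], {})
            total = sum(seen.values())
            prob = (seen.get(padded[i], 0) + 1) / (total + vocab)
            bits -= math.log2(prob)
        return bits


def classify(text, models):
    """Return the label whose model encodes `text` in the fewest bits."""
    return min(models, key=lambda label: models[label].code_length_bits(text))


# Hypothetical toy data; a real experiment would train on the annotated corpora.
TOY_TRAINING_TEXT = {
    "class_a": "the cat sat on the mat the cat sat on the mat",
    "class_b": "zzq kkv zzq kkv zzq kkv",
}

models = {}
for label, sample in TOY_TRAINING_TEXT.items():
    models[label] = CharNgramModel(order=2)
    models[label].train(sample)

# Time a single classification call with a monotonic high-resolution clock.
start = time.perf_counter()
predicted = classify("the cat", models)
elapsed = time.perf_counter() - start
print(predicted, f"{elapsed:.6f}s")
```

The minimum-code-length decision rule is the part that carries over to a real PPM model. Only the way each symbol's probability is estimated would change.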