The primary religious text of Islam is the Quran. The Sunnah, otherwise known as Hadith, is the second source; it refers to any action, saying, order, or silent approval of the holy prophet Mohammad, and has been delivered through a chain of narrators. Each Hadith has an Isnad, the chain of narrators, and a Matan, the act of the prophet Mohammad. While most ordinances of Islam are mentioned in the Quran in general terms, detailed explanations are often provided in the Hadith. For example, prayer ‘الصلاة’ is mentioned in the Quran, while the Hadith specifies what a Muslim should do and say: it explains the time for each prayer and what a Muslim should do before starting and after finishing the prayer. In contrast to the Quran, some hadiths, handed down over centuries, have been corrupted by narrators who were not reliable in transmitting them.
The objectives of this research are: first, to create an Arabic Hadith corpus that includes both authentic and non-authentic Hadith; second, to develop classifiers, that is, automatic approaches for distinguishing authentic from non-authentic Arabic Hadith; third, to develop a classifier for Isnad and Matan, the two main parts of each Hadith; and fourth, to classify Arabic Hadith by topic.
We will compare traditional machine learning classifiers, such as SVM, with the compression-based approach to find the best automatic classifier of Arabic Hadith.
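To illustrate the general idea behind compression-based classification, the following is a minimal sketch and not the method used in this research. It assumes `zlib` as a stand-in compressor (the compressor is not specified here), and the function names, labels, and toy English strings are hypothetical placeholders rather than corpus data. A document is assigned to the class whose training text lets it be compressed most cheaply.

```python
import zlib
from typing import Dict


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of `text`."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def classify(document: str, training: Dict[str, str]) -> str:
    """Return the label whose training text adds the fewest bytes
    when the document is compressed together with it."""

    def extra_cost(label: str) -> int:
        corpus = training[label]
        # Bytes needed for the document beyond what the corpus alone costs.
        return compressed_size(corpus + " " + document) - compressed_size(corpus)

    return min(training, key=extra_cost)


# Hypothetical toy data: placeholder strings, not real Hadith text.
toy_training = {
    "chain": (
        "he narrated from his teacher who narrated from his father "
        "who narrated from his grandfather"
    ),
    "content": (
        "pray at dawn, give charity to the poor, "
        "and speak kindly to your neighbours"
    ),
}

print(classify("who narrated from his father who narrated from his grandfather",
               toy_training))
print(classify("give charity to the poor", toy_training))
```

A traditional classifier such as SVM would instead be trained on explicit features (for example, word or character n-gram counts), whereas the sketch above needs no feature extraction, which is one reason the two families are worth comparing.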