For instance, an Arabic word consisting of three consonants, such as كتب (ktb, “to write”), can have many interpretations depending on its diacritics[^9], as shown in Table 2. For Arabic speakers, the only way to disambiguate diacritic-less words is to read them in context. An analysis of 23,000 Arabic scripts showed that there are, on average, 11.6 possible ways to assign diacritics to each diacritic-less word[^10].
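The ambiguity described above can be illustrated with a minimal Python sketch. It assumes that diacritics are encoded as Unicode combining marks (general category `Mn`). The helper name `strip_diacritics` and the four vocalized forms are chosen here for illustration only and are not taken from Table 2.

```python
import unicodedata

# Code points for the three consonants and a few common diacritics.
K, T, B = "\u0643", "\u062a", "\u0628"   # kaf, ta, ba
FATHA, DAMMA, KASRA, SHADDA = "\u064e", "\u064f", "\u0650", "\u0651"


def strip_diacritics(word: str) -> str:
    """Drop every Unicode nonspacing mark, leaving the bare consonant skeleton."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


# Illustrative vocalized forms of the root k-t-b (an assumption of this sketch).
forms = {
    "kataba (he wrote)": K + FATHA + T + FATHA + B + FATHA,
    "kutiba (it was written)": K + DAMMA + T + KASRA + B + FATHA,
    "kutub (books)": K + DAMMA + T + DAMMA + B,
    "kattaba (he made someone write)": K + FATHA + T + SHADDA + FATHA + B + FATHA,
}

# Every vocalized form collapses to the same diacritic-less string.
skeletons = {strip_diacritics(form) for form in forms.values()}

for gloss, form in forms.items():
    print(f"{form} -> {strip_diacritics(form)}   # {gloss}")
```

Because all four forms reduce to the same three letters, a classifier working on undiacritized text sees a single token for several distinct meanings, which is why context is needed to tell them apart.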
Table 2: Different interpretations of the Arabic word كتب (ktb) in the presence of diacritics

In addition to the complex morphology described above, Arabic has very complex syntax and linguistic and grammatical rules. Its structure clearly differs from that of many other languages and is more difficult to process. These differences make it hard to apply language processing techniques developed for other languages directly to Arabic.
The objective of this project is therefore to develop an Arabic Text-Document Classifier (ATC) and to study the techniques and parameters that may affect its performance. In this project we will build an ATC model and discuss its implementation problems and design decisions. We will also use multiple classification techniques, namely support vector machines, Naive Bayes, k-Nearest Neighbors, and decision trees.
We will compare these techniques in terms of accuracy and processing time. This project can then serve as a basis for other students who want to continue in this field and carry out deeper studies. Previous…