Toward Improving the Performance of the Tamimi Model for Automatic Part-of-Speech Tagging of Arabic
DOI:
https://doi.org/10.34874/PRSM.ijal-vol7.33245
Keywords:
POS tagging, initial model, tagset, train corpus, test corpus, retraining model, algorithm
Abstract
POS tagging is one of the most important tasks in natural language processing. Much work has been done in this field, but the results still fall short of the required linguistic level because of problems that arise in implementation or performance. This paper aims to improve the performance of the initial CRF model for POS tagging with the basic Tamimi tagset, which is based on a hand-tagged corpus annotated with twelve grammatical tags (noun, adjective, verb, pronoun, adverb, interjection, particle, punctuation, abbreviation, foreign word, symbol, number). The improvement is obtained by retraining the model after adding 11,,66 tokens drawn from across the eras, regions, and domains of Arabic. Performance improved: the initial model achieved an accuracy of 91.58%, and the enhanced model achieved 93.87%.
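To make the reported scores concrete, the following is a minimal sketch of how tagging accuracy can be computed and compared. It assumes accuracy is measured per token against a hand-tagged test corpus, which the abstract does not state explicitly. The function name `token_accuracy` and the toy tag sequences are hypothetical and are not taken from the paper's data.

```python
from typing import Sequence


def token_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of tokens whose predicted tag matches the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical five-token example (illustrative only, not the paper's corpus).
gold = ["noun", "verb", "pronoun", "noun", "punctuation"]
initial = ["noun", "verb", "noun", "adjective", "punctuation"]
retrained = ["noun", "verb", "pronoun", "adjective", "punctuation"]

initial_acc = token_accuracy(gold, initial)
retrained_acc = token_accuracy(gold, retrained)

# Gain implied by the scores reported in the abstract, in percentage points.
reported_gain = round(93.87 - 91.58, 2)

print(f"toy initial accuracy:   {initial_acc:.2%}")
print(f"toy retrained accuracy: {retrained_acc:.2%}")
print(f"reported gain: {reported_gain} percentage points")
```

Under this reading, the reported scores correspond to a gain of 2.29 percentage points from retraining.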