Towards Improving the Performance of the Tamimi Model for Automatic Part-of-Speech Tagging of Arabic

Authors

  • أفراح عبد العزيز حمد التميمي Imam Mohammad Ibn Saud Islamic University

DOI:

https://doi.org/10.34874/PRSM.ijal-vol7.33245

Keywords:

POS tagging, initial model, tagset, train corpus, test corpus, retraining model, algorithm

Abstract

Part-of-speech (POS) tagging is one of the most important tasks in natural language processing. Despite the many works, projects, and efforts in this field, the results fall short of the required linguistic level because of problems that arise in implementation or performance. This paper aims to improve the performance of the initial conditional random field (CRF) model for POS tagging with the basic Tamimi tagset. The model is based on a hand-tagged corpus with twelve grammatical tags (noun, adjective, verb, pronoun, adverb, interjection, particle, punctuation, abbreviation, foreign word, symbol, number). The improvement is achieved by retraining the model and adding 11,,66 tokens drawn from different eras, regions, and domains of Arabic. Performance improved: the initial model achieved an accuracy of 91.58%, and the enhanced model achieved 93.87%.

Published

26-06-2022

Section

Book Reviews