TOPIC MODELING FOR URDU ARTICLES USING UNSUPERVISED LEARNING APPROACHES
DOI:
https://doi.org/10.57041/pjosr.v4i1.1138
Keywords:
Natural Language Processing (NLP), machine learning, l
Abstract
Topic modeling is a widely used text-mining technique for discovering hidden semantic structures within a text corpus. This paper introduces an unsupervised topic modeling approach for documents in Urdu, a low-resource language. Two unsupervised learning techniques, Latent Dirichlet Allocation (LDA) and Unsupervised Latent Semantic Indexing (ULSI), are used to extract specific and accurate topics from Urdu texts. In the experiments, the LDA and ULSI models achieve accuracies of 99% and 98% and coherence values of 44% and 37%, respectively.
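To make the notion of a coherence value concrete, the following is a minimal sketch of one common coherence measure, UMass, which scores a topic by how often its top words co-occur in the same documents. The abstract does not state which coherence measure the paper uses, so UMass is an assumption here, and its scores (log ratios, typically negative) are not comparable to the percentages reported. The function name `umass_coherence` and the toy corpus are hypothetical; the English placeholder tokens stand in for tokenized Urdu articles.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic (assumed measure, not confirmed by the paper).

    top_words: topic words ordered from most to least probable.
    documents: list of tokenized documents (lists of strings).
    Higher (closer to zero) means the words co-occur more often.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                raise ValueError(f"word not in corpus: {top_words[l]!r}")
            # +1 smoothing avoids log(0) when a pair never co-occurs.
            d_ml = doc_freq(top_words[m], top_words[l])
            score += math.log((d_ml + 1) / d_l)
    return score


# Hypothetical toy corpus; placeholder tokens, not data from the paper.
documents = [
    ["cricket", "team", "match"],
    ["cricket", "match", "score"],
    ["budget", "tax", "minister"],
    ["minister", "budget", "cricket"],
]

coherent_topic = ["cricket", "match", "team"]
mixed_topic = ["cricket", "budget", "team"]

print(umass_coherence(coherent_topic, documents))
print(umass_coherence(mixed_topic, documents))
```

On this toy corpus the topic whose words share documents scores higher than the mixed one, which is the behavior any coherence measure is meant to capture.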
License
Copyright (c) 2024 https://pjosr.com/index.php/pjosr/cr
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.