Topic Modelling for Urdu Articles Using Unsupervised Learning Approaches
DOI:
https://doi.org/10.57041/vol4iss01pp75-82
Keywords:
Natural Language Processing (NLP), Local Dirichlet Allocation, Urdu Latent Dirichlet Allocation, Prediction
Abstract
Topic modelling is a widely used text-mining technique for discovering hidden semantic structures in a text corpus. This paper introduces an unsupervised topic modelling approach for documents in Urdu, a low-resource language. Specific and accurate topics are extracted from Urdu texts using two unsupervised techniques, Latent Dirichlet Allocation (LDA) and Unsupervised Latent Semantic Indexing (ULSI). The experimental results show that the proposed models perform strongly: LDA and ULSI achieve 99% and 98% accuracy and 44% and 37% coherence, respectively.
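To make the LDA step named in the abstract concrete, the following is a minimal sketch of topic inference by collapsed Gibbs sampling, written in plain Python. It is not the authors' implementation: the abstract does not describe their tooling, preprocessing, or hyperparameters. The function name `lda_gibbs`, the values of `alpha`, `beta`, and `iters`, and the toy romanized tokens in `DOCS` are all assumptions made for illustration; a real run would use tokenized Urdu articles.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over pre-tokenized documents.

    alpha and beta are symmetric Dirichlet priors (illustrative values).
    Returns per-document topic proportions and the top words per topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # total tokens in topic k
    assign = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    top_words = [
        [w for w, c in counts.most_common(3) if c > 0] for counts in topic_word
    ]
    return theta, top_words

# Hypothetical toy corpus (romanized placeholder tokens, not data from the paper).
DOCS = [
    ["cricket", "match", "team", "cricket", "player"],
    ["team", "player", "match", "score"],
    ["election", "vote", "party", "government"],
    ["government", "party", "election", "minister", "vote"],
]

theta, top_words = lda_gibbs(DOCS, num_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

Each row of `theta` is one document's estimated topic mixture, and `top_words` lists the most frequent words assigned to each topic. In practice a library implementation would usually replace this sampler, and topic quality would be judged with a coherence measure such as the one the abstract reports.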
Published
2024-08-21
How to Cite
Ashir, M., Saeed, A., Ullah, M. F., Ali, S. N., Sauood, M., Anwar, M., Hussain, N., & Ali, S. (2024). Topic Modelling for Urdu Articles Using Unsupervised Learning Approaches. Pakistan Journal of Scientific Research, 4(01), 75–82. https://doi.org/10.57041/vol4iss01pp75-82
Section
Articles
License
Copyright (c) 2024 https://pjosr.com/index.php/pjosr/cr

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.