TOPIC MODELING FOR URDU ARTICLES USING UNSUPERVISED LEARNING APPROACHES

Authors

  • Muhammad Ashir
  • Ali Saeed (UCP, Lahore)
  • Muhammad Farhat Ullah
  • Syed Nasir Ali
  • Muhammad Sauood
  • Mehmood Anwar
  • Naveed Hussain

DOI:

https://doi.org/10.57041/pjosr.v4i1.1138

Keywords:

Natural Language Processing (NLP), machine learning, l

Abstract

Topic modeling is a commonly used text-mining technique for discovering hidden semantic structures within a text corpus. This paper introduces an unsupervised learning-based topic modeling approach for documents in Urdu, a low-resource language. Two unsupervised techniques, Latent Dirichlet Allocation (LDA) and Unsupervised Latent Semantic Indexing (ULSI), are used to extract specific and accurate topics from Urdu texts. In our experiments, the LDA and ULSI models achieve 99% and 98% accuracy, and coherence values of 44% and 37%, respectively.
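To illustrate what extracting topics with LDA involves, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the authors' implementation: the abstract does not describe their code, preprocessing, or hyperparameters. The function name `fit_lda`, the values of `alpha`, `beta`, and `iterations`, and the six-document toy corpus of Urdu tokens are all assumptions made for illustration only.

```python
import random

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (hypothetical helper, illustrative only).

    docs is a list of token lists. Returns (theta, top_words), where theta[d]
    is the topic distribution of document d and top_words[k] lists the most
    frequent words assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]
    # Three highest-count words per topic.
    top_words = [
        [vocab[i] for i in sorted(range(vocab_size), key=lambda i: -topic_word[k][i])[:3]]
        for k in range(num_topics)
    ]
    return theta, top_words

# Invented toy corpus (not the paper's data): sports-like and economy-like documents.
raw_docs = [
    "کرکٹ ٹیم میچ کھلاڑی",
    "ٹیم میچ کرکٹ",
    "کھلاڑی کرکٹ میچ ٹیم",
    "بینک معیشت قرض بجٹ",
    "معیشت بجٹ بینک",
    "قرض بینک بجٹ معیشت",
]
docs = [text.split() for text in raw_docs]  # whitespace tokenization (assumption)

theta, top_words = fit_lda(docs, num_topics=2)
for k, words in enumerate(top_words):
    print(k, words)
```

On a real corpus the whitespace split would normally be replaced by Urdu-specific tokenization and stop-word removal, and the number of topics would be chosen by comparing coherence scores across candidate values.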

Published

2024-08-21

How to Cite

Ashir, M. ., Saeed, A., Ullah, M. F., Ali, S. N., Sauood, M., Anwar, M., & Hussain, N. (2024). TOPIC MODELING FOR URDU ARTICLES USING UNSUPERVISED LEARNING APPROACHES. Pakistan Journal of Scientific Research, 4(1), 75–82. https://doi.org/10.57041/pjosr.v4i1.1138
