Enhancement of Embedded Topic Model

Authors

  • Asma Gul, Department of Computer Software Engineering, University of Engineering & Technology Mardan, Pakistan
  • S. Zafar Ali Shah, Department of Computer Software Engineering, University of Engineering & Technology Mardan, Pakistan
  • Sadaqat Jan, Department of Computer Software Engineering, University of Engineering & Technology Mardan, Pakistan

DOI:

https://doi.org/10.57041/ijeet.v2i1.923

Keywords:

Text mining, topic modelling, text summarization, topic diversity, embedded topic modelling

Abstract

This comparative study examines how well the Embedded Topic Model (ETM) produces coherent topics when applied to noun-restricted corpora. Our hypothesis is that nouns are the most informative features in a dataset, so restricting the vocabulary to them should improve topic quality. To evaluate this hypothesis, we compare two topic models, ETM and Latent Dirichlet Allocation (LDA), on three versions of the dataset: the original pre-processed dataset, the dataset reduced to noun phrases only, and the dataset reduced to nouns only. We assess both models with two widely used measures: Topic Coherence (TC) and Topic Diversity (TD). The experimental results show that ETM outperforms LDA on all three versions of the dataset and performs particularly well on the noun-only version. Training time is also reduced when the vocabulary is restricted to nouns. Overall, the evaluation shows that ETM significantly improves topic quality, especially in noun-restricted contexts. These findings give researchers and practitioners insight into the possible advantages of noun-based corpus reduction strategies in topic modeling tasks.
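
The abstract does not define how Topic Diversity is computed. As a minimal sketch, assuming the definition commonly used with ETM (the fraction of unique words among the top-k words of all topics, with k = 25 as the usual default), the measure could be computed as follows. The function name `topic_diversity`, the `top_k` parameter, and the example topics are hypothetical and are not taken from the paper.

```python
def topic_diversity(topics, top_k=25):
    """Fraction of unique words among the top_k words of every topic.

    Assumption: each topic is a list of words sorted from most to least
    probable. Values near 1 indicate varied topics; values near 0
    indicate redundant topics.
    """
    # Collect the top_k words of each topic into one flat list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("topics must contain at least one word")
    return len(set(top_words)) / len(top_words)


# Hypothetical topics, for illustration only.
example_topics = [
    ["game", "team", "season", "player"],
    ["court", "law", "judge", "case"],
    ["game", "market", "stock", "price"],
]
print(topic_diversity(example_topics, top_k=4))
```

In this toy example, 11 of the 12 top words are unique because "game" appears in two topics, so the score is 11/12.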

Published

2023-07-06

How to Cite

Gul, A., Shah, S. Z. A., & Jan, S. (2023). Enhancement of Embedded Topic Model. International Journal of Emerging Engineering and Technology, 2(1), 46–51. https://doi.org/10.57041/ijeet.v2i1.923