Enhancement of Embedded Topic Model
DOI: https://doi.org/10.57041/ijeet.v2i1.923

Keywords: text mining, topic modelling, text summarization, topic diversity, embedded topic modelling

Abstract
This comparative study examines the performance of the Embedded Topic Model (ETM) in producing coherent topics when applied to noun-restricted corpora. Since nouns are typically the most informative features in a dataset, restricting the vocabulary to them is expected to improve topic quality. To evaluate this hypothesis, we compare two topic models, ETM (Embedded Topic Model) and LDA (Latent Dirichlet Allocation), on three variations of the dataset. The first version is the original pre-processed dataset, the second reduces the dataset to noun phrases only, and the third reduces it to nouns only. To assess the performance of both models, we employ two widely used measures: Topic Coherence (TC) and Topic Diversity (TD). The experimental results reveal that the Embedded Topic Model outperforms LDA across all variations of the dataset. Remarkably, it exhibits exceptional performance on the nouns-only dataset. In addition, training time is also reduced when the vocabulary is restricted to nouns. Overall, this paper presents evaluations showing that the Embedded Topic Model significantly improves topic quality, especially in noun-restricted contexts. These findings offer researchers and practitioners insight into the potential advantages of noun-based corpus reduction strategies in topic modeling tasks.
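For readers unfamiliar with the Topic Diversity measure mentioned above, it is commonly computed as the fraction of unique words among the top-k words of all discovered topics. The sketch below is a minimal illustration of that definition in Python; the topic word lists are hypothetical, and k = 4 is chosen only to keep the example small (larger k, such as 25, is typical in practice).

```python
def topic_diversity(topics, top_k=25):
    """Fraction of unique words among the top-k words of every topic.

    Returns 1.0 when no two topics share a top word (maximally diverse);
    values near 0 indicate highly redundant topics.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    return len(set(top_words)) / len(top_words)

# Hypothetical topics, each given as a ranked list of its top words.
topics = [
    ["model", "topic", "word", "corpus"],
    ["noun", "phrase", "tag", "corpus"],  # shares "corpus" with the first topic
]
print(topic_diversity(topics, top_k=4))  # 7 unique words / 8 total = 0.875
```

Topic Coherence, the second measure, additionally requires word co-occurrence statistics from a reference corpus, so it is not reproduced here.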
License
Copyright (c) 2023 https://pjosr.com/index.php/ijeet/cr
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.