Enhancement of Embedded Topic Model
DOI: https://doi.org/10.57041/ijeet.v2i1.923

Keywords: text mining, topic modelling, text summarization, topic diversity, embedded topic modelling

Abstract
This comparative study examines the performance of the Embedded Topic Model (ETM) in producing coherent topics when applied to noun-restricted corpora. Since nouns are typically the most informative features in a dataset, restricting the vocabulary to them is expected to improve topic quality. To evaluate this hypothesis, we compare two topic models, ETM (Embedded Topic Model) and LDA (Latent Dirichlet Allocation), on three variations of the dataset. The first version is the original pre-processed dataset, the second reduces the dataset to noun phrases only, and the third reduces it to nouns only. To assess the performance of both models, we employ two widely used measures: Topic Coherence (TC) and Topic Diversity (TD). The experimental results reveal that the Embedded Topic Model outperforms LDA across all variations of the dataset. Remarkably, it exhibits exceptional performance on the nouns-only dataset. In addition, training time is also reduced when the vocabulary is restricted to nouns. Overall, this paper presents evaluations showing that the Embedded Topic Model significantly improves topic quality, especially in noun-restricted contexts. These findings offer researchers and practitioners insight into the potential advantages of noun-based corpus reduction strategies in topic modeling tasks.
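For readers unfamiliar with the Topic Diversity measure mentioned above, it is commonly computed as the fraction of unique words among the top-k words of all discovered topics. The sketch below is a minimal illustration of that definition in Python; the topic word lists are hypothetical, and k = 4 is chosen only to keep the example small (larger k, such as 25, is typical in practice).

```python
def topic_diversity(topics, top_k=25):
    """Fraction of unique words among the top-k words of every topic.

    Returns 1.0 when no two topics share a top word (maximally diverse);
    values near 0 indicate highly redundant topics.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    return len(set(top_words)) / len(top_words)

# Hypothetical topics, each given as a ranked list of its top words.
topics = [
    ["model", "topic", "word", "corpus"],
    ["noun", "phrase", "tag", "corpus"],  # shares "corpus" with the first topic
]
print(topic_diversity(topics, top_k=4))  # 7 unique words / 8 total = 0.875
```

Topic Coherence, the second measure, additionally requires word co-occurrence statistics from a reference corpus, so it is not reproduced here.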
License
Copyright (c) 2023 https://pjosr.com/index.php/ijeet/cr
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.