This repository provides a framework for automated evaluation of dynamically evolving topic taxonomies in scientific literature using Large Language Models (LLMs). The framework addresses challenges in evaluating topic models, such as static metrics and reliance on expert annotators, by leveraging LLMs to assess key quality dimensions like coherence, diversity, repetitiveness, and topic-document alignment.
By integrating multiple topic modeling techniques with an LLM-based evaluation pipeline, this approach aims to produce robust, scalable, and interpretable results across a variety of datasets and research needs.
- Modular Topic Modeling: Includes support for multiple topic modeling techniques such as LDA, ProdLDA, CombinedTM, and BERTopic.
- LLM-Based Evaluation: Offers scalable and dynamic evaluations of topic models using tailored LLM prompts.
- Customizable Pipelines: Allows for parameter tuning, evaluation metric customization, and integration with new datasets.
- Tutorial Jupyter Notebook: Demonstrates how to preprocess data, run topic models, and interpret the results.
The src/topic_models/ directory contains scripts for different topic modeling techniques:
- lda.py: Latent Dirichlet Allocation (LDA)
- prodlda.py: Product of Experts LDA (ProdLDA)
- combinedtm.py: Contextualized Topic Model (CombinedTM)
- bertopic.py: BERTopic
You can find detailed instructions in the Jupyter notebooks located in run_topic_modeling/, which walk through the entire topic modeling process, from data preparation to extracting and interpreting topic distributions.
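To give a sense of what "extracting and interpreting topic distributions" involves, here is a minimal sketch in plain Python. It does not use the repository's scripts: the function names `top_words_per_topic` and `dominant_topic`, the toy vocabulary, and all numeric values are assumptions made up for illustration. It only shows the general idea of reading the top words per topic from a topic-word matrix and the dominant topic per document from a document-topic matrix.

```python
from typing import List, Sequence


def top_words_per_topic(
    topic_word: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 3,
) -> List[List[str]]:
    """Return the k highest-weighted vocabulary words for each topic."""
    topics = []
    for weights in topic_word:
        # Rank word indices by descending weight; ties keep vocabulary order.
        ranked = sorted(range(len(vocab)), key=lambda i: -weights[i])
        topics.append([vocab[i] for i in ranked[:k]])
    return topics


def dominant_topic(doc_topic: Sequence[float]) -> int:
    """Return the index of the topic with the largest share in one document."""
    return max(range(len(doc_topic)), key=lambda t: doc_topic[t])


# Toy values, invented purely for illustration (not output of any real model).
vocab = ["graph", "neural", "protein", "network", "gene"]
topic_word = [
    [0.30, 0.35, 0.02, 0.31, 0.02],  # topic 0
    [0.03, 0.02, 0.50, 0.05, 0.40],  # topic 1
]
doc_topic = [
    [0.9, 0.1],  # document 0
    [0.2, 0.8],  # document 1
]

topics = top_words_per_topic(topic_word, vocab, k=3)
assignments = [dominant_topic(d) for d in doc_topic]

print(topics)
print(assignments)
```

The top-word lists are what an evaluator, human or LLM, would typically inspect when judging a topic; the notebooks remain the reference for how the actual models expose these matrices.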
The scripts for LLM-based evaluation are located in src/llm/:
- llm_judgment.py: Core evaluation logic.
- prompt_templates.py: Predefined prompt templates for LLMs.
- llm_model.py: Helper functions to interact with LLMs.
Run run_llm.py to execute the LLM-based evaluation.
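The general shape of an LLM-based judgment (fill a prompt template, query a model, parse the reply) can be sketched as follows. This is an illustrative assumption, not the code in src/llm/: the template text, the 1 to 3 rating scale, and the names `COHERENCE_TEMPLATE`, `build_prompt`, `parse_rating`, `judge_topic`, and `fake_llm` are all hypothetical, and the model call is replaced by a stub so the example runs offline.

```python
import re
from typing import Callable, List, Optional

# Hypothetical template; the repository's own prompts live in prompt_templates.py.
COHERENCE_TEMPLATE = (
    "You are evaluating a topic produced by a topic model.\n"
    "Topic words: {words}\n"
    "Rate how coherent these words are as a single topic on a scale "
    "from 1 (unrelated) to 3 (highly coherent). Answer with the number only."
)


def build_prompt(topic_words: List[str]) -> str:
    """Fill the template with a comma-separated list of topic words."""
    return COHERENCE_TEMPLATE.format(words=", ".join(topic_words))


def parse_rating(response: str, low: int = 1, high: int = 3) -> Optional[int]:
    """Return the first integer in [low, high] found in the reply, else None."""
    for match in re.findall(r"\d+", response):
        value = int(match)
        if low <= value <= high:
            return value
    return None


def judge_topic(
    topic_words: List[str], ask_llm: Callable[[str], str]
) -> Optional[int]:
    """Build the prompt, send it through ask_llm, and parse the rating."""
    return parse_rating(ask_llm(build_prompt(topic_words)))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; always answers with the same rating.
    return "Rating: 3"


prompt = build_prompt(["protein", "gene", "cell"])
rating = judge_topic(["protein", "gene", "cell"], fake_llm)

print(prompt)
print(rating)
```

Passing the model as a callable keeps the parsing logic testable without network access; in practice `fake_llm` would be swapped for whatever client the helper functions in llm_model.py provide.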
This repository is associated with the following research papers:
- Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation
  Accepted at IRCDL 2025.
- Toward Purpose-oriented Topic Model Evaluation enabled by Large Language Models
  Accepted at International Journal on Digital Libraries (IJDL), forthcoming.
@inproceedings{bridging2025ircdl,
title = {Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation},
author = {Zhiyin Tan and Jennifer D'Souza},
booktitle = {Proceedings of the 21st Conference on Information and Research Science Connecting to Digital and Library Science (IRCDL 2025)},
year = {2025},
url = {https://ceur-ws.org/Vol-3937/paper15.pdf}
}
@misc{purpose2025ijdl,
title = {Toward Purpose-oriented Topic Model Evaluation enabled by Large Language Models},
author = {Zhiyin Tan and Jennifer D'Souza},
year = {2025},
eprint = {2509.07142},
archivePrefix= {arXiv},
primaryClass = {cs.CL},
note = {Accepted at International Journal on Digital Libraries (IJDL), forthcoming},
url = {https://arxiv.org/abs/2509.07142}
}