Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation

Overview

This repository provides a framework for automated evaluation of dynamically evolving topic taxonomies in scientific literature using Large Language Models (LLMs). The framework addresses challenges in evaluating topic models, such as static metrics and reliance on expert annotators, by leveraging LLMs to assess key quality dimensions like coherence, diversity, repetitiveness, and topic-document alignment.

By combining several topic modeling techniques with an LLM-based evaluation pipeline, the framework aims to produce scalable, interpretable evaluations across a variety of datasets and research needs.

Key Features

Modular Topic Modeling: Includes support for multiple topic modeling techniques such as LDA, ProdLDA, CombinedTM, and BERTopic.
LLM-Based Evaluation: Offers scalable and dynamic evaluations of topic models using tailored LLM prompts.
Customizable Pipelines: Allows for parameter tuning, evaluation metric customization, and integration with new datasets.
Tutorial Jupyter Notebook: Demonstrates how to preprocess data, run topic models, and interpret the results.

Usage

1. Topic Modeling

The src/topic_models/ directory contains scripts for different topic modeling techniques:

lda.py: Latent Dirichlet Allocation (LDA)
prodlda.py: Product of Experts LDA (ProdLDA)
combinedtm.py: Contextualized Topic Model (CombinedTM)
bertopic.py: BERTopic

Detailed instructions are in the Jupyter notebooks in run_topic_modeling/, which walk through the entire topic modeling process, from data preparation to extracting and interpreting topic distributions.
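As a rough illustration of the last step, reading topic words off a fitted model, the sketch below ranks vocabulary terms by their weight in a topic-word matrix. It does not call the scripts in src/topic_models/, whose interfaces are not described here; the function name `top_words`, the toy vocabulary, and the weights are all hypothetical.

```python
from typing import List, Sequence


def top_words(topic_word: Sequence[Sequence[float]],
              vocab: Sequence[str],
              k: int = 3) -> List[List[str]]:
    """Return the k highest-weighted vocabulary terms for each topic."""
    topics = []
    for weights in topic_word:
        if len(weights) != len(vocab):
            raise ValueError("each topic row needs one weight per vocabulary term")
        # Sort term indices by descending weight; ties keep vocabulary order
        # because Python's sort is stable.
        ranked = sorted(range(len(vocab)), key=lambda i: -weights[i])
        topics.append([vocab[i] for i in ranked[:k]])
    return topics


# Toy data (hypothetical): 2 topics over a 5-term vocabulary.
VOCAB = ["neural", "network", "protein", "gene", "training"]
TOPIC_WORD = [
    [0.40, 0.30, 0.02, 0.03, 0.25],
    [0.05, 0.05, 0.45, 0.40, 0.05],
]

if __name__ == "__main__":
    for idx, words in enumerate(top_words(TOPIC_WORD, VOCAB)):
        print(f"Topic {idx}: {', '.join(words)}")
```

Word lists like these are the kind of input an LLM-based evaluation would then judge for coherence or diversity.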

2. LLM-Based Evaluation

The scripts for LLM-based evaluation are located in src/llm/:

llm_judgment.py: Core evaluation logic.
prompt_templates.py: Predefined prompt templates for LLMs.
llm_model.py: Helper functions to interact with LLMs.

To run the LLM-based evaluation, execute run_llm.py.
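To make the general pattern concrete, here is a minimal sketch of how a coherence judgment could be requested from an LLM and its reply parsed. It is not the code in llm_judgment.py or prompt_templates.py: the template text, the 1 to 3 scale, and the names `COHERENCE_TEMPLATE`, `build_coherence_prompt`, and `parse_rating` are assumptions made for illustration, and the actual LLM call is left out.

```python
import re
from typing import List, Optional

# Hypothetical template; the repository's real prompts live in prompt_templates.py.
COHERENCE_TEMPLATE = (
    "You are evaluating a topic produced by a topic model.\n"
    "Topic words: {words}\n"
    "Rate how coherent these words are as a single topic on a scale "
    "from 1 (unrelated) to {max_score} (highly coherent).\n"
    "Answer with the number only."
)


def build_coherence_prompt(topic_words: List[str], max_score: int = 3) -> str:
    """Fill the template with a topic's top words."""
    return COHERENCE_TEMPLATE.format(words=", ".join(topic_words),
                                     max_score=max_score)


def parse_rating(response: str, max_score: int = 3) -> Optional[int]:
    """Return the rating if the reply is a single in-range integer, else None."""
    # Accept only a bare number, optionally followed by a period.
    match = re.fullmatch(r"\s*(\d+)\.?\s*", response)
    if match is None:
        return None
    value = int(match.group(1))
    return value if 1 <= value <= max_score else None


if __name__ == "__main__":
    print(build_coherence_prompt(["gene", "protein", "expression"]))
    print(parse_rating("2"))
```

The parser is deliberately strict: a reply that is not a bare number yields `None`, so a caller can retry or log the response rather than guess at a score.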

Publications

This repository is associated with the following research papers:

Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation
Accepted at IRCDL 2025.
Toward Purpose-oriented Topic Model Evaluation enabled by Large Language Models
Accepted at International Journal on Digital Libraries (IJDL), forthcoming.

BibTeX

@inproceedings{bridging2025ircdl,
  title     = {Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation},
  author    = {Zhiyin Tan and Jennifer D'Souza},
  booktitle = {Proceedings of the 21st Conference on Information and Research Science Connecting to Digital and Library Science (IRCDL 2025)},
  year      = {2025},
  url       = {https://ceur-ws.org/Vol-3937/paper15.pdf}
}

@misc{purpose2025ijdl,
  title        = {Toward Purpose-oriented Topic Model Evaluation enabled by Large Language Models},
  author       = {Zhiyin Tan and Jennifer D'Souza},
  year         = {2025},
  eprint       = {2509.07142},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL},
  note         = {Accepted at International Journal on Digital Libraries (IJDL), forthcoming},
  url          = {https://arxiv.org/abs/2509.07142}
}
