LLM head-to-head evaluations across a range of datasets and scorers. Can be used to compare performance between two different models, or between checkpoints of the same model to validate training.
This program downloads parquet datasets from HuggingFace using Apache Arrow, so Apache Arrow must be installed on your machine first.
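Apache Arrow is usually available from a system package manager; for example (the package names below are assumptions, so check the Arrow installation docs for your platform):

brew install apache-arrow           # macOS via Homebrew
sudo apt-get install libarrow-dev   # Debian/Ubuntu, with the Apache Arrow apt repository enabled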
Afterwards, use cmake to build the program:
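If you have not configured a build directory yet, a standard CMake configure step (assuming the top-level CMakeLists.txt sits in the repository root) is:

cmake -S . -B <build-dir>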
cmake --build <build-dir> --target comparator -j 8

Example output from a comparison run on the MRPC dataset:

[comparator] INFO - Running scoring task: dataset=MRPC, model=babbage-002, endpoint=https://api.openai.com/v1/completions, config=default, split=train, scoring metric: F1, workers=5
[comparator] INFO - Running scoring task: dataset=MRPC, model=gpt-3.5-turbo-instruct, endpoint=https://api.openai.com/v1/completions, config=default, split=train, scoring metric: F1, workers=5
[comparator] INFO - model gpt-3.5-turbo-instruct got F1 score on dataset MRPC: 0.695652
[comparator] INFO - model babbage-002 got F1 score on dataset MRPC: 0.484848
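Output like the above might come from a compare invocation along these lines (a sketch only: the flag names are taken from the usage text below, and the values mirror the example output):

./comparator compare \
    --model_a babbage-002 --endpoint_a https://api.openai.com/v1/completions \
    --model_b gpt-3.5-turbo-instruct --endpoint_b https://api.openai.com/v1/completions \
    --dataset SetFit/mrpc --config default --split train \
    --scorer f1 --workers 5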
Full usage information is available from the help command:

./comparator help
comparator [evaluate/compare] [ARGS]
base args (apply to both 'compare' and 'evaluate'):
--max-tokens -> max tokens for model responses
--num-logprobs -> num logprobs to return from model responses
--workers -> number of workers to use per dataset scoring
--scorer -> metric to score with (e.g. accuracy, f1)
comparator evaluate args:
--model -> model id for inference
--endpoint -> endpoint uri
--dataset -> hf dataset id to use, e.g. SetFit/mrpc
--config -> hf dataset config to use (e.g. default, cola)
--split -> hf dataset split to use (e.g. train, test)
comparator compare args:
--model_a -> model id for model A
--endpoint_a -> endpoint uri for model A
--model_b -> model id for model B
--endpoint_b -> endpoint uri for model B
--dataset -> hf dataset id to use, e.g. SetFit/mrpc
--config -> hf dataset config to use (e.g. default, cola)
--split -> hf dataset split to use (e.g. train, test)
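For the single-model mode, an evaluate invocation might look like this (a sketch using the flags listed above; the values are illustrative):

./comparator evaluate \
    --model gpt-3.5-turbo-instruct --endpoint https://api.openai.com/v1/completions \
    --dataset SetFit/mrpc --config default --split train \
    --scorer accuracy --workers 5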