Releases: BitMind-AI/gasbench
Release 0.7.3
Removes 12 MAVOS-DD freevc/knnvc subsets from the synthetic video pool pending correct relabeling. These datasets contain real (untampered) video with voice-converted audio only and were previously miscategorized as synthetic video. They will be reintroduced in v0.7.4 under the correct labels (real video / synthetic audio), once this subnet cycle completes.
Release 0.7.2
Increased exam size to 5000 for image, video and audio.
Release 0.7.1
Summary
Two changes shipping together.
Sampling floors stopped mode budgets from meaning anything once the corpus grew: small mode asks for 600 total video samples, but the per-dataset minimums inflated that to ~24,200 across 238 datasets, pushing entrance exams past their 2-hour timeout. The floors now scale down to fit the mode's total budget; full-mode allocations are unchanged.
Version single-sourcing removes the duplicated version string — pyproject.toml is now the only place the version lives.
What changes
Sampling: per-dataset floors now respect the mode's total budget
Previously, calculate_weighted_dataset_sampling clamped every dataset's allocation to static floors (REGULAR_DATASET_MIN_SAMPLES=100, GASSTATION_DATASET_MIN_SAMPLES=500). With the video corpus at 238 datasets, these floors silently overrode the mode targets: the first v18 entrance exam processed 154/238 datasets in 2 hours and timed out, and the model was incorrectly failed and blocked.
The floors now cap at an even share of target_total_samples (target // num_datasets, gasstation-weighted), so a mode's budget holds regardless of corpus size:
| Run | Before | After |
|---|---|---|
| small video, 238 datasets | ~24,200 samples (timeout) | ~486 samples (~20–40 min) |
| full video | 107/dataset, ~25,900 | unchanged |
| full image | 282/dataset, ~55,000 | unchanged |
Full-mode allocations are numerically identical — the floors only ever bound when the budget was being ignored, so production benchmark scores are unaffected. If more per-dataset signal is wanted in exams, BENCHMARK_TOTAL_OVERRIDES["small"] is now an honest dial (e.g. 600 → 2,400 ≈ 10/dataset).
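A minimal sketch of the floor scaling described above, not the actual calculate_weighted_dataset_sampling code. The helper name scaled_floor is hypothetical, the gasstation weighting is omitted for brevity, and the 237 regular + 1 gasstation split is an assumption chosen because it reproduces the ~24,200 pre-fix figure.

```python
REGULAR_DATASET_MIN_SAMPLES = 100
GASSTATION_DATASET_MIN_SAMPLES = 500


def scaled_floor(static_floor: int, target_total_samples: int, num_datasets: int) -> int:
    """Cap a static per-dataset floor at an even share of the mode's budget."""
    even_share = max(1, target_total_samples // num_datasets)
    return min(static_floor, even_share)


# Assumed corpus split for illustration: 237 regular datasets + 1 gasstation dataset.
static_total = 237 * REGULAR_DATASET_MIN_SAMPLES + 1 * GASSTATION_DATASET_MIN_SAMPLES

# Small mode: a 600-sample budget over 238 datasets caps the floor at 2 per dataset.
small_floor = scaled_floor(REGULAR_DATASET_MIN_SAMPLES, 600, 238)

# Full mode: the even share (26000 // 238 = 109) exceeds the static floor,
# so the static floor of 100 is kept and nothing changes.
full_floor = scaled_floor(REGULAR_DATASET_MIN_SAMPLES, 26000, 238)

print(static_total, small_floor, full_floor)
```

Because the cap is a `min`, it can only lower a floor that would otherwise exceed the even share, which is why full-mode allocations stay the same in this sketch.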
Version: pyproject.toml is the single source
__version__ was hardcoded in both pyproject.toml and gasbench/__init__.py and had to be bumped in lockstep. __init__.py now derives it via importlib.metadata.version("gasbench") (with a 0.0.0+unknown fallback for uninstalled source checkouts). The CLI --version flag is unaffected.
Release 0.7.0
Summary
Two improvements shipping together.
Score composition makes holdout and gasstation data carry a declared share of competition metrics instead of whatever their sample counts happen to contribute. sn34_score previously pooled all samples equally — holdout and gasstation carried roughly 16% / 4% (image), 12.5% / 3% (video), 8% / 0% (audio) of the metric purely by accident of dataset size. The existing holdout_weight knob only ever affected the benchmark_score accuracy field, never MCC/Brier/sn34.
Dataset config restructuring replaces the monolithic per-modality YAML files with paired real_<modality>.yaml / synthetic_<modality>.yaml files, tags datasets with a content_category for vertical-specific filtering, and consolidates benchmark sizes to a single source of truth.
What changes
Score composition
- Metrics.update() accepts a per-sample weight: confusion matrix, MCC, Brier, and CE are all weight-aware. Semantics: weight=w contributes exactly as w duplicate samples would (property-tested).
- compute_metrics_from_df(score_composition=...) takes a target share per provenance class, e.g. {"public": 0.5, "holdout": 0.3, "gasstation": 0.2}, classifies samples by dataset name (-holdout- / gasstation / public), and derives per-class weights from target vs realized counts. Classes absent from a run (e.g. audio has no gasstation data) are dropped and remaining shares renormalized. Results include score_composition (target), realized_composition, and provenance_weights so every run is self-documenting.
- Plumbing: BenchmarkRunConfig.score_composition, run_benchmark(score_composition=...), image / video / audio bench functions, CLI --holdout-share / --gasstation-share (public gets the remainder).
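One way the weight derivation above could work is sketched below. This is an illustration under stated assumptions, not the gasbench implementation: the function names are hypothetical, the weight is taken as target share divided by realized share, and an all-zero target falls back to uniform weights.

```python
from collections import Counter


def classify_provenance(dataset_name: str) -> str:
    """Classify a dataset by name, following the rule described in the notes."""
    if "-holdout-" in dataset_name:
        return "holdout"
    if "gasstation" in dataset_name:
        return "gasstation"
    return "public"


def derive_provenance_weights(dataset_names, target):
    """Per-class sample weights so each class carries its target share of the metric."""
    counts = Counter(classify_provenance(name) for name in dataset_names)
    total = sum(counts.values())
    # Keep only classes that actually appear in this run; absent classes are dropped.
    present = {cls: target.get(cls, 0.0) for cls in counts}
    share_sum = sum(present.values())
    if share_sum == 0:
        # Assumed fallback: no usable target, so weight every sample equally.
        return {cls: 1.0 for cls in counts}
    weights = {}
    for cls, share in present.items():
        realized = counts[cls] / total
        weights[cls] = (share / share_sum) / realized  # renormalized target / realized
    return weights


target = {"public": 0.5, "holdout": 0.3, "gasstation": 0.2}
# An audio-like run with no gasstation data: 9 public samples, 1 holdout sample.
names = ["some-public-set"] * 9 + ["real-audio-holdout-ba647a50"]
weights = derive_provenance_weights(names, target)
```

In this example the remaining shares renormalize to 0.625 / 0.375, so the single holdout sample gets a weight of 3.75 and ends up carrying 37.5% of the weighted total.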
Dataset config restructuring
Config files split by real vs. synthetic per modality:
| Before | After |
|---|---|
| image_datasets.yaml + image_human_datasets.yaml | real_images.yaml + synthetic_images.yaml |
| audio_datasets.yaml | real_audio.yaml + synthetic_audio.yaml |
| video_datasets.yaml + video_human_datasets.yaml | real_videos.yaml + synthetic_videos.yaml |
Content category tagging
DatasetConfig gains two new optional fields: content_category (e.g. faces, documents) and generator_family. A new --content-category <CATEGORY> CLI flag filters a run to only datasets matching that tag — useful for vertical-specific evaluations without maintaining separate config files.

    gasbench run --image-model ./my_model/ --content-category faces

Benchmark size consolidation
Full-mode sizes were embedded in per-YAML benchmark_size fields and a parallel set of bolted-on size configs. Both are replaced by a single declaration in config.py:

    "full": {"image": 55000, "video": 26000, "audio": 37000}

Legacy image_benchmark_size / video_benchmark_size dual-format handling removed. Custom --dataset-config YAMLs still override size for that run.
Cleanup
- Dead code removed from dataset/cache.py
- dataset/config.py: dict-to-DatasetConfig path consolidated; size config removed
- dataset/download.py: exception handling tightened; per-run download cap at 1,000 files
Multi-OS dependency support
onnxruntime and decord are now platform-conditional in pyproject.toml:
| Platform | onnxruntime | decord |
|---|---|---|
| Linux | onnxruntime-gpu==1.24.2 | decord==0.6.0 |
| macOS | onnxruntime>=1.22.0 | — |
| Windows | onnxruntime>=1.22.0 | — |
processing/media.py gains an opencv fallback when decord is unavailable (macOS dev installs, non-Linux CI).
Backwards compatibility
Score metrics: default paths unchanged. Without score_composition, all metrics compute exactly as before; the legacy holdout_weight accuracy-only path is preserved unchanged. Verified by back-compat tests (test_no_composition_is_backward_compatible, test_legacy_holdout_weight_only_affects_accuracy, test_uniform_composition_matches_pooled).
Config: existing --dataset-config custom YAMLs still work; benchmark_size in a custom config is still respected. The cpu extras group in pyproject.toml is removed — platform markers now handle the onnxruntime split automatically.
Tests
12 new unit tests in tests/unit/test_weighted_metrics.py:
- weight == duplication equivalence across MCC / Brier / CE / sn34
- weight scale-invariance
- provenance classification; target-share weight derivation (incl. absent-class renormalization, zero-target fallback)
- composition shifts sn34 toward holdout performance; pooled-equivalence golden test
- legacy holdout_weight still affects accuracy only
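The weight == duplication property can be illustrated with a weighted Brier score. This is a standalone sketch, not the Metrics class from gasbench; weighted_brier is a hypothetical helper.

```python
def weighted_brier(probs, labels, weights):
    """Weighted mean squared error between predicted probabilities and 0/1 labels."""
    numerator = sum(w * (p - y) ** 2 for p, y, w in zip(probs, labels, weights))
    return numerator / sum(weights)


# One sample with weight 3 ...
weighted = weighted_brier([0.9, 0.2], [1, 0], [3, 1])
# ... should match the same sample repeated three times with weight 1.
duplicated = weighted_brier([0.9, 0.9, 0.9, 0.2], [1, 1, 1, 0], [1, 1, 1, 1])
# Scaling all weights by a constant should not change the score.
rescaled = weighted_brier([0.9, 0.2], [1, 0], [30, 10])
```

Normalizing by the sum of weights is what makes the score both duplication-equivalent and scale-invariant in this sketch.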
Release 0.6.7
| Released dataset name | Obfuscated holdout name | Modality | Media type |
|---|---|---|---|
| casia_web_face_part1 | real-image-holdout-e7a7377b | image | real |
| casia_web_face_part10 | real-image-holdout-9de807d2 | image | real |
| casia_web_face_part11 | real-image-holdout-a08902c2 | image | real |
| casia_web_face_part12 | real-image-holdout-09f696f5 | image | real |
| casia_web_face_part13 | real-image-holdout-1f3d2834 | image | real |
| human-fg-net-real-face | real-image-holdout-cfa0ec04 | image | real |
| inst-it-dataset-videos-raw | real-image-holdout-de9fe39e | image | real |
| inst-it-dataset-videos-vpt | real-image-holdout-315ec756 | image | real |
| real-human-faces-data-set | real-image-holdout-2f459c5f | image | real |
| shhq-1.0 | real-image-holdout-3099c51c | image | real |
| vtuav-dataset-test-lt-001 | real-image-holdout-57c9ef9d | image | real |
| arc2face-1 | synthetic-image-holdout-b210ba6d | image | synthetic |
| posedreamer28 | synthetic-image-holdout-14eb8c65 | image | synthetic |
| posedreamer29 | synthetic-image-holdout-bb33c84e | image | synthetic |
| posedreamer30 | synthetic-image-holdout-6db55ca9 | image | synthetic |
| posedreamer31 | synthetic-image-holdout-8169044e | image | synthetic |
| posedreamer32 | synthetic-image-holdout-4ecace03 | image | synthetic |
| sg2-n10k-arc-r14-lang-v1 | synthetic-image-holdout-524c2534 | image | synthetic |
| spoof_png | synthetic-image-holdout-abd2c011 | image | synthetic |
| dh-facevid-1k-0000-part-08 | real-video-holdout-05e1f42c | video | real |
| dh-facevid-1k-0000-part-09 | real-video-holdout-b451e823 | video | real |
| dh-facevid-1k-0000-part-10 | real-video-holdout-b2369df1 | video | real |
| dh-facevid-1k-0005-part_2 | real-video-holdout-a86f29a5 | video | real |
| dh-facevid-1k-0005-part_3 | real-video-holdout-45cdc454 | video | real |
| dh-facevid-1k-0005-part_4 | real-video-holdout-ef672baf | video | real |
| dh-facevid-1k-0005-part_5 | real-video-holdout-3018d860 | video | real |
| haa500-v1-0 | real-video-holdout-5e96e53f | video | real |
| mavos-dd-arabic-real | real-video-holdout-e421ee99 | video | real |
| mavos-dd-english_real | real-video-holdout-6ab0ed11 | video | real |
| mavos-dd-german_real | real-video-holdout-e71c3e03 | video | real |
| mavos-dd-hindi_real | real-video-holdout-d0960f2d | video | real |
| mavos-dd-mandarin_real | real-video-holdout-3a63331d | video | real |
| oops-dataset | real-video-holdout-5d1b9f6b | video | real |
| vatex-3 | real-video-holdout-6941c006 | video | real |
| deepaction-v1 | synthetic-video-holdout-bc6a27d4 | video | synthetic |
| mcnet | synthetic-video-holdout-29358cff | video | synthetic |
| mobileswap | synthetic-video-holdout-ea6535cf | video | synthetic |
| mraa | synthetic-video-holdout-9d4fd175 | video | synthetic |
| oneshot | synthetic-video-holdout-41e9d0ce | video | synthetic |
| pirender | synthetic-video-holdout-704b092a | video | synthetic |
Release 0.6.6
| Released dataset name | Obfuscated holdout name | Modality | Media type |
|---|---|---|---|
| v13-e4s | synthetic-video-holdout-90cc07d9 | video | synthetic |
| v13-echonet-synthetic-v1 | synthetic-video-holdout-589f8477 | video | synthetic |
| v13-lemonade | real-video-holdout-0ea2b302 | video | real |
| v13-live-whisperx-526k | real-video-holdout-40577fd5 | video | real |
| v14-real-hallo3-training-data | real-video-holdout-e99a6cd3 | video | real |
| v14-real-moments-in-time-raw | real-video-holdout-c627820a | video | real |
Released v15 video holdouts — Human
73 datasets.
| Released dataset name | Obfuscated holdout name | Modality | Media type |
|---|---|---|---|
| DH-FaceVid-1K-0003-part_3 | real-video-holdout-bb599c24 | video | real |
| DH-FaceVid-1K-0003-part_4 | real-video-holdout-0c5992ea | video | real |
| DH-FaceVid-1K-0003-part_5 | real-video-holdout-a778c46c | video | real |
| MAVOS-DD-english_sonic | synthetic-video-holdout-30ca29f6 | video | synthetic |
| MAVOS-DD-freevc | synthetic-video-holdout-476bb844 | video | synthetic |
| MAVOS-DD-german_echomimic | synthetic-video-holdout-4cda34f8 | video | synthetic |
| MAVOS-DD-german_freevc | synthetic-video-holdout-0ddd910c | video | synthetic |
| MAVOS-DD-german_hififace | synthetic-video-holdout-2d68a443 | video | synthetic |
| MAVOS-DD-german_inswapper | synthetic-video-holdout-ad7af038 | video | synthetic |
| MAVOS-DD-german_knnvc | synthetic-video-holdout-319f0b94 | video | synthetic |
| MAVOS-DD-german_liveportrait | synthetic-video-holdout-cd2b3dba | video | synthetic |
| MAVOS-DD-german_memo | synthetic-video-holdout-5f57b6ea | video | synthetic |
| MAVOS-DD-german_roop | synthetic-video-holdout-0f4d071b | video | synthetic |
| MAVOS-DD-german_sonic | synthetic-video-holdout-567ebf9e | video | synthetic |
| MAVOS-DD-hindi_echomimic | synthetic-video-holdout-ca8ba835 | video | synthetic |
| MAVOS-DD-hindi_hififace | synthetic-video-holdout-b83ad09b | video | synthetic |
| MAVOS-DD-hindi_inswapper | synthetic-video-holdout-8ae3fcab | video | synthetic |
| MAVOS-DD-hindi_knnvc | synthetic-video-holdout-5f319c69 | video | synthetic |
| MAVOS-DD-hindi_liveportrait | synthetic-video-holdout-28922fda | video | synthetic |
| MAVOS-DD-hindi_memo | synthetic-video-holdout-d36cb85b | video | synthetic |
| MAVOS-DD-hindi_roop | synthetic-video-holdout-1359e798 | video | synthetic |
| MAVOS-DD-hindi_sonic | synthetic-video-holdout-34b113dc | video | synthetic |
| MAVOS-DD-knnvc | synthetic-video-holdout-71c4ccd8 | video | synthetic |
| MAVOS-DD-mandarin_echomimic | synthetic-video-holdout-60de4b3a | video | synthetic |
| MAVOS-DD-mandarin_freevc | synthetic-video-holdout-421732bd | video | synthetic |
| MAVOS-DD-mandarin_hififace | synthetic-video-holdout-ef35d61b | video | synthetic |
| MAVOS-DD-mandarin_inswapper | synthetic-video-holdout-f4873f15 | video | synthetic |
| MAVOS-DD-mandarin_knnvc | synthetic-video-holdout-488f6e35 | video | synthetic |
| MAVOS-DD-mandarin_liveportrait | synthetic-video-holdout-1f619331 | video | synthetic |
| MAVOS-DD-mandarin_memo | synthetic-video-holdout-1f087f85 | video | synthetic |
| MAVOS-DD-mandarin_roop | synthetic-video-holdout-e44fa7cd | video | synthetic |
| MAVOS-DD-mandarin_sonic | synthetic-video-holdout-b4f9beec | video | synthetic |
| MAVOS-DD-romanian_echomimic | synthetic-video-holdout-8ea00713 | video | synthetic |
| MAVOS-DD-romanian_freevc | synthetic-video-holdout-91ec85a3 | video | synthetic |
| MAVOS-DD-romanian_hififace | synthetic-video-holdout-343d88d7 | video | synthetic |
| MAVOS-DD-romanian_inswapper | synthetic-video-holdout-1bc66d5d | video | synthetic |
| MAVOS-DD-romanian_knnvc | synthetic-video-holdout-79b89c16 | video | synthetic |
| MAVOS-DD-romanian_liveportrait | synthetic-video-holdout-ffff1093 | video | synthetic |
| MAVOS-DD-romanian_memo | synthetic-video-holdout-23ccac96 | video | synthetic |
| MAVOS-DD-romanian_roop | synthetic-video-holdout-b2ab16d9 | video | synthetic |
| MAVOS-DD-romanian_sonic | synthetic-video-holdout-97a8e023 | video | synthetic |
| MAVOS-DD-russian_echomimic | synthetic-video-holdout-41a03d06 | video | synthetic |
| MAVOS-DD-russian_freevc | synthetic-video-holdout-1a2148f3 | video | synthetic |
| MAVOS-DD-russian_hififace | synthetic-video-holdout-213d65fa | video | synthetic |
| v15-human-vid-celebv-hq | real-video-holdout-f10a679d | video | real |
| v15-human-vid-dfdm_cfr23-dfaker | synthetic-video-holdout-032b8470 | video | synthetic |
| v15-human-vid-dfdm_cfr23-dfl-h128 | synthetic-video-holdout-c92645dc | video | synthetic |
| v15-human-vid-dfdm_cfr23-iae | synthetic-video-holdout-9eee885c | video | synthetic |
| v15-human-vid-dfdm_cfr23-lightweight | synthetic-video-holdout-d127ad71 | video | synthetic |
| v15-human-vid-dfdm_cfr23-real | real-video-holdout-bd7b715b | video | real |
| v15-human-vid-dh-facevid-1k-0002-part_1 | real-video-holdout-a39f716a | video | real |
| v15-human-vid-dh-facevid-1k-0002-part_2 | real-video-holdout-055386d3 | video | real |
| v15-human-vid-dh-facevid-1k-0002-part_3 | real-video-holdout-3ac018ea | video | real |
| v15-human-vid-dh-facevid-1k-0002-part_4 | real-video-holdout-62772f42 | video | real |
| v15-human-vid-dh-facevid-1k-0002-part_5 | real-video-holdout-f4dab550 | video | real |
| v15-human-vid-dh-facevid-1k-0003-part_1 | real-video-holdout-8b493ecc | video | real |
| v15-human-vid-dh-facevid-1k-0003-part_2 | real-video-holdout-1cf5d2f9 | video | real |
| v15-human-vid-digifakeav_echomimic_21501_22000 | synthetic-video-holdout-00a91199 | video | synthetic |
| v15-human-vid-digifakeavfvfa_with_audio | synthetic-video-holdout-18c2b9ce | video | synthetic |
| v15-human-vid-mavos-dd-arabic-echomimic | synthetic-video-holdout-ec3d3cd0 | video | synthetic |
| v15-human-vid-mavos-dd-arabic-hififace | synthetic-video-holdout-0920c4a7 | video | synthetic |
| v15-human-vid-mavos-dd-arabic-inswapper | synthetic-video-holdout-af05e52b | video | synthetic |
| v15-human-vid-mavos-dd-arabic-liveportrait | synthetic-video-holdout-f61b042c | video | synthetic |
| v15-human-vid-mavos-dd-arabic-roop | synthetic-video-holdout-982fab07 | video | synthetic |
| v15-human-vid-mavos-dd-arabic-sonic | synthetic-video-holdout-2ebd76d7 | video | synthetic |
| v15-human-vid-mavos-dd-english_echomimic | synthetic-video-holdout-118ac16f | video | synthetic |
| v15-human-vid-mavos-dd-english_freevc | synthetic-video-holdout-4b760ab9 | video | synthetic |
| v15-human-vid-mavos-dd-english_hififace | synthetic-video-holdout-51cd64cb | video | synthetic |
| v15-human-vid-mavos-dd-english_inswapper | synthetic-video-holdout-9451219f | video | synthetic |
| v15-human-vid-mavos-dd-english_knnvc | synthetic-video-holdout-909d326b | video | synthetic |
| v15-human-vid-mavos-dd-english_liveportrait | synthetic-video-holdout-372db1bb | video | synthetic |
| v15-human-vid-mavos-dd-english_memo | synthetic-video-holdout-19d06111 | video | synthetic |
| v15-human-vid-mavos-dd-english_roop | synthetic-video-holdout-9f9485b2 | video | synthetic |
Release 0.6.5
- Removed the shooter-fake image dataset, previously incorrectly marked synthetic
- Fixed the label for cosyvoice-instruct
Release 0.6.4
| Released dataset name | Obfuscated holdout name | Modality | Media type |
|---|---|---|---|
| v14-real-vcapcv-vggsound-test-15446-audio-cut | real-audio-holdout-ba647a50 | audio | real |
| v14-real-vggsound-test-15446-video-cut | real-audio-holdout-4e846c0f | audio | real |
| v14-real-kallaama | real-audio-holdout-574a0401 | audio | real |
| v14-real-chichewa-dataset | real-audio-holdout-dcb3cc7b | audio | real |
| v14-real-vivos | real-audio-holdout-74d82871 | audio | real |
| v14-real-nisqa-corpus-dataset | real-audio-holdout-2c36fa90 | audio | real |
| v14-real-natural-odss | real-audio-holdout-b8338bca | audio | real |
| v14-real-fastpitch-hifigan | real-audio-holdout-930542a5 | audio | real |
| v14-real-daps | real-audio-holdout-b20f4647 | audio | real |
| v14-real-bci-datasets | real-audio-holdout-aff9e8fe | audio | real |
| v14-real-ravdess-speech-16k | real-audio-holdout-ff483ea4 | audio | real |
| v14-fake-somos | synthetic-audio-holdout-222cd5cc | audio | synthetic |
| v14-fake-diffgan-tts-aux | synthetic-audio-holdout-19638fa3 | audio | synthetic |
| v14-fake-grad-tts | synthetic-audio-holdout-847ccfa5 | audio | synthetic |
| v14-fake-tacotron2-dca-diffwave | synthetic-audio-holdout-7d903c0d | audio | synthetic |
| v14-fake-wavegrad2 | synthetic-audio-holdout-4c09a9a9 | audio | synthetic |
| v14-fake-diffgan-tts-naive | synthetic-audio-holdout-435ce3e0 | audio | synthetic |
| v14-fake-natspeech-diffspeech | synthetic-audio-holdout-e6a66c72 | audio | synthetic |
| v14-fake-fast-pitch | synthetic-audio-holdout-b9a6a0d3 | audio | synthetic |
| v14-fake-tacotron2-dca | synthetic-audio-holdout-527cb6c8 | audio | synthetic |
| v14-fake-tacotron2-dca-wavegrad | synthetic-audio-holdout-3d18d856 | audio | synthetic |
| v14-fake-diffgan-tts-shallow | synthetic-audio-holdout-65bf3c65 | audio | synthetic |
| v14-fake-prodiff | synthetic-audio-holdout-97af50ad | audio | synthetic |
| v14-fake-glow-tts | synthetic-audio-holdout-3051cd6a | audio | synthetic |
| v14-fake-tacotron2-dca-bddm | synthetic-audio-holdout-6c72fa6a | audio | synthetic |
| v14-fake-vits-1 | synthetic-audio-holdout-0a918a26 | audio | synthetic |
| v14-fake-vits | synthetic-audio-holdout-11335f29 | audio | synthetic |
| v14-fake-vcapv-t2a | synthetic-audio-holdout-b2340f5f | audio | synthetic |
| v14-MLAAD-Fake-part_01 | fake-audio-holdout-4aeede7d | audio | fake |
| v14-MLAAD-Fake-part_02 | fake-audio-holdout-182aa55f | audio | fake |
| v14-MLAAD-Fake-part_03 | fake-audio-holdout-823cfdef | audio | fake |
| v14-MLAAD-Fake-part_04 | fake-audio-holdout-0a2f31b0 | audio | fake |
| v14-MLAAD-Fake-part_05 | fake-audio-holdout-065f3770 | audio | fake |
| v14-MLAAD-Fake-part_06 | fake-audio-holdout-df10edeb | audio | fake |
| v14-MLAAD-Fake-part_07 | fake-audio-holdout-9b51a670 | audio | fake |
| v14-MLAAD-Fake-part_08 | fake-audio-holdout-0c25b759 | audio | fake |
| v14-MLAAD-Fake-part_09 | fake-audio-holdout-7ce01af9 | audio | fake |
| v14-MLAAD-Fake-part_10 | fake-audio-holdout-418c4a90 | audio | fake |
| v14-dag-asr-audio | real-audio-holdout-c9491bb5 | audio | real |
| v14-WaxalNLP-TTS-part_01 | real-audio-holdout-de4859f5 | audio | real |
| v14-WaxalNLP-TTS-part_02 | real-audio-holdout-c1c223f4 | audio | real |
| v14-WaxalNLP-TTS-part_03 | real-audio-holdout-e53dfd16 | audio | real |
| v14-WaxalNLP-TTS-part_04 | real-audio-holdout-7f031cf1 | audio | real |
| v14-WaxalNLP-TTS-part_05 | real-audio-holdout-fcdd0715 | audio | real |
| v14-WaxalNLP-TTS-part_06 | real-audio-holdout-a6702abe | audio | real |
| v14-WaxalNLP-TTS-part_07 | real-audio-holdout-c462f462 | audio | real |
| v14-WaxalNLP-TTS-part_08 | real-audio-holdout-a80cd5b3 | audio | real |
| v14-WaxalNLP-TTS-part_09 | real-audio-holdout-a0791dd4 | audio | real |
| v14-WaxalNLP-TTS-part_10 | real-audio-holdout-14d066a5 | audio | real |
| v14-real-mmhu-h-videos | real-video-holdout-d4f56405 | video | real |
| v14-real-mmhu-t-videos | real-video-holdout-14cc82ed | video | real |
| v14-real-mmhu-v-videos | real-video-holdout-b7174cb7 | video | real |
| v14-real-vivid | real-video-holdout-8534b595 | video | real |
| v14-real-soccernet-10s-5class | real-video-holdout-74616031 | video | real |
| v14-real-ofdvdnet | real-video-holdout-b1b60d9c | video | real |
| v14-real-or-video-mov | real-video-holdout-aacc44e4 | video | real |
| v14-real-poultry-videos | real-video-holdout-acaccb84 | video | real |
| v14-real-spatialvid-group-001 | real-video-holdout-a5173e55 | video | real |
| v14-real-spatialvid-group-002 | real-video-holdout-b4180215 | video | real |
| v14-real-spatialvid-group-003 | real-video-holdout-908daf61 | video | real |
| v14-real-spatialvid-group-004 | real-video-holdout-52ee57f0 | video | real |
| v14-real-spatialvid-group-005 | real-video-holdout-58bed19f | video | real |
| v14-real-videoespresso-train-video-01 | real-video-holdout-2546c150 | video | real |
| v14-real-videoespresso-train-video-02 | real-video-holdout-e444b81e | video | real |
| v14-real-open-o3-video | real-video-holdout-73a9d20e | video | real |
| v14-real-panflow-1 | real-video-holdout-51fd7e37 | video | real |
| v14-real-panflow-2 | real-video-holdout-2ed3f604 | video | real |
| v14-real-panflow-3 | real-video-holdout-4d085ece | video | real |
| v14-real-panflow-4 | real-video-holdout-e4fe8a2c | video | real |
| v14-real-dh-facevid-1k-0001 | real-video-holdout-6fc0b313 | video | real |
| v14-real-tracking-any-granularity-videos | real-video-holdout-ccf53e73 | video | real |
| v14-real-wild-animal-recognition-video-dataset | real-video-holdout-d2b5f026 | video | real |
| v14-real-wlasl-videos | real-video-holdout-2e1dc2db | video | real |
| v14-real-wlasl-videos-1 | real-video-holdout-1c782169 | video | real |
| v14-real-wlasl-raw-videos-mp4 | real-video-holdout-1122cfce | video | real |
| v14-real-youtubeclips | real-video-holdout-b055743a | video | real |
| v14-fake-allegro | synthetic-video-holdout-089c6870 | video | synthetic |
| v14-fake-animatediffturbo | synthetic-video-holdout-d7c4ecc2 | video | synthetic |
| v14-fake-ltxvideo | synthetic-video-holdout-3785e46b | video | synthetic |
| v14-fake-mochi1 | synthetic-video-holdout-c4ccd94d | video | synthetic |
| v14-fake-pyramidflow | synthetic-video-holdout-38ebcccf | video | synthetic |
| v14-fake-videocrafter2 | synthetic-video-holdout-0b71f5df | video | synthetic |
| v14-fake-animatediff | synthetic-video-holdout-b4755652 | video | synthetic |
| v14-fake-cogvideox | synthetic-video-holdout-22f24e88 | video | synthetic |
| v14-fake-fastsvd | synthetic-video-holdout-58631be3 | video | synthetic |
| v14-fake-lavie | synthetic-video-holdout-e14b7524 | video | synthetic |
| v14-fake-modelscope | synthetic-video-holdout-4df93a32 | video | synthetic |
| v14-fake-opensora12 | synthetic-video-holdout-8ab9a23d | video | synthetic |
| v14-fake-opensora | synthetic-video-holdout-59a9b3c9 | video | synthetic |
| v14-fake-t2vturbo | synthetic-video-holdout-dd2a8901 | video | synthetic |
| v14-fake-vcapav-t2v | synthetic-video-holdout-e31c3965 | video | synthetic |
| v14-fake-cameraclone-0316 | synthetic-video-holdout-82bcb0bf | video | synthetic |
| v14-fake-cameraclone-0317 | synthetic-video-holdout-5a0cbed0 | video | synthetic |
| v14-fake-cameraclone-0401 | synthetic-video-holdout-d4b3ea97 | video | synthetic |
| v14-fake-cameraclone-0402 | synthetic-video-holdout-e194c1a8 | video | synthetic |
| v14-fake-cameraclone-0404 | synthetic-video-holdout-dc7cf915 | video | synthetic |
| v14-fake-cameraclone-0407 | synthetic-video-holdout-8e30a656 | video | synthetic |
| v14-fake-cameraclone-0410 | synthetic-video-holdout-decdec92 | video | synthetic |
| v14-real-chinese-mp4-in-audio | real-video-holdout-6e020e98 | video | real |
| Synthetic-Images-Fire-Scenario | synthetic-image-holdout-8fae48b4 | image | synthetic |
| synthetic-dataset | synthetic-image-holdout-e4b4fa03 | image | synthetic |
| Midjourneyv5-5K | synthetic-image-holdout-8dc0c9de | image | synthetic |
| fake_sdxl_12k-part-1 | synthetic-image-holdout-edd5dd1b | image | synthetic |
| fake_sdxl_12k-part-2 | synthetic-image-holdout-0f3be6af | image | synthetic |
| fake_sdxl_12k-part-3 | synthetic-image-holdout-f291d1b7 | image | synthetic |
| fake_sdxl_12k-part-4 | synthetic-image-holdout-64663d9f | image | synthetic |
| Synthetic-Dog-Images | synthetic-image-holdout-c11a8170 | image | synthetic |
| synthetic_data_0.1 | synthetic-image-holdout-67a8e625 | image | synthetic |
| syntheticdata_0.15 | synthetic-image-holdout-af60fa23 | image | synthetic |
| ptd-synthetic | synthetic-image-holdout-061fbf94 | image | synthetic |
| image_patches_raw | synthetic-image-holdout-6405b7db | image | synthetic |
| stable-imagenet1k-flat | synthetic-image-holdout-3b5b6e0b | image | synthetic |
| Shooter-fake | synthetic-image-holdout-29eb8247 | image | synthetic |
| SDv15R-dpmsolver-25-15K-part0 | synthetic-image-holdout-1d5da5bc | image | synthetic |
SDv15R-dpmsolver-25-15K-part1 |
`synthetic-image-hol... |
Release 0.6.3
Deprecating old cache policy logic, previously used to determine what samples to keep in the gasstation cache when fool rates were more dynamic.
Release 0.6.2
Parallelize gasbench data loading & fix memory leaks
Problem
Image and video benchmarks ran unacceptably slowly when data came from a NAS (not noticeable on local setups), and occasionally ran out of memory deep into runs.
Root causes identified:
- Sequential disk I/O from network volumes: The DatasetIterator reads each image/video file one-by-one in the producer thread. Each read from a NAS incurs network latency, paid serially N times.
- "Drain-all-futures" stall: PrefetchPipeline accumulated num_workers * 2 futures then blocked on ALL of them (for future in futures: future.result()). The pipeline stalls on the slowest task even when other workers are idle.
- Only 3 worker threads: With I/O-bound work (network volume reads + PIL decode), 3 threads underutilize available concurrency.
- Memory leak from large images: Datasets with very large source images (100+ megapixels observed in logs) cause multi-GB memory spikes because image bytes are held in multiple places simultaneously: the sample dict, the result dict, and the batch queue. No explicit cleanup of PIL Image objects in multi-threaded workers.
Changes
gasbench/src/gasbench/dataset/iterator.py
- Added lazy_read: bool parameter to DatasetIterator
- When True, image samples yield {"image_path": ...} instead of reading file bytes; video samples yield {"video_path": ...} for file-based videos (frame directories are already lazy)
- Iterating the dataset becomes near-instant (path collection only, no I/O)
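A rough sketch of the lazy-read idea, assuming a plain directory of image files. The function iter_image_samples, the "image_bytes" key for the eager path, and the suffix filter are illustrative assumptions, not the DatasetIterator API.

```python
from pathlib import Path


def iter_image_samples(root, lazy_read=False, suffixes=(".png", ".jpg", ".jpeg")):
    """Yield one sample dict per image file under root.

    With lazy_read=True only the path is yielded, so iteration does no file I/O;
    the bytes are read later, for example inside a worker thread.
    """
    for path in sorted(Path(root).rglob("*")):
        if path.suffix.lower() not in suffixes:
            continue
        if lazy_read:
            yield {"image_path": str(path)}
        else:
            yield {"image_bytes": path.read_bytes()}
```

Deferring the read moves the per-file network latency out of the single producer thread and into whichever worker eventually opens the path.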
gasbench/src/gasbench/benchmarks/image_bench.py
- Rewrote PrefetchPipeline with three fixes:
  - Parallel I/O: New _read_and_preprocess() does file read + PIL decode + augmentation as a single unit inside worker threads; 8 threads read from the network volume concurrently
  - Bounded sliding window: Uses wait(FIRST_COMPLETED) with max_in_flight = num_workers * 4 = 32 instead of submit-all. Prevents unbounded memory growth from completed-but-unconsumed futures
  - Sample metadata stripping: Drops heavy keys (image, image_bytes, image_path) from result dicts immediately after preprocessing; tracker only needs metadata fields
- Default num_workers increased from 3 → 8
- DatasetIterator created with lazy_read=True
- executor.shutdown() now uses cancel_futures=True for clean teardown
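The bounded sliding window can be sketched with the standard library alone. This is an illustration of the technique, not the PrefetchPipeline code: bounded_map and window_factor are hypothetical names, and results are yielded in completion order rather than input order.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def bounded_map(fn, items, num_workers=8, window_factor=4):
    """Run fn over items in worker threads with at most num_workers * window_factor in flight."""
    max_in_flight = num_workers * window_factor
    executor = ThreadPoolExecutor(max_workers=num_workers)
    in_flight = set()
    iterator = iter(items)
    exhausted = False
    try:
        while True:
            # Top up the window without ever exceeding the bound.
            while not exhausted and len(in_flight) < max_in_flight:
                try:
                    item = next(iterator)
                except StopIteration:
                    exhausted = True
                    break
                in_flight.add(executor.submit(fn, item))
            if not in_flight:
                break
            # Wake as soon as any task finishes instead of draining the whole batch.
            done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in done:
                yield future.result()
    finally:
        # cancel_futures requires Python 3.9+; pending work is dropped on early exit.
        executor.shutdown(wait=False, cancel_futures=True)
```

Compared with submitting a batch and blocking on every future, a slow task here delays only its own result, and the in-flight cap limits how many submitted or unfinished tasks hold sample data at once.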
gasbench/src/gasbench/benchmarks/video_bench.py
- Same rewrite applied to VideoPrefetchPipeline
- Default num_workers increased from 3 → 4 (fewer than image due to heavier per-sample memory)
- max_in_flight = num_workers * 3 = 12 (tighter bound for video frames)
- Strips video_bytes and video_path from result dicts
gasbench/src/gasbench/processing/media.py
- Added explicit image.close() in process_image_sample() after extracting the numpy array; prevents PIL Image objects from lingering in multi-threaded workers
Expected impact
| Metric | Before | After |
|---|---|---|
| Image I/O concurrency | 1 (serial) | 8 threads |
| Video I/O concurrency | 1 (serial) | 4 threads |
| Pipeline stall pattern | Drain all 6, block on slowest | FIRST_COMPLETED, no stalls |
| Peak in-flight samples (image) | 6 | 32 (bounded) |
| Peak in-flight samples (video) | 6 | 12 (bounded) |
| Image bytes in result dict | Held until tracker consumes | Stripped immediately |
| PIL Image cleanup | GC-dependent | Explicit .close() |
| Est. image benchmark time | ~5 hours (52 datasets) | ~1-2 hours |