Changes from all commits (97 commits)
0040b97
Add deterministic training functionality to PyTorch LLaMA benchmark
Aishwarya-Tonpe Aug 5, 2025
e103dd0
llama: add periodic checksum logging (deterministic-only, log-only); …
Aishwarya-Tonpe Aug 11, 2025
87ff6d6
deterministic training: enable seeding + deterministic algorithms acr…
Aishwarya-Tonpe Aug 11, 2025
8eee235
tests(pytorch): add strict determinism skip guards and detailed docst…
Aishwarya-Tonpe Aug 11, 2025
fe34247
Refactor LLaMA model tests: align strict, soft determinism, and check…
Aishwarya-Tonpe Aug 11, 2025
c374dfe
examples: add deterministic and strict_determinism flags and docs to …
Aishwarya-Tonpe Aug 11, 2025
614f96c
Deterministic fingerprints: replace checksum with Loss+ActMean across…
Aishwarya-Tonpe Aug 12, 2025
689dc44
Deterministic training + reproducible logging: align GPT-2/LLaMA/LSTM…
Aishwarya-Tonpe Aug 16, 2025
33c3f6a
Adding flag: Checck-frequency
Aishwarya-Tonpe Aug 18, 2025
f35e98b
Add Check frequency flag to tests
Aishwarya-Tonpe Aug 19, 2025
dd7fcbe
Code refactor: Move enable_determinism to base class, add a consolida…
Aishwarya-Tonpe Aug 20, 2025
d439395
Code refactor: Add a new test folder to remove redundant code, remove…
Aishwarya-Tonpe Aug 20, 2025
da9c85a
Code refactor: Move loss and ActMean logging to base class from indiv…
Aishwarya-Tonpe Aug 20, 2025
2635aad
Code refactor: Move _benchmark() method to base class
Aishwarya-Tonpe Aug 20, 2025
4a21990
Code refactor: Add method _finalize_periodic_logging to base class to…
Aishwarya-Tonpe Aug 20, 2025
ddd3f23
Code cleanup: Remove unnecessary imports
Aishwarya-Tonpe Aug 20, 2025
a9cb452
Code cleanup: Remove unnecessary imports
Aishwarya-Tonpe Aug 20, 2025
52c5516
Code cleanup: Remove unnecessary imports
Aishwarya-Tonpe Aug 20, 2025
6623f59
Code cleanup: Remove unnecessary imports
Aishwarya-Tonpe Aug 20, 2025
8853c21
Tescase addition: Add Failure testcase, renameflag
Aishwarya-Tonpe Aug 21, 2025
14be806
Delete extra lines
Aishwarya-Tonpe Aug 21, 2025
8cd1c19
Add Docstrings, align imports, add assertions messages
Aishwarya-Tonpe Aug 26, 2025
99bdc16
Lint Checks
Aishwarya-Tonpe Aug 27, 2025
4bc0445
Lint Checks
Aishwarya-Tonpe Aug 28, 2025
2c8d856
Lint Checks
Aishwarya-Tonpe Aug 28, 2025
d8d9ca0
Failed check: Resolving failed pipeline check for creating temp file …
Aishwarya-Tonpe Aug 28, 2025
8bcd801
Pipeline failure fixes : Fixing Lint failures on test, example and ba…
Aishwarya-Tonpe Aug 28, 2025
315d07f
Pipeline failure fixes : Fixing Lint failures on test, example and ba…
Aishwarya-Tonpe Aug 28, 2025
5ae57f0
Pipeline failure error: Github not reflecting change in base file, at…
Aishwarya-Tonpe Aug 28, 2025
c379c5e
Pipeline failure fixes
Aishwarya-Tonpe Aug 28, 2025
3b186cf
Pipeline failure fixes
Aishwarya-Tonpe Aug 29, 2025
64d7b81
Test file lint fixes
Aishwarya-Tonpe Aug 29, 2025
90a6595
Pipeline Error: Mixtral create Model
Aishwarya-Tonpe Aug 29, 2025
055723c
Modifying test parameters for efficiency
Aishwarya-Tonpe Aug 29, 2025
b47688d
Attempting to skip tests for heavy models in CI
Aishwarya-Tonpe Aug 29, 2025
13ad2fe
Attempting to skip tests for heavy models in CI
Aishwarya-Tonpe Aug 29, 2025
2ed5ae0
Skipping tests for CICD
Aishwarya-Tonpe Aug 29, 2025
10ae1a3
Removing unnecessary code
Aishwarya-Tonpe Sep 3, 2025
fb21a9f
Adding Metadata Overriding logic to fetch metadata from the log file …
Aishwarya-Tonpe Sep 4, 2025
f3bb260
Adding Metadata Overriding logic to fetch metadata from the log file …
Aishwarya-Tonpe Sep 4, 2025
172b02b
Lint Fixes
Aishwarya-Tonpe Sep 4, 2025
de326d5
Pipeline failure fix
Aishwarya-Tonpe Sep 4, 2025
6497bf5
Adding test for coverage
Aishwarya-Tonpe Sep 4, 2025
8a8599e
Pipeline failure fix
Aishwarya-Tonpe Sep 4, 2025
a68b4df
Pipeline failure fix
Aishwarya-Tonpe Sep 4, 2025
e59fc61
Adding Info about deterministic traning to docs
Aishwarya-Tonpe Sep 15, 2025
7c6120d
Adding Info about deterministic traning to docs
Aishwarya-Tonpe Sep 15, 2025
860f0f9
Merge branch 'main' into aishwaryatonpe/deterministic-training
polarG Sep 22, 2025
2892a69
Comments resolve: Add docstrings, Make changes to ensure same lenghts…
Aishwarya-Tonpe Oct 1, 2025
0195d98
COmment resolve : Remove process_info, deprecated
Aishwarya-Tonpe Oct 1, 2025
ea6f7fc
Fixing Lint errors
Aishwarya-Tonpe Oct 1, 2025
d8acbf2
Lint checkes resolve
Aishwarya-Tonpe Oct 2, 2025
8629e8b
Lint checkes resolve
Aishwarya-Tonpe Oct 2, 2025
b15393f
Test case fixes : removing log-path from test-pytorch_determinism_all
Aishwarya-Tonpe Oct 2, 2025
529ab12
Comments removed
Aishwarya-Tonpe Oct 2, 2025
2cb80c0
Merge branch 'main' into aishwaryatonpe/deterministic-training
Aishwarya-Tonpe Oct 2, 2025
54d3449
Fixing test_pytorch_deterministic_all
Aishwarya-Tonpe Oct 2, 2025
e91ec63
Comments address : Removing redundant code
Aishwarya-Tonpe Oct 2, 2025
8fc3d5f
Moving seeding logic to make it centralised to model base
Aishwarya-Tonpe Oct 2, 2025
0848c7a
Moving seeding logic to make it centralised to model base
Aishwarya-Tonpe Oct 2, 2025
42718f0
Merge branch 'main' into aishwaryatonpe/deterministic-training
Aishwarya-Tonpe Oct 8, 2025
615bc94
Comments resolve: removing redundant method, adding loggers
Aishwarya-Tonpe Oct 8, 2025
a2e2e20
Merge branch 'main' into aishwaryatonpe/deterministic-training
Aishwarya-Tonpe Oct 9, 2025
59cfdd1
Resolving merge conflicts
Aishwarya-Tonpe Oct 9, 2025
e893a5a
Merge branch 'main' into aishwaryatonpe/deterministic-training
Aishwarya-Tonpe Oct 23, 2025
d909477
Merge branch 'main' into aishwaryatonpe/deterministic-training
Aishwarya-Tonpe Nov 10, 2025
436890e
Merge branch 'main' of https://github.com/microsoft/superbenchmark in…
Dec 8, 2025
e4d2f5e
Removing check_frequency parameter from is_finished method in train a…
Dec 8, 2025
d0bfd38
Comments resolve : Removing check_frequency assignment to the variable
Dec 8, 2025
197007a
Update superbench/benchmarks/model_benchmarks/pytorch_base.py
Aishwarya-Tonpe Dec 8, 2025
4724815
Update tests/benchmarks/model_benchmarks/test_pytorch_determinism_all.py
Aishwarya-Tonpe Dec 8, 2025
fdc82ad
Update superbench/benchmarks/model_benchmarks/pytorch_base.py
Aishwarya-Tonpe Dec 8, 2025
373fdf3
Logic change to add metrics to resuls_summary file, Logic change to m…
Aishwarya-Tonpe Dec 15, 2025
11e945e
Moving CUBLAS_WORKSPACE_CONFIG=:4096:8 to the code base so that it do…
Aishwarya-Tonpe Dec 15, 2025
4911580
Renaming --deterministic -> --enable-determinism
Aishwarya-Tonpe Dec 15, 2025
67fca5c
Comments resolve: minor deletions
Aishwarya-Tonpe Dec 15, 2025
ce18856
Update superbench/benchmarks/model_benchmarks/pytorch_base.py
Aishwarya-Tonpe Dec 15, 2025
31f46ad
Update superbench/benchmarks/model_benchmarks/pytorch_mixtral_impl.py
Aishwarya-Tonpe Dec 15, 2025
c5895b1
Update docs/user-tutorial/benchmarks/model-benchmarks.md
Aishwarya-Tonpe Dec 15, 2025
e457b83
Refactoring the code: Moving utility functions to model_log_utils
Aishwarya-Tonpe Dec 16, 2025
02d568a
Merge branch 'aishwaryatonpe/deterministic-training' of https://githu…
Aishwarya-Tonpe Dec 16, 2025
a249916
Updating the user docs
Aishwarya-Tonpe Dec 16, 2025
039b17e
Updating the test files and fixing lint errors
Aishwarya-Tonpe Dec 17, 2025
a26518c
Lint error fixes
Aishwarya-Tonpe Dec 17, 2025
c8abf0c
Pipeline erros resolve : Link errors, function complex error
Aishwarya-Tonpe Dec 17, 2025
2f5493a
Resetting the env var cause of failing testcases in the pipeline, tes…
Aishwarya-Tonpe Dec 17, 2025
8398f51
Resolving pipelines errors
Aishwarya-Tonpe Dec 17, 2025
7c5405a
Resolving pipelines errors
Aishwarya-Tonpe Dec 17, 2025
6b51a18
Resolving pipeline issues
Aishwarya-Tonpe Dec 17, 2025
c8ca973
Adding a new test file to cover the code logic in the model_utils file
Aishwarya-Tonpe Dec 17, 2025
7f6bfeb
Resolving pipeline issues
Aishwarya-Tonpe Dec 18, 2025
205934e
Resolving pipeline issues
Aishwarya-Tonpe Dec 18, 2025
3e996f2
resolving pipeline issues
Aishwarya-Tonpe Dec 18, 2025
ea9f6b2
Resolving pipeline failures
Aishwarya-Tonpe Dec 18, 2025
3b31c6a
Fix pipeline issues
Aishwarya-Tonpe Dec 18, 2025
4384412
Minor change
Aishwarya-Tonpe Dec 19, 2025
b5967f7
Merge branch 'main' into aishwaryatonpe/deterministic-training
Aishwarya-Tonpe Jan 6, 2026
13 changes: 13 additions & 0 deletions docs/user-tutorial/benchmarks/model-benchmarks.md
@@ -34,6 +34,19 @@ For inference, supported percentiles include

**New: Support fp8_hybrid and fp8_e4m3 precision for BERT models.**

**New: Deterministic Training Support**
SuperBench now supports deterministic training to make results reproducible across runs, using fixed seeds and deterministic algorithms. Deterministic training is controlled by the following flags and environment variables (a minimal usage sketch follows the list):

- **Flags:**
  - `--enable-determinism`: Enables deterministic computation for reproducible results.
  - `--deterministic_seed <seed>`: Sets the random seed for reproducibility.
  - `--generate_log`: Boolean flag that stores comparison metrics in the results file.
  - `--compare_log <results_file_path>`: Specifies the path to the reference results file to compare against.

- **Environment Variables:**
  - `CUBLAS_WORKSPACE_CONFIG=:4096:8`: Ensures deterministic behavior in cuBLAS (set implicitly when `--enable-determinism` is used).
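
For example, a minimal sketch of enabling determinism through the Python API (the model name, step counts, and seed below are illustrative; flag spellings follow the list above):

```python
from superbench.benchmarks import BenchmarkRegistry, Framework

# Illustrative: run one model benchmark deterministically with a fixed seed.
parameters = (
    '--batch_size 1 --num_steps 100 --precision float32 --model_action train '
    '--enable-determinism --deterministic_seed 42'
)
context = BenchmarkRegistry.create_benchmark_context('lstm', parameters=parameters, framework=Framework.PYTORCH)
benchmark = BenchmarkRegistry.launch_benchmark(context)
```

A second run with the same seed and `--compare_log <results_file>` can then be validated against the first run's results file.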

#### Metrics

| Name | Unit | Description |
137 changes: 137 additions & 0 deletions examples/benchmarks/pytorch_deterministic_example.py
@@ -0,0 +1,137 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""Unified PyTorch deterministic training example for all supported models.

Deterministic metrics (loss, activation mean) are automatically stored in results.json
when the --enable-determinism flag is set. Use --compare-log to compare against a reference run.

Commands to run:
Run A (generate reference):

python3 examples/benchmarks/pytorch_deterministic_example.py \
--model <model_from_MODEL_CHOICES> --enable-determinism --deterministic-seed 42

This creates results-0.json with deterministic metrics.

Run B (compare against reference):

python3 examples/benchmarks/pytorch_deterministic_example.py \
--model <model_from_MODEL_CHOICES> --enable-determinism --deterministic-seed 42 --compare-log results-0.json

Note: CUBLAS_WORKSPACE_CONFIG is now automatically set by the code when determinism is enabled.
"""

import argparse
import json
from pathlib import Path
from superbench.benchmarks import BenchmarkRegistry, Framework
from superbench.common.utils import logger

MODEL_CHOICES = [
'bert-large',
'gpt2-small',
'llama2-7b',
'mixtral-8x7b',
'resnet101',
'lstm',
]

DEFAULT_PARAMS = {
'bert-large':
'--batch_size 1 --seq_len 64 --num_warmup 1 --num_steps 200 --precision float32 '
'--model_action train --check_frequency 20',
'gpt2-small':
'--batch_size 1 --num_steps 300 --num_warmup 1 --seq_len 128 --precision float32 '
'--model_action train --check_frequency 20',
'llama2-7b':
'--batch_size 1 --num_steps 300 --num_warmup 1 --seq_len 512 --precision float32 --model_action train '
'--check_frequency 20',
'mixtral-8x7b':
'--hidden_size=4096 --num_hidden_layers=32 --num_attention_heads=32 --intermediate_size=14336 '
'--num_key_value_heads=8 --max_position_embeddings=32768 --router_aux_loss_coef=0.02 '
'--check_frequency 20',
'resnet101':
'--batch_size 1 --precision float32 --num_warmup 1 --num_steps 120 --sample_count 8192 '
'--pin_memory --model_action train --check_frequency 20',
'lstm':
'--batch_size 1 --num_steps 100 --num_warmup 2 --seq_len 64 --precision float16 '
'--model_action train --check_frequency 30',
}


def main():
"""Main function for determinism example file."""
parser = argparse.ArgumentParser(description='Unified PyTorch deterministic training example.')
parser.add_argument('--model', type=str, choices=MODEL_CHOICES, required=True, help='Model to run.')
parser.add_argument(
'--enable-determinism',
'--enable_determinism',
action='store_true',
help='Enable deterministic mode for reproducible results.',
)
parser.add_argument(
'--compare-log',
type=str,
default=None,
help='Path to reference results.json file for deterministic comparison.',
)
parser.add_argument(
'--deterministic-seed',
type=int,
default=None,
help='Seed for deterministic training.',
)
args = parser.parse_args()

parameters = DEFAULT_PARAMS[args.model]
if args.enable_determinism:
parameters += ' --enable-determinism'
if args.deterministic_seed is not None:
parameters += f' --deterministic_seed {args.deterministic_seed}'
if args.compare_log:
parameters += f' --compare-log {args.compare_log}'

context = BenchmarkRegistry.create_benchmark_context(args.model, parameters=parameters, framework=Framework.PYTORCH)
benchmark = BenchmarkRegistry.launch_benchmark(context)
logger.info(f'Benchmark finished. Return code: {benchmark.return_code}')

# Save results to file for comparison
if not args.compare_log:
# Find next available results file name
counter = 0
while Path(f'results-{counter}.json').exists():
counter += 1
results_file = f'results-{counter}.json'

# Parse benchmark results and create nested format like results-summary.json
benchmark_results = json.loads(benchmark.serialized_result)

# Create nested structure: raw_data -> benchmark_name -> metrics
# Extract the benchmark name from the results (e.g., "pytorch-lstm")
benchmark_name = benchmark_results.get('name', args.model)

# Create results in the format expected by comparison logic
nested_results = {
'raw_data': {
f'model-benchmarks:{args.model}/{benchmark_name}': benchmark_results.get('raw_data', {})
}
}

# Write results to file
with open(results_file, 'w') as f:
json.dump(nested_results, f, indent=2)
logger.info(f'Results saved to {results_file}')
logger.info(f'To compare against this run, use: --compare-log {results_file}')
else:
logger.info(f'Comparison completed against {args.compare_log}')

if hasattr(benchmark, '_model_run_metadata'):
logger.info(f'Run metadata: {benchmark._model_run_metadata}')
if hasattr(benchmark, '_model_run_periodic'):
num_checkpoints = len(benchmark._model_run_periodic.get('step', []))
logger.info(f'Periodic fingerprints collected at {num_checkpoints} checkpoints')


if __name__ == '__main__':
main()
64 changes: 60 additions & 4 deletions superbench/benchmarks/base.py
@@ -110,14 +110,66 @@ def parse_args(self, ignore_invalid=False):
logger.error('Invalid argument - benchmark: {}, message: {}.'.format(self._name, str(e)))
return False, None, []

ret = True
if args is not None and 'compare_log' in [a.dest for a in self._parser._actions]:
args = self._override_args_with_compare_log(args)

ret = self._check_unknown_args(unknown)

return ret, args, unknown

def _override_args_with_compare_log(self, args):
"""Override arguments with metadata from a compare log file if available.

This is a legacy method. Metadata override is now handled by benchmark-specific
implementations (e.g., pytorch_base.py for PyTorch models).

Args:
args: Parsed arguments.

Returns:
argparse.Namespace: Arguments (returned unchanged).
"""
return args
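
As an illustration, a framework-specific override might look roughly like the sketch below (hypothetical: the metadata key names and log layout are assumptions, and the actual pytorch_base.py logic may differ):

```python
import json
from pathlib import Path

def _override_args_with_compare_log(self, args):
    """Hypothetical override: reuse run metadata recorded in the reference log."""
    compare_log = getattr(args, 'compare_log', None)
    if not compare_log or not Path(compare_log).is_file():
        return args
    with open(compare_log) as f:
        reference = json.load(f)
    # Illustrative: adopt the reference run's seed so both runs stay comparable.
    seed = reference.get('metadata', {}).get('deterministic_seed')
    if seed is not None:
        args.deterministic_seed = seed
    return args
```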

def _convert_precision_value(self, value, Precision):
"""Convert precision values to the appropriate format.

Args:
value: The precision value to convert.
Precision: The Precision class or type to convert to.

Returns:
list: A list of converted precision values.
"""
if isinstance(value, list):
converted = []
for v in value:
if isinstance(v, Precision):
converted.append(v)
else:
converted.append(Precision(v))
return converted
else:
if isinstance(value, Precision):
return [value]
else:
return [Precision(value)]
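
For instance, a hypothetical call site (assuming the `Precision` enum exported by superbench.benchmarks):

```python
# Mixed string/enum input normalizes to a single list of Precision values.
precisions = self._convert_precision_value(['float16', Precision.FLOAT32], Precision)
```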

def _check_unknown_args(self, unknown):
"""Check for unknown arguments and log an error if any are found.

Args:
unknown (list): List of unknown arguments.

Returns:
bool: False if unknown arguments are found, True otherwise.
"""
if len(unknown) > 0:
logger.error(
'Unknown arguments - benchmark: {}, unknown arguments: {}'.format(self._name, ' '.join(unknown))
)
return False
return True

def _preprocess(self):
"""Preprocess/preparation operations before the benchmarking.
@@ -263,6 +315,10 @@ def __check_raw_data(self):
instance of List[List[Number]] or List[str] for BenchmarkType.MICRO.
"""
for metric in self._result.raw_data:
# Skip validation for metadata (dict type used for configuration storage)
if metric.startswith('metadata'):
continue

is_valid = True
if self._benchmark_type == BenchmarkType.MODEL:
is_valid = self.__is_list_list_type(self._result.raw_data[metric], numbers.Number)
17 changes: 17 additions & 0 deletions superbench/benchmarks/model_benchmarks/model_base.py
@@ -186,6 +186,17 @@ def _generate_dataset(self):
"""
pass

def set_deterministic_seed(self):
"""Hook to set deterministic RNG state before dataset generation.

Framework-specific subclasses may
override this to apply deterministic RNG settings (for example,
PyTorch benchmarks implement this to call their deterministic setup
when requested). This is called from _preprocess() before
_generate_dataset().
"""
return None
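
For context, a PyTorch-flavored override of this hook might look roughly like the following sketch (assumptions: the argument names mirror the flags documented above; the real pytorch_base.py implementation may differ):

```python
import os
import random

import numpy as np
import torch

def set_deterministic_seed(self):
    """Sketch: seed all RNGs and switch PyTorch to deterministic kernels."""
    if not getattr(self._args, 'enable_determinism', False):
        return
    seed = getattr(self._args, 'deterministic_seed', None) or 42
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # cuBLAS needs this workspace config for deterministic GEMMs on CUDA.
    os.environ.setdefault('CUBLAS_WORKSPACE_CONFIG', ':4096:8')
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False
```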

@abstractmethod
def _init_dataloader(self):
"""Initialize the dataloader.
Expand Down Expand Up @@ -221,6 +232,12 @@ def _preprocess(self):
self._result.set_return_code(ReturnCode.DISTRIBUTED_SETTING_INIT_FAILURE)
return False

# Invoke model-specific deterministic seeding hook before dataset generation
try:
self.set_deterministic_seed()
except Exception:
logger.info('set_deterministic_seed() hook failed or not implemented for model: %s', self._name)

# Set sample_count aligned with batch_size.
self._args.sample_count = math.ceil(self._args.sample_count / self._args.batch_size) * self._args.batch_size
