Echo Workload Tracer is a tool for tracing and analyzing the workloads of deep learning frameworks. It currently supports the PyTorch, DeepSpeed, and Megatron-LM frameworks.
Echo Workload Tracer captures runtime information and generates detailed workload graphs from deep learning training jobs while using only a single GPU device. This data is essential for analyzing performance, optimizing resource utilization, and simulating distributed training at scale.
- PyTorch Support: Comprehensive tracing for PyTorch models, including:
  - Support for HuggingFace Transformers models
  - Support for models from common libraries (e.g., transformers, torchvision) as well as custom PyTorch models
  - Support for parallel training modes such as DDP
  - Capturing both forward and backward passes and extracting execution graphs and runtime data (see the sketch below)
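As an illustration of the kind of per-operator runtime data such tracing yields, the sketch below uses torch.profiler directly on a single forward/backward step; it is an illustration only, not the tracer's actual implementation, and it assumes a CUDA device is available (as listed in the requirements).

```python
# Illustrative only: not the tracer's implementation. Shows the kind of
# per-operator runtime data that can be captured for one forward/backward pass.
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(512, 512).cuda()   # stand-in for a real model (e.g., GPT-2)
x = torch.randn(32, 512, device="cuda")    # batch_size=32, hidden size 512

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             record_shapes=True) as prof:
    loss = model(x).sum()   # forward pass
    loss.backward()         # backward pass

# Per-operator timing table, similar in spirit to the ops-profiling JSON output
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```

The tracer's own ops-level and graph-level profiling outputs are described under the output layout below.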
Note: We are extending the PyTorch tracer to support torchtitan, PyTorch's native training framework. The DeepSpeed and Megatron-LM tracers are under active development, and we will keep updating them.
- NVIDIA GPU with CUDA support (at least 1 GPU device)
- Clone the repository

  git clone https://github.com/fwyc0573/Echo-workload-tracer.git
  cd Echo-workload-tracer
  export PYTHONPATH=$PYTHONPATH:/path/to/Echo-workload-tracer

- Set up the Conda environment

  conda env create -f environment.yaml
  conda activate simulator_echo
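Optionally, verify that the environment sees the GPU; this quick check assumes PyTorch is installed by environment.yaml.

```python
# Quick sanity check that PyTorch and CUDA are visible in the new environment
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
```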
# Basic usage (local model)
./pytorch_tracing_run.sh --model gpt2 --batch_size 32 --sequence_length 512 --num_gpus 1 --model_source local
# Advanced usage (huggingface model)
./pytorch_tracing_run_huggingface.sh --model deepseek-ai/deepseek-coder-1.3b-base --model_source huggingface --batch_size 2 --sequence_length 256 --num_repeats 5 --num_gpus 1
# Advanced usage (ddp mode)
./pytorch_tracing_run_ddp.sh --model gpt2 --batch_size 1 --bucket_cap_mb 10 --sequence_length 512 --num_gpus 2 --model_source local
- --framework: Framework to use for workload tracing (choices: 'PyTorch', 'DeepSpeed', 'Megatron-LM', default: 'PyTorch')
- --base_path: Path to save the output (default: 'output/')
- --num_gpus: Number of GPUs to use in training (default: 1)
- --model: Model to benchmark (default: 'gpt2')
- --model_source: Model source (choices: 'huggingface', 'local', default: 'local')
- --batch_size: Batch size for training/inference (default: 16)
- --sequence_length: Sequence length for input data (default: 512)
- --num_repeats: Number of repetitions for averaging results (default: 1)
- --pytorch_ops_profiling: Enable operations profiling for the PyTorch workload
- --pytorch_graph_profiling: Enable graph profiling for the PyTorch workload
- --pytorch_ddp: Enable PyTorch DistributedDataParallel (DDP) mode
- --bucket_cap_mb: Communication bucket size (MB) used in DDP mode (default: 25)
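For reference, these options roughly correspond to an argparse parser like the sketch below. This is illustrative only; the authoritative definitions live in tracer_arguments.py and may differ.

```python
# Illustrative sketch of how the CLI options might be declared; the actual
# definitions are in tracer_arguments.py and may differ.
import argparse

parser = argparse.ArgumentParser(description="Echo Workload Tracer")
parser.add_argument("--framework", choices=["PyTorch", "DeepSpeed", "Megatron-LM"], default="PyTorch")
parser.add_argument("--base_path", default="output/")
parser.add_argument("--num_gpus", type=int, default=1)
parser.add_argument("--model", default="gpt2")
parser.add_argument("--model_source", choices=["huggingface", "local"], default="local")
parser.add_argument("--batch_size", type=int, default=16)
parser.add_argument("--sequence_length", type=int, default=512)
parser.add_argument("--num_repeats", type=int, default=1)
parser.add_argument("--pytorch_ops_profiling", action="store_true")
parser.add_argument("--pytorch_graph_profiling", action="store_true")
parser.add_argument("--pytorch_ddp", action="store_true")
parser.add_argument("--bucket_cap_mb", type=int, default=25)

args = parser.parse_args()
```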
The tracer generates the following outputs:
output/
├── logs/
│   └── PyTorch/
│       ├── config_[model_name]_bs[batch_size]_seq[seq_length].json               # Tracing config files for each run
├── PyTorch/
│   ├── pytorch_graph_profiling/
│   │   └── [model_source]/
│   │       └── [model_name]/
│   │           ├── forward_graph_profiling_bs[batch_size]_seq[seq_length].json   # Forward graph profiling
│   │           ├── backward_graph_profiling_bs[batch_size]_seq[seq_length].json  # Backward graph profiling
│   │           └── global_graph_profiling_bs[batch_size]_seq[seq_length].json    # Global graph profiling
│   └── pytorch_ops_profiling/
│       └── [model_source]/
│           └── [model_name]/
│               ├── forward_ops_profiling_bs[batch_size]_seq[seq_length].json     # Forward ops profiling
│               ├── backward_ops_profiling_bs[batch_size]_seq[seq_length].json    # Backward ops profiling
│               ├── global_ops_profiling_bs[batch_size]_seq[seq_length].json      # Global ops profiling
│               └── PyTorch tracer_[timestamp].log                                # Tracer log files
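The exact JSON schema may vary between tracer versions, so a schema-agnostic way to take a first look at a generated profiling file is sketched below; the path is just an example run and should be adjusted to your own output.

```python
# Peek at a generated profiling file without assuming its exact schema.
import json
from pathlib import Path

# Example path; adjust model_source, model name, batch size, and sequence length.
path = Path("output/PyTorch/pytorch_ops_profiling/local/gpt2/"
            "forward_ops_profiling_bs32_seq512.json")

with path.open() as f:
    data = json.load(f)

# Print the top-level structure only, since the schema is not assumed here.
if isinstance(data, dict):
    print("top-level keys:", list(data.keys())[:10])
elif isinstance(data, list):
    print("entries:", len(data))
    print("first entry:", data[0] if data else None)
else:
    print(data)
```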
To add support for additional model types:
- Define the model loading function in the appropriate tracer file
- Register the model source in tracer_arguments.py
- Handle any model-specific operations or patterns (see the sketch below)
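A rough sketch of these steps follows; the function name load_my_model and the source name my_source are hypothetical placeholders, not part of the codebase.

```python
# Hypothetical sketch of adding a new model source; function and choice names
# are placeholders, not the actual Echo-workload-tracer API.
import torch

def load_my_model(model_name: str, sequence_length: int) -> torch.nn.Module:
    """Model-loading function to be defined in the appropriate tracer file."""
    # model_name and sequence_length are unused in this placeholder.
    model = torch.nn.TransformerEncoder(
        torch.nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=2
    )
    return model.cuda() if torch.cuda.is_available() else model

# In tracer_arguments.py, the new source would then be registered, e.g. by
# extending the existing choices:
# parser.add_argument("--model_source",
#                     choices=["huggingface", "local", "my_source"], ...)
```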
If you use this tool in your research, please cite our paper:
@article{echo2024,
  title={Echo: Simulating Distributed Training At Scale},
  author={Yicheng Feng and Yuetao Chen and Kaiwen Chen and Jingzong Li and Tianyuan Wu and Peng Cheng and Chuan Wu and Wei Wang and Tsung-Yi Ho and Hong Xu},
  journal={arXiv preprint arXiv:2412.12487},
  year={2024}
}

Please email Yicheng Feng for questions or issues related to this project.
This project is licensed under the MIT license - see the LICENSE file for details.