Description
PySDK Version
- PySDK V2 (2.x)
- PySDK V3 (3.x)
Describe the bug
Pipeline parameters (e.g., ParameterInteger, ParameterString) cannot be used in ModelTrainer hyperparameters. The safe_serialize function in sagemaker-train/src/sagemaker/train/utils.py does not handle PipelineVariable objects: json.dumps(data) fails on them, and the fallback str(data) call then reaches PipelineVariable.__str__, which raises a TypeError when the pipeline is built.
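The failure mode can be shown in isolation without the SDK installed. The sketch below is only an illustration: StandInPipelineVariable is a hypothetical stand-in for the SDK's PipelineVariable, and safe_serialize_as_in_traceback copies only the try/except lines of safe_serialize that are visible in the traceback in this report, not the full function.

```python
import json


class StandInPipelineVariable:
    """Hypothetical stand-in for the SDK's PipelineVariable (not the real class)."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Mirrors the behavior reported in the traceback: str() is rejected.
        raise TypeError("Pipeline variables do not support __str__ operation.")


def safe_serialize_as_in_traceback(data):
    """Copy of the try/except lines shown in the traceback (assumed partial)."""
    try:
        return json.dumps(data)
    except TypeError:
        # json.dumps cannot encode the object, so the fallback calls str(),
        # which raises a second TypeError for a pipeline variable.
        return str(data)


def serialization_error(value):
    """Return the TypeError message raised while serializing value, or None."""
    try:
        safe_serialize_as_in_traceback(value)
    except TypeError as exc:
        return str(exc)
    return None
```

Calling serialization_error(StandInPipelineVariable("MaxDepth")) returns the __str__ error message, while plain values such as 5 serialize without error.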
To reproduce
Pass max_depth as a pipeline parameter to the XGBoost container through the ModelTrainer hyperparameters:
from sagemaker.core.workflow.parameters import ParameterInteger
from sagemaker.train import ModelTrainer
from sagemaker.core.training.configs import Compute
from sagemaker.core.workflow.pipeline_context import PipelineSession
from sagemaker.mlops.workflow.steps import TrainingStep
from sagemaker.mlops.workflow.pipeline import Pipeline
# Create pipeline session and parameter
pipeline_session = PipelineSession()
max_depth = ParameterInteger(name="MaxDepth", default_value=5)
# Create ModelTrainer with pipeline parameter in hyperparameters
model_trainer = ModelTrainer(
    training_image="683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3",
    compute=Compute(instance_type="ml.m5.xlarge", instance_count=1),
    sagemaker_session=pipeline_session,
    role=role,  # IAM execution role ARN, assumed to be defined elsewhere
    hyperparameters={
        "max_depth": max_depth,  # Pipeline parameter
    },
)
train_args = model_trainer.train()
step_train = TrainingStep(name="TrainStep", step_args=train_args)
# Create and upsert pipeline
pipeline = Pipeline(
    name="test-pipeline",
    parameters=[max_depth],
    steps=[step_train],
    sagemaker_session=pipeline_session,
)
The failure occurs at this step:
pipeline.upsert(role_arn=role)
Expected behavior
Pipeline parameters should be serialized correctly and the pipeline should be created successfully, allowing hyperparameters to be parameterized at pipeline execution time.
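One possible direction, sketched here under stated assumptions rather than as the SDK's actual fix, is to check for pipeline variables before attempting JSON serialization and pass them through unchanged so they can be resolved when the pipeline definition is rendered. StandInPipelineVariable and safe_serialize_sketch are hypothetical names; a real change would check against the SDK's own PipelineVariable class. Whether downstream code accepts a non-string hyperparameter value is also an assumption.

```python
import json


class StandInPipelineVariable:
    """Hypothetical stand-in for the SDK's PipelineVariable (not the real class)."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Same behavior as reported in the error logs: str() is rejected.
        raise TypeError("Pipeline variables do not support __str__ operation.")


def safe_serialize_sketch(data, pipeline_variable_type=StandInPipelineVariable):
    """Serialize data, passing pipeline variables through untouched.

    Assumption: the caller can carry the variable object forward and let the
    pipeline definition resolve it at execution time.
    """
    if isinstance(data, pipeline_variable_type):
        return data
    try:
        return json.dumps(data)
    except TypeError:
        return str(data)


max_depth = StandInPipelineVariable("MaxDepth")
hyperparameters = {"max_depth": max_depth, "eta": 0.2}
serialized = {key: safe_serialize_sketch(value) for key, value in hyperparameters.items()}
```

With this guard, serialized["max_depth"] is still the variable object and serialized["eta"] is the string "0.2". An alternative would be to call `.to_string()` on the variable, as the error message suggests, if a string-typed pipeline expression is what the training job request needs.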
Screenshots or logs
Error logs
│ /opt/conda/lib/python3.12/site-packages/sagemaker/train/utils.py:191 in safe_serialize │
│ │
│ 188 │ try: │
│ 189 │ │ return json.dumps(data) │
│ 190 │ except TypeError: │
│ ❱ 191 │ │ return str(data) │
│ 192 │
│ 193 │
│ 194 def _run_clone_command_silent(repo_url, dest_dir): │
│ │
│ /opt/conda/lib/python3.12/site-packages/sagemaker/core/helper/pipeline_variable.py:38 in __str__ │
│ │
│ 35 │ │
│ 36 │ def __str__(self): │
│ 37 │ │ """Override built-in String function for PipelineVariable""" │
│ ❱ 38 │ │ raise TypeError( │
│ 39 │ │ │ "Pipeline variables do not support __str__ operation. " │
│ 40 │ │ │ "Please use `.to_string()` to convert it to string type in execution time " │
│ 41 │ │ │ "or use `.expr` to translate it to Json for display purpose in Python SDK." │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: Pipeline variables do not support __str__ operation. Please use `.to_string()` to convert it to string
type in execution time or use `.expr` to translate it to Json for display purpose in Python SDK.
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 3.3.1
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): XGBoost
- Framework version: 1.0
- Python version: 3.12.9
- CPU or GPU: CPU
- Custom Docker image (Y/N): N