Prerequisite
Environment
OrderedDict([
('sys.platform', 'linux'),
('Python', '3.11.14'),
('CUDA available', True),
('GPU 0,1', 'NVIDIA GeForce RTX 3090'),
('CUDA_HOME', '/usr/local/cuda'),
('NVCC', 'Cuda compilation tools, release 12.6, V12.6.85'),
('GCC', '9.4.0'),
('PyTorch', '2.1.0+cu121'),
('TorchVision', '0.16.0+cu121'),
('OpenCV', '4.11.0'),
('MMEngine', '0.10.7'),
])
Reproduces the problem - code sample
🧩 Minimal config to reproduce
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(
        type='MLflowVisBackend',
        tracking_uri="http://localhost:2222",
        exp_name='vitdet_coco',
        run_name=None,
    ),
]
visualizer = dict(
    type='DetLocalVisualizer',
    vis_backends=vis_backends,
    name='visualizer',
)
default_hooks = dict(
    logger=dict(type='LoggerHook', interval=50),
    visualization=dict(
        type='DetVisualizationHook',
        draw=True,
        interval=1,
    ),
)
During validation, DetVisualizationHook calls:
visualizer.add_image('val_img', image, step)
- LocalVisBackend → saves val_img_{step}.png ✅
- MLflowVisBackend → passes val_img to mlflow.log_image() ❌
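The difference can be modelled without either library. The sketch below is a simplified stand-in, not MMEngine's code: both helper functions are hypothetical names that only mirror how each backend, as described above, derives the file name it hands to the image writer.

```python
import os


def local_backend_filename(name: str, step: int) -> str:
    """Mirror LocalVisBackend as described above: always append step and .png."""
    return f"{name}_{step}.png"


def mlflow_backend_filename(name: str, step: int) -> str:
    """Mirror MLflowVisBackend as described above: name passed through, step unused."""
    return name


def has_extension(filename: str) -> bool:
    """True if the final path component carries a file extension."""
    return os.path.splitext(filename)[1] != ""


local_name = local_backend_filename("val_img", 3)
mlflow_name = mlflow_backend_filename("val_img", 3)
print(local_name, has_extension(local_name))
print(mlflow_name, has_extension(mlflow_name))
```

PIL infers the output format from the file extension, so a name like the second one is what leads to the ValueError shown in the traceback further down.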
Below is a minimal end-to-end reproduction using a tiny COCO subset so it runs quickly, but still triggers a validation visualization step and crashes in MLflowVisBackend.add_image() due to an extension-less image name.
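1) Prerequisites
- MMDetection checkout with COCO available at data/coco/:
  - data/coco/annotations/instances_train2017.json
  - data/coco/annotations/instances_val2017.json
  - data/coco/train2017/*.jpg
  - data/coco/val2017/*.jpg
- MLflow tracking server running locally (the configs in this report use tracking_uri="http://localhost:2222")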
2) Create a minimal debug config
Save this as configs/_debug/mlflow_visbackend_no_ext_repro.py:
# Repro config: triggers MLflowVisBackend crash when name has no extension
_base_ = ['./vitdet_mask-rcnn_vit-b-dinov3.py']  # any base detector config that runs on COCO

# Make it fast: 10 train iters then run val once.
train_cfg = dict(type="IterBasedTrainLoop", max_iters=10, val_interval=10)

# Ensure validation is very small (2 iters) but still calls visualization hook.
train_dataloader = dict(
    dataset=dict(
        indices=64,
        # avoid empty dataset when taking first N
        filter_cfg=dict(filter_empty_gt=False, min_size=0),
    )
)
val_dataloader = dict(dataset=dict(indices=2))
test_dataloader = val_dataloader

# Enable visualization every iter.
default_hooks = dict(
    logger=dict(type='LoggerHook', interval=1),
    checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=10, save_last=True),
    visualization=dict(
        type='DetVisualizationHook',
        draw=True,
        interval=1,
        # show=False is default; when show=False, DetVisualizationHook uses an extension-less name
    ),
)

# Use both LocalVisBackend (works) and MLflowVisBackend (crashes).
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(
        type='MLflowVisBackend',
        tracking_uri="http://localhost:2222",
        exp_name='vitdet_coco',
        run_name=None,
    ),
]
visualizer = dict(type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer')
Reproduces the problem - command or script
3) Run training
python tools/train.py configs/_debug/mlflow_visbackend_no_ext_repro.py
Reproduces the problem - error message
Traceback (most recent call last):
  File "/home/kpysanyi/xraivision-backbone/.venv/lib/python3.11/site-packages/PIL/Image.py", line 2526, in save
    format = EXTENSION[ext]
             ~~~~~~~~~^^^^^
KeyError: ''

The above exception was the direct cause of the following exception:

...
  File "/home/kpysanyi/xraivision-backbone/.venv/lib/python3.11/site-packages/mmengine/visualization/vis_backend.py", line 784, in add_image
    self._mlflow.log_image(image, name)
  File "/home/kpysanyi/xraivision-backbone/.venv/lib/python3.11/site-packages/mlflow/tracking/fluent.py", line 1473, in log_image
    MlflowClient().log_image(run_id, image, artifact_file, key, step, timestamp, synchronous)
  File "/home/kpysanyi/xraivision-backbone/.venv/lib/python3.11/site-packages/mlflow/tracking/client.py", line 2797, in log_image
    image.save(tmp_path)
  File "/home/kpysanyi/xraivision-backbone/.venv/lib/python3.11/site-packages/PIL/Image.py", line 2529, in save
    raise ValueError(msg) from e
ValueError: unknown file extension:
Additional information
🔍 Root cause
Inconsistency between backends:
- LocalVisBackend.add_image() forces a valid filename
- MLflowVisBackend.add_image() assumes name already includes an extension
- The step argument is ignored in the MLflow backend
This makes MLflowVisBackend fragile and incompatible with existing hooks such as DetVisualizationHook, which pass extension-less names.
✅ Proposed solutions
Either of the following would fix the issue cleanly:
Option A (configurable)
Add an argument to MLflowVisBackend that would automatically transform names like val_img → val_img_{step}.png.
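One possible shape for Option A, as a sketch only. The class below is a stand-in rather than the real MLflowVisBackend, and the argument name default_image_ext is hypothetical (this report does not settle on a name). RecordingClient is a fake that replaces the mlflow module so the snippet runs without a tracking server.

```python
import os
from typing import Optional


class RecordingClient:
    """Fake stand-in for the mlflow module: records log_image calls."""

    def __init__(self) -> None:
        self.logged = []

    def log_image(self, image, artifact_file: str) -> None:
        self.logged.append(artifact_file)


class SketchMLflowBackend:
    """Stand-in backend showing an opt-in file name fix-up."""

    def __init__(self, client, default_image_ext: Optional[str] = None) -> None:
        self._mlflow = client
        # Hypothetical option: None keeps the current pass-through behavior.
        self._default_image_ext = default_image_ext

    def add_image(self, name: str, image, step: int = 0) -> None:
        has_ext = os.path.splitext(name)[1] != ""
        if self._default_image_ext is not None and not has_ext:
            name = f"{name}_{step}{self._default_image_ext}"
        self._mlflow.log_image(image, name)


client = RecordingClient()
# Without the option the name is passed through unchanged.
SketchMLflowBackend(client).add_image("val_img", image=None, step=7)
# With the option the step and extension are appended.
SketchMLflowBackend(client, default_image_ext=".png").add_image("val_img", image=None, step=7)
print(client.logged)
```

In a config this would presumably surface as one extra key on the MLflowVisBackend dict, leaving existing setups unchanged unless they opt in.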
Option B (default behavior)
Make MLflowVisBackend.add_image() mirror LocalVisBackend behavior:
- If name has no extension, append _{step}.png by default.
Example logic:
import os  # at module level

# Append a step-suffixed .png only when the name carries no extension.
if not os.path.splitext(name)[1]:
    name = f'{name}_{step}.png'
self._mlflow.log_image(image, name)
This would:
- Make behavior consistent across backends
- Respect the existing step argument
- Avoid hard-to-debug runtime crashes
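A slightly fuller version of the Option B logic, written as a free function so it can be checked in isolation. The function name normalize_image_name is hypothetical, and treating any suffix found by os.path.splitext as a usable extension is an assumption; a real fix might also validate the suffix against the formats PIL supports.

```python
import os


def normalize_image_name(name: str, step: int = 0, ext: str = ".png") -> str:
    """Return a file name that ends in an extension.

    Names that already end in an extension are returned unchanged;
    otherwise "_{step}{ext}" is appended, matching the LocalVisBackend
    naming described in this report.
    """
    if os.path.splitext(name)[1]:
        return name
    return f"{name}_{step}{ext}"


# A bare name, a name with an extension, and a nested name whose
# directory part contains a dot.
for raw in ("val_img", "val_img.jpg", "val/batch.0/img"):
    print(raw, "->", normalize_image_name(raw, step=5))
```

Using os.path.splitext rather than a plain '.' test keeps a dot in a directory component from being mistaken for a file extension.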
I’d be happy to submit a PR implementing this fix (either as a default behavior or a configurable option), if that aligns with the maintainers’ preferences.