Summary
During a large‑scale MetaCat stress test (100k files) performed on dtucker@fifeutilgpvm03.fnal.gov, I generated JSON sidecar metadata files that MetaCat accepted without issue. However, when attempting to ingest the same files via declad’s dropbox mechanism on hypotpro@fermicloud848.fnal.gov, declad rejected them.
To proceed, I had to rewrite all JSON sidecars into a different structure: one that declad accepts, but that MetaCat does not require when ingesting directly.
This indicates a mismatch between:
- the JSON structure MetaCat accepts directly, and
- the JSON structure declad requires before it will ingest and forward metadata to MetaCat.
This mismatch caused ingestion failures and required a script to rewrite the JSON metadata files.
Environment
MetaCat stress test environment
Host: dtucker@fifeutilgpvm03.fnal.gov
MetaCat version: 4.1.?
JSON sidecars accepted directly by MetaCat
declad ingestion environment
Host: hypotpro@fermicloud848.fnal.gov
declad version: 2.3.8
declad dropbox ingestion using /home/hypotpro/declad_848/declad_config.yaml
declad rejected the original JSON sidecars
Example of the Mismatch
1. JSON sidecar that MetaCat accepted directly
This file (/home/dtucker/WORK/GitHub/MetacatStressTest/python/synthetic_minimal_n100000/data_ffff9d76-0e95-4062-81a5-edd7d0279791.parquet.json.orig) is representative of the structure used during the MetaCat stress test:
{
  "dh.type": "other",
  "fn.configuration": "c20240226",
  "fn.description": "metacat_stress_test_20260216_5",
  "fn.format": "txt",
  "fn.owner": "dtucker",
  "fn.tier": "etc",
  "rs.runs": [
    1000002
  ]
}
MetaCat accepted this structure without requiring additional top‑level fields.
2. Equivalent JSON sidecar required by declad
To make declad ingest the same file, I had to rewrite the JSON into the following structure (see /home/dtucker/WORK/GitHub/MetacatStressTest/python/synthetic_minimal_n100000/data_ffff9d76-0e95-4062-81a5-edd7d0279791.parquet.json):
{
  "name": "data_ffff9d76-0e95-4062-81a5-edd7d0279791.parquet",
  "namespace": "hypotpro",
  "size": 6967,
  "checksums": {
    "adler32": "b9a2d1fc"
  },
  "metadata": {
    "dh.type": "other",
    "fn.configuration": "c20240226",
    "fn.description": "metacat_stress_test_20260216_5",
    "fn.format": "txt",
    "fn.owner": "hypotraw",
    "fn.tier": "etc",
    "rs.runs": [
      1000002
    ]
  }
}
Key differences required by declad:
- Mandatory top‑level fields:
-- name
-- namespace
-- size
-- checksums
- All metadata must be nested under "metadata"
- Dot‑notation keys must be preserved inside "metadata"
- rs.runs must be a list, not a scalar
- Missing or differently‑named fields cause declad to reject the file silently or stall
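The rewrite described above can be sketched roughly as follows. This is not the script that was actually used, only a minimal illustration of the transformation between the two structures shown in this report. The function names (adler32_hex, wrap_for_declad, rewrite_sidecar) are hypothetical, and the checksum format (8 lowercase hex digits) is an assumption inferred from the example value "b9a2d1fc", not from declad documentation.

```python
import json
import zlib
from pathlib import Path


def adler32_hex(path, chunk_size=1 << 20):
    """Return a file's Adler-32 checksum as 8 lowercase hex digits.

    The output format is an assumption based on the example in this report.
    """
    value = 1  # standard Adler-32 starting value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return f"{value & 0xFFFFFFFF:08x}"


def wrap_for_declad(flat_metadata, data_path, namespace):
    """Nest flat MetaCat-style metadata under "metadata" and add the
    top-level fields that declad required in this test."""
    data_path = Path(data_path)
    metadata = dict(flat_metadata)  # dot-notation keys are kept unchanged
    runs = metadata.get("rs.runs")
    if runs is not None and not isinstance(runs, list):
        metadata["rs.runs"] = [runs]  # rs.runs must be a list, not a scalar
    return {
        "name": data_path.name,
        "namespace": namespace,
        "size": data_path.stat().st_size,
        "checksums": {"adler32": adler32_hex(data_path)},
        "metadata": metadata,
    }


def rewrite_sidecar(sidecar_path, data_path, namespace):
    """Read a flat sidecar and overwrite it with the wrapped structure."""
    sidecar_path = Path(sidecar_path)
    flat = json.loads(sidecar_path.read_text())
    wrapped = wrap_for_declad(flat, data_path, namespace)
    sidecar_path.write_text(json.dumps(wrapped, indent=2) + "\n")
    return wrapped
```

In this sketch the namespace is passed in explicitly (for example "hypotpro"), since the flat sidecar carries no namespace information of its own. Keeping a copy of the original sidecar (as was done here with the .json.orig files) is left to the caller.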
Possible Actions (suggested by Microsoft Copilot)
- Document the required JSON structure for declad dropbox ingestion, including:
-- required top‑level fields
-- required nesting under "metadata"
-- required checksum formats
-- required run‑number fields
- Clarify whether declad should accept the same JSON structure that MetaCat accepts directly, or whether the two systems are intentionally different.
- Improve error reporting when JSON sidecars are malformed or missing required fields.
- (Optional) Add a validation mode to declad that checks JSON sidecars and reports structural issues before ingestion.
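As an illustration of what such a validation step might report, here is a minimal standalone sketch. It is not part of declad and does not use any declad API; the function name check_sidecar and the list of required fields are assumptions taken from the "Key differences" observed in this report, not from a documented declad schema.

```python
# Required top-level fields and their expected JSON types, as observed
# in this report (an assumption, not a documented declad schema).
REQUIRED_TOP_LEVEL = {
    "name": str,
    "namespace": str,
    "size": int,
    "checksums": dict,
    "metadata": dict,
}


def check_sidecar(doc):
    """Return a list of human-readable problems; an empty list means OK."""
    if not isinstance(doc, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    for key, expected in REQUIRED_TOP_LEVEL.items():
        if key not in doc:
            problems.append(f"missing top-level field: {key}")
        elif isinstance(doc[key], bool) or not isinstance(doc[key], expected):
            problems.append(f"field {key} should be of type {expected.__name__}")

    # Flat MetaCat-style sidecars put dot-notation keys at the top level.
    for key in doc:
        if "." in key:
            problems.append(f"metadata key {key} found at top level; "
                            'expected it under "metadata"')

    metadata = doc.get("metadata")
    if isinstance(metadata, dict) and "rs.runs" in metadata:
        if not isinstance(metadata["rs.runs"], list):
            problems.append("metadata rs.runs should be a list, not a scalar")

    return problems
```

Run against the flat sidecar from example 1, a check like this would list the missing top-level fields and the misplaced dot-notation keys, rather than leaving the file rejected without explanation.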
(Gory details can be found in this Microsoft Copilot conversation: link )