@ryuwd
Contributor

@ryuwd ryuwd commented Jan 21, 2026

Summary

Adds ReplicaMap, a Pydantic model for mapping Logical File Names (LFNs) to their physical replicas across distributed storage elements.

Description

ReplicaMap provides a structured, validated representation of file replica information, intended to be stored as JSON. It serves as a more user-friendly replacement for the POOL XML Catalog.
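Because the map is meant to live in a JSON file, consuming it in a job can be plain dictionary access. The following is a hypothetical sketch: `resolve_lfn` and its `preferred_se` parameter are not part of diracx. It also assumes that the on-disk layout mirrors the `root` dict in this PR's example usage. The sketch loads the JSON and picks a replica URL for an LFN, preferring a given storage element when one is listed.

```python
import json
from typing import Optional

# Assumed on-disk layout, mirroring the ReplicaMap example in this PR:
# {lfn: {"replicas": [{"url": ..., "se": ...}], "size_bytes": ..., "checksum": {...}}}
EXAMPLE_JSON = """
{
  "/lhcb/MC/2024/file.dst": {
    "replicas": [
      {"url": "https://storage1.cern.ch/file.dst", "se": "CERN-DST"},
      {"url": "https://storage2.in2p3.fr/file.dst", "se": "IN2P3-DST"}
    ],
    "size_bytes": 1048576,
    "checksum": {"adler32": "788c5caa"}
  }
}
"""


def resolve_lfn(catalog: dict, lfn: str, preferred_se: Optional[str] = None) -> str:
    """Return a replica URL for `lfn` (hypothetical helper, not the diracx API)."""
    entry = catalog.get(lfn)
    if entry is None or not entry.get("replicas"):
        raise KeyError(f"No replicas known for {lfn!r}")
    replicas = entry["replicas"]
    if preferred_se is not None:
        for replica in replicas:
            if replica["se"] == preferred_se:
                return replica["url"]
    # No preference given, or preferred SE not listed: use the first replica.
    return replicas[0]["url"]


catalog = json.loads(EXAMPLE_JSON)
url = resolve_lfn(catalog, "/lhcb/MC/2024/file.dst", preferred_se="IN2P3-DST")
```

Falling back to the first listed replica is an arbitrary choice for the sketch; a real resolver would more likely rank replicas by site locality or protocol.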

Key features:

  • LFN/PFN validation: Automatic stripping of LFN: and PFN: prefixes, with path validation
  • Checksum support: Adler-32 (8 hex chars) and GUID (UUID format) checksums with format validation
  • Storage element tracking: Each replica is associated with a storage element identifier
  • Optional metadata: File size in bytes and checksum information
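The two checksum formats above can be checked with the standard library alone. This sketch paraphrases the listed rules and is not the PR's Pydantic validators. `validate_adler32`, `validate_guid` and `validate_checksum` are hypothetical names, and the `"guid"` key is an assumption, since only `"adler32"` appears in the example usage.

```python
import re
import uuid

# Assumed rules, paraphrased from the feature list: Adler-32 is exactly
# 8 hex characters, and a GUID must parse as a UUID.
_ADLER32_RE = re.compile(r"[0-9a-fA-F]{8}")


def validate_adler32(value: str) -> str:
    """Accept exactly 8 hexadecimal characters, otherwise raise ValueError."""
    if not _ADLER32_RE.fullmatch(value):
        raise ValueError(f"Adler-32 checksum must be 8 hex characters, got {value!r}")
    return value


def validate_guid(value: str) -> str:
    """Parse with uuid.UUID and return the canonical lowercase hyphenated form."""
    try:
        return str(uuid.UUID(value))
    except ValueError as exc:
        raise ValueError(f"GUID must be in UUID format, got {value!r}") from exc


def validate_checksum(checksum: dict) -> dict:
    """Validate a {"adler32": ..., "guid": ...} dict; either key may be absent."""
    validated = {}
    if "adler32" in checksum:
        validated["adler32"] = validate_adler32(checksum["adler32"])
    if "guid" in checksum:  # hypothetical key name
        validated["guid"] = validate_guid(checksum["guid"])
    return validated
```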

Example usage:

from diracx.core.models.replica_map import ReplicaMap

replica_map = ReplicaMap(root={
    "/lhcb/MC/2024/file.dst": {
        "replicas": [
            {"url": "https://storage1.cern.ch/file.dst", "se": "CERN-DST"},
            {"url": "https://storage2.in2p3.fr/file.dst", "se": "IN2P3-DST"},
        ],
        "size_bytes": 1048576,
        "checksum": {"adler32": "788c5caa"},
    }
})
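A job may need to fill in `size_bytes` and `checksum` for a file it has just written. The following is a sketch of one way to do that. `adler32_hex`, `file_adler32_hex` and `make_entry` are hypothetical helpers. The sketch also assumes lowercase, zero-padded hex, as in the example value `788c5caa`; the model's exact case handling is not stated here. The file is read in chunks, feeding the running value back into `zlib.adler32`, so memory use stays flat for large DST files.

```python
import zlib
from pathlib import Path


def adler32_hex(data: bytes) -> str:
    """Adler-32 of `data` as 8 lowercase, zero-padded hex characters."""
    return f"{zlib.adler32(data) & 0xFFFFFFFF:08x}"


def file_adler32_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Chunked Adler-32 of a file; 1 is the Adler-32 starting value."""
    checksum = 1
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            checksum = zlib.adler32(chunk, checksum)
    return f"{checksum & 0xFFFFFFFF:08x}"


def make_entry(path: Path, url: str, se: str) -> dict:
    """Hypothetical helper building one ReplicaMap-style entry for a local file."""
    return {
        "replicas": [{"url": url, "se": se}],
        "size_bytes": path.stat().st_size,
        "checksum": {"adler32": file_adler32_hex(path)},
    }
```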

@ryuwd ryuwd changed the title feat (core): added replica catalog to core feat (core): added replica catalog Jan 21, 2026
@read-the-docs-community

read-the-docs-community bot commented Jan 21, 2026

Documentation build overview

📚 diracx | 🛠️ Build #31176956 | 📁 Comparing 39bb470 against latest (a0b3053)


🔍 Preview build

Show files changed (1 files in total): 📝 1 modified | ➕ 0 added | ➖ 0 deleted
File Status
admin/explanations/configuration/index.html 📝 modified

@chrisburr
Member

I think we need to decide on the layout of core, as replica_catalogue feels too specific for the top level.

@aldbr
Contributor

aldbr commented Jan 22, 2026

I guess that, at some point (maybe now?), it would make sense to have a core/models directory containing one module per type of model (instead of having all our models in the same models.py module). Example:

  • core/models/:
    • auth.py
    • jobs.py
    • sandbox.py
    • metadata.py
    • search.py
    • and replica_catalog.py

Any opinion?

@fstagni
Contributor

fstagni commented Jan 22, 2026

It makes sense to me

@ryuwd ryuwd changed the title feat (core): added replica catalog feat (core): added replica catalog / refactor: diracx.core.models into a package Jan 22, 2026
@ryuwd ryuwd force-pushed the roneil-replica-catalog-json branch 3 times, most recently from 6a595bb to 4af54f4 January 22, 2026 13:07
@fstagni
Copy link
Contributor

fstagni commented Jan 22, 2026

I would split the refactoring in a separate PR

@ryuwd ryuwd force-pushed the roneil-replica-catalog-json branch from 4af54f4 to fd28f0f January 22, 2026 13:21
@ryuwd ryuwd changed the title feat (core): added replica catalog / refactor: diracx.core.models into a package feat (core): added replica catalog Jan 22, 2026
@ryuwd ryuwd force-pushed the roneil-replica-catalog-json branch from 04dd4d1 to 1ea1fb2 January 22, 2026 13:26
@ryuwd
Contributor Author

ryuwd commented Jan 22, 2026

Will rebase after #746

@aldbr aldbr linked an issue Jan 22, 2026 that may be closed by this pull request
@chrisburr chrisburr force-pushed the roneil-replica-catalog-json branch from 1ea1fb2 to 9506a40 January 23, 2026 07:59
@chaen
Contributor

chaen commented Jan 23, 2026

Historically, the DFC (before LFC) has always been referred to as the Replica Catalog. Do you think you could come up with an alternative name (reflecting, for example, that it's a local file)? Otherwise it's not dramatic :-)

@chrisburr
Member

My first thought is replica mapping?

@ryuwd
Contributor Author

ryuwd commented Jan 23, 2026

I'm happy with ReplicaMapping or ReplicaMap (map sounds nice)

test: add unit tests for ReplicaMap

feat: disallow relative paths with any depth other than zero
@ryuwd ryuwd force-pushed the roneil-replica-catalog-json branch from 9506a40 to 2b54d75 January 24, 2026 15:43
@ryuwd ryuwd changed the title feat (core): added replica catalog feat (core): added ReplicaMap Jan 24, 2026
if "/" in value:
    raise ValueError(
        "LFN must be an absolute path starting with '/' "
        "or have no slashes at all (e.g. refers to a file in the current working directory)."
    )
Contributor

What's the use case for no slash at all?
Even if we were to support the working directory (which sounds like a bad idea at first sight), that would be more of a URL thing than an LFN.

Contributor Author

It deals with a Gaudi behaviour: #741 (comment)

Contributor

Sorry, I had missed it! It makes sense.

Member

My guess is multiple steps in a job, e.g. HLT1 needs to be given a mapping that includes the output of Boole from earlier in the job.
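The following is a stdlib-only sketch of the LFN rule discussed in this thread; it is not the merged validator. It strips an optional `LFN:` prefix and then accepts either an absolute path or a bare file name with no slashes. The bare-name case covers a later step reading a file that an earlier step wrote to the working directory. `normalize_lfn` is a hypothetical name, and rejecting the empty string is an added assumption.

```python
def normalize_lfn(value: str) -> str:
    """Hypothetical LFN normaliser: absolute path, or bare name with zero slashes."""
    value = value.removeprefix("LFN:")
    if not value:
        # Assumption: an empty LFN is never meaningful.
        raise ValueError("LFN must not be empty")
    if value.startswith("/"):
        return value
    if "/" in value:
        # Relative paths with any directory depth are rejected.
        raise ValueError(
            "LFN must be an absolute path starting with '/' "
            "or have no slashes at all (i.e. a file in the current working directory)."
        )
    return value
```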

root={
    "/lhcb/MC/2024/file.dst": {
        "replicas": [
            {"url": "https://storage1.cern.ch/file.dst", "se": "CERN-DST"},
Contributor

For local files, I wonder if we should enforce the file:// URI scheme.

Member

I think that's a good idea. Plus it's easier to add than remove if we change our mind.

Contributor Author

Done!
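The following sketches what enforcing the `file://` scheme for local replicas could look like. It assumes the rule is simply "every replica URL must carry a scheme", so a bare path is rejected. `to_file_url` and `validate_replica_url` are hypothetical helpers, not the code that was merged. `urllib.parse.quote` percent-encodes characters such as spaces while leaving `/` intact.

```python
from urllib.parse import quote, urlparse


def to_file_url(local_path: str) -> str:
    """Turn an absolute POSIX path into a file:// URL (hypothetical helper)."""
    if not local_path.startswith("/"):
        raise ValueError(f"Expected an absolute local path, got {local_path!r}")
    # "file://" + "/tmp/x" yields the three-slash form file:///tmp/x
    return "file://" + quote(local_path)


def validate_replica_url(url: str) -> str:
    """Reject scheme-less URLs so local replicas must be spelled as file:// URLs."""
    if not urlparse(url).scheme:
        raise ValueError(f"Replica URL {url!r} has no scheme; use file:// for local files")
    return url
```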

@aldbr aldbr merged commit 82e7d15 into DIRACGrid:main Jan 29, 2026
27 checks passed

Development

Successfully merging this pull request may close these issues.

feat: Implement a "JSON catalog" input data resolution format

5 participants