
[DAPS-1857] python client schema support 2#1895

Merged: JoshuaSBrown merged 36 commits into devel from 1857-DAPS-python-client-schema-support_2 on Mar 24, 2026

@JoshuaSBrown (Collaborator) commented Mar 19, 2026

Description

This PR attempts to address concerns raised during code review.

Summary by Sourcery

Add full schema lifecycle support and validation to the Python client, CLI, and backend, including versioned schema IDs and metadata validation, with comprehensive end-to-end tests.

New Features:

  • Expose schema CRUD, search, and metadata validation operations in the Python CommandLib API.
  • Add schema management commands (create, revise, update, delete, search, view, validate) to the CLI.
  • Introduce client-side JSON validation helpers for schema definitions and metadata.
  • Return created and revised schema details from the server instead of simple acknowledgements.
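
As an illustration of the client-side JSON validation helper listed above, here is a minimal sketch. The function name `validate_json`, the `ValueError` exception type, and the message wording are assumptions for this example; the PR only names a private `_validate_json(json_str, label)` helper and does not show its body here.

```python
import json


def validate_json(json_str, label):
    """Raise ValueError if json_str cannot be parsed as JSON.

    Hypothetical stand-in for the private _validate_json(json_str, label)
    helper; the real helper's exception type and message are not shown
    in this PR page.
    """
    try:
        json.loads(json_str)
    except (TypeError, json.JSONDecodeError) as exc:
        # TypeError covers non-string input such as None.
        raise ValueError(f"{label} is not valid JSON: {exc}") from exc


# A well-formed definition passes silently.
validate_json('{"type": "object"}', "Schema definition")
```

Validating before calling the server keeps malformed input from costing a round trip, and the `label` argument lets one helper report errors for both schema definitions and metadata.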

Bug Fixes:

  • Correct schema router handling of versioned schema IDs and ensure IDs returned from create/view include version suffixes.
  • Fix schema deletion semantics to prevent deleting in-use or intermediate revisions and to run checks atomically in a transaction.
  • Align error messages and permission checks in schema delete/update paths.
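
The deletion rule described above (no deleting in-use or intermediate revisions) can be pictured with the following sketch. Everything in it is an assumption made for illustration: the `SchemaRevision` class, its field meanings, and the reason strings are hypothetical, and the real checks run inside an ArangoDB transaction in `schema_router.js`, which this example does not model.

```python
from dataclasses import dataclass


@dataclass
class SchemaRevision:
    """Hypothetical in-memory view of one stored schema revision."""
    id: str
    ver: int
    cnt: int = 0             # records using this revision (assumed meaning)
    used_by: int = 0         # other schemas referencing it (assumed meaning)
    has_older: bool = False  # an earlier revision exists
    has_newer: bool = False  # a later revision exists


def deletion_blocker(rev):
    """Return a reason string if deletion should be refused, else None."""
    if rev.cnt > 0 or rev.used_by > 0:
        return "schema is in use"
    if rev.has_older and rev.has_newer:
        # Removing a middle revision would break the revision chain.
        return "cannot delete an intermediate revision"
    return None
```

Running the checks and the removal in one transaction, as the PR does, means the answer cannot go stale between the check and the delete.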

Enhancements:

  • Refine backend schema API to parse and normalize schema IDs consistently, including separating ID and version components.
  • Improve logging around schema view/delete failures with richer context.
  • Document and work around protobuf 'def' field naming conflict in the Python client.
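
The protobuf `def` conflict arises because `def` is a reserved word in Python, so `msg.def` cannot be written as ordinary attribute access. The sketch below shows the `getattr`/`setattr` workaround on a stand-in object; `SimpleNamespace` is used here only so the example runs without the generated DataFed protobuf modules.

```python
import keyword
from types import SimpleNamespace

# Stand-in for a protobuf message such as SchemaCreateRequest; a real
# message would come from the generated DataFed protobuf modules.
msg = SimpleNamespace(id="my_schema")

# "def" is a reserved word, so `msg.def = ...` does not even parse.
assert keyword.iskeyword("def")

# Access by attribute name works because the name is passed as a string.
setattr(msg, "def", '{"type": "object"}')
definition = getattr(msg, "def")
```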

Tests:

  • Add extensive end-to-end tests for schema CRUD, search, visibility flags, and metadata validation, including error cases and file-based definitions.
  • Extend record end-to-end tests to cover schema-enforced record creation and updates.
  • Register new end-to-end schema test suite in CTest with login fixture dependencies.

JoshuaSBrown and others added 30 commits February 19, 2026 15:46
…RNL/DataFed into 1857-DAPS-python-client-schema-support
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
sourcery-ai bot (Contributor) commented Mar 19, 2026

Reviewer's Guide

Adds full schema management support across the Python client, CLI, server, and Foxx schema router, including JSON validation helpers, schema/metadata operations, stricter schema ID/version handling, richer replies for schema create/revise, and comprehensive end-to-end tests for schemas and schema-enforced records.

Sequence diagram for schema creation via CLI and server

sequenceDiagram
    actor User
    participant CLI as CLI_schema_commands
    participant CmdLib as CommandLib
    participant MAPI as MessageAPI
    participant Server as ClientWorker
    participant DBAPI as DatabaseAPI
    participant Foxx as schema_router
    participant DB as ArangoDB

    User->>CLI: datafed schema create ID --definition-file file.json
    CLI->>CmdLib: schemaCreate(schema_id, definition=None, definition_file, ...)
    CmdLib->>CmdLib: _load_schema_file()
    CmdLib->>CmdLib: _validate_json(definition, Schema definition)
    CmdLib->>MAPI: sendRecv(SchemaCreateRequest)
    MAPI->>Server: SchemaCreateRequest

    Server->>Server: procSchemaCreateRequest()
    Server->>DBAPI: schemaCreate(request, SchemaDataReply)

    DBAPI->>DBAPI: build JSON payload (id, def, desc, pub, sys)
    DBAPI->>Foxx: POST /schema/create

    Foxx->>Foxx: parseSchemaId(req.body.id)
    Foxx->>Foxx: stripSchemaIdVersion(obj) before save
    Foxx->>DB: g_db.sch.save(obj)
    DB-->>Foxx: stored schema (id, ver)
    Foxx->>Foxx: set sch.id = parsed.id:parsed.ver
    Foxx-->>DBAPI: JSON schema array

    DBAPI->>DBAPI: setSchemaDataReply(reply, result)
    DBAPI-->>Server: SchemaDataReply
    Server-->>MAPI: SchemaDataReply
    MAPI-->>CmdLib: SchemaDataReply
    CmdLib-->>CLI: SchemaDataReply
    CLI->>CLI: _generic_reply_handler(reply, _print_schema)
    CLI-->>User: Printed schema with definition and refs
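
The first client-side steps in the sequence above (pick a definition source, load the file, validate the JSON) might be sketched as follows. The function name `resolve_definition` and the exact error messages are assumptions; the end-to-end tests quoted later in the review only indicate that supplying neither source produces a message containing "Must specify" and that supplying both is rejected.

```python
import json
from pathlib import Path


def resolve_definition(definition=None, definition_file=None):
    """Return the schema definition as a JSON string from exactly one source.

    Hypothetical helper combining the roles of _load_schema_file and
    _validate_json from the diagram; not the actual CommandLib code.
    """
    if definition is not None and definition_file is not None:
        raise ValueError("Cannot specify both definition and definition_file.")
    if definition is None and definition_file is None:
        raise ValueError("Must specify either definition or definition_file.")
    if definition_file is not None:
        # A missing or unreadable file raises OSError here.
        definition = Path(definition_file).read_text(encoding="utf-8")
    # json.JSONDecodeError is a subclass of ValueError.
    json.loads(definition)
    return definition
```

Only after this step succeeds would the client build a `SchemaCreateRequest` and hand it to `sendRecv`.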

Class diagram for updated Python client and C++ server schema APIs

classDiagram
    class CommandLib {
        +schemaCreate(schema_id, definition, definition_file, description, public, system)
        +schemaRevise(schema_id, definition, definition_file, description, public, system)
        +schemaUpdate(schema_id, new_id, definition, definition_file, description, public, system)
        +schemaView(schema_id, resolve)
        +schemaSearch(schema_id, text, owner, sort, sort_rev, offset, count)
        +schemaDelete(schema_id)
        +metadataValidate(schema_id, metadata, metadata_file)
        -_load_schema_file(filepath)
        -_validate_json(json_str, label)
        -_mapi
    }

    class MessageAPI {
        +sendRecv(message)
    }

    CommandLib --> MessageAPI : uses

    class ClientWorker {
        +procSchemaCreateRequest(a_uid, msg_request, log_context)
        +procSchemaReviseRequest(a_uid, msg_request, log_context)
        -m_db_client
    }

    class DatabaseAPI {
        +schemaCreate(request, reply, log_context)
        +schemaRevise(request, reply, log_context)
        +schemaView(a_id, result, log_context)
        +schemaUpdate(request, log_context)
    }

    ClientWorker --> DatabaseAPI : uses

    class SchemaCreateRequest {
        +string id
        +string def
        +string desc
        +bool pub
        +bool sys
    }

    class SchemaReviseRequest {
        +string id
        +string def
        +string desc
        +bool pub
        +bool sys
    }

    class SchemaUpdateRequest {
        +string id
        +string id_new
        +string def
        +string desc
        +bool pub
        +bool sys
    }

    class SchemaDataReply {
        +repeated SchemaData schema
    }

    class SchemaData {
        +string id
        +int ver
        +string own_id
        +string own_nm
        +bool pub
        +bool depr
        +int cnt
        +string desc
        +string def
        +repeated SchemaRef uses
        +repeated SchemaRef used_by
    }

    class SchemaRef {
        +string id
        +int ver
    }

    ClientWorker --> SchemaCreateRequest
    ClientWorker --> SchemaReviseRequest
    ClientWorker --> SchemaUpdateRequest
    ClientWorker --> SchemaDataReply
    DatabaseAPI --> SchemaCreateRequest
    DatabaseAPI --> SchemaReviseRequest
    DatabaseAPI --> SchemaUpdateRequest
    DatabaseAPI --> SchemaDataReply
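
To make the reply shape in the class diagram concrete, here is a small sketch that mirrors a subset of `SchemaData` and `SchemaRef` as plain dataclasses and formats one schema for display. The dataclasses and `format_schema` are hypothetical stand-ins, not the generated protobuf classes or the CLI's `_print_schema`; the `def` field is omitted because `def` cannot be used as a Python field name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaRef:
    id: str
    ver: int


@dataclass
class SchemaData:
    # Subset of the fields shown in the class diagram.
    id: str
    ver: int
    desc: str = ""
    uses: List[SchemaRef] = field(default_factory=list)
    used_by: List[SchemaRef] = field(default_factory=list)


def format_schema(s):
    """Render one schema entry as a few human-readable lines."""
    lines = [f"ID: {s.id}", f"Version: {s.ver}"]
    if s.desc:
        lines.append(f"Description: {s.desc}")
    for label, refs in (("Uses", s.uses), ("Used by", s.used_by)):
        if refs:
            lines.append(f"{label}: " + ", ".join(f"{r.id}:{r.ver}" for r in refs))
    return "\n".join(lines)
```

Because create and revise now return a `SchemaDataReply` with a repeated `schema` field, a caller would iterate over the returned entries rather than treat the reply as a bare acknowledgement.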

File-Level Changes

Change Details Files
Introduce Python client schema and metadata validation APIs with JSON validation helpers and file loading utilities.
  • Add schemaCreate, schemaRevise, schemaUpdate, schemaView, schemaSearch, schemaDelete, and metadataValidate methods to the Python CommandLib API, including client-side JSON validation and file-based definition/metadata loading.
  • Implement _load_schema_file helper to read schema JSON from disk and _validate_json helper to validate JSON strings before sending to the server.
  • Handle protobuf fields named 'def' via getattr/setattr to work around Python reserved keyword limitations.
python/datafed_pkg/datafed/CommandLib.py
Add CLI commands for schema lifecycle and metadata validation, with formatted schema output helpers.
  • Register a new top-level 'schema' command group with subcommands for view, create, revise, update, delete, search, and validate, mapping options to the new Python API methods and enforcing mutually exclusive/required options.
  • Implement _print_schema_listing, _print_schema, and _print_metadata_validate to render schema listings, detailed schema info including JSON definitions, and validation results respecting output mode and verbosity.
python/datafed_pkg/datafed/CLI.py
Tighten Foxx schema router ID/version parsing and make schema CRUD responses consistently return versioned IDs.
  • Add parseSchemaId helper to validate and split schema IDs into base ID and version, ensuring at most one colon and an integer version, and stripSchemaIdVersion to remove a composite version suffix from stored IDs.
  • Update schema create, revise, update, delete, and view routes to use parseSchemaId, stripSchemaIdVersion, and to set response schema.id fields to 'id:ver' while storing bare IDs and numeric versions in the database.
  • Wrap schema delete in an Arango transaction, keeping permission, usage, and revision constraints the same but making the operation atomic and improving error logging with extra error details.
core/database/foxx/api/schema_router.js
Change server schema create/revise to return SchemaDataReply populated from Foxx responses instead of a bare AckReply.
  • Update ClientWorker to process SchemaCreateRequest and SchemaReviseRequest as SchemaDataReply messages and pass the reply object through to DatabaseAPI.
  • Extend DatabaseAPI::schemaCreate and schemaRevise to accept a SchemaDataReply, call the Foxx endpoints, and populate the reply via setSchemaDataReply.
  • Adjust server-side tests to expect versioned schema IDs in responses from the schema router.
core/server/ClientWorker.cpp
core/server/DatabaseAPI.cpp
core/server/DatabaseAPI.hpp
core/database/foxx/tests/schema_router.test.js
Add end-to-end tests for schema CRUD, search, metadata validation, and schema-enforced record operations, and wire them into CTest.
  • Introduce test_api_schema.py to cover Python API schemaCreate/view/delete, update, revise, search, public flag, metadataValidate (success/failure, JSON and file error handling), and file-based schema creation, including explicit clean-up.
  • Extend record CRUD end-to-end tests to cover dataCreate/dataUpdate with schema_enforce on/off and pre-validation via metadataValidate, asserting expected failures and md_err_msg behavior.
  • Register the new end-to-end schema test in CMake, ensuring it runs under the login fixture like the other API tests.
tests/end-to-end/test_api_schema.py
tests/end-to-end/test_api_record.py
tests/end-to-end/CMakeLists.txt

Possibly linked issues

  • #[Feature] - Support schema CRUD from the users python client.: The PR implements schema create/view/update/delete and validation in the Python client and CLI as requested.

@JoshuaSBrown JoshuaSBrown self-assigned this Mar 19, 2026
@sourcery-ai sourcery-ai bot (Contributor) left a comment

Hey - I've found 4 issues, and left some high-level feedback:

  • The new helper _load_schema_file in CommandLib.py is indented one level too deep (docstring and body are over-indented relative to the method definition), which will cause a syntax/indentation error; please fix the indentation so the body aligns under the def.
  • In test_api_schema.py, test_schema_create_both_definition_sources is declared twice in succession, which is invalid Python syntax and will prevent the test module from importing; remove the duplicate def line.
  • In schema_router.js the create route sets sch.id = parsed.id + ':' + parsed.ver, but the updated unit test expects id to include the actual stored version (e.g., ...:1); consider using sch.ver instead of parsed.ver when composing the returned id so the ID and version are consistent and match the tests.

## Individual Comments

### Comment 1
<location path="core/database/foxx/api/schema_router.js" line_range="53-62" />
<code_context>
+function parseSchemaId(schId) {
</code_context>
<issue_to_address>
**issue (bug_risk):** parseSchemaId return value contradicts its own doc and downstream checks, and likely never returns `null` for `ver`.

JSDoc documents `ver` as `null` when no version suffix is present, but `parseSchemaId` returns `{ id: schId, ver: 0 }` when there is no colon. Call sites then check `parsed.ver === null` (e.g., revise/update/delete/view routes) to detect a missing version, which can never be true. Please align the implementation with the documented contract by returning `{ id: schId, ver: null }` when `colonCount === 0`, and update the create handler to interpret `null` as “no version provided” (enforcing `0` where appropriate). Right now the implementation contradicts the docstring and makes the `=== null` checks effectively dead code.
</issue_to_address>
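
The contract requested in this comment can be sketched in Python as follows. This is an illustrative port, not the JavaScript in `schema_router.js`: the name `parse_schema_id`, the `ParsedSchemaId` type, the error messages, and the restriction to non-negative decimal versions are all assumptions.

```python
from typing import NamedTuple, Optional


class ParsedSchemaId(NamedTuple):
    id: str
    ver: Optional[int]  # None means no version suffix was supplied


def parse_schema_id(sch_id):
    """Split 'base' or 'base:ver' into its parts, validating the suffix."""
    base, sep, ver_text = sch_id.partition(":")
    if not sep:
        # No colon: report the version as absent rather than as 0.
        return ParsedSchemaId(sch_id, None)
    if ":" in ver_text:
        raise ValueError(f"Schema ID '{sch_id}' has more than one colon")
    if not (ver_text.isascii() and ver_text.isdigit()):
        raise ValueError(f"Schema ID '{sch_id}' has a non-integer version")
    return ParsedSchemaId(base, int(ver_text))
```

With this shape, a check such as `parsed.ver is None` becomes meaningful in the revise, update, delete, and view paths, and the create path can decide explicitly how to treat a missing version.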

### Comment 2
<location path="python/datafed_pkg/datafed/CommandLib.py" line_range="366-367" />
<code_context>
+
+        Returns
+        -------
+        msg : AckReply Google protobuf message
+            Response from DataFed
+
+        Raises
</code_context>
<issue_to_address>
**issue (bug_risk):** The documented reply type for schemaCreate/schemaRevise no longer matches the server-side implementation.

These docstrings still advertise an `AckReply`, but the server handlers now return `SchemaDataReply` populated via `DatabaseAPI::schemaCreate/schemaRevise`. Please update the documented return type, and double‑check all callers (e.g., CLI printing functions) to ensure they correctly handle `SchemaDataReply` or adjust the server to preserve the expected `AckReply` contract.
</issue_to_address>

### Comment 3
<location path="tests/end-to-end/test_api_schema.py" line_range="133" />
<code_context>
+
+        self.assertIn("Must specify", str(ctx.exception))
+
+    def test_schema_create_both_definition_sources(self):
+    def test_schema_create_both_definition_sources(self):
+        """Cannot specify both definition and definition_file."""
</code_context>
<issue_to_address>
**issue (bug_risk):** Duplicate `test_schema_create_both_definition_sources` definition will cause a syntax error and prevent the test module from running.

Only the second `def` has a body, so the first is a stray line. Remove the extra `def test_schema_create_both_definition_sources(self):` so the function is defined once, immediately followed by its docstring and body.
</issue_to_address>

### Comment 4
<location path="tests/end-to-end/test_api_schema.py" line_range="467-470" />
<code_context>
+    suite.addTest(TestDataFedPythonAPISchemaCRUD("test_schema_search"))
+    suite.addTest(TestDataFedPythonAPISchemaCRUD("test_schema_public_flag"))
+    suite.addTest(TestDataFedPythonAPISchemaCRUD("test_metadata_validate_pass"))
+    suite.addTest(TestDataFedPythonAPISchemaCRUD("test_metadata_validate_fail"))
+    suite.addTest(TestDataFedPythonAPISchemaCRUD("test_metadata_validate_client_rejects_bad_json"))
+    suite.addTest(TestDataFedPythonAPISchemaCRUD("test_metadata_validate_requires_input"))
+    suite.addTest(TestDataFedPythonAPISchemaCRUD("test_schema_create_from_file"))
+    runner = unittest.TextTestRunner()
+    result = runner.run(suite)
</code_context>
<issue_to_address>
**issue (testing):** The `test_metadata_validate_metadata_file_cannot_be_opened` test is defined but never added to the explicit test suite.

Because the `unittest.TestSuite` is built manually, `test_metadata_validate_metadata_file_cannot_be_opened` will never run unless it’s added to the suite. Please include it, e.g.:

```python
suite.addTest(TestDataFedPythonAPISchemaCRUD("test_metadata_validate_metadata_file_cannot_be_opened"))
```
</issue_to_address>
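
An alternative to listing every test by hand is to let `unittest` discover the test methods, which avoids this class of omission. The sketch below uses a hypothetical `ExampleSchemaTests` class in place of `TestDataFedPythonAPISchemaCRUD`; `build_suite` is likewise an illustrative name.

```python
import unittest


class ExampleSchemaTests(unittest.TestCase):
    """Hypothetical stand-in for TestDataFedPythonAPISchemaCRUD."""

    def test_one(self):
        self.assertTrue(True)

    def test_two(self):
        self.assertEqual(1 + 1, 2)


def build_suite(case):
    # Collects every method whose name starts with "test" on the class.
    return unittest.TestLoader().loadTestsFromTestCase(case)


suite = build_suite(ExampleSchemaTests)
# To run it: unittest.TextTestRunner().run(suite)
```

One caveat: the loader orders tests by method name, so if the explicit `addTest` list is meant to enforce a particular execution order, automatic loading would not preserve it.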


@JoshuaSBrown JoshuaSBrown added the Component: Python API, Type: Test, and Priority: Low labels on Mar 19, 2026
@JoshuaSBrown JoshuaSBrown changed the title from "1857 daps python client schema support 2" to "[DAPS-1857] python client schema support 2" on Mar 23, 2026
@JoshuaSBrown JoshuaSBrown linked an issue Mar 23, 2026 that may be closed by this pull request
@JoshuaSBrown JoshuaSBrown merged commit c615fc7 into devel Mar 24, 2026
13 checks passed
@JoshuaSBrown JoshuaSBrown deleted the 1857-DAPS-python-client-schema-support_2 branch March 24, 2026 11:33

Labels

Component: Python API, Priority: Low, Type: Test

Development

Successfully merging this pull request may close these issues.

[Feature] - Support schema CRUD from the users python client.

2 participants