feature: support schema functions, to hit feature parity with web ser… (#1859)

JoshuaSBrown merged 29 commits into devel from 1857-DAPS-python-client-schema-support
Conversation
Reviewer's Guide

Adds full client and CLI support for managing JSON-based metadata schemas in DataFed, including CRUD operations, search, validation utilities, and comprehensive end-to-end tests for schema workflows and schema-enforced records.

Sequence diagram for schema create via CLI

sequenceDiagram
actor User
participant CLI as CLI_schemaCommands
participant CommandLib as CommandLib
participant MAPI as MessageAPI
participant Server as DataFedServer
User->>CLI: datafed schema create ID --definition/--definition-file
CLI->>CLI: validate options (mutual exclusivity, required)
CLI->>CommandLib: schemaCreate(schema_id, definition, definition_file, description, public, system)
alt definition_file provided
CommandLib->>CommandLib: _load_schema_file(definition_file)
CommandLib-->>CommandLib: definition string
end
CommandLib->>CommandLib: _validate_json(definition, "Schema definition")
CommandLib->>MAPI: sendRecv(SchemaCreateRequest)
Note over CommandLib,MAPI: CommandLib sets SchemaCreateRequest.id and def fields
MAPI->>Server: SchemaCreateRequest
Server-->>MAPI: AckReply
MAPI-->>CommandLib: AckReply
CommandLib-->>CLI: AckReply
CLI->>CLI: _generic_reply_handler(_print_ack_reply)
CLI-->>User: display success or error
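The client-side checks in this flow (mutual exclusivity of the two definition sources, then JSON validation before anything is sent) can be sketched roughly as follows. The helper name and error strings here are illustrative, not the actual CommandLib code:

```python
import json

def prepare_schema_definition(definition=None, definition_file=None):
    # Exactly one source must be supplied (hypothetical helper, not CommandLib itself)
    if definition and definition_file:
        raise Exception("Cannot specify both definition and definition_file.")
    if not definition and not definition_file:
        raise Exception("Must specify either definition or definition_file.")
    # Load from file when a path was given
    if definition_file:
        with open(definition_file, "r") as f:
            definition = f.read()
    # Validate locally before the request goes to the server
    try:
        json.loads(definition)
    except json.JSONDecodeError as e:
        raise Exception("Schema definition is not valid JSON: {}".format(e))
    return definition
```

Failing fast on the client keeps obviously malformed definitions from ever reaching the server round-trip shown in the diagram.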
Sequence diagram for metadata validation via CLI

sequenceDiagram
actor User
participant CLI as CLI_schemaCommands
participant CommandLib as CommandLib
participant MAPI as MessageAPI
participant Server as DataFedServer
User->>CLI: datafed schema validate SCHEMA_ID --metadata/--metadata-file
CLI->>CLI: validate options (mutual exclusivity, required)
CLI->>CommandLib: metadataValidate(schema_id, metadata, metadata_file)
alt metadata_file provided
CommandLib->>CommandLib: open(metadata_file), read()
CommandLib-->>CommandLib: metadata string
end
CommandLib->>CommandLib: _validate_json(metadata, "Metadata")
CommandLib->>MAPI: sendRecv(MetadataValidateRequest)
MAPI->>Server: MetadataValidateRequest
Server-->>MAPI: MetadataValidateReply(errors)
MAPI-->>CommandLib: MetadataValidateReply
CommandLib-->>CLI: MetadataValidateReply
CLI->>CLI: _generic_reply_handler(_print_metadata_validate)
CLI-->>User: show validation passed or errors
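The tail of this flow is simple on the client: MetadataValidateReply carries an `errors` field, and an empty list means the metadata passed. A rough stand-in for the `_print_metadata_validate` handler (names and wording are illustrative):

```python
def summarize_validation(errors):
    # Empty/absent errors list -> the metadata conformed to the schema
    if not errors:
        return "Metadata is valid."
    # Otherwise list each validation error on its own line
    return "Validation failed:\n" + "\n".join("  - " + str(e) for e in errors)
```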
Class diagram for new schema-related API and CLI structures

classDiagram
class CommandLib {
+schemaCreate(schema_id, definition, definition_file, description, public, system)
+schemaRevise(schema_id, definition, definition_file, description, public, system)
+schemaUpdate(schema_id, new_id, definition, definition_file, description, public, system)
+schemaView(schema_id, resolve)
+schemaSearch(schema_id, text, owner, sort, sort_rev, offset, count)
+schemaDelete(schema_id)
+metadataValidate(schema_id, metadata, metadata_file)
-_load_schema_file(filepath)
-_validate_json(json_str, label)
-_mapi
}
class CLICommands {
+_schema()
+_schemaView(schema_id, resolve)
+_schemaCreate(schema_id, definition, definition_file, description, public, system)
+_schemaRevise(schema_id, definition, definition_file, description, public, system)
+_schemaUpdate(schema_id, new_id, definition, definition_file, description, public, system)
+_schemaDelete(schema_id, force)
+_schemaSearch(schema_id, text, owner, sort, sort_rev, offset, count)
+_schemaValidate(schema_id, metadata, metadata_file)
-_capi
-_generic_reply_handler(reply_handler)
-_print_schema_listing(message)
-_print_schema(message)
-_print_metadata_validate(message)
}
class SchemaCreateRequest {
+id
+def
+desc
+pub
+sys
}
class SchemaReviseRequest {
+id
+def
+desc
+pub
+sys
}
class SchemaUpdateRequest {
+id
+id_new
+def
+desc
+pub
+sys
}
class SchemaViewRequest {
+id
+resolve
}
class SchemaSearchRequest {
+id
+text
+owner
+sort
+sort_rev
+offset
+count
}
class SchemaDeleteRequest {
+id
}
class MetadataValidateRequest {
+sch_id
+metadata
}
class SchemaDataReply {
+schema
+total
+offset
+count
}
class MetadataValidateReply {
+errors
}
CLICommands --> CommandLib : uses
CommandLib --> SchemaCreateRequest : creates
CommandLib --> SchemaReviseRequest : creates
CommandLib --> SchemaUpdateRequest : creates
CommandLib --> SchemaViewRequest : creates
CommandLib --> SchemaSearchRequest : creates
CommandLib --> SchemaDeleteRequest : creates
CommandLib --> MetadataValidateRequest : creates
SchemaViewRequest --> SchemaDataReply : response
SchemaSearchRequest --> SchemaDataReply : response
MetadataValidateRequest --> MetadataValidateReply : response
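As the diagram shows, CommandLib's main job is to populate the request message fields and hand the message to sendRecv. With a plain dict standing in for the protobuf class (field names taken from the diagram; optional fields omitted when unset), that step looks roughly like:

```python
def build_schema_create_request(schema_id, definition,
                                description=None, public=False, system=False):
    # 'id' and 'def' are always set; optional fields only when provided,
    # so unset options are simply absent from the wire message
    msg = {"id": schema_id, "def": definition}
    if description:
        msg["desc"] = description
    if public:
        msg["pub"] = True
    if system:
        msg["sys"] = True
    return msg
```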
File-Level Changes
Possibly linked issues
Hey - I've found 6 issues, and left some high level feedback:
- In `CommandLib`, `_load_schema_file` is defined at module scope but is called as `self._load_schema_file(...)` from instance methods, so it should be indented into the class (like `_validate_json`) or made a true instance method to avoid attribute errors.
- In `test_api_record.py`, `test_record_with_schema_enforcement` is defined at top level (no class indentation) but is added as a method of `TestDataFedPythonAPIRecordCRUD` in the test suite, which will cause the suite construction to fail; the method definition needs to be indented into the class.
- The CLI `schema revise` and `schema update` commands use `--public`/`--system` as simple flags mapped to `public if public else None`, which makes it impossible to explicitly clear a previously-true flag; consider separate `--public/--no-public` (and analogous system flags) or a tri-state option to allow toggling both ways.
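One way to get the tri-state flag behavior described above is the pattern argparse implements in the standard library with `BooleanOptionalAction`: it generates the paired `--public/--no-public` form automatically, and a default of `None` marks "not specified". This is only an illustration of the pattern with stdlib tooling, not the DataFed CLI's own option framework:

```python
import argparse

parser = argparse.ArgumentParser()
# default=None is the third state: "not given, leave the server-side value alone"
parser.add_argument("--public", action=argparse.BooleanOptionalAction, default=None)

print(parser.parse_args([]).public)               # None  -> omit field from request
print(parser.parse_args(["--public"]).public)     # True  -> set flag
print(parser.parse_args(["--no-public"]).public)  # False -> explicitly clear flag
```

With `None` meaning "unchanged", the client can skip the field entirely when building the update request, so a previously-true flag is only cleared when the user asks for it.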
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `python/datafed_pkg/datafed/CommandLib.py:3167` </location>
<code_context>
return opts
+
+def _load_schema_file(self, filepath):
+ """
+ Read a schema definition from a local JSON file
</code_context>
<issue_to_address>
**issue (bug_risk):** Make `_load_schema_file` a proper instance method on the class instead of a top-level function taking `self`.
Because `_load_schema_file` is defined at module scope but takes `self`, calls like `self._load_schema_file(...)` will fail with `AttributeError` since it’s not actually a bound method. Move it into the class body next to `_validate_json`, or remove the `self` parameter and call it as a standalone helper without `self.`.
</issue_to_address>
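The failure mode is easy to reproduce in isolation: a function defined at module scope never becomes an attribute of the class, even if its first parameter is named `self`. A minimal illustration with toy names (not the DataFed code):

```python
class Client:
    def _validate_json(self, text):
        # properly indented into the class body, so it becomes a bound method
        return text.strip()

# BUG pattern: defined at module scope -- Python attaches this to the module,
# not to Client, so calls like self._load_schema_file(...) raise AttributeError
def _load_schema_file(self, filepath):
    with open(filepath, "r") as f:
        return f.read()

c = Client()
print(hasattr(c, "_validate_json"))    # True: it's a method of Client
print(hasattr(c, "_load_schema_file")) # False: the module-level def is invisible here
```

Indentation, not the `self` parameter name, is what determines method membership in Python.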
### Comment 2
<location> `python/datafed_pkg/datafed/CommandLib.py:683-687` </location>
<code_context>
+ if not metadata and not metadata_file:
+ raise Exception("Must specify either metadata or metadata_file.")
+
+ if metadata_file:
+ try:
+ f = open(metadata_file, "r")
+ metadata = f.read()
+ f.close()
+ except BaseException:
+ raise Exception(
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Reuse the schema file loader and narrow the exception type when reading metadata files.
`metadataValidate` reimplements `_load_schema_file` and uses manual `open`/`close` plus `except BaseException`. Delegate to `_load_schema_file` (after making it a method), use a `with open(...)` context manager, and catch `OSError`/`IOError` instead. That removes duplication and avoids swallowing non-I/O exceptions like `KeyboardInterrupt` or `SystemExit`.
</issue_to_address>
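The suggested shape for the file read, using a context manager and a narrow exception class, would look roughly like this (a sketch only; the exact error text in CommandLib may differ):

```python
def read_metadata_file(path):
    # 'with' guarantees the handle is closed even if read() fails;
    # OSError covers missing files, permission errors, etc. without
    # swallowing KeyboardInterrupt or SystemExit the way BaseException would
    try:
        with open(path, "r") as f:
            return f.read()
    except OSError as e:
        raise Exception("Could not open metadata file: {}".format(path)) from e
```

Chaining with `from e` also preserves the original OS-level cause in the traceback instead of discarding it.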
### Comment 3
<location> `tests/end-to-end/test_api_record.py:266-275` </location>
<code_context>
+def test_record_with_schema_enforcement(self):
</code_context>
<issue_to_address>
**issue (testing):** New schema-enforcement test is not defined as a method on the test class, so it will not run correctly.
This test is defined at module scope but expects `self` and is referenced as a method of `TestDataFedPythonAPIRecordCRUD`. In this state, `unittest` will raise `AttributeError: type object 'TestDataFedPythonAPIRecordCRUD' has no attribute 'test_record_with_schema_enforcement'`, so it never runs. Please indent it into the appropriate test class and align its body indentation with the existing methods.
</issue_to_address>
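For reference, the structural fix is purely indentation: the test must live in the class body so unittest can resolve it by name when the suite is built. A toy reconstruction (placeholder body, not the real test):

```python
import unittest

class TestDataFedPythonAPIRecordCRUD(unittest.TestCase):
    # indented into the class, so addTest(...) can resolve it by name
    def test_record_with_schema_enforcement(self):
        self.assertTrue(True)  # placeholder body

suite = unittest.TestSuite()
suite.addTest(TestDataFedPythonAPIRecordCRUD("test_record_with_schema_enforcement"))
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Had the `def` stayed at module level, the `addTest` line would raise at suite-construction time, exactly as the review describes.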
### Comment 4
<location> `tests/end-to-end/test_api_schema.py:227-236` </location>
<code_context>
+ def test_schema_search(self):
</code_context>
<issue_to_address>
**suggestion (testing):** Schema search tests don’t cover invalid sort options or the `sort_rev` constraint, which are important edge cases of the new API.
Please add tests that exercise the validation paths in `CommandLib.schemaSearch`, e.g.: one calling `schemaSearch(sort="bogus")` and asserting it raises `Exception("Invalid sort option.")`, and another calling `schemaSearch(sort="text", sort_rev=True)` and asserting it raises `Exception("Reverse sort option not available for text-relevance sorting.")`. This will lock in the guardrails and protect against regressions in option validation.
Suggested implementation:
```python
# Cleanup
self._df_api.schemaDelete(schema_id)
def test_schema_search(self):
"""Test schema search functionality."""
prefix = "test_search_schema"
schemas_to_cleanup = []
for i in range(3):
sid = "{}_{}".format(prefix, i)
schemas_to_cleanup.append(sid)
self._df_api.schemaCreate(
sid,
```
```python
self._df_api.schemaDelete(schema_id)
def test_schema_search_invalid_sort_option(self):
"""Schema search should reject unknown sort options."""
with self.assertRaisesRegex(Exception, r"Invalid sort option\."):
self._df_api.schemaSearch(sort="bogus")
def test_schema_search_text_sort_rev_not_allowed(self):
"""Schema search should reject sort_rev when using text relevance sorting."""
with self.assertRaisesRegex(
Exception,
r"Reverse sort option not available for text-relevance sorting\.",
):
self._df_api.schemaSearch(sort="text", sort_rev=True)
```
</issue_to_address>
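The two guard rails those tests would pin down can be sketched as follows. The error strings come from the review text; the set of valid sort keys is an assumption, not the actual CommandLib list:

```python
def check_search_options(sort="id", sort_rev=False):
    valid_sorts = {"id", "title", "owner", "text"}  # assumed option set
    if sort not in valid_sorts:
        raise Exception("Invalid sort option.")
    # Text-relevance ordering has no meaningful reverse direction
    if sort == "text" and sort_rev:
        raise Exception(
            "Reverse sort option not available for text-relevance sorting."
        )
```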
### Comment 5
<location> `tests/end-to-end/test_api_schema.py:340-350` </location>
<code_context>
+
+ self.assertIn("not valid JSON", str(ctx.exception))
+
+ def test_metadata_validate_requires_input(self):
+ """Must provide metadata or metadata_file."""
+
+ with self.assertRaises(Exception) as ctx:
+ self._df_api.metadataValidate("any_schema")
+
+ self.assertIn("Must specify", str(ctx.exception))
+
+ def test_schema_create_from_file(self):
</code_context>
<issue_to_address>
**suggestion (testing):** You test missing metadata/metadata_file for metadataValidate, but not the file-open failure path for metadata_file.
The tests already cover valid/invalid metadata, malformed JSON, and the case where neither `metadata` nor `metadata_file` is provided. Please also add a test for the case where `metadata_file` is set but the file cannot be opened, asserting that `metadataValidate(schema_id, metadata_file="/path/does/not/exist")` raises the expected `"Could not open metadata file: {}"` error. This will ensure the file-based error behavior remains stable.
```suggestion
self.assertIn("not valid JSON", str(ctx.exception))
def test_metadata_validate_requires_input(self):
"""Must provide metadata or metadata_file."""
with self.assertRaises(Exception) as ctx:
self._df_api.metadataValidate("any_schema")
self.assertIn("Must specify", str(ctx.exception))
def test_metadata_validate_metadata_file_cannot_be_opened(self):
"""metadata_file set but file cannot be opened should raise expected error."""
bad_path = "/path/does/not/exist"
with self.assertRaises(Exception) as ctx:
self._df_api.metadataValidate("any_schema", metadata_file=bad_path)
# The client should surface a clear file-open error that includes the path.
self.assertIn("Could not open metadata file:", str(ctx.exception))
self.assertIn(bad_path, str(ctx.exception))
def test_schema_create_from_file(self):
```
</issue_to_address>
### Comment 6
<location> `tests/end-to-end/test_api_schema.py:130-142` </location>
<code_context>
+
+ self.assertIn("Must specify", str(ctx.exception))
+
+ def test_schema_create_both_definition_sources(self):
+ """Cannot specify both definition and definition_file."""
+
+ with self.assertRaises(Exception) as ctx:
+ self._df_api.schemaCreate(
+ "test_both_def",
+ definition='{"type": "object", "properties": {}}',
+ definition_file="/tmp/fake.json",
+ )
+
+ self.assertIn("Cannot specify both", str(ctx.exception))
+
+ def test_schema_update(self):
</code_context>
<issue_to_address>
**suggestion (testing):** Similar mutual-exclusivity checks in schemaUpdate and schemaRevise are not covered by tests.
This test covers the mutual-exclusivity check for `schemaCreate` when both `definition` and `definition_file` are provided. To fully validate the new APIs, please add similar tests for `schemaUpdate` and `schemaRevise` that pass both parameters and assert they raise with the expected error message, so regressions in option handling are caught.
```suggestion
def test_schema_create_both_definition_sources(self):
"""Cannot specify both definition and definition_file."""
with self.assertRaises(Exception) as ctx:
self._df_api.schemaCreate(
"test_both_def",
definition='{"type": "object", "properties": {}}',
definition_file="/tmp/fake.json",
)
self.assertIn("Cannot specify both", str(ctx.exception))
def test_schema_update_both_definition_sources(self):
"""Cannot specify both definition and definition_file for schemaUpdate."""
with self.assertRaises(Exception) as ctx:
self._df_api.schemaUpdate(
"test_update_both_def",
definition='{"type": "object", "properties": {}}',
definition_file="/tmp/fake.json",
)
self.assertIn("Cannot specify both", str(ctx.exception))
def test_schema_revise_both_definition_sources(self):
"""Cannot specify both definition and definition_file for schemaRevise."""
with self.assertRaises(Exception) as ctx:
self._df_api.schemaRevise(
"test_revise_both_def",
definition='{"type": "object", "properties": {}}',
definition_file="/tmp/fake.json",
)
self.assertIn("Cannot specify both", str(ctx.exception))
def test_schema_update(self):
```
</issue_to_address>
* refactor: core server files refactored to support re-use in the mock implementation
* refactor: changed mock core server to work with proto3 and reuse abstractions from core server.
* fix: prevent defaults being set to undefined, and interpret numbers and enums as strings. (#1861)
* chore: Auto-format JavaScript files with Prettier
* fix: version numbers from proto3 messages follow camel case. (#1868)
* docs: fix jsdoc error.
* [DAPS-1862] - fix allocation change failure (#1864)
* [DAPS-1663] Adding LogContext to dbGetRaw, Correlation_ID to dbMaintenance, metricThread and task_worker (#1885) Co-authored-by: Joshua S Brown <joshbro42867@yahoo.com> Co-authored-by: Joshua S Brown <brownjs@ornl.gov>
* fix: mock_core server build (#1890)
* [DAPS-1887] - refactor: facility fuse, remove dead code. (#1888)
* [DAPS-1857] - feature: python client, tests, support schema functions, to hit feature parity with web ser… (#1859)
* [DAPS-1855] - feature, core, add schema factory to decouple schema logic (#1891)
* [DAPS-1896] - feature: foxx, schema format and type added to schema create API (#1897)
* [DAPS-1893] - feature: common, proto3 add fields for schema type, format, and metadata format (#1894)
* [DAPS-1830-1] - core foxx refactored json schema integration 1 (#1892)
* [DAPS-1857-2] python client schema support (#1895) Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
* [DAPS-1830-2] - core foxx refactored json schema integration (#1899)
* [DAPS-1902] - core, common, feat: get schema api client into compliance with API spec file. (#1903)
* [DAPS-1906-1] - feature: add core unit testing to CI (#1907) Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
* [DAPS-1830-3] - core foxx refactored json schema integration 3 1827 (#1898)
* [DAPS-1910] - upgrade: playwright 1.51.1 version. (#1911)
* [DAPS-1914] - bug, web schema id name version mismatch (#1915)
* [DAPS-1913] - refactor: foxx ci test scripts consolidate database name and default to different database for tests (#1917)
* [DAPS-1916] - tests add integration test for schema client handler to cmake (#1918) Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

---------

Co-authored-by: JoshuaSBrown <brownjs@ornl.gov>
Co-authored-by: Joshua S Brown <joshbro42867@yahoo.com>
Co-authored-by: Austin Hampton <44103380+megatnt1122@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
* [DAPS-1663] - feature: core, LogContext to dbGetRaw, Correlation_ID to dbMaintenance, metricThread and task_worker (#1885)
* [DAPS-1890] - fix: test, mock_core server build (#1890)
* [DAPS-1887] - refactor: facility fuse, remove dead code. (#1888)
* [DAPS-1857] - feature: python client, tests, support schema functions, to hit feature parity with web ser… (#1859)
* [DAPS-1855] - feature, core, add schema factory to decouple schema logic (#1891)
* [DAPS-1896] - feature: foxx, schema format and type added to schema create API (#1897)
* [DAPS-1893] - feature: common, proto3 add fields for schema type, format, and metadata format (#1894)
* [DAPS-1830-1] - refactor: core foxx refactored json schema integration 1 (#1892)
* [DAPS-1857-2] - feature: python client schema support (#1895)
* [DAPS-1830-2] - refactor: core foxx refactored json schema integration (#1899)
* [DAPS-1902] - feature: core, common, get schema api client into compliance with API spec file. (#1903)
* [DAPS-1906-1] - feature: add core unit testing to CI (#1907)
* [DAPS-1830-3] - refactor: core foxx refactored json schema integration 3 1827 (#1898)
* [DAPS-1910] - upgrade: playwright 1.51.1 version. (#1911)
* [DAPS-1914] - fix: web, bug in schema id name version mismatch (#1915)
* [DAPS-1913] - refactor: foxx ci test scripts consolidate database name and default to different database for tests (#1917)
* [DAPS-1916] - tests: add integration test for schema client handler to cmake (#1918)
* [DAPS-1912] - refactor: register linkml storage engine with schema handler (#1912)
* [DAPS-1919] - feature: python, web support linkml schema (#1921)
* [DAPS-1923] - update: version numbers bumped. (#1923)

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: Austin Hampton <amh107@latech.edu>
Co-authored-by: Blake Nedved <blakeanedved@gmail.com>
Co-authored-by: Polina Shpilker <infinite.loopholes@gmail.com>
Co-authored-by: JoshuaSBrown <brownjs@ornl.gov>
Ticket
Description
How Has This Been Tested?
Artifacts (if appropriate):
Tasks
Summary by Sourcery
Add client and CLI support for managing metadata schemas and validating record metadata against schemas, including tests and end-to-end coverage.
New Features:
Enhancements:
Tests: