[rocpdsna] Add v4 schema support with versioned writer#5347
[rocpdsna] Add v4 schema support with versioned writer#5347anujshuk-amd wants to merge 8 commits into
Conversation
## Summary Adds a GitHub Actions workflow that enforces formatting on the rocpdsna project. First in a planned series of CI/CD PRs against `sna-develop`. Three jobs run on every PR/push that touches `projects/rocpdsna/**`: | Job | Tool | Scope | |-----|------|-------| | `clang-format` | clang-format-18 | 77 files in `source/`, `include/`, `tests/` (`.cpp` `.hpp` `.h` `.cc`) | | `cmake-format` | gersemi 0.25.1 | 17 `CMakeLists.txt` and `*.cmake` files (excluding `build/`, `external/`, `_deps/`) | | `markdown-lint` | `DavidAnson/markdownlint-cli2-action@v10.0.1` | All `*.md` under `projects/rocpdsna/` | ## Technical Details - Triggers: `pull_request` (any target) + `push` to `develop` and `sna-develop`, scoped via path filters to `projects/rocpdsna/**`. Excludes `build/` and `docs/graphs/` (binary diagrams). - Concurrency cancellation: superseded runs cancelled on new push. - Sparse checkout: every job only fetches `projects/rocpdsna/` + `.github/workflows/` for speed. - Read-only permissions. - Includes one prep commit (`0aed2ea`) that formats `cmake/rocpdsna-config.cmake.in` with gersemi so the workflow is green from day one. Patterns follow existing `rocprofiler-systems-formatting.yml` (clang-format / gersemi job structure) and `hipfile-clang-format.yml` (focused single-purpose workflow naming). ## Test Plan - [x] Local `clang-format-18 -output-replacements-xml` on all 77 source files: 0 replacements - [x] Local `gersemi --check` on all 17 CMake files: clean (after the prep commit) - [x] YAML syntax validated (`python3 -c "import yaml; yaml.safe_load(...)"`) - [ ] First CI run on this PR confirms all 3 jobs pass
There was a problem hiding this comment.
Pull request overview
This PR extends rocpdsna to support both ROCPD schema v3 (3.0.0) and v4 (4.0.0) at runtime by introducing v4 writers, versioned schema initialization, and additional unit/benchmark coverage.
Changes:
- Added schema v4 writer implementation (policy + writers + common insert ops) and expanded writer API for v4-only info types.
- Implemented runtime schema selection (v3 vs v4) in the writer via a
std::variantcore and versionedsqlite_backend::initialize_schema(version). - Added/updated schema SQL (3.0.0 + 4.0.0) and expanded unit tests/benchmarks to exercise versioned behavior and FK toggling.
Reviewed changes
Copilot reviewed 57 out of 58 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| projects/rocpdsna/tests/unit/database_test.cpp | Parameterized schema init tests for v3/v4; adds FK PRAGMA behavior tests |
| projects/rocpdsna/tests/unit/CMakeLists.txt | Adds new unit test sources and passes schema-selection compile defs |
| projects/rocpdsna/tests/benchmarks/writer_bench.cpp | Benchmarks now construct storage/writer with an explicit schema version and FK toggle |
| projects/rocpdsna/tests/benchmarks/writer_parallel_bench.cpp | Parallel benchmarks updated for versioned storage and FK toggle; expands event-count args |
| projects/rocpdsna/source/writers/schema_v4/writer_policy.hpp | Introduces v4 writer policy (schema tag + writer types + insert statements binding) |
| projects/rocpdsna/source/writers/schema_v4/region_writer.hpp | v4 region writer using track_id + timestamp IDs + event correlation |
| projects/rocpdsna/source/writers/schema_v4/kernel_dispatch_writer.hpp | v4 kernel dispatch writer using track_id + timestamp IDs |
| projects/rocpdsna/source/writers/schema_v4/memory_copy_writer.hpp | v4 memory copy writer using track_id + timestamp IDs |
| projects/rocpdsna/source/writers/schema_v4/memory_alloc_writer.hpp | v4 memory allocation writer using track_id + timestamp IDs and type/level validation |
| projects/rocpdsna/source/writers/schema_v4/pmc_event_writer.hpp | v4 PMC event writer; inserts event/sample via v4 normalized model |
| projects/rocpdsna/source/writers/schema_v4/info_registration_writer.hpp | v4 info registration including v4-only tables (category/address/source/pc) |
| projects/rocpdsna/source/writers/schema_v4/common_insert_operations.hpp | v4 common ops: timestamp table, track resolution, normalized call-stack/line-info materialization |
| projects/rocpdsna/source/writers/schema_v3/info_registration_writer.hpp | Adds v4+ registration API stubs (no-op / string-only) for v3 writers |
| projects/rocpdsna/source/writers/schema_v3/common_insert_operations.hpp | Extends maybe_insert_sample signature to pass extdata |
| projects/rocpdsna/source/writers/interfaces/info_registration_writer_interface.hpp | Adds v4+ public registration methods (category/address/source/pc) |
| projects/rocpdsna/source/writer_impl.hpp | Switches writer core to runtime-selected variant; exposes FK toggling and v4 APIs |
| projects/rocpdsna/source/writer_impl.cpp | Implements versioned schema init and policy selection (v3 vs v4) |
| projects/rocpdsna/source/writer.cpp | Plumbs new writer APIs and FK toggle to implementation |
| projects/rocpdsna/source/storage_impl.hpp | Adds storage impl ctor overload accepting schema version |
| projects/rocpdsna/source/storage_impl.cpp | Stores default schema version (3.0.0) and versioned ctor path |
| projects/rocpdsna/source/storage.cpp | Adds versioned storage_t constructor |
| projects/rocpdsna/source/primary_key_providers.hpp | Adds v4-only key providers (timestamp/category/callstack/lineinfo/address/source/pc) |
| projects/rocpdsna/source/entity_utility.hpp | Changes missing-key behavior to throw clearer runtime errors |
| projects/rocpdsna/source/entity_registry.hpp | Adds v4-only registries for category/address/source/pc |
| projects/rocpdsna/source/data_storage/schema_version.hpp | Adds schema_v4_tag for compile-time tagging |
| projects/rocpdsna/source/data_storage/schema_v3/insert_statements.hpp | Annotates/realigns v3 insert statement parameter lists for the versioned backend schema |
| projects/rocpdsna/source/data_storage/read_statements.hpp | Fixes include path to sqlite backend header |
| projects/rocpdsna/source/data_storage/backends/sqlite_backend.hpp | Makes schema init versioned; adds FK PRAGMA toggle API |
| projects/rocpdsna/source/data_storage/backends/sqlite_backend.cpp | Implements versioned schema loading, local v3/v4 schema selection, and FK PRAGMA toggle |
| projects/rocpdsna/source/data_storage/backends/schema/4.0.0/rocpd_tables.sql | Adds v4 DDL with normalized timestamps/tracks/callstack/lineinfo and new info tables |
| projects/rocpdsna/source/data_storage/backends/schema/4.0.0/rocpd_views.sql | Adds v4 “pass-through” views to uuid-suffixed tables |
| projects/rocpdsna/source/data_storage/backends/schema/4.0.0/data_views.sql | Adds v4 joined/query-friendly views (tracks/events/kernels/mem/etc.) |
| projects/rocpdsna/source/data_storage/backends/schema/4.0.0/summary_views.sql | Adds v4 summary views (top_kernels/busy/top) |
| projects/rocpdsna/source/data_storage/backends/schema/4.0.0/rocpd_metadata.sql | Adds v4 metadata insertion script (schema version + creation time) |
| projects/rocpdsna/source/data_storage/backends/schema/4.0.0/rocpd_indexes.sql | Adds v4 index DDL |
| projects/rocpdsna/source/data_storage/backends/schema/4.0.0/rocpd_shema.in | Template for generating v4 schema header constants |
| projects/rocpdsna/source/data_storage/backends/schema/3.0.0/rocpd_tables.sql | Adds v3 DDL as versioned backend input |
| projects/rocpdsna/source/data_storage/backends/schema/3.0.0/rocpd_views.sql | Adds v3 “pass-through” views |
| projects/rocpdsna/source/data_storage/backends/schema/3.0.0/data_views.sql | Adds v3 joined/query-friendly views |
| projects/rocpdsna/source/data_storage/backends/schema/3.0.0/summary_views.sql | Adds v3 summary views |
| projects/rocpdsna/source/data_storage/backends/schema/3.0.0/rocpd_indexes.sql | Adds v3 index file (currently mostly commented) |
| projects/rocpdsna/source/data_storage/backends/schema/3.0.0/marker_views.sql | Adds v3 marker views placeholder |
| projects/rocpdsna/source/data_storage/backends/schema/3.0.0/rocpd_shema.in | Template for generating v3 schema header constants |
| projects/rocpdsna/source/common/types.hpp | Modifies header prologue (copyright/SPDX removal) |
| projects/rocpdsna/source/common/string_conversions.hpp | Adds get_key/to_string for new v4 info structs |
| projects/rocpdsna/include/writer.hpp | Exposes v4-only registration APIs + SQLite FK toggle API |
| projects/rocpdsna/include/storage_types.hpp | Adds comparisons/helpers on version_t |
| projects/rocpdsna/include/storage.hpp | Adds versioned storage constructor |
| projects/rocpdsna/include/shared_types.hpp | Adds v4 mapping notes and extends line info entry with extdata |
| projects/rocpdsna/cmake/rocprofiler-sdk-rocpd.cmake | Generates schema headers per version; detects SDK schema-version API availability |
| projects/rocpdsna/CMakeLists.txt | Propagates ROCPD_SQL_HAS_SCHEMA_VERSION define based on SDK detection |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #if ROCPD_SQL_HAS_SCHEMA_VERSION | ||
| schema_kinds.push_back(ROCPD_SQL_SCHEMA_ROCPD_METADATA); | ||
| #else | ||
| // Old SDK has MARKER_VIEWS instead of METADATA | ||
| schema_kinds.push_back(ROCPD_SQL_SCHEMA_ROCPD_MARKER_VIEWS); | ||
| #endif |
There was a problem hiding this comment.
initialize_schema() unconditionally references ROCPD_SQL_SCHEMA_ROCPD_MARKER_VIEWS when ROCPD_SQL_HAS_SCHEMA_VERSION is 0, but the locally-defined rocpd_sql_schema_kind_t enum (used when USE_SCHEMA_FROM_ROCPROFILER_SDK_ROCPD=0) does not define ROCPD_SQL_SCHEMA_ROCPD_MARKER_VIEWS. This will not compile in the local-schema build. Also, versioned initialization should fail fast when a v4 schema is requested but the linked SDK API cannot select schema versions, to avoid creating a v3 schema while the writers use v4 insert statements.
| LEFT JOIN `rocpd_info_process` P ON P.pid = T.pid | ||
| AND P.guid = T.guid | ||
| LEFT JOIN `rocpd_info_thread` TH ON TH.tid = T.tid |
There was a problem hiding this comment.
In the v4 schema, rocpd_track.pid and rocpd_track.tid are foreign keys to rocpd_info_process.id and rocpd_info_thread.id (see the table DDL), but this view joins them against the OS pid/tid columns (P.pid and TH.tid). This breaks the tracks view (and every downstream view that uses it) unless DB primary keys happen to match OS ids. Join on the referenced primary key columns instead (e.g., P.id = T.pid, TH.id = T.tid).
| LEFT JOIN `rocpd_info_process` P ON P.pid = T.pid | |
| AND P.guid = T.guid | |
| LEFT JOIN `rocpd_info_thread` TH ON TH.tid = T.tid | |
| LEFT JOIN `rocpd_info_process` P ON P.id = T.pid | |
| AND P.guid = T.guid | |
| LEFT JOIN `rocpd_info_thread` TH ON TH.id = T.tid |
| `rocpd_info_code_object` CO | ||
| INNER JOIN `rocpd_info_agent` A ON CO.agent_id = A.id | ||
| AND CO.guid = A.guid | ||
| INNER JOIN `rocpd_info_process` P ON CO.pid = P.pid |
There was a problem hiding this comment.
The code_objects view joins rocpd_info_code_object.pid against rocpd_info_process.pid, but rocpd_info_code_object.pid is a foreign key to rocpd_info_process.id in the v4 DDL. This join will return incorrect results unless PK ids match OS pids. Join on the referenced PK (P.id).
| INNER JOIN `rocpd_info_process` P ON CO.pid = P.pid | |
| INNER JOIN `rocpd_info_process` P ON CO.pid = P.id |
| JSON_EXTRACT(KS.extdata, '$.kernel_address.handle') AS kernel_address | ||
| FROM | ||
| `rocpd_info_kernel_symbol` KS | ||
| INNER JOIN `rocpd_info_process` P ON KS.pid = P.pid |
There was a problem hiding this comment.
The kernel_symbols view joins rocpd_info_kernel_symbol.pid against rocpd_info_process.pid, but rocpd_info_kernel_symbol.pid is a foreign key to rocpd_info_process.id in the v4 DDL. This should join on P.id to preserve referential correctness.
| INNER JOIN `rocpd_info_process` P ON KS.pid = P.pid | |
| INNER JOIN `rocpd_info_process` P ON KS.pid = P.id |
| # Copyright(c) 2025 Advanced Micro Devices, Inc. All Rights Reserved. | ||
| # | ||
| # Permission is hereby granted, free of charge, to any person obtaining a copy | ||
| # of this software and associated documentation files (the "Software"), to deal | ||
| # of this software and associated documentation files(the "Software"), to deal | ||
| # in the Software without restriction, including without limitation the rights | ||
| # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
| # to use, copy, modify, merge, publish, distribute, sublicense, and / or sell | ||
| # copies of the Software, and to permit persons to whom the Software is |
There was a problem hiding this comment.
The MIT license header text was modified (e.g., Copyright(c), missing spaces in files(the, and and / or). License blocks should remain verbatim to avoid legal ambiguity and to keep consistency with other files.
| #include <cstddef> | ||
| #include <functional> | ||
| #include <string> |
There was a problem hiding this comment.
This header had its copyright/SPDX lines removed, which makes it inconsistent with other project headers and may violate the repository’s licensing/header requirements. Restore the standard header block (or apply whatever the project’s current header convention is) rather than deleting it.
| CREATE INDEX `rocpd_timestamp{{uuid}}_track_id_idx` ON `rocpd_timestamp{{uuid}}` ("track_id"); | ||
|
|
||
| -- CREATE INDEX `rocpd_kernel_dispatch{{uuid}}_guid_pid_tid_idx` ON `rocpd_kernel_dispatch{{uuid}}` ("guid", "pid", "tid"); | ||
| CREATE INDEX `rocpd_memory_copy{{uuid}}_guid_pid_tid_idx` ON `rocpd_memory_copy{{uuid}}` ("guid", "pid", "tid"); |
There was a problem hiding this comment.
This v4 index targets columns pid and tid on rocpd_memory_copy{{uuid}}, but the v4 table definition uses track_id and does not have pid/tid columns. Applying this schema will fail at init time when SQLite executes the index DDL.
| CREATE INDEX `rocpd_memory_copy{{uuid}}_guid_pid_tid_idx` ON `rocpd_memory_copy{{uuid}}` ("guid", "pid", "tid"); | |
| CREATE INDEX `rocpd_memory_copy{{uuid}}_guid_track_id_idx` ON `rocpd_memory_copy{{uuid}}` ("guid", "track_id"); |
| INNER JOIN `rocpd_info_process` P ON P.pid = T.pid | ||
| AND N.guid = T.guid |
There was a problem hiding this comment.
The threads view join condition references N.guid before N is joined, which makes the SQL invalid. Additionally, it joins P.pid = T.pid, but rocpd_info_thread.pid is a foreign key to rocpd_info_process.id in v4, so it should join on P.id (and P.guid = T.guid).
| INNER JOIN `rocpd_info_process` P ON P.pid = T.pid | |
| AND N.guid = T.guid | |
| INNER JOIN `rocpd_info_process` P ON P.id = T.pid | |
| AND P.guid = T.guid |
| sqlite3_open(m_database_path.c_str(), &raw); | ||
| sqlite3_prepare_v2( | ||
| raw, ("SELECT COUNT(*) FROM " + table).c_str(), -1, &stmt, nullptr); | ||
| if(sqlite3_step(stmt) == SQLITE_ROW) |
There was a problem hiding this comment.
count_rows() uses sqlite3_open / sqlite3_prepare_v2 / sqlite3_step without checking return codes. If any of these fail, the test can dereference a null/invalid statement handle and crash (masking the real failure). Add assertions on the SQLite return values (and handle stmt == nullptr) so failures are reported cleanly.
|
This pull request has been inactive for 25 days and will be marked as stale. If you would like to keep this PR open, please:
This PR will be automatically closed in 5 days if no further activity occurs. |
|
This pull request has been automatically closed due to inactivity (30 days with no updates). If you'd like to continue working on this, feel free to reopen the PR or create a new one. |
Motivation
Rocpdsna currently only supports the V3 (3.0.0) schema. As the profiling ecosystem evolves, a newer V4 (4.0.0) schema is needed to support runtime selection of schema version. Changes support...
Timestamp normalization: V4 replaces direct start/end BIGINT columns with foreign keys to a rocpd_timestamp table, enabling deduplication and cross-table timestamp correlation.
Track-based context: V4 consolidates per-row nid/pid/tid/agent_id/queue_id/stream_id columns into a single track_id FK, reducing redundancy and simplifying queries.
Normalized event metadata: V4 moves embedded JSONB call_stack and line_info out of rocpd_event into dedicated rocpd_call_stack and rocpd_line_info tables with proper FK relationships.
New V4-only tables : rocpd_info_category, rocpd_info_address_range, rocpd_info_source_code, and rocpd_info_pc support richer profiling data.
Runtime schema selection: Users can choose V3 or V4 at runtime without recompilation.
This PR implements the V4 schema writers, adds comprehensive test coverage for both schemas, and cleans up the codebase.
Technical Details
The rocpdsna depends on [rocpd] Schema Updates #347 PR, the latest version of rocprofiler-sdk schema changes.
JIRA ID
TBA
Test Plan
cmake --buildTest Result
rocprofiler-sdk-rocpdwith v4 schema support