Feat: Introduce query layer for launch-event search across NDJSON, gzip, and CLP inputs. #1

@Bill-hbrhbr

Description

Background:

Right now, kernel and launch-event querying lives under the info subcommand of the TritonParse CLI, implemented in tritonparse/info.

In the current search flow, launch events are first narrowed by kernel name, then iterated over to check whether each event matches the conditions from the --args-list query. For each matching launch, the output reports its per-kernel launch ID, its event index within the full trace, and its recorded launch grid:

  ...
  id=1054  line  1058  grid=[1]
  id=1055  line  1059  grid=[1]
  id=1056  line  1060  grid=[1]
  id=1057  line  1061  grid=[1]
  ...
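
The flow described above can be sketched roughly as follows. This is an illustration only, not the code in tritonparse/info: the event field names (`event_type`, `name`, `args`, `grid`) and both function names are assumptions made for the example, and the real trace schema may differ.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

# Hypothetical event shape; the real TritonParse schema may differ.
Event = Dict[str, Any]


def search_launches(
    events: Iterable[Event],
    kernel_name: str,
    conditions: Dict[str, Any],
) -> Iterator[Tuple[int, int, Any]]:
    """Yield (launch_id, event_index, grid) for launches matching all conditions."""
    launch_id = 0
    for index, event in enumerate(events):
        # Step 1: narrow to launch events of the requested kernel.
        if event.get("event_type") != "launch" or event.get("name") != kernel_name:
            continue
        # Step 2: every condition must hold (equality joined by AND).
        args = event.get("args", {})
        if all(key in args and args[key] == value for key, value in conditions.items()):
            yield launch_id, index, event.get("grid")
        # The per-kernel launch ID advances for every launch of this kernel,
        # whether or not it matched the conditions.
        launch_id += 1


def format_match(launch_id: int, index: int, grid: Any) -> str:
    """Render one match in the style of the sample output above."""
    return f"id={launch_id}  line  {index}  grid={grid}"
```

Note that nothing beyond the ID, the index, and the grid is carried through to the output, which is the projection gap raised in the list below.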

Areas for potential improvements:

  • Gzip archives must be decompressed before they can be searched.
  • The current query format only supports equality conditions joined by AND, so it is less expressive than a SQL-style or KQL-style query.
  • Test coverage is uneven: the info subcommand is covered by test_info_cli.py and test_kernel_query.py, but there is currently no test coverage for --args-list filtering.
  • The current search output is not schema-aware: it supports filtering launches, but does not project or display event fields from the matched launch records, especially those referenced in the query.
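
On the first point, one possible interim approach, independent of CLP, is to stream a gzip file line by line instead of unpacking it to disk first. Below is a minimal sketch using Python's standard `gzip` module. The function name `iter_events` and the reliance on a `.gz` suffix are assumptions for illustration, not existing TritonParse code.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Union


def iter_events(path: Union[str, Path]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed event per non-empty line of an NDJSON file.

    Files ending in .gz are decompressed on the fly by gzip.open, so no
    separate decompression step is needed before searching.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

This still decompresses every byte on each search, so it removes the manual step but not the cost that a CLP archive search would avoid.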

Proposed next steps (PR scope):

  • Introduce a dedicated Python query class, such as KqlQuery or LaunchQuery, that can be shared between TritonParse and clp-ffi.
  • Add a temporary adapter that translates --args-list into this query object. CLP archives could then be searched through the query object, while NDJSON and gzip inputs can continue using the existing filtering path.
  • Alternatively, the CLI could accept KQL-style query input directly for CLP archives. This option should be mutually exclusive with --args-list.
  • Tests should be added to ensure both paths produce identical search results when the query is strictly translatable from --args-list.
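
To make the proposal concrete, here is one possible shape for the query object and the --args-list adapter. Everything in it is hypothetical: the `LaunchQuery` fields, the `from_args_list` helper, the assumed comma-separated `key=value` form of --args-list, the event field names, and the KQL-like rendering, which has not been checked against the grammar CLP actually accepts.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class LaunchQuery:
    """Hypothetical query object: a kernel name plus AND-joined equality terms."""

    kernel_name: str
    equals: Tuple[Tuple[str, str], ...] = ()

    def matches(self, event: Mapping[str, Any]) -> bool:
        """In-memory evaluation, usable for NDJSON and gzip inputs."""
        if event.get("name") != self.kernel_name:
            return False
        args = event.get("args", {})
        # Values are compared as strings here, a simplification for the sketch.
        return all(k in args and str(args[k]) == v for k, v in self.equals)

    def to_kql(self) -> str:
        """Render a KQL-like string (assumed syntax, not verified against CLP)."""
        clauses = [f'name: "{self.kernel_name}"']
        clauses += [f'args.{k}: "{v}"' for k, v in self.equals]
        return " AND ".join(clauses)


def from_args_list(kernel_name: str, args_list: str) -> LaunchQuery:
    """Temporary adapter: parse an assumed 'k1=v1,k2=v2' string into a query."""
    pairs = []
    for item in args_list.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        pairs.append((key.strip(), value.strip()))
    return LaunchQuery(kernel_name, tuple(pairs))
```

A parity test could then build one `LaunchQuery` from an --args-list string, run `matches` over the NDJSON events, run the rendered query against the equivalent CLP archive, and assert that both return the same launches.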

Longer-term considerations:
Both query parsing and execution could be consolidated into a shared query layer rather than being split across cli.py and kernel_query.py.
Eventually, the advantages of using CLP for storage would be:

  • Unlike gzip, archive search would not require decompression.
  • Query parsing and execution are already abstracted by the CLP engine.
  • More expressive queries are supported beyond simple equality, including timestamp ranges, logical operators, pattern matching, and projection.

So TritonParse could focus on defining the launch-event schema and higher-level query semantics.

Finally, one important question: are there any plans to support querying directly within the TritonParse UI?
