feat: Implement perf event groups, scaled reads, and group snapshots #22

Open
SiyuanSun0736 wants to merge 3 commits into multikernel:main from SiyuanSun0736:perf-group

@SiyuanSun0736 (Contributor)

Overview

This PR introduces the ability to group multiple perf metrics (e.g., cache misses, branch misses, cycles) into a single scheduling group. This ensures that counters observing the same workload are started and stopped together, solving the issue of misaligned results from independently managed counters.

It also adds multiplex-aware read APIs and compile-time validation of PMU slot limits, and fixes several edge cases in internal userspace code generation so that snapshot data can be consumed reliably.

Key Features & User-Facing Changes

1. High-Level Grouping API

  • New group field: perf_options has a new high-level group field for attaching a member to an existing leader:
```
var cache = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: cache_misses }, 0)
var branch = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses, group: cache }, 0)
```

  • Compatibility: The lower-level group_fd: leader.perf_fd form is still supported.
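At the syscall level, a perf group is usually built with perf_event_open(2): the leader is opened with group_fd = -1 and each member passes the leader's fd. The C sketch below shows that pattern. The helper names fill_group_attr and open_counter are hypothetical, and this is not necessarily what ks_open_perf_event generates. The read_format bits are chosen so a single read on the leader returns every counter plus the timing fields needed for multiplex scaling.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: prepare attr for a group leader or a group member. */
void fill_group_attr(struct perf_event_attr *attr, uint32_t type,
                     uint64_t config, int is_leader)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = type;
    attr->config = config;
    /* Conventional pattern: only the leader starts disabled. Members with
     * disabled = 0 still do not count until the leader is enabled. */
    attr->disabled = is_leader ? 1 : 0;
    /* One read() on the leader returns every member's value and id, plus
     * time_enabled / time_running for multiplex scaling. */
    attr->read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID |
                        PERF_FORMAT_TOTAL_TIME_ENABLED |
                        PERF_FORMAT_TOTAL_TIME_RUNNING;
}

/* group_fd == -1 creates a new group (this event becomes the leader);
 * group_fd == leader_fd adds this event to the leader's group. */
int open_counter(struct perf_event_attr *attr, pid_t pid, int cpu, int group_fd)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, 0UL);
}
```

Usage: open the leader with open_counter(&leader_attr, 0, -1, -1) (calling thread, any CPU), then each member with open_counter(&member_attr, 0, -1, leader_fd).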

2. Multiplex-Aware Read APIs

  • read(att): Now returns the scaled value by default. When PMU multiplexing occurs, the raw count is multiplied by time_enabled / time_running; without multiplexing the result equals the raw count.
  • read_raw(att): Returns the uncorrected, raw counter values.
  • read_details(att): Returns a struct containing raw, scaled, time_enabled, and time_running, which is useful for manual delta or rate calculations.
  • read_group(leader): Captures an atomic snapshot of the entire group. Returns up to 16 id/value pairs, with values[] already scaled using the snapshot's timing, together with the snapshot time fields.
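For a group read with PERF_FORMAT_GROUP | PERF_FORMAT_ID | PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING, the kernel returns nr, time_enabled, time_running, then nr {value, id} pairs. The sketch below turns such a buffer into an id/value snapshot capped at 16 entries and scales every value with the shared time fields. The struct and function names are hypothetical and only mirror the shape described for read_group(). Rejecting oversized groups instead of truncating, and returning 0 when time_running is 0, are assumptions. unsigned __int128 is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SNAPSHOT_MAX 16 /* cap quoted in the PR description */

/* Hypothetical snapshot shape mirroring what read_group() is described to return. */
struct group_snapshot {
    uint32_t count;
    uint64_t ids[SNAPSHOT_MAX];
    uint64_t values[SNAPSHOT_MAX]; /* already multiplex-scaled */
    uint64_t time_enabled;
    uint64_t time_running;
};

/* Parse a { nr, time_enabled, time_running, { value, id } x nr } buffer.
 * Returns 0 on success, -1 if the buffer is short or nr exceeds the cap. */
int parse_group_read(const uint64_t *buf, size_t n_words, struct group_snapshot *out)
{
    assert(buf != NULL && out != NULL);
    if (n_words < 3)
        return -1;
    uint64_t nr = buf[0];
    if (nr > SNAPSHOT_MAX || n_words < 3 + 2 * nr)
        return -1;
    out->count = (uint32_t)nr;
    out->time_enabled = buf[1];
    out->time_running = buf[2];
    for (uint64_t i = 0; i < nr; i++) {
        uint64_t raw = buf[3 + 2 * i];
        out->ids[i] = buf[4 + 2 * i];
        if (out->time_running == 0) {
            out->values[i] = 0;   /* group was never scheduled (assumed policy) */
        } else if (out->time_running == out->time_enabled) {
            out->values[i] = raw; /* no multiplexing: scaled == raw */
        } else {
            unsigned __int128 wide =
                (unsigned __int128)raw * out->time_enabled / out->time_running;
            out->values[i] = wide > UINT64_MAX ? UINT64_MAX : (uint64_t)wide;
        }
    }
    return 0;
}
```

Because every member is scaled by the same time_enabled / time_running ratio, ratios between members (for example cache misses per branch miss) are the same whether computed from raw or scaled values.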

3. Group Lifecycle Management

  • Group Restarts: Dynamically attaching a new member to an existing active group now triggers a disable/reset/enable sequence on the whole group, ensuring counters start from zero together.
  • Cascading Detach: Detaching a group leader is no longer rejected. The detach now cascades and automatically detaches all active members of the group.
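Both lifecycle behaviors map onto standard perf fd operations. The sketch below restarts a group with the DISABLE/RESET/ENABLE ioctls, each with PERF_IOC_FLAG_GROUP so it applies to the leader and all members, and performs a cascading detach by closing members before the leader. The perf_group struct, the 15-member capacity, and both function names are hypothetical. The sketch models only the perf fds, not the BPF program attachments that a real detach also releases.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define GROUP_MAX_MEMBERS 15 /* hypothetical: 16 events including the leader */

/* Hypothetical bookkeeping for one perf group. */
struct perf_group {
    int leader_fd;
    int member_fds[GROUP_MAX_MEMBERS];
    int n_members;
};

/* Restart every counter in the group from zero at the same time.
 * PERF_IOC_FLAG_GROUP applies each ioctl to the leader and all members. */
int restart_group(int leader_fd)
{
    if (ioctl(leader_fd, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP) < 0)
        return -1;
    if (ioctl(leader_fd, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP) < 0)
        return -1;
    if (ioctl(leader_fd, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP) < 0)
        return -1;
    return 0;
}

/* Cascading detach: release every active member, then the leader. */
int detach_group(struct perf_group *g)
{
    assert(g != NULL);
    int rc = 0;
    for (int i = g->n_members - 1; i >= 0; i--) {
        if (close(g->member_fds[i]) < 0)
            rc = -1;
        g->member_fds[i] = -1;
    }
    g->n_members = 0;
    if (close(g->leader_fd) < 0)
        rc = -1;
    g->leader_fd = -1;
    return rc;
}
```

Disabling before the reset keeps members from counting while others are being zeroed, so all counters in the group restart from a common point.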

4. Compile-Time PMU Slot Validation

  • Perf groups whose membership is statically known are now evaluated during type checking to compute how many hardware PMU slots they consume.
  • Compilation fails early if a group is too large. The limit defaults to 4 (or a value probed from sysfs) and can be overridden with the KERNELSCRIPT_PERF_GROUP_MAX_EVENTS environment variable.
  • perf_type_software and perf_type_tracepoint are correctly excluded from hardware PMU slot counts.
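The slot check can be pictured as: count the events that need a hardware counter, then compare that count to the limit. The C sketch below reads KERNELSCRIPT_PERF_GROUP_MAX_EVENTS, falls back to 4, and excludes software and tracepoint events. The function names are hypothetical. The sysfs probe is omitted because the PR does not name the path. Falling back to the default on an unparsable value and counting every other event type as a slot are simplifying assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdlib.h>

#define DEFAULT_GROUP_MAX_EVENTS 4 /* default quoted in the PR description */

/* Limit on hardware events per group: the environment variable wins,
 * otherwise the default (sysfs probing omitted in this sketch). */
int perf_group_max_events(void)
{
    const char *s = getenv("KERNELSCRIPT_PERF_GROUP_MAX_EVENTS");
    if (s != NULL && *s != '\0') {
        char *end;
        long v = strtol(s, &end, 10);
        if (*end == '\0' && v > 0 && v <= INT_MAX)
            return (int)v;
    }
    return DEFAULT_GROUP_MAX_EVENTS;
}

/* Software and tracepoint events do not occupy a hardware counter;
 * this sketch counts every other type as one PMU slot (a simplification). */
int hw_slots_used(const uint32_t *types, int n)
{
    assert(n >= 0 && (n == 0 || types != NULL));
    int slots = 0;
    for (int i = 0; i < n; i++)
        if (types[i] != PERF_TYPE_SOFTWARE && types[i] != PERF_TYPE_TRACEPOINT)
            slots++;
    return slots;
}

/* 0 if the group fits, -1 if it should be rejected at compile time. */
int validate_static_group(const uint32_t *types, int n)
{
    return hw_slots_used(types, n) <= perf_group_max_events() ? 0 : -1;
}
```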

Internal & Codegen Improvements

  • Array IR Lowering: Fixed array indexing and dereferencing in IR lowering so that the generated userspace C code is correct when iterating over read_group() snapshot arrays (snapshot.ids[i] / snapshot.values[i]).
  • Array Initialization: Non-literal array initializations are now emitted as a declaration followed by memcpy. C does not allow initializing an array from another array expression such as a snapshot struct field, so the previous form produced invalid C.
  • Variable Declarations: Fixed an issue where a reused for-loop counter, or a later variable with the same name, produced duplicate function-level C declarations.
  • Read Helpers: Added raw/details/group perf read helpers. They use 128-bit intermediates so the raw * time_enabled product in multiplex scaling cannot overflow.
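The scaling described for read() has a fast path, used when there is no multiplexing, and a slow path that needs a wide intermediate. For example, with raw = 2^63 and a 4:2 enabled/running ratio, a 64-bit raw * time_enabled product wraps to 0. The C sketch below uses unsigned __int128 (a GCC/Clang extension) and clamps the result to UINT64_MAX. The name scale_count, the clamp, and returning 0 when time_running is 0 are assumptions rather than the generated helper's exact behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scaled-read helper: estimate the count the event would have
 * reached had it been on the PMU for the whole enabled window. */
uint64_t scale_count(uint64_t raw, uint64_t time_enabled, uint64_t time_running)
{
    if (time_running == 0)
        return 0;   /* never scheduled: nothing to extrapolate (assumed policy) */
    if (time_running == time_enabled)
        return raw; /* fast path: no multiplexing, scaled == raw */
    /* slow path: widen before multiplying so raw * time_enabled cannot wrap */
    unsigned __int128 wide = (unsigned __int128)raw * time_enabled / time_running;
    return wide > UINT64_MAX ? UINT64_MAX : (uint64_t)wide;
}
```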

Documentation & Examples

  • examples/perf_cache_miss.ks: Refactored to use the new group API. Added demonstrations of read_details() for rate calculation and read_group() for iterating through snapshot id/value pairs.
  • examples/perf_page_fault.ks: Extended to demonstrate updated perf read semantics.
  • Docs: Updated README.md, SPEC.md, and BUILTINS.md to reflect group semantics, read interfaces, and PMU slot constraints.

Test Coverage

  • Added IR and codegen assertions for both group_fd and high-level group paths.
  • Covered member-attach group restarts, ioctl generation, and cascading leader detaches.
  • Covered multiplex scaling fast/slow paths for read(), and helper generation for read_raw(), read_details(), and read_group().
  • Covered oversized static group validation during compilation.
  • Added regression tests for reused for-loop counter variables in userspace codegen.

- Introduced `group_fd` field in the perf options structure to allow
  attaching BPF programs to a group of perf events.
- Updated the `ks_open_perf_event` function to accept `group_fd` and
  handle group event management.
- Implemented helper functions for managing active members of perf
  event groups, ensuring that group leaders cannot be detached while
  active members exist.
- Enhanced the generated code to include necessary checks and
  structures for handling multiplexed perf events.
- Added tests to validate the new group management features and
  ensure correct code generation for group-related operations.
- Introduced functions to manage performance event groups, including detection of maximum events and validation of static groups.
- Added support for new performance read functions: `read_raw`, `read_details`, and `read_group`, along with their corresponding structures and handling in the code generation.
- Enhanced the type checker to validate performance event group attachments and ensure no cycles exist in group leader relationships.
- Updated userspace code generation to track usage of new performance read functions and manage group attachments.
- Added tests for new functionality, including validation of oversized static performance event groups and code generation for new read functions.
…erformance event groups; added snapshot index printing functionality; updated userspace code generation tests to verify variable reuse logic.
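The commit messages above mention that the type checker rejects cycles in group-leader relationships. Because each attachment names at most one leader, the check can be pictured as following leader links: an acyclic chain visits at most n attachments, so a longer walk implies a loop. The C sketch below is a minimal sketch of one way such a check could work. The index-array representation and the name has_leader_cycle are hypothetical; the real checker works on the compiler's own data structures.

```c
#include <assert.h>

/* leader[i] is the index of attachment i's group leader, or -1 if i does not
 * join another attachment's group. Returns 1 if following leader links from
 * any attachment loops back on itself, 0 otherwise. */
int has_leader_cycle(const int *leader, int n)
{
    assert(n >= 0);
    for (int start = 0; start < n; start++) {
        int cur = start;
        int visited = 0;
        while (cur >= 0) {
            assert(cur < n);
            /* An acyclic chain visits at most n attachments. */
            if (++visited > n)
                return 1;
            cur = leader[cur];
        }
    }
    return 0;
}
```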