Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion TESTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ DriverGetApiVersion|measures time spent in zeDriverGetApiVersion call on CPU.|<u
DriverGetProperties|measures time spent in zeDriverGetProperties call on CPU.|<ul></ul>|:heavy_check_mark:|:x:|
EnqueueNdrNullLws|measures time spent in clEnqueueNDRangeKernel on CPU. Null LWS is provided, which causes driver to calculate it|<ul><li>--event Pass output event to the enqueue call (0 or 1)</li><li>--gws Global work size</li><li>--ooq Use out of order queue (0 or 1)</li><li>--profiling Creating a profiling queue (0 or 1)</li></ul>|:x:|:heavy_check_mark:|
EnqueueNdrTime|measures time spent in clEnqueueNDRangeKernel on CPU.|<ul><li>--event Pass output event to the enqueue call (0 or 1)</li><li>--ooq Use out of order queue (0 or 1)</li><li>--profiling Creating a profiling queue (0 or 1)</li><li>--wgc Workgroup count</li><li>--wgs Workgroup size</li></ul>|:x:|:heavy_check_mark:|
EventCreation|measures time spent to create event|<ul><li>--eventCount Number of events to create</li><li>--hostVisible Event will set host visible flag (0 or 1)</li><li>--signal Type of signal scope (subdevice or device or host or none)</li><li>--useProfiling Event will use profiling (0 or 1)</li><li>--wait Type of wait scope (subdevice or device or host or none)</li></ul>|:heavy_check_mark:|:x:|
EventCreation|measures time spent to create event|<ul><li>--eventCount Number of events to create</li><li>--hostVisible Event will set host visible flag (0 or 1)</li><li>--signal Type of signal scope (subdevice or device or host or none)</li><li>--useCbe Use Counter Based Events (0 or 1)</li><li>--useProfiling Event will use profiling (0 or 1)</li><li>--wait Type of wait scope (subdevice or device or host or none)</li></ul>|:heavy_check_mark:|:x:|
EventQueryStatus|Measures time spent to query event status|<ul><li>--eventSignaled Event will be set as signaled (0 or 1)</li></ul>|:heavy_check_mark:|:x:|
ExecImmCopy|measures time spent in appending memory copy for immediate command list on CPU with Copy Queue.|<ul><li>--CopyOffload Enable driver copy offload (only valid for L0) (0 or 1)</li><li>--IsCopyOnly If true, Copy Engine is selected. If false, Compute Engine is selected (0 or 1)</li><li>--MeasureCompletionTime Measures time taken to complete the submission (default is to measure only Immediate call) (0 or 1)</li><li>--dst Placement of the destination buffer (Device or Host or Shared or non-USM-mapped or non-USMmisaligned or non-USM4KBAligned or non-USM2MBAligned or non-USMmisaligned-imported or non-USM4KBAligned-imported or non-USM2MBAligned-imported or non-USM)</li><li>--ioq Use In order queue (0 or 1)</li><li>--size Size of the buffer</li><li>--src Placement of the source buffer (Device or Host or Shared or non-USM-mapped or non-USMmisaligned or non-USM4KBAligned or non-USM2MBAligned or non-USMmisaligned-imported or non-USM4KBAligned-imported or non-USM2MBAligned-imported or non-USM)</li></ul>|:heavy_check_mark:|:x:|
ExecImmediate|measures time spent in appending launch kernel for immediate command list on CPU.|<ul><li>--BarrierSynchro Uses barrier synchronization instead of waiting for event from last kernel (0 or 1)</li><li>--CallsCount amount of calls that is being meassured</li><li>--EventSync If true, use events to synchronize with host. If false, use zeCommandListHostSynchronize (0 or 1)</li><li>--KernelExecTime How long a single kernel executes, in us</li><li>--MeasureCompletion Measures time taken to complete the submission (default is to measure only Immediate call) (0 or 1)</li><li>--Profiling Pass a profiling ze_event_t to the API call (0 or 1)</li><li>--ioq Use In order queue (0 or 1)</li></ul>|:heavy_check_mark:|:x:|
Expand All @@ -41,6 +41,7 @@ FlushTime|measures time spent in clEnqueueNDRangeKernel on CPU.|<ul><li>--event
GetMemoryProperties|measures time spent in zeMemGetAllocProperties on CPU when driver is queried for memory properties.|<ul><li>--AmountOfUsmAllocations Amount of USM allocations that are present in system</li></ul>|:heavy_check_mark:|:x:|
GetMemoryPropertiesWithModifiedAllocations|measures time spent in zeMemGetAllocProperties on CPU, when allocations are modified between each iteration.|<ul><li>--AmountOfUsmAllocations Amount of USM allocations that are present in system</li></ul>|:heavy_check_mark:|:x:|
GetMemoryPropertiesWithOffsetedPointer|measures time spent in zeMemGetAllocProperties on CPU when the pointer passed is an offset from the base address.|<ul><li>--AmountOfUsmAllocations Amount of USM allocations that are present in system</li></ul>|:heavy_check_mark:|:x:|
InOrderWaitAppend|Measures time spent to append wait on event command to in-order command list.|<ul><li>--counterBasedEvents Use Counter Based Events (0 or 1)</li></ul>|:heavy_check_mark:|:x:|
KernelSetArgumentValueImmediate|measures time spent in zeKernelSetArgumentValue for immediate arguments on CPU.|<ul><li>--argSize Kernel argument size in bytes</li><li>--differentValues Use different values for arguments each iteration (0 or 1)</li></ul>|:heavy_check_mark:|:x:|
LifecycleCommandList|measures time spent in zeCommandListCreate + Close + Execute on CPU.|<ul><li>--CmdListCount Number of cmdlists to create</li><li>--CopyOnly Create copy only cmdlist (0 or 1)</li></ul>|:heavy_check_mark:|:x:|
MemGetIpcHandle|measures time spent in zeMemGetIpcHandle on CPU.|<ul><li>--AmountOfUsmAllocations Amount of USM allocations that are present in system</li><li>--src Placement of the source buffer (Device or Host or Shared or non-USM-mapped or non-USMmisaligned or non-USM4KBAligned or non-USM2MBAligned or non-USMmisaligned-imported or non-USM4KBAligned-imported or non-USM2MBAligned-imported or non-USM)</li></ul>|:heavy_check_mark:|:x:|
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (C) 2022-2023 Intel Corporation
* Copyright (C) 2022-2026 Intel Corporation
*
* SPDX-License-Identifier: MIT
*
Expand All @@ -17,13 +17,15 @@ struct EventTimeArguments : TestCaseArgumentContainer {
EventScopeArgument signalScope;
EventScopeArgument waitScope;
Uint32Argument eventCount;
BooleanArgument counterBasedEvents;

EventTimeArguments()
: useProfiling(*this, "useProfiling", "Event will use profiling"),
hostVisible(*this, "hostVisible", "Event will set host visible flag"),
signalScope(*this, "signal", "Type of signal scope"),
waitScope(*this, "wait", "Type of wait scope"),
eventCount(*this, "eventCount", "Number of events to create") {}
eventCount(*this, "eventCount", "Number of events to create"),
counterBasedEvents(*this, "useCbe", "Use Counter Based Events") {}
};

struct EventTime : TestCase<EventTimeArguments> {
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
/*
* Copyright (C) 2022-2026 Intel Corporation
*
* SPDX-License-Identifier: MIT
*
*/

#pragma once

#include "framework/argument/basic_argument.h"
#include "framework/argument/enum/event_scope_argument.h"
#include "framework/test_case/test_case.h"

struct InOrderWaitAppendArguments : TestCaseArgumentContainer {
BooleanArgument counterBasedEvents;

InOrderWaitAppendArguments() : counterBasedEvents(*this, "counterBasedEvents", "Use Counter Based Events") {}
};

struct InOrderWaitAppend : TestCase<InOrderWaitAppendArguments> {
using TestCase<InOrderWaitAppendArguments>::TestCase;

std::string getTestCaseName() const override {
return "InOrderWaitAppend";
}

std::string getHelp() const override {
return "Measures time spent to append signaled wait on event command to in-order command list.";
}
};
7 changes: 5 additions & 2 deletions source/benchmarks/api_overhead_benchmark/gtest/event_time.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@

[[maybe_unused]] static const inline RegisterTestCase<EventTime> registerTestCase{};

class EventTimeTest : public ::testing::TestWithParam<std::tuple<bool, bool, EventScope, EventScope, uint32_t>> {
class EventTimeTest : public ::testing::TestWithParam<std::tuple<bool, bool, EventScope, EventScope, uint32_t, bool>> {
};

TEST_P(EventTimeTest, Test) {
Expand All @@ -24,6 +24,7 @@ TEST_P(EventTimeTest, Test) {
args.signalScope = std::get<2>(GetParam());
args.waitScope = std::get<3>(GetParam());
args.eventCount = std::get<4>(GetParam());
args.counterBasedEvents = std::get<5>(GetParam());

EventTime test;
test.run(args);
Expand All @@ -37,4 +38,6 @@ INSTANTIATE_TEST_SUITE_P(
::testing::Values(false, true),
::testing::ValuesIn(EventScopeArgument::enumValues),
::testing::ValuesIn(EventScopeArgument::enumValues),
::testing::Values(1000u)));
::testing::Values(1000u),
::testing::Values(false, true)
));
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
/*
* Copyright (C) 2022-2023 Intel Corporation
*
* SPDX-License-Identifier: MIT
*
*/

#include "definitions/in_order_wait_append.h"

#include "framework/test_case/register_test_case.h"

#include <gtest/gtest.h>

[[maybe_unused]] static const inline RegisterTestCase<InOrderWaitAppend> registerTestCase{};

class InOrderWaitAppendTest : public ::testing::TestWithParam<std::tuple<bool>> {
};

TEST_P(InOrderWaitAppendTest, Test) {
InOrderWaitAppendArguments args{};
args.api = Api::L0;
args.counterBasedEvents = std::get<0>(GetParam());
InOrderWaitAppend test;
test.run(args);
}

INSTANTIATE_TEST_SUITE_P(
InOrderWaitAppendTest,
InOrderWaitAppendTest,
::testing::Combine(
::testing::Values(false, true)));
Original file line number Diff line number Diff line change
Expand Up @@ -23,42 +23,84 @@ static TestResult run(const EventTimeArguments &arguments, Statistics &statistic
}

// Setup
LevelZero levelzero;
ExtensionProperties extensionProperties = ExtensionProperties::create();
if(arguments.counterBasedEvents) {
extensionProperties.setCounterBasedCreateFunctions(true);
}
LevelZero levelzero{extensionProperties};
Timer timer;

// Create event if necessary
ze_event_pool_desc_t eventPoolDesc = {ZE_STRUCTURE_TYPE_EVENT_POOL_DESC, nullptr, 0, arguments.eventCount};
auto eventPoolFlags = arguments.hostVisible * ZE_EVENT_POOL_FLAG_HOST_VISIBLE | arguments.useProfiling * ZE_EVENT_POOL_FLAG_KERNEL_TIMESTAMP;
eventPoolDesc.flags = eventPoolFlags;

// Traditional event descriptor (non-counter-based)
ze_event_desc_t eventDesc = {ZE_STRUCTURE_TYPE_EVENT_DESC, nullptr, 0, 0, 0};
if (arguments.signalScope == EventScope::scopeSubDevice) {
eventDesc.signal = ZE_EVENT_SCOPE_FLAG_SUBDEVICE;
} else if (arguments.signalScope == EventScope::scopeDevice) {
eventDesc.signal = ZE_EVENT_SCOPE_FLAG_DEVICE;
} else if (arguments.signalScope == EventScope::scopeHost) {
eventDesc.signal = ZE_EVENT_SCOPE_FLAG_HOST;
}
if (!arguments.counterBasedEvents) {
if (arguments.signalScope == EventScope::scopeSubDevice) {
eventDesc.signal = ZE_EVENT_SCOPE_FLAG_SUBDEVICE;
} else if (arguments.signalScope == EventScope::scopeDevice) {
eventDesc.signal = ZE_EVENT_SCOPE_FLAG_DEVICE;
} else if (arguments.signalScope == EventScope::scopeHost) {
eventDesc.signal = ZE_EVENT_SCOPE_FLAG_HOST;
}

if (arguments.waitScope == EventScope::scopeSubDevice) {
eventDesc.wait = ZE_EVENT_SCOPE_FLAG_SUBDEVICE;
} else if (arguments.signalScope == EventScope::scopeDevice) {
eventDesc.wait = ZE_EVENT_SCOPE_FLAG_DEVICE;
} else if (arguments.signalScope == EventScope::scopeHost) {
eventDesc.wait = ZE_EVENT_SCOPE_FLAG_HOST;
if (arguments.waitScope == EventScope::scopeSubDevice) {
eventDesc.wait = ZE_EVENT_SCOPE_FLAG_SUBDEVICE;
} else if (arguments.waitScope == EventScope::scopeDevice) {
eventDesc.wait = ZE_EVENT_SCOPE_FLAG_DEVICE;
} else if (arguments.waitScope == EventScope::scopeHost) {
eventDesc.wait = ZE_EVENT_SCOPE_FLAG_HOST;
}
}

ze_event_pool_handle_t eventPool{};
// Counter-based event descriptor
zex_counter_based_event_desc_t eventDescCBE{ZEX_STRUCTURE_COUNTER_BASED_EVENT_DESC};
if (arguments.counterBasedEvents) {

if (levelzero.counterBasedEventCreate2 == nullptr) {
return TestResult::DeviceNotCapable;
}

eventDescCBE.flags = ZEX_COUNTER_BASED_EVENT_FLAG_IMMEDIATE |
arguments.hostVisible * ZEX_COUNTER_BASED_EVENT_FLAG_HOST_VISIBLE |
arguments.useProfiling * ZEX_COUNTER_BASED_EVENT_FLAG_KERNEL_TIMESTAMP;

if (arguments.signalScope == EventScope::scopeSubDevice) {
eventDescCBE.signalScope = ZE_EVENT_SCOPE_FLAG_SUBDEVICE;
} else if (arguments.signalScope == EventScope::scopeDevice) {
eventDescCBE.signalScope = ZE_EVENT_SCOPE_FLAG_DEVICE;
} else if (arguments.signalScope == EventScope::scopeHost) {
eventDescCBE.signalScope = ZE_EVENT_SCOPE_FLAG_HOST;
}

if (arguments.waitScope == EventScope::scopeSubDevice) {
eventDescCBE.waitScope = ZE_EVENT_SCOPE_FLAG_SUBDEVICE;
} else if (arguments.waitScope == EventScope::scopeDevice) {
eventDescCBE.waitScope = ZE_EVENT_SCOPE_FLAG_DEVICE;
} else if (arguments.waitScope == EventScope::scopeHost) {
eventDescCBE.waitScope = ZE_EVENT_SCOPE_FLAG_HOST;
}
}
std::vector<ze_event_handle_t> events(arguments.eventCount);
ASSERT_ZE_RESULT_SUCCESS(zeEventPoolCreate(levelzero.context, &eventPoolDesc, 0, nullptr, &eventPool));

ze_event_pool_handle_t eventPool{}; // only for non-CBE
if (!arguments.counterBasedEvents) {
// Create event if necessary
ze_event_pool_desc_t eventPoolDesc = {ZE_STRUCTURE_TYPE_EVENT_POOL_DESC, nullptr, 0, arguments.eventCount};
auto eventPoolFlags = arguments.hostVisible * ZE_EVENT_POOL_FLAG_HOST_VISIBLE | arguments.useProfiling * ZE_EVENT_POOL_FLAG_KERNEL_TIMESTAMP;
eventPoolDesc.flags = eventPoolFlags;
ASSERT_ZE_RESULT_SUCCESS(zeEventPoolCreate(levelzero.context, &eventPoolDesc, 0, nullptr, &eventPool));
}

// Benchmark
for (auto i = 0u; i < arguments.iterations; i++) {

timer.measureStart();
for (auto j = 0u; j < arguments.eventCount; ++j) {
eventDesc.index = j;
ASSERT_ZE_RESULT_SUCCESS(zeEventCreate(eventPool, &eventDesc, &events[j]));
if (!arguments.counterBasedEvents) {
eventDesc.index = j;
ASSERT_ZE_RESULT_SUCCESS(zeEventCreate(eventPool, &eventDesc, &events[j]));
} else {
ASSERT_ZE_RESULT_SUCCESS(levelzero.counterBasedEventCreate2(levelzero.context, levelzero.device, &eventDescCBE, &events[j]));
}
}
timer.measureEnd();
for (auto j = 0u; j < arguments.eventCount; ++j) {
Expand All @@ -67,7 +109,9 @@ static TestResult run(const EventTimeArguments &arguments, Statistics &statistic
statistics.pushValue(timer.get() / arguments.eventCount, typeSelector.getUnit(), typeSelector.getType());
}

ASSERT_ZE_RESULT_SUCCESS(zeEventPoolDestroy(eventPool));
if (!arguments.counterBasedEvents) {
ASSERT_ZE_RESULT_SUCCESS(zeEventPoolDestroy(eventPool));
}
return TestResult::Success;
}

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
/*
* Copyright (C) 2022-2026 Intel Corporation
*
* SPDX-License-Identifier: MIT
*
*/

#include "framework/l0/levelzero.h"
#include "framework/l0/utility/usm_helper.h"
#include "framework/test_case/register_test_case.h"
#include "framework/utility/file_helper.h"
#include "framework/utility/timer.h"

#include "definitions/in_order_wait_append.h"

#include <gtest/gtest.h>

static TestResult run(const InOrderWaitAppendArguments &arguments, Statistics &statistics) {
MeasurementFields typeSelector(MeasurementUnit::Microseconds, MeasurementType::Cpu);

if (isNoopRun()) {
statistics.pushUnitAndType(typeSelector.getUnit(), typeSelector.getType());
return TestResult::Nooped;
}

// Setup
ExtensionProperties extensionProperties = ExtensionProperties::create();
if (arguments.counterBasedEvents) {
extensionProperties.setCounterBasedCreateFunctions(true);
}
LevelZero levelzero{extensionProperties};
Timer timer;

// Create in-order immediate command list
ze_command_queue_desc_t commandQueueDesc = {ZE_STRUCTURE_TYPE_COMMAND_QUEUE_DESC};
commandQueueDesc.flags = ZE_COMMAND_QUEUE_FLAG_IN_ORDER;
ze_command_list_handle_t commandList{};
ASSERT_ZE_RESULT_SUCCESS(zeCommandListCreateImmediate(levelzero.context, levelzero.device, &commandQueueDesc, &commandList));

ze_event_handle_t event{};
ze_event_pool_handle_t eventPool{};

// Create event (either regular or counter-based)
if (!arguments.counterBasedEvents) {
ze_event_pool_desc_t eventPoolDesc = {ZE_STRUCTURE_TYPE_EVENT_POOL_DESC, nullptr, 0, 1u};
ASSERT_ZE_RESULT_SUCCESS(zeEventPoolCreate(levelzero.context, &eventPoolDesc, 0, nullptr, &eventPool));
ze_event_desc_t eventDesc = {ZE_STRUCTURE_TYPE_EVENT_DESC};
eventDesc.index = 0;
ASSERT_ZE_RESULT_SUCCESS(zeEventCreate(eventPool, &eventDesc, &event));
} else {
zex_counter_based_event_desc_t eventDescCBE{ZEX_STRUCTURE_COUNTER_BASED_EVENT_DESC};
eventDescCBE.stype = ZEX_STRUCTURE_COUNTER_BASED_EVENT_DESC;
eventDescCBE.pNext = nullptr;
ASSERT_ZE_RESULT_SUCCESS(levelzero.counterBasedEventCreate2(levelzero.context, levelzero.device, &eventDescCBE, &event));
}

// Create dummy buffers for memcpy operations (to have actual work)
constexpr size_t bufferSize = 64;
void *srcBuffer{}, *dstBuffer{};
ASSERT_ZE_RESULT_SUCCESS(UsmHelper::allocate(UsmMemoryPlacement::Device, levelzero, bufferSize, &srcBuffer));
ASSERT_ZE_RESULT_SUCCESS(UsmHelper::allocate(UsmMemoryPlacement::Device, levelzero, bufferSize, &dstBuffer));

// Benchmark
if (!arguments.counterBasedEvents) {
ASSERT_ZE_RESULT_SUCCESS(zeEventHostReset(event));
}
ASSERT_ZE_RESULT_SUCCESS(zeCommandListAppendMemoryCopy(commandList, dstBuffer, srcBuffer, bufferSize, event, 0, nullptr));
ASSERT_ZE_RESULT_SUCCESS(zeEventHostSynchronize(event, std::numeric_limits<uint64_t>::max())); //wait untile the event is signaled

for (auto i = 0u; i < arguments.iterations; i++) {
timer.measureStart();
ASSERT_ZE_RESULT_SUCCESS(zeCommandListAppendWaitOnEvents(commandList, 1, &event)); //wait on signaled event, should be immediate
timer.measureEnd();

statistics.pushValue(timer.get(), typeSelector.getUnit(), typeSelector.getType());
}

// Cleanup
ASSERT_ZE_RESULT_SUCCESS(UsmHelper::deallocate(UsmMemoryPlacement::Device, levelzero, srcBuffer));
ASSERT_ZE_RESULT_SUCCESS(UsmHelper::deallocate(UsmMemoryPlacement::Device, levelzero, dstBuffer));
ASSERT_ZE_RESULT_SUCCESS(zeEventDestroy(event));
if (eventPool) {
ASSERT_ZE_RESULT_SUCCESS(zeEventPoolDestroy(eventPool));
}
ASSERT_ZE_RESULT_SUCCESS(zeCommandListDestroy(commandList));

return TestResult::Success;
}

static RegisterTestCaseImplementation<InOrderWaitAppend> registerTestCase(run, Api::L0);