perf_tests: add linux_perf_event::user_cpu_cycles_retired #2261
There's no reason to count user instructions when idle, so use the appropriate flag for that.

https://github.com/torvalds/linux/blob/29c73fc794c83505066ee6db893b2a83ac5fac63/include/uapi/linux/perf_event.h#L421
```
exclude_idle   :  1, /* don't count when idle */
```

This doesn't really fix anything in practice, since the perf benchmarks using the event, like scylla perf-simple-query, are never idle.

Signed-off-by: Benny Halevy <[email protected]>
Add a linux perf event to sample `PERF_COUNT_HW_CPU_CYCLES`.

To be used by scylla perf-simple-query in addition to user_instructions_retired, to measure performance in a way that also takes into account other factors beyond cpu cycles, like caching (both instruction cache and data cache).

Note that cycles are typically more sensitive to background noise, so if other processes run on the same test machine, or if its cpu/bus frequencies change dynamically for whatever reason, you'll see fluctuations in cycles.

Signed-off-by: Benny Halevy <[email protected]>
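For readers unfamiliar with the underlying kernel interface, here is a minimal sketch of how such a counter could be set up with the raw `perf_event_open` system call. It is not the Seastar implementation: the helper names `make_user_cycles_attr`, `open_user_cycles_counter`, `enable_counter` and `read_counter` are hypothetical, and the attribute values simply mirror the fields visible in the `user_instructions_retired()` hunk, with the config switched to `PERF_COUNT_HW_CPU_CYCLES`.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Seastar code): attribute block for counting
// CPU cycles spent in user space by the calling thread.
inline perf_event_attr make_user_cycles_attr() {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;        // created stopped; enabled explicitly later
    attr.exclude_kernel = 1;  // do not count kernel-mode cycles
    attr.exclude_hv = 1;      // do not count hypervisor cycles
    return attr;
}

// Opens the counter for the calling thread on any CPU.
// Returns a file descriptor, or -1 on failure (for example when the
// kernel's perf_event_paranoid setting or a container forbids it).
inline int open_user_cycles_counter() {
    perf_event_attr attr = make_user_cycles_attr();
    // Arguments: attr, pid = 0 (this thread), cpu = -1 (any),
    // group_fd = -1 (no group), flags = 0.
    return static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL));
}

// Starts counting on a descriptor returned by open_user_cycles_counter().
inline bool enable_counter(int fd) {
    return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == 0;
}

// Reads the current 64-bit counter value; returns false on a short read.
inline bool read_counter(int fd, std::uint64_t& value) {
    return read(fd, &value, sizeof(value)) == static_cast<ssize_t>(sizeof(value));
}
```

A caller would open the counter once, enable it, and read it before and after the measured region; the difference between the two reads is the number of user-space cycles consumed by that region.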
Make use of the recently added `cpu_cycles_retired_counter`.

Example output from tests/perf/allocator_perf --blocked-reactor-notify-ms=1000000:
```
single run iterations:    0
single run duration:      1.000s
number of runs:           5
number of cores:          32
random seed:              3842059098

test                                                                iterations     median      mad        min        max   allocs  tasks     inst  cycles
alloc_bench.malloc_only                                               63527000    8.893ns  0.255ns    8.632ns    9.497ns    1.000  0.000     43.3    13.6
alloc_bench.free_only                                                 65545000    5.820ns  0.038ns    5.765ns    6.009ns    0.000  0.000     27.3     7.8
alloc_bench.malloc_free                                               51283000   14.776ns  0.145ns   14.593ns   14.993ns    1.000  0.000     70.6    22.0
alloc_bench.op_new_only                                               68012000    8.630ns  0.126ns    8.445ns    8.836ns    1.000  0.000     41.3    13.7
alloc_bench.op_delete_only                                            68054000    5.877ns  0.050ns    5.797ns    5.927ns    0.000  0.000     27.3     7.9
alloc_bench.op_new_delete                                             52444000   14.442ns  0.082ns   14.132ns   14.526ns    1.000  0.000     68.6    21.3
alloc_bench.new_array_only                                            68676000    8.692ns  0.007ns    8.651ns    8.748ns    1.000  0.000     42.3    13.7
alloc_bench.delete_array_only                                         69221000    5.765ns  0.028ns    5.727ns    5.864ns    0.000  0.000     29.3     7.6
alloc_bench.array_new_delete                                          50895000   14.952ns  0.071ns   14.684ns   15.023ns    1.000  0.000     71.6    21.3
alloc_bench.alloc_only_large                                          20392000   28.326ns  0.090ns   27.729ns   28.481ns    1.000  0.000    201.2    53.5
alloc_bench.free_only_large                                           20668000   20.227ns  0.269ns   19.434ns   20.497ns    0.000  0.000    137.2    36.6
alloc_bench.alloc_free_large                                          18993000   47.542ns  0.230ns   47.312ns   48.205ns    1.000  0.000    338.3    90.1
alloc_bench.single_alloc_and_free_small_many                          10790191   92.200ns  0.203ns   91.839ns   92.479ns   10.000  0.000    710.0   192.1
alloc_bench.single_alloc_and_free_small_many_cross_page               21399784   47.149ns  1.048ns   45.682ns   48.197ns    5.000  0.000    360.0    98.1
alloc_bench.single_alloc_and_free_small_many_cross_page_alloc_more      557930    1.772us  6.331ns    1.751us    1.785us  101.000  0.000  12837.0  3704.1
random_sampling.exp_dist                                              41050000   24.108ns  0.039ns   23.847ns   24.218ns    0.000  0.000    120.8    50.5
random_sampling.geo_dist                                              37260000   26.713ns  0.044ns   26.564ns   26.896ns    0.000  0.000    129.8    55.9
```

Signed-off-by: Benny Halevy <[email protected]>
@avikivity can you please review/merge?
@@ -90,5 +90,6 @@ linux_perf_event::user_instructions_retired() {
             .disabled = 1,
             .exclude_kernel = 1,
             .exclude_hv = 1,
+            .exclude_idle = 1,
I don't think this is meaningful.
exclude_idle
If set, don't count when the CPU is running the idle task. While you can currently enable this for any event type, it is ignored for all but software events.
But, we're already tied to a thread, so we aren't executing the idle task.
@@ -45,22 +45,25 @@ public:
     uint64_t allocations = 0;
     uint64_t tasks_executed = 0;
Annoying that we get 3-4 IPC in benchmarks but <1 in real life.
agreed
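For reference, the IPC (instructions per cycle) figure discussed here is the ratio of the `inst` and `cycles` columns in the example output. A minimal sketch of that arithmetic follows; `instructions_per_cycle` is a hypothetical helper for illustration, not something the perf tests framework is known to provide.

```cpp
#include <cassert>

// Hypothetical helper (not part of the perf tests framework):
// instructions per cycle, computed from the per-iteration "inst" and
// "cycles" values reported by the benchmark.
inline double instructions_per_cycle(double instructions, double cycles) {
    assert(cycles > 0.0 && "cycle count must be positive");
    return instructions / cycles;
}
```

For the `alloc_bench.malloc_only` row this is 43.3 / 13.6, roughly 3.2, and for `alloc_bench.alloc_free_large` it is 338.3 / 90.1, roughly 3.75, which matches the 3-4 range mentioned in the comment.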
This series adds a new linux_perf_event: user_cpu_cycles_retired
and makes use of it in the general perf tests framework.
Example output for `tests/perf/allocator_perf --blocked-reactor-notify-ms=1000000`: