
perf_tests: add linux_perf_event::user_cpu_cycles_retired #2261


@bhalevy (Member) commented May 22, 2024

This series adds a new `linux_perf_event`, `user_cpu_cycles_retired`,
and uses it in the general perf tests framework.

Example output from `tests/perf/allocator_perf --blocked-reactor-notify-ms=1000000`:

```
single run iterations:    0
single run duration:      1.000s
number of runs:           5
number of cores:          32
random seed:              3842059098

test                                                                iterations      median         mad         min         max      allocs       tasks        inst      cycles
alloc_bench.malloc_only                                               63527000     8.893ns     0.255ns     8.632ns     9.497ns       1.000       0.000        43.3        13.6
alloc_bench.free_only                                                 65545000     5.820ns     0.038ns     5.765ns     6.009ns       0.000       0.000        27.3         7.8
alloc_bench.malloc_free                                               51283000    14.776ns     0.145ns    14.593ns    14.993ns       1.000       0.000        70.6        22.0
alloc_bench.op_new_only                                               68012000     8.630ns     0.126ns     8.445ns     8.836ns       1.000       0.000        41.3        13.7
alloc_bench.op_delete_only                                            68054000     5.877ns     0.050ns     5.797ns     5.927ns       0.000       0.000        27.3         7.9
alloc_bench.op_new_delete                                             52444000    14.442ns     0.082ns    14.132ns    14.526ns       1.000       0.000        68.6        21.3
alloc_bench.new_array_only                                            68676000     8.692ns     0.007ns     8.651ns     8.748ns       1.000       0.000        42.3        13.7
alloc_bench.delete_array_only                                         69221000     5.765ns     0.028ns     5.727ns     5.864ns       0.000       0.000        29.3         7.6
alloc_bench.array_new_delete                                          50895000    14.952ns     0.071ns    14.684ns    15.023ns       1.000       0.000        71.6        21.3
alloc_bench.alloc_only_large                                          20392000    28.326ns     0.090ns    27.729ns    28.481ns       1.000       0.000       201.2        53.5
alloc_bench.free_only_large                                           20668000    20.227ns     0.269ns    19.434ns    20.497ns       0.000       0.000       137.2        36.6
alloc_bench.alloc_free_large                                          18993000    47.542ns     0.230ns    47.312ns    48.205ns       1.000       0.000       338.3        90.1
alloc_bench.single_alloc_and_free_small_many                          10790191    92.200ns     0.203ns    91.839ns    92.479ns      10.000       0.000       710.0       192.1
alloc_bench.single_alloc_and_free_small_many_cross_page               21399784    47.149ns     1.048ns    45.682ns    48.197ns       5.000       0.000       360.0        98.1
alloc_bench.single_alloc_and_free_small_many_cross_page_alloc_more      557930     1.772us     6.331ns     1.751us     1.785us     101.000       0.000     12837.0      3704.1
random_sampling.exp_dist                                              41050000    24.108ns     0.039ns    23.847ns    24.218ns       0.000       0.000       120.8        50.5
random_sampling.geo_dist                                              37260000    26.713ns     0.044ns    26.564ns    26.896ns       0.000       0.000       129.8        55.9
```

bhalevy added 3 commits May 22, 2024 14:04
There's no reason to count user instructions when idle,
so set the appropriate flag (`exclude_idle`) for that.

https://github.com/torvalds/linux/blob/29c73fc794c83505066ee6db893b2a83ac5fac63/include/uapi/linux/perf_event.h#L421
```
				exclude_idle   :  1, /* don't count when idle */
```

This doesn't really fix anything in practice since
the perf benchmarks using the event, like scylla
perf-simple-query, are never idle.

Signed-off-by: Benny Halevy <[email protected]>

Add a linux perf event to count `PERF_COUNT_HW_CPU_CYCLES`.
It is intended to be used by scylla perf-simple-query alongside
user_instructions_retired, to measure performance in a way that,
unlike instruction counts alone, also reflects other factors
such as caching (both instruction cache and data cache).

Note that cycle counts are typically more sensitive to background
noise than instruction counts: if other processes run on the same
test machine, or if its CPU/bus frequencies change dynamically for
whatever reason, expect the cycle counts to fluctuate.

Signed-off-by: Benny Halevy <[email protected]>

Make use of the recently added `cpu_cycles_retired_counter`.

Signed-off-by: Benny Halevy <[email protected]>
@bhalevy (Member Author) commented May 26, 2024

@avikivity can you please review/merge?

```diff
@@ -90,5 +90,6 @@ linux_perf_event::user_instructions_retired() {
 .disabled = 1,
 .exclude_kernel = 1,
 .exclude_hv = 1,
+.exclude_idle = 1,
```

Member

I don't think this is meaningful.

       exclude_idle
              If set, don't count when the CPU is running the idle task.  While you can currently enable this for any event type, it is ignored for all but software events.

But we're already tied to a thread, so we aren't executing the idle task.

```diff
@@ -45,22 +45,25 @@ public:
 uint64_t allocations = 0;
 uint64_t tasks_executed = 0;
```

Member

Annoying that we get 3-4 IPC in benchmarks but <1 in real life.

Member Author

agreed

@avikivity closed this in 7cd0cab May 26, 2024