Use GPUToolbox.jl #538
5101d90 to f856cb9
f856cb9 to 6982cd8
6982cd8 to 930663c
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files: Coverage Diff
Metric | main | #538 | +/- |
---|---|---|---|
Coverage | 71.04% | 74.88% | +3.84% |
Files | 36 | 57 | +21 |
Lines | 1143 | 2708 | +1565 |
Hits | 812 | 2028 | +1216 |
Misses | 331 | 680 | +349 |
Metal Benchmarks
Benchmark suite | Current: 930663c | Previous: bec8c71 | Ratio |
---|---|---|---|
private array/construct | 24524.25 ns | 24993.166666666668 ns | 0.98 |
private array/broadcast | 457958 ns | 465708 ns | 0.98 |
private array/random/randn/Float32 | 812625 ns | 758541.5 ns | 1.07 |
private array/random/randn!/Float32 | 636083 ns | 628833.5 ns | 1.01 |
private array/random/rand!/Int64 | 579229.5 ns | 561125 ns | 1.03 |
private array/random/rand!/Float32 | 592041 ns | 594167 ns | 1.00 |
private array/random/rand/Int64 | 769854.5 ns | 782979.5 ns | 0.98 |
private array/random/rand/Float32 | 607312.5 ns | 601042 ns | 1.01 |
private array/copyto!/gpu_to_gpu | 656583 ns | 666666 ns | 0.98 |
private array/copyto!/cpu_to_gpu | 708083 ns | 617709 ns | 1.15 |
private array/copyto!/gpu_to_cpu | 813333 ns | 504083 ns | 1.61 |
private array/accumulate/1d | 1344041 ns | 1350208 ns | 1.00 |
private array/accumulate/2d | 1476875 ns | 1390417 ns | 1.06 |
private array/iteration/findall/int | 2091500 ns | 2066458 ns | 1.01 |
private array/iteration/findall/bool | 1848166.5 ns | 1826625 ns | 1.01 |
private array/iteration/findfirst/int | 1696375 ns | 1694583 ns | 1.00 |
private array/iteration/findfirst/bool | 1657333.5 ns | 1661917 ns | 1.00 |
private array/iteration/scalar | 3903458 ns | 3825541.5 ns | 1.02 |
private array/iteration/logical | 3174750 ns | 3192917 ns | 0.99 |
private array/iteration/findmin/1d | 1748729.5 ns | 1768354 ns | 0.99 |
private array/iteration/findmin/2d | 1353208 ns | 1347270.5 ns | 1.00 |
private array/reductions/reduce/1d | 1031833.5 ns | 1037417 ns | 0.99 |
private array/reductions/reduce/2d | 668250 ns | 664166 ns | 1.01 |
private array/reductions/mapreduce/1d | 1044541.5 ns | 1038687.5 ns | 1.01 |
private array/reductions/mapreduce/2d | 661334 ns | 667729.5 ns | 0.99 |
private array/permutedims/4d | 2543770.5 ns | 2537291.5 ns | 1.00 |
private array/permutedims/2d | 1026375 ns | 1025229 ns | 1.00 |
private array/permutedims/3d | 1591645.5 ns | 1574209 ns | 1.01 |
private array/copy | 549166.5 ns | 569958 ns | 0.96 |
latency/precompile | 9100731166 ns | 9097851875 ns | 1.00 |
latency/ttfp | 3692994125 ns | 3676407792 ns | 1.00 |
latency/import | 1252891896 ns | 1241771750 ns | 1.01 |
integration/metaldevrt | 713271 ns | 701916 ns | 1.02 |
integration/byval/slices=1 | 1542375 ns | 1647333.5 ns | 0.94 |
integration/byval/slices=3 | 9548084 ns | 10016083 ns | 0.95 |
integration/byval/reference | 1580687.5 ns | 1621729 ns | 0.97 |
integration/byval/slices=2 | 2599250 ns | 2740896 ns | 0.95 |
kernel/indexing | 453334 ns | 467166.5 ns | 0.97 |
kernel/indexing_checked | 449250 ns | 472958 ns | 0.95 |
kernel/launch | 9708.333333333334 ns | 8042 ns | 1.21 |
metal/synchronization/stream | 14917 ns | 14166 ns | 1.05 |
metal/synchronization/context | 15167 ns | 15292 ns | 0.99 |
shared array/construct | 23263.833333333332 ns | 24368 ns | 0.95 |
shared array/broadcast | 466083 ns | 462917 ns | 1.01 |
shared array/random/randn/Float32 | 807833 ns | 765792 ns | 1.05 |
shared array/random/randn!/Float32 | 633750 ns | 636834 ns | 1.00 |
shared array/random/rand!/Int64 | 573917 ns | 575708 ns | 1.00 |
shared array/random/rand!/Float32 | 586917 ns | 597854.5 ns | 0.98 |
shared array/random/rand/Int64 | 784166.5 ns | 791979 ns | 0.99 |
shared array/random/rand/Float32 | 646416 ns | 628937.5 ns | 1.03 |
shared array/copyto!/gpu_to_gpu | 80625 ns | 82917 ns | 0.97 |
shared array/copyto!/cpu_to_gpu | 84458 ns | 82542 ns | 1.02 |
shared array/copyto!/gpu_to_cpu | 76292 ns | 81958 ns | 0.93 |
shared array/accumulate/1d | 1370042 ns | 1344458 ns | 1.02 |
shared array/accumulate/2d | 1394625.5 ns | 1398542 ns | 1.00 |
shared array/iteration/findall/int | 1847500 ns | 1827729.5 ns | 1.01 |
shared array/iteration/findall/bool | 1603167 ns | 1595416 ns | 1.00 |
shared array/iteration/findfirst/int | 1429542 ns | 1402854 ns | 1.02 |
shared array/iteration/findfirst/bool | 1367042 ns | 1374000 ns | 0.99 |
shared array/iteration/scalar | 160542 ns | 152895.5 ns | 1.05 |
shared array/iteration/logical | 2975229 ns | 2984625 ns | 1.00 |
shared array/iteration/findmin/1d | 1462875 ns | 1465333 ns | 1.00 |
shared array/iteration/findmin/2d | 1362125 ns | 1383958 ns | 0.98 |
shared array/reductions/reduce/1d | 724708 ns | 735375 ns | 0.99 |
shared array/reductions/reduce/2d | 672750 ns | 675500 ns | 1.00 |
shared array/reductions/mapreduce/1d | 744167 ns | 739416.5 ns | 1.01 |
shared array/reductions/mapreduce/2d | 665375 ns | 678250 ns | 0.98 |
shared array/permutedims/4d | 2547042 ns | 2409542 ns | 1.06 |
shared array/permutedims/2d | 1025375 ns | 1023021 ns | 1.00 |
shared array/permutedims/3d | 1590250 ns | 1582666 ns | 1.00 |
shared array/copy | 247167 ns | 241792 ns | 1.02 |
This comment was automatically generated by workflow using github-action-benchmark.
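
For readers skimming the table, the Ratio column appears to be the current timing divided by the previous one, so values above 1.00 mean the current commit was slower on that benchmark. The sketch below recomputes the ratio for a few rows copied from the table and lists the ones above a cutoff. The helper names and the 1.10 cutoff are hypothetical, chosen only for illustration; they are not settings taken from the benchmark workflow.

```python
# Rows copied from the benchmark table: (name, current_ns, previous_ns).
ROWS = [
    ("private array/copyto!/gpu_to_cpu", 813333, 504083),
    ("private array/copyto!/cpu_to_gpu", 708083, 617709),
    ("kernel/launch", 9708.333333333334, 8042),
    ("shared array/copyto!/gpu_to_cpu", 76292, 81958),
]


def ratio(current_ns, previous_ns):
    """Current time over previous time; above 1.0 means slower."""
    return current_ns / previous_ns


def slower_than(rows, threshold):
    """Return (name, rounded ratio) for rows whose ratio exceeds threshold.

    `threshold` is an illustrative cutoff, not a value from the workflow.
    """
    return [
        (name, round(ratio(cur, prev), 2))
        for name, cur, prev in rows
        if ratio(cur, prev) > threshold
    ]


flagged = slower_than(ROWS, threshold=1.10)
for name, r in flagged:
    print(f"{name}: {r:.2f}")
```

Single-run timings like these can be noisy, so a ratio such as 1.61 for `private array/copyto!/gpu_to_cpu` is worth a re-run before treating it as a regression.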
I'll merge this one since the only code from GPUToolbox that this uses is |
Should be gtg if tests pass
TODO:
[sources]
entry