Use GPUToolbox.jl #538

Merged 1 commit on Feb 15, 2025

Conversation

Contributor

@christiangnrd christiangnrd commented Feb 7, 2025

Should be good to go if tests pass.

TODO:

  • Remove [sources] entry
  • Add compat entry
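
For context, both TODO items refer to the package's `Project.toml`. A rough sketch of what such a change typically looks like follows; the URL, revision, and version bound are illustrative placeholders and are not taken from this PR's diff. A temporary `[sources]` entry lets Pkg resolve a dependency from a Git source, and it is usually dropped in favor of a `[compat]` bound once the dependency can be resolved from a registry.

```toml
# Before: dependency resolved from a Git source (placeholder URL and rev).
[sources]
GPUToolbox = {url = "https://github.com/JuliaGPU/GPUToolbox.jl", rev = "main"}

# After: the [sources] entry above is removed and a version bound is added.
[compat]
GPUToolbox = "0.1"  # placeholder bound, an assumption for illustration
```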

@christiangnrd christiangnrd marked this pull request as draft February 7, 2025 17:00
@christiangnrd christiangnrd changed the title Use GPUUtils.jl Use GPUToolbox.jl Feb 11, 2025
@christiangnrd christiangnrd marked this pull request as ready for review February 15, 2025 02:33

codecov bot commented Feb 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.88%. Comparing base (52d7056) to head (930663c).
Report is 421 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #538      +/-   ##
==========================================
+ Coverage   71.04%   74.88%   +3.84%     
==========================================
  Files          36       57      +21     
  Lines        1143     2708    +1565     
==========================================
+ Hits          812     2028    +1216     
- Misses        331      680     +349     

Contributor

@github-actions github-actions bot left a comment

Metal Benchmarks

| Benchmark suite | Current: 930663c | Previous: bec8c71 | Ratio |
|---|---|---|---|
| private array/construct | 24524.25 ns | 24993.166666666668 ns | 0.98 |
| private array/broadcast | 457958 ns | 465708 ns | 0.98 |
| private array/random/randn/Float32 | 812625 ns | 758541.5 ns | 1.07 |
| private array/random/randn!/Float32 | 636083 ns | 628833.5 ns | 1.01 |
| private array/random/rand!/Int64 | 579229.5 ns | 561125 ns | 1.03 |
| private array/random/rand!/Float32 | 592041 ns | 594167 ns | 1.00 |
| private array/random/rand/Int64 | 769854.5 ns | 782979.5 ns | 0.98 |
| private array/random/rand/Float32 | 607312.5 ns | 601042 ns | 1.01 |
| private array/copyto!/gpu_to_gpu | 656583 ns | 666666 ns | 0.98 |
| private array/copyto!/cpu_to_gpu | 708083 ns | 617709 ns | 1.15 |
| private array/copyto!/gpu_to_cpu | 813333 ns | 504083 ns | 1.61 |
| private array/accumulate/1d | 1344041 ns | 1350208 ns | 1.00 |
| private array/accumulate/2d | 1476875 ns | 1390417 ns | 1.06 |
| private array/iteration/findall/int | 2091500 ns | 2066458 ns | 1.01 |
| private array/iteration/findall/bool | 1848166.5 ns | 1826625 ns | 1.01 |
| private array/iteration/findfirst/int | 1696375 ns | 1694583 ns | 1.00 |
| private array/iteration/findfirst/bool | 1657333.5 ns | 1661917 ns | 1.00 |
| private array/iteration/scalar | 3903458 ns | 3825541.5 ns | 1.02 |
| private array/iteration/logical | 3174750 ns | 3192917 ns | 0.99 |
| private array/iteration/findmin/1d | 1748729.5 ns | 1768354 ns | 0.99 |
| private array/iteration/findmin/2d | 1353208 ns | 1347270.5 ns | 1.00 |
| private array/reductions/reduce/1d | 1031833.5 ns | 1037417 ns | 0.99 |
| private array/reductions/reduce/2d | 668250 ns | 664166 ns | 1.01 |
| private array/reductions/mapreduce/1d | 1044541.5 ns | 1038687.5 ns | 1.01 |
| private array/reductions/mapreduce/2d | 661334 ns | 667729.5 ns | 0.99 |
| private array/permutedims/4d | 2543770.5 ns | 2537291.5 ns | 1.00 |
| private array/permutedims/2d | 1026375 ns | 1025229 ns | 1.00 |
| private array/permutedims/3d | 1591645.5 ns | 1574209 ns | 1.01 |
| private array/copy | 549166.5 ns | 569958 ns | 0.96 |
| latency/precompile | 9100731166 ns | 9097851875 ns | 1.00 |
| latency/ttfp | 3692994125 ns | 3676407792 ns | 1.00 |
| latency/import | 1252891896 ns | 1241771750 ns | 1.01 |
| integration/metaldevrt | 713271 ns | 701916 ns | 1.02 |
| integration/byval/slices=1 | 1542375 ns | 1647333.5 ns | 0.94 |
| integration/byval/slices=3 | 9548084 ns | 10016083 ns | 0.95 |
| integration/byval/reference | 1580687.5 ns | 1621729 ns | 0.97 |
| integration/byval/slices=2 | 2599250 ns | 2740896 ns | 0.95 |
| kernel/indexing | 453334 ns | 467166.5 ns | 0.97 |
| kernel/indexing_checked | 449250 ns | 472958 ns | 0.95 |
| kernel/launch | 9708.333333333334 ns | 8042 ns | 1.21 |
| metal/synchronization/stream | 14917 ns | 14166 ns | 1.05 |
| metal/synchronization/context | 15167 ns | 15292 ns | 0.99 |
| shared array/construct | 23263.833333333332 ns | 24368 ns | 0.95 |
| shared array/broadcast | 466083 ns | 462917 ns | 1.01 |
| shared array/random/randn/Float32 | 807833 ns | 765792 ns | 1.05 |
| shared array/random/randn!/Float32 | 633750 ns | 636834 ns | 1.00 |
| shared array/random/rand!/Int64 | 573917 ns | 575708 ns | 1.00 |
| shared array/random/rand!/Float32 | 586917 ns | 597854.5 ns | 0.98 |
| shared array/random/rand/Int64 | 784166.5 ns | 791979 ns | 0.99 |
| shared array/random/rand/Float32 | 646416 ns | 628937.5 ns | 1.03 |
| shared array/copyto!/gpu_to_gpu | 80625 ns | 82917 ns | 0.97 |
| shared array/copyto!/cpu_to_gpu | 84458 ns | 82542 ns | 1.02 |
| shared array/copyto!/gpu_to_cpu | 76292 ns | 81958 ns | 0.93 |
| shared array/accumulate/1d | 1370042 ns | 1344458 ns | 1.02 |
| shared array/accumulate/2d | 1394625.5 ns | 1398542 ns | 1.00 |
| shared array/iteration/findall/int | 1847500 ns | 1827729.5 ns | 1.01 |
| shared array/iteration/findall/bool | 1603167 ns | 1595416 ns | 1.00 |
| shared array/iteration/findfirst/int | 1429542 ns | 1402854 ns | 1.02 |
| shared array/iteration/findfirst/bool | 1367042 ns | 1374000 ns | 0.99 |
| shared array/iteration/scalar | 160542 ns | 152895.5 ns | 1.05 |
| shared array/iteration/logical | 2975229 ns | 2984625 ns | 1.00 |
| shared array/iteration/findmin/1d | 1462875 ns | 1465333 ns | 1.00 |
| shared array/iteration/findmin/2d | 1362125 ns | 1383958 ns | 0.98 |
| shared array/reductions/reduce/1d | 724708 ns | 735375 ns | 0.99 |
| shared array/reductions/reduce/2d | 672750 ns | 675500 ns | 1.00 |
| shared array/reductions/mapreduce/1d | 744167 ns | 739416.5 ns | 1.01 |
| shared array/reductions/mapreduce/2d | 665375 ns | 678250 ns | 0.98 |
| shared array/permutedims/4d | 2547042 ns | 2409542 ns | 1.06 |
| shared array/permutedims/2d | 1025375 ns | 1023021 ns | 1.00 |
| shared array/permutedims/3d | 1590250 ns | 1582666 ns | 1.00 |
| shared array/copy | 247167 ns | 241792 ns | 1.02 |

This comment was automatically generated by workflow using github-action-benchmark.

@christiangnrd
Contributor Author

I'll merge this one, since the only code this uses from GPUToolbox is SimpleVersion, and I'm very confident that it's the same code.

@christiangnrd christiangnrd merged commit 4324871 into JuliaGPU:main Feb 15, 2025
7 checks passed
@christiangnrd christiangnrd deleted the simpleversion branch February 15, 2025 17:07