Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cr_log from core-math #326

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from
Draft

Conversation

tgross35
Copy link
Contributor

@tgross35 tgross35 commented Oct 27, 2024

Don't look too close, this is intentionally terrible code (line for line)

ci: allow-regressions

@tgross35
Copy link
Contributor Author

Initial pass with things looking correct shows a 15x regression for softfloat mode

icount::icount_bench_log_group::icount_bench_log logspace:setup_log()
Performance has regressed: Instructions (526320 > 34702) regressed by +1416.68% (>+5.00000)
  Baselines:                      softfloat|softfloat
  Instructions:                      526320|34702                (+1416.68%) [+15.1668x]
  L1 Hits:                           586739|41280                (+1321.36%) [+14.2136x]
  L2 Hits:                                3|2                    (+50.0000%) [+1.50000x]
  RAM Hits:                             175|17                   (+929.412%) [+10.2941x]
  Total read+write:                  586917|41299                (+1321.14%) [+14.2114x]
  Estimated Cycles:                  592879|41885                (+1315.49%) [+14.1549x]

Overflow

Formatting

Fix dint arithmetic

fix another case

fix another case, quick tests passing

clippy

cleanup

more tests passing
@tgross35
Copy link
Contributor Author

Version with fma is much more tolerable

icount::icount_bench_log_group::icount_bench_log logspace:setup_log()
Performance has regressed: Instructions (74217 > 34702) regressed by +113.870% (>+5.00000)
  Baselines:                      hardfloat|hardfloat
  Instructions:                       74217|34702                (+113.870%) [+2.13870x]
  L1 Hits:                           107651|41281                (+160.776%) [+2.60776x]
  L2 Hits:                                5|2                    (+150.000%) [+2.50000x]
  RAM Hits:                             158|16                   (+887.500%) [+9.87500x]
  Total read+write:                  107814|41299                (+161.057%) [+2.61057x]
  Estimated Cycles:                  113206|41851                (+170.498%) [+2.70498x]

@tgross35
Copy link
Contributor Author

Fyi @beetrees, this PR and #322 are probably reasonably close to merging, after a bit of cleanup. I think there is some restructuring to be done here and some of the perf might be possible to get back, but I'd rather just do that cleanup after a working baseline is in tree.

@tgross35
Copy link
Contributor Author

tgross35 commented Feb 12, 2025

Hm, the LUTs here have a huge impact on binary size. Thinking maybe I should keep the current implementations around behind a feature gate. Edit: or maybe just have an option to only use the fast method without cr_log_accurate? That eliminates two tables.

@tgross35
Copy link
Contributor Author

I should try marking the accurate versions #[cold] (note to self)

@beetrees
Copy link
Contributor

Hm, the LUTs here have a huge impact on binary size. Thinking maybe I should keep the current implementations around behind a feature gate.

I think the ideal API surface would be the libm::log function is always correctly rounded, and then there could be a libm::log_approx function (name bikeshedable) which is faster and/or has a smaller binary footprint but has less accuracy. I think LTO would be able to remove the unused functions/LUTs if they have no users? If so then I think relying on LTO is preferable to having Cargo features as it can remove all the libm functions that an application doesn't use (I doubt most applications use every single libm function) without having to have a feature per function.

Edit: or maybe just have an option to only use the fast method without cr_log_accurate? That eliminates two tables.

That does seem more maintainable than having two completely separate implementations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants