
Stop forcing approximation of math.erf with --iree-codegen-gpu-native-math-precision #20074

Open

bjacob wants to merge 1 commit into main
Conversation

@bjacob (Contributor) commented Feb 24, 2025

The implementation of the --iree-codegen-gpu-native-math-precision flag had a longstanding bug: both branches (for the option being true or false) performed the polynomial approximation of the math.erf function.

Recent refactorings preserved that behavior bug-for-bug.

Unfortunately, that meant that users passing --iree-codegen-gpu-native-math-precision were not getting the faster native math.erf, even on ROCm, where we enabled the native call by default.

This PR fixes that: math.erf now lowers to the native device-library call when --iree-codegen-gpu-native-math-precision is passed.
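
For illustration, a minimal sketch of the control-flow bug, with hypothetical names (this is not IREE's actual pass code):

```python
# Hypothetical sketch of the bug, not IREE's actual pass code: both branches
# of the flag check produced the polynomial approximation, so the flag had no
# effect on how math.erf was lowered.
def lower_erf(native_math_precision: bool) -> str:
    if native_math_precision:
        # Intended: emit a call to the device library (e.g. __ocml_erf_f16).
        # The bug: this branch returned the polynomial expansion too.
        return "polynomial-approximation"
    return "polynomial-approximation"

# Both flag settings yield the same lowering -- that is the bug.
assert lower_erf(True) == lower_erf(False)
```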

Signed-off-by: Benoit Jacob <[email protected]>
@bjacob bjacob marked this pull request as ready for review February 24, 2025 16:57
@bjacob bjacob requested a review from hanhanW as a code owner February 24, 2025 16:57
@bjacob bjacob requested review from kuhar and Groverkss February 24, 2025 16:57
@kuhar (Member) left a comment


I benchmarked this and confirm it helps with perf

@bjacob (Contributor, Author) commented Feb 24, 2025

Summary of debugging of the CI failures so far:

The difference in IR without/with this PR is exactly as expected: it causes us to use the __ocml_erf_f16 function instead of the polynomial approximation. https://gist.github.com/bjacob/247e46bc587f2c5e089fe67d8897fb49

The implementation of __ocml_erf_f16 is also embedded in the above IR diff. It uses a different approximation, but one that is just as accurate: both are accurate to within 1e-7, so that alone shouldn't explain a numerical difference as large as the one we are seeing here.
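
For reference, a classic polynomial approximation in this family (an example only; not necessarily the exact polynomial either path uses) is Abramowitz & Stegun 7.1.26, whose absolute error is below $1.5 \times 10^{-7}$ when evaluated in exact arithmetic:

$$\operatorname{erf}(x) \approx 1 - \left(a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + a_5 t^5\right) e^{-x^2}, \qquad t = \frac{1}{1 + p x}, \quad x \ge 0,$$

with $p = 0.3275911$, $a_1 = 0.254829592$, $a_2 = -0.284496736$, $a_3 = 1.421413741$, $a_4 = -1.453152027$, $a_5 = 1.061405429$.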

There is however one major difference:

  • Without this PR, we perform the approximation in f16.
  • With this PR, the __ocml_erf_f16 function we call upcasts to f32, performs the approximation in f32, and casts the result back down to f16. This is all visible in the above IR diff.

So I think that the code with this PR is actually more accurate.
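
A runnable sketch of that claim, using the Abramowitz & Stegun polynomial above as a stand-in (an assumption; it is not necessarily the polynomial either implementation uses). Evaluating the polynomial with f16 intermediates loses noticeably more accuracy than computing at higher precision and rounding only the final result to f16:

```python
# Sketch comparing the two evaluation strategies discussed above, using the
# Abramowitz & Stegun 7.1.26 polynomial as a stand-in (an assumption; this is
# not necessarily the polynomial IREE or OCML actually uses).
import numpy as np
from math import erf

P = np.float16(0.3275911)
A = [np.float16(a) for a in
     (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)]

def erf_all_f16(x: np.float16) -> np.float16:
    """Polynomial evaluated with every intermediate rounded to f16."""
    t = np.float16(1) / (np.float16(1) + P * x)
    poly = np.float16(0)
    for a in reversed(A):  # Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5
        poly = poly * t + a
    return np.float16(1) - poly * t * np.exp(-(x * x))

def erf_via_f32(x: np.float16) -> np.float16:
    """Upcast, evaluate erf at higher precision, round only the final result
    back to f16 (modeling what __ocml_erf_f16 is described as doing above)."""
    return np.float16(erf(np.float32(x)))

xs = np.linspace(0, 3, 1000, dtype=np.float16)  # polynomial is for x >= 0
err_f16 = max(abs(float(erf_all_f16(x)) - erf(float(x))) for x in xs)
err_f32 = max(abs(float(erf_via_f32(x)) - erf(float(x))) for x in xs)
print(f"max abs error, all-f16 intermediates:  {err_f16:.1e}")
print(f"max abs error, f32 compute, f16 round: {err_f32:.1e}")
```

The round-at-the-end path's error is bounded by half an f16 ULP of the result (about 2.4e-4 for values near 1), while the all-f16 path accumulates rounding error across every intermediate operation.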

It is a little surprising that the end-to-end numerical difference is so large, but we need an assessment against the actual accuracy metric, not these direct output-activation comparisons.

@nithinsubbiah (Collaborator) commented

I verified the numerics with this patch by comparing the image we generate from SDXL, and it looks good. We need to update the SDXL golden values in CI for this change.
