Add support for NVFP8/6/4 in <cuda/std/cmath> pt. 1 (#3843)
base: main
Conversation
Some nits, although I am unsure about the __nv prefix
```cpp
#endif // _LIBCUDACXX_HAS_NVFP16

#if defined(_LIBCUDACXX_HAS_NVBF16)
_CCCL_NODISCARD _LIBCUDACXX_HIDE_FROM_ABI constexpr bool isinf(__nv_bfloat16 __x) noexcept
```
The extended floating-point types are not literal types, so we cannot mark these functions as constexpr.
What prevents __nv_bfloat16 from being used here? __nv_bfloat16 can be constructed in a constexpr function by using __nv_bfloat16_raw. Both the default and copy constructors are constexpr if __CPP_VERSION_AT_LEAST_11_BF16 is defined.
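For illustration, a constexpr construction along the lines described above might look like the sketch below. This is only a sketch: it assumes cuda_bf16.h with __CPP_VERSION_AT_LEAST_11_BF16 in effect and that the constructor taking __nv_bfloat16_raw is usable in constant expressions; the helper name is made up for the example.

```cpp
#include <cuda_bf16.h>

// Hypothetical helper: build +infinity from its bfloat16 bit pattern
// (sign 0, all exponent bits set, zero mantissa -> 0x7F80), with no
// float conversion or arithmetic involved.
__host__ __device__ constexpr __nv_bfloat16 make_bf16_inf()
{
  return __nv_bfloat16{__nv_bfloat16_raw{0x7F80}};
}
```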
The issue is that we should test with the conversion and arithmetic operations disabled because our code must work without them
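To make that concrete, a classification routine can avoid the (possibly disabled) conversions and operators entirely by looking at the raw bits. The sketch below is illustrative, not the PR's implementation; it assumes the conversion to __nv_bfloat16_raw stays available even when __CUDA_NO_BFLOAT16_CONVERSIONS__ and __CUDA_NO_BFLOAT16_OPERATORS__ are defined.

```cpp
#include <cuda_bf16.h>

// Hypothetical bit-level isinf for __nv_bfloat16: after masking off the
// sign bit, infinity is an all-ones exponent (0x7F80) with a zero
// mantissa, so no float conversion or arithmetic is needed.
__host__ __device__ inline bool bf16_isinf_via_bits(__nv_bfloat16 value)
{
  const __nv_bfloat16_raw raw = static_cast<__nv_bfloat16_raw>(value);
  return (raw.x & 0x7FFFu) == 0x7F80u;
}
```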
🟨 CI finished in 1h 28m: Pass: 30%/158 | Total: 2d 16h | Avg: 24m 28s | Max: 1h 19m | Hits: 36%/60534
Modified project: libcu++. Modified including dependencies: libcu++, CUB, Thrust, CUDA Experimental, python, CCCL C Parallel Library, Catch2Helper.
Runner counts (158 total jobs): 111 linux-amd64-cpu16, 15 windows-amd64-cpu16, 10 linux-arm64-cpu16, 8 linux-amd64-gpu-rtx2080-latest-1, 6 linux-amd64-gpu-rtxa6000-latest-1, 5 linux-amd64-gpu-h100-latest-1, 3 linux-amd64-gpu-rtx4090-latest-1
```cpp
// -*- C++ -*-
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//
```
Technically, none of those are part of libc++, so we should probably use the "Part of libcu++" license?
#include "test_macros.h" | ||
|
||
template <class T> | ||
__host__ __device__ void test_fpclassify(T val, int expected) |
One more thing: can you please check whether we can run some of those tests with the conversions / constructors disabled?
We might need to add a header with generator functions, but it would be awesome if we could get those tests to work without assuming functionality that the user might have disabled.
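As a sketch of the kind of generator header being asked for (the names and bit patterns below are illustrative, not part of this PR), test inputs could be produced directly from raw bit patterns so the tests never rely on the conversions or operators a user may have disabled:

```cpp
#include <cuda_bf16.h>
#include <cuda_fp16.h>

// Hypothetical generators: build values purely from their bit patterns
// via the *_raw types, avoiding float conversions and arithmetic.
__host__ __device__ inline __half make_half_from_bits(unsigned short bits)
{
  __half_raw raw;
  raw.x = bits;
  return __half{raw};
}

__host__ __device__ inline __nv_bfloat16 make_bf16_from_bits(unsigned short bits)
{
  __nv_bfloat16_raw raw;
  raw.x = bits;
  return __nv_bfloat16{raw};
}

// Example inputs for the test_fpclassify template quoted above:
//   test_fpclassify(make_half_from_bits(0x7C00), FP_INFINITE); // fp16 +inf
//   test_fpclassify(make_bf16_from_bits(0x7FC0), FP_NAN);      // bf16 quiet NaN
```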
🟨 CI finished in 1h 55m: Pass: 74%/158 | Total: 3d 01h | Avg: 27m 58s | Max: 1h 19m | Hits: 46%/168911
🟨 CI finished in 1h 29m: Pass: 62%/158 | Total: 2d 13h | Avg: 23m 25s | Max: 1h 16m | Hits: 38%/146302
🟨 CI finished in 1h 42m: Pass: 68%/158 | Total: 2d 17h | Avg: 24m 53s | Max: 1h 22m | Hits: 46%/160518
This PR implements several functions from <cuda/std/cmath> for NVFP8/6/4 types.
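For context, a caller-side sketch of the kind of overload this enables is shown below. It is illustrative only: cuda::std::fpclassify mirrors the test helper quoted above, but whether this exact __nv_fp8_e4m3 overload lands in pt. 1 is an assumption, and the kernel is made up for the example.

```cpp
#include <cuda_fp8.h>
#include <cuda/std/cmath>

// Hypothetical kernel: classify FP8 inputs with the <cuda/std/cmath>
// overload taking the FP8 type directly, without converting to float first.
__global__ void classify_fp8(const __nv_fp8_e4m3* in, int* out, int n)
{
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
  {
    out[i] = cuda::std::fpclassify(in[i]);
  }
}
```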