Releases: cupy/cupy
v12.0.0a2
This is the release note of v12.0.0a2. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Increased cupyx.scipy APIs (#6773, #6990, #7014, #7015, #7036)
The coverage of SciPy interpolate & special APIs has increased. (Thanks @khushi-411 & @1MrEnot!)
Initial support for ufunc methods (#7049)
Starting from v12, CuPy will support the corresponding NumPy ufunc methods.
This release adds compatibility with ufunc.outer. Check the tracking issue (#7082) for detailed information.
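As a rough illustration of what ufunc.outer computes for one-dimensional inputs, the sketch below spells out the semantics in plain Python and shows the corresponding CuPy call in a separate function. The helper names outer_reference and outer_on_gpu are made up for this example; outer_on_gpu assumes CuPy v12.0.0a2 or later and a working GPU, and is not executed here.

```python
import operator


def outer_reference(op, a, b):
    """Plain-Python model of ufunc.outer for 1-D inputs.

    result[i][j] == op(a[i], b[j]) for every pair of elements.
    """
    return [[op(x, y) for y in b] for x in a]


def outer_on_gpu(a, b):
    """Same computation with CuPy (assumes CuPy >= v12.0.0a2 and a GPU)."""
    import cupy  # imported lazily so the reference above runs without CuPy

    return cupy.add.outer(cupy.asarray(a), cupy.asarray(b))


# Every element of the first input is combined with every element of the second.
table = outer_reference(operator.add, [0, 10, 20], [1, 2])
print(table)  # [[1, 2], [11, 12], [21, 22]]
```

For inputs of shape (M,) and (N,), the result has shape (M, N), which matches the behaviour of NumPy's ufunc.outer.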
Changes
New Features
- Add cupyx.scipy.special.logsumexp (#6773)
- Add cupyx.scipy.interpolate.KroghInterpolator (#6990)
- Add scipy.special.expi and scipy.special.exp1 (#7014)
- Add cupy.byte_bounds (#7015)
- Adds cupyx.scipy.special.k0, cupyx.scipy.special.k1, cupyx.scipy.special.k0e, cupyx.scipy.special.k1e (#7036)
- Add ufunc.outer (#7049)
- Expose pairwise distance functions (#7063)
Enhancements
- Support NCCL 2.12 ~ 2.14 (#6534)
- Support cuDNN 8.5 (#7008)
- Fix cupy.apply_along_axis for tuple retval (#7068)
- Add wrapper for cutensorPermutation (#7070)
Bug Fixes
- Fix JIT for scalar argument (#6948)
- Make sparse argmin/max return a scalar array containing the index (#6976)
- Fix csrsm2 memory leak (#7039)
- Make sure weibull distribution support ndarrays (#7048)
- Fix bessel test to pass ROCm CI (#7081)
Code Fixes
- Cosmetic change in _routine_indexing.pyx (#7053)
Documentation
Tests
- CI: Create a status for FlexCI dashboard (#7024)
- CI: Migrate to GAR from GCR (#7064)
- CI: tentatively fix hypothesis version (#7072)
Others
- Introduce pre-commit (#6987)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@1MrEnot @andfoy @asi1024 @betatim @khushi-411 @kmaehashi @leofang @maronuu @takagi @wyli
v11.2.0
This is the release note of v11.2.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Changes
Enhancements
Bug Fixes
- Fix csrsm2 memory leak (#7041)
- Fix JIT for scalar argument (#7043)
- Make sparse argmin/max return a scalar array containing the index (#7057)
Code Fixes
- Cosmetic change in _routine_indexing.pyx (#7056)
Documentation
Installation
- Remove use of distutils.utils (#7009)
Tests
- CI: Create a status for FlexCI dashboard (#7034)
- CI: Migrate to GAR from GCR (#7066)
- CI: tentatively fix hypothesis version (#7073)
Others
- Introduce pre-commit (#7067)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v12.0.0a1
This is the release note of v12.0.0a1. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Increased cupyx.scipy APIs (#6823, #6849, #6855, #6890, #6958, #6971)
The coverage of SciPy interpolate, stats & special APIs has increased. (Thanks @khushi-411 & @andoorve!)
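Two of the special functions added in this release, softmax and log_softmax, are usually computed with the maximum shifted out so that large inputs do not overflow. The sketch below models that computation in plain Python for a 1-D input. The *_reference helpers and softmax_on_gpu are hypothetical names for this example, not part of CuPy, and the GPU function (which assumes CuPy v12.0.0a1 or later) is not executed here.

```python
import math


def log_softmax_reference(xs):
    """log_softmax(x)_i = x_i - log(sum_j exp(x_j)), computed stably."""
    m = max(xs)  # shifting by the maximum keeps every exp() argument <= 0
    log_sum = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_sum for x in xs]


def softmax_reference(xs):
    """softmax(x) = exp(log_softmax(x))."""
    return [math.exp(v) for v in log_softmax_reference(xs)]


def softmax_on_gpu(xs):
    """Same quantity via cupyx (assumes CuPy >= v12.0.0a1 and a GPU)."""
    import cupy
    from cupyx.scipy.special import softmax

    return softmax(cupy.asarray(xs, dtype=cupy.float64))


# A naive exp(1000.0) would overflow; the shifted form stays finite.
probs = softmax_reference([1000.0, 1000.0])
print(probs)
```

The two equal inputs should each receive a probability of about one half, and the probabilities sum to one up to rounding.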
Jetson AGX Orin Support (#6876)
Arm (aarch64) wheels are now compiled with support for compute capability 8.7.
These wheels are available through our Pip index: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/aarch64
Changes
New Features
- Add cupy.heaviside api. (#6798)
- Add cupyx.scipy.special.log_softmax (#6823)
- Add cupyx.scipy.stats.boxcox_llf (#6849)
- Add cupyx.scipy.stats.{zmap, zscore} (#6855)
- Add cupyx.scipy.special.softmax (#6890)
- Add dtype, fweights, aweights to cupy.cov (#6892)
- Add cupyx.scipy.interpolate.BarycentricInterpolator (#6958)
- Add scipy.special.cosm1 to cupyx (#6971)
Enhancements
- Enhance JIT error message when __device__ option is missing (#6837)
- Fix augassign target is evaluated twice in JIT (#6844)
- JIT: Add type annotation in _compile.py (#6859)
- Add complex support for nanvar and nanstd (#6869)
- Update cupy.array_api (#6871)
- Accept kind in sort/argsort and fix cupy.array_api.{sort,argsort} accordingly (#6872)
- Add CC 8.7 for Jetson Orin (#6876)
- Update cupy-wheel for v11 (#6903)
- Support deg in cupy.angle (#6905)
- Make sure that uniform sampling respects broadcasting (#6928)
- Update cupy.array_api (cont'd) (#6932)
- Support SciPy 1.9 (#6962)
- Make testing decorators able to use with @pytest.mark.parametrize in some cases (#6984)
- Relaxed C-contiguous requirement for changing dtype of different size (#6848)
- Support keepdims parameter for average (#6852)
- Support equal_nan parameter for unique (#6853)
Performance Improvements
- Efficiency improvements in cupyx.scipy.ndimage utilities (#6953)
Bug Fixes
- Generate CUBIN for all supported GPUs at build time (#6875)
- Fix boxcox_llf (#6884)
- Fix real and imag in subclass (#6896)
- Fix cupy.clip to match numpy (#6920)
- Let argpartition use the kth argument properly (#6921)
- Fix cuTensorNet shim layer (#6934)
- Fix occasional hang in sparse distributed (#6942)
- Fix SciPy dependency leak (#6947)
- Fix CUB reduction with zero-size arrays (#6960)
Code Fixes
- Fix function names (#6877)
- Remove proxy functions for softlink (#6879)
- Suppress nvcc warning (#6954)
Documentation
- Bump documentation build requirements (#6825)
- Reverting to v10 installation instruction until v11 stable release (#6836)
- Fix ROCm supported versions in compat matrix (#6846)
- Generate docs for private classes in one location (#6857)
- Expand breaking change & best practice on device management (#6883)
- Update installation guide for v11 (aarch64) (#6888)
- Update install instructions on README (#6889)
- Document matmul supports out (#6898)
- Fix docs build failure (#6955)
Installation
- Reorganize build scripts: define compile options declaratively (#6911)
- Parallelize Cythonize (#6975)
- Remove use of distutils.utils (#7006)
Examples
- Make matrix in CG example positive definite (#6939)
Tests
- Update tags for FlexCI projects (#6814)
- Add config for cupy.win.cuda117 (#6880)
- Fix XFAIL for tests/cupyx_tests/scipy_tests/sparse_tests/test_coo.py when scipy>=1.9.0rc2 (#6894)
- Use ubuntu-22.04 as GitHub Actions runner image (#6988)
- Revert comment fix (#6995)
- Filter warnings from setuptools 65 (#7000)
- CI: bump CUDA version used in cuda-python test (#7022)
- CI: Add ROCm 5.1 and 5.2 (#6828)
- CI: Show all errors when doc build fail (#6910)
Others
- Bump version to v12.0.0a1 (#7027)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@andfoy @andoorve @asi1024 @BasLaa @emcastillo @grlee77 @khushi-411 @kmaehashi @leofang @pri1311 @takagi @tom24d @toslunar @tpkessler
v11.1.0
This is the release note of v11.1.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Jetson AGX Orin Support (#6876)
Arm (aarch64) wheels are now compiled with support for compute capability 8.7.
These wheels are available through our Pip index: pip install cupy-cuda11x -f https://pip.cupy.dev/aarch64
Changes
New Features
- Add cupyx.scipy.special.log_softmax (#6966)
Enhancements
- Update cupy.array_api (#6929)
- Add CC 8.7 for Jetson Orin (#6950)
- Accept kind in sort/argsort and fix cupy.array_api.{sort,argsort} accordingly (#6951)
- Fix augassign target is evaluated twice in JIT (#6964)
- Update cupy.array_api (cont'd) (#6973)
- Support SciPy 1.9 (#6981)
- Enhance JIT error message when __device__ option is missing (#6991)
- JIT: Add type annotation in _compile.py (#6993)
- Make testing decorators able to use with @pytest.mark.parametrize in some cases (#7010)
- Support keepdims parameter for average (#6897)
- Support equal_nan parameter for unique (#6904)
Bug Fixes
- Fix CUB reduction with zero-size arrays (#6968)
- Fix cuTensorNet shim layer (#6979)
- Fix SciPy dependency leak (#6980)
- Fix occasional hang in sparse distributed (#6997)
- Let argpartition use the kth argument properly (#7020)
Code Fixes
Documentation
- Document matmul supports out (#6899)
- Bump documentation build requirements (#6930)
- Expand breaking change & best practice on device management (#6952)
- Fix docs build failure (#6967)
Tests
- Fix XFAIL for tests/cupyx_tests/scipy_tests/sparse_tests/test_coo.py when scipy>=1.9.0rc2 (#6963)
- Use ubuntu-22.04 as GitHub Actions runner image (#6992)
- Revert comment fix (#6996)
- Filter warnings from setuptools 65 (#7004)
- CI: bump CUDA version used in cuda-python test (#7023)
- CI: Show all errors when doc build fail (#6945)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @emcastillo @khushi-411 @kmaehashi @leofang @takagi @toslunar
v11.0.0
This is the release note of v11.0.0. See here for the complete list of solved issues and merged PRs.
This release note only covers changes made since v11.0.0rc1 release. Check out our blog for highlights in the v11 release!
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
cupy-wheel package
Until now, downstream projects depending on CuPy had a hard time specifying a binary wheel as a dependency, and it was the users’ responsibility to install the correct package in their environments. CuPy v10 introduced the experimental cupy-wheel meta-package. In this release, we declare this feature ready for production environments. cupy-wheel will examine the users’ environment and automatically select the matching CuPy binary wheel to be installed.
Changes
For all changes in v11, please refer to the release notes of the pre-releases (alpha1, alpha2, beta1, beta2, beta3, rc1).
Enhancements
- Support deg in cupy.angle (#6909)
- Update cupy-wheel for v11 (#6913)
- Relaxed C-contiguous requirement for changing dtype of different size (#6850)
Bug Fixes
Code Fixes
- Fix function names (#6878)
Documentation
- Fix ROCm supported versions in compat matrix (#6851)
- Generate docs for private classes in one location (#6858)
Installation
- Bump version to v11.0.0 (#6915)
Tests
- Update tags for FlexCI projects (#6860)
- CI: Add ROCm 5.1 and 5.2 (#6861)
- Add config for cupy.win.cuda117 (#6885)
Others
- Bump branch version to v11 (#6845)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v11.0.0rc1
This is the release note of v11.0.0rc1. See here for the complete list of solved issues and merged PRs.
We are going to release v11.0.0 on July 28th. Please start testing your workload with this release candidate (pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre). See the Upgrade Guide for the list of possible breaking changes.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Support CUDA 11.7 (#6767)
Full support for CUDA 11.7 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre
Unified Binary Package for CUDA 11.2 or later (#6730)
CuPy v11 provides a unified binary package named cupy-cuda11x that supports all CUDA 11.2+ releases. This replaces per-CUDA version binary packages (cupy-cuda112, cupy-cuda113, …, cupy-cuda117) provided in CuPy v10 or earlier.
Note that CUDA 11.1 or earlier still requires per-CUDA version binary packages. cupy-cuda102, cupy-cuda110, and cupy-cuda111 will be provided for CUDA 10.2, 11.0, and 11.1, respectively.
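The package naming rule described above can be summarized as a small lookup. This is only a sketch of the mapping stated in this release note: wheel_for_cuda is a hypothetical helper, not a CuPy API, and it treats every CUDA 11.x release from 11.2 onward as covered by cupy-cuda11x, which is an assumption based on the "CUDA 11.2+" wording.

```python
def wheel_for_cuda(major, minor):
    """Return the CuPy v11 binary package name for a CUDA toolkit version.

    Hypothetical helper that encodes the naming rule from the release note.
    """
    # CUDA 11.2 and later 11.x releases share the unified package.
    if major == 11 and minor >= 2:
        return "cupy-cuda11x"
    # Older toolkits keep their per-version packages.
    per_version = {
        (10, 2): "cupy-cuda102",
        (11, 0): "cupy-cuda110",
        (11, 1): "cupy-cuda111",
    }
    if (major, minor) not in per_version:
        raise ValueError(f"no CuPy v11 wheel listed for CUDA {major}.{minor}")
    return per_version[(major, minor)]


for version in [(10, 2), (11, 1), (11, 2), (11, 7)]:
    print(version, "->", wheel_for_cuda(*version))
```

In practice the cupy-wheel meta-package performs this kind of selection automatically, so a lookup like this is mainly useful for understanding which package name to expect.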
Binary Package for Arm Platform (#6705)
CuPy v11 provides a cupy-cuda11x binary package built for aarch64, which supports CUDA 11.2+ Arm SBSA and JetPack 5.
These wheels are available through our Pip index: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/aarch64
Support for ndarray subclassing (#6720, #6755)
This release allows users to subclass cupy.ndarray, using the same protocol as NumPy:
import cupy

class C(cupy.ndarray):
    def __new__(cls, *args, info=None, **kwargs):
        # Create the array first, then attach the extra attribute.
        obj = super().__new__(cls, *args, **kwargs)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # obj is None when called from the explicit constructor above.
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

a = C([0, 1, 2, 3], info='information')
assert type(a) is C
assert issubclass(type(a), cupy.ndarray)
assert a.info == 'information'
Note that view casting and new from template mechanisms are also supported as described by the NumPy documentation.
Add Collective Communication APIs in cupyx.distributed for Sparse Matrices
All the collective calls implemented for dense matrices now support sparse matrices. Users interested in this feature should install mpi4py in order to perform an efficient metadata exchange.
Google Summer of Code 2022
We would like to give a warm welcome to @khushi-411, who will be working on adding support for the cupyx.scipy.interpolate APIs as part of her GSoC internship!
Changes without compatibility
Bump base Docker image to the latest supported one (#6802)
CuPy official Docker images have been upgraded. Users relying on these images may suffer from compatibility issues with preinstalled tools or libraries.
Changes
New Features
- Add cupy.setxor1d (#6582)
- Add initial cupyx.spatial.distance support from pylibraft (#6690)
- Support cupy.ndarray subclassing - Part 2 - View casting (#6720)
- Add sparse broadcast (#6758)
- Add sparse reduce (#6761)
- Add sparse all_reduce and minor fixes (#6762)
- Add sparse all_to_all, reduce_scatter, send_recv (#6765)
- Subclass cupy.ndarray subclassing - Part 3 - New from template (ufunc) (#6775)
- Add cupyx.scipy.special.log_ndtr (#6776)
- Add cupyx.scipy.special.expn (#6790)
Enhancements
- Utilize CUDA Enhanced Compatibility (#6730)
- Fix to return correct CUDA version when in CUDA Python mode (#6736)
- Support CUDA 11.7 (#6767)
- Make the warning for cupy.array_api say "cupy" instead of "numpy" (#6791)
- Utilize CUDA Enhanced Compatibility in all wrappers (#6799)
- Add support for cupy-cuda11x wheel (#6800)
- Bump base Docker image to the latest supported one (#6802)
- Remove CUPY_CUDA_VERSION as much as possible (#6810)
- Raise UserWarning in cupy.cuda.compile_with_cache (#6818)
- cupy-wheel: Use NVRTC to infer the toolkit version (#6819)
- Support NumPy 1.23 (#6820)
- Fix for NumPy 1.23 (#6807)
Performance Improvements
- Improved integer matrix multiplication performance by modifying tuning parameters (#6703)
- Use fast convolution algorithm in cupy.poly1d.__pow__ (#6770)
Bug Fixes
- Fix polynomial tests (#6721)
- Fix batched matmul for integral numbers (#6725)
- Fix cupy.median for NaN inputs (#6759)
- Fix required cusparse symbol not loaded in CUDA 11.1.1 (#6806)
Code Fixes
- Add type annotation in _cuda_types.py (#6726)
- Subclass rename (#6746)
- Add type annotation to JIT internal types (#6778)
Documentation
- Add CUDA 11.7 on documents (#6768)
- Improved NVTX documentation (#6774)
- Fix docs to hide ndarray_base (#6782)
- Update docs for cupy-cuda11x wheel (#6803)
- Bump NumPy version used in docs (#6824)
- Add upgrade guide for CuPy v11 (#6826)
Tests
- Fix mempool tests (#6591)
- CI: Fix prep script to show build failure details (#6781)
- Fix a potential variable misuse bug (#6786)
- Fix CI Docker image build failing in head test (#6804)
- Tiny clean up in CI script (#6809)
Others
- Fix docker workflow to push to latest image (#6832)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@andoorve @asi1024 @asmeurer @cjnolet @emcastillo @khushi-411 @kmaehashi @leofang @LostBenjamin @pri1311 @rietmann-nv @takagi
v10.6.0
This is the release note of v10.6.0. See here for the complete list of solved issues and merged PRs.
This is the last planned release for CuPy v10 series. We are going to release v11.0.0 on July 28th. Please start testing your workload with the v11 release candidate (pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre). See the Upgrade Guide for the list of possible breaking changes in v11.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Support CUDA 11.7 (#6767)
Full support for CUDA 11.7 has been added as of this release. Binary packages can be installed with the following command: pip install cupy-cuda117
Changes without compatibility
Changes
Enhancements
- Improve warning message in sparse (#6675)
- Support CUDA 11.7 (#6794)
- Make the warning for cupy.array_api say "cupy" instead of "numpy" (#6795)
- cupy-wheel: Use NVRTC to infer the toolkit version (#6831)
Bug Fixes
Documentation
- Add CUDA 11.7 on documents (#6801)
Tests
- Fix Dockerfile broken for array-api tests (#6518)
- Skip ndimage.filter tests for ROCm 4.0 (#6676)
- Xfail a test of LOBPCG on ROCm 5.0+ (#6733)
- CI: Fix prep script to show build failure details (#6784)
- Fix a potential variable misuse bug (#6788)
- Fix CI Docker image build failing in head test (#6808)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @asmeurer @emcastillo @kmaehashi @LostBenjamin @takagi
v11.0.0b3
This is the release note of v11.0.0b3. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
Support cuTensorNet as an einsum backend (#6677) (thanks @leofang!)
A new accelerator for CuPy has been added (CUPY_ACCELERATORS=cutensornet).
This feature requires cuquantum-python >= 22.03 and cuTENSOR >= 1.5.0. It is used to accelerate the cupy.linalg.einsum API and to support large array sizes in it.
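A minimal sketch of how the accelerator might be enabled from Python rather than from the shell is given below. It assumes that the CUPY_ACCELERATORS variable has to be in place before CuPy is first imported, that cuquantum-python and cuTENSOR are installed as required above, and that the contraction is issued through the public cupy.einsum entry point. The function contract_on_gpu is a hypothetical name and is not executed here.

```python
import os

# Assumption: the variable is set before the first `import cupy` in the process.
os.environ["CUPY_ACCELERATORS"] = "cutensornet"


def contract_on_gpu():
    """Matrix product written as an einsum (needs a GPU and cuTensorNet)."""
    import cupy  # imported after the environment variable is set

    a = cupy.random.rand(8, 16)
    b = cupy.random.rand(16, 4)
    # Contract over the shared index j; the result has shape (8, 4).
    return cupy.einsum("ij,jk->ik", a, b)


print(os.environ["CUPY_ACCELERATORS"])
```

Setting the variable in the shell (CUPY_ACCELERATORS=cutensornet python script.py) has the same effect and avoids any dependence on import order.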
Changes without compatibility
Drop Support for ROCm 4.2 (#6734)
CuPy v11 will drop support for ROCm 4.2. We recommend that users move to ROCm 4.3 or 5.0 instead.
Drop Support for NumPy 1.18/1.19 and SciPy 1.4/1.5 (#6735)
As per NEP29, support for NumPy 1.18/1.19 was dropped in 2021. The supported SciPy versions are the ones released close to the supported NumPy versions.
Changes
New Features
- Support cuTensorNet (from cuQuantum) as an einsum backend (#6677)
- Add cupy.poly (#6697)
- Support cupy.ndarray subclassing - Part 1 - Direct constructor call (#6716)
Enhancements
- Support cuDNN 8.4 (#6641)
- Support cuTENSOR 1.5.0 (#6665)
- JIT: Use C++14 (#6670)
- Support cuTENSOR 1.5.0 (#6722)
- Drop support for ROCm 4.2 in CuPy v11 (#6734)
- Drop support for NumPy 1.18/1.19 and SciPy 1.4/1.5 in CuPy v11 (#6735)
- Fix compilation warning caused by ifdef (#6739)
Performance Improvements
- Accelerate bincount, histogram2d, histogramdd with CUB (#6701)
Bug Fixes
- Fix memory leak in the FFT plan cache during multi-threading (#6704)
- Fix ifdef for ROCm >= 4.2 (#6750)
Code Fixes
- JIT: Cosmetic change of Dim3 class (#6644)
Documentation
- Fix imports of scatter_add example (#6696)
- Minor improvement on the array API docs (#6706)
- Document the returned benchmark object (#6712)
- Use exposed name in user guide (#6718)
Tests
- Xfail a test of LOBPCG on ROCm 5.0+ (#6603)
- CI: Update repo for libcudnn7 in cuda10.2 (#6708)
- Bump pinned mypy version (#6710)
- Follow scipy==1.8.1 sparse dot bugfix (#6727)
- Support testing CUDA 11.6+ in FlexCI (#6731)
- Fix GPG key issue in FlexCI base image (#6738)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @Dahlia-Chehata @emcastillo @kmaehashi @leofang @takagi
v10.5.0
This is the release note of v10.5.0. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Update (2022-06-17): Wheels for CUDA 11.5 Arm SBSA are now available in the Assets section below. (#6705)
Changes
Enhancements
Bug Fixes
- Fix memory leak in the FFT plan cache during multi-threading (#6732)
- Fix ifdef for ROCm >= 4.2 (#6751)
Documentation
Tests
- CI: Update repo for libcudnn7 in cuda10.2 (#6709)
- Pin mypy version in setup.py (#6711)
- Follow scipy==1.8.1 sparse dot bugfix (#6728)
- Support testing CUDA 11.6+ in FlexCI (#6737)
- Fix GPG key issue in FlexCI base image (#6743)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
v11.0.0b2
This is the release note of v11.0.0b2. See here for the complete list of solved issues and merged PRs.
We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!
Highlights
JIT Improvements (#6620, #6640, #6649, #6668)
CuPy JIT has been further enhanced thanks to @leofang and @eternalphane!
It is now possible to use CUDA cooperative groups and access .shape and .strides attributes of ndarrays.
import cupy
from cupyx import jit

@jit.rawkernel()
def kernel(x, y):
    size = x.shape[0]
    ntid = jit.gridDim.x * jit.blockDim.x
    tid = jit.blockIdx.x * jit.blockDim.x + jit.threadIdx.x
    for i in range(tid, size, ntid):
        y[i] = x[i]
    g = jit.cg.this_thread_block()
    g.sync()

x = cupy.arange(200, dtype=cupy.int64)
y = cupy.zeros((200,), dtype=cupy.int64)
kernel[2, 32](x, y)
print(kernel.cached_code)
The above program emits the CUDA code as follows:
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

extern "C" __global__ void kernel(CArray<long long, 1, true, true> x, CArray<long long, 1, true, true> y) {
  ptrdiff_t i;
  ptrdiff_t size = thrust::get<0>(x.get_shape());
  unsigned int ntid = (gridDim.x * blockDim.x);
  unsigned int tid = ((blockIdx.x * blockDim.x) + threadIdx.x);
  for (ptrdiff_t __it = tid, __stop = size, __step = ntid; __it < __stop; __it += __step) {
    i = __it;
    y[i] = x[i];
  }
  cg::thread_block g = cg::this_thread_block();
  g.sync();
}
Initial MPI and sparse matrix support in cupyx.distributed (#6628, #6658)
CuPy v10 added the cupyx.distributed API to perform interprocess communication using NCCL in a way similar to MPI. In CuPy v11 we are extending this API to support sparse matrices as defined in cupyx.scipy.sparse. Currently only send/recv primitives are supported, but we will be adding support for collective calls in the following releases.
Additionally, it is now possible to use MPI (through the mpi4py Python package) to initialize the NCCL communicator. This avoids launching the TCP server used to exchange CPU values. Moreover, we recommend enabling MPI for sparse matrix communication, because each communication call has to exchange metadata, which leads to device synchronization if MPI is not enabled.
# run with mpiexec -n N python …
from mpi4py import MPI  # a bare `import mpi4py` does not load the MPI submodule
import cupyx.distributed

mpi_comm = MPI.COMM_WORLD
workers = mpi_comm.Get_size()
rank = mpi_comm.Get_rank()
comm = cupyx.distributed.init_process_group(workers, rank, use_mpi=True)
Announcements
Introduction of generic cupy-wheel (EXPERIMENTAL) (#6012)
We have added a new package on PyPI called cupy-wheel. This meta package allows other libraries to add a dependency on CuPy with the ability to transparently install the exact CuPy binary wheel matching the user environment. Users can also install CuPy using this package instead of manually specifying a CUDA/ROCm version.
pip install cupy-wheel
This package is only available for the stable release as the current pre-release wheels are not hosted in PyPI.
This feature is currently experimental and subject to change, so we recommend that users not distribute packages relying on it for now. Your suggestions or comments are highly welcome (please visit #6688).
Changes
New Features
- Support cooperative group in JIT compiler (#6620)
- Add support for sparse matrices in cupyx.distributed (#6628)
- JIT: Support compile-time for-loop unrolling (#6649)
- JIT: Support .shape and .strides (#6668)
Enhancements
- Add a few driver/runtime/nvrtc API wrappers (#6604)
- Implement flatten(order) (#6613)
- Implemented a __repr__ for cupyx.profiler._time._PerfCaseResult (#6617)
- JIT: Avoid calling default constructor if possible (#6619)
- Add missing cudaDevAttrMemoryPoolsSupported to hip (#6621)
- Add CC 3.2 to Tegra arch list (#6631)
- JIT: Add more cooperative group APIs (#6640)
- JIT: Add kernel.cached_code test (#6643)
- Use MPI for management in cupyx.distributed (#6658)
- Improve warning message in sparse (#6669)
Performance Improvements
Bug Fixes
- Define float16::operator-() only for ROCm 5.0+ (#6624)
- JIT: fix access to cached codes (#6639)
- Fix cuda python CI (#6652)
- Fix int64 overflow in cupy.polyval (#6664)
- JIT: Disable memcpy_async on CUDA 11.0 (#6671)
Documentation
- Add --pre option to instructions installing pre-releases (#6612)
- JIT: fix function signatures in the docs (#6648)
- Fix typo in performance guide (#6657)
Installation
- Add universal CuPy package (#6012)
Tests
- Run daily benchmark with head branch against latest release (#6598)
- CI: Trigger FlexCI for hotfix branches (#6625)
- Remove jenkins requirements (#6632)
- Fix TestIncludesCompileCUDA for HEAD tests (#6646)
- Trigger CUDA Python tests with /test mini (#6653)
- Fix missing f prefix on f-strings fix (#6674)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @code-review-doctor @danielg1111 @davidegavio @emcastillo @eternalphane @kmaehashi @leofang @okuta @takagi @toslunar