06 Oct 07:15

f47577b

v12.0.0a2 Pre-release

This is the release note of v12.0.0a2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Increased `cupyx.scipy` APIs (#6773, #6990, #7014, #7015, #7036)

The coverage of SciPy interpolate & special APIs has increased. (Thanks @khushi-411 & @1MrEnot!)

Initial support for `ufunc` methods (#7049)

Starting from v12, CuPy will support the corresponding NumPy ufunc methods.
This release adds compatibility with ufunc.outer. Check the tracking issue (#7082) for detailed information.
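As a rough illustration of what `ufunc.outer` computes, here is a pure-Python sketch for 1-D inputs. The helper name `outer` is hypothetical and this is not CuPy's implementation; with CuPy v12 the equivalent call would presumably be `cupy.add.outer(a, b)`, mirroring `numpy.add.outer`.

```python
import operator


def outer(op, a, b):
    """Apply binary `op` to every pair (a[i], b[j]) of two 1-D sequences.

    Hypothetical helper: mimics the result layout of NumPy's `ufunc.outer`
    for 1-D inputs, where result[i][j] == op(a[i], b[j]).
    """
    return [[op(x, y) for y in b] for x in a]


table = outer(operator.add, [0, 10, 20], [1, 2])
print(table)  # [[1, 2], [11, 12], [21, 22]]
```

The result has one row per element of the first input and one column per element of the second, which is the shape rule `ufunc.outer` follows for 1-D arrays.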

Changes

New Features

Add cupyx.scipy.special.logsumexp (#6773)
Add cupyx.scipy.interpolate.KroghInterpolator (#6990)
Add scipy.special.expi and scipy.special.exp1 (#7014)
Add cupy.byte_bounds (#7015)
Adds cupyx.scipy.special.k0, cupyx.scipy.special.k1, cupyx.scipy.special.k0e, cupyx.scipy.special.k1e (#7036)
Add ufunc.outer (#7049)
Expose pairwise distance functions (#7063)
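For readers unfamiliar with `logsumexp` (added above as `cupyx.scipy.special.logsumexp`), the following is a CPU-only sketch of the usual max-shift trick for 1-D input. It is an illustration of the formula, not the CuPy or SciPy implementation, and the function below is a stand-in with a simplified signature.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values)) for a non-empty list."""
    m = max(values)
    if math.isinf(m):
        # Every entry is -inf, or one entry is +inf: shifting by m would yield NaN.
        return m
    # Shifting by the maximum keeps every exponent <= 0, so exp() cannot overflow.
    return m + math.log(sum(math.exp(v - m) for v in values))


print(logsumexp([1000.0, 1000.0]))  # a naive exp(1000.0) would overflow here
```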

Enhancements

Support NCCL 2.12 ~ 2.14 (#6534)
Support cuDNN 8.5 (#7008)
Fix cupy.apply_along_axis for tuple retval (#7068)
Add wrapper for cutensorPermutation (#7070)

Bug Fixes

Fix JIT for scalar argument (#6948)
Make sparse argmin/max return a scalar array containing the index (#6976)
Fix csrsm2 memory leak (#7039)
Make sure weibull distribution support ndarrays (#7048)
Fix bessel test to pass ROCm CI (#7081)

Code Fixes

Cosmetic change in _routine_indexing.pyx (#7053)

Documentation

Fixes docstring for interpolation prefiltering (#6998)
Typo fix (#7045)

Tests

CI: Create a status for FlexCI dashboard (#7024)
CI: Migrate to GAR from GCR (#7064)
CI: tentatively fix hypothesis version (#7072)

Others

Introduce pre-commit (#6987)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@1MrEnot @andfoy @asi1024 @betatim @khushi-411 @kmaehashi @leofang @maronuu @takagi @wyli

06 Oct 07:15

kmaehashi

v11.2.0

af6da44

v11.2.0

This is the release note of v11.2.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Changes

Enhancements

Support NCCL 2.12 ~ 2.14 (#7069)
Support cuDNN 8.5 (#7071)

Bug Fixes

Fix csrsm2 memory leak (#7041)
Fix JIT for scalar argument (#7043)
Make sparse argmin/max return a scalar array containing the index (#7057)

Code Fixes

Cosmetic change in _routine_indexing.pyx (#7056)

Documentation

Fixes docstring for interpolation prefiltering (#7037)
Typo fix (#7047)

Installation

Remove use of distutils.utils (#7009)

Tests

CI: Create a status for FlexCI dashboard (#7034)
CI: Migrate to GAR from GCR (#7066)
CI: tentatively fix hypothesis version (#7073)

Others

Introduce pre-commit (#7067)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@andfoy @asi1024 @betatim @kmaehashi @leofang @takagi @wyli

01 Sep 06:33

emcastillo

v12.0.0a1

1760207

v12.0.0a1 Pre-release

This is the release note of v12.0.0a1. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Increased `cupyx.scipy` APIs (#6823, #6849, #6855, #6890, #6958, #6971)

The coverage of SciPy interpolate, stats & special APIs has increased. (Thanks @khushi-411 & @andoorve!)

Jetson AGX Orin Support (#6876)

Arm (aarch64) wheels are now compiled with support for compute capability 8.7.
These wheels are available through our Pip index: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/aarch64

Changes

New Features

Add cupy.heaviside api. (#6798)
Add cupyx.scipy.special.log_softmax (#6823)
Add cupyx.scipy.stats.boxcox_llf (#6849)
Add cupyx.scipy.stats.{zmap, zscore} (#6855)
Add cupyx.scipy.special.softmax (#6890)
Add dtype, fweights, aweights to cupy.cov (#6892)
Add cupyx.scipy.interpolate.BarycentricInterpolator (#6958)
Add scipy.special.cosm1 to cupyx (#6971)
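To make the new `zmap` and `zscore` entries concrete, here is a standard-library sketch of the underlying formula for 1-D data. It assumes the population standard deviation (`ddof=0`), which is SciPy's default; the functions are simplified stand-ins, not the `cupyx.scipy.stats` implementations.

```python
import statistics


def zmap(scores, compare):
    """Standardize `scores` against the mean and population std of `compare`.

    Raises ZeroDivisionError if `compare` is constant (std == 0).
    """
    mean = statistics.fmean(compare)
    std = statistics.pstdev(compare)  # population std, i.e. ddof=0
    return [(s - mean) / std for s in scores]


def zscore(values):
    """z-score of a sample relative to its own mean and std."""
    return zmap(values, values)


z = zscore([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)
```

The sample above has mean 5 and population standard deviation 2, so each z-score is simply `(x - 5) / 2`.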

Enhancements

Enhance JIT error message when __device__ option is missing (#6837)
Fix augassign target is evaluated twice in JIT (#6844)
JIT: Add type annotation in _compile.py (#6859)
Add complex support for nanvar and nanstd (#6869)
Update cupy.array_api (#6871)
Accept kind in sort/argsort and fix cupy.array_api.{sort,argsort} accordingly (#6872)
Add CC 8.7 for Jetson Orin (#6876)
Update cupy-wheel for v11 (#6903)
Support deg in cupy.angle (#6905)
Make sure that uniform sampling respects broadcasting (#6928)
Update cupy.array_api (cont'd) (#6932)
Support SciPy 1.9 (#6962)
Make testing decorators able to use with @pytest.mark.parametrize in some cases (#6984)
Relaxed C-contiguous requirement for changing dtype of different size (#6848)
Support keepdims parameter for average (#6852)
Support equal_nan parameter for unique (#6853)

Performance Improvements

Efficiency improvements in cupyx.scipy.ndimage utilities (#6953)

Bug Fixes

Generate CUBIN for all supported GPUs at build time (#6875)
Fix boxcox_llf (#6884)
Fix real and imag in subclass (#6896)
Fix cupy.clip to match numpy (#6920)
Let argpartition use the kth argument properly (#6921)
Fix cuTensorNet shim layer (#6934)
Fix occasional hang in sparse distributed (#6942)
Fix SciPy dependency leak (#6947)
Fix CUB reduction with zero-size arrays (#6960)

Code Fixes

Fix function names (#6877)
Remove proxy functions for softlink (#6879)
Suppress nvcc warning (#6954)

Documentation

Bump documentation build requirements (#6825)
Reverting to v10 installation instruction until v11 stable release (#6836)
Fix ROCm supported versions in compat matrix (#6846)
Generate docs for private classes in one location (#6857)
Expand breaking change & best practice on device management (#6883)
Update installation guide for v11 (aarch64) (#6888)
Update install instructions on README (#6889)
Document matmul supports out (#6898)
Fix docs build failure (#6955)

Installation

Reorganize build scripts: define compile options declaratively (#6911)
Parallelize Cythonize (#6975)
Remove use of distutils.utils (#7006)

Examples

Make matrix in CG example positive definite (#6939)

Tests

Update tags for FlexCI projects (#6814)
Add config for cupy.win.cuda117 (#6880)
Fix XFAIL for tests/cupyx_tests/scipy_tests/sparse_tests/test_coo.py when scipy>=1.9.0rc2 (#6894)
Use ubuntu-22.04 as GitHub Actions runner image (#6988)
Revert comment fix (#6995)
Filter warnings from setuptools 65 (#7000)
CI: bump CUDA version used in cuda-python test (#7022)
CI: Add ROCm 5.1 and 5.2 (#6828)
CI: Show all errors when doc build fail (#6910)

Others

Bump version to v12.0.0a1 (#7027)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@andfoy @andoorve @asi1024 @BasLaa @emcastillo @grlee77 @khushi-411 @kmaehashi @leofang @pri1311 @takagi @tom24d @toslunar @tpkessler

01 Sep 06:33

emcastillo

v11.1.0

7f145cf

v11.1.0

This is the release note of v11.1.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Jetson AGX Orin Support (#6876)

Arm (aarch64) wheels are now compiled with support for compute capability 8.7.
These wheels are available through our Pip index: pip install cupy-cuda11x -f https://pip.cupy.dev/aarch64

Changes

New Features

Add cupyx.scipy.special.log_softmax (#6966)

Enhancements

Update cupy.array_api (#6929)
Add CC 8.7 for Jetson Orin (#6950)
Accept kind in sort/argsort and fix cupy.array_api.{sort,argsort} accordingly (#6951)
Fix augassign target is evaluated twice in JIT (#6964)
Update cupy.array_api (cont'd) (#6973)
Support SciPy 1.9 (#6981)
Enhance JIT error message when __device__ option is missing (#6991)
JIT: Add type annotation in _compile.py (#6993)
Make testing decorators able to use with @pytest.mark.parametrize in some cases (#7010)
Support keepdims parameter for average (#6897)
Support equal_nan parameter for unique (#6904)

Bug Fixes

Fix CUB reduction with zero-size arrays (#6968)
Fix cuTensorNet shim layer (#6979)
Fix SciPy dependency leak (#6980)
Fix occasional hang in sparse distributed (#6997)
Let argpartition use the kth argument properly (#7020)

Code Fixes

Remove proxy functions for softlink (#6946)
Suppress nvcc warning (#6970)

Documentation

Document matmul supports out (#6899)
Bump documentation build requirements (#6930)
Expand breaking change & best practice on device management (#6952)
Fix docs build failure (#6967)

Tests

Fix XFAIL for tests/cupyx_tests/scipy_tests/sparse_tests/test_coo.py when scipy>=1.9.0rc2 (#6963)
Use ubuntu-22.04 as GitHub Actions runner image (#6992)
Revert comment fix (#6996)
Filter warnings from setuptools 65 (#7004)
CI: bump CUDA version used in cuda-python test (#7023)
CI: Show all errors when doc build fail (#6945)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @emcastillo @khushi-411 @kmaehashi @leofang @takagi @toslunar

28 Jul 07:58

asi1024

v11.0.0

9bddf35

v11.0.0

This is the release note of v11.0.0. See here for the complete list of solved issues and merged PRs.

This release note only covers changes made since v11.0.0rc1 release. Check out our blog for highlights in the v11 release!

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

`cupy-wheel` package

Until now, downstream projects depending on CuPy have had a hard time specifying a binary wheel as a dependency, and it was the users’ responsibility to install the correct package in their environments. CuPy v10 introduced the experimental cupy-wheel meta-package. In this release, we declare this feature ready for production environments. cupy-wheel examines the user’s environment and automatically selects the matching CuPy binary wheel to install.

Changes

For all changes in v11, please refer to the release notes of the pre-releases (alpha1, alpha2, beta1, beta2, beta3, rc1).

Enhancements

Support deg in cupy.angle (#6909)
Update cupy-wheel for v11 (#6913)
Relaxed C-contiguous requirement for changing dtype of different size (#6850)

Bug Fixes

Generate CUBIN for all supported GPUs at build time (#6881)
Fix real and imag in subclass (#6907)

Code Fixes

Fix function names (#6878)

Documentation

Fix ROCm supported versions in compat matrix (#6851)
Generate docs for private classes in one location (#6858)

Installation

Bump version to v11.0.0 (#6915)

Tests

Update tags for FlexCI projects (#6860)
CI: Add ROCm 5.1 and 5.2 (#6861)
Add config for cupy.win.cuda117 (#6885)

Others

Bump branch version to v11 (#6845)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@emcastillo @kmaehashi @takagi

30 Jun 06:53

kmaehashi

v11.0.0rc1

15b74fe

v11.0.0rc1 Pre-release

This is the release note of v11.0.0rc1. See here for the complete list of solved issues and merged PRs.

We are going to release v11.0.0 on July 28th. Please start testing your workload with this release candidate (pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre). See the Upgrade Guide for the list of possible breaking changes.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support CUDA 11.7 (#6767)

Full support for CUDA 11.7 has been added as of this release. Binary packages can be installed with the following command: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre

Unified Binary Package for CUDA 11.2 or later (#6730)

CuPy v11 provides a unified binary package named cupy-cuda11x that supports all CUDA 11.2+ releases. This replaces per-CUDA version binary packages (cupy-cuda112, cupy-cuda113, …, cupy-cuda117) provided in CuPy v10 or earlier.

Note that CUDA 11.1 or earlier still requires per-CUDA version binary packages. cupy-cuda102, cupy-cuda110, and cupy-cuda111 will be provided for CUDA 10.2, 11.0, and 11.1, respectively.
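The package naming described above can be summarized as a small lookup. This is only a sketch: `wheel_for_cuda` is a hypothetical helper built from the package names listed in this note, not part of CuPy or of cupy-wheel.

```python
def wheel_for_cuda(major, minor):
    """Return the CuPy v11 binary package name for a CUDA toolkit version.

    Hypothetical helper based only on the package names listed in this note.
    """
    if (major, minor) == (10, 2):
        return "cupy-cuda102"
    if major == 11 and minor in (0, 1):
        # CUDA 11.0 and 11.1 keep their per-version packages.
        return f"cupy-cuda11{minor}"
    if major == 11 and minor >= 2:
        # CUDA 11.2 and later share the unified package.
        return "cupy-cuda11x"
    raise ValueError(f"no CuPy v11 wheel listed for CUDA {major}.{minor}")


print(wheel_for_cuda(11, 7))  # cupy-cuda11x
```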

Binary Package for Arm Platform (#6705)

CuPy v11 provides a cupy-cuda11x binary package built for aarch64, which supports CUDA 11.2+ Arm SBSA and JetPack 5.
These wheels are available through our Pip index: pip install --pre cupy-cuda11x -f https://pip.cupy.dev/aarch64

Support for `ndarray` subclassing (#6720, #6755)

This release allows users to subclass cupy.ndarray, using the same protocol as NumPy:

import cupy

class C(cupy.ndarray):

    def __new__(cls, *args, info=None, **kwargs):
        obj = super().__new__(cls, *args, **kwargs)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

a = C([0, 1, 2, 3], info='information')
assert type(a) is C
assert issubclass(type(a), cupy.ndarray)
assert a.info == 'information'

View casting and new-from-template mechanisms are also supported, as described in the NumPy documentation.

Add Collective Communication APIs in `cupyx.distributed` for Sparse Matrices

All the collective calls implemented for dense matrices now support sparse matrices. Users interested in this feature should install mpi4py in order to perform an efficient metadata exchange.

Google Summer of Code 2022

We would like to give a warm welcome to @khushi-411, who will be working on adding support for the cupyx.scipy.interpolate APIs as part of her GSoC internship!

Changes without compatibility

Bump base Docker image to the latest supported one (#6802)

CuPy official Docker images have been upgraded. Users relying on these images may suffer from compatibility issues with preinstalled tools or libraries.

Changes

New Features

Add cupy.setxor1d (#6582)
Add initial cupyx.spatial.distance support from pylibraft (#6690)
Support cupy.ndarray subclassing - Part 2 - View casting (#6720)
Add sparse broadcast (#6758)
Add sparse reduce (#6761)
Add sparse all_reduce and minor fixes (#6762)
Add sparse all_to_all, reduce_scatter, send_recv (#6765)
Subclass cupy.ndarray subclassing - Part 3 - New from template (ufunc) (#6775)
Add cupyx.scipy.special.log_ndtr (#6776)
Add cupyx.scipy.special.expn (#6790)

Enhancements

Utilize CUDA Enhanced Compatibility (#6730)
Fix to return correct CUDA version when in CUDA Python mode (#6736)
Support CUDA 11.7 (#6767)
Make the warning for cupy.array_api say "cupy" instead of "numpy" (#6791)
Utilize CUDA Enhanced Compatibility in all wrappers (#6799)
Add support for cupy-cuda11x wheel (#6800)
Bump base Docker image to the latest supported one (#6802)
Remove CUPY_CUDA_VERSION as much as possible (#6810)
Raise UserWarning in cupy.cuda.compile_with_cache (#6818)
cupy-wheel: Use NVRTC to infer the toolkit version (#6819)
Support NumPy 1.23 (#6820)
Fix for NumPy 1.23 (#6807)

Performance Improvements

Improved integer matrix multiplication performance by modifying tuning parameters (#6703)
Use fast convolution algorithm in cupy.poly1d.__pow__ (#6770)

Bug Fixes

Fix polynomial tests (#6721)
Fix batched matmul for integral numbers (#6725)
Fix cupy.median for NaN inputs (#6759)
Fix required cusparse symbol not loaded in CUDA 11.1.1 (#6806)

Code Fixes

Add type annotation in _cuda_types.py (#6726)
Subclass rename (#6746)
Add type annotation to JIT internal types (#6778)

Documentation

Add CUDA 11.7 on documents (#6768)
Improved NVTX documentation (#6774)
Fix docs to hide ndarray_base (#6782)
Update docs for cupy-cuda11x wheel (#6803)
Bump NumPy version used in docs (#6824)
Add upgrade guide for CuPy v11 (#6826)

Tests

Fix mempool tests (#6591)
CI: Fix prep script to show build failure details (#6781)
Fix a potential variable misuse bug (#6786)
Fix CI Docker image build failing in head test (#6804)
Tiny clean up in CI script (#6809)

Others

Fix docker workflow to push to latest image (#6832)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@andoorve @asi1024 @asmeurer @cjnolet @emcastillo @khushi-411 @kmaehashi @leofang @LostBenjamin @pri1311 @rietmann-nv @takagi

30 Jun 06:53

kmaehashi

v10.6.0

18fab32

v10.6.0

This is the release note of v10.6.0. See here for the complete list of solved issues and merged PRs.

This is the last planned release for the CuPy v10 series. We are going to release v11.0.0 on July 28th. Please start testing your workload with the v11 release candidate (pip install --pre cupy-cuda11x -f https://pip.cupy.dev/pre). See the Upgrade Guide for the list of possible breaking changes in v11.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support CUDA 11.7 (#6767)

Full support for CUDA 11.7 has been added as of this release. Binary packages can be installed with the following command: pip install cupy-cuda117

Changes

Enhancements

Improve warning message in sparse (#6675)
Support CUDA 11.7 (#6794)
Make the warning for cupy.array_api say "cupy" instead of "numpy" (#6795)
cupy-wheel: Use NVRTC to infer the toolkit version (#6831)

Bug Fixes

Fix cupy.median for NaN inputs (#6760)
Fix batched matmul for integral numbers (#6777)

Documentation

Add CUDA 11.7 on documents (#6801)

Tests

Fix Dockerfile broken for array-api tests (#6518)
Skip ndimage.filter tests for ROCm 4.0 (#6676)
Xfail a test of LOBPCG on ROCm 5.0+ (#6733)
CI: Fix prep script to show build failure details (#6784)
Fix a potential variable misuse bug (#6788)
Fix CI Docker image build failing in head test (#6808)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @asmeurer @emcastillo @kmaehashi @LostBenjamin @takagi

26 May 07:35

emcastillo

v11.0.0b3

1c55be4

v11.0.0b3 Pre-release

This is the release note of v11.0.0b3. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

Support cuTensorNet as an `einsum` backend (#6677) (thanks @leofang!)

A new accelerator for CuPy has been added (CUPY_ACCELERATORS=cutensornet).
This feature requires cuquantum-python >= 22.03 and cuTENSOR >= 1.5.0. It is used to accelerate the cupy.linalg.einsum API and to support large array sizes in it.
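One way to opt in from Python rather than from the shell is to set the variable before CuPy is imported. This is a sketch under the assumption that CuPy reads `CUPY_ACCELERATORS` at import time; the CuPy import itself is left commented out so the snippet also runs on machines without a GPU.

```python
import os

# Assumption: the variable must be in place before `import cupy` runs.
os.environ["CUPY_ACCELERATORS"] = "cutensornet"

# import cupy  # uncomment where CuPy, cuquantum-python and cuTENSOR are installed
print(os.environ["CUPY_ACCELERATORS"])
```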

Changes without compatibility

Drop Support for ROCm 4.2 (#6734)

CuPy v11 will drop support for ROCm 4.2. We recommend that users use ROCm 4.3 or 5.0 instead.

Drop Support for NumPy 1.18/1.19 and SciPy 1.4/1.5 (#6735)

As per NEP 29, support for NumPy 1.18/1.19 was dropped in 2021. The supported SciPy versions are those released close to the supported NumPy versions.

Changes

New Features

Support cuTensorNet (from cuQuantum) as an einsum backend (#6677)
Add cupy.poly (#6697)
Support cupy.ndarray subclassing - Part 1 - Direct constructor call (#6716)

Enhancements

Support cuDNN 8.4 (#6641)
Support cuTENSOR 1.5.0 (#6665)
JIT: Use C++14 (#6670)
Support cuTENSOR 1.5.0 (#6722)
Drop support for ROCm 4.2 in CuPy v11 (#6734)
Drop support for NumPy 1.18/1.19 and SciPy 1.4/1.5 in CuPy v11 (#6735)
Fix compilation warning caused by ifdef (#6739)

Performance Improvements

Accelerate bincount, histogram2d, histogramdd with CUB (#6701)

Bug Fixes

Fix memory leak in the FFT plan cache during multi-threading (#6704)
Fix ifdef for ROCm >= 4.2 (#6750)

Code Fixes

JIT: Cosmetic change of Dim3 class (#6644)

Documentation

Fix imports of scatter_add example (#6696)
Minor improvement on the array API docs (#6706)
Document the returned benchmark object (#6712)
Use exposed name in user guide (#6718)

Tests

Xfail a test of LOBPCG on ROCm 5.0+ (#6603)
CI: Update repo for libcudnn7 in cuda10.2 (#6708)
Bump pinned mypy version (#6710)
Follow scipy==1.8.1 sparse dot bugfix (#6727)
Support testing CUDA 11.6+ in FlexCI (#6731)
Fix GPG key issue in FlexCI base image (#6738)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @Dahlia-Chehata @emcastillo @kmaehashi @leofang @takagi

26 May 07:35

emcastillo

v10.5.0

f0d0e7b

v10.5.0

This is the release note of v10.5.0. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Update (2022-06-17): Wheels for CUDA 11.5 Arm SBSA are now available in the Assets section below. (#6705)

Changes

Enhancements

Fix compilation warning caused by ifdef (#6740)
Support cuDNN 8.4 (#6741)

Bug Fixes

Fix memory leak in the FFT plan cache during multi-threading (#6732)
Fix ifdef for ROCm >= 4.2 (#6751)

Documentation

Minor improvement on the array API docs (#6714)
Document the returned benchmark object (#6742)

Tests

CI: Update repo for libcudnn7 in cuda10.2 (#6709)
Pin mypy version in setup.py (#6711)
Follow scipy==1.8.1 sparse dot bugfix (#6728)
Support testing CUDA 11.6+ in FlexCI (#6737)
Fix GPG key issue in FlexCI base image (#6743)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @emcastillo @kmaehashi @leofang @takagi

27 Apr 07:44

asi1024

v11.0.0b2

88138d3

v11.0.0b2 Pre-release

This is the release note of v11.0.0b2. See here for the complete list of solved issues and merged PRs.

We are running a Gitter chat for general discussions and quick questions. Feel free to join the channel to talk with developers and users!

Highlights

JIT Improvements (#6620, #6640, #6649, #6668)

CuPy JIT has been further enhanced thanks to @leofang and @eternalphane!
It is now possible to use CUDA cooperative groups and access .shape and .strides attributes of ndarrays.

import cupy
from cupyx import jit

@jit.rawkernel()
def kernel(x, y):
    size = x.shape[0]
    ntid = jit.gridDim.x * jit.blockDim.x
    tid = jit.blockIdx.x * jit.blockDim.x + jit.threadIdx.x
    for i in range(tid, size, ntid):
        y[i] = x[i]
    g = jit.cg.this_thread_block()
    g.sync()

x = cupy.arange(200, dtype=cupy.int64)
y = cupy.zeros((200,), dtype=cupy.int64)
kernel[2, 32](x, y)

print(kernel.cached_code)

The above program emits the CUDA code as follows:

#include <cooperative_groups.h>
namespace cg = cooperative_groups;

extern "C" __global__ void kernel(CArray<long long, 1, true, true> x, CArray<long long, 1, true, true> y) {
  ptrdiff_t i;
  ptrdiff_t size = thrust::get<0>(x.get_shape());
  unsigned int ntid = (gridDim.x * blockDim.x);
  unsigned int tid = ((blockIdx.x * blockDim.x) + threadIdx.x);
  for (ptrdiff_t __it = tid, __stop = size, __step = ntid; __it < __stop; __it += __step) {
    i = __it;
    y[i] = x[i];
  }
  cg::thread_block g = cg::this_thread_block();
  g.sync();
}
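The loop in the kernel above is a grid-stride loop. As a CPU-only illustration of how the launch `kernel[2, 32]` spreads 200 elements over 64 threads, the sketch below enumerates the indices each thread would visit; `grid_stride_indices` is a hypothetical helper written for this note, not a CuPy API.

```python
def grid_stride_indices(size, blocks, threads_per_block):
    """Map each global thread id to the element indices it handles."""
    ntid = blocks * threads_per_block  # total number of threads in the grid
    return {tid: list(range(tid, size, ntid)) for tid in range(ntid)}


work = grid_stride_indices(size=200, blocks=2, threads_per_block=32)
covered = sorted(i for indices in work.values() for i in indices)

print(work[0])  # [0, 64, 128, 192]
print(covered == list(range(200)))  # True: every element is visited exactly once
```

Because the stride equals the total thread count, the grid covers arrays of any length without changing the launch configuration.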

Initial MPI and sparse matrix support in `cupyx.distributed` (#6628, #6658)

CuPy v10 added the cupyx.distributed API to perform interprocess communication using NCCL in a way similar to MPI. In CuPy v11, we are extending this API to support sparse matrices as defined in cupyx.scipy.sparse. Currently, only send/recv primitives are supported, but support for collective calls will be added in upcoming releases.

Additionally, it is now possible to use MPI (through the mpi4py Python package) to initialize the NCCL communicator. This avoids launching the TCP server otherwise used to exchange CPU values. We also recommend enabling MPI for sparse matrix communication, because each communication call must exchange metadata, which leads to device synchronization if MPI is not enabled.

# run with mpiexec -n N python …

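from mpi4py import MPI  # `import mpi4py` alone does not load the MPI submodule
import cupyx.distributed

mpi_comm = MPI.COMM_WORLD
workers = mpi_comm.Get_size()
rank = mpi_comm.Get_rank()

# Initialize the NCCL communicator, using MPI for management
comm = cupyx.distributed.init_process_group(workers, rank, use_mpi=True)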

Announcements

Introduction of generic `cupy-wheel` (EXPERIMENTAL) (#6012)

We have added a new package on PyPI called cupy-wheel. This meta-package allows other libraries to declare a dependency on CuPy while transparently installing the exact CuPy binary wheel that matches the user's environment. Users can also install CuPy using this package instead of manually specifying a CUDA/ROCm version.

pip install cupy-wheel

This package is only available for the stable release, as the current pre-release wheels are not hosted on PyPI.

This feature is currently experimental and subject to change, so we recommend that users not distribute packages relying on it for now. Your suggestions and comments are very welcome (please visit #6688).

Changes

New Features

Support cooperative group in JIT compiler (#6620)
Add support for sparse matrices in cupyx.distributed (#6628)
JIT: Support compile-time for-loop unrolling (#6649)
JIT: Support .shape and .strides (#6668)

Enhancements

Add a few driver/runtime/nvrtc API wrappers (#6604)
Implement flatten(order) (#6613)
Implemented a __repr__ for cupyx.profiler._time._PerfCaseResult (#6617)
JIT: Avoid calling default constructor if possible (#6619)
Add missing cudaDevAttrMemoryPoolsSupported to hip (#6621)
Add CC 3.2 to Tegra arch list (#6631)
JIT: Add more cooperative group APIs (#6640)
JIT: Add kernel.cached_code test (#6643)
Use MPI for management in cupyx.distributed (#6658)
Improve warning message in sparse (#6669)

Performance Improvements

Improve copy and assign operation (#6181)
Performance improvement of cupy.intersect1d (#6586)

Bug Fixes

Define float16::operator-() only for ROCm 5.0+ (#6624)
JIT: fix access to cached codes (#6639)
Fix cuda python CI (#6652)
Fix int64 overflow in cupy.polyval (#6664)
JIT: Disable memcpy_async on CUDA 11.0 (#6671)

Documentation

Add --pre option to instructions installing pre-releases (#6612)
JIT: fix function signatures in the docs (#6648)
Fix typo in performance guide (#6657)

Installation

Add universal CuPy package (#6012)

Tests

Run daily benchmark with head branch against latest release (#6598)
CI: Trigger FlexCI for hotfix branches (#6625)
Remove jenkins requirements (#6632)
Fix TestIncludesCompileCUDA for HEAD tests (#6646)
Trigger CUDA Python tests with /test mini (#6653)
Fix missing f prefix on f-strings fix (#6674)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@asi1024 @code-review-doctor @danielg1111 @davidegavio @emcastillo @eternalphane @kmaehashi @leofang @okuta @takagi @toslunar
