Torch7 FFI bindings for the NVIDIA NCCL library.

Installation requirements:

- Install NCCL from
- Have CUDA 7.0 or later
- Have in your library path

Supported collective operations:

- allReduce
- reduce
- broadcast
- allGather
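
For readers new to collectives, the semantics of the four operations above can be sketched in plain Lua, with ordinary tables standing in for per-device tensors. This is only an illustration and not the binding's implementation: the helper names, the `root` parameter, and the choice of summation as the reduction are assumptions made for the example.

```lua
-- Plain-Lua simulation of collective semantics (no GPU or Torch required).
-- bufs[d] is a Lua table standing in for the tensor held by device d.
-- All function names and the `root` argument are hypothetical.

-- Element-wise sum across all devices (summation is an assumed reduction).
local function sumAcross(bufs)
   local sum = {}
   for i = 1, #bufs[1] do
      sum[i] = 0
      for d = 1, #bufs do sum[i] = sum[i] + bufs[d][i] end
   end
   return sum
end

-- allReduce: every device ends up with the reduced result.
local function allReduce(bufs)
   local sum = sumAcross(bufs)
   for d = 1, #bufs do
      for i = 1, #sum do bufs[d][i] = sum[i] end
   end
end

-- reduce: only the root device receives the reduced result.
local function reduce(bufs, root)
   local sum = sumAcross(bufs)
   for i = 1, #sum do bufs[root][i] = sum[i] end
end

-- broadcast: the root's data is copied to every other device.
local function broadcast(bufs, root)
   for d = 1, #bufs do
      if d ~= root then
         for i = 1, #bufs[root] do bufs[d][i] = bufs[root][i] end
      end
   end
end

-- allGather: every device receives the concatenation of all inputs,
-- in device order. Returns a new table of per-device outputs.
local function allGather(bufs)
   local outputs = {}
   for d = 1, #bufs do
      outputs[d] = {}
      for src = 1, #bufs do
         for i = 1, #bufs[src] do
            table.insert(outputs[d], bufs[src][i])
         end
      end
   end
   return outputs
end

local demo = {{1, 2}, {3, 4}}
allReduce(demo)  -- both entries of demo now hold {4, 6}
```

The in-place style of `allReduce`, `reduce`, and `broadcast` here mirrors the in-place usage described in this README; `allGather` returns fresh outputs because its result is larger than each input.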
The argument to a collective call should be a table of contiguous tensors, each located on a different device. For example, to perform an in-place allReduce on a table of tensors:
require 'nccl'
nccl.allReduce(inputs)
where inputs is a table of contiguous tensors of the same size, each located on a different device.
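
One way such a table might be built is sketched below. It assumes `cutorch` is installed, that at least one CUDA device is visible, and that the binding exposes the collective as `nccl.allReduce`; the tensor size and fill values are arbitrary choices for illustration.

```lua
require 'cutorch'
require 'nccl'

-- Allocate one same-sized tensor per visible GPU.
local nGPU = cutorch.getDeviceCount()
local inputs = {}
for dev = 1, nGPU do
   cutorch.setDevice(dev)  -- tensors are allocated on the current device
   -- A freshly allocated tensor is contiguous; 1024 is an arbitrary size.
   inputs[dev] = torch.CudaTensor(1024):fill(dev)
end

-- In-place collective: the tensors in `inputs` are overwritten with the result.
nccl.allReduce(inputs)
```

Tensors produced by slicing or transposing may not be contiguous; calling `:contiguous()` on them before adding them to the table is one way to satisfy the requirement.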