
Use thrust::inclusive_scan for 1D cumsum/cumprod #742

Merged 1 commit into torch:master on Apr 5, 2017

Conversation

colesbury (Contributor)

For large 1D tensors, thrust::inclusive_scan is much faster than our current implementation.

For 4 million elements, thrust takes about 430 µs vs. 200 ms for the current implementation.
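
For reference, a minimal standalone sketch (not the cutorch code in this PR) of the Thrust calls involved: `thrust::inclusive_scan` with the default operator gives cumsum, and the same call with `thrust::multiplies` gives cumprod.

```cpp
// Minimal sketch, not the actual cutorch change in this PR: 1D cumsum and
// cumprod on the GPU via thrust::inclusive_scan. Compile with nvcc.
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <thrust/functional.h>
#include <cstdio>

int main() {
  const int n = 1 << 22;  // ~4 million elements, roughly the size benchmarked above
  thrust::device_vector<float> input(n, 1.0f);
  thrust::device_vector<float> sums(n), prods(n);

  // Cumulative sum: sums[i] = input[0] + ... + input[i]
  thrust::inclusive_scan(input.begin(), input.end(), sums.begin());

  // Cumulative product: the same scan with an explicit binary operator
  thrust::inclusive_scan(input.begin(), input.end(), prods.begin(),
                         thrust::multiplies<float>());

  printf("cumsum[n-1] = %f, cumprod[n-1] = %f\n",
         (float)sums[n - 1], (float)prods[n - 1]);
  return 0;
}
```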

pavanky (Contributor) commented on Apr 4, 2017

Maybe consider using thrust + a for loop for scanning along the innermost dimension, if the current version is really that slow.
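
A rough sketch of this suggestion (hypothetical helper, not code from this PR): scan the innermost dimension of a row-major rows × cols tensor by looping over rows and calling `thrust::inclusive_scan` on each contiguous row.

```cpp
// Hypothetical sketch of "thrust + for loop": one inclusive_scan per row of a
// row-major rows x cols matrix, scanning the innermost dimension in place.
#include <thrust/device_vector.h>
#include <thrust/scan.h>

void cumsum_innermost(thrust::device_vector<float>& data, int rows, int cols) {
  for (int r = 0; r < rows; ++r) {
    // Each row occupies a contiguous range of `cols` elements.
    thrust::inclusive_scan(data.begin() + r * cols,
                           data.begin() + (r + 1) * cols,
                           data.begin() + r * cols);
  }
}
```

Each iteration is a separate scan over one row, so this only pays off when there are few, long rows; that tradeoff is exactly what the next comment discusses.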

soumith merged commit 6e0ef02 into torch:master on Apr 5, 2017
colesbury (Contributor, Author)

@pavanky, there are cases where the current implementation would be faster for 2D+ tensors (when the outer dimension is large enough to parallelize over), so I didn't want to switch over completely. If the 2D+ case is a bottleneck, we may want to revisit this and possibly pull in the segmented scan from moderngpu.
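
For illustration only (this PR does not add it, and it is not moderngpu's implementation): Thrust's `inclusive_scan_by_key` can express the same segmented-scan idea, restarting the scan at each row boundary of a row-major rows × cols tensor by keying values on their row index.

```cpp
// Illustrative segmented scan with Thrust (not moderngpu, not part of this PR):
// values sharing a key form one segment, so keying each element by its row
// index makes the scan restart at every row boundary.
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>

struct RowOfIndex {
  int cols;
  __host__ __device__ int operator()(int i) const { return i / cols; }
};

void cumsum_rows_segmented(const thrust::device_vector<float>& in,
                           thrust::device_vector<float>& out,
                           int rows, int cols) {
  RowOfIndex row_of = {cols};
  auto keys = thrust::make_transform_iterator(thrust::make_counting_iterator(0), row_of);
  // Inclusive scan within each run of equal keys; sums do not carry across rows.
  thrust::inclusive_scan_by_key(keys, keys + rows * cols, in.begin(), out.begin());
}
```

A single pass covers all rows, which is closer to what the batched (2D+) case wants when the outer dimension is large.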

wickedfoo (Contributor) commented on Apr 5, 2017

This is the general tradeoff that much of cutorch makes for anything that is a reduction of sorts. Many cutorch reduction functions operate under the assumption that there is a large enough batch (you are reducing along dimension D of an N-dimensional tensor) rather than operating on a single 1-d array. The implementation usually just targets the first (batch) case, leaving poor performance in the non-batch case, which usually requires a different specialization that solves the problem via recursive sub-division or multiple kernel launches with scratch space.
