
Use thrust::inclusive_scan for 1D cumsum/cumprod #742

Merged 1 commit into torch:master on Apr 5, 2017

Conversation

colesbury (Contributor)

For large 1D tensors, thrust::inclusive_scan is much faster than our current implementation.

For 4 million elements, thrust takes about 430 µs vs. 200 ms for the current implementation.
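
For reference, a minimal standalone sketch (not the cutorch code in this PR) of the Thrust calls involved: `thrust::inclusive_scan` with the default operator gives cumsum, and the same call with `thrust::multiplies` gives cumprod.

```cpp
// Minimal sketch, not the actual cutorch change in this PR: 1D cumsum and
// cumprod on the GPU via thrust::inclusive_scan. Compile with nvcc.
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <thrust/functional.h>
#include <cstdio>

int main() {
  const int n = 1 << 22;  // ~4 million elements, roughly the size benchmarked above
  thrust::device_vector<float> input(n, 1.0f);
  thrust::device_vector<float> sums(n), prods(n);

  // Cumulative sum: sums[i] = input[0] + ... + input[i]
  thrust::inclusive_scan(input.begin(), input.end(), sums.begin());

  // Cumulative product: the same scan with an explicit binary operator
  thrust::inclusive_scan(input.begin(), input.end(), prods.begin(),
                         thrust::multiplies<float>());

  printf("cumsum[n-1] = %f, cumprod[n-1] = %f\n",
         (float)sums[n - 1], (float)prods[n - 1]);
  return 0;
}
```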

pavanky (Contributor) commented on Apr 4, 2017

Maybe consider using thrust + a for loop for scanning along the innermost dimension, if the current version is really that slow.
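
A rough sketch of this suggestion (hypothetical helper, not code from this PR): scan the innermost dimension of a row-major rows × cols tensor by looping over rows and calling `thrust::inclusive_scan` on each contiguous row.

```cpp
// Hypothetical sketch of "thrust + for loop": one inclusive_scan per row of a
// row-major rows x cols matrix, scanning the innermost dimension in place.
#include <thrust/device_vector.h>
#include <thrust/scan.h>

void cumsum_innermost(thrust::device_vector<float>& data, int rows, int cols) {
  for (int r = 0; r < rows; ++r) {
    // Each row occupies a contiguous range of `cols` elements.
    thrust::inclusive_scan(data.begin() + r * cols,
                           data.begin() + (r + 1) * cols,
                           data.begin() + r * cols);
  }
}
```

Each iteration is a separate scan over one row, so this only pays off when there are few, long rows; that tradeoff is exactly what the next comment discusses.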

soumith merged commit 6e0ef02 into torch:master on Apr 5, 2017
colesbury (Contributor, Author)

@pavanky, there are cases where the current implementation would be faster for 2D+ tensors (when the outer dimension is large enough to parallelize over), so I didn't want to switch over completely. If the 2D+ case is a bottleneck, we may want to revisit this and possibly pull in the segmented scan from moderngpu.
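
For illustration only (this PR does not add it, and it is not moderngpu's implementation): Thrust's `inclusive_scan_by_key` can express the same segmented-scan idea, restarting the scan at each row boundary of a row-major rows × cols tensor by keying values on their row index.

```cpp
// Illustrative segmented scan with Thrust (not moderngpu, not part of this PR):
// values sharing a key form one segment, so keying each element by its row
// index makes the scan restart at every row boundary.
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>

struct RowOfIndex {
  int cols;
  __host__ __device__ int operator()(int i) const { return i / cols; }
};

void cumsum_rows_segmented(const thrust::device_vector<float>& in,
                           thrust::device_vector<float>& out,
                           int rows, int cols) {
  RowOfIndex row_of = {cols};
  auto keys = thrust::make_transform_iterator(thrust::make_counting_iterator(0), row_of);
  // Inclusive scan within each run of equal keys; sums do not carry across rows.
  thrust::inclusive_scan_by_key(keys, keys + rows * cols, in.begin(), out.begin());
}
```

A single pass covers all rows, which is closer to what the batched (2D+) case wants when the outer dimension is large.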

wickedfoo (Contributor) commented on Apr 5, 2017

This is the general tradeoff that much of cutorch makes for anything that is a reduction of sorts. Many cutorch reduction functions operate under the assumption that there is a large enough batch (you are reducing along dimension D of an N-dimensional tensor) rather than operating on a single 1-d array. The implementation usually just targets the first (batch) case, leaving poor performance in the non-batch case, which usually requires a different specialization that solves the problem via recursive sub-division or multiple kernel launches with scratch space.
