merge from master #4

Merged
merged 49 commits into from Jun 1, 2017

Conversation

elikosan (Owner) commented Jun 1, 2017

No description provided.

colesbury and others added 30 commits March 6, 2017 10:50
This is similar to THCCachingHostAllocator_recordEvent() but on CUDA
allocations. It's useful for overlapping copies with computation. The
workflow is approximately:

  0. allocate dst tensor on copy stream
  1. copy from CPU to GPU on copy stream
  2. synchronize the main stream with the copy stream via
     cudaStreamWaitEvent
  3. THCCachingAllocator_recordStream(dst, main_stream)

The recordStream() call is necessary to prevent the dst tensor from
being reused on the copy stream before the main stream finishes its work.

Previously, you would need to insert a second cudaStreamWaitEvent before
dst is freed to force the copy stream to wait on the main stream.
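
The same pattern is visible from the Python API in current PyTorch builds, where Tensor.record_stream() wraps the caching allocator's recordStream machinery. The sketch below only illustrates the workflow above; it is not part of this PR and assumes a CUDA-capable build.

    import torch

    # Hedged sketch of the copy-stream workflow above, using the Python-level
    # wrappers available in current PyTorch; Tensor.record_stream() calls into
    # the caching allocator's recordStream machinery.
    assert torch.cuda.is_available()

    main_stream = torch.cuda.current_stream()
    copy_stream = torch.cuda.Stream()

    src = torch.randn(1024, 1024).pin_memory()  # pinned CPU source buffer

    with torch.cuda.stream(copy_stream):
        # steps 0 and 1: allocate dst and copy CPU -> GPU on the copy stream
        dst = src.to('cuda', non_blocking=True)

    # step 2: make the main stream wait for the copy stream
    main_stream.wait_stream(copy_stream)

    # step 3: tell the caching allocator that dst is also used on the main
    # stream, so its memory is not handed back to the copy stream too early
    dst.record_stream(main_stream)

    out = dst * 2  # work on the main stream can now safely consume dst
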
Add THCCachingAllocator_recordStream()
Check event_count before merging blocks
fix bug that invalidates all tests
key only block-wide bitonic sort
add implementation of inclusive scan via upsweep-downsweep
linspace and logspace for CUDA Tensors
Narrow V when returning only some right singular vectors
Make rinfo_ optional in btrifact
Use zero instead of mul when beta == 0 in addr
Update btrisolve argument order.
Time to get rid of warp-synchronous code. It will break!
For large 1D tensors thrust::inclusive_scan is much faster than our
current implementation.
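
One way to sanity-check that claim from Python is to time the user-facing op backed by this scan (cumsum over a large 1D CUDA tensor). This is a hedged measurement sketch, not a benchmark from the PR, and the tensor size is arbitrary.

    import torch

    # Hedged timing sketch: cumsum on a large 1D CUDA tensor exercises the
    # inclusive-scan path mentioned above. CUDA events measure GPU time only.
    x = torch.randn(50_000_000, device='cuda')

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    torch.cuda.synchronize()
    start.record()
    y = torch.cumsum(x, dim=0)
    end.record()
    torch.cuda.synchronize()

    print(f"cumsum over {x.numel()} elements took {start.elapsed_time(end):.2f} ms")
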
lvdmaaten and others added 19 commits April 19, 2017 06:57
* move TopK to generic

* partial genericization of kernel code

* introduce TopKTypeConfig, specialize radix type and conversion for floats

* implement topk for byte tensor

* implement for char tensor

* implement for int tensor, extend test to check indices as well

* works for longs too

* make bitfield set/get a struct, add support for 64-bit types

* extend to double tensor

* implement for half tensor

* asserts; test fix
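
Taken together, these commits make topk available for every CUDA tensor type rather than only float. A hedged usage sketch, exercising a few of the dtypes listed above through the Python API:

    import torch

    # Hedged usage sketch: topk returns both values and indices, and the
    # commits above extend it to integer, double, and half CUDA tensors.
    for dtype in (torch.uint8, torch.int32, torch.int64, torch.float64, torch.float16):
        t = (torch.rand(1000) * 100).to(dtype).cuda()
        values, indices = torch.topk(t, k=5, largest=True, sorted=True)
        # the returned indices point back into t, so gathering reproduces values
        assert torch.equal(t[indices], values)
        print(dtype, values.tolist())
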
By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).

The old behavior seems desirable for normalization-type operations
where the tensor is immediately expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works because the dimension to expand is missing.
This can be fixed by passing True as the "keepdim" argument
to the reduction operation, e.g.:
probs.sum(1, keepdim=True).expand_as(probs)
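
A hedged, self-contained illustration of the change (assuming a current PyTorch build, where keepdim still defaults to False):

    import torch

    probs = torch.rand(4, 3)

    # Default keepdim=False: the reduced dimension is removed, so the
    # normalization idiom above can no longer broadcast back.
    s = probs.sum(1)                     # shape: (4,)

    # keepdim=True retains the reduced dimension with size 1,
    # so expand_as works as it did before the change.
    s_kept = probs.sum(1, keepdim=True)  # shape: (4, 1)
    normalized = probs / s_kept.expand_as(probs)

    print(normalized.sum(1))             # each row now sums to 1
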
@elikosan elikosan merged commit b7bd5f0 into elikosan:master Jun 1, 2017