cutorch.synchronize() does not work #9
Do you have a small repro? A known issue with NCCL is that it will hang if some other thread calls cudaFree while NCCL kernels are being scheduled. You can try running your code with the env var THC_CACHING_ALLOCATOR=1 and see if the problem persists.
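The suggestion above can be applied without touching the code, since cutorch reads the variable from the environment. A minimal sketch of the invocation, assuming a hypothetical training script `main.lua` run with the `th` launcher:

```shell
# THC_CACHING_ALLOCATOR=1 tells cutorch to cache device allocations and
# reuse them, instead of eagerly calling cudaFree -- which is the call
# that can collide with NCCL kernel scheduling and cause the hang.
THC_CACHING_ALLOCATOR=1 th main.lua
```

Setting it inline like this scopes the variable to a single run, so it is easy to compare behavior with and without the caching allocator.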
I am using this code, but I changed it to use NCCL.
No changes to this code are necessary to use NCCL; it should do so automatically. Take a look at https://github.com/facebook/fb.resnet.torch/, which also uses NCCL without deadlocks. If you are adding cutorch.synchronize() in such a way that it is called while NCCL kernels are being scheduled, NCCL will deadlock; you should make sure you are not doing that.
It doesn't seem that it uses NCCL by default.
I changed it to use NCCL: `local model_single = model`
Without that change it uses the default communication between GPUs, which is very slow.
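For context, the change being described usually amounts to passing `true` as the third constructor argument (`usenccl`) of `nn.DataParallelTable`, which is the pattern fb.resnet.torch uses. A minimal sketch, where `model_single` is the single-GPU network from the snippet above and `nGpu` is an assumed variable holding the GPU count:

```lua
require 'cunn'
require 'cutorch'

-- nn.DataParallelTable(dim, flattenParams, usenccl):
--   dim = 1           splits each minibatch along the batch dimension,
--   flattenParams     flattens parameters into contiguous storage,
--   usenccl = true    requests NCCL for inter-GPU communication.
local dpt = nn.DataParallelTable(1, true, true)
   :add(model_single, torch.range(1, nGpu):totable())

local model = dpt:cuda()
```

This is a sketch of the general pattern, not the exact code from the repository; if the `nccl.torch` bindings are not installed, DataParallelTable falls back to its slower default copy path.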
Also, I found that it does work for 2 GPUs but hangs for 4 GPUs.
I installed NCCL and am trying to use it.
Without NCCL, everything seems fine but slow.
But when I use NCCL, my code just stops at the call to cutorch.synchronize() and does nothing, without raising any error.
How can I find the root of the problem?
Thanks