Thanks for your excellent work. I'm trying to apply U-DiT in my task, but got some unexpected results. When I start training U-DiT, it raises the following warning, and the iteration time gets much longer.
```
lib/python3.10/site-packages/torch/autograd/__init__.py:251: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [384, 768, 1, 1], strides() = [768, 1, 768, 768]
bucket_view.sizes() = [384, 768, 1, 1], strides() = [768, 1, 1, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:320.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
```
I tried to localize this problem. There is no warning if I use the DiT backbone, and the iteration time is reasonable (1.4 s vs. 7.75 s). I wonder if anyone else has encountered a similar problem. Thanks a lot.
The training process uses DDP for multi-device training. The torch version I used is