MXNetError: unknown type for MKLDNN :2 when training Mask RCNN with mxnet-cu101==1.7.0 #19631
Comments
If you look at this file: You'll see it's running dtype checking and doesn't find a case in the switch for value 2. If you go look at what value 2 is: You'll see that it's |
Thank you, @samskalicky. Mask RCNN does use AMP, and it casts the weights and gradients to FP16 here: https://github.com/dmlc/gluon-cv/blob/master/scripts/instance/mask_rcnn/train_mask_rcnn.py#L705 |
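To make the failure mode discussed above concrete, here is a rough pure-Python sketch of a dtype switch with no case for type flag 2. The flag numbering follows mshadow's TypeFlag enum (first entries only), which matches the FP16 cast mentioned above. The set of types accepted by the lookup and the function name `lookup_mkldnn_type` are assumptions made for illustration, not the actual MXNet source.

```python
# Type flags as numbered in mshadow's TypeFlag enum (first entries only).
MSHADOW_TYPE_FLAGS = {
    0: "float32",
    1: "float64",
    2: "float16",
    3: "uint8",
    4: "int32",
    5: "int8",
    6: "int64",
}

# Assumption for illustration: dtypes the MKLDNN lookup has a case for.
# float16 is deliberately absent, mirroring the missing switch case.
MKLDNN_SUPPORTED = {
    "float32": "f32",
    "int32": "s32",
    "int8": "s8",
    "uint8": "u8",
}


def lookup_mkldnn_type(type_flag):
    """Hypothetical stand-in for the C++ switch over dtype values."""
    name = MSHADOW_TYPE_FLAGS.get(type_flag)
    if name in MKLDNN_SUPPORTED:
        return MKLDNN_SUPPORTED[name]
    # Equivalent of the default branch: no case matched this dtype.
    raise RuntimeError("unknown type for MKLDNN :%d" % type_flag)


if __name__ == "__main__":
    print(lookup_mkldnn_type(0))  # float32 has a case
    try:
        lookup_mkldnn_type(2)  # float16 falls through to the default branch
    except RuntimeError as err:
        print(err)
```

Under these assumptions, any tensor cast to FP16 by AMP that reaches an MKLDNN code path on CPU would hit the default branch, which is consistent with the error reported in this issue.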
I also tried running Mask RCNN script on single node using Below is the output from the run:
However, running it with
|
@bgawrych @bartekkuncer @grygielski FYI, looks like something changed from 1.6.0 to 1.7.0 that is causing this issue when running on CPU with MKLDNN |
I would suspect that the merging of mkldnn as default caused some issue in the contrib operators. |
Update: Commenting out these lines of code (https://github.com/dmlc/gluon-cv/blob/master/scripts/instance/mask_rcnn/train_mask_rcnn.py#L705-L710) seems to work with Horovod
|
I will try to analyze the issue. |
@anko-intel thanks for the fix! |
Description
MXNetError: unknown type for MKLDNN :2 is raised when using mxnet-cu101==1.7.0
Error Message
GluonCV: v0.9.0
Horovod: v0.21.0
To Reproduce
Without Horovod
Full log: https://gist.github.com/karan6181/efa4ad8f61c3e21cbee9c55fea98b2f0
With Horovod
Environment
We recommend using our script for collecting the diagnostic information with the following command
curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3