How to run the code without the distributed wrapper? I tried several times; my bash commands are below #10

MadadamXie · 2025-01-01T16:28:39Z

python3 train.py configs/cifar/resnet12_etf_bs512_200e_cifar.py --gpus 1 --work-dir logger/cifar_etf --seed 0 --deterministic

python3 fscil.py configs/cifar/resnet12_etf_bs512_200e_cifar_eval.py logger/cifar_etf logger/cifar_etf/best.pth --gpus 1 --seed 0 --deterministic

Those are the bash commands I ran to reproduce your experimental results. The first one ran without any error message, although the result is about 3 percent lower. The second one does not run at all and fails with the following error:

python3 fscil.py configs/cifar/resnet12_etf_bs512_200e_cifar_eval.py logger/cifar_etf logger/cifar_etf/best.pth --gpus 1 --seed 0 --deterministic
/home/xieyuhan/WorkSpace/anaconda3/lib/python3.11/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
/home/xieyuhan/WorkSpace/FSCIL/mmcls/utils/setup_env.py:32: UserWarning: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
warnings.warn(
/home/xieyuhan/WorkSpace/FSCIL/mmcls/utils/setup_env.py:42: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
warnings.warn(
/home/xieyuhan/WorkSpace/FSCIL/fscil.py:123: UserWarning: `--gpus` is deprecated because we only support single GPU mode in non-distributed training. Use `gpus=1` now.
warnings.warn('`--gpus` is deprecated because we only support '
2025-01-02 00:21:10,965 - mmcls - INFO - Environment info:

sys.platform: linux
Python: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 4070 Ti SUPER
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
GCC: gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0
PyTorch: 2.5.1
PyTorch compiling details: PyTorch built with:

GCC 9.3
C++ Version: 201703
Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 12.4
NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
CuDNN 90.1
Magma 2.6.1
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

TorchVision: 0.20.1
OpenCV: 4.10.0
MMCV: 1.7.2
MMCV Compiler: GCC 13.2
MMCV CUDA Compiler: 12.4
MMClassification: 0.23.2_71ef7ba+37a5475

2025-01-02 00:21:10,966 - mmcls - INFO - Distributed training: False
2025-01-02 00:21:10,966 - mmcls - INFO - Set random seed to 0, deterministic: True
2025-01-02 00:21:11,020 - mmcls - INFO - ETF head : evaluating 60 out of 100 classes.
2025-01-02 00:21:11,020 - mmcls - INFO - ETF head : with_len : True
2025-01-02 00:21:11,024 - mmcls - INFO - load checkpoint from local path: logger/cifar_etf/best.pth
/home/xieyuhan/WorkSpace/anaconda3/lib/python3.11/site-packages/mmcv/runner/checkpoint.py:334: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(filename, map_location=map_location)
2025-01-02 00:21:11,077 - mmcls - INFO - The model config:
type = 'ImageClassifierCIL'
backbone = dict(type='ResNet12', with_avgpool=False, flatten=False)
neck = dict(type='MLPFFNNeck', in_channels=640, out_channels=512)
head = dict(
type='ETFHead',
num_classes=100,
eval_classes=60,
in_channels=512,
loss=dict(type='DRLoss', loss_weight=10.0),
topk=(1, 5),
cal_acc=True,
with_len=True)
mixup = 0.5
mixup_prob = 0.75

2025-01-02 00:21:11,298 - mmcls - INFO - The feat dataset config is :
{'type': 'CIFAR100FSCILDataset', 'data_prefix': '/home/xieyuhan/DataSets/cifar', 'pipeline': [{'type': 'Resize', 'size': (36, -1), 'interpolation': 'bicubic'}, {'type': 'CenterCrop', 'crop_size': 32}, {'type': 'Normalize', 'mean': [129.304, 124.07, 112.434], 'std': [68.17, 65.392, 70.418], 'to_rgb': False}, {'type': 'ImageToTensor', 'keys': ['img']}, {'type': 'Collect', 'keys': ['img', 'gt_label'], 'meta_keys': ('filename', 'ori_filename', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'img_norm_cfg', 'cls_id', 'img_id')}], 'num_cls': 60, 'subset': 'train'}
[>>>>>>>>>>>>>>>>>>>>>>>>>>] 30000/30000, 11624.3 task/s, elapsed: 3s, ETA: 0s
2025-01-02 00:21:14,263 - mmcls - INFO - Feat init done with 60 classes
2025-01-02 00:21:14,332 - mmcls - INFO - Memory done with 60 classes
2025-01-02 00:21:14,350 - mmcls - INFO - The test dataset config is :
{'type': 'CIFAR100FSCILDataset', 'data_prefix': '/home/xieyuhan/DataSets/cifar', 'pipeline': [{'type': 'Resize', 'size': (36, -1), 'interpolation': 'bicubic'}, {'type': 'CenterCrop', 'crop_size': 32}, {'type': 'Normalize', 'mean': [129.304, 124.07, 112.434], 'std': [68.17, 65.392, 70.418], 'to_rgb': False}, {'type': 'ImageToTensor', 'keys': ['img']}, {'type': 'Collect', 'keys': ['img', 'gt_label'], 'meta_keys': ('filename', 'ori_filename', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'img_norm_cfg', 'cls_id', 'img_id')}], 'num_cls': 100, 'subset': 'test'}
[>>>>>>>>>>>>>>>>>>>>>>>>>>] 10000/10000, 10867.7 task/s, elapsed: 1s, ETA: 0s
2025-01-02 00:21:15,555 - mmcls - INFO - Test memory done with 100 classes
2025-01-02 00:21:15,580 - mmcls - INFO - The incremental dataset config is :
{'type': 'CIFAR100FSCILDataset', 'data_prefix': '/home/xieyuhan/DataSets/cifar', 'pipeline': [{'type': 'Resize', 'size': (36, -1), 'interpolation': 'bicubic'}, {'type': 'CenterCrop', 'crop_size': 32}, {'type': 'Normalize', 'mean': [129.304, 124.07, 112.434], 'std': [68.17, 65.392, 70.418], 'to_rgb': False}, {'type': 'ImageToTensor', 'keys': ['img']}, {'type': 'Collect', 'keys': ['img', 'gt_label'], 'meta_keys': ('filename', 'ori_filename', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'img_norm_cfg', 'cls_id', 'img_id')}], 'num_cls': 60, 'subset': 'train', 'few_cls': (60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99)}
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 200/200, 1393.7 task/s, elapsed: 0s, ETA: 0s
2025-01-02 00:21:16,091 - mmcls - INFO - Incremental memory done with 40 classes
2025-01-02 00:21:16,120 - mmcls - INFO - Evaluating session 1, from 0 to 60.
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 6000/6000, 30364.2 task/s, elapsed: 0s, ETA: 0s
2025-01-02 00:21:16,344 - mmcls - INFO - [01]Evaluation results : acc : 78.83 ; acc_base : 78.83 ; acc_inc : nan
2025-01-02 00:21:16,345 - mmcls - INFO - [01]Evaluation results : acc_incremental_old : nan ; acc_incremental_new : 79.60
2025-01-02 00:21:16,347 - mmcls - INFO - Start to execute the incremental sessions.
2025-01-02 00:21:16,369 - mmcls - INFO - Starting session : 2 ------------------------------------------------------------
2025-01-02 00:21:16,369 - mmcls - INFO - Newly added classes are from 60 to 65.
2025-01-02 00:21:16,369 - mmcls - INFO - Model now can classify 65 classes
2025-01-02 00:21:16,369 - mmcls - INFO - 50 steps
2025-01-02 00:21:16,369 - mmcls - INFO - Extracting mean neck feat from 60 to 60
2025-01-02 00:21:16,369 - mmcls - INFO - Copy 1 duplications.
2025-01-02 00:21:16,370 - mmcls - INFO - Session : 2 ; The dataset has 85 samples.
2025-01-02 00:21:16,370 - mmcls - INFO - Labels : [60, 60, 60, 60, 60, 61, 61, 61, 61, 61, 62, 62, 62, 62, 62, 63, 63, 63, 63, 63, 64, 64, 64, 64, 64, 19, 29, 0, 11, 1, 28, 23, 31, 39, 17, 8, 59, 52, 42, 47, 21, 22, 24, 45, 49, 56, 14, 9, 6, 20, 36, 55, 43, 51, 35, 33, 27, 53, 50, 15, 18, 46, 38, 4, 34, 32, 30, 40, 26, 48, 54, 44, 7, 12, 2, 41, 37, 13, 25, 10, 57, 5, 3, 58, 16]
Traceback (most recent call last):
File "/home/xieyuhan/WorkSpace/FSCIL/fscil.py", line 205, in <module>
main()
File "/home/xieyuhan/WorkSpace/FSCIL/fscil.py", line 194, in main
fscil(
File "/home/xieyuhan/WorkSpace/FSCIL/mmfscil/apis/fscil.py", line 598, in fscil
losses = model_finetune(return_loss=True, img=data['feat'], gt_label=data['gt_label'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/anaconda3/lib/python3.11/site-packages/mmcv/parallel/data_parallel.py", line 51, in forward
return super().forward(*inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/anaconda3/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 191, in forward
return self.module(*inputs[0], **module_kwargs[0])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/anaconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/anaconda3/lib/python3.11/site-packages/mmcv/runner/fp16_utils.py", line 119, in new_func
return old_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/FSCIL/mmcls/models/classifiers/base.py", line 83, in forward
return self.forward_train(img, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/FSCIL/mmfscil/models/classifier.py", line 169, in forward_train
loss1 = self.head.forward_train(x, gt_a)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/FSCIL/mmfscil/models/ETFHead.py", line 109, in forward_train
target = (etf_vec * self.produce_training_rect(gt_label, self.num_classes))[:, gt_label].t()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/FSCIL/mmfscil/models/ETFHead.py", line 157, in produce_training_rect
dist.all_gather_object(recv_list, label.cpu())
File "/home/xieyuhan/WorkSpace/anaconda3/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 83, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/anaconda3/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 2710, in all_gather_object
current_device = _get_pg_default_device(group)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/anaconda3/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 771, in _get_pg_default_device
group = group or _get_default_group()
^^^^^^^^^^^^^^^^^^^^
File "/home/xieyuhan/WorkSpace/anaconda3/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1150, in _get_default_group
raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.
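
Reading the traceback, the failure appears to come from `dist.all_gather_object` being called inside `produce_training_rect` while no process group exists (the log above reports `Distributed training: False`). Below is a minimal sketch of a guard that skips the gather in a single-process run. The helper name `gather_labels` is hypothetical and not part of the repository, and how `recv_list` is used after the gather in `ETFHead.py` is not shown in the traceback, so this is an assumption about a possible workaround rather than the maintainers' fix.

```python
import torch.distributed as dist


def gather_labels(label):
    """Collect the labels from every rank into a list.

    Hypothetical helper: falls back to the local labels when
    torch.distributed has no initialized process group.
    """
    # In a plain single-GPU run no process group is initialized, so
    # there is nothing to gather: the local labels are the full set.
    if not (dist.is_available() and dist.is_initialized()):
        return [label]

    # Distributed run: one slot per rank, filled by all_gather_object.
    recv_list = [None] * dist.get_world_size()
    dist.all_gather_object(recv_list, label)
    return recv_list


# Non-distributed usage: returns a one-element list with the local labels.
print(gather_labels([60, 61, 62]))
```

In `produce_training_rect`, the call at line 157 could then presumably be replaced with something like `recv_list = gather_labels(label.cpu())`, assuming the surrounding code only needs the combined labels from all ranks.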
