Global seed set to 2022
Loaded pretrained weights for efficientnet-b4
[2024-03-15 17:04:57,495][torch.distributed.nn.jit.instantiator][INFO] - Created a temporary directory at /tmp/tmpncggsv8p
[2024-03-15 17:04:57,496][torch.distributed.nn.jit.instantiator][INFO] - Writing /tmp/tmpncggsv8p/_remote_module_non_sriptable.py
[2024-03-15 17:04:57,648][main][INFO] - Searching /home/cylunbu/cross_view_transformers-master/logs.
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose 'Don't visualize my results'
wandb: WARNING resume will be ignored since W&B syncing is set to offline. Starting a new run with run id 0315_170457.
wandb: Tracking run with wandb version 0.12.11
wandb: W&B syncing is set to offline in this directory.
wandb: Run wandb online or set WANDB_MODE=online to enable cloud syncing.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used.
Trainer(limit_val_batches=1.0) was configured so 100% of the batches will be used.
Trainer(val_check_interval=1.0) was configured so validation will run at the end of the training epoch.
Global seed set to 2022
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[2024-03-15 17:05:01,761][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2024-03-15 17:05:01,761][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/cuda/__init__.py:145: UserWarning:
NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Error executing job with overrides: ['+experiment=cvt_nuscenes_vehicle', 'data.dataset_dir=/home/cylunbu/nuscenes/mini', 'data.labels_dir=/home/cylunbu/cvt_labels_nuscenes']
Traceback (most recent call last):
File "scripts/train.py", line 79, in
main()
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/main.py", line 48, in decorated_main
_run_hydra(
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/utils.py", line 377, in _run_hydra
run_and_report(
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
raise ex
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/utils.py", line 378, in
lambda: hydra.run(
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 111, in run
_ = ret.return_value
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "scripts/train.py", line 75, in main
trainer.fit(model_module, datamodule=data_module, ckpt_path=ckpt_path)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
self._call_and_handle_interrupt(
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 719, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1215, in _run
self.strategy.setup(self)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 159, in setup
self._share_information_to_prevent_deadlock()
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 388, in _share_information_to_prevent_deadlock
self._share_pids()
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 406, in _share_pids
pids = self.all_gather(torch.tensor(os.getpid(), device=self.root_device))
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/parallel.py", line 111, in all_gather
return all_gather_ddp_if_available(tensor, group=group, sync_grads=sync_grads)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 188, in all_gather_ddp_if_available
return AllGatherGrad.apply(tensor, group)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 154, in forward
gathered_tensor = [torch.zeros_like(tensor) for _ in range(torch.distributed.get_world_size())]
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 154, in
gathered_tensor = [torch.zeros_like(tensor) for _ in range(torch.distributed.get_world_size())]
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Traceback (most recent call last):
File "scripts/train.py", line 79, in
main()
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/main.py", line 48, in decorated_main
_run_hydra(
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/utils.py", line 377, in _run_hydra
run_and_report(
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
raise ex
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/utils.py", line 378, in
lambda: hydra.run(
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 111, in run
_ = ret.return_value
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "scripts/train.py", line 75, in main
trainer.fit(model_module, datamodule=data_module, ckpt_path=ckpt_path)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
self._call_and_handle_interrupt(
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 719, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1215, in _run
self.strategy.setup(self)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 159, in setup
self._share_information_to_prevent_deadlock()
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 388, in _share_information_to_prevent_deadlock
self._share_pids()
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 406, in _share_pids
pids = self.all_gather(torch.tensor(os.getpid(), device=self.root_device))
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/parallel.py", line 111, in all_gather
return all_gather_ddp_if_available(tensor, group=group, sync_grads=sync_grads)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 188, in all_gather_ddp_if_available
return AllGatherGrad.apply(tensor, group)
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 154, in forward
gathered_tensor = [torch.zeros_like(tensor) for _ in range(torch.distributed.get_world_size())]
File "/home/cylunbu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 154, in
gathered_tensor = [torch.zeros_like(tensor) for _ in range(torch.distributed.get_world_size())]
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
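My reading of the root cause (not confirmed by the maintainers): the UserWarning earlier in the log shows the installed PyTorch wheel only ships kernels for sm_37/sm_50/sm_60/sm_70, while the RTX 3060 is sm_86, which is exactly what produces "no kernel image is available for execution on the device" as soon as the first CUDA op runs. A minimal sketch to confirm the mismatch from the same conda environment, using only the standard torch API:

```python
import torch

# Compare the compute capabilities this PyTorch build was compiled for with
# the capability of the detected GPU. If the GPU's capability (e.g. sm_86 for
# an RTX 3060) is missing from the build's arch list, any CUDA kernel launch
# fails with "no kernel image is available for execution on the device".
print("torch version:     ", torch.__version__)
print("CUDA available:    ", torch.cuda.is_available())
print("built for arches:  ", torch.cuda.get_arch_list())          # e.g. ['sm_37', 'sm_50', 'sm_60', 'sm_70']
print("device capability: ", torch.cuda.get_device_capability(0))  # e.g. (8, 6) -> sm_86
```

If sm_86 is not in the arch list, reinstalling a PyTorch wheel built against a newer CUDA toolkit (via the selector at https://pytorch.org/get-started/locally/, as the warning itself suggests) should resolve it; the exact install command depends on the CUDA version chosen, so I won't guess it here.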