
Perform inference on a video using custom weights #263

Open
LLH-Harward opened this issue Apr 18, 2024 · 10 comments

Comments

@LLH-Harward

Hello, thank you for your outstanding work!
I would like to perform video inference directly with YOLO-World. I have used Roboflow Inference and Supervision, but they only provide a few benchmark models, such as l, x, v2-l, and v2-x.
For my purposes, the YOLO-World Hugging Face demo (https://huggingface.co/spaces/stevengrove/YOLO-World) performs better than the standard Inference models such as yolo_world/v2-x.
I would like to run inference with the Hugging Face weights, such as x-1280. Could you please provide the necessary support? Or is it possible to input videos directly? I would greatly appreciate it.

@wondervictor
Collaborator

Could you provide more details/clues about why the HF version (L-640) of YOLO-World is better than the GitHub version (X-1280)? BTW, the HuggingFace demo only uses L-640.

@LLH-Harward
Author

Thank you for your response.
I apologize if I wasn't clear earlier.
I meant that on Hugging Face, the models with 1280 input resolution seem more effective at detecting small objects. While Roboflow Inference and Supervision do support video processing, they currently only offer the basic models such as v2-l and v2-x, without access to the other 1280 models.
Could you tell me whether there is a way to use custom weights (for instance, from my own training) directly for video inference?

@wondervictor
Collaborator

Sure. I've seen many requests for video inference, so I'll raise its priority and notify you once it's done; it shouldn't take long.

@LLH-Harward
Author

Thank you so much.

@wondervictor
Collaborator

Hi @LLH-Harward, the latest update adds support for video inference. Please check demo/video_demo.py.

@LLH-Harward
Author

Thank you so much! I'll try it later.

@LLH-Harward
Author

Hello, when I ran video_demo.py for inference, the following error occurred: there is no "data/coco/lvis/lvis_v1_minival_inserted_image_name.json".
I found the corresponding entry in the model's configuration file "yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py". How can I solve this problem? Could you give some guidance?
Environment:
torch+cu118==2.1.1
torchvision+cu118==0.16.1
mmcv==2.0.0rc4
mmdet==3.0.0
mmengine==0.10.3
mmyolo==0.6.0

Bug:
Command: python video_demo.py D:\YOLO-World-master\configs\pretrain\yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py D:\YOLO-World-master\pretrained_weights\yolo_world_v2_x_obj365v1_goldg_cc3mlite_pretrain_1280ft-14996a36.pth D:\YOLO-World-master\result.mp4 "people,laptop,book,bottle,pen,phone" --out out111

bin C:\Users\714\AppData\Roaming\Python\Python39\site-packages\bitsandbytes\libbitsandbytes_cuda118.dll
Loads checkpoint by local backend from path: D:\YOLO-World-master\pretrained_weights\yolo_world_v2_x_obj365v1_goldg_cc3mlite_pretrain_1280ft-14996a36.pth
Traceback (most recent call last):
  File "D:\YOLO-World-master\demo\video_demo.py", line 109, in <module>
    main()
  File "D:\YOLO-World-master\demo\video_demo.py", line 56, in main
    model = init_detector(args.config, args.checkpoint, device=args.device)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmdet\apis\inference.py", line 97, in init_detector
    metainfo = DATASETS.build(test_dataset_cfg).metainfo
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\registry\registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\registry\build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "D:\YOLO-World-master\yolo_world\datasets\mm_dataset.py", line 25, in __init__
    self.dataset = DATASETS.build(dataset)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\registry\registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\registry\build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmyolo\datasets\yolov5_coco.py", line 19, in __init__
    super().__init__(*args, **kwargs)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmdet\datasets\base_det_dataset.py", line 40, in __init__
    super().__init__(*args, **kwargs)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\dataset\base_dataset.py", line 247, in __init__
    self.full_init()
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmyolo\datasets\yolov5_coco.py", line 27, in full_init
    self.data_list = self.load_data_list()
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmdet\datasets\lvis.py", line 605, in load_data_list
    self.lvis = LVIS(local_path)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\lvis\lvis.py", line 27, in __init__
    self.dataset = self._load_json(annotation_path)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\lvis\lvis.py", line 35, in _load_json
    with open(path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/coco/lvis/lvis_v1_minival_inserted_image_name.json'
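For context on why this happens: the traceback shows that init_detector builds the test dataset from the config only to read its class metainfo, so the LVIS annotation file must exist even for pure video inference. One possible workaround, sketched below under the assumption that the constructor only needs a syntactically valid LVIS-style JSON with the keys it indexes (this is a guess, not an official fix; whether an empty annotation file satisfies the downstream metainfo logic depends on the dataset class), is to create a minimal stub at the expected relative path:

```python
import json
import os

# Path from the traceback above, relative to the working directory.
ann_path = "data/coco/lvis/lvis_v1_minival_inserted_image_name.json"
os.makedirs(os.path.dirname(ann_path), exist_ok=True)

# Minimal LVIS-style skeleton: empty image/annotation/category lists,
# just enough for the JSON to load and be indexed.
stub = {"images": [], "annotations": [], "categories": []}
with open(ann_path, "w") as f:
    json.dump(stub, f)
```

Downloading the real lvis_v1_minival_inserted_image_name.json, as the issue author ultimately did, is the safer option if the stub proves insufficient.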

@LLH-Harward
Author

A new development:
After I supplied the JSON file at the path required by the configuration file "yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py", and changed for frame in track_iter_progress(video_reader): in video_demo.py to for frame in video_reader:, the code runs normally and produces results.

However, it runs quite slowly. Is that because the mmcv framework loads slowly?
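The loop change described above can be sketched generically. The version below is a sketch, not the actual video_demo.py code: run_inference and draw_results are hypothetical stand-ins for the model call and the visualizer. It replaces track_iter_progress with a plain loop plus a manual counter, and times inference and drawing separately, which helps pinpoint where the time goes:

```python
import time

def process_video(frames, run_inference, draw_results, log_every=50):
    """Iterate over frames without mmcv's track_iter_progress,
    printing a manual progress counter and accumulating inference
    vs. visualization time separately."""
    infer_t = draw_t = 0.0
    for i, frame in enumerate(frames, 1):
        t0 = time.perf_counter()
        result = run_inference(frame)       # model forward pass
        t1 = time.perf_counter()
        draw_results(frame, result)         # visualization / drawing
        t2 = time.perf_counter()
        infer_t += t1 - t0
        draw_t += t2 - t1
        if i % log_every == 0:
            print(f"frame {i}: inference {infer_t:.2f}s, drawing {draw_t:.2f}s")
    return infer_t, draw_t
```

Plugging mmcv's VideoReader in as `frames` and comparing the two totals would show whether the model or the drawing dominates.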

@wondervictor
Collaborator

The visualization takes time: drawing the detected objects on each frame is the slow part.

@LLH-Harward
Author

OK, thank you for your help!
