
Perform inference on a video using custom weights #263

Open
LLH-Harward opened this issue Apr 18, 2024 · 10 comments

Comments

@LLH-Harward

Hello, thank you for your outstanding work!
I would like to perform video inference directly with YOLO-World. I have used Roboflow Inference and Supervision, but they only provide a few benchmark models, such as l, x, v2-l, and v2-x.
For my purposes, the YOLO-World Hugging Face demo (https://huggingface.co/spaces/stevengrove/YOLO-World) performs better than the standard Inference models such as yolo_world/v2-x.
I would like to run inference with the Hugging Face weights, such as x-1280. Could you please provide the necessary support? Or is it possible to input videos directly? I would greatly appreciate it.

@wondervictor
Collaborator

Could you provide more details/clues about why the HF version (L-640) of YOLO-World is better than the GitHub version (X-1280)? BTW, the HuggingFace demo only uses L-640.

@LLH-Harward
Author

Thank you for your response.
I apologize if I wasn't clear earlier.
I meant that on Hugging Face, the models with 1280 input resolution seem more effective at detecting small objects. While Roboflow Inference and Supervision do support video processing, they currently only offer the basic models such as v2-l and v2-x, without access to the other 1280 models.
Could you tell me whether there is a way to use custom weights (for instance, from my own training) directly for video inference?

@wondervictor
Collaborator

Sure. I've seen many requests for video inference, so I'll raise its priority and notify you once it's done; it shouldn't take long.

@LLH-Harward
Author

Thank you so much.

@wondervictor
Collaborator

Hi @LLH-Harward, the latest update adds support for video inference. Please check demo/video_demo.py.

@LLH-Harward
Author

Thank you so much! I'll try it later.

@LLH-Harward
Author

Hello, when I ran video_demo.py for inference, the following error occurred: there is no "data/coco/lvis/lvis_v1_minival_inserted_image_name.json".
I found the corresponding entry in the model's configuration file "yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py". How can I solve this problem? Could you give some guidance?
Environment:
torch+cu118==2.1.1
torchvision+cu118==0.16.1
mmcv==2.0.0rc4
mmdet==3.0.0
mmengine==0.10.3
mmyolo==0.6.0

Bug:
Command: python video_demo.py D:\YOLO-World-master\configs\pretrain\yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py D:\YOLO-World-master\pretrained_weights\yolo_world_v2_x_obj365v1_goldg_cc3mlite_pretrain_1280ft-14996a36.pth D:\YOLO-World-master\result.mp4 "people,laptop,book,bottle,pen,phone" --out out111

bin C:\Users\714\AppData\Roaming\Python\Python39\site-packages\bitsandbytes\libbitsandbytes_cuda118.dll
Loads checkpoint by local backend from path: D:\YOLO-World-master\pretrained_weights\yolo_world_v2_x_obj365v1_goldg_cc3mlite_pretrain_1280ft-14996a36.pth
Traceback (most recent call last):
  File "D:\YOLO-World-master\demo\video_demo.py", line 109, in <module>
    main()
  File "D:\YOLO-World-master\demo\video_demo.py", line 56, in main
    model = init_detector(args.config, args.checkpoint, device=args.device)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmdet\apis\inference.py", line 97, in init_detector
    metainfo = DATASETS.build(test_dataset_cfg).metainfo
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\registry\registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\registry\build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "D:\YOLO-World-master\yolo_world\datasets\mm_dataset.py", line 25, in __init__
    self.dataset = DATASETS.build(dataset)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\registry\registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\registry\build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmyolo\datasets\yolov5_coco.py", line 19, in __init__
    super().__init__(*args, **kwargs)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmdet\datasets\base_det_dataset.py", line 40, in __init__
    super().__init__(*args, **kwargs)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmengine\dataset\base_dataset.py", line 247, in __init__
    self.full_init()
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmyolo\datasets\yolov5_coco.py", line 27, in full_init
    self.data_list = self.load_data_list()
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\mmdet\datasets\lvis.py", line 605, in load_data_list
    self.lvis = LVIS(local_path)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\lvis\lvis.py", line 27, in __init__
    self.dataset = self._load_json(annotation_path)
  File "F:\Anaconda\envs\yoloworld\lib\site-packages\lvis\lvis.py", line 35, in _load_json
    with open(path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/coco/lvis/lvis_v1_minival_inserted_image_name.json'
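For context on why this happens: the traceback shows that init_detector builds the test dataset from the config only to read its class metainfo, so the LVIS annotation file must exist even for pure video inference. One possible workaround, sketched below under the assumption that the constructor only needs a syntactically valid LVIS-style JSON with the keys it indexes (this is a guess, not an official fix; whether an empty annotation file satisfies the downstream metainfo logic depends on the dataset class), is to create a minimal stub at the expected relative path:

```python
import json
import os

# Path from the traceback above, relative to the working directory.
ann_path = "data/coco/lvis/lvis_v1_minival_inserted_image_name.json"
os.makedirs(os.path.dirname(ann_path), exist_ok=True)

# Minimal LVIS-style skeleton: empty image/annotation/category lists,
# just enough for the JSON to load and be indexed.
stub = {"images": [], "annotations": [], "categories": []}
with open(ann_path, "w") as f:
    json.dump(stub, f)
```

Downloading the real lvis_v1_minival_inserted_image_name.json, as the issue author ultimately did, is the safer option if the stub proves insufficient.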

@LLH-Harward
Author

A new development:
After I supplied the JSON file at the path required by the configuration file "yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py", and changed for frame in track_iter_progress(video_reader): in video_demo.py to for frame in video_reader:, the code runs normally and produces results.

However, it runs quite slowly. Is that because the mmcv framework loads slowly?
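The loop change described above can be sketched generically. The version below is a sketch, not the actual video_demo.py code: run_inference and draw_results are hypothetical stand-ins for the model call and the visualizer. It replaces track_iter_progress with a plain loop plus a manual counter, and times inference and drawing separately, which helps pinpoint where the time goes:

```python
import time

def process_video(frames, run_inference, draw_results, log_every=50):
    """Iterate over frames without mmcv's track_iter_progress,
    printing a manual progress counter and accumulating inference
    vs. visualization time separately."""
    infer_t = draw_t = 0.0
    for i, frame in enumerate(frames, 1):
        t0 = time.perf_counter()
        result = run_inference(frame)       # model forward pass
        t1 = time.perf_counter()
        draw_results(frame, result)         # visualization / drawing
        t2 = time.perf_counter()
        infer_t += t1 - t0
        draw_t += t2 - t1
        if i % log_every == 0:
            print(f"frame {i}: inference {infer_t:.2f}s, drawing {draw_t:.2f}s")
    return infer_t, draw_t
```

Plugging mmcv's VideoReader in as `frames` and comparing the two totals would show whether the model or the drawing dominates.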

@wondervictor
Collaborator

The visualization takes time: drawing the detected objects on each frame is the slow part.

@LLH-Harward
Author

OK, thank you for your help!
