Russian Caption system

This project was implemented as part of an internship at Odnoklassniki.

Try it out on Colab:

The system generates image captions for blind and visually impaired people. The architecture consists of two models:

YOLOv3 - a state-of-the-art, real-time object detection system.
Sber ruGPT models - autoregressive transformer language models.

For the first model, pretrained YOLOv3 weights are used; the network was trained on the 80 classes of the MS COCO dataset.

For the second model, the weights of two models were taken: ruGPT3Small and ruGPT3Medium. They were then fine-tuned on a Russian-language dataset containing labels and captions written from them.

Thus, three models were developed: ruGPT3Small trained for 2 epochs and for 10 epochs, and ruGPT3Medium trained for 5 epochs. In general, the ruGPT3Small (2 epochs) model performed best on tests, while ruGPT3Medium produces more eloquent captions.
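The hand-off between the two models can be pictured as turning detector output into a text prompt that the language model continues. The snippet below is only a minimal sketch of that idea: the detection format, the confidence threshold, the `labels_to_prompt` name and the `" ->"` separator are all assumptions made for illustration, not the format used in this repository.

```python
from typing import List, Tuple

# Hypothetical detection format: (class label, confidence score).
Detection = Tuple[str, float]


def labels_to_prompt(detections: List[Detection], min_conf: float = 0.5) -> str:
    """Turn detector output into a text prompt for the language model."""
    # Keep confident detections, most confident first. Repeated labels are
    # kept on purpose so the language model can infer counts ("two cats").
    ordered = sorted(detections, key=lambda d: d[1], reverse=True)
    labels = [label for label, conf in ordered if conf >= min_conf]
    # Assumed prompt layout: comma-separated labels followed by a separator.
    return ", ".join(labels) + " ->"


detections = [("cat", 0.91), ("person", 0.70), ("cat", 0.82), ("bowl", 0.30)]
prompt = labels_to_prompt(detections)
print(prompt)  # cat, cat, person ->
```

In a real run the prompt would then be tokenized and passed to the fine-tuned ruGPT3 model, which generates the caption as a continuation.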

Installing

To install the dependencies, run

pip install -r requirements.txt

You also need to download the weights for YOLOv3 and the GPT model:

Pretrained YOLOv3 weights: https://pjreddie.com/media/files/yolov3.weights
GPT model weights: https://drive.google.com/drive/folders/1WFpM3jFpGHSq3GESIKMnTzyRZvdRf9mN?usp=sharing
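If you prefer to script the YOLOv3 download, something along these lines should work. It is a sketch using only the standard library; the `download_if_missing` helper and the `weights` target directory are illustrative assumptions, since this README does not say where `caption.py` expects the files. The Google Drive folder is not a direct file link, so the GPT weights still have to be fetched through the browser or a dedicated tool.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

YOLO_URL = "https://pjreddie.com/media/files/yolov3.weights"


def download_if_missing(url: str, dest_dir: str = ".") -> Path:
    """Download url into dest_dir unless a file with that name already exists."""
    # Use the last path component of the URL as the local file name.
    target = Path(dest_dir) / Path(urlparse(url).path).name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, target)
    return target


# Usage (downloads a large file, so it is left commented out;
# "weights" is an assumed directory name):
# download_if_missing(YOLO_URL, "weights")
```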

To run on a GPU, make sure the GPU drivers and CUDA are installed beforehand.
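A quick way to confirm that the GPU is visible before running the captioner is sketched below. It assumes the project runs on PyTorch, which this README does not state explicitly; `pick_device` is a hypothetical helper name.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: installed through requirements.txt
    except ImportError:
        # Without PyTorch there is nothing to query, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints `cpu` on a machine with an NVIDIA card, the driver or CUDA installation is the first thing to check.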

The system has been tested with Python 3.7.11.

Caption

Select a model and an image, then run:

python caption.py -m chosen_model -i your_image.jpg
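For readers who want to see how the two flags might be wired up, here is a minimal `argparse` sketch. Only `-m` and `-i` come from the command above; the long option names, the help texts and the `small_2_epochs` value are assumptions and may differ from what `caption.py` actually accepts.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the -m / -i flags of the command above."""
    parser = argparse.ArgumentParser(description="Caption an image in Russian.")
    # Long option names are hypothetical; the README only shows -m and -i.
    parser.add_argument("-m", "--model", required=True,
                        help="which fine-tuned ruGPT3 model to use")
    parser.add_argument("-i", "--image", required=True,
                        help="path to the image to caption")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-m", "small_2_epochs", "-i", "your_image.jpg"])
print(args.model, args.image)
```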

Models Timings

Timings were measured on an Intel Core i5 CPU.

Name	Download	Time per image
Small (2 epochs)	model	7.2 s
Small (10 epochs)	model	6.6 s
Medium (5 epochs)	model	10.9 s
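To reproduce per-image timings on your own hardware, a simple wall-clock measurement is usually enough. The helper below is a sketch and not the procedure used to produce the table: `time_per_image` is a hypothetical name, and `fake_caption` stands in for the real captioning call.

```python
import time
from statistics import mean


def time_per_image(caption_fn, images, repeats=3):
    """Return the mean wall-clock time of caption_fn over all images and repeats."""
    timings = []
    for image in images:
        for _ in range(repeats):
            start = time.perf_counter()
            caption_fn(image)
            timings.append(time.perf_counter() - start)
    return mean(timings)


def fake_caption(image):
    """Stand-in for the real captioner; sleeps briefly instead of running models."""
    time.sleep(0.01)
    return "caption"


avg = time_per_image(fake_caption, ["a.jpg", "b.jpg"])
print(f"{avg:.3f} s per image")
```

Model loading time is best excluded from such a measurement, since it is paid once rather than per image.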

Captions Examples

Name	vase.jpg	man.jpg	sofa.jpg	cats.jpg
Small (2 epochs)	Ваза и чашка на столе	Человек с мобильным телефоном и галстуком	Человек сидит на диване с мобильным телефоном	Два кота смотрят на человека на завтраке за столом со стулом и чашей
Small (10 epochs)	Маленькая вазочка со стеклянной кружкой на столе	Человек, стоящий перед мобильным телефоном в галстуке	Люди на диване с мобильными телефонами	Два больших коричневых кота смотрят на человека на столе со стулом в чаше
Medium (5 epochs)	Белая керамическая ваза с розами и белая чашка на деревянном столе	Мужчина в костюме с мобильным телефоном и галстуком на шее	Мужчина сидит на диване и разговаривает по мобильному телефону	Две серые кошки сидят напротив человека, обедающего на кухонном столе возле стульев и чаши
