Russian Caption system

This project was implemented as part of an internship at Odnoklassniki.

Try it out on Colab:

Open In Colab

The system generates image captions for blind and visually impaired people. The architecture consists of two models:

  • YOLOv3 - state-of-the-art, real-time object detection system.
  • Sber ru-GPT models - autoregressive transformer language models.

For the first model, we took YOLOv3 weights pretrained on the 80 classes of the MS COCO dataset.

For the second model, we took the weights of two models, ruGPT3Small and ruGPT3Medium, and fine-tuned them on a Russian-language dataset of object labels paired with captions built from them.

Thus, three models were developed: ruGPT3Small trained for 2 epochs, ruGPT3Small trained for 10 epochs, and ruGPT3Medium trained for 5 epochs. In general, the ruGPT3Small (2 epochs) model performed best on tests, while ruGPT3Medium produces more eloquent captions.
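The hand-off between the two models can be pictured as follows: the detector returns class labels, and the language model continues a text prompt built from them. The sketch below only illustrates that idea under assumptions. The function name `labels_to_prompt`, the confidence threshold, and the prompt format are hypothetical and are not taken from the project's code.

```python
from collections import Counter
from typing import List, Tuple


def labels_to_prompt(detections: List[Tuple[str, float]],
                     min_confidence: float = 0.5) -> str:
    """Turn (label, confidence) pairs from a detector into a text prompt."""
    # Keep only labels the detector is reasonably sure about (assumed threshold).
    kept = [label for label, confidence in detections if confidence >= min_confidence]
    # Count duplicates so that two "cat" boxes become "2 cat".
    counts = Counter(kept)
    parts = [f"{n} {label}" if n > 1 else label for label, n in counts.items()]
    # Hypothetical prompt format: the labels, then a separator that a
    # fine-tuned language model would continue with a caption.
    return ", ".join(parts) + " ->"


detections = [("cat", 0.92), ("cat", 0.88), ("person", 0.75), ("chair", 0.31)]
print(labels_to_prompt(detections))  # 2 cat, person ->
```

The low-confidence "chair" detection is dropped before the prompt is built, so the language model only sees the objects that were kept.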

Installing

To install the dependencies, run

pip install -r requirements.txt

You also need to download the weights for YOLOv3 and GPT2:

To run on a GPU, make sure the CUDA drivers are installed beforehand.

The project has been tested with Python 3.7.11.
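Before running the models, it can help to confirm which Python version is active and whether a GPU is visible. The following is a minimal sketch, assuming PyTorch is the backend installed by `requirements.txt` (an assumption, not confirmed here); `environment_report` is a hypothetical helper name.

```python
import sys


def environment_report() -> dict:
    """Collect the Python version and, if PyTorch is present, CUDA visibility."""
    report = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        import torch  # assumed dependency; skipped gracefully if missing
    except ImportError:
        return report
    report["torch_installed"] = True
    # torch.cuda.is_available() is False when no usable CUDA driver is found.
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


print(environment_report())
```

If `cuda_available` is `False` while a GPU is present, the driver or the CUDA build of the backend is the first thing to check.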

Caption

Select a model and an image, then run:

python caption.py -m chosen_model -i your_image.jpg
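To caption several images in one go, the same command can be generated once per file. This is a small sketch that relies only on the `-m` and `-i` flags shown above; `build_commands` is a hypothetical helper, and `your_model` stands in for whatever model name the script accepts.

```python
from typing import List


def build_commands(model: str, images: List[str]) -> List[List[str]]:
    """Build one caption.py invocation per image, as argument lists."""
    return [["python", "caption.py", "-m", model, "-i", image] for image in images]


for command in build_commands("your_model", ["vase.jpg", "man.jpg"]):
    # Printing keeps this a dry run; pass each list to
    # subprocess.run(command, check=True) to actually execute it.
    print(" ".join(command))
```

Keeping each command as a list of arguments avoids shell quoting problems with file names that contain spaces.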

Models Timings

Timings were measured on an Intel Core i5 CPU.

| Name | Download | Time @ 1 image |
| --- | --- | --- |
| Small (2 epochs) | model | 7.2 s |
| Small (10 epochs) | model | 6.6 s |
| Medium (5 epochs) | model | 10.9 s |
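Per-image timings of this kind can be reproduced with a small harness around whatever function produces a caption. The sketch below is one possible way to do it and not necessarily how the numbers above were obtained; `time_caption` and the stand-in `fake_caption` are hypothetical names.

```python
import time
from typing import Callable


def time_caption(caption_fn: Callable[[str], str],
                 image_path: str,
                 repeats: int = 3) -> float:
    """Return the mean wall-clock time in seconds of caption_fn(image_path)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        caption_fn(image_path)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


def fake_caption(image_path: str) -> str:
    # Stand-in for a real model call so the sketch runs without weights.
    time.sleep(0.01)
    return "caption for " + image_path


print(f"{time_caption(fake_caption, 'vase.jpg'):.3f} s per image")
```

Averaging over a few repeats smooths out one-off costs such as loading weights on the first call.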

Captions Examples

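| Name | vase.jpg | man.jpg | sofa.jpg | cats.jpg |
| --- | --- | --- | --- | --- |
| Small (2 epochs) | Ваза и чашка на столе (A vase and a cup on a table) | Человек с мобильным телефоном и галстуком (A person with a mobile phone and a tie) | Человек сидит на диване с мобильным телефоном (A person sits on a sofa with a mobile phone) | Два кота смотрят на человека на завтраке за столом со стулом и чашей (Two cats look at a person at breakfast at a table with a chair and a bowl) |
| Small (10 epochs) | Маленькая вазочка со стеклянной кружкой на столе (A small vase with a glass mug on a table) | Человек, стоящий перед мобильным телефоном в галстуке (A person standing in front of a mobile phone in a tie) | Люди на диване с мобильными телефонами (People on a sofa with mobile phones) | Два больших коричневых кота смотрят на человека на столе со стулом в чаше (Two big brown cats look at a person on a table with a chair in a bowl) |
| Medium (5 epochs) | Белая керамическая ваза с розами и белая чашка на деревянном столе (A white ceramic vase with roses and a white cup on a wooden table) | Мужчина в костюме с мобильным телефоном и галстуком на шее (A man in a suit with a mobile phone and a tie around his neck) | Мужчина сидит на диване и разговаривает по мобильному телефону (A man sits on a sofa and talks on a mobile phone) | Две серые кошки сидят напротив человека, обедающего на кухонном столе возле стульев и чаши (Two gray cats sit opposite a person having lunch at a kitchen table near chairs and a bowl) |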