The system generates image captions for blind and visually impaired people.

StasGC/ImageCaptionSystem-ru

Russian Caption system

This project was implemented as part of an internship at Odnoklassniki.

Try it out on Colab:

Open In Colab

The system generates image captions for blind and visually impaired people. The architecture consists of two models:

  • YOLOv3 - state-of-the-art, real-time object detection system.
  • Sber ru-GPT models - autoregressive transformer language models.

For the first model, we use YOLOv3 weights trained on the 80 classes of the MS COCO dataset.
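As a rough illustration of how detector output could feed the captioning step, the sketch below turns a list of detections into label counts. The `(class_name, confidence)` tuple format, the `detections_to_labels` helper, and the 0.5 threshold are assumptions made for this example; they are not taken from the project's code.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical detection format: (class_name, confidence) pairs.
Detection = Tuple[str, float]


def detections_to_labels(detections: List[Detection], min_conf: float = 0.5) -> Dict[str, int]:
    """Count detected classes whose confidence is at least min_conf."""
    counts = Counter(name for name, conf in detections if conf >= min_conf)
    return dict(counts)


# Made-up detections using MS COCO class names.
detections = [("cat", 0.92), ("cat", 0.81), ("person", 0.77), ("chair", 0.31)]
labels = detections_to_labels(detections)
print(labels)  # {'cat': 2, 'person': 1}
```

Keeping the counts rather than a plain set of names lets a later step distinguish "a cat" from "two cats".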

For the second model, the weights of two models were taken: ruGPT3Small and ruGPT3Medium. They were then fine-tuned on a Russian-language dataset containing object labels and the captions written from them.
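One possible way to lay out such label and caption pairs as text for an autoregressive language model is sketched below. The `Labels:` and `Caption:` prefixes and both helper functions are hypothetical; the README does not document the actual dataset format.

```python
from typing import List

# Hypothetical separators; the real dataset format is not documented here.
LABELS_PREFIX = "Labels: "
CAPTION_PREFIX = "\nCaption: "


def build_prompt(labels: List[str]) -> str:
    """Text given to the model at inference time; the model continues it."""
    return LABELS_PREFIX + ", ".join(labels) + CAPTION_PREFIX


def build_training_sample(labels: List[str], caption: str) -> str:
    """Full text used for fine-tuning: the prompt followed by the target caption."""
    return build_prompt(labels) + caption


sample = build_training_sample(["ваза", "чашка", "стол"], "Ваза и чашка на столе")
print(sample)
```

Using the same prefix layout for training and inference means the fine-tuned model sees a familiar context when it is asked to continue the prompt with a caption.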

Thus, three models were developed: ruGPT3Small trained for 2 epochs, ruGPT3Small trained for 10 epochs, and ruGPT3Medium trained for 5 epochs. In general, the ruGPT3Small (2 epochs) model performed best on tests, but ruGPT3Medium produces more eloquent captions.

Installing

To install the dependencies, run:

pip install -r requirements.txt

You should also download the weights for YOLOv3 and GPT2:

To run on a GPU, make sure the drivers (CUDA) are installed beforehand.
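A quick way to check whether a GPU is usable is sketched below. It assumes the project runs on PyTorch, which this README does not state explicitly; `pick_device` is a hypothetical helper, and it falls back to the CPU when PyTorch is missing or no CUDA device is visible.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```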

It has been tested with Python 3.7.11.

Caption

Select a model and an image, then run:

python caption.py -m chosen_model -i your_image.jpg
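For readers unfamiliar with this style of command line, the sketch below shows how a script could parse the `-m` and `-i` options with `argparse`. It is not the project's `caption.py`; the long option names `--model` and `--image` and the model name `small_2` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two options used in the command above."""
    parser = argparse.ArgumentParser(description="Caption an image (illustrative sketch).")
    # Long option names are hypothetical; only -m and -i appear in the README.
    parser.add_argument("-m", "--model", required=True, help="name of the caption model")
    parser.add_argument("-i", "--image", required=True, help="path to the input image")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["-m", "small_2", "-i", "your_image.jpg"])
print(args.model, args.image)
```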

Models Timings

Timings were measured on an Intel Core i5 CPU.

| Name | Download | Time @ 1 image |
|---|---|---|
| Small (2 epochs) | model | 7.2 s |
| Small (10 epochs) | model | 6.6 s |
| Medium (5 epochs) | model | 10.9 s |
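Per-image timings like these can be collected with a small wall-clock helper. The sketch below is one way to do it, not the method used to produce the numbers above; `time_call` and the stand-in `fake_caption` function are hypothetical.

```python
import time
from typing import Any, Callable


def time_call(fn: Callable[..., Any], *args: Any, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over several calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def fake_caption(image_path: str) -> str:
    """Hypothetical stand-in for the real captioning call."""
    time.sleep(0.01)
    return "caption for " + image_path


elapsed = time_call(fake_caption, "your_image.jpg")
print(f"{elapsed:.3f} s per image")
```

Taking the best of a few runs reduces the influence of one-off delays such as model loading or disk caching on the first call.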

Captions Examples

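| Name | vase.jpg | man.jpg | sofa.jpg | cats.jpg |
|---|---|---|---|---|
| Small (2 epochs) | Ваза и чашка на столе (A vase and a cup on a table) | Человек с мобильным телефоном и галстуком (A person with a mobile phone and a tie) | Человек сидит на диване с мобильным телефоном (A person sits on a sofa with a mobile phone) | Два кота смотрят на человека на завтраке за столом со стулом и чашей (Two cats look at a person at breakfast at a table with a chair and a bowl) |
| Small (10 epochs) | Маленькая вазочка со стеклянной кружкой на столе (A small vase with a glass mug on a table) | Человек, стоящий перед мобильным телефоном в галстуке (A person standing in front of a mobile phone, wearing a tie) | Люди на диване с мобильными телефонами (People on a sofa with mobile phones) | Два больших коричневых кота смотрят на человека на столе со стулом в чаше (Two big brown cats look at a person on a table with a chair in a bowl) |
| Medium (5 epochs) | Белая керамическая ваза с розами и белая чашка на деревянном столе (A white ceramic vase with roses and a white cup on a wooden table) | Мужчина в костюме с мобильным телефоном и галстуком на шее (A man in a suit with a mobile phone and a tie around his neck) | Мужчина сидит на диване и разговаривает по мобильному телефону (A man sits on a sofa and talks on a mobile phone) | Две серые кошки сидят напротив человека, обедающего на кухонном столе возле стульев и чаши (Two gray cats sit opposite a person having lunch at a kitchen table near chairs and a bowl) |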
