
GSoC 2025



The page aggregates information related to the participation of CVAT.ai Corporation in Google Summer of Code 2025.

Note: At the present time it is not known whether CVAT.ai will participate in GSoC, as the application has to be confirmed by GSoC first.


Links

Google Summer of Code references

Wiki references

CVAT resources


Timeline

The full timeline can be found on the corresponding GSoC page

| Date | Description |
| --- | --- |
| February 11, 18:00 UTC | Mentoring organization application deadline |
| February 27, 18:00 UTC | List of accepted mentoring organizations published |
| March 24, 18:00 UTC - April 8, 18:00 UTC | GSoC contributor application period |
| April 29, 18:00 UTC | GSoC contributor proposal rankings deadline |
| May 8, 18:00 UTC | Accepted GSoC contributor projects announced |
| May 8 - June 1 | Community Bonding Period |
| June 2 | Coding begins |
| July 14, 18:00 UTC - July 18, 18:00 UTC | Midterm evaluation period |
| September 1 - 8, 18:00 UTC | Final evaluation period |

CVAT project ideas summary

Mailing list to discuss: cvat-gsoc-2025 mailing list

Index to Ideas Below

  1. API keys and token-based authentication for SDK and CLI
  2. Trackable masks and tags
  3. Support embedded notifications
  4. Multiple objects selection and bulk actions
  5. Timeline for tracked objects
  6. Account deletion and optimized resources erasing
  7. End-to-end model training pipeline for the latest YOLO version
  8. Integrate locust into CI pipelines for performance testing
  9. Projection of 3D Point Cloud Coordinates onto 2D Contextual Images

Idea Template

All projects require Python and TypeScript programming skills unless otherwise noted.


CVAT project ideas

  1. IDEA: API keys and token-based authentication for SDK and CLI

    • Description: Currently, the only official way to authenticate in the SDK/CLI is by providing a username and password with the requests. This approach works; however, it has security drawbacks. The idea is to give users an option to generate and manage API access keys. Such a key could be used as a replacement for the login/password pair (a short client-side sketch follows this idea's details).
    • Expected Outcomes:
      • Users can generate API access tokens in the account settings in the UI
      • Users may see a list of generated tokens and the last time each was used
      • Users can revoke existing API access tokens in the account settings
      • Users can call API endpoints by providing an API access token
      • A token can be used for authentication in the SDK and CLI
    • Resources:
    • Skills Required: Python (Django, DRF), TypeScript (React)
    • Possible Mentors: Maxim Zhiltsov, Roman Donchenko
    • Rating: Medium
    • Expected size: 175 hours
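
A minimal sketch of what calling the REST API with such a token could look like from the client side. The header scheme ("Token <key>") follows Django REST Framework's built-in TokenAuthentication convention; the actual scheme, endpoint behaviour, and token value below are illustrative assumptions, not the final design.

```python
# Hypothetical usage: authenticate a request with an API access token instead
# of a username/password pair. The token value is a placeholder.
import requests

CVAT_HOST = "https://app.cvat.ai"            # any CVAT instance URL
API_TOKEN = "generated-in-account-settings"  # placeholder token

response = requests.get(
    f"{CVAT_HOST}/api/tasks",
    headers={"Authorization": f"Token {API_TOKEN}"},
)
response.raise_for_status()
print(response.json())
```
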
  2. IDEA: Trackable masks and tags

    • Description: CVAT supports both image and video annotation. It has two kinds of objects: shapes and tracks. Shapes are single-frame objects. Tracks may span the whole video, indicating that shapes on different frames correspond to the same object in the video. Tracks have additional features, e.g. their shapes, attributes, and properties may be interpolated automatically between keyframes. Currently this is true for all object types except masks and tags. The purpose of this project is to unify tracking functionality across all object types (a propagation sketch follows this idea's details).
    • Expected Outcomes:
      • Masks and tags can be tracked across a video in the CVAT interface.
      • Tags only interpolate their attributes and properties, as they do not have any position.
      • Masks additionally propagate their position in the simplest way (the position is carried over between keyframes without linear interpolation, since interpolating such objects is not a trivial task).
      • Existing annotation formats are updated accordingly.
    • Skills Required: Python (Django, DRF), TypeScript (React)
    • Possible Mentors: Maxim Zhiltsov, Roman Donchenko
    • Rating: Hard
    • Expected size: 350 hours
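
A minimal sketch of the propagation behaviour described above, assuming keyframe data is kept in a frame-indexed mapping; the function and variable names are illustrative, not part of the CVAT codebase.

```python
# Propagation without interpolation: every frame between two keyframes reuses
# the most recent keyframe's data (e.g. a mask) unchanged.
from bisect import bisect_right

def propagate_keyframes(keyframes: dict[int, object], frame: int):
    """Return the keyframe value that applies to `frame`, or None if the
    track has not started yet."""
    frames = sorted(keyframes)
    idx = bisect_right(frames, frame) - 1
    return keyframes[frames[idx]] if idx >= 0 else None

# A mask defined on keyframes 0 and 10 is shown unchanged on frames 0-9,
# then replaced by the frame-10 mask from frame 10 onward.
masks = {0: "mask@0", 10: "mask@10"}
assert propagate_keyframes(masks, 7) == "mask@0"
assert propagate_keyframes(masks, 12) == "mask@10"
```
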
  3. IDEA: Support embedded notifications

    • Description: CVAT is an annotation tool used by teams, but it currently lacks any notification system, so all communication is delegated to third-party channels. It would be useful to have an embedded notification system to simplify the process (e.g. workers are notified about newly assigned annotation jobs, or about new issues and comments in jobs assigned to them; requesters are notified when jobs or tasks change their status).
    • Expected Outcomes:
      • UI provides a page or an overlay with notifications about recent updates (informational, action required, etc.)
      • The feature can use the browser API to send notifications even when the CVAT tab is closed
      • Notifications can also be sent by email if an email backend is configured
      • Users may set up their notification preferences in the GUI
    • Skills Required: Python (Django, DRF), TypeScript (React)
    • Possible Mentors: Boris Sekachev
    • Rating: Hard
    • Expected size: 350 hours
  4. IDEA: Multiple objects selection and bulk actions

    • Description: Currently, the annotation interface only allows working with one object at a time. This is inconvenient in some cases, and the community has proposed implementing selection of multiple objects, followed by bulk actions applied to all selected objects simultaneously (e.g. removing, dragging, resizing, or changing labels, attributes, and properties).
    • Expected Outcomes:
      • Multiple objects may be selected, e.g. by holding Ctrl and clicking additional objects
      • UI provides visualisation of the selected group on the canvas area and in the objects sidebar
      • Changing a property of one object applies the same change (when possible) to the others
      • Related features (e.g. undo/redo) should be updated accordingly
    • Resources:
    • Skills Required: TypeScript (React)
    • Possible Mentors: Boris Sekachev
    • Rating: Hard
    • Expected size: 350 hours
  5. IDEA: Timeline for tracked objects

    • Description: The timeline shows summary information about the selected track in a video (start/end positions, visibility, changes, navigation features, and a track preview).
    • Expected Outcomes:
      • A user may see a timeline with keyframes
      • The timeline shows where the track starts, ends, and becomes visible or invisible
      • The timeline provides fast navigation to a selected keyframe
      • Each keyframe on the timeline shows short information (e.g. what was updated on this keyframe)
      • The user may see a track preview (an animation showing the tracked object on different frames)
    • Resources:
    • Skills Required: TypeScript (React)
    • Possible Mentors: Kirill Lakhov
    • Rating: Medium
    • Expected size: 175 hours
  6. IDEA: Account deletion and optimized resources erasing

    • Description: Sometimes users want to remove their accounts, and to comply with personal data regulations and laws in different countries CVAT should provide this functionality. Currently the process is entirely manual and available only to instance admins. Implementing an account deletion pipeline available to end users would be a great feature. The corresponding user resources should be removed in server workers, since removing them in the main server processes generally leads to request timeouts (a scheduling sketch follows this idea's details).
    • Expected Outcomes:
      • Users may request deletion of their account and all associated data from the GUI
      • The request requires password authentication, and email confirmation if an email backend is configured
      • Deletion is postponed for a configurable period of time (e.g. a day or a week) and performed in a worker
      • During this cooldown period the user may abort the request
      • Removal of potentially huge resources (such as tasks and projects) has to be moved to workers
    • Skills Required: Python (Django, DRF), TypeScript (React)
    • Possible Mentors: Maria Khrustaleva
    • Rating: Medium
    • Expected size: 175 hours
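
A minimal sketch of how the postponed, worker-side deletion could be scheduled, assuming an RQ queue similar to the ones CVAT already uses for background jobs; the queue name, cooldown value, and function names are illustrative assumptions, not the real implementation.

```python
from datetime import timedelta

from redis import Redis
from rq import Queue
from rq.job import Job

DELETION_COOLDOWN = timedelta(days=1)  # configurable grace period

def delete_user_account(user_id: int) -> None:
    # Placeholder for the real cleanup: remove tasks, projects, jobs and other
    # large resources owned by the user, then the account itself.
    ...

def request_account_deletion(user_id: int) -> str:
    queue = Queue("account-deletion", connection=Redis())
    job = queue.enqueue_in(DELETION_COOLDOWN, delete_user_account, user_id)
    return job.id  # stored so the user can abort during the cooldown

def abort_account_deletion(job_id: str) -> None:
    Job.fetch(job_id, connection=Redis()).cancel()  # cancel the pending deletion
```
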
  7. IDEA: End-to-end model training pipeline for the latest YOLO version

    • Description: YOLO (You Only Look Once) is a widely used object detection algorithm that can facilitate automatic annotation of multiple classes. This project aims to experiment with training a YOLO model on a CVAT-annotated dataset, identify typical challenges faced during the process, and document the complete pipeline for the benefit of the community (a training sketch follows this idea's details).
    • Expected Outcomes:
      • A Python script that allows users to train a YOLO model with a custom set of detection classes.
      • A model trained from scratch on a dataset annotated using CVAT, leveraging internal CVAT.ai resources for annotation.
      • Integration of the trained model into CVAT to enable automatic labeling via the CVAT CLI interface.
      • In the second iteration, the trained model will be used for pre-annotation, followed by adjustments made by the CVAT.ai data annotation team.
      • The adjusted annotations will be used to fine-tune the model for improved accuracy.
      • A comprehensive guideline or article outlining the entire process from training to fine-tuning, to assist the community.
    • Resources:
    • Skills Required: CV, ML, Python
    • Possible Mentors: Nikita Manovich
    • Rating: Medium
    • Expected size: 175 hours
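
A minimal sketch of the training step, assuming the Ultralytics package and a dataset already exported from CVAT in a YOLO-compatible format; the weights file, dataset path, and hyperparameters are placeholders.

```python
from ultralytics import YOLO

# Start from pretrained weights (swap in the latest YOLO release's weights),
# or train from scratch by loading a model .yaml instead of a .pt file.
model = YOLO("yolov8n.pt")
model.train(data="cvat_export/data.yaml", epochs=100, imgsz=640)

metrics = model.val()        # evaluate on the validation split
model.export(format="onnx")  # package the model for later integration with CVAT
```
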
  8. IDEA: Integrate locust into CI pipelines for performance testing

    • Description: Software that lacks integrated performance testing is at risk of performance regressions, which often go unnoticed until after a new release is deployed to production. This can lead to significant issues, especially when the software serves thousands of users. For this project, we propose enhancing the continuous integration (CI) pipeline by incorporating automated performance testing. By utilizing popular and effective testing solutions, we aim to catch performance issues early in the development cycle, ensuring a smoother user experience and more reliable software releases (a Locust sketch follows this idea's details).
    • Expected Outcomes:
      • Integrate Locust into the CI pipeline in the CVAT community repository
      • Implement 10 performance tests for different endpoints to find regressions on a regular basis
    • Resources:
    • Skills Required: Python, GitHub actions, CI
    • Possible Mentors: Nikita Manovich
    • Rating: Medium
    • Expected size: 175 hours
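
A minimal sketch of a Locust scenario against CVAT's REST API; the credentials, task weights, and wait times are placeholders, and the session-based login shown is just one option.

```python
from locust import HttpUser, task, between

class CvatApiUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        # Ordinary session-based login; a token-based scheme could be used instead.
        self.client.post(
            "/api/auth/login",
            json={"username": "perf_user", "password": "secret"},
        )

    @task(3)
    def list_tasks(self):
        self.client.get("/api/tasks")

    @task(1)
    def list_projects(self):
        self.client.get("/api/projects")
```

In CI such a file could be run headless, for example `locust -f locustfile.py --headless -u 10 -r 2 --run-time 1m --host <CVAT URL>`, with the collected statistics checked against thresholds in a follow-up step.
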
  9. IDEA: Projection of 3D Point Cloud Coordinates onto 2D Contextual Images

    • Description: 3D datasets are often accompanied by 2D contextual images captured by cameras synchronized with LiDAR sensors, providing a view of the scene for each frame. While CVAT can display the 3D scene alongside these images, leveraging camera intrinsic parameters allows for more advanced functionality. By projecting 3D points onto the 2D images, users can enhance their annotation process, as they will be able to visually match the 3D objects with their corresponding positions on the 2D images. This integration improves the speed and accuracy of annotations by providing a better understanding of how the 3D data fits into the 2D visual context (a projection sketch follows this idea's details).
    • Expected Outcomes:
      • The CVAT server reads and stores camera intrinsic parameters, making this information available to the client via the REST API
      • When camera intrinsic information is provided, the client performs the projection of 3D point cloud data onto the 2D images, using the intrinsic parameters
      • As a result, 3D objects or bounding boxes from the scene are projected onto the 2D images, appearing as cuboids, providing a clear visual alignment between the 3D data and 2D context
    • Resources:
    • Skills Required: Python, TypeScript (three.js)
    • Possible Mentors: Boris Sekachev
    • Rating: Medium
    • Expected size: 175 hours
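
A minimal sketch of the projection itself, assuming the 3D points have already been transformed into the camera coordinate frame and a standard pinhole intrinsic matrix is available; all numeric values are illustrative.

```python
import numpy as np

# Intrinsic matrix K: focal lengths (fx, fy) and principal point (cx, cy).
K = np.array([
    [1000.0,    0.0, 960.0],
    [   0.0, 1000.0, 540.0],
    [   0.0,    0.0,   1.0],
])

def project_points(points_cam: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project Nx3 camera-frame points to Nx2 pixel coordinates
    via homogeneous coordinates and a perspective divide."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Two corners of a toy cuboid, 5 m in front of the camera.
cuboid_corners = np.array([[1.0, 0.5, 5.0], [1.5, 0.5, 5.0]])
print(project_points(cuboid_corners, K))
```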

Idea Template

1. ## _IDEA:_ <Descriptive Title>
   * ***Description:*** 3-7 sentences describing the task
   * ***Expected Outcomes:***
      * < Short bullet list describing what is to be accomplished >
      * <i.e. create a new module called "bla bla">
      * < Has method to accomplish X >
      * <...>
   * ***Resources:***
         * [For example a paper citation](https://arxiv.org/pdf/1802.08091.pdf)
         * [For example an existing feature request](https://github.com/cvat-ai/cvat/pull/5608)
   * ***Skills Required:*** < for example mastery plus experience coding in Python, college course work in vision that covers AI topics, python. Best if you have also worked with deep neural networks. >
   * ***Possible Mentors:*** < your name goes here >
   * ***Rating:*** <Easy, Medium, Hard>
   * ***Expected size:*** <90, 175 or 350 hours>

Potential project mentors

GSoC admins