Investigate performing encoding in a separate thread #52

Closed
Breakthrough opened this issue May 14, 2021 · 3 comments

Breakthrough commented May 14, 2021

Investigate using threading or multiprocessing to split the processing pipeline into separate stages and make better use of multiple CPU cores. In general, video decoding followed by encoding takes up the most processing time in the pipeline. Encoding is more CPU-intensive per frame, but there are typically fewer frames to encode than the input video contains, so it consumes less CPU time overall when processing a given input video.

Breakthrough commented Feb 16, 2022

Just did a quick benchmark to test this with a 1080p video at 60 FPS containing roughly 3000 frames (50% of them with motion). The results I obtained were:

Scan-Only:

  • Single-Threaded: 47.3 FPS
  • Multi-Threaded: 54.0 FPS (+14%)

Including Video Output:

  • Single-Threaded: 41.5 FPS
  • Multi-Threaded: 51.6 FPS (+24%)

This uses the threading module, not multiprocessing (frames are too large to transfer across processes without shared memory, which the standard library only provides via multiprocessing.shared_memory in Python 3.8+).
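For reference, here is a minimal sketch of what passing a frame through shared memory could look like on Python 3.8+. It is not taken from the benchmark; `publish_frame` and `read_frame` are hypothetical helper names, and the frame is treated as raw bytes (an 8-bit, 3-channel 1080p frame would be 1920 × 1080 × 3 = 6,220,800 bytes). In a real multiprocessing setup the reader would run in a worker process and receive only the block name and size.

```python
from multiprocessing import shared_memory


def publish_frame(frame_bytes):
    """Copy raw frame bytes into a new shared memory block (hypothetical helper).

    The caller owns the returned block and must close() and unlink() it.
    """
    shm = shared_memory.SharedMemory(create=True, size=len(frame_bytes))
    shm.buf[:len(frame_bytes)] = frame_bytes
    return shm


def read_frame(name, size):
    """Attach to an existing block by name and copy the frame out (hypothetical helper).

    The size is passed explicitly because the block may be rounded up
    to a platform-dependent page size.
    """
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf[:size])
    finally:
        shm.close()  # detach only; the owner is responsible for unlink()
```

Only the short block name and the size need to cross the process boundary, so the multi-megabyte frame itself is never pickled.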

I used 3 threads: one for decoding the video, one for the motion detection algorithm, and one for the video encoding. This seems to be a worthwhile avenue to pursue, and it also helps clean up the control flow through better separation of concerns.
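The three-thread layout described above can be sketched with bounded queues between the stages. This is an illustrative sketch only, not the code from the gist: `run_pipeline`, `detect`, `encode`, and `max_queued` are hypothetical names, the frames are arbitrary Python objects standing in for decoded images, and error handling is omitted (a stage that raises would leave the other stages waiting).

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the frame stream


def run_pipeline(frames, detect, encode, max_queued=8):
    """Run decode -> detect -> encode on three threads.

    `frames` is any iterable standing in for the decoder, `detect(frame)`
    returns True for frames that contain motion, and `encode(frame)` stands
    in for writing a frame to the output video. Returns the encoded results
    in input order.
    """
    decoded = queue.Queue(maxsize=max_queued)  # decoder -> detector
    events = queue.Queue(maxsize=max_queued)   # detector -> encoder
    output = []

    def decode_worker():
        for frame in frames:
            decoded.put(frame)  # blocks when the detector falls behind
        decoded.put(_SENTINEL)

    def detect_worker():
        while True:
            frame = decoded.get()
            if frame is _SENTINEL:
                break
            if detect(frame):
                events.put(frame)  # only frames with motion are encoded
        events.put(_SENTINEL)

    def encode_worker():
        while True:
            frame = events.get()
            if frame is _SENTINEL:
                break
            output.append(encode(frame))

    workers = (decode_worker, detect_worker, encode_worker)
    threads = [threading.Thread(target=worker) for worker in workers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return output
```

For example, `run_pipeline(range(10), lambda f: f % 2 == 0, lambda f: f * 10)` keeps the even "frames" and returns `[0, 20, 40, 60, 80]`. The bounded queues cap how many decoded frames are held in memory at once, which matters when each frame is several megabytes.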

Here is the benchmark code:
https://gist.github.com/Breakthrough/8aed9a77fd8b9a60fb37e984e33ea596

If anyone would like to test this on their own system, I'd be glad to see what kind of improvement you're seeing from that script.

Breakthrough commented Feb 20, 2022

Interestingly, it looks like writing the same benchmark in Rust yields a further worthwhile performance gain. So far I've only tested a Rust version of the single-threaded Python benchmark, but even an unoptimized debug-mode build yields 53 FPS (versus 41.5 FPS in Python). If the Rust multithreaded version proves to be significantly faster, then it might be worth considering rewriting DVR-Scan v2.0 in Rust.

Edit: Final results for Rust, including video output (percentages are relative to the single-threaded Python result of 41.5 FPS):

  • Single-Threaded: 50.9 FPS (+23%)
  • Multi-Threaded: 62.2 FPS (+50%)
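The percentages for the video-output runs are consistent with a comparison against the single-threaded Python result of 41.5 FPS. A small sketch of that arithmetic, where `speedup_percent` is a hypothetical helper and the FPS values are the ones reported in this thread:

```python
def speedup_percent(baseline_fps, new_fps):
    """Relative throughput gain of new_fps over baseline_fps, in percent."""
    return (new_fps / baseline_fps - 1.0) * 100.0


PYTHON_SINGLE_THREADED_FPS = 41.5  # baseline, including video output

for label, fps in [
    ("Python multi-threaded", 51.6),
    ("Rust single-threaded", 50.9),
    ("Rust multi-threaded", 62.2),
]:
    gain = speedup_percent(PYTHON_SINGLE_THREADED_FPS, fps)
    print(f"{label}: {fps} FPS ({gain:+.0f}%)")
```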

@Breakthrough

This is now completed in the v1.5 branch. I'm getting roughly 60 FPS now processing a 1080p video, which is a pretty significant improvement. Feel free to grab a pre-built .whl from the AppVeyor build job to test out.
