I created a project for detection and classification on image frames.
I implemented detection and classification engines using the following constructor:
I designed my pipeline to run inference on detection first, then pass all detected boxes into classification. I deployed this as an API. However, when I scaled the service by replicating it, the Requests Per Second (RPS) remained the same.
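The two-stage pipeline described above can be sketched as follows. This is a hypothetical illustration, not the project's actual code: the class names (`DetectionEngine`, `ClassificationEngine`), the `infer` method, and the `Box` type are all placeholders standing in for the real detection and classification engines.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Box:
    """A detected bounding box (placeholder type)."""
    x1: int
    y1: int
    x2: int
    y2: int

class DetectionEngine:
    """Placeholder: a real engine would run a detection model here."""
    def infer(self, frame) -> List[Box]:
        return [Box(0, 0, 10, 10)]  # dummy detection result

class ClassificationEngine:
    """Placeholder: a real engine would classify the cropped box here."""
    def infer(self, crop) -> str:
        return "label"  # dummy classification result

def run_pipeline(frame) -> List[Tuple[Box, str]]:
    # Stage 1: detect boxes in the frame.
    detector = DetectionEngine()
    classifier = ClassificationEngine()
    boxes = detector.infer(frame)

    # Stage 2: classify every detected box.
    results = []
    for box in boxes:
        crop = frame  # placeholder for cropping the frame to the box
        results.append((box, classifier.infer(crop)))
    return results
```

Each request thus pays for one detection pass plus one classification pass per detected box, which is relevant when reasoning about where the throughput bottleneck sits.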
For example, when I replicated the service 5 times, GPU usage increased 5x, but the RPS did not improve.
Do you have any ideas on how to double the RPS?
P.S. I run this API with Docker Compose and use nginx as the load balancer.
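For context, the deployment described above would look roughly like this minimal nginx load-balancer sketch. The upstream name, service hostnames, and ports are assumptions for illustration, not taken from the original setup:

```nginx
upstream inference_api {
    least_conn;        # send each request to the replica with the fewest active connections
    server api_1:8000; # hypothetical Docker Compose service replicas
    server api_2:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://inference_api;
    }
}
```

If nginx is only configured with one upstream server (e.g. a single Compose service name), replicas behind it may not all receive traffic, which is one thing worth checking when RPS does not scale with replica count.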