httpx client has very poor performance for concurrent requests compared to aiohttp #1596
Comments
Interesting, I was not aware there was such a performance discrepancy between httpx and aiohttp. From skimming the linked issue it thankfully seems like there's a lot of in-flight work that would bring the two closer together.
Yes, consistency here is very important; using different clients would make everything much more complicated and confusing for little gain, especially as this performance discrepancy can be fixed.
Sorry about this, hopefully the httpx PRs can be merged soon. In the meantime it might be less work for you to use a patched version of httpx. I'm going to close this as we have no plans to move away from httpx. In the future we may offer a more extensible custom http client API which would allow you to use any http library as long as you implement the interface; however, that isn't likely to happen anytime soon unfortunately. I'll see if we can help land the upstream httpx PRs.
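For reference, the SDK does already accept a custom httpx.AsyncClient via the http_client argument, which at least lets you tune connection limits and timeouts even though it doesn't swap out httpx. A rough sketch (the limit and timeout values below are arbitrary examples, not recommendations):

import httpx
from openai import AsyncOpenAI

# Tune the existing httpx-based transport rather than replacing it.
# Connection limits and timeouts here are arbitrary example values.
http_client = httpx.AsyncClient(
    limits=httpx.Limits(max_connections=500, max_keepalive_connections=100),
    timeout=httpx.Timeout(600.0, connect=5.0),
)
client = AsyncOpenAI(http_client=http_client)  # API key is read from OPENAI_API_KEY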
Understood, thanks for the quick reply!
Hi everyone, I'm working on a chatbot and ran some load testing and profiling under high concurrent users. It appears that the performance of the httpx-based client degrades under this kind of load. Are there any updates or planned improvements on this? Or would it be worth benchmarking my application with aiohttp instead? Thanks in advance for your insights!
^ Likewise, to +1 this: our aiohttp implementation is substantially faster, sometimes twice as fast. I go from about 100 requests per second to 200+ on my local machine, so I'm not sure what causes the discrepancy.
We've moved from httpx to aiohttp and it solved a lot of our concurrency issues. It means we don't use the openai client, which I was initially concerned about because of the theoretical connection-sharing benefit of using the client, but that pales in comparison. I wish I'd swapped sooner.
this is the aiohttp class we are using to maximize throughput while this issue is open:

import asyncio
import json
import logging
from typing import Self, cast

from aiohttp import (
    ClientSession,
    ClientTimeout,
    ServerDisconnectedError,
    ServerTimeoutError,
    TCPConnector,
)
from openai.types import Completion

# RateLimitError and TemporaryFailure are application-defined exceptions (not shown here).


class OpenAISession:
    def __init__(
        self,
        api_key: str,
    ):
        self.api_key = api_key
        self.client = OpenAISession._make_client(api_key)

    @staticmethod
    def _make_client(api_key: str) -> ClientSession:
        # Large connection pool and generous total timeout to maximize throughput.
        connector = TCPConnector(limit=500)
        timeout = ClientTimeout(total=600, connect=5)
        client = ClientSession(connector=connector, timeout=timeout)
        client.headers["Authorization"] = f"Bearer {api_key}"
        client.headers["Content-Type"] = "application/json"
        return client

    async def chat_completion(
        self, model: str, messages: list[dict], temperature: float
    ) -> Completion:
        # Recreate the session if it was closed (e.g. after the context manager exited).
        if self.client.closed:
            self.client = OpenAISession._make_client(self.api_key)
        try:
            payload = {
                "model": model,
                "messages": messages,
                "temperature": temperature,
            }
            logging.info(f"OpenAI API Request Payload: {json.dumps(payload, indent=2)}")
            async with self.client.post(
                "https://api.openai.com/v1/chat/completions",
                json=payload,
                timeout=ClientTimeout(total=60),
            ) as resp:
                if resp.status == 429:
                    raise RateLimitError()
                elif resp.status in (500, 503):
                    raise TemporaryFailure(resp.reason)
                elif resp.status == 400:
                    error_body = await resp.text()
                    logging.error(f"OpenAI API 400 Error: {error_body}")
                    logging.error(f"Request payload that caused error: {json.dumps(payload, indent=2)}")
                resp.raise_for_status()
                return cast(Completion, await resp.json())
        except (asyncio.TimeoutError, ServerTimeoutError, ServerDisconnectedError) as e:
            # https://github.com/aio-libs/aiohttp/issues/8133
            raise TemporaryFailure(str(e)) from e

    async def __aenter__(self) -> Self:
        await self.client.__aenter__()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        return await self.client.__aexit__(exc_type, exc_val, exc_tb)
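For what it's worth, a minimal usage sketch of the class above (the model name is just a placeholder; it assumes OPENAI_API_KEY is set in the environment):

import asyncio
import os

async def main() -> None:
    async with OpenAISession(api_key=os.environ["OPENAI_API_KEY"]) as session:
        completion = await session.chat_completion(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": "Hello!"}],
            temperature=0.7,
        )
        print(completion)

asyncio.run(main())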
In my case, after profiling, I identified a performance overhead in the httpx/httpcore layer (profiling screenshot omitted here). I came across pull requests addressing similar issues, but they haven't been merged yet; see the discussion on encode/httpx#3215. I tested the optimizations by installing the patches from the fork referenced there:

pip install --upgrade git+https://github.com/MarkusSintonen/httpcore.git@all-optimizations

With these changes, the performance overhead in httpcore was reduced. It would be very helpful if the OpenAI SDK supported a drop-in replacement for httpx.
If performance is still a concern here, please try
@RobertCraigie: I see you have closed this, marking it as not planned here. Shouldn't this be a priority item for the OpenAI SDK team, to make this SDK usable for use cases at scale? Doesn't this discourage users from using this SDK at scale?
Same here... :( I am a developer of an AI character service that provides interactive conversations, and it uses large language models (LLMs) to generate the characters' responses. The service used the openai package v0.27.4 to send requests to LLMs. After upgrading the openai package to v1.59.4, I noticed a decrease in the server's throughput and an increase in the latency of the LLM hosting servers. I suspect that httpx is the cause, as discussed in the related conversations, and I would like to explore potential solutions. I look forward to responses from contributors.
Sorry about this – we're tracking this primarily at encode/httpx#3215, but I agree it makes sense to leave this open; one way or another this needs to get solved for users of this SDK.
Confirm this is an issue with the Python library and not an underlying OpenAI API
Describe the bug
The API client uses httpx, which has very poor performance when making concurrent requests compared to aiohttp. There is an open issue for httpx tracking this: encode/httpx#3215.
This is forcing us to swap out the OpenAI SDK for our own implementation, which is a pain.
I suspect it is the root cause of the difference between node.js and Python demonstrated here
I'm not massively familiar with the development of this SDK, or whether there is a key reason for picking httpx over aiohttp. From my reading it was switched over for v1 in order to create consistency between the sync and async clients, but I'm not sure how vital that is. However, for our high-concurrency async use cases this renders the SDK useless.
To Reproduce
To reproduce, run chat completion requests in parallel with 20+ concurrent requests, benchmarking the openai API client against an implementation using aiohttp. Example code can be found in the linked issue in httpx.
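For instance, a minimal sketch of such a benchmark might look like the following (the model name and concurrency level are arbitrary; it assumes OPENAI_API_KEY is set in the environment):

import asyncio
import os
import time

import aiohttp
from openai import AsyncOpenAI

CONCURRENCY = 20  # number of parallel requests
MODEL = "gpt-4o-mini"  # arbitrary model choice for the benchmark
MESSAGES = [{"role": "user", "content": "Say hello."}]

async def bench_openai_sdk() -> float:
    # The SDK client uses httpx under the hood and reads OPENAI_API_KEY from the environment.
    async with AsyncOpenAI() as client:
        start = time.perf_counter()
        await asyncio.gather(
            *[
                client.chat.completions.create(model=MODEL, messages=MESSAGES)
                for _ in range(CONCURRENCY)
            ]
        )
        return time.perf_counter() - start

async def bench_aiohttp() -> float:
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

    async def one(session: aiohttp.ClientSession) -> None:
        async with session.post(
            "https://api.openai.com/v1/chat/completions",
            json={"model": MODEL, "messages": MESSAGES},
        ) as resp:
            resp.raise_for_status()
            await resp.json()

    async with aiohttp.ClientSession(headers=headers) as session:
        start = time.perf_counter()
        await asyncio.gather(*[one(session) for _ in range(CONCURRENCY)])
        return time.perf_counter() - start

async def main() -> None:
    print(f"openai SDK (httpx): {await bench_openai_sdk():.2f}s")
    print(f"raw aiohttp:        {await bench_aiohttp():.2f}s")

asyncio.run(main())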
Code snippets
No response
OS
Linux/MacOs
Python version
v3.12
Library version
1.12.0