
httpx client has very poor performance for concurrent requests compared to aiohttp #1596

Open
willthayes opened this issue Aug 5, 2024 · 11 comments
Labels: bug (Something isn't working)

Comments

@willthayes

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

The API client uses httpx, which has very poor performance when making concurrent requests compared to aiohttp. There is an open issue tracking this on the httpx side (encode/httpx#3215).

This is forcing us to swap out the OpenAI SDK for our own implementation, which is a pain.

I suspect it is the root cause of the performance difference between the Node.js and Python SDKs demonstrated here.

I'm not massively familiar with the development of this SDK, or whether there is a key reason for picking httpx over aiohttp. From my reading, it was switched over for v1 in order to keep the sync and async clients consistent, but I'm not sure how vital that is. However, for our high-concurrency async use cases this renders the SDK unusable.

To Reproduce

To reproduce, run chat completion requests in parallel with 20+ concurrent requests, benchmarking the openai client against an equivalent implementation that uses aiohttp. Example code can be found in the httpx issue linked above.
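
For illustration, a minimal sketch of such a benchmark against the openai client (not from the original report; the model name and request count are placeholders):

import asyncio
import time

from openai import AsyncOpenAI


async def main() -> None:
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    start = time.perf_counter()
    # Fire 20+ chat completion requests concurrently and time the whole batch.
    await asyncio.gather(
        *(
            client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model
                messages=[{"role": "user", "content": "Say hello."}],
            )
            for _ in range(20)
        )
    )
    print(f"20 concurrent requests took {time.perf_counter() - start:.2f}s")


asyncio.run(main())

Repeat the same loop with an aiohttp-based implementation and compare wall-clock time for the batch.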

Code snippets

No response

OS

Linux / macOS

Python version

v3.12

Library version

1.12.0

@willthayes added the bug label on Aug 5, 2024
@RobertCraigie
Collaborator

Interesting; I was not aware there was such a performance discrepancy between aiohttp and httpx.

From skimming the linked issue, it thankfully seems like there's a lot of in-flight work that would bring httpx up to par performance-wise.

I'm not massively familiar with the development of this SDK, and whether there is a key reason for picking httpx over aiohttp. From my reading it was switched over for V1 in order to create consistency between sync and async clients, but I'm not sure how vital it is to achieve this.

Yes, consistency here is very important; using different clients would make everything much more complicated and confusing for little gain, especially as this performance discrepancy can be fixed.

However for our high concurrency async use cases this renders the SDK useless.

Sorry about this; hopefully the httpx PRs can be merged soon.

In the meantime it might be less work for you to use a patched version of httpx with the performance fixes included in the linked issue.
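
For reference, the openai client accepts a custom httpx client via the http_client argument, so a patched httpx install (or tuned connection limits) can be plugged in without changing calling code. A rough sketch; the limit and timeout values below are placeholders to tune for your workload:

import httpx
from openai import AsyncOpenAI

# Placeholder pool limits and timeouts; with a patched httpx/httpcore installed,
# this client picks up the performance fixes automatically.
custom_http_client = httpx.AsyncClient(
    limits=httpx.Limits(max_connections=500, max_keepalive_connections=100),
    timeout=httpx.Timeout(600.0, connect=5.0),
)
client = AsyncOpenAI(http_client=custom_http_client)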


I'm going to close this as we have no plans to move away from httpx. In the future we may offer a more extensible custom HTTP client API that would allow you to use any HTTP library as long as you implement the interface. However, this isn't likely to happen anytime soon, unfortunately.

I'll see if we can help land the httpx / httpcore PRs faster.

@RobertCraigie closed this as not planned on Aug 5, 2024
@willthayes
Author

Understood, thanks for the quick reply!

@ShubhamMaddhashiya-bidgely

Hi everyone,

I'm working on a chatbot and ran some load testing and profiling under a high number of concurrent users. It appears that the performance of httpx in asynchronous mode, which is what the OpenAI Python SDK uses, isn't meeting expectations.

Are there any updates or planned improvements on this? Or would it be worth benchmarking my application with aiohttp to see if it performs better under high concurrency?

Thanks in advance for your insights!

@opalrose-510

Likewise, to +1 this: our aiohttp implementation is substantially faster, sometimes twice as fast. I go from about 100 requests per second to 200+ on my local machine, so I'm not sure what causes the discrepancy.

@Tom-Standen

We've moved from httpx to aiohttp and it solved a lot of our concurrency issues. It means we don't use the openai client, which initially concerned me because of the theoretical connection-sharing benefit of using it, but that pales in comparison. I wish I'd swapped sooner.

@iliazintchenko

This is the aiohttp class we are using to maximize throughput while this issue is open:

import asyncio
import json
import logging
from typing import Self, cast  # Self requires Python 3.11+; use typing_extensions on older versions

from aiohttp import (
    ClientSession,
    ClientTimeout,
    ServerDisconnectedError,
    ServerTimeoutError,
    TCPConnector,
)
from openai.types.chat import ChatCompletion as Completion  # assumed import for the Completion type used below


class RateLimitError(Exception):
    """Raised when the API responds with HTTP 429."""


class TemporaryFailure(Exception):
    """Raised for transient failures (5xx responses, timeouts, disconnects)."""


class OpenAISession:
    def __init__(
        self,
        api_key: str,
    ):
        self.api_key = api_key
        self.client = OpenAISession._make_client(api_key)

    @staticmethod
    def _make_client(api_key: str) -> ClientSession:
        # Large connection pool and generous total timeout for high-concurrency use.
        connector = TCPConnector(limit=500)
        timeout = ClientTimeout(total=600, connect=5)
        client = ClientSession(connector=connector, timeout=timeout)
        client.headers["Authorization"] = f"Bearer {api_key}"
        client.headers["Content-Type"] = "application/json"
        return client

    async def chat_completion(
        self, model: str, messages: list[dict], temperature: float
    ) -> Completion:
        # Recreate the session if it was closed (e.g. after exiting the context manager).
        if self.client.closed:
            self.client = OpenAISession._make_client(self.api_key)
        try:
            payload = {
                "model": model,
                "messages": messages,
                "temperature": temperature,
            }
            logging.info(f"OpenAI API Request Payload: {json.dumps(payload, indent=2)}")

            async with self.client.post(
                "https://api.openai.com/v1/chat/completions",
                json=payload,
                timeout=ClientTimeout(total=60),
            ) as resp:
                if resp.status == 429:
                    raise RateLimitError()
                elif resp.status in (500, 503):
                    raise TemporaryFailure(resp.reason)
                elif resp.status == 400:
                    error_body = await resp.text()
                    logging.error(f"OpenAI API 400 Error: {error_body}")
                    logging.error(f"Request payload that caused error: {json.dumps(payload, indent=2)}")
                resp.raise_for_status()
                return cast(Completion, await resp.json())
        except (asyncio.TimeoutError, ServerTimeoutError, ServerDisconnectedError) as e:
            # https://github.com/aio-libs/aiohttp/issues/8133
            raise TemporaryFailure(str(e)) from e

    async def __aenter__(self) -> Self:
        await self.client.__aenter__()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        return await self.client.__aexit__(exc_type, exc_val, exc_tb)
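
For context, a usage sketch of the class above, reusing the same imports (the model name, message contents, and batch size are placeholders):

async def run_batch(api_key: str) -> None:
    # Issue many chat completions concurrently over a single shared session.
    async with OpenAISession(api_key) as session:
        results = await asyncio.gather(
            *(
                session.chat_completion(
                    model="gpt-4o-mini",  # placeholder
                    messages=[{"role": "user", "content": "Say hello."}],
                    temperature=0.0,
                )
                for _ in range(50)
            )
        )
        print(f"completed {len(results)} requests")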

@ShubhamMaddhashiya-bidgely

In my case, profiling identified a performance overhead in the AsyncConnectionPool.handle_async_request method, mainly due to inefficient handling of idle or expired connections. The time complexity could be reduced by using a more suitable data structure rather than repeatedly iterating over all of the connections.
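
To illustrate the idea (a generic sketch, not httpcore's actual code): instead of scanning every pooled connection on each request to find expired ones, expirable connections can be kept in a min-heap keyed by expiry time, so cleanup only touches entries that have actually expired.

import heapq
import itertools
import time


class ExpiryIndex:
    """Toy sketch: cleanup touches only expired entries instead of scanning the whole pool."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, object]] = []
        self._tiebreak = itertools.count()  # avoids comparing connection objects on equal expiry

    def add(self, conn: object, keepalive_seconds: float) -> None:
        expiry = time.monotonic() + keepalive_seconds
        heapq.heappush(self._heap, (expiry, next(self._tiebreak), conn))

    def pop_expired(self) -> list[object]:
        now = time.monotonic()
        expired = []
        # Only the front of the heap can hold already-expired connections.
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[2])
        return expired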

[Screenshot, Oct 28 2024: profiling output showing the overhead in AsyncConnectionPool.handle_async_request]

I came across pull requests addressing these issues, but they haven't been merged yet; see the comment thread on encode/httpx#3215.

I tested the optimizations by installing the patches from the all-optimizations branch by @MarkusSintonen:

pip install --upgrade git+https://github.com/MarkusSintonen/httpcore.git@all-optimizations

With these changes, the performance overhead in AsyncConnectionPool.handle_async_request was significantly reduced.

It would be very helpful if the OpenAI SDK supported a drop-in replacement for http_client, allowing us to swap httpx for aiohttp easily.

@bdraco

bdraco commented Nov 14, 2024

If performance is still a concern here, please try aiohttp 3.11.x+ as we have smoothed out some more of the concurrency delays and cancellation races in this version.

https://docs.aiohttp.org/en/stable/changes.html
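
For anyone trying this, the upgrade is a one-liner:

pip install --upgrade "aiohttp>=3.11"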

@rachitchauhan43

@RobertCraigie: I see you closed this, marking it as not planned.

Shouldn't this be a priority item for the OpenAI SDK team, to make the SDK usable for use cases that need to scale? Doesn't this discourage users from using the SDK at scale?

@kamillle

kamillle commented Jan 8, 2025

Same here... :(

I am a developer of an AI character service that provides interactive conversations and uses large language models (LLMs) to generate the characters' responses. The service had been using the openai package at v0.27.4 to make its LLM requests.

After upgrading the openai package to v1.59.4, I noticed a decrease in the server's throughput and an increase in latency to the LLM hosting servers. I suspect that httpx is the cause, as discussed above, and I would like to explore potential solutions.

I look forward to responses from contributors.

@rattrayalex
Collaborator

Sorry about this. We're tracking this primarily at encode/httpx#3215, but it makes sense to leave this issue open, as one way or another this needs to get solved for users of this SDK.
