
httpx client has very poor performance for concurrent requests compared to aiohttp #1596

Open
willthayes opened this issue Aug 5, 2024 · 11 comments
Labels: bug (Something isn't working)

Comments

@willthayes

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

The API client uses httpx, which has very poor performance when making concurrent requests compared to aiohttp. There is an open issue tracking this on the httpx side (encode/httpx#3215).

This is forcing us to swap out the OpenAI SDK for our own implementation, which is a pain.

I suspect it is the root cause of the performance difference between the Node.js and Python SDKs demonstrated here.

I'm not massively familiar with the development of this SDK, or whether there is a key reason for picking httpx over aiohttp. From my reading, it was switched over for v1 in order to keep the sync and async clients consistent, but I'm not sure how vital that is. However, for our high-concurrency async use cases this renders the SDK unusable.

To Reproduce

To reproduce, run chat completion requests in parallel with 20+ concurrent requests, benchmarking the openai client against an equivalent implementation that uses aiohttp. Example code can be found in the httpx issue linked above.
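
For illustration, a minimal sketch of such a benchmark against the openai client (not from the original report; the model name and request count are placeholders):

import asyncio
import time

from openai import AsyncOpenAI


async def main() -> None:
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    start = time.perf_counter()
    # Fire 20+ chat completion requests concurrently and time the whole batch.
    await asyncio.gather(
        *(
            client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model
                messages=[{"role": "user", "content": "Say hello."}],
            )
            for _ in range(20)
        )
    )
    print(f"20 concurrent requests took {time.perf_counter() - start:.2f}s")


asyncio.run(main())

Repeat the same loop with an aiohttp-based implementation and compare wall-clock time for the batch.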

Code snippets

No response

OS

Linux / macOS

Python version

v3.12

Library version

1.12.0

@willthayes added the bug label on Aug 5, 2024
@RobertCraigie
Collaborator

Interesting; I was not aware there was such a performance discrepancy between aiohttp and httpx.

From skimming the linked issue, it thankfully seems like there's a lot of in-flight work that would bring httpx up to par performance-wise.

I'm not massively familiar with the development of this SDK, and whether there is a key reason for picking httpx over aiohttp. From my reading it was switched over for V1 in order to create consistency between sync and async clients, but I'm not sure how vital it is to achieve this.

Yes, consistency here is very important; using different clients would make everything much more complicated and confusing for little gain, especially as this performance discrepancy can be fixed.

However for our high concurrency async use cases this renders the SDK useless.

Sorry about this; hopefully the httpx PRs can be merged soon.

In the meantime it might be less work for you to use a patched version of httpx with the performance fixes included in the linked issue.
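
For reference, the openai client accepts a custom httpx client via the http_client argument, so a patched httpx install (or tuned connection limits) can be plugged in without changing calling code. A rough sketch; the limit and timeout values below are placeholders to tune for your workload:

import httpx
from openai import AsyncOpenAI

# Placeholder pool limits and timeouts; with a patched httpx/httpcore installed,
# this client picks up the performance fixes automatically.
custom_http_client = httpx.AsyncClient(
    limits=httpx.Limits(max_connections=500, max_keepalive_connections=100),
    timeout=httpx.Timeout(600.0, connect=5.0),
)
client = AsyncOpenAI(http_client=custom_http_client)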


I'm going to close this as we have no plans to move away from httpx. In the future we may offer a more extensible custom HTTP client API that would allow you to use any HTTP library as long as you implement the interface. However, this isn't likely to happen anytime soon, unfortunately.

I'll see if we can help land the httpx / httpcore PRs faster.

@RobertCraigie closed this as not planned on Aug 5, 2024
@willthayes
Author

Understood, thanks for the quick reply!

@ShubhamMaddhashiya-bidgely

Hi everyone,

I'm working on a chatbot and ran some load testing and profiling under a high number of concurrent users. It appears that the performance of httpx in asynchronous mode, which is what the OpenAI Python SDK uses, isn't meeting expectations.

Are there any updates or planned improvements on this? Or would it be worth benchmarking my application with aiohttp to see if it performs better under high concurrency?

Thanks in advance for your insights!

@opalrose-510

Likewise, to +1 this: our aiohttp implementation is substantially faster, sometimes twice as fast. I go from about 100 requests per second to 200+ on my local machine, so I'm not sure what causes the discrepancy.

@Tom-Standen

We've moved from httpx to aiohttp and it solved a lot of our concurrency issues. It means we don't use the openai client, which initially concerned me because of the theoretical connection-sharing benefit of using it, but that pales in comparison. I wish I'd swapped sooner.

@iliazintchenko

This is the aiohttp class we are using to maximize throughput while this issue is open:

import asyncio
import json
import logging
from typing import Self, cast  # Self requires Python 3.11+; use typing_extensions on older versions

from aiohttp import (
    ClientSession,
    ClientTimeout,
    ServerDisconnectedError,
    ServerTimeoutError,
    TCPConnector,
)
from openai.types.chat import ChatCompletion as Completion  # assumed import for the Completion type used below


class RateLimitError(Exception):
    """Raised when the API responds with HTTP 429."""


class TemporaryFailure(Exception):
    """Raised for transient failures (5xx responses, timeouts, disconnects)."""


class OpenAISession:
    def __init__(
        self,
        api_key: str,
    ):
        self.api_key = api_key
        self.client = OpenAISession._make_client(api_key)

    @staticmethod
    def _make_client(api_key: str) -> ClientSession:
        # Large connection pool and generous total timeout for high-concurrency use.
        connector = TCPConnector(limit=500)
        timeout = ClientTimeout(total=600, connect=5)
        client = ClientSession(connector=connector, timeout=timeout)
        client.headers["Authorization"] = f"Bearer {api_key}"
        client.headers["Content-Type"] = "application/json"
        return client

    async def chat_completion(
        self, model: str, messages: list[dict], temperature: float
    ) -> Completion:
        # Recreate the session if it was closed (e.g. after exiting the context manager).
        if self.client.closed:
            self.client = OpenAISession._make_client(self.api_key)
        try:
            payload = {
                "model": model,
                "messages": messages,
                "temperature": temperature,
            }
            logging.info(f"OpenAI API Request Payload: {json.dumps(payload, indent=2)}")

            async with self.client.post(
                "https://api.openai.com/v1/chat/completions",
                json=payload,
                timeout=ClientTimeout(total=60),
            ) as resp:
                if resp.status == 429:
                    raise RateLimitError()
                elif resp.status in (500, 503):
                    raise TemporaryFailure(resp.reason)
                elif resp.status == 400:
                    error_body = await resp.text()
                    logging.error(f"OpenAI API 400 Error: {error_body}")
                    logging.error(f"Request payload that caused error: {json.dumps(payload, indent=2)}")
                resp.raise_for_status()
                return cast(Completion, await resp.json())
        except (asyncio.TimeoutError, ServerTimeoutError, ServerDisconnectedError) as e:
            # https://github.com/aio-libs/aiohttp/issues/8133
            raise TemporaryFailure(str(e)) from e

    async def __aenter__(self) -> Self:
        await self.client.__aenter__()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        return await self.client.__aexit__(exc_type, exc_val, exc_tb)
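
For context, a usage sketch of the class above, reusing the same imports (the model name, message contents, and batch size are placeholders):

async def run_batch(api_key: str) -> None:
    # Issue many chat completions concurrently over a single shared session.
    async with OpenAISession(api_key) as session:
        results = await asyncio.gather(
            *(
                session.chat_completion(
                    model="gpt-4o-mini",  # placeholder
                    messages=[{"role": "user", "content": "Say hello."}],
                    temperature=0.0,
                )
                for _ in range(50)
            )
        )
        print(f"completed {len(results)} requests")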

@ShubhamMaddhashiya-bidgely

In my case, profiling identified a performance overhead in the AsyncConnectionPool.handle_async_request method, mainly due to inefficient handling of idle or expired connections. The time complexity could be reduced by using a more suitable data structure rather than repeatedly iterating over all of the connections.
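
To illustrate the idea (a generic sketch, not httpcore's actual code): instead of scanning every pooled connection on each request to find expired ones, expirable connections can be kept in a min-heap keyed by expiry time, so cleanup only touches entries that have actually expired.

import heapq
import itertools
import time


class ExpiryIndex:
    """Toy sketch: cleanup touches only expired entries instead of scanning the whole pool."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, object]] = []
        self._tiebreak = itertools.count()  # avoids comparing connection objects on equal expiry

    def add(self, conn: object, keepalive_seconds: float) -> None:
        expiry = time.monotonic() + keepalive_seconds
        heapq.heappush(self._heap, (expiry, next(self._tiebreak), conn))

    def pop_expired(self) -> list[object]:
        now = time.monotonic()
        expired = []
        # Only the front of the heap can hold already-expired connections.
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[2])
        return expired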

[Screenshot, Oct 28 2024: profiling output showing the overhead in AsyncConnectionPool.handle_async_request]

I came across pull requests addressing these issues, but they haven't been merged yet; see the comment thread on encode/httpx#3215.

I tested the optimizations by installing the patches from the all-optimizations branch by @MarkusSintonen:

pip install --upgrade git+https://github.com/MarkusSintonen/httpcore.git@all-optimizations

With these changes, the performance overhead in AsyncConnectionPool.handle_async_request was significantly reduced.

It would be very helpful if the OpenAI SDK supported a drop-in replacement for http_client, allowing us to swap httpx for aiohttp easily.

@bdraco

bdraco commented Nov 14, 2024

If performance is still a concern here, please try aiohttp 3.11.x+ as we have smoothed out some more of the concurrency delays and cancellation races in this version.

https://docs.aiohttp.org/en/stable/changes.html
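
For anyone trying this, the upgrade is a one-liner:

pip install --upgrade "aiohttp>=3.11"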

@rachitchauhan43

@RobertCraigie: I see you closed this, marking it as not planned.

Shouldn't this be a priority item for the OpenAI SDK team, to make the SDK usable for use cases that need to scale? Doesn't this discourage users from using the SDK at scale?

@kamillle

kamillle commented Jan 8, 2025

Same here... :(

I am a developer of an AI character service that provides interactive conversations and uses large language models (LLMs) to generate the characters' responses. The service had been using the openai package at v0.27.4 to make its LLM requests.

After upgrading the openai package to v1.59.4, I noticed a decrease in the server's throughput and an increase in latency to the LLM hosting servers. I suspect that httpx is the cause, as discussed above, and I would like to explore potential solutions.

I look forward to responses from contributors.

@rattrayalex
Collaborator

Sorry about this. We're tracking this primarily at encode/httpx#3215, but it makes sense to leave this issue open, as one way or another this needs to get solved for users of this SDK.
