Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v2] Use httplib's chunked encoding support? #2291

Closed
bluetech opened this issue Jun 21, 2021 · 3 comments · Fixed by #2565
Closed

[v2] Use httplib's chunked encoding support? #2291

bluetech opened this issue Jun 21, 2021 · 3 comments · Fixed by #2565
Milestone

Comments

@bluetech
Copy link
Member

urllib3.HTTPConnection has a method request_chunked which sends the body with chunked encoding. Since Python 3.6, httplib supports this directly, with very similar-looking code. Since v2 now requires Python 3.6, I was wondering if we might want to use this.

I haven't checked if there are any gotchas there, this is more of a placeholder issue since I couldn't find an existing issue discussing this.

@sethmlarson
Copy link
Member

Kind-of related to this, I was wondering if we should instead add a chunked parameter to HTTPConnection.request() and shim .request_chunked() to simply call return self.request(..., chunked=True). Reasoning being that the rest of our API looks like this, seems more natural to have .request(..., chunked=True) everywhere instead of at every level until you get to HTTPConnection and suddenly a shift to two different methods?

@sethmlarson
Copy link
Member

The above change would also bring us closer to the stdlib's implementation but keep things urllib3-flavored, we're already married to the chunked parameter (and I like it better than encode_chunked tbh)

@bluetech
Copy link
Member Author

Kind-of related to this, I was wondering if we should instead add a chunked parameter to HTTPConnection.request()

IMO this is a good idea. Several places in the code do if chunked: request_chunked() else: request(). We might as well do that inside request.


Regarding using httplib, I took a closer look at the differences between urllib3's current request_chunked() and httplib's request(encode_chunked=True).

First there are differences with how the headers are handled and how whether to do chunking is decided:

  • urllib3:

    • If Transfer-Encoding is not set, set Transfer-Encoding: chunked.
    • Do chunked.
  • httplib:

    • If Content-Length is set, not chunked.
    • If Transfer-Encoding is set (to anything), chunked.
    • If Content-Length can be determined, set Content-Length and proceed not chunked.
    • Set Transfer-Encoding: chunked and do chunked.

Next, differences in how the body argument is handled:

  1. If the body (or chunks in an iterable body) is str, httplib encodes with latin-1, urllib3 with utf-8.
  2. If the body is a readable, httplib splits to chunks of blocksize, while urllib3 seems to iterate it (which will split chunks on \n I think? Probably not what we want).
  3. httplib checks for bytes-like body with memoryview(...), urllib3 only accepts bytes.
  4. urllib3 does the final self.send(b"0\r\n\r\n") even if the body is None (rationale: "After the if clause, to always have a closed body"), httplib doesn't.

I'll leave the actual analysis of this to another day.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants