Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse incoming JSON in POST body #377

Closed
Changaco opened this issue Aug 7, 2014 · 26 comments
Closed

Parse incoming JSON in POST body #377

Changaco opened this issue Aug 7, 2014 · 26 comments

Comments

@Changaco
Copy link
Member

Changaco commented Aug 7, 2014

I'm implementing a simplate to receive webhook callbacks which are POST requests with JSON data in the body (and Content-Type: application/json in the headers). Adding support for JSON in Aspen's Body._parse() function could make it slightly easier to write such simplates. What do you think ?

@chadwhitacre
Copy link
Contributor

Sounds good to me.

@pjz
Copy link
Contributor

pjz commented Aug 7, 2014

I... don't understand your requirements. Why is this not just:

if request.line.method == 'POST' and headers.get('Content-Type', None):
    json = json.loads(request.body.raw)

@Changaco
Copy link
Member Author

Changaco commented Aug 7, 2014

@pjz json.loads(request.body.raw) is what I'm doing, but Aspen could do it for me, like it does for application/x-www-form-urlencoded and multipart/form-data.

@pjz
Copy link
Contributor

pjz commented Aug 7, 2014

JSON isn't really web-spec. It's common, but not specified in any part of HTTP anywhere. I'd be against special-casing it.

OTOH, I'd be all about figuring out some way to set up a general-case handler for arbitrary content-types. A way to specify "if the content-type of the POST body is application/json, run it through this handler and store the result in this variable." Hm, probably want that to be site-wide instead of per-endpoint, eh? So it could be an algorithm hook:

def deserialize_POSTed_json(request):
    if request.line.method == 'POST' and headers.get('Content-Type', None) == 'application/json':
        request.json = json.loads(request.body.raw)
    else:
        request.json = None

Put the above into your algorithm stack somewhere in the request-handling sequence and you should be GTG.

@chadwhitacre
Copy link
Contributor

@pjz I believe that application/x-www-form-urlencoded and multipart/form-data are also not specified in HTTP, but rather in HTML, so per your rationale those should also not be special-cased in our request object, eh?

@chadwhitacre
Copy link
Contributor

P.S. I'd be all for implementing body parsing in an algorithm function (that'd be part of "flatten algorithm further" in our roadmap; #357). In that case it seems that we should do the same with the existing special cases. Then the question is what body parsers we ship in our stock algorithm, and I think we should include an application/json body parser along with application/x-www-form-urlencoded and multipart/form-data.

@pjz
Copy link
Contributor

pjz commented Aug 7, 2014

Agreed! I think the request object should probably have a little registry where you can register content-type handlers or something.

@pjz
Copy link
Contributor

pjz commented Aug 7, 2014

Well, there should be a registry somewhere, at any rate. Maybe the request object? Oh, no, it's site wide. Probably the website object.

@Changaco
Copy link
Member Author

Changaco commented Aug 8, 2014

A registry seems like a simple and performant method to me. Something like:

website.body_parsers['application/json'] = json.loads

@pjz
Copy link
Contributor

pjz commented Aug 8, 2014

That sounds okay... but where should the result be put? That's going to require some management to avoid collisions in the website object's main namespace.

@chadwhitacre
Copy link
Contributor

[W]here should the result be put?

What do we do now? Don't we use request.body for both of the cases we currently support?

@pjz
Copy link
Contributor

pjz commented Aug 8, 2014

Ah, hmm. Yeah, that could work, though it looks like it may require some metaclass hacking or something to replicate the current functionality.

@pjz
Copy link
Contributor

pjz commented Aug 15, 2014

Actually we'll have to change the current API a little; instead of request.body.raw we'll have request.body and request.body_raw (though I'm open to other names for the second one - request.raw_body ? request.rawbody ? ).

@Changaco
Copy link
Member Author

I think I prefer raw_body. However, since you're changing the API, now might be a good time to expose the fact that the body is a stream, by giving access to the socket instead of copying it all into a buffer.

@pjz
Copy link
Contributor

pjz commented Aug 18, 2014

Hmm. I like giving access, but what happens if a body_parser doesn't set .raw_body ? Or is that a requirement? Though admittedly more memory hungry, It's certainly easier on any parser to work off of a string instead of a socket (worst case they turn it back into a socket with StringIO). I was thinking that it would basically:

  • set raw_body
  • call appropriate body_parser if any
  • set body to result of parser, or to raw_body if none

@Changaco
Copy link
Member Author

I like giving access, but what happens if a body_parser doesn't set .raw_body ?

Why would a body parser set .raw_body ? It seems to me that by definition .raw_body is what you have before going through any parser.

It's certainly easier on any parser to work off of a string instead of a socket

Not really, the parser just has to call .read() if it wants a bytestring.


Here's what I'm thinking:

  • have the socket accessible as .body_sock or .raw_body or whatever
  • turn .body into a lazy property
  • when .body is accessed look up the appropriate body parser in website.body_parsers and run it, or raise Response(415) if there is no parser for that Content-Type

@pjz
Copy link
Contributor

pjz commented Aug 19, 2014

If the parser reads the socket then the user can't, and vice versa. By pre-emptively reading the socket and storing the result in .raw_body, they can both access the raw data, so both .raw_body and .body can both be accessible to the programmer. I'm not sure why that's useful, but I hate taking away options.

I do like the Response(415) idea, though.

@Changaco
Copy link
Member Author

I'm not sure why that's useful

Me neither, but it is possible to have both, by first reading .raw_body and then using a BytesIO hack to get .body.

pjz added a commit that referenced this issue Aug 19, 2014
request.body is now the result of running the appropriate body_parser
(based on content-type header) over request.raw_body

Inlined the functionality of context.py into the dynamic_resource object
pjz added a commit that referenced this issue Aug 19, 2014
request.body is now the result of running the appropriate body_parser
(based on content-type header) over request.raw_body

Inlined the functionality of context.py into the dynamic_resource object
@pjz
Copy link
Contributor

pjz commented Aug 24, 2014

Like that, @Changaco ?

@Changaco
Copy link
Member Author

#379 isn't exactly what I proposed:

  • .body isn't a lazy property, it's parsed even if the simplate doesn't use it
  • Response(415) isn't raised when the Content-Type is unknown

but aside from that it looks okay.

@pjz
Copy link
Contributor

pjz commented Aug 25, 2014

Should the Response(415) be raised lazily or immediately?

@Changaco
Copy link
Member Author

If .body becomes lazy then I think it makes sense to raise the exception lazily.

@pjz
Copy link
Contributor

pjz commented Aug 25, 2014

okay, fixed in the latest version of #379

@pjz pjz closed this as completed in ab377fc Sep 2, 2014
chadwhitacre added a commit that referenced this issue Sep 2, 2014
Fix #377: Reify body_parsers and make request.body dynamic
@ravitejabadisa
Copy link

HI i am facing an issue whe i have migrated code from my local system to AWS

RawPostDataException at /users/
You cannot access body after reading from request's data stream

It seems i can't access request.method and request.data in views.py simulteneously.
Is this specific to AWS?Do i nedd to do some specific config changes to resolve this issue?

It would be great if you can suggest some work around for this issue

@pjz
Copy link
Contributor

pjz commented Nov 30, 2015

If you meant to post this as an issue, you should open a new one, not revive one that's over a year old. Also, you should include more detail, up to and including simplified code exemplifying your problem.

@Changaco
Copy link
Member Author

A quick web search shows that RawPostDataException is a Django exception.

Changaco pushed a commit that referenced this issue Mar 11, 2016
request.body is now the result of running the appropriate body_parser
(based on content-type header) over request.raw_body

Inlined the functionality of context.py into the dynamic_resource object
Changaco pushed a commit that referenced this issue Mar 11, 2016
Fix #377: Reify body_parsers and make request.body dynamic
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants