Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aiohttp stub records the full url in the recorded data regardless of parameter filtering #517

Closed
vEpiphyte opened this issue Apr 1, 2020 · 2 comments · Fixed by #740
Closed
Labels

Comments

@vEpiphyte
Copy link

vEpiphyte commented Apr 1, 2020

When using the filter_query_parameters argument, the aiohttp stub still records the full url given by the response object for later reconstruction of the yarl response. This is an unexpected behavior and can lead to secrets being leaked. Based on a review of the code, I think the Tornado stub may also have this same issue (but I haven't verified that). This leads us to having to manually scrub filtered values from cassettes to avoid secrets being committed into repositories.

Here is an example test case and recorded cassettes. The query parameter sometoken is intended to be dropped from storage (please ignore the fact that that the test target, httpbin, includes the query parameters in the response body).

import os
import json
import asyncio
import unittest

import vcr
import aiohttp
import requests

URL = 'https://httpbin.org/get?arg1=val1&sometoken=sekrit'
ASSETDIR = './'

FILTER_QUERY_PAREMETERS = ['sometoken']

class FilterQueryParamTest(unittest.TestCase):
    # Helper code, replicated here
    def _getVcrArgs(self):
        kwargs = getattr(self, 'vcr_kwargs', {})
        return kwargs

    def getVcr(self, **kwargs):
        # simplification for the vcr-unittest library.
        fn = '{0}.{1}.yaml'.format(self.__class__.__name__,
                                   self._testMethodName)
        fp = os.path.join(ASSETDIR, fn)
        _kwargs = self._getVcrArgs()
        _kwargs.update(kwargs)
        myvcr = vcr.VCR(**_kwargs)
        cm = myvcr.use_cassette(fp)
        return cm

    def test_requests(self):
        with self.getVcr(filter_query_parameters=FILTER_QUERY_PAREMETERS):
            resp = requests.get(URL)
            d = resp.json()
        self.assertIsInstance(d, dict)

    def test_aiohttp(self):
        async def coro():
            with self.getVcr(filter_query_parameters=FILTER_QUERY_PAREMETERS):
                async with aiohttp.ClientSession() as session:
                    async with session.get(URL) as response:
                        text = await response.text()
            return text

        result = asyncio.run(coro())
        d = json.loads(result)
        self.assertIsInstance(d, dict)

This is the requests output which does not have the filtered value present in any vcrpy constructs:

interactions:
- request:
    body: null
    headers:
      Accept:
      - '*/*'
      Accept-Encoding:
      - gzip, deflate
      Connection:
      - keep-alive
      User-Agent:
      - python-requests/2.23.0
    method: GET
    uri: https://httpbin.org/get?arg1=val1
  response:
    body:
      string: "{\n  \"args\": {\n    \"arg1\": \"val1\", \n    \"sometoken\": \"sekrit\"\n
        \ }, \n  \"headers\": {\n    \"Accept\": \"*/*\", \n    \"Accept-Encoding\":
        \"gzip, deflate\", \n    \"Host\": \"httpbin.org\", \n    \"User-Agent\":
        \"python-requests/2.23.0\", \n    \"X-Amzn-Trace-Id\": \"Root=1-5e849e8a-908aad86267b92fe372fa1da\"\n
        \ }, \n  \"origin\": \"8.8.8.8\", \n  \"url\": \"https://httpbin.org/get?arg1=val1&sometoken=sekrit\"\n}\n"
    headers:
      Access-Control-Allow-Credentials:
      - 'true'
      Access-Control-Allow-Origin:
      - '*'
      Connection:
      - keep-alive
      Content-Length:
      - '385'
      Content-Type:
      - application/json
      Date:
      - Wed, 01 Apr 2020 14:00:42 GMT
      Server:
      - gunicorn/19.9.0
    status:
      code: 200
      message: OK
version: 1

This is the aiohttp response which does include the unfiltered token in the url key of the response:

interactions:
- request:
    body: null
    headers: {}
    method: GET
    uri: https://httpbin.org/get?arg1=val1
  response:
    body:
      string: "{\n  \"args\": {\n    \"arg1\": \"val1\", \n    \"sometoken\": \"sekrit\"\n
        \ }, \n  \"headers\": {\n    \"Accept\": \"*/*\", \n    \"Accept-Encoding\":
        \"gzip, deflate\", \n    \"Host\": \"httpbin.org\", \n    \"User-Agent\":
        \"Python/3.7 aiohttp/3.6.0\", \n    \"X-Amzn-Trace-Id\": \"Root=1-5e849e89-2fdc0c40f507cd205231bb40\"\n
        \ }, \n  \"origin\": \"8.8.8.8\", \n  \"url\": \"https://httpbin.org/get?arg1=val1&sometoken=sekrit\"\n}\n"
    headers:
      Access-Control-Allow-Credentials: 'true'
      Access-Control-Allow-Origin: '*'
      Connection: keep-alive
      Content-Length: '387'
      Content-Type: application/json
      Date: Wed, 01 Apr 2020 14:00:41 GMT
      Server: gunicorn/19.9.0
    status:
      code: 200
      message: OK
    url: https://httpbin.org/get?arg1=val1&sometoken=sekrit
version: 1

Edit:

This behavior was tested against the current master of the vcrpy project, but has also been observed in the latest 4.x release as well.

@DevilXD
Copy link

DevilXD commented Apr 29, 2020

Just wanted to note here that this can be remedied "in the mean time" with something like this:

def filter_response(response):
    response["url"] = ''  # hide the URL
    return response

..., and then making your own VCR instance, passing filter_response to the before_record_response kwarg. This doesn't seem to impact the later response playback in any negative way.

@scop
Copy link
Contributor

scop commented Sep 1, 2022

Here's a more elaborate implementation I've been using that does the actual response["url"] query param filtering:
https://github.com/scop/pytekukko/blob/c43533c8c8ff46f5fc1114e0d19a9cde58c89ae9/tests/test_pytekukko.py#L36-L50
(Nowhere near perfect: it expects replacements, does not do removal, and does not preserve more than one instance of a param, but works for my use case.)

Haven't tried just emptying the whole URL myself, nice if that works, but this one FTR in case someone finds a problem with the emptying approach.

salomvary added a commit to salomvary/vcrpy that referenced this issue Mar 3, 2023
For some unknown reason, the aiohttp stub was recording the unfiltered
*request* URL as part of the response in the cassette. Other stubs do not seem
to do this and I found no other requirement for response.url to be present,
therefore the easy fix was to remove it entirely.

Fixes kevin1024#517.
@hartwork hartwork added the bug label Jul 10, 2023
hartwork added a commit that referenced this issue Jul 19, 2023
…er-for-aiohttp

Fix query param filter for aiohttp (fixes #517)
Harmon758 added a commit to Harmon758/Harmonbot that referenced this issue Jul 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants