Escape redirect URLs in RealCDXExtractorOutput #36

Merged on Dec 17, 2014 (2 commits)

Conversation

gerhardgossen
Contributor

The class does not escape the URLs it gets from the HTTP headers or the HTML meta tags. This makes the resulting CDX files invalid if the redirect URL contains spaces (see e.g. internetarchive/ia-hadoop-tools#4). This commit fixes that by passing the resolved URL through java.net.URI's multi-argument constructor, which escapes the individual parts appropriately.
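
To illustrate the approach, here is a minimal sketch of escaping a URL through the seven-argument `java.net.URI` constructor. It is not the code from this commit: the class name `RedirectUrlEscaper` and the method `escape` are hypothetical, and the sketch assumes the redirect target has already been resolved to an absolute URL string.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper, not part of the project: shows how the
// multi-argument URI constructor quotes illegal characters per component.
final class RedirectUrlEscaper {

    static String escape(String resolvedUrl)
            throws MalformedURLException, URISyntaxException {
        // java.net.URL does not validate characters, so it can split a
        // URL that still contains raw spaces into its components.
        URL url = new URL(resolvedUrl);

        // The multi-argument constructor quotes characters that are
        // illegal in each component (e.g. a space becomes %20).
        URI uri = new URI(
                url.getProtocol(),
                url.getUserInfo(),
                url.getHost(),
                url.getPort(),   // -1 means "no explicit port"
                url.getPath(),
                url.getQuery(),
                url.getRef());

        // toASCIIString() additionally encodes non-ASCII characters.
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws Exception {
        // prints http://example.com/some%20path/page.html?q=a%20b
        System.out.println(escape("http://example.com/some path/page.html?q=a b"));
    }
}
```

One caveat with this technique: the multi-argument constructors always quote the percent character, so a URL that is already partially escaped (e.g. contains `%20`) would come out double-escaped (`%2520`).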

@anjackson
Member

This looks good. Can you also add a note to the CHANGES.md file that summarises the change?

@gerhardgossen
Contributor Author

Updated CHANGES.md

anjackson added a commit that referenced this pull request Dec 17, 2014
Escape redirect URLs in RealCDXExtractorOutput
anjackson merged commit 598c524 into iipc:master on Dec 17, 2014
@anjackson
Member

Thanks, looks great.
