Edit regular expression in charge of removing anchor, simply add 'colon' #72

Open
ABrisset wants to merge 1 commit into taganaka:master from ABrisset:master
@ABrisset

I found that URLs containing anchors like "#sku:123" (i.e. anchors containing a colon) were not cleaned up when passed to the to_absolute method. As a consequence, they were escaped and added to the crawler's queue, which led to 404 errors. This bug is related to the issue I described here.

To fix it, this commit adds a colon to the regular expression used to remove anchors from URLs in the to_absolute method.
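
The effect of the change can be illustrated with a minimal Ruby sketch. The two patterns and the `strip_anchor` helper below are hypothetical stand-ins, written on the assumption that the original character class allowed only letters, digits, underscores, and hyphens; the actual expression in to_absolute may differ.

```ruby
# Hypothetical patterns for illustration only; the real regex in
# to_absolute is not reproduced here.
ANCHOR_WITHOUT_COLON = /#[a-zA-Z0-9_-]*$/
ANCHOR_WITH_COLON    = /#[a-zA-Z0-9_:-]*$/

# Hypothetical helper: removes a trailing anchor that matches the pattern.
def strip_anchor(link, pattern)
  link.to_s.sub(pattern, '')
end

url = 'http://example.com/product#sku:123'

# Without the colon in the character class the pattern cannot reach the
# end of the string, so the anchor is left in place.
before_fix = strip_anchor(url, ANCHOR_WITHOUT_COLON)

# With the colon added, the whole anchor is matched and removed.
after_fix = strip_anchor(url, ANCHOR_WITH_COLON)

puts before_fix
puts after_fix
```

Anchors without a colon, such as "#top", are stripped by both patterns, so the change should only affect anchors that were previously left untouched.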
