ArchiveBot job e3pq9nd3o10nud4gctgm2nnz0 for http://www.stevenholcomb.com/ (viewer) failed to recurse. It only grabbed the homepage, robots.txt, sitemap.xml, and two (broken) URLs in the sitemap.
I tested with a simpler command and reproduced this on one of my pipelines running wpull 2.0.3: `wpull --recursive --level inf --no-verbose --html-parser libxml2-lxml http://www.stevenholcomb.com/`. When using html5lib instead, it recurses correctly.
With commit ec24bba (PR #393), however, I'm unable to reproduce it on another machine (different Python version, libraries, etc.). So this may already be fixed, but it needs further investigation.
The server is sending UTF-16LE-encoded HTML without declaring that encoding in a response header, which might play a role in this.
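As a rough illustration of why an undeclared UTF-16LE body could yield no links, here is a minimal sketch using only the Python standard library. It is not wpull's code, and the actual failure path inside the libxml2-lxml parser has not been confirmed. `sniff_utf16`, `LinkCollector`, and `extract_links` are hypothetical helpers, and the NUL-byte heuristic assumes the markup is mostly ASCII. If the bytes are decoded as UTF-8, every `<` is followed by a NUL byte, so no tags (and no `href`s) are recognized. Sniffing the encoding first recovers the links.

```python
import codecs
from html.parser import HTMLParser
from typing import List, Optional


def sniff_utf16(raw: bytes) -> Optional[str]:
    """Hypothetical heuristic: guess UTF-16 from a BOM or NUL-byte placement."""
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"  # the 'utf-16' codec reads and strips the BOM itself
    sample = raw[:200]
    odd, even = sample[1::2], sample[0::2]
    # Mostly-ASCII UTF-16LE text has NUL in odd byte positions, none in even ones.
    if len(sample) >= 4 and odd.count(0) >= 0.9 * len(odd) and even.count(0) == 0:
        return "utf-16-le"
    return None


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(raw: bytes, fallback: str = "utf-8") -> List[str]:
    """Decode with a sniffed encoding (or the fallback), then collect links."""
    encoding = sniff_utf16(raw) or fallback
    collector = LinkCollector()
    collector.feed(raw.decode(encoding, errors="replace"))
    collector.close()
    return collector.links


# A UTF-16LE page with no BOM and no declared charset (assumed test input).
page = '<html><body><a href="/about/">About</a></body></html>'.encode("utf-16-le")

naive = LinkCollector()
naive.feed(page.decode("utf-8"))  # wrong decoding: '<' is followed by '\x00'
naive.close()

print(naive.links)          # []
print(extract_links(page))  # ['/about/']
```

The NUL-byte check is only a heuristic. A real crawler would also consult a BOM, the `Content-Type` charset, and any `<meta charset>` before guessing.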