
aghiuru/pagefold


pagefold

Download web articles into self-contained markdown folders for offline reading and archiving.

What it does

Given a URL, pagefold fetches the article, extracts its content as markdown, and downloads all of its images, producing a portable folder you can read anywhere.

output/
  my-article-slug/
    my-article-slug.md
    images/
      a1b2c3.jpg
      d4e5f6.png
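The README does not say how the slug used for the folder and file names is derived. Assuming it comes from the article title, a minimal sketch of the naming step might look like this. `slugify` and `output_paths` are hypothetical helpers for illustration, not part of pagefold's API:

```python
import re
import unicodedata
from pathlib import Path


def slugify(title: str, max_length: int = 80) -> str:
    """Hypothetical helper: turn an article title into a filesystem-safe slug."""
    # Decompose accented characters, then drop anything that is not ASCII.
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Lowercase and collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
    # Cap the length without leaving a trailing hyphen; fall back if empty.
    return slug[:max_length].rstrip("-") or "article"


def output_paths(output_dir: str, title: str) -> tuple[Path, Path, Path]:
    """Hypothetical helper: article folder, markdown file, and images folder."""
    slug = slugify(title)
    article_dir = Path(output_dir) / slug
    return article_dir, article_dir / f"{slug}.md", article_dir / "images"
```

For example, `output_paths("output", "My Article Slug")` yields `output/my-article-slug`, `output/my-article-slug/my-article-slug.md`, and `output/my-article-slug/images`, matching the layout above. Stripping accents and punctuation keeps the names portable across filesystems.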

The markdown file includes the title, author, publication date, source URL, and article body with local image references.
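A sketch of how such a file could be assembled from those fields. The header layout (`Author:`, `Published:`, `Source:` lines) and the `ArticleMeta` / `render_markdown` names are assumptions for illustration; pagefold's actual output format may differ:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArticleMeta:
    """Hypothetical container for the metadata the README lists."""
    title: str
    url: str
    author: Optional[str] = None
    date: Optional[str] = None


def render_markdown(meta: ArticleMeta, body: str, image_map: dict[str, str]) -> str:
    """Hypothetical: build the article file with local image references."""
    # Point markdown image links at local copies: ](remote) -> ](images/...).
    for remote, local in image_map.items():
        body = body.replace(f"]({remote})", f"]({local})")
    lines = [f"# {meta.title}", ""]
    # Author and date are optional; many pages do not expose them.
    if meta.author:
        lines.append(f"Author: {meta.author}")
    if meta.date:
        lines.append(f"Published: {meta.date}")
    lines.append(f"Source: {meta.url}")
    lines += ["", body.strip(), ""]
    return "\n".join(lines)
```

Matching on the closing `](url)` pair rather than the bare URL avoids rewriting a URL that merely appears as a prefix of another one.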

Features

  • Extracts clean markdown from articles using trafilatura
  • Downloads all images and rewrites the markdown to reference the local copies (parallel, deduplicated by URL hash)
  • Auto-detects JS-heavy pages and renders them with a headless browser (Playwright)
  • Handles cookie consent dialogs automatically

Installation

Requires Python 3.11+ and pipx.

pipx install git+https://github.com/aghiuru/pagefold.git

Usage

pagefold <URL>
pagefold <URL> --output-dir ~/articles

Flag               Default    Description
--output-dir DIR   ./output   Directory in which to save the article folder
-h, --help                    Show help

Example

pagefold "https://example.com/some-article" --output-dir ~/reading
