pagefold

Download web articles into self-contained markdown folders for offline reading and archiving.

What it does

Given a URL, pagefold fetches the article, extracts the content as markdown, and downloads all images, producing a portable folder you can read anywhere.

output/
  my-article-slug/
    my-article-slug.md
    images/
      a1b2c3.jpg
      d4e5f6.png

The markdown file includes the title, author, publication date, source URL, and article body with local image references.
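As a rough illustration of the layout and the local image references described above, here is a minimal sketch. It is not pagefold's actual code: the function names `article_paths` and `use_local_images`, and the `local_names` mapping from remote URL to saved file name, are assumptions made for this example.

```python
import re
from pathlib import Path

# Matches markdown images such as ![alt](https://host/pic.jpg)
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def article_paths(output_dir: str, slug: str) -> tuple[Path, Path]:
    """Return (markdown_file, images_dir) for the folder layout shown above."""
    folder = Path(output_dir) / slug
    return folder / f"{slug}.md", folder / "images"


def use_local_images(markdown: str, local_names: dict[str, str]) -> str:
    """Point each known remote image URL at its copy under images/."""

    def swap(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        if url not in local_names:
            return match.group(0)  # leave images we did not download untouched
        return f"![{alt}](images/{local_names[url]})"

    return IMAGE_PATTERN.sub(swap, markdown)
```

Because the rewritten references are relative (`images/...`), the article folder can be moved or copied without breaking them.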

Features

Extracts clean markdown from articles using trafilatura
Downloads and embeds all images (parallel, deduplicated by URL hash)
Auto-detects JS-heavy pages and renders them with a headless browser (Playwright)
Handles cookie consent dialogs automatically
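The parallel, hash-deduplicated image download listed above could look roughly like the sketch below. This is an assumption about the general technique, not pagefold's implementation: `hashed_name`, `download_all`, the six-character SHA-256 prefix, and the injected `fetch` callable are all hypothetical choices for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePosixPath
from typing import Callable, Iterable
from urllib.parse import urlparse


def hashed_name(url: str, length: int = 6) -> str:
    """Derive a file name from a hash of the URL, keeping the extension."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:length]
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return digest + suffix


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    workers: int = 8,
) -> dict[str, bytes]:
    """Fetch each distinct URL once, in parallel, keyed by hashed file name."""
    # dict.fromkeys drops repeated URLs while keeping first-seen order
    unique = list(dict.fromkeys(urls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bodies = list(pool.map(fetch, unique))
    return {hashed_name(url): body for url, body in zip(unique, bodies)}
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client; the same URL always maps to the same file name, so an image referenced several times is stored only once.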

Installation

Requires Python 3.11+ and pipx.

pipx install git+https://github.com/aghiuru/pagefold.git

Usage

pagefold <URL>
pagefold <URL> --output-dir ~/articles

| Flag | Default | Description |
|---|---|---|
| `--output-dir DIR` | `./output` | Directory to save the article folder |
| `-h, --help` | | Show help |
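For readers curious how a command line like this is typically wired up, the following is a minimal sketch using Python's standard `argparse`. The README does not say which parser pagefold uses, so `build_parser` and the help strings are assumptions; only the `URL` argument, the `--output-dir` flag, and its `./output` default come from the table above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="pagefold",
        description="Download a web article into a self-contained markdown folder.",
    )
    parser.add_argument("url", metavar="URL", help="Article URL to download")
    parser.add_argument(
        "--output-dir",
        metavar="DIR",
        default="./output",
        help="Directory to save the article folder",
    )
    return parser  # argparse adds -h/--help automatically
```

Calling `build_parser().parse_args(["https://example.com/some-article"])` yields a namespace whose `output_dir` falls back to `./output` when the flag is omitted.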

Example

pagefold "https://example.com/some-article" --output-dir ~/reading
