Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.
Updated Apr 4, 2026 - Rust
Use LLMs to robustly extract web data
Fully automated and hands-free, accurately extracting and understanding web content — powered by machine learning agents.
A distributed topical web crawler based on Scala and Akka
Serverless AI browser agent
Low-Cost Cross-Domain Web Structured Information Extraction using specialized LoRA adapters.
Automatic extraction of local event information from web pages using machine learning
Predicting product recommendation scores using data available on the client's website
A powerful and lightweight web scraping library with LLM extraction capabilities. This library combines web scraping with AI-powered content extraction using either OpenAI or OpenRouter APIs.
Programming assignments for Web Information Extraction and Retrieval, FRI UL, 2021. PA1: a standalone web crawler for .gov.si websites. PA2: approaches to structured web data extraction. PA3: data processing, indexing, and retrieval.
This project is a command-line tool that extracts text from web pages and PDF files, including scanned documents. It supports various extraction methods. This tool is ideal for data scraping, NLP preprocessing, and content analysis.
MarkGrab plugin for Claude Code — web content extraction to LLM-ready markdown
Pinterest data extraction toolkit
Real-time Google Search results
Local-first search tool layer for AI agents, built with FastAPI, SearXNG, and Trafilatura.