web-extraction

Here are 18 public repositories matching this topic...

0xMassi / webclaw

Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.

Updated Apr 4, 2026
Rust

lightfeed / extractor

Use LLMs to robustly extract web data

Updated Apr 4, 2026
TypeScript

platonai / PulsarRPAPro

Fully automated and hands-free, accurately extracting and understanding web content — powered by machine learning agents.

ai web-crawler web-scraping web-extraction rpa mlscraping auto-web-mining

Updated Dec 8, 2025
Kotlin

iamxiatian / octopus_spider

A distributed topic-focused web crawler based on Scala and Akka

crawler akka spider web-extraction scala-spider scala-crawler akka-spider akka-crawler

Updated Sep 2, 2019
Scala

lightfeed / browser-agent

Serverless AI browser agent

automation browser ai aws-lambda serverless scraping crawling web-scraping serverless-framework web-crawling browser-automation ai-agents web-extraction playwright browser-agent

Updated Apr 3, 2026
TypeScript

abdo-Mansour / axetract

Low-Cost Cross-Domain Web Structured Information Extraction using specialized LoRA adapters.

nlp data-mining information-extraction web-scraping html-parsing lora cross-domain ai-agents web-extraction llm structured-data-extraction vllm qwen dom-pruning

Updated Mar 30, 2026
Python

galinaalperovich / Ms-Thesis-CVUT

Automatic extraction of information about local events from a webpage with machine learning

machine-learning information-retrieval information-extraction web-extraction

Updated May 26, 2017
Jupyter Notebook

akshatsinghal92 / Product-recommendation-analysis

Predicting product recommendation score using the data available on the website of the client

python nlp machine-learning word-embeddings regression-models partial-dependence-plot web-extraction selenium-python textblob-sentiment-analysis universal-sentence-encoder seaborn-plots

Updated Nov 13, 2021
Jupyter Notebook

avirathtib / scrapeneatly

A powerful and lightweight web scraping library with LLM extraction capabilities. This library combines web scraping with AI-powered content extraction using either OpenAI or OpenRouter APIs.

open-source scraping web-extraction structured-web-data llms

Updated Feb 24, 2025
Python

Programming assignments for Web Information Extraction and Retrieval, FRI UL, 2021. PA1: a standalone web crawler for .gov.si websites; PA2: approaches to structured web data extraction; PA3: data processing, indexing, and retrieval.

python html regex xpath webcrawler web-extraction webcrawling

Updated Jul 26, 2021
HTML

franciscomvargas / DeUrlCruncher

Get Google URL results from a search query

web-extraction

Updated Feb 26, 2024
Batchfile

gazelle93 / Various-Web-Text-Extraction-Methods

This project is a command-line tool that extracts text from web pages and PDF files, including scanned documents. It supports various extraction methods. This tool is ideal for data scraping, NLP preprocessing, and content analysis.

nlp natural-language-processing text-extraction web-extraction pdf-extraction