Scraper

This repo describes how a scraper should work and provides helper libraries for building one.

Open API Spec (a scraper should implement this spec; the libraries in this repo implement it)
Lib spec

Why is this repo public?

This makes it a lot easier to use this repo as a library in other projects.

Keep that in mind when adding new features: everything in this repo is publicly visible.

Add lib

bun add git+https://bitbucket.org/teamscript/scraper-protocol.git#main

Usable docs

The example below expects the environment variables listed in .env.example to be set.

// Replace the hash with the latest commit hash
import {
	Server,
	Handlers,
	Cv,
} from "scraper-protocol"

const handlers: Handlers = {
	checkCredentials(_server, username, password) {
		return username === "root" && password === "toor"
	},
}

const server = new Server("scraper-slug", handlers, {})

// Start the server. startServer() returns a promise that is deliberately not awaited, because it would block forever
server.startServer()

const loginUsers = server.getUsers(true)
console.log("loginUsers", loginUsers)

// Start scraping here

const cv: Cv = {
	referenceNumber: "test-123",
}
server.sendCv(cv)
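
Because the example depends on environment variables, it can help to check them before starting the server. The snippet below is a minimal sketch and not part of this library: `missingEnv` is a hypothetical helper, and the variable names are placeholders for the ones actually listed in .env.example.

```typescript
// Hypothetical helper (not part of scraper-protocol): returns the names
// from `required` that are missing or empty in `env`.
function missingEnv(
	required: string[],
	env: Record<string, string | undefined>,
): string[] {
	return required.filter((name) => !env[name])
}

// Placeholder names; replace them with the variables from .env.example.
const requiredVars = ["EXAMPLE_SCRAPER_URL", "EXAMPLE_SCRAPER_TOKEN"]

// A plain object stands in for the real environment in this sketch.
const exampleEnv = { EXAMPLE_SCRAPER_URL: "http://localhost:3000" }

const missingVars = missingEnv(requiredVars, exampleEnv)
if (missingVars.length > 0) {
	console.error("Missing environment variables:", missingVars.join(", "))
}
```

In a real script you would pass `process.env` instead of `exampleEnv` and exit early when the returned list is not empty.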

Adding a custom handler

// Replace the hash with the latest commit hash
import {
	Server,
	Handlers,
	Cv,
} from "scraper-protocol"

const handlers: Handlers = {
	checkCredentials(_server, username, password) {
		return username === "root" && password === "toor"
	},
}

// Custom handlers can be added to the server by passing them in the constructor
const server = new Server("scraper-slug", handlers, {
	// Add custom handlers
	customHandlers: [
		{
			method: "GET",
			path: "/hello",
			handler: (_: Request) => {
				return new Response("Hello World")
			},
		},
	],
})

// Custom handlers can also be added after the server has been created
server.addCustomHandler([
	{
		method: "GET",
		path: "/bye",
		handler: (_: Request) => {
			return new Response("Bye World")
		},
	},
])

// Start the server. startServer() returns a promise that is deliberately not awaited, because it would block forever
server.startServer()

const loginUsers = server.getUsers(true)
console.log("loginUsers", loginUsers)

// Start scraping here

const cv: Cv = {
	referenceNumber: "test-123",
}
server.sendCv(cv)
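
To illustrate what a custom handler entry amounts to, the sketch below matches a request against a list of `method`/`path`/`handler` entries using only the standard `Request` and `Response` types. It is an assumption about how such routing could work, not the library's actual implementation: the `CustomHandler` type and the `dispatch` function are hypothetical names introduced for this example.

```typescript
// Hypothetical type mirroring the method/path/handler fields used above.
type CustomHandler = {
	method: string
	path: string
	handler: (req: Request) => Response
}

// Return the response of the first handler whose method and path both
// match the request exactly, or a 404 response when nothing matches.
function dispatch(handlers: CustomHandler[], req: Request): Response {
	const { pathname } = new URL(req.url)
	const match = handlers.find(
		(h) => h.method === req.method && h.path === pathname,
	)
	return match ? match.handler(req) : new Response("Not Found", { status: 404 })
}

const exampleHandlers: CustomHandler[] = [
	{ method: "GET", path: "/hello", handler: () => new Response("Hello World") },
	{ method: "GET", path: "/bye", handler: () => new Response("Bye World") },
]

const found = dispatch(exampleHandlers, new Request("http://localhost/hello"))
const notFound = dispatch(exampleHandlers, new Request("http://localhost/nope"))
console.log(found.status, notFound.status) // logs: 200 404
```

Exact string matching keeps the sketch short; path parameters and wildcards are left out on purpose.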
