

Assumptions

To make the web-crawl application more flexible and to protect the service from resource exhaustion, we impose a limit on the number of URLs that can be crawled in a single request.

The limit caps the number of URLs the application will process for a given crawl. If the user does not specify a value, it defaults to 100. This default acts as a safeguard against the excessive time and resource consumption that crawling a very large number of URLs would cause.

Methods

The public API is documented using OpenAPI.

Crawl

Crawl a URL:

GET /crawl?url={url}

Returns an array of UrlResponse. The optional limit query parameter caps the number of URLs crawled and defaults to 100 (see Examples).
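
For illustration, a successful response might look like the following. The field names are taken from the UrlResponse entity below; the exact JSON casing and the values shown are assumptions, not output captured from the service:

[
  {
    "url": "https://monzo.com",
    "links": [
      "https://monzo.com/about",
      "https://monzo.com/careers"
    ]
  }
]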

Entities

UrlResponse

Attribute   Description
url         The URL that was crawled
links       A list of URLs found at the crawled URL
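
As a minimal sketch, the entity could be modelled in Go as follows. The lower-case JSON keys are an assumption based on the attribute names above:

package crawler

// UrlResponse mirrors the entity described above. The JSON tags
// assume the service emits lower-case keys matching the attribute names.
type UrlResponse struct {
	Url   string   `json:"url"`   // the URL that was crawled
	Links []string `json:"links"` // URLs found on that page
}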

Examples

Crawl

Crawling the page "https://monzo.com":

GET /crawl?url=https://monzo.com

Crawl with a specified limit

Crawling the page "https://monzo.com" with a limit of 10 crawled URLs:

GET /crawl?url=https://monzo.com&limit=10
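
The sketch below shows a small Go client that issues this request and decodes the response into the UrlResponse entity from the Entities section. The base URL http://localhost:8080 is an assumption; substitute the address where the service is actually deployed:

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// UrlResponse mirrors the entity described in the Entities section.
type UrlResponse struct {
	Url   string   `json:"url"`
	Links []string `json:"links"`
}

func main() {
	// Build the crawl request; the host is an assumption for illustration.
	base := "http://localhost:8080/crawl"
	params := url.Values{}
	params.Set("url", "https://monzo.com")
	params.Set("limit", "10")

	resp, err := http.Get(base + "?" + params.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Decode the array of UrlResponse returned by the endpoint.
	var results []UrlResponse
	if err := json.NewDecoder(resp.Body).Decode(&results); err != nil {
		panic(err)
	}
	for _, r := range results {
		fmt.Printf("%s -> %d links\n", r.Url, len(r.Links))
	}
}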
