Crawl

Breadth-first crawl of a website, following same-host links. Each page is extracted in the formats you specify.
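
The traversal described above can be sketched as a plain breadth-first search. This is an illustrative Python version only (the crawler's internals are not documented here); `get_links` stands in for fetching a page and extracting its links:

```python
from collections import deque
from urllib.parse import urlparse

def bfs_crawl(start_url, get_links, limit=10, max_depth=2):
    """Breadth-first traversal from start_url, following same-host links.

    `get_links(url)` is a stand-in for fetching a page and extracting
    the URLs it links to.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = []
    while queue and len(pages) < limit:
        url, depth = queue.popleft()
        pages.append(url)
        if depth >= max_depth:
            continue  # don't follow links past max_depth
        for link in get_links(url):
            # Same-host links only; skip anything already queued or visited.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages

# Toy link graph standing in for a real site.
site = {
    "https://docs.example.com": ["https://docs.example.com/a", "https://other.com/x"],
    "https://docs.example.com/a": ["https://docs.example.com/b"],
    "https://docs.example.com/b": [],
}
order = bfs_crawl("https://docs.example.com", lambda u: site.get(u, []))
```

Note that the off-host link (`https://other.com/x`) is never visited, and pages are reached shallowest-first, which is why `limit` tends to keep the crawl close to the starting URL.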

Endpoint

POST /v1/crawl
curl -X POST https://api.browsr.dev/v1/crawl \
  -H "Authorization: Bearer $BROWSR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "limit": 20,
    "max_depth": 3,
    "formats": ["markdown"]
  }'
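
The same request can be sketched in Python. This assumes the widely used `requests` library; the `post` call is commented out so the snippet stays dependency-free, and the payload mirrors the curl example above:

```python
import json
import os

payload = {
    "url": "https://docs.example.com",
    "limit": 20,
    "max_depth": 3,
    "formats": ["markdown"],
}
headers = {
    "Authorization": f"Bearer {os.environ.get('BROWSR_API_KEY', '')}",
    "Content-Type": "application/json",
}
body = json.dumps(payload)

# With the `requests` library installed:
# resp = requests.post("https://api.browsr.dev/v1/crawl",
#                      headers=headers, data=body)
# result = resp.json()
```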

Request

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `url` | string | required | Starting URL |
| `limit` | number | `10` | Max pages (max: 100) |
| `max_depth` | number | `2` | Max link depth |
| `formats` | string[] | `["markdown"]` | Output formats per page |
| `wait_for` | number | `0` | Milliseconds to wait per page |
| `include_paths` | string[] | `[]` | Glob patterns to include |
| `exclude_paths` | string[] | `[]` | Glob patterns to exclude |
| `only_main_content` | boolean | `true` | Strip nav/header/footer |
| `json_options` | object | | JSON extraction options (applied to every page) |

Path filtering

curl -X POST https://api.browsr.dev/v1/crawl \
  -H "Authorization: Bearer $BROWSR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "limit": 50,
    "include_paths": ["/docs/**", "/api/**"],
    "exclude_paths": ["/blog/**"],
    "formats": ["markdown"]
  }'
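
The filtering logic can be approximated in Python with `fnmatch`, whose `*` matches across `/` and therefore behaves like the `**` patterns above. This is illustrative only; the API's exact glob dialect is not specified here:

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def path_allowed(url, include_paths=(), exclude_paths=()):
    """Check a URL's path against include/exclude glob patterns.

    A URL passes if it matches at least one include pattern (or the
    include list is empty) and matches no exclude pattern.
    """
    path = urlparse(url).path
    if include_paths and not any(fnmatch(path, p) for p in include_paths):
        return False
    return not any(fnmatch(path, p) for p in exclude_paths)

include = ["/docs/**", "/api/**"]
exclude = ["/blog/**"]
path_allowed("https://docs.example.com/docs/intro", include, exclude)   # allowed
path_allowed("https://docs.example.com/blog/post-1", include, exclude)  # excluded
```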

Response

{
  "success": true,
  "total": 15,
  "completed": 15,
  "data": [
    {
      "markdown": "# Getting Started\n\n...",
      "metadata": {
        "title": "Getting Started",
        "sourceURL": "https://docs.example.com/getting-started",
        "status_code": 200
      }
    }
  ]
}
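
`data` contains one entry per crawled page, keyed by the formats you requested. A short Python sketch of walking the response (the sample above is inlined as a literal for illustration):

```python
# The sample response from above, inlined as a Python dict.
response = {
    "success": True,
    "total": 15,
    "completed": 15,
    "data": [
        {
            "markdown": "# Getting Started\n\n...",
            "metadata": {
                "title": "Getting Started",
                "sourceURL": "https://docs.example.com/getting-started",
                "status_code": 200,
            },
        }
    ],
}

# Map each successfully fetched page's source URL to its markdown.
pages = {
    page["metadata"]["sourceURL"]: page["markdown"]
    for page in response["data"]
    if page["metadata"]["status_code"] == 200
}
```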