scrape
Clean page content from any URL
Markdown, HTML, and metadata from any public page — including sites protected by DataDome, Cloudflare, Akamai, and PerimeterX. One endpoint, one request shape, no browser to run.
curl -X POST https://api.bytekit.com/v1/scrape \
  -H "Authorization: Bearer $BYTEKIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://stg.bytekit.com", "formats": ["markdown"] }'
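For callers outside a shell, you can send the same request with Python's standard library. The sketch below mirrors the curl command field for field. It assumes the endpoint replies with JSON, which this page does not state. The helper names `build_scrape_request` and `scrape` are illustrative and are not part of any Bytekit SDK.

```python
import json
import os
import urllib.request

SCRAPE_ENDPOINT = "https://api.bytekit.com/v1/scrape"


def build_scrape_request(url: str, formats: list[str], api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl example: bearer auth plus a JSON body."""
    body = json.dumps({"url": url, "formats": formats}).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def scrape(url: str, formats: list[str]) -> dict:
    """Send the request using the key from BYTEKIT_API_KEY and decode the reply.

    Assumption: the reply body is JSON; its schema is not documented on this page.
    """
    req = build_scrape_request(url, formats, os.environ["BYTEKIT_API_KEY"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

Building the request separately from sending it lets you inspect or test the exact request shape without a network call.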
You need the contents of a page you do not control. A docs page for an agent. A pricing page for a monitor. A help center for RAG. A product page for a tracking job.
You do not want to run a browser, juggle proxies, or fork your integration when the next page is harder than the last.
what it returns
Three formats. Pick what your pipeline needs.
| format | what you get |
|---|---|
| markdown | LLM-ready. No nav, no chrome, no scripts. |
| html | Raw or cleaned. Keep the structure when you need it. |
| metadata | Title, description, status code, final URL, byte count. |
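Every format rides on the same request, so you choose formats by what you put in the `formats` array. The sketch below assumes `"html"` and `"metadata"` are accepted there in the same way `"markdown"` is in the example request. The `scrape_body` helper is hypothetical.

```python
import json

# Format names are taken from the table above. That "html" and "metadata" are
# valid values of "formats" (like "markdown") is an assumption.
KNOWN_FORMATS = ("markdown", "html", "metadata")


def scrape_body(url: str, *formats: str) -> str:
    """Return a JSON body for /v1/scrape, rejecting unknown format names early."""
    if not formats:
        raise ValueError("request at least one format")
    unknown = [f for f in formats if f not in KNOWN_FORMATS]
    if unknown:
        raise ValueError(f"unknown format(s): {', '.join(unknown)}")
    # Drop duplicates while keeping the caller's order.
    ordered = list(dict.fromkeys(formats))
    return json.dumps({"url": url, "formats": ordered})
```

A RAG pipeline might call `scrape_body(url, "markdown", "metadata")` to get the text together with its title and final URL.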
protected sites are included
Stealth fingerprints, mobile and residential proxies in 100+ countries, and CAPTCHA solving are part of every capture. Not a tier upgrade. Not an add-on. Same endpoint, same shape, same price per byte.
Works on sites protected by DataDome, Cloudflare, Akamai, PerimeterX, and the rest. If a public page loads in a real browser, /scrape returns it.
options
Browser-grade controls when the page needs them.
| option | what it does |
|---|---|
| wait_for_selector | Block until the element renders |
| country | Request from 100+ countries |
| block_resources | Skip images and fonts when the text is the data |
| cookies / headers | Pass through the cookies and headers the target needs |
| actions | Click, type, scroll before capture |
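The options presumably travel in the same JSON body as `url` and `formats`. The sketch below assembles them under several assumptions. Each option in the table is treated as a top-level key with the name shown, and `cookies` and `headers` are treated as two separate keys. The value shapes noted in the comments are also assumed, because this page does not document them. In particular, the action objects (`type`, `selector`, `to`) are invented for illustration.

```python
import json
from typing import Any, Optional


def scrape_body_with_options(
    url: str,
    formats: list[str],
    *,
    wait_for_selector: Optional[str] = None,    # assumed: a CSS selector string
    country: Optional[str] = None,              # assumed: a country code such as "de"
    block_resources: Optional[bool] = None,     # assumed: a boolean toggle
    cookies: Optional[dict[str, str]] = None,   # assumed: name -> value
    headers: Optional[dict[str, str]] = None,   # assumed: name -> value
    actions: Optional[list[dict[str, Any]]] = None,  # assumed: ordered action objects
) -> dict[str, Any]:
    """Build a hypothetical /scrape body that includes only the options actually set."""
    body: dict[str, Any] = {"url": url, "formats": formats}
    options = {
        "wait_for_selector": wait_for_selector,
        "country": country,
        "block_resources": block_resources,
        "cookies": cookies,
        "headers": headers,
        "actions": actions,
    }
    # Omit unset options so a plain request keeps the minimal url + formats shape.
    body.update({k: v for k, v in options.items() if v is not None})
    return body


# Hypothetical target: a pricing page that renders client-side behind a "Load more" button.
body = scrape_body_with_options(
    "https://example.com/pricing",
    ["markdown"],
    country="de",
    block_resources=True,
    wait_for_selector="#pricing-table",
    actions=[
        {"type": "click", "selector": "button.load-more"},  # invented action shape
        {"type": "scroll", "to": "bottom"},                  # invented action shape
    ],
)
print(json.dumps(body, indent=2))
```

Leaving unset options out of the body keeps simple requests identical to the one-line example. You only add a control when a page needs it.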
cost
Pay for bytes returned, not URLs requested.
| case | what it means for billing |
|---|---|
| 75 KB markdown page | about 0.000075 GB |
| 1 GB | roughly 13,000 scrapes at that size |
| Cache hit | half cost |
| Zero-byte failure | $0 |
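The table's arithmetic uses decimal units: 1 GB is 10^9 bytes, so a 75 KB page is 0.000075 GB. The sketch below estimates cost under those rules. The per-GB price is a parameter because this page does not state one, and the function names are hypothetical.

```python
def gigabytes(num_bytes: int) -> float:
    """Decimal gigabytes (1 GB = 10**9 bytes), matching 75 KB -> 0.000075 GB."""
    return num_bytes / 10**9


def scrape_cost(num_bytes: int, price_per_gb: float, cache_hit: bool = False) -> float:
    """Estimated charge for one scrape under the billing rules in the table.

    price_per_gb is a placeholder: this page does not state the rate.
    """
    if num_bytes == 0:
        return 0.0  # zero-byte failures are free
    cost = gigabytes(num_bytes) * price_per_gb
    return cost / 2 if cache_hit else cost  # cache hits bill at half cost


def scrapes_per_gb(page_bytes: int) -> int:
    """How many whole scrapes of a given size fit into 1 GB of returned bytes."""
    return 10**9 // page_bytes


page = 75_000  # a 75 KB markdown page
print(gigabytes(page))       # 7.5e-05
print(scrapes_per_gb(page))  # 13333
```

The exact figure of 13,333 scrapes is where the table's "roughly 13,000" comes from.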