firecrawl-automation_skill

This skill lets you run Firecrawl web crawling and extraction from Claude Code to scrape pages, crawl sites, and map website structures.

Language: Python
GitHub Stars: 35.4k
Bundled Files: 1
Catalog Refreshed: 2 months ago
First Indexed: 4 months ago

Installation

The install command shown here uses veilstrat where the catalogue uses aiagentskills.

npx veilstrat add skill composiohq/awesome-claude-skills --skill firecrawl-automation

SKILL.md (7.7 KB)

Overview

This skill automates web crawling and structured data extraction using the Composio Firecrawl integration. It lets you scrape single pages, crawl entire sites, batch-process URL lists, extract JSON with AI-guided schemas, and map website structures from a terminal or agent workflow. The focus is on efficient, configurable crawls that minimize wasted credits and surface clean, structured outputs.

How this skill works

The skill exposes Firecrawl toolkit actions (scrape, crawl, extract, batch scrape, map, and job management) as callable tools with clear parameters. You provide URLs, formats, prompts or JSON schemas, and optional browser actions; the agent starts async jobs and returns job IDs so you can poll status, retrieve results, or cancel runs. It includes options for main-content extraction, concurrency, path filters, sitemap handling, and JS interaction for dynamic pages.
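The start-then-poll flow described above can be sketched as a small loop. This is a minimal illustration, not the integration's actual client: `poll_job`, the `get_status` callable, and the terminal status names are all assumptions standing in for whatever status tool your agent exposes.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal status names; the real integration may use different ones.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_job(
    job_id: str,
    get_status: Callable[[str], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_status(job_id) until the job reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = get_status(job_id)
        if result.get("status") in TERMINAL_STATES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {result.get('status')!r} after {timeout_s}s"
            )
        sleep(interval_s)
```

Passing `get_status` and `sleep` as parameters keeps the loop independent of any particular client and makes it easy to test without real waiting. On a timeout, the caller can decide whether to cancel the job.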

When to use it

Scraping a single page to get clean markdown, HTML, or screenshots.
Crawling a docs site or section to build a site index or collect pages for analysis.
Extracting structured JSON from pricing, product pages, or directories using a prompt or JSON Schema.
Batch-scraping many URLs at once with concurrency and ad-blocking to save time.
Mapping all URLs on a domain to plan crawl scopes, audits, or sitemap generation.

Best practices

Scope crawls narrowly: use include/exclude path regex and sitemap options to avoid broad homepage crawls.
Test on small URL sets and freeze your extraction schema before scaling to reduce noisy results.
Prefer batch scraping over many individual scrapes and implement exponential backoff on 429/5xx responses.
Use waitFor or scripted browser actions for JS-heavy pages to capture dynamic content reliably.
Poll async job endpoints (crawl/extract GET) and cancel long-running jobs to avoid credit waste.
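As a rough sketch of the exponential backoff recommended above, the helper below retries a call while it returns 429 or a 5xx status. The `send` callable, the function names, and the default delays are hypothetical; plug in whatever actually issues the request in your workflow.

```python
import random
import time
from typing import Callable, Tuple


def is_retryable(status_code: int) -> bool:
    """429 (rate limited) and 5xx (server error) are worth retrying."""
    return status_code == 429 or 500 <= status_code < 600


def call_with_backoff(
    send: Callable[[], Tuple[int, object]],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, object]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status_code, body = send()
        last_attempt = attempt == max_attempts - 1
        if not is_retryable(status_code) or last_attempt:
            return status_code, body
        # Double the delay on each retry, capped at max_delay_s.
        delay = min(max_delay_s, base_delay_s * 2 ** attempt)
        # Jitter spreads out retries from concurrent workers.
        sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

With the defaults, the waits grow roughly as 1 s, 2 s, 4 s, 8 s plus jitter. If every attempt is retryable, the last response is returned so the caller can inspect it.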

Example use cases

Scrape a product pricing page as markdown and extract tiered pricing into JSON for comparison.
Crawl the docs section of a site (limit 50 pages) to generate an internal search index or knowledge base.
Batch-scrape 100 product pages concurrently with ad blocking and save markdown outputs for content auditing.
Extract company names, features, and pricing from a set of competitor pages using a defined JSON Schema.
Map all API reference URLs on docs.example.com to plan a targeted documentation migration.
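For the pricing extraction cases above, a JSON Schema might look like the following. The field names (`company`, `tiers`, `monthly_price_usd`, and so on) are purely illustrative assumptions, and `missing_required` is a hypothetical helper for sanity-checking results on a small test set before scaling up.

```python
from typing import Any, Dict, List

# Hypothetical schema for tiered pricing; adjust field names to the target pages.
PRICING_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "tiers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "monthly_price_usd": {"type": "number"},
                    "features": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name", "monthly_price_usd"],
            },
        },
    },
    "required": ["company", "tiers"],
}


def missing_required(record: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return top-level required keys absent from an extracted record."""
    return [key for key in schema.get("required", []) if key not in record]
```

The helper only checks top-level keys; a full validator would be needed to verify nested tiers and types.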

FAQ

How do I avoid rate limit errors?

Use FIRECRAWL_BATCH_SCRAPE instead of many individual scrapes, limit concurrency, add exponential backoff on 429/5xx responses, and keep extract batches small (about 10 URLs per job).
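Splitting a URL list into jobs of about 10 can be done with a short helper. `chunk_urls` is a hypothetical name, and the default of 10 simply mirrors the figure given above.

```python
from typing import List, Sequence


def chunk_urls(urls: Sequence[str], size: int = 10) -> List[List[str]]:
    """Split urls into consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(urls[i:i + size]) for i in range(0, len(urls), size)]
```

Each returned list can then be submitted as its own job, so one failure or rate limit response affects only a small batch.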

What if pages are JS-heavy and content is missing?

Enable waitFor (1000–5000ms) or use scrape actions (click, scroll, write) to trigger rendering before capture; consider full browser actions for complex interactions.
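A scrape request for a JS-heavy page might be assembled along these lines. This is a sketch under assumptions: `build_scrape_args` is a hypothetical helper, and the argument names (`url`, `formats`, `waitFor`, `actions`) and the shape of each action should be checked against the tool's actual schema before use.

```python
from typing import Any, Dict, List, Optional

MIN_WAIT_MS, MAX_WAIT_MS = 1000, 5000  # range suggested in the text above


def build_scrape_args(
    url: str,
    wait_ms: int = 2000,
    actions: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build a scrape argument dict, clamping the wait to the suggested range."""
    wait_ms = max(MIN_WAIT_MS, min(MAX_WAIT_MS, wait_ms))
    args: Dict[str, Any] = {"url": url, "formats": ["markdown"], "waitFor": wait_ms}
    if actions:
        # Assumed action shape, e.g. {"type": "click", "selector": "#load-more"}.
        args["actions"] = list(actions)
    return args
```

For example, `build_scrape_args("https://example.com/pricing", wait_ms=3000, actions=[{"type": "scroll", "direction": "down"}])` would wait 3000 ms and then scroll before capture, assuming the tool accepts that action format.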