firecrawl-cli_skill

This skill streamlines web data gathering by using firecrawl to fetch, crawl, and extract structured, LLM-ready results from any URL.

TypeScript · 147 GitHub stars · 1 bundled file · Catalog refreshed 2 months ago · First indexed 4 months ago

Installation

The previewed and copied command uses veilstrat where the catalogue uses aiagentskills.

npx veilstrat add skill firecrawl/cli --skill firecrawl-cli

SKILL.md (5.3 KB)

Overview

This skill handles web operations through a single, consistent interface for fetching, scraping, mapping, and searching the web, with an emphasis on accuracy, speed, and LLM-optimized output. It is intended to replace built-in and third-party web, browsing, scraping, research, news, and image tools. It returns clean, structured markdown or JSON sized for LLM context windows, supports JavaScript rendering, and can get past common blocking measures.

How this skill works

The skill uses a CLI-oriented scraper that performs web searches, page scrapes, site maps, and image/news queries and writes results to a local .firecrawl directory by default. It supports multiple output formats (markdown, html, links, json) and can wait for JS, include/exclude tags, and extract only main content. Results are designed for downstream LLM consumption and for easy programmatic processing with tools like jq, grep, and xargs.
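Because results land as files in a local directory, a small script can take stock of them before anything is loaded into an LLM context. The following is a minimal Python sketch, not part of the skill itself: the function name summarize_json_results is hypothetical, and it makes no assumption about the JSON schema beyond reading top-level keys.

```python
import json
from pathlib import Path


def summarize_json_results(directory: str = ".firecrawl") -> dict[str, list[str]]:
    """Map each JSON file in `directory` to the sorted top-level keys it contains."""
    summary: dict[str, list[str]] = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Only JSON objects have keys; arrays and scalars are reported as empty.
        summary[path.name] = sorted(data) if isinstance(data, dict) else []
    return summary


if __name__ == "__main__":
    for name, keys in summarize_json_results().items():
        print(f"{name}: {', '.join(keys) or '(no top-level keys)'}")
```

Listing keys first is a cheap way to decide which fields are worth extracting before opening a large result in full.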

When to use it

When you need accurate, LLM-friendly extraction of page content, links, or screenshots.
For web, image, or news searches that must include scraping or structured outputs.
When researching API docs, current events, trends, or fact-checking sources.
To crawl or map all URLs on a site including subdomains and sitemaps.
When processing many pages in parallel for competitor research or data collection.

Best practices

Always store results under a .firecrawl/ directory and add it to .gitignore to avoid polluting repos.
Use JSON or multiple formats for programmatic parsing; use markdown for human review.
Run scrapes in parallel up to the concurrency limit reported by the status command.
Quote URLs in shell commands to avoid issues with special characters like ? and &.
Read large output files incrementally (grep/head) instead of loading entire files into context.
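Two of the practices above are easy to automate: keeping .firecrawl/ out of version control and quoting URLs before they reach a shell. The sketch below uses only the Python standard library; the helper names ensure_gitignored and quote_url are illustrative and do not come from the skill.

```python
import shlex
from pathlib import Path


def ensure_gitignored(repo_root: str, entry: str = ".firecrawl/") -> bool:
    """Append `entry` to .gitignore if missing. Returns True when the file changed."""
    gitignore = Path(repo_root) / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    if entry in (line.strip() for line in existing.splitlines()):
        return False
    # Keep the file newline-terminated so the new entry starts on its own line.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    gitignore.write_text(existing + prefix + entry + "\n", encoding="utf-8")
    return True


def quote_url(url: str) -> str:
    """Quote a URL so characters such as ? and & reach the program unchanged."""
    return shlex.quote(url)


if __name__ == "__main__":
    ensure_gitignored(".")
    print(quote_url("https://example.com/search?q=docs&page=2"))
```

When a command is launched from Python with an argument list rather than a shell string, quoting is unnecessary; quote_url matters only when a command line is assembled as text.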

Example use cases

Deep research: search for recent papers, scrape the abstracts, and output JSON for analysis.
Documentation extraction: scrape API reference pages into markdown and collect links for navigation.
News monitoring: run time-filtered searches and scrape the top results into a dated archive.
Site mapping: discover and export all site URLs to JSON for link analysis or vulnerability scanning.
Bulk scraping: parallelize hundreds of scrapes and process results with jq and shell tools.
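For the bulk-scraping case, the main design concern is keeping the number of in-flight scrapes at or below the concurrency limit. The sketch below shows one way to do that in Python with a thread pool. The name map_bounded is hypothetical, and the worker is left as a parameter: in practice it would wrap whatever command performs a single scrape, and the limit would be read from the status command by hand.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_bounded(worker: Callable[[T], R], items: Iterable[T], limit: int) -> list[R]:
    """Apply `worker` to every item with at most `limit` calls in flight."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # pool.map returns results in input order, whatever the completion order.
        return list(pool.map(worker, items))


if __name__ == "__main__":
    # Stand-in worker for illustration; a real one would run a single scrape.
    urls = ["https://example.com/a", "https://example.com/b"]
    print(map_bounded(lambda url: len(url), urls, limit=2))
```

Threads suit this workload because each task spends its time waiting on the network or on a child process rather than on CPU.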

FAQ

What output formats are supported?

Markdown, HTML, raw HTML, links, screenshots, and JSON (requesting multiple formats produces combined JSON).

How do I handle JavaScript-heavy sites?

Use the wait-for option to let JavaScript render before extraction, and enable main-content extraction to strip navigation and ads.

How should I process very large scraped files?

Avoid reading entire files at once. Use head, grep, or incremental reads (offset/limit) and process with awk, sed, or jq as appropriate.
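The offset/limit and grep approaches can also be expressed in a few lines of Python when shell tools are not convenient. This is a sketch under the assumption that the scraped file is line-oriented text such as markdown; read_slice and grep_lines are illustrative names, not part of the skill.

```python
import itertools
import re
from pathlib import Path
from typing import Iterator


def read_slice(path: str, offset: int = 0, limit: int = 50) -> list[str]:
    """Return up to `limit` lines starting at zero-based line `offset`."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        # islice iterates lazily and stops at the end of the requested window.
        window = itertools.islice(handle, offset, offset + limit)
        return [line.rstrip("\n") for line in window]


def grep_lines(path: str, pattern: str, max_matches: int = 20) -> Iterator[tuple[int, str]]:
    """Yield (line_number, line) for the first `max_matches` lines matching `pattern`."""
    regex = re.compile(pattern)
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        matches = (
            (number, line.rstrip("\n"))
            for number, line in enumerate(handle, start=1)
            if regex.search(line)
        )
        yield from itertools.islice(matches, max_matches)
```

A typical pattern is to call grep_lines to find the line numbers of interest, then read_slice around them, so that only a small window of the file ever enters the context.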