Firecrawl


Firecrawl automates web crawling and data extraction, enabling organizations to gather content, index sites, and gain insights from online sources at scale.

AI Web Scraping Developer Tools

Actions

Triggers

Authentication

—

Managed OAuth

Technical information: the parameter, schema, and trigger details on this page are intended for integration teams. If you only need to know whether your favorite tool is available, the list of actions is enough.

Available actions (7)

Each action is an operation that the agent can run against this connector.

Cancel a crawl job (FIRECRAWL_CANCEL_A_CRAWL_JOB)

Cancels an active or queued web crawl job using its id; attempting to cancel completed, failed, or previously canceled jobs will not change their state.

Input parameters

id (string, required)
The unique identifier (UUID) of the crawl job to be canceled.

Output parameters

data (object, required)
error (string)
Error message, if any occurred during execution of the action.
successful (boolean, required)
Whether the action execution was successful.
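
As a minimal sketch, the `id` input could be checked locally before the action is invoked. The helper name `build_cancel_args` is hypothetical and not part of the connector; only the `id` field and its UUID format come from the parameter list above.

```python
import uuid


def build_cancel_args(job_id: str) -> dict:
    """Build the input payload for FIRECRAWL_CANCEL_A_CRAWL_JOB.

    Hypothetical helper: it only checks that job_id parses as a UUID,
    the format named in the parameter description.
    """
    # uuid.UUID raises ValueError when the string is not a valid UUID.
    parsed = uuid.UUID(job_id)
    return {"id": str(parsed)}


args = build_cancel_args("123e4567-e89b-12d3-a456-426614174000")
print(args)
```

Failing early on a malformed identifier avoids spending a call on a job that cannot exist.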

Start a web crawl (FIRECRAWL_CRAWL)

Initiates a Firecrawl web crawl from a given URL, applying filtering and content extraction rules, and polls until the job is complete. Ensure the URL is accessible and that any regex patterns for paths are valid.

Input parameters

url (string, required)
The base URL to start crawling from. This is the initial entry point for the web crawler.
delay (integer)
Delay in milliseconds between requests, to avoid overwhelming the server.
limit (integer)
Maximum number of pages to crawl. The crawl stops once this limit is reached. Default is 10.
webhook (string)
An optional webhook URL to receive real-time updates on the crawl job. Events include crawl start (`crawl.started`), page crawled (`crawl.page`), and crawl completion (`crawl.completed` or `crawl.failed`). The payload structure matches the `/scrape` endpoint response.
maxDepth (integer)
Maximum depth of subpages to crawl relative to the base URL. A depth of 0 crawls only the base URL, 1 crawls the base URL and its direct links, etc. Default is 2.
excludePaths (string[])
A list of regular expression (regex) patterns for URL paths to exclude from the crawl. URLs whose paths match any of these patterns are ignored. For example, `"blog/archive/.*"` would exclude all paths under `/blog/archive/`.
includePaths (string[])
A list of regular expression (regex) patterns for URL paths to include in the crawl. Only URLs whose paths match one of these patterns are processed. For example, `"products/featured/.*"` would only include paths under `/products/featured/`.
ignoreSitemap (boolean)
If true, the crawler ignores any sitemap.xml found on the website. Defaults to true.
maxDiscoveryDepth (integer)
Maximum depth for discovering new links, separate from the crawling depth.
allowBackwardLinks (boolean)
If true, allows the crawler to navigate to pages that were linked from pages already visited (i.e., navigate 'backwards'). Defaults to false.
allowExternalLinks (boolean)
If true, allows the crawler to follow links that lead to external websites (different domains). Defaults to false.
scrapeOptions_proxy (string)
Proxy configuration for requests.
scrapeOptions_maxAge (integer)
Maximum age in seconds for cached content. Content older than this is re-scraped.
scrapeOptions_mobile (boolean)
If true, emulates a mobile device when scraping.
ignoreQueryParameters (boolean)
If true, ignores query parameters when determining whether a URL has been visited.
scrapeOptions_actions (object[])
List of actions to perform on each page before scraping (e.g., clicking buttons, waiting).
scrapeOptions_formats (string[])
Specifies the desired output formats for the scraped content from each page. Default is `["markdown"]`. If `json` is one of the formats, `scrapeOptions_jsonOptions` is required.
scrapeOptions_headers (object)
Custom HTTP headers to send with each request.
scrapeOptions_timeout (integer)
Timeout in milliseconds for each page request. Default is 30000 ms (30 seconds).
scrapeOptions_waitFor (integer)
The duration in milliseconds to wait for page JavaScript to execute and content to load before scraping. Useful for pages with dynamically loaded content. Default is 123 ms.
scrapeOptions_blockAds (boolean)
If true, blocks advertisements during scraping.
scrapeOptions_location (object)
Geolocation settings for the scraper.
scrapeOptions_parsePDF (boolean)
If true, attempts to parse PDF files encountered during crawling.
scrapeOptions_excludeTags (string[])
A list of HTML tags to exclude from the scraped output. Content within these tags (and their children) is removed before processing.
scrapeOptions_includeTags (string[])
A list of HTML tags to specifically include in the scraped output. Only content within these tags is processed. If empty or null, all relevant content is considered based on other options.
scrapeOptions_jsonOptions (object)
Options for JSON format extraction, including schema and prompts.
scrapeOptions_storeInCache (boolean)
If true, stores scraped content in the cache for future use.
scrapeOptions_onlyMainContent (boolean)
If true, attempts to extract only the main content of each page, excluding common elements like headers, navigation bars, and footers. Default is true.
scrapeOptions_removeBase64Images (boolean)
If true, removes base64-encoded images from the scraped content.
scrapeOptions_skipTlsVerification (boolean)
If true, skips TLS certificate verification.
scrapeOptions_changeTrackingOptions (object)
Options for tracking changes between crawls.

Output parameters

data (object, required)
A dictionary containing the crawled data. This typically includes a job ID, status, and an array of page data if the crawl is complete and successful. The structure can vary based on the crawl outcome (e.g., success, failure, ongoing).
error (string)
Error message, if any occurred during execution of the action.
successful (boolean, required)
Whether the action execution was successful.
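
The path filters are the easiest inputs to get wrong, so a small local check can help. The sketch below is illustrative only: `path_is_crawled` is a hypothetical helper, and it assumes each pattern is matched from the start of the URL path with the leading slash removed, which may not be exactly how the crawler applies them. The parameter names in `crawl_args` are taken from the list above.

```python
import re
from urllib.parse import urlparse


def path_is_crawled(url, include_paths=None, exclude_paths=None):
    """Approximate how includePaths / excludePaths could filter a URL.

    Assumption: patterns are matched from the start of the path, with
    the leading slash stripped. The real crawler may behave differently.
    """
    path = urlparse(url).path.lstrip("/")
    # re.compile raises re.error on an invalid pattern, so malformed
    # regexes are caught before the crawl is started.
    includes = [re.compile(p) for p in (include_paths or [])]
    excludes = [re.compile(p) for p in (exclude_paths or [])]
    if any(rx.match(path) for rx in excludes):
        return False
    if includes and not any(rx.match(path) for rx in includes):
        return False
    return True


# Example input payload using the documented defaults for limit,
# maxDepth, and scrapeOptions_formats.
crawl_args = {
    "url": "https://example.com",
    "limit": 10,
    "maxDepth": 2,
    "excludePaths": ["blog/archive/.*"],
    "scrapeOptions_formats": ["markdown"],
}

print(path_is_crawled("https://example.com/blog/archive/2020",
                      exclude_paths=crawl_args["excludePaths"]))
```

Running candidate URLs through a check like this before starting a crawl makes it easier to spot a pattern that excludes more than intended.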

Extract structured data (FIRECRAWL_EXTRACT)

Extracts structured data from web pages by initiating an extraction job and polling for completion. Requires a natural language `prompt` or a JSON `schema` (one of the two must be provided).

Input parameters

urls (string[], required)
A list of URLs from which to extract data. Wildcards (e.g., `https://example.com/blog/*`) can be used for crawling multiple pages under a specific path.
prompt (string)
Natural language query for information to extract from URL content. E.g., 'Extract the company mission, whether it supports SSO, etc.'
schema (object)
JSON object defining the desired structure for extracted data (e.g., field names, types). Dictates the output format.
enable_web_search (boolean)
If `True`, allows crawling links outside the initial domains in `urls`; if `False`, restricts crawling to the same domains.

Output parameters

data (object, required)
A dictionary containing the structured data extracted from the URLs. The structure of this data conforms to the provided `schema` or to the LLM's interpretation of the `prompt`.
error (string)
Error message, if any occurred during execution of the action.
successful (boolean, required)
Whether the action execution was successful.
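
The "prompt or schema" requirement can be enforced before the action is called. This is a sketch under stated assumptions: `build_extract_args` is a hypothetical helper, the check treats "at least one of the two" as sufficient, and the example schema uses JSON Schema conventions, which the source does not confirm as the expected shape.

```python
def build_extract_args(urls, prompt=None, schema=None, enable_web_search=None):
    """Build the input payload for FIRECRAWL_EXTRACT (hypothetical helper)."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    if prompt is None and schema is None:
        raise ValueError("provide a prompt or a schema")
    args = {"urls": list(urls)}
    # Optional fields are only sent when the caller supplied them.
    if prompt is not None:
        args["prompt"] = prompt
    if schema is not None:
        args["schema"] = schema
    if enable_web_search is not None:
        args["enable_web_search"] = enable_web_search
    return args


# Assumed JSON Schema style; field names are illustrative only.
company_schema = {
    "type": "object",
    "properties": {
        "mission": {"type": "string"},
        "supports_sso": {"type": "boolean"},
    },
}

args = build_extract_args(["https://example.com/blog/*"], schema=company_schema)
print(sorted(args))
```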

Get the status of a crawl job (FIRECRAWL_GET_THE_STATUS_OF_A_CRAWL_JOB)

Retrieves the current status, progress, and details of a web crawl job, using the job id obtained when the crawl was initiated.

Input parameters

id (string, required)
Unique identifier (UUID) of the crawl job.

Output parameters

data (object, required)
Details of the crawl job, including `status` (e.g., "scraping", "completed", "failed"), `total` pages attempted, `completed` successfully crawled pages, `creditsUsed`, and `expiresAt` (data expiration timestamp).
error (string)
Error message, if any occurred during execution of the action.
successful (boolean, required)
Whether the action execution was successful.
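
One way to interpret the `data` object is to reduce it to a progress fraction and a finished flag. The sketch below assumes `data` is a flat dictionary with the field names listed above; `summarize_crawl_status` and the sample values are hypothetical, and status values beyond the three named in the description may exist.

```python
# Statuses named in the output description; other values may exist.
FINISHED_STATUSES = {"completed", "failed"}


def summarize_crawl_status(data: dict) -> dict:
    """Reduce a status payload to a finished flag and progress fraction."""
    total = data.get("total") or 0
    completed = data.get("completed") or 0
    return {
        "finished": data.get("status") in FINISHED_STATUSES,
        # Guard against division by zero before any page is counted.
        "progress": completed / total if total else 0.0,
    }


# Made-up sample payload for illustration.
sample = {"status": "scraping", "total": 10, "completed": 4, "creditsUsed": 4}
print(summarize_crawl_status(sample))
```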

Map multiple URLs (FIRECRAWL_MAP_MULTIPLE_URLS_BASED_ON_OPTIONS)

Maps a website by discovering URLs from a starting base URL, with options to customize the crawl via search query, subdomain inclusion, sitemap handling, and result limits. Search effectiveness is site-dependent.

Input parameters

url (string, uri format, required)
The primary base URL from which the mapping process will begin.
limit (integer)
Maximum number of unique links/pages to discover and return; helps control mapping scope and duration.
search (string)
Optional search query to guide URL mapping, prioritizing or finding specific page types. 'Smart' search is limited to 1000 initial results in Alpha, but overall mapping can exceed this.
ignoreSitemap (boolean)
If true, the crawler ignores sitemap.xml files, relying on page links for discovery.
includeSubdomains (boolean)
If true, includes subdomains of the base URL in the mapping. E.g., if `url` is example.com, blog.example.com is mapped.

Output parameters

data (object, required)
Dictionary containing the URL mapping results, typically a list of discovered URLs or a structured sitemap representation.
error (string)
Error message, if any occurred during execution of the action.
successful (boolean, required)
Whether the action execution was successful.
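
The effect of `includeSubdomains` can be pictured with a host comparison. This is an illustration of the rule as described above, not the connector's implementation; `in_mapping_scope` is a hypothetical helper.

```python
from urllib.parse import urlparse


def in_mapping_scope(candidate, base_url, include_subdomains=False):
    """Return True if candidate's host is the base host or, when
    include_subdomains is set, one of its subdomains."""
    base_host = urlparse(base_url).hostname
    host = urlparse(candidate).hostname
    if host is None or base_host is None:
        return False
    if host == base_host:
        return True
    # The leading dot prevents "notexample.com" matching "example.com".
    return include_subdomains and host.endswith("." + base_host)


print(in_mapping_scope("https://blog.example.com/post",
                       "https://example.com",
                       include_subdomains=True))
```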

Scrape URL (FIRECRAWL_SCRAPE)

Scrapes a publicly accessible URL, optionally performing pre-scrape browser actions or extracting structured JSON using an LLM, to retrieve content in specified formats.

Input parameters

url (string, required)
The fully qualified URL of the web page to scrape.
actions (object[])
An optional list of browser actions (e.g., click, type, wait) to perform on the page *before* scraping occurs. Useful for interacting with dynamic content, filling forms, or navigating through page elements.
formats (string[])
A list of desired output formats for the scraped content. Defaults to ['markdown']. If `json` is included, `jsonOptions` *must* be provided.
timeout (integer)
Maximum time in milliseconds to wait for the scraping request to complete. Defaults to 30000.
waitFor (integer)
Time in milliseconds to wait for the page to load or for dynamic content to render before starting the scrape. Defaults to 0.
location (object)
Location settings for the request.
excludeTags (string[])
A list of HTML tags to specifically exclude from the output. Content within these tags is removed.
includeTags (string[])
A list of HTML tags to specifically include in the output. Content within these tags is prioritized.
jsonOptions (object)
Options for JSON extraction.
onlyMainContent (boolean)
If true, attempts to extract only the main article content, excluding headers, footers, navigation bars, and ads. Defaults to true.

Output parameters

data (object, required)
A dictionary containing the scraped data. Keys correspond to the requested formats (e.g., 'markdown', 'html', 'json', 'screenshot'), and values are the extracted content or metadata for those formats.
error (string)
Error message, if any occurred during execution of the action.
successful (boolean, required)
Whether the action execution was successful.
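
The dependency between `formats` and `jsonOptions` is a natural thing to validate up front. A minimal sketch, assuming a hypothetical helper `build_scrape_args`; the `prompt` key inside `jsonOptions` is also an assumption, since the source does not list the fields of that object.

```python
def build_scrape_args(url, formats=None, json_options=None, **extra):
    """Build the input payload for FIRECRAWL_SCRAPE (hypothetical helper)."""
    # Documented default when no formats are given.
    formats = list(formats) if formats else ["markdown"]
    if "json" in formats and json_options is None:
        raise ValueError("jsonOptions is required when 'json' is requested")
    args = {"url": url, "formats": formats, **extra}
    if json_options is not None:
        args["jsonOptions"] = json_options
    return args


args = build_scrape_args(
    "https://example.com/pricing",
    formats=["markdown", "json"],
    # Assumed shape: the keys inside jsonOptions are not documented here.
    json_options={"prompt": "List the plan names and monthly prices."},
    onlyMainContent=True,
)
print(args["formats"])
```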

Search (FIRECRAWL_SEARCH)

Performs a web search for a query, scrapes content from the top search results using Firecrawl, and returns details in specified formats.

Input parameters

lang (string)
Language code for search results (e.g., 'en' for English, default 'en').
limit (integer)
Maximum number of search results to return (1-10, default 5).
query (string, required)
The search query to execute.
country (string)
Country code to tailor search results (e.g., 'us' for United States, default 'us').
formats (string[])
Desired output formats for scraped content of each search result (e.g., 'markdown', 'html'). If None, default scraping applies. Available: 'markdown', 'html', 'rawHtml', 'links', 'screenshot', 'screenshot@fullPage'.
timeout (integer)
Maximum time in milliseconds for search and scrape operations (1000-300000, default 60000).

Output parameters

data (object[])
List of search result items, each with details and potentially scraped content.
error (string)
Error message, if any occurred during execution of the action.
success (boolean)
Indicates whether the overall search operation was successful.
warning (string)
Optional warning message about the search operation.
successful (boolean, required)
Whether the action execution was successful.
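
The documented ranges and allowed formats lend themselves to a quick pre-flight check. In this sketch `build_search_args` is a hypothetical helper; the limits, defaults, and format names are copied from the parameter descriptions above.

```python
ALLOWED_FORMATS = {
    "markdown", "html", "rawHtml", "links",
    "screenshot", "screenshot@fullPage",
}


def build_search_args(query, limit=5, timeout=60000,
                      lang="en", country="us", formats=None):
    """Build the input payload for FIRECRAWL_SEARCH (hypothetical helper)."""
    if not query:
        raise ValueError("query is required")
    if not 1 <= limit <= 10:
        raise ValueError("limit must be between 1 and 10")
    if not 1000 <= timeout <= 300000:
        raise ValueError("timeout must be between 1000 and 300000 ms")
    unknown = set(formats or []) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unsupported formats: {sorted(unknown)}")
    args = {"query": query, "limit": limit, "timeout": timeout,
            "lang": lang, "country": country}
    # formats is omitted when not given, so default scraping applies.
    if formats:
        args["formats"] = list(formats)
    return args


print(build_search_args("firecrawl changelog", limit=3, formats=["markdown"]))
```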