Firecrawl
Firecrawl automates web crawling and data extraction, enabling organizations to gather content, index sites, and gain insights from online sources at scale.
Available actions (7)
Each action is an operation that the agent can run against this connector.
Cancel a crawl job
Action: FIRECRAWL_CANCEL_A_CRAWL_JOB
Cancels an active or queued web crawl job using its id. Attempting to cancel a completed, failed, or previously canceled job does not change its state.
Input parameters
id (string, required): The unique identifier (UUID) of the crawl job to cancel.
Output parameters
data (object, required)
error (string): Error message, if any occurred during execution of the action.
successful (boolean, required): Whether the action execution was successful.
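Because `id` must be a UUID, it can help to check the value locally before calling the action. The following is a minimal sketch and not part of the connector: `build_cancel_args` is a hypothetical helper that validates the value with Python's standard `uuid` module and returns the argument dictionary. How that dictionary is submitted depends on your agent framework.

```python
import uuid


def build_cancel_args(job_id: str) -> dict:
    """Return arguments for FIRECRAWL_CANCEL_A_CRAWL_JOB (hypothetical helper)."""
    try:
        parsed = uuid.UUID(job_id)
    except (ValueError, AttributeError, TypeError) as exc:
        raise ValueError(f"not a valid UUID: {job_id!r}") from exc
    # str() normalizes the id to lowercase, hyphenated form.
    return {"id": str(parsed)}
```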
Start a web crawl
Action: FIRECRAWL_CRAWL
Initiates a Firecrawl web crawl from a given URL, applying filtering and content extraction rules, and polls until the job is complete. Ensure the URL is accessible and that any regex patterns for paths are valid.
Input parameters
url (string, required): The base URL to start crawling from. This is the initial entry point for the web crawler.
delay (integer): Delay in milliseconds between requests, to avoid overwhelming the server.
limit (integer): Maximum number of pages to crawl. The crawl stops once this limit is reached. Default is 10.
webhook (string): An optional webhook URL to receive real-time updates on the crawl job. Events include crawl start (`crawl.started`), page crawled (`crawl.page`), and crawl completion (`crawl.completed` or `crawl.failed`). The payload structure matches the `/scrape` endpoint response.
maxDepth (integer): Maximum depth of subpages to crawl relative to the base URL. A depth of 0 crawls only the base URL, 1 crawls the base URL and its direct links, and so on. Default is 2.
excludePaths (string[]): A list of regular expression (regex) patterns for URL paths to exclude from the crawl. URLs whose paths match any of these patterns are ignored. For example, `"blog/archive/.*"` excludes all paths under `/blog/archive/`.
includePaths (string[]): A list of regular expression (regex) patterns for URL paths to include in the crawl. Only URLs whose paths match one of these patterns are processed. For example, `"products/featured/.*"` includes only paths under `/products/featured/`.
ignoreSitemap (boolean): If true, the crawler ignores any sitemap.xml found on the website. Defaults to true.
maxDiscoveryDepth (integer): Maximum depth for discovering new links, separate from the crawling depth.
allowBackwardLinks (boolean): If true, allows the crawler to navigate to pages that were linked from pages already visited (i.e., navigate 'backwards'). Defaults to false.
allowExternalLinks (boolean): If true, allows the crawler to follow links that lead to external websites (different domains). Defaults to false.
scrapeOptions_proxy (string): Proxy configuration for requests.
scrapeOptions_maxAge (integer): Maximum age in seconds for cached content. Content older than this is re-scraped.
scrapeOptions_mobile (boolean): If true, emulates a mobile device when scraping.
ignoreQueryParameters (boolean): If true, ignores query parameters when determining whether a URL has been visited.
scrapeOptions_actions (object[]): List of actions to perform on each page before scraping (e.g., clicking buttons, waiting).
scrapeOptions_formats (string[]): The desired output formats for the scraped content from each page. Default is `["markdown"]`. If a format is json, jsonOptions is required.
scrapeOptions_headers (object): Custom HTTP headers to send with each request.
scrapeOptions_timeout (integer): Timeout in milliseconds for each page request. Default is 30000 ms (30 seconds).
scrapeOptions_waitFor (integer): The duration in milliseconds to wait for page JavaScript to execute and content to load before scraping. Useful for pages with dynamically loaded content. Default is 123ms.
scrapeOptions_blockAds (boolean): If true, blocks advertisements during scraping.
scrapeOptions_location (object): Geolocation settings for the scraper.
scrapeOptions_parsePDF (boolean): If true, attempts to parse PDF files encountered during crawling.
scrapeOptions_excludeTags (string[]): A list of HTML tags to exclude from the scraped output. Content within these tags (and their children) is removed before processing.
scrapeOptions_includeTags (string[]): A list of HTML tags to specifically include in the scraped output. Only content within these tags is processed. If empty or null, all relevant content is considered based on other options.
scrapeOptions_jsonOptions (object): Options for JSON format extraction, including schema and prompts.
scrapeOptions_storeInCache (boolean): If true, stores scraped content in the cache for future use.
scrapeOptions_onlyMainContent (boolean): If true, attempts to extract only the main content of each page, excluding common elements like headers, navigation bars, and footers. Default is true.
scrapeOptions_removeBase64Images (boolean): If true, removes base64-encoded images from the scraped content.
scrapeOptions_skipTlsVerification (boolean): If true, skips TLS certificate verification.
scrapeOptions_changeTrackingOptions (object): Options for tracking changes between crawls.
Output parameters
data (object, required): A dictionary containing the crawled data. This typically includes a job ID, a status, and, if the crawl completed successfully, an array of page data. The structure can vary with the crawl outcome (e.g., success, failure, ongoing).
error (string): Error message, if any occurred during execution of the action.
successful (boolean, required): Whether the action execution was successful.
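The requirement that path patterns be valid regexes can be checked before a crawl is started. The sketch below assumes a hypothetical helper, `build_crawl_args`, that validates the patterns and maps a nested options dictionary onto the flat `scrapeOptions_` parameter names listed above. It uses Python's `re` dialect as an approximation; the service may interpret some patterns differently.

```python
import re


def build_crawl_args(url, include_paths=None, exclude_paths=None,
                     limit=10, max_depth=2, scrape_options=None):
    """Assemble arguments for FIRECRAWL_CRAWL (hypothetical helper)."""
    include_paths = list(include_paths or [])
    exclude_paths = list(exclude_paths or [])

    # Reject malformed patterns early instead of failing mid-crawl.
    for pattern in include_paths + exclude_paths:
        try:
            re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"invalid path regex {pattern!r}: {exc}") from exc

    args = {"url": url, "limit": limit, "maxDepth": max_depth}
    if include_paths:
        args["includePaths"] = include_paths
    if exclude_paths:
        args["excludePaths"] = exclude_paths

    # Scrape options are exposed as flat, prefixed parameters.
    for key, value in (scrape_options or {}).items():
        args[f"scrapeOptions_{key}"] = value
    return args
```

For example, `build_crawl_args("https://example.com", exclude_paths=["blog/archive/.*"], scrape_options={"onlyMainContent": True})` produces a dictionary containing `excludePaths` and `scrapeOptions_onlyMainContent` alongside the defaults for `limit` and `maxDepth`.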
Extract structured data
Action: FIRECRAWL_EXTRACT
Extracts structured data from web pages by initiating an extraction job and polling for completion. Requires either a natural language `prompt` or a JSON `schema`; at least one must be provided.
Input parameters
urls (string[], required): A list of URLs from which to extract data. Wildcards (e.g., `https://example.com/blog/*`) can be used to cover multiple pages under a specific path.
prompt (string): Natural language query describing the information to extract from the URL content, e.g., 'Extract the company mission, whether it supports SSO, etc.'
schema (object): JSON object defining the desired structure of the extracted data (e.g., field names, types). Dictates the output format.
enable_web_search (boolean): If `True`, allows crawling links outside the initial domains in `urls`; if `False`, restricts crawling to the same domains.
Output parameters
data (object, required): A dictionary containing the structured data extracted from the URLs. Its structure conforms to the provided `schema` or to the LLM's interpretation of the `prompt`.
error (string): Error message, if any occurred during execution of the action.
successful (boolean, required): Whether the action execution was successful.
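The "prompt or schema" rule is easy to enforce on the client side. This is a minimal sketch under stated assumptions: `build_extract_args` is a hypothetical helper, and the JSON-Schema-style dictionary in the usage note is only an illustration of what a `schema` might look like, not a format confirmed by this page.

```python
def build_extract_args(urls, prompt=None, schema=None, enable_web_search=False):
    """Assemble arguments for FIRECRAWL_EXTRACT (hypothetical helper)."""
    urls = list(urls)
    if not urls:
        raise ValueError("at least one URL is required")
    # The action needs a prompt, a schema, or both.
    if not prompt and not schema:
        raise ValueError("provide a prompt or a schema")

    args = {"urls": urls, "enable_web_search": enable_web_search}
    if prompt:
        args["prompt"] = prompt
    if schema:
        args["schema"] = schema
    return args
```

A call such as `build_extract_args(["https://example.com/blog/*"], schema={"type": "object", "properties": {"mission": {"type": "string"}}})` returns the arguments without a `prompt` key, while omitting both `prompt` and `schema` raises `ValueError`.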
Get the status of a crawl job
Action: FIRECRAWL_GET_THE_STATUS_OF_A_CRAWL_JOB
Retrieves the current status, progress, and details of a web crawl job, using the job id obtained when the crawl was initiated.
Input parameters
id (string, required): Unique identifier (UUID) of the crawl job.
Output parameters
data (object, required): Details of the crawl job, including `status` (e.g., "scraping", "completed", "failed"), `total` (pages attempted), `completed` (pages crawled successfully), `creditsUsed`, and `expiresAt` (data expiration timestamp).
error (string): Error message, if any occurred during execution of the action.
successful (boolean, required): Whether the action execution was successful.
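If you need to follow a job yourself, the `status`, `total`, and `completed` fields are enough to drive a simple polling loop. The sketch below is an assumption-laden illustration: `get_status` stands in for whatever function your framework uses to run the status action and return its `data` object, and only the "completed" and "failed" statuses named above are treated as terminal.

```python
import time
from typing import Callable

# Only the terminal statuses named in the action description.
TERMINAL_STATUSES = {"completed", "failed"}


def progress(data: dict) -> float:
    """Fraction of attempted pages that were crawled, or 0.0 if unknown."""
    total = data.get("total") or 0
    completed = data.get("completed") or 0
    return completed / total if total else 0.0


def poll_crawl(get_status: Callable[[str], dict], job_id: str,
               interval_s: float = 2.0, max_polls: int = 30,
               sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call get_status (hypothetical) until the job reaches a terminal status."""
    for _ in range(max_polls):
        data = get_status(job_id)
        if data.get("status") in TERMINAL_STATUSES:
            return data
        sleep(interval_s)
    raise TimeoutError(f"crawl {job_id} did not finish after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays.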
Map multiple URLs
Action: FIRECRAWL_MAP_MULTIPLE_URLS_BASED_ON_OPTIONS
Maps a website by discovering URLs from a starting base URL, with options to customize the mapping via a search query, subdomain inclusion, sitemap handling, and result limits. Search effectiveness is site-dependent.
Input parameters
url (string, uri, required): The primary base URL from which the mapping process begins.
limit (integer): Maximum number of unique links/pages to discover and return; helps control mapping scope and duration.
search (string): Optional search query to guide URL mapping, prioritizing or finding specific page types. 'Smart' search is limited to 1000 initial results in Alpha, but overall mapping can exceed this.
ignoreSitemap (boolean): If true, the crawler ignores sitemap.xml files and relies on page links for discovery.
includeSubdomains (boolean): If true, includes subdomains of the base URL in the mapping. For example, if `url` is example.com, blog.example.com is also mapped.
Output parameters
data (object, required): Dictionary containing the URL mapping results, typically a list of discovered URLs or a structured sitemap representation.
error (string): Error message, if any occurred during execution of the action.
successful (boolean, required): Whether the action execution was successful.
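To make the `includeSubdomains` distinction concrete, the sketch below filters a list of discovered URLs on the client side. It is an illustration only, not the service's own scoping logic: `filter_in_scope` is a hypothetical helper, and it assumes the mapping results have already been reduced to a flat list of URL strings.

```python
from urllib.parse import urlparse


def filter_in_scope(urls, base_url, include_subdomains):
    """Keep URLs on the base host and, optionally, on its subdomains."""
    base_host = urlparse(base_url).hostname or ""
    if not base_host:
        raise ValueError(f"base URL has no host: {base_url!r}")

    kept = []
    for candidate in urls:
        host = urlparse(candidate).hostname or ""
        on_base = host == base_host
        # The leading dot prevents matches such as notexample.com.
        on_subdomain = include_subdomains and host.endswith("." + base_host)
        if on_base or on_subdomain:
            kept.append(candidate)
    return kept
```

With `base_url` set to `https://example.com`, a URL on `blog.example.com` is kept only when `include_subdomains` is true, and a URL on `notexample.com` is never kept.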
Scrape URL
Action: FIRECRAWL_SCRAPE
Scrapes a publicly accessible URL to retrieve content in the specified formats, optionally performing pre-scrape browser actions or extracting structured JSON using an LLM.
Input parameters
url (string, required): The fully qualified URL of the web page to scrape.
actions (object[]): An optional list of browser actions (e.g., click, type, wait) to perform on the page *before* scraping occurs. Useful for interacting with dynamic content, filling forms, or navigating through page elements.
formats (string[]): A list of desired output formats for the scraped content. Defaults to ['markdown']. If `json` is included, `jsonOptions` *must* be provided.
timeout (integer): Maximum time in milliseconds to wait for the scraping request to complete. Defaults to 30000.
waitFor (integer): Time in milliseconds to wait for the page to load or for dynamic content to render before starting the scrape. Defaults to 0.
location (object): Location settings for the request.
excludeTags (string[]): A list of HTML tags to exclude from the output. Content within these tags is removed.
includeTags (string[]): A list of HTML tags to include in the output. Content within these tags is prioritized.
jsonOptions (object): Options for JSON extraction.
onlyMainContent (boolean): If true, attempts to extract only the main article content, excluding headers, footers, navigation bars, and ads. Defaults to true.
Output parameters
data (object, required): A dictionary containing the scraped data. Keys correspond to the requested formats (e.g., 'markdown', 'html', 'json', 'screenshot'), and values are the extracted content or metadata for those formats.
error (string): Error message, if any occurred during execution of the action.
successful (boolean, required): Whether the action execution was successful.
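The dependency between `formats` and `jsonOptions` is the easiest rule to miss when calling this action. The following sketch assumes a hypothetical helper, `build_scrape_args`, that applies the documented defaults and rejects a `json` format without `jsonOptions`. The contents of `jsonOptions` in the usage note (a `prompt` key) are an assumption, since this page does not list its fields.

```python
def build_scrape_args(url, formats=None, json_options=None,
                      wait_for=0, timeout=30000, only_main_content=True):
    """Assemble arguments for FIRECRAWL_SCRAPE (hypothetical helper)."""
    formats = list(formats) if formats else ["markdown"]
    # jsonOptions is mandatory whenever the json format is requested.
    if "json" in formats and not json_options:
        raise ValueError("formats includes 'json' but jsonOptions is missing")
    if wait_for < 0 or timeout <= 0:
        raise ValueError("waitFor must be >= 0 and timeout must be > 0")

    args = {
        "url": url,
        "formats": formats,
        "waitFor": wait_for,
        "timeout": timeout,
        "onlyMainContent": only_main_content,
    }
    if json_options:
        args["jsonOptions"] = json_options
    return args
```

For instance, `build_scrape_args("https://example.com", formats=["markdown", "json"], json_options={"prompt": "List the product names"})` succeeds, while the same call without `json_options` raises `ValueError`.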
Search
Action: FIRECRAWL_SEARCH
Performs a web search for a query, scrapes content from the top search results using Firecrawl, and returns details in the specified formats.
Input parameters
lang (string): Language code for search results (e.g., 'en' for English; default 'en').
limit (integer): Maximum number of search results to return (1-10, default 5).
query (string, required): The search query to execute.
country (string): Country code used to tailor search results (e.g., 'us' for United States; default 'us').
formats (string[]): Desired output formats for the scraped content of each search result (e.g., 'markdown', 'html'). If None, default scraping applies. Available: 'markdown', 'html', 'rawHtml', 'links', 'screenshot', 'screenshot@fullPage'.
timeout (integer): Maximum time in milliseconds for search and scrape operations (1000-300000, default 60000).
Output parameters
data (object[]): List of search result items, each with details and potentially scraped content.
error (string): Error message, if any occurred during execution of the action.
success (boolean): Indicates whether the overall search operation was successful.
warning (string): Optional warning message about the search operation.
successful (boolean, required): Whether the action execution was successful.
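The documented ranges for `limit` and `timeout` and the list of available formats can be validated before the search is issued. This is a minimal sketch: `build_search_args` is a hypothetical helper, and the bounds and format names are copied from the parameter descriptions above.

```python
ALLOWED_FORMATS = {
    "markdown", "html", "rawHtml", "links", "screenshot", "screenshot@fullPage",
}


def build_search_args(query, limit=5, lang="en", country="us",
                      formats=None, timeout=60000):
    """Assemble arguments for FIRECRAWL_SEARCH (hypothetical helper)."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if not 1 <= limit <= 10:
        raise ValueError("limit must be between 1 and 10")
    if not 1000 <= timeout <= 300000:
        raise ValueError("timeout must be between 1000 and 300000 ms")
    unknown = set(formats or []) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unsupported formats: {sorted(unknown)}")

    args = {"query": query, "limit": limit, "lang": lang,
            "country": country, "timeout": timeout}
    # Leave formats out entirely so that default scraping applies.
    if formats:
        args["formats"] = list(formats)
    return args
```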