Webscraper io
webscraper_io
WebScraper.IO is a web scraping tool that makes web data extraction easy and accessible for everyone through a cloud-based API.
Available actions (10)
Each action is an operation that the agent can run against this connector.
Create Sitemap
WEBSCRAPER_IO_CREATE_SITEMAP (Action)
Creates a new sitemap configuration for web scraping. Use it to define a new scraping structure with start URLs and selector rules for extracting data from a website.
Input parameters
startUrl (string[], required): Array of starting URLs where scraping begins. At least one URL is required.
selectors (object[], required): Array of selector objects defining data extraction rules. At least one selector is required.
sitemap_id (string, required): Unique identifier for the sitemap. Must consist of alphanumeric characters and hyphens (e.g., 'webscraper-io-landing').
Output parameters
data (object, required): Data from the action execution.
error (string, optional): Error message, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
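The input constraints above can be checked before the action is invoked. The following is a minimal sketch, not the connector's own validation: `build_create_sitemap_args` is a hypothetical helper, and the fields inside `title_selector` are an assumption modeled on Web Scraper sitemap exports, since this page does not define the shape of a selector object.

```python
import re

# Pattern derived from the sitemap_id description: alphanumeric characters and hyphens.
_SITEMAP_ID_RE = re.compile(r"[A-Za-z0-9-]+")


def build_create_sitemap_args(sitemap_id, start_urls, selectors):
    """Hypothetical helper: validate inputs and return the argument dict
    for WEBSCRAPER_IO_CREATE_SITEMAP."""
    if not _SITEMAP_ID_RE.fullmatch(sitemap_id):
        raise ValueError("sitemap_id must contain only letters, digits, and hyphens")
    if not start_urls:
        raise ValueError("startUrl needs at least one URL")
    if not selectors:
        raise ValueError("selectors needs at least one selector")
    return {
        "sitemap_id": sitemap_id,
        "startUrl": list(start_urls),
        "selectors": list(selectors),
    }


# Assumed selector shape; the field names are not confirmed by this page.
title_selector = {
    "id": "title",
    "type": "SelectorText",
    "parentSelectors": ["_root"],
    "selector": "h1",
    "multiple": False,
}

args = build_create_sitemap_args(
    "webscraper-io-landing",
    ["https://webscraper.io/"],
    [title_selector],
)
print(args["sitemap_id"], len(args["selectors"]))
```

How the resulting dict is passed to the action depends on the agent framework in use and is left out of the sketch.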
Delete Sitemap
WEBSCRAPER_IO_DELETE_SITEMAP (Action)
Permanently deletes a sitemap configuration from a Web Scraper Cloud account. Use it to remove a sitemap that is no longer needed.
Input parameters
sitemap_id (integer, required): The unique identifier of the sitemap to delete.
Output parameters
data (object, required): Data from the action execution.
error (string, optional): Error message, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
Disable Sitemap Scheduler
WEBSCRAPER_IO_DISABLE_SITEMAP_SCHEDULER (Action)
Disables automatic scheduling for a sitemap. Use it to stop automated scraping jobs from running on a schedule.
Input parameters
sitemap_id (integer, required): The unique identifier of the sitemap whose scheduler should be disabled.
Output parameters
data (object, required): Data from the action execution.
error (string, optional): Error message, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
Enable Sitemap Scheduler
WEBSCRAPER_IO_ENABLE_SITEMAP_SCHEDULER (Action)
Enables and configures automatic scheduling for sitemap scraping jobs. Use it to run scraping jobs at specific times using cron expressions, with customizable request intervals, page load delays, driver types, and proxy settings.
Input parameters
proxy (any, optional): Proxy configuration. Use the format 'datacenter-{country_code}' (e.g., 'datacenter-us') or 'residential-{country_code}' (e.g., 'residential-us'), 0 for no proxy, 1 to use a proxy, or a proxy id for Scale plan users.
driver (string, enum, required): Scraper driver type. Allowed values: 'fast', 'fulljs'. 'fast' does not execute JavaScript and extracts data from the raw HTML. 'fulljs' is the full driver with JavaScript execution.
cron_day (string, required): Day of month field of the cron expression. Use '*' for any day, '1-31' for a range, or specific values.
cron_hour (string, required): Hour field of the cron expression. Use '*' for any hour, '0-23' for a range, or specific values like '9,17'.
cron_month (string, required): Month field of the cron expression. Use '*' for any month, '1-12' for a range, or specific values.
sitemap_id (integer, required): The unique identifier of the sitemap to enable scheduling for.
cron_minute (string, required): Minute field of the cron expression. Use '*' for any minute, '*/10' for every 10 minutes, or specific values like '0,15,30,45'.
cron_weekday (string, required): Day of week field of the cron expression. Use '*' for any weekday, '0-6' for a range (0=Sunday), or specific values.
cron_timezone (string, required): Timezone for the cron schedule in tz database format (e.g., 'Europe/Riga', 'America/New_York', 'Asia/Tokyo').
page_load_delay (integer, required): Time in milliseconds that the scraper waits for the page to load before extracting data. Default is 2000 ms (2 seconds).
request_interval (integer, required): Delay in milliseconds between page requests during scraping. Default is 2000 ms (2 seconds).
Output parameters
data (object, required): Data from the action execution.
error (string, optional): Error message, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
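Because the schedule is split across five separate cron fields, it can help to assemble the arguments in one place. The sketch below is illustrative only: `build_scheduler_args` and `cron_expression` are hypothetical helpers, the default for `driver` is a choice made here, and the delay defaults simply mirror the 2000 ms values stated above. It schedules a run at 09:00 and 17:00, Monday to Friday, Riga time.

```python
ALLOWED_DRIVERS = {"fast", "fulljs"}  # enum values listed for the driver parameter


def build_scheduler_args(sitemap_id, *, minute, hour, timezone,
                         day="*", month="*", weekday="*",
                         driver="fulljs", page_load_delay=2000,
                         request_interval=2000, proxy=None):
    """Hypothetical helper: return the argument dict for
    WEBSCRAPER_IO_ENABLE_SITEMAP_SCHEDULER."""
    if driver not in ALLOWED_DRIVERS:
        raise ValueError(f"driver must be one of {sorted(ALLOWED_DRIVERS)}")
    args = {
        "sitemap_id": sitemap_id,
        "cron_minute": minute,
        "cron_hour": hour,
        "cron_day": day,
        "cron_month": month,
        "cron_weekday": weekday,
        "cron_timezone": timezone,
        "driver": driver,
        "page_load_delay": page_load_delay,
        "request_interval": request_interval,
    }
    # proxy is optional; 0 is a meaningful value ("no proxy"), so test against None.
    if proxy is not None:
        args["proxy"] = proxy
    return args


def cron_expression(args):
    """Join the five fields in conventional cron order (for logging or review only)."""
    order = ("cron_minute", "cron_hour", "cron_day", "cron_month", "cron_weekday")
    return " ".join(args[name] for name in order)


args = build_scheduler_args(
    123,  # placeholder sitemap id
    minute="0",
    hour="9,17",
    weekday="1-5",
    timezone="Europe/Riga",
    proxy="datacenter-us",
)
print(cron_expression(args))
```

Keeping the cron fields as strings matches the parameter types above and lets values such as '*/10' or '9,17' pass through unchanged.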
Get Account Info
WEBSCRAPER_IO_GET_ACCOUNT_INFO (Action)
Retrieves account information, including email and page credits. Use it to check account details or available credits.
Input parameters
No parameters.
Output parameters
data (object, required): Data from the action execution.
error (string, optional): Error message, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
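Every action on this page returns the same three output fields, so a caller can check `successful` once and then work with `data`. A minimal sketch follows, assuming the result arrives as a plain dict; `unwrap` and `ActionError` are hypothetical names, and the keys inside `data` in the sample are illustrative, since this page only says the account info includes email and page credits.

```python
class ActionError(RuntimeError):
    """Raised when an action result reports successful == False."""


def unwrap(result):
    """Return result['data'] if the action succeeded, otherwise raise ActionError."""
    if not result.get("successful", False):
        raise ActionError(result.get("error") or "action failed without an error message")
    return result.get("data", {})


# Hypothetical result for WEBSCRAPER_IO_GET_ACCOUNT_INFO; key names inside
# `data` are assumptions made for illustration.
sample = {
    "successful": True,
    "error": None,
    "data": {"email": "user@example.com", "page_credits": 500},
}

account = unwrap(sample)
print(account["email"])
```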
Get Scraping Jobs
WEBSCRAPER_IO_GET_SCRAPING_JOBS (Action)
Retrieves all scraping jobs for the account, with optional filtering and pagination. Use it to list scraping jobs, check job status, or filter jobs by sitemap or tag.
Input parameters
tag (string, optional): Filter jobs by tag name. Use it to retrieve jobs with a specific tag.
page (integer, optional): Page number for pagination. Use it to retrieve a specific page of results.
sitemap_id (integer, optional): Filter jobs by sitemap ID. Use it to retrieve jobs for a particular sitemap.
Output parameters
data (object, required): Data from the action execution.
error (string, optional): Error message, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
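Since all three filters are optional, one reasonable approach is to send only the ones that were actually set. The helper below is a hypothetical sketch of that idea; whether the action tolerates explicit null values is not stated on this page, so unset filters are simply omitted.

```python
def build_jobs_filter(tag=None, page=None, sitemap_id=None):
    """Hypothetical helper: return arguments for WEBSCRAPER_IO_GET_SCRAPING_JOBS,
    leaving out any filter that was not provided."""
    candidates = {"tag": tag, "page": page, "sitemap_id": sitemap_id}
    return {name: value for name, value in candidates.items() if value is not None}


# Jobs for one sitemap (placeholder id), second page of results.
jobs_args = build_jobs_filter(sitemap_id=123, page=2)
print(jobs_args)
```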
Get Sitemap
WEBSCRAPER_IO_GET_SITEMAP (Action)
Retrieves a specific sitemap configuration by ID. Use it to inspect or reference an existing sitemap's configuration.
Input parameters
sitemap_id (integer, required): The numeric identifier of the sitemap to retrieve.
Output parameters
data (object, required): Data from the action execution.
error (string, optional): Error message, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
Get Sitemaps
WEBSCRAPER_IO_GET_SITEMAPS (Action)
Retrieves all sitemaps for the authenticated account. Use it to list available sitemaps or filter them by tag. Both pagination (via the page parameter) and filtering by tag name are optional.
Input parameters
tag (string, optional): Filter sitemaps by tag name to retrieve only sitemaps with a specific tag.
page (integer, optional): Page number for pagination (e.g., 2 for the second page).
Output parameters
data (object, required): Data from the action execution.
error (string, optional): Error message, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
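Listing every sitemap means requesting page after page until nothing more comes back. The loop below sketches that pattern under several assumptions: `fetch_page` is a caller-supplied function (hypothetical) that runs WEBSCRAPER_IO_GET_SITEMAPS with the given arguments and returns a list of sitemap records, pages start at 1 (suggested by "2 for the second page"), and an empty list signals the end. The actual layout of `data` is not documented here, so `fake_fetch` only stands in for a real call.

```python
def iter_sitemaps(fetch_page, tag=None, max_pages=100):
    """Yield sitemap records page by page until an empty page is returned."""
    for page in range(1, max_pages + 1):
        args = {"page": page}
        if tag is not None:
            args["tag"] = tag
        items = fetch_page(args)
        if not items:
            break
        yield from items


def fake_fetch(args):
    """Stand-in for a real action call; returns canned pages for demonstration."""
    pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    return pages.get(args["page"], [])


all_ids = [sitemap["id"] for sitemap in iter_sitemaps(fake_fetch)]
print(all_ids)
```

The `max_pages` cap is a safeguard chosen for this sketch so that a misbehaving fetch function cannot loop indefinitely.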
Get Sitemap Scheduler
WEBSCRAPER_IO_GET_SITEMAP_SCHEDULER (Action)
Retrieves the scheduler configuration for a sitemap. Use it to check scheduling settings, including cron configuration and proxy settings.
Input parameters
sitemap_id (integer, required): The unique identifier of the sitemap.
Output parameters
data (object, required): Data from the action execution.
error (string, optional): Error message, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
Update Sitemap
WEBSCRAPER_IO_UPDATE_SITEMAP (Action)
Updates an existing sitemap configuration, including its structure, URLs, and selectors. Use it to modify sitemap settings.
Input parameters
_id (string, required): Internal identifier for the sitemap.
startUrl (string[], required): Array of URLs where scraping begins.
selectors (object[], required): Array of selector objects defining data extraction rules.
sitemap_id (integer, required): The unique identifier of the sitemap to update.
Output parameters
data (object, required): Data from the action execution.
error (string, optional): Error message, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
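The update action takes two identifiers of different types: the integer `sitemap_id` and the string `_id`. The sketch below shows one way to keep them apart when deriving update arguments from an existing configuration. `build_update_args` is a hypothetical helper, and the `config` dict (with `_id`, `startUrl`, and `selectors` keys) is an assumed shape chosen to mirror the input parameters, not a documented response format.

```python
def build_update_args(sitemap_id, config, extra_start_urls=()):
    """Hypothetical helper: return arguments for WEBSCRAPER_IO_UPDATE_SITEMAP,
    appending any new start URLs that are not already present."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(sitemap_id, int) or isinstance(sitemap_id, bool):
        raise TypeError("sitemap_id must be an integer")
    if not isinstance(config["_id"], str):
        raise TypeError("_id must be a string")
    start_urls = list(config["startUrl"])
    for url in extra_start_urls:
        if url not in start_urls:
            start_urls.append(url)
    return {
        "sitemap_id": sitemap_id,
        "_id": config["_id"],
        "startUrl": start_urls,
        "selectors": list(config["selectors"]),
    }


# Assumed configuration shape, for illustration only.
existing = {
    "_id": "webscraper-io-landing",
    "startUrl": ["https://webscraper.io/"],
    "selectors": [{"id": "title", "selector": "h1"}],
}

update_args = build_update_args(
    123,  # placeholder numeric sitemap id
    existing,
    extra_start_urls=["https://webscraper.io/pricing"],
)
print(update_args["startUrl"])
```

Copying the lists leaves the original configuration untouched, which makes it easy to compare the before and after states.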