WebScraping.AI
webscraping_ai
WebScraping.AI provides an API for web scraping with features like Chrome JS rendering, rotating proxies, and HTML parsing.
Available actions (4)
Each action is an operation that the agent can run against this connector.
Get account usage and quota
WEBSCRAPING_AI_ACCOUNT_INFO (Action)
Tool to retrieve the account's API call quota and usage. Use when checking remaining requests and subscription details.
Input parameters
No parameters.
Output parameters
data (object, required): Data from the action execution.
error (string): Error, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
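Every action on this page returns the same three output fields, so a caller can handle them in one place. The sketch below is a minimal illustration, assuming the result arrives as a plain Python dict with those keys; `ActionError` and `unwrap_result` are hypothetical names, not part of any SDK, and the contents of `data` are made up.

```python
from typing import Any


class ActionError(RuntimeError):
    """Raised when an action result reports failure (hypothetical helper type)."""


def unwrap_result(result: dict[str, Any]) -> Any:
    """Return the `data` payload, or raise if `successful` is false."""
    if not result.get("successful", False):
        # `error` is optional in the schema, so fall back to a generic message.
        raise ActionError(result.get("error") or "action failed without an error message")
    return result["data"]


# Hand-written result for illustration; the real shape of `data` is not documented here.
sample = {"successful": True, "data": {"example_field": 1}, "error": None}
payload = unwrap_result(sample)
```

Raising on failure keeps the calling code from reading `data` when `successful` is false.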
Retrieve HTML Content
WEBSCRAPING_AI_GET_HTML (Action)
Tool to retrieve the HTML content of a web page. Use when you need raw page HTML, optionally rendered with JavaScript.
Input parameters
js (boolean): Whether to render JavaScript on the page before returning its HTML.
url (string, uri, required): The target URL to scrape.
proxy (string): Proxy location/country code, e.g., 'us', 'de'.
device (string, enum: mobile, desktop): Device type to emulate in the User-Agent.
cookies (object): Custom cookies to include in the browser session.
headers (object): Custom HTTP headers to include in the request.
Output parameters
data (object, required): Data from the action execution.
error (string): Error, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
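The input parameters of WEBSCRAPING_AI_GET_HTML can be assembled and checked before the action is invoked. This is a minimal sketch, assuming the arguments are passed as a dict keyed by parameter name; `build_get_html_args` is a hypothetical helper, and how the action is actually executed depends on the agent framework in use.

```python
from typing import Any, Optional

ALLOWED_DEVICES = {"mobile", "desktop"}  # enum values listed for `device`


def build_get_html_args(
    url: str,
    js: Optional[bool] = None,
    proxy: Optional[str] = None,
    device: Optional[str] = None,
    cookies: Optional[dict[str, str]] = None,
    headers: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Assemble arguments for WEBSCRAPING_AI_GET_HTML, omitting unset optionals."""
    if not url:
        raise ValueError("url is required")
    if device is not None and device not in ALLOWED_DEVICES:
        raise ValueError(f"device must be one of {sorted(ALLOWED_DEVICES)}")
    optional = {"js": js, "proxy": proxy, "device": device,
                "cookies": cookies, "headers": headers}
    args: dict[str, Any] = {"url": url}
    # Only send parameters the caller actually set.
    args.update({k: v for k, v in optional.items() if v is not None})
    return args


args = build_get_html_args("https://example.com", js=True, proxy="us", device="desktop")
```

Leaving unset optionals out of the dict lets the service apply its own defaults, which are not documented on this page.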
Get Rendered HTML
WEBSCRAPING_AI_GET_RENDERED_HTML (Action)
Tool to retrieve the fully rendered HTML of a web page. Use when JS-generated content must be included.
Input parameters
js (string): Base64-encoded JavaScript to execute after rendering.
url (string, uri, required): The target URL to render and fetch HTML from.
wait (integer): Wait time before capture, in milliseconds.
device (string, enum: desktop, mobile): Browser device mode to simulate.
locale (string): Browser locale (RFC 5646 language tag).
cookies (string): Cookies in 'key1=value1; key2=value2;' format.
headers (object): Extra HTTP headers as a JSON object.
referer (string, uri): Referer header value.
timeout (integer): Request timeout, in milliseconds.
proxy_type (string, enum: datacenter, residential): Proxy type to use for the request.
user_agent (string): Custom User-Agent string.
disable_images (boolean): Whether to disable image loading.
Output parameters
data (object, required): Data from the action execution.
error (string): Error, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
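Two inputs of WEBSCRAPING_AI_GET_RENDERED_HTML need preparation: `js` must be Base64-encoded, and `cookies` is a single string rather than an object. The sketch below shows one way to produce both with the Python standard library; `encode_js` and `format_cookies` are hypothetical helper names, and the URL, script, and cookie values are placeholders.

```python
import base64


def encode_js(script: str) -> str:
    """Base64-encode a JavaScript snippet for the `js` parameter."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


def format_cookies(cookies: dict[str, str]) -> str:
    """Render cookies in the documented 'key1=value1; key2=value2;' format."""
    return " ".join(f"{name}={value};" for name, value in cookies.items())


args = {
    "url": "https://example.com",
    "js": encode_js("window.scrollTo(0, document.body.scrollHeight);"),
    "cookies": format_cookies({"session": "abc", "theme": "dark"}),
    "wait": 2000,      # milliseconds, per the parameter description
    "timeout": 15000,  # milliseconds, per the parameter description
    "device": "desktop",
    "proxy_type": "datacenter",
}
```

Note that `wait` and `timeout` are in milliseconds for this action.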
Get Text
WEBSCRAPING_AI_GET_TEXT (Action)
Tool to retrieve raw text content from a specified web page. Use when you need plain-text extraction from a URL.
Input parameters
url (string, required): The target URL to scrape text from.
proxy (string, enum: us, eu): Proxy region to use for the request.
locale (string): Browser locale/language (e.g., 'en-US').
session (string): Session ID for preserving cookies across multiple calls.
timeout (integer): Request timeout in seconds (must be >= 1).
render_js (boolean): Whether to render JavaScript on the page before extracting text.
Output parameters
data (object, required): Data from the action execution.
error (string): Error, if any occurred during the execution of the action.
successful (boolean, required): Whether the action execution was successful.
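The `session` parameter of WEBSCRAPING_AI_GET_TEXT preserves cookies across calls, so related requests should share one session ID. The following is a rough sketch of that pattern; `build_get_text_args` is a hypothetical helper, the URLs are placeholders, and using a UUID as the session ID is an assumption, since the page does not specify a format.

```python
import uuid
from typing import Any, Optional

ALLOWED_PROXIES = {"us", "eu"}  # enum values listed for `proxy`


def build_get_text_args(
    url: str,
    session: Optional[str] = None,
    proxy: Optional[str] = None,
    timeout: Optional[int] = None,
    render_js: Optional[bool] = None,
) -> dict[str, Any]:
    """Assemble arguments for WEBSCRAPING_AI_GET_TEXT, omitting unset optionals."""
    if proxy is not None and proxy not in ALLOWED_PROXIES:
        raise ValueError(f"proxy must be one of {sorted(ALLOWED_PROXIES)}")
    if timeout is not None and timeout < 1:
        raise ValueError("timeout is in seconds and must be >= 1")
    optional = {"session": session, "proxy": proxy,
                "timeout": timeout, "render_js": render_js}
    args: dict[str, Any] = {"url": url}
    args.update({k: v for k, v in optional.items() if v is not None})
    return args


# Assumption: any opaque string works as a session ID; a UUID is just convenient.
session_id = uuid.uuid4().hex
batch = [
    build_get_text_args(u, session=session_id, timeout=10)
    for u in ("https://example.com/login", "https://example.com/account")
]
```

Unlike the rendered-HTML action, `timeout` here is expressed in seconds, so the check guards against passing a millisecond value's opposite mistake of zero or negative seconds.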