Diffbot
Diffbot provides AI-powered tools to extract and structure data from web pages, transforming unstructured web content into structured, linked data.
Available actions (14)
Each action is an operation the agent can run against this connector.
Diffbot Search (action: DIFFBOT_DIFFBOT_SEARCH)
Tool to search data extracted by crawl or bulk jobs using DQL queries. Use after data extraction jobs complete to retrieve search results.
Input parameters
size (integer): Number of results to return (max 1000). Defaults to 10 if omitted.
sort (string): Field name to sort results by; prefix with '-' for descending order.
query (string, required): Structured search query in DQL format, specifying the conditions documents must match.
fields (string): Comma-separated list of fields to include in the response; all fields are returned if omitted.
offset (integer): Offset for pagination (API parameter 'from'; default 0).
exclude (string): Comma-separated list of fields to exclude from the response.
explain (boolean): Return a detailed query explanation (default false).
indices (string): Comma-separated list of indices to search (default "public").
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.
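The search parameters above can be assembled and checked before the action is invoked. The following is a minimal sketch, not the connector's own client: `build_search_args` and `page_offsets` are hypothetical helper names, the lower bound on `size` is an assumption (only the maximum of 1000 is documented), and the field names in the usage note are placeholders.

```python
from typing import Any, Dict, List, Optional

MAX_SIZE = 1000  # documented upper bound for `size`


def build_search_args(
    query: str,
    size: int = 10,
    offset: int = 0,
    sort: Optional[str] = None,
    descending: bool = False,
    fields: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build an arguments dict for DIFFBOT_DIFFBOT_SEARCH (hypothetical helper)."""
    if not query:
        raise ValueError("query is required")
    # Assumption: size must be at least 1; the reference only states the maximum.
    if not 1 <= size <= MAX_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_SIZE}")
    if offset < 0:
        raise ValueError("offset must be non-negative")

    args: Dict[str, Any] = {"query": query, "size": size, "offset": offset}
    if sort:
        # A leading '-' requests descending order, as documented for `sort`.
        args["sort"] = f"-{sort}" if descending else sort
    if fields:
        # `fields` is a comma-separated string for this action.
        args["fields"] = ",".join(fields)
    return args


def page_offsets(total: int, size: int) -> List[int]:
    """Return the `offset` values needed to page through `total` results."""
    return list(range(0, total, size))
```

For example, `build_search_args("<DQL query>", size=50, sort="date", descending=True)` yields a dict whose `sort` value is `"-date"`, and `page_offsets(250, 100)` gives the offsets for three pages of 100 results.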
Get Diffbot Account Details (action: DIFFBOT_GET_ACCOUNT)
Tool to retrieve account details, including plan information and usage statistics. Use after authenticating to verify subscription and daily quota status.
Input parameters
None.
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.
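Every action in this reference returns the same three output fields, so a caller can check them in one place. The sketch below assumes the result arrives as a plain dict with the keys `data`, `error`, and `successful`; `unwrap` and `ActionError` are hypothetical names, and the contents of `data` are not specified here.

```python
from typing import Any, Dict


class ActionError(RuntimeError):
    """Raised when an action result reports failure (hypothetical type)."""


def unwrap(result: Dict[str, Any]) -> Any:
    """Return `data` from an action result, or raise if the action failed."""
    if not result.get("successful", False):
        # `error` is optional, so fall back to a generic message.
        raise ActionError(result.get("error") or "action failed without an error message")
    return result["data"]
```

Calling `unwrap` on the result of DIFFBOT_GET_ACCOUNT, for instance, either returns the account payload or raises with the reported error text.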
Diffbot Analyze (action: DIFFBOT_GET_ANALYZE)
Tool to automatically determine a page's content type and route it to the appropriate extraction API. Use when you have only a URL and need Diffbot to choose the right extractor.
Input parameters
url (string, required): The full URL of the page to analyze, including http:// or https://.
fields (string): Comma-separated list of fields to return, limiting the output and reducing response size.
callback (string): Optional JSONP callback function name. If set, the API returns a JSONP-wrapped response.
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.
Get Article Data (action: DIFFBOT_GET_ARTICLE)
Tool to extract information from articles, including authors, publication dates, and images. Use when you need structured metadata from a web article URL.
Input parameters
url (string, required): Full URL of the web page to analyze; must start with http or https.
mode (string): Extraction mode override (defaults to 'article').
stats (boolean): Whether to include statistics such as word count.
fields (string[]): List of specific fields to include in the response. If provided, only these fields are returned.
paging (string): Paging token for multi-page articles (returned in the previous response).
timeout (integer): Maximum time in milliseconds to wait for page rendering.
callback (string): Name of the JSONP callback function (if using JSONP).
discussion (boolean): Whether to include discussion/comment data in the response.
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.
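The URL requirement and the millisecond timeout are easy to get wrong, so they are worth checking before the call. This is a sketch under stated assumptions: `build_article_args` is a hypothetical helper, the positive-timeout check is an assumption, and the URL and field names in the usage note are placeholders.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse


def build_article_args(
    url: str,
    fields: Optional[List[str]] = None,
    timeout_ms: Optional[int] = None,
    discussion: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build an arguments dict for DIFFBOT_GET_ARTICLE (hypothetical helper)."""
    # The reference requires the URL to start with http or https.
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("url must start with http:// or https://")

    args: Dict[str, Any] = {"url": url}
    if fields:
        # Unlike the search action, `fields` is a list of strings here.
        args["fields"] = list(fields)
    if timeout_ms is not None:
        # Assumption: a timeout must be a positive number of milliseconds.
        if timeout_ms <= 0:
            raise ValueError("timeout must be a positive number of milliseconds")
        args["timeout"] = timeout_ms
    if discussion is not None:
        args["discussion"] = discussion
    return args
```

A call such as `build_article_args("https://example.com/post", fields=["title", "author"], timeout_ms=30000, discussion=False)` produces a dict that keeps `fields` as a list and passes the timeout in milliseconds.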
Get Discussion Thread (action: DIFFBOT_GET_DISCUSSION)
Tool to extract threads of content from forums, comment sections, and review pages. Use when you need structured discussion data from web pages, after identifying the discussion URL.
Input parameters
url (string, required): The URL of the discussion page to process.
fields (string): Comma-separated list of fields to include in the response.
maxPages (integer): Maximum number of pages to concatenate; set to 'all' for all pages.
norender (boolean): Set to true to disable full page rendering for faster responses.
discussion (boolean): Set to false to disable comment extraction.
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.
Diffbot Get Event (action: DIFFBOT_GET_EVENT)
Tool to extract event details from web pages. Use when you need structured event data such as venue, date, and description.
Input parameters
url (string, required, format: uri): URL of the event page to analyze.
fields (string): Comma-separated list of fields to return, e.g., title,date,location.
paging (boolean): Enable automatic paging of results.
timeout (integer): Maximum timeout in milliseconds for the API call.
callback (string): JSONP callback function name, if JSONP output is needed.
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.
Diffbot Get Image (action: DIFFBOT_GET_IMAGE)
Tool to extract detailed information about images, including dimensions and recognition data. Use after confirming the image URL is publicly accessible.
Input parameters
url (string, required, format: uri): Publicly accessible URL of the image to analyze.
fields (string[]): Comma-separated list or array of specific fields to include in the response, e.g., 'naturalWidth','captions'.
paging (boolean): Whether to include paging information for multi-image responses.
timeout (integer): Maximum time to wait for the API response, in milliseconds.
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.
Diffbot Get Product (action: DIFFBOT_GET_PRODUCT)
Tool to extract product information such as specifications, prices, availability, and reviews. Use when you need structured product data from a product page.
Input parameters
url (string, required, format: uri): URL of the product page to analyze.
mode (string): Extraction mode override (defaults to 'product').
fields (string[]): List of fields to return, e.g., title,offerPrice,images.
paging (boolean): Enable automatic paging of results.
timeout (integer): Maximum timeout in milliseconds for the API call.
callback (string): JSONP callback function name, if JSONP output is needed.
discussion (boolean): Include discussions/comments in the response.
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.
Get Video Data (action: DIFFBOT_GET_VIDEO)
Tool to extract information from videos, including titles, descriptions, and embedded HTML. Use when you need structured video metadata from any web page.
Input parameters
url (string, required): Full URL of the web page to analyze for embedded videos; must start with http or https.
mode (string): Extraction mode override (e.g., 'auto').
fields (string[]): List of specific fields to include in the response; if provided, only these fields are returned.
paging (boolean): Whether to return all detected results in one call (may increase runtime).
timeout (integer): Maximum time in milliseconds to wait for extraction.
callback (string): Name of the JSONP callback function (if using JSONP).
fallback (boolean): Whether to try an alternate extraction method if the primary one fails.
discussion (boolean): Include user discussion data (comments) if available.
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.
List Bulk Jobs (action: DIFFBOT_LIST_BULK_JOBS)
Tool to list all bulk jobs associated with a specific token. Use after authenticating to retrieve the statuses of all jobs for the account.
Input parameters
None.
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.
Resolve Lost ID (action: DIFFBOT_RESOLVE_LOST_ID)
Tool to resolve lost IDs in the Knowledge Graph. Use when you need to map a lost identifier to its canonical counterpart for data consistency.
Input parameters
type (string): The type of object (e.g., 'article', 'product'). If omitted, Diffbot will attempt to infer it.
lostId (string, required): The lost ID that needs to be resolved to a canonical ID.
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.
Start Bulk Job (action: DIFFBOT_START_BULK)
Tool to start a bulk extract job. Use when processing large numbers of URLs asynchronously.
Input parameters
name (string): Optional job name for identification.
urls (string[]): List of page URLs to process (max 1000).
apiUrl (string, required, format: uri): Full Extract API URL to call (must include the token).
urlList (string): Comma-separated list of URLs, or the URL of a public file containing URLs.
jobConfig (object): Advanced bulk job configuration object.
notifyEmail (string, format: email): Email address to notify when the job completes.
notifyWebhook (string, format: uri): Webhook URL to POST to on job completion.
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.
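Because `urls` is capped at 1000 entries, a larger list has to be split across several bulk jobs. The sketch below shows one way to do that; `chunk_urls` and `build_bulk_args` are hypothetical helpers, the check that `apiUrl` carries the token as a `token` query parameter is an assumption (the reference only says the URL must include the token), and the URLs in the usage note are placeholders.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import parse_qs, urlparse

MAX_URLS = 1000  # documented limit for `urls`


def chunk_urls(urls: List[str], limit: int = MAX_URLS) -> List[List[str]]:
    """Split a URL list into batches that each fit within one bulk job."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def build_bulk_args(
    api_url: str,
    urls: List[str],
    name: Optional[str] = None,
    notify_email: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an arguments dict for DIFFBOT_START_BULK (hypothetical helper)."""
    # Assumption: the token is passed as a non-empty `token` query parameter.
    if not parse_qs(urlparse(api_url).query).get("token"):
        raise ValueError("apiUrl must include a token query parameter")
    if not urls:
        raise ValueError("at least one URL is required")
    if len(urls) > MAX_URLS:
        raise ValueError(f"urls accepts at most {MAX_URLS} entries; split the list first")

    args: Dict[str, Any] = {"apiUrl": api_url, "urls": list(urls)}
    if name:
        args["name"] = name
    if notify_email:
        args["notifyEmail"] = notify_email
    return args
```

With 2500 URLs, `chunk_urls` returns three batches of 1000, 1000, and 500, and each batch can be passed to `build_bulk_args` with its own job name.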
Start Crawl Job (action: DIFFBOT_START_CRAWL)
Tool to spider a site for links and process them with the Extract API into a single collection. Use when you have seed URLs and want to collect structured data across a site. Requires a Plus plan for Crawl API access.
Input parameters
name (string, required): Unique name for the crawl job.
type (string, required): Type of content to extract (e.g., 'article', 'product').
seeds (string[], required): List of URLs to begin crawling from.
apiUrl (string): Custom Extract API endpoint to process crawled URLs.
repeat (string): Schedule for repeating this crawl (e.g., 'daily').
crawlDelay (number): Seconds to wait between requests to the same domain.
maxToCrawl (integer): Maximum number of pages to crawl before stopping.
notifyEmail (string): Email address to notify upon crawl completion.
maxToProcess (integer): Maximum number of pages to process through the Extract API.
customHeaders (object): Custom HTTP headers to include when crawling.
obeyRobotsTxt (boolean): Whether to respect robots.txt directives.
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.
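A crawl job has three required parameters and several optional limits, which a small typed structure can keep straight. This is an illustrative sketch: `CrawlJob` and `to_args` are hypothetical names, and the job name and seed URL in the usage note are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class CrawlJob:
    """Parameters for DIFFBOT_START_CRAWL (hypothetical container)."""

    name: str
    type: str
    seeds: List[str]
    crawl_delay: Optional[float] = None   # seconds between requests to one domain
    max_to_crawl: Optional[int] = None    # pages to crawl before stopping
    max_to_process: Optional[int] = None  # pages sent through the Extract API
    obey_robots_txt: Optional[bool] = None

    def to_args(self) -> Dict[str, Any]:
        """Return the arguments dict, omitting optional values left unset."""
        if not self.name or not self.type or not self.seeds:
            raise ValueError("name, type, and seeds are required")
        args: Dict[str, Any] = {
            "name": self.name,
            "type": self.type,
            "seeds": list(self.seeds),
        }
        optional = {
            "crawlDelay": self.crawl_delay,
            "maxToCrawl": self.max_to_crawl,
            "maxToProcess": self.max_to_process,
            "obeyRobotsTxt": self.obey_robots_txt,
        }
        # Keep explicit False or 0 values; drop only parameters that were never set.
        args.update({k: v for k, v in optional.items() if v is not None})
        return args
```

For example, `CrawlJob("docs-crawl", "article", ["https://example.com/"], crawl_delay=0.5, obey_robots_txt=True).to_args()` includes `crawlDelay` and `obeyRobotsTxt` but leaves out the unset page limits.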
Stop Bulk Job (action: DIFFBOT_STOP_BULK_JOB)
Tool to stop a running bulk job. Use when you need to halt further processing of URLs in a job in progress. Invoke only after confirming the jobId, to avoid stopping the wrong job.
Input parameters
jobId (string, required): Unique identifier of the bulk job to stop.
Output parameters
data (object, required): Data from the action execution.
error (string): Error message, if any occurred during the action execution.
successful (boolean, required): Whether the action execution was successful.