extract.run()

Extracts structured data from a document synchronously based on provided instructions.
client.extract.run(
    input="https://example.com/document.pdf",
    instructions={"schema": {...}, "prompt": "Extract all contact information"},
    parsing={...},
    settings={...}
)

Parameters

input
string | list[string]
required
The URL of the document to be processed. You can provide one of the following:
  1. A publicly available URL
  2. A presigned S3 URL
  3. A reducto:// prefixed URL obtained from the /upload endpoint after directly uploading a document
  4. A jobid:// prefixed URL obtained from a previous /parse invocation
  5. A list of URLs (for multi-document pipelines, V3 API only)
instructions
object
The instructions to use for the extraction. Define the schema and extraction prompts.
parsing
ParseOptions
The configuration options for parsing the document. If input is a jobid:// URL, this configuration is ignored.
settings
object
The settings to use for the extraction.
async_
ConfigV3AsyncConfig
The configuration options for asynchronous processing. Processing is synchronous by default; these options apply only in async mode.
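
The accepted forms of input listed above can be told apart by their prefix. The following is a minimal sketch rather than part of the SDK: classify_input is a hypothetical helper, and the labels it returns are illustrative names chosen here.

```python
def classify_input(value):
    """Label an `input` value according to the forms listed for the parameter.

    Hypothetical helper for illustration only; it is not part of the SDK.
    """
    if isinstance(value, list):
        # A list of URLs (multi-document pipelines).
        return "url_list"
    if value.startswith("reducto://"):
        # Obtained from the /upload endpoint after a direct upload.
        return "uploaded_file"
    if value.startswith("jobid://"):
        # Obtained from a previous /parse invocation.
        return "previous_parse_job"
    if value.startswith(("http://", "https://")):
        # A publicly available URL or a presigned S3 URL.
        return "remote_url"
    raise ValueError(f"unrecognized input: {value!r}")
```

A check like this can catch a malformed value before any request is sent.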

Response

ExtractRunResponse
ExtractResponse | AsyncExtractResponse
Returns either an ExtractResponse with the extracted data (sync mode) or an AsyncExtractResponse containing a job_id (async mode).
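
Because the return value can be either type, calling code may want to branch on it. The sketch below uses local stand-in classes (SyncStub and AsyncStub, both hypothetical) so that it runs without the SDK. In real code the isinstance check would use the SDK's own AsyncExtractResponse type, and the result field on the sync stand-in is an assumption, not a documented attribute.

```python
from dataclasses import dataclass


@dataclass
class SyncStub:
    """Stand-in for ExtractResponse; the `result` field is an assumption."""
    result: object


@dataclass
class AsyncStub:
    """Stand-in for AsyncExtractResponse, which contains a job_id."""
    job_id: str


def describe(response):
    """Return a short description depending on which response type came back."""
    if isinstance(response, AsyncStub):
        return f"queued as job {response.job_id}"
    return "extraction finished"


print(describe(SyncStub(result={"name": "Ada"})))  # extraction finished
print(describe(AsyncStub(job_id="job-123")))       # queued as job job-123
```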

extract.run_job()

Extracts structured data from a document asynchronously and returns a job ID immediately.
response = client.extract.run_job(
    input="https://example.com/document.pdf",
    instructions={"schema": {...}, "prompt": "Extract all contact information"},
    async_={"webhook": {"url": "https://example.com/webhook"}},
    parsing={...},
    settings={...}
)

print(response.job_id)  # Use this to check job status later

Parameters

input
string | list[string]
required
The URL of the document to be processed. You can provide one of the following:
  1. A publicly available URL
  2. A presigned S3 URL
  3. A reducto:// prefixed URL obtained from the /upload endpoint after directly uploading a document
  4. A jobid:// prefixed URL obtained from a previous /parse invocation
  5. A list of URLs (for multi-document pipelines, V3 API only)
instructions
object
The instructions to use for the extraction. Define the schema and extraction prompts.
async_
ConfigV3AsyncConfig
The configuration options for asynchronous processing, such as the webhook shown in the example above.
parsing
ParseOptions
The configuration options for parsing the document. If input is a jobid:// URL, this configuration is ignored.
settings
object
The settings to use for the extraction.
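
Since parsing is ignored for jobid:// inputs, one option is to leave it out of the call in that case. The following sketch assumes a hypothetical helper, build_extract_kwargs, that assembles the keyword arguments; it is not part of the SDK.

```python
def build_extract_kwargs(source, instructions, parsing=None, settings=None):
    """Assemble keyword arguments for an extract call.

    Hypothetical helper: omits `parsing` when `source` is a jobid:// URL,
    because the reference states that the configuration is ignored then.
    """
    kwargs = {"input": source, "instructions": instructions}
    is_job_ref = isinstance(source, str) and source.startswith("jobid://")
    if parsing is not None and not is_job_ref:
        kwargs["parsing"] = parsing
    if settings is not None:
        kwargs["settings"] = settings
    return kwargs
```

The resulting dictionary could then be unpacked into the call, for example client.extract.run_job(**kwargs).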

Response

ExtractRunJobResponse
object
job_id
string
The ID of the asynchronous job. Use client.job.get(job_id) to retrieve the result when the job completes.
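
Retrieving the result usually means polling until the job finishes. The sketch below is one way to do that under stated assumptions: wait_for_job is a hypothetical helper, and the completion check is passed in as is_finished because the shape of the job object is not described here.

```python
import time


def wait_for_job(get_job, job_id, is_finished, interval=2.0,
                 max_attempts=30, sleep=time.sleep):
    """Call get_job(job_id) until is_finished(job) is true or attempts run out.

    Hypothetical helper: `get_job` would typically be client.job.get, and
    `is_finished` is a caller-supplied check on the returned job object.
    """
    for attempt in range(max_attempts):
        job = get_job(job_id)
        if is_finished(job):
            return job
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"job {job_id} not finished after {max_attempts} attempts")
```

Passing the completion check as a function keeps the loop independent of any particular status field, and a webhook configured through async_ is an alternative to polling.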