The Parse API converts documents into structured content, extracting text, tables, images, and layouts with high accuracy.

Basic Usage

from reducto import Reducto

client = Reducto()

response = client.parse.run(
    input="https://example.com/document.pdf"
)
print(response)

Method Signature

client.parse.run(
    input: str,
    enhance: Enhance | None = None,
    formatting: Formatting | None = None,
    retrieval: Retrieval | None = None,
    settings: Settings | None = None,
    spreadsheet: Spreadsheet | None = None,
    async_: ConfigV3AsyncConfig | None = None
) -> ParseRunResponse

Parameters

input
string
required
The URL of the document to parse. You can provide:
  • A publicly available URL
  • A presigned S3 URL
  • A reducto:// prefixed URL from the /upload endpoint
  • A jobid:// prefixed URL from a previous parse invocation
  • A list of URLs (for multi-document pipelines, V3 API only)
enhance
Enhance
Enhancement options for improving extraction accuracy.
formatting
Formatting
Control output formatting and structure.
retrieval
Retrieval
Configure chunking for retrieval-optimized output.
settings
Settings
Processing settings and preferences.
spreadsheet
Spreadsheet
Spreadsheet-specific parsing options.
async_
ConfigV3AsyncConfig
Configuration for asynchronous processing. When provided, the request returns immediately with a job ID.
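
The input forms listed above can be told apart by their URL scheme. The following is a minimal sketch of a client-side check, not part of the SDK: classify_input and its return labels are hypothetical names chosen for illustration, and a presigned S3 URL is treated the same as any other https URL because the two cannot be reliably distinguished by scheme alone.

```python
from urllib.parse import urlparse


def classify_input(value: str) -> str:
    """Label a Parse input string by its URL scheme (hypothetical helper)."""
    scheme = urlparse(value).scheme.lower()
    if scheme == "reducto":
        # Returned by the /upload endpoint
        return "upload"
    if scheme == "jobid":
        # Refers to a previous parse invocation
        return "previous_job"
    if scheme in ("http", "https"):
        # Public URL or presigned S3 URL
        return "url"
    raise ValueError(f"Unrecognized input: {value!r}")
```

A check like this can catch a bare local path (for example "document.pdf") before a request is sent, since a local file has to be uploaded first.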

Advanced Example

from reducto import Reducto

client = Reducto()

response = client.parse.run(
    input="https://example.com/document.pdf",
    enhance={
        "summarize_figures": True,
        "agentic": ["table", "figure"]
    },
    formatting={
        "add_page_markers": True,
        "table_output_format": "json",
        "merge_tables": False
    }
)

# Access the parsed content
print(response.content)

Async Job Processing

For documents that take a long time to process, use run_job() to start the parse asynchronously:

from reducto import Reducto

client = Reducto()

# Start an async job
job = client.parse.run_job(
    input="https://example.com/large-document.pdf",
    async_={"webhook": {"url": "https://example.com/webhook"}}
)

print(f"Job ID: {job.job_id}")

# Fetch the job's current state; repeat this call to poll until the job finishes
result = client.job.get(job.job_id)
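
A single client.job.get() call returns the job's state at that moment, so waiting for completion needs a loop. The sketch below is one way to write that loop, under stated assumptions: wait_for_job is a hypothetical helper, not an SDK function, and it takes the fetch call and the completion check as arguments because the shape of the job response (for example, a status field and its values) is not specified on this page.

```python
import time
from typing import Any, Callable


def wait_for_job(
    fetch: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> Any:
    """Call fetch() until is_done(result) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("Job did not finish before the timeout")
        time.sleep(interval)


# Possible usage (assumption: the job response exposes a `status` attribute
# with terminal values such as "Completed" and "Failed"; check the API
# reference for the actual field names):
#
# result = wait_for_job(
#     fetch=lambda: client.job.get(job.job_id),
#     is_done=lambda r: r.status in ("Completed", "Failed"),
# )
```

If a webhook is configured, as in the example above, polling may be unnecessary; the loop is mainly useful for scripts that have no endpoint to receive the callback.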

Input Formats

The Parse API supports multiple input methods:

Direct URL

response = client.parse.run(
    input="https://example.com/document.pdf"
)

File Upload

from pathlib import Path

# First upload the file
upload_response = client.upload(
    file=Path("/path/to/document.pdf")
)

# Then parse using the reducto:// URL
response = client.parse.run(
    input=upload_response.url
)

Reuse Previous Parse

# Reuse the output of an earlier parse job. previous_job_id is that job's ID
# (for example, job.job_id returned by run_job()).
response = client.parse.run(
    input=f"jobid://{previous_job_id}"
)
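
Because the jobid:// input is built by string formatting, an empty or already-prefixed ID can silently produce a malformed reference. A small guard, sketched here with the hypothetical helper name job_reference, keeps that in one place:

```python
def job_reference(job_id: str) -> str:
    """Build a jobid:// input string from a job ID (hypothetical helper)."""
    job_id = job_id.strip()
    if not job_id:
        raise ValueError("job_id must not be empty")
    if job_id.startswith("jobid://"):
        # Already a reference; return it unchanged rather than double-prefixing
        return job_id
    return f"jobid://{job_id}"
```

The result can be passed as input to client.parse.run() in place of the f-string above.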