Document Splitting - Reducto Python SDK

The Split API divides a document into sections according to content categories you define, so that each part can be organized and processed separately.

Basic Usage

The synchronous example comes first, followed by the asynchronous equivalent.

from reducto import Reducto

client = Reducto()

response = client.split.run(
    input="https://example.com/document.pdf",
    split_description=[
        {"name": "summary", "description": "Executive summary section"},
        {"name": "financials", "description": "Financial data and tables"},
        {"name": "risks", "description": "Risk factors and disclosures"}
    ]
)
print(response)

import asyncio
from reducto import AsyncReducto

client = AsyncReducto()

async def main():
    response = await client.split.run(
        input="https://example.com/document.pdf",
        split_description=[
            {"name": "summary", "description": "Executive summary section"},
            {"name": "financials", "description": "Financial data and tables"},
            {"name": "risks", "description": "Risk factors and disclosures"}
        ]
    )
    print(response)

asyncio.run(main())

Method Signature

client.split.run(
    input: str,
    split_description: list[SplitCategory],
    parsing: ParseOptions | None = None,
    settings: dict | None = None,
    split_rules: str | None = None
) -> SplitResponse

Parameters

input (string, required)

The URL of the document to split. You can provide:

A publicly available URL
A presigned S3 URL
A reducto:// prefixed URL from the /upload endpoint
A jobid:// prefixed URL from a previous parse invocation
A list of URLs (for multi-document pipelines, V3 API only)
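The accepted input forms above can be told apart by their prefix. The following is a minimal sketch of that idea; `classify_input` and its return labels are hypothetical names chosen for illustration, not part of the SDK, and the identifiers after `reducto://` and `jobid://` are placeholders.

```python
def classify_input(value: str) -> str:
    """Return a coarse label for an `input` string (hypothetical helper, not an SDK call)."""
    if value.startswith("reducto://"):
        return "upload"        # URL returned by the /upload endpoint
    if value.startswith("jobid://"):
        return "previous_job"  # refers to an earlier parse invocation
    if value.startswith(("http://", "https://")):
        return "url"           # public or presigned URL; the prefix alone cannot distinguish them
    raise ValueError(f"unrecognized input: {value!r}")


# Placeholder values, used only to exercise each branch.
for example in (
    "https://example.com/document.pdf",
    "reducto://abc123",
    "jobid://abc123",
):
    print(example, "->", classify_input(example))
```

A check like this can run before the request is sent, so that a malformed value fails locally instead of in the API call.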

split_description (array, required)

List of category definitions for splitting the document. Each category should have:

name: Category identifier
description: Description of what content belongs in this category
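Since each category needs both a name and a description, it can help to check the list locally before sending it. The sketch below assumes plain dictionaries as used in the examples on this page; `SplitCategoryDict` and `check_categories` are hypothetical names, and the duplicate-name check is a local convention rather than a documented API requirement.

```python
from typing import TypedDict


class SplitCategoryDict(TypedDict):
    """Shape of one category as used in this page's examples (illustrative type)."""
    name: str
    description: str


def check_categories(categories: list[SplitCategoryDict]) -> list[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems = []
    seen_names = set()
    for index, category in enumerate(categories):
        # Both keys must be present and hold non-blank strings.
        for key in ("name", "description"):
            value = category.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"category {index}: missing or empty '{key}'")
        # Assumption: duplicate names are treated as a mistake here.
        name = category.get("name")
        if isinstance(name, str) and name.strip():
            if name in seen_names:
                problems.append(f"category {index}: duplicate name '{name}'")
            seen_names.add(name)
    return problems


categories = [
    {"name": "summary", "description": "Executive summary section"},
    {"name": "financials", "description": "Financial data and tables"},
]
print(check_categories(categories))
```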

parsing (ParseOptions, optional)

Configuration options for parsing the document. If you’re passing in a jobid:// URL, this will be ignored.
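Because parsing options are ignored for jobid:// inputs, a caller may prefer to leave them out in that case so the request reflects what will actually happen. A minimal sketch, assuming the keyword names from the method signature; `build_split_kwargs` is a hypothetical helper and the job identifier is a placeholder.

```python
def build_split_kwargs(input_url, categories, parsing=None):
    """Assemble keyword arguments for a split call (hypothetical helper).

    `parsing` is dropped for jobid:// inputs, where it would be ignored anyway.
    """
    kwargs = {"input": input_url, "split_description": categories}
    if parsing is not None and not input_url.startswith("jobid://"):
        kwargs["parsing"] = parsing
    return kwargs


categories = [{"name": "part1", "description": "First part"}]
parsing = {"formatting": {"add_page_markers": True}}

fresh = build_split_kwargs("https://example.com/document.pdf", categories, parsing)
reused = build_split_kwargs("jobid://abc123", categories, parsing)  # placeholder job ID
print(sorted(fresh))
print(sorted(reused))
```

The resulting dictionary could then be unpacked into the call, for example `client.split.run(**fresh)`.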

settings (object, optional)

Settings for split processing.

split_rules (string, optional)

Natural language prompt describing rules for splitting the document.

Split Categories

Define categories based on document structure:

The synchronous version is shown first, then the asynchronous one.

from reducto import Reducto

client = Reducto()

categories = [
    {
        "name": "introduction",
        "description": "Opening remarks and overview"
    },
    {
        "name": "methodology",
        "description": "Research methods and approach"
    },
    {
        "name": "results",
        "description": "Findings, data, and analysis"
    },
    {
        "name": "conclusion",
        "description": "Summary and final thoughts"
    }
]

response = client.split.run(
    input="https://example.com/research-paper.pdf",
    split_description=categories
)

# Access split sections
for section in response.sections:
    print(f"{section.category}: {len(section.content)} chars")

import asyncio
from reducto import AsyncReducto

client = AsyncReducto()

async def main():
    categories = [
        {
            "name": "introduction",
            "description": "Opening remarks and overview"
        },
        {
            "name": "methodology",
            "description": "Research methods and approach"
        },
        {
            "name": "results",
            "description": "Findings, data, and analysis"
        },
        {
            "name": "conclusion",
            "description": "Summary and final thoughts"
        }
    ]

    response = await client.split.run(
        input="https://example.com/research-paper.pdf",
        split_description=categories
    )

    # Access split sections
    for section in response.sections:
        print(f"{section.category}: {len(section.content)} chars")

asyncio.run(main())

Custom Split Rules

Add natural language rules to guide the splitting process:

from reducto import Reducto

client = Reducto()

response = client.split.run(
    input="https://example.com/contract.pdf",
    split_description=[
        {"name": "terms", "description": "Terms and conditions"},
        {"name": "pricing", "description": "Pricing and payment terms"},
        {"name": "warranties", "description": "Warranties and guarantees"}
    ],
    split_rules="Split at major section boundaries. Keep related clauses together. Preserve the hierarchy of subsections."
)
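Since split_rules is a single string, rules kept as a list in your own code need to be joined before the call. The sketch below shows one way to do that; `join_rules` is a hypothetical helper, not an SDK function.

```python
def join_rules(rules):
    """Join individual rule sentences into one prompt string (hypothetical helper)."""
    sentences = []
    for rule in rules:
        text = rule.strip()
        if not text:
            continue  # skip blank entries
        if text[-1] not in ".!?":
            text += "."  # make sure each rule ends as a sentence
        sentences.append(text)
    return " ".join(sentences)


split_rules = join_rules([
    "Split at major section boundaries",
    "Keep related clauses together.",
    "",
])
print(split_rules)
```

Keeping rules as a list makes it easier to add or remove one rule without editing a long string by hand.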

Split with Parsing Options

Combine splitting with custom parsing configuration:

from reducto import Reducto

client = Reducto()

response = client.split.run(
    input="https://example.com/document.pdf",
    split_description=[
        {"name": "text_sections", "description": "Text-heavy sections"},
        {"name": "data_sections", "description": "Sections with tables and charts"}
    ],
    parsing={
        "enhance": {
            "summarize_figures": True,
            "agentic": ["table"]
        },
        "formatting": {
            "table_output_format": "json",
            "add_page_markers": True
        }
    }
)

Async Job Processing

For large documents, use async job processing:

from reducto import Reducto

client = Reducto()

# Start an async split job
job = client.split.run_job(
    input="https://example.com/large-document.pdf",
    split_description=[
        {"name": "section1", "description": "First section"},
        {"name": "section2", "description": "Second section"}
    ],
    async_={
        "webhook": {"url": "https://example.com/webhook"}
    }
)

print(f"Job ID: {job.job_id}")

# Fetch the job's current state; repeat this call to poll until the job finishes
result = client.job.get(job.job_id)
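A single call to client.job.get returns the job's state at that moment, so polling means repeating it until the job is finished. The following is a minimal sketch of such a loop. It takes the fetch and completion check as arguments so it runs here against a stand-in; the status values "Pending" and "Completed" are assumptions for the demonstration, not confirmed API values.

```python
import time
from typing import Any, Callable


def poll_until_done(
    fetch: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval: float = 2.0,
    max_attempts: int = 30,
) -> Any:
    """Call `fetch` repeatedly until `is_done` accepts the result (hypothetical helper)."""
    for attempt in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        if attempt < max_attempts - 1:
            time.sleep(interval)  # wait before the next attempt
    raise TimeoutError(f"job not finished after {max_attempts} attempts")


# Stand-in for the real client: yields two pending states, then a completed one.
fake_statuses = iter(["Pending", "Pending", "Completed"])
final = poll_until_done(
    fetch=lambda: {"status": next(fake_statuses)},
    is_done=lambda result: result["status"] == "Completed",
    interval=0.0,
)
print(final)
```

With the real client, `fetch` would wrap `client.job.get(job.job_id)` and `is_done` would inspect whatever completion field the response exposes. If a webhook is configured, as in the example above, polling may not be needed at all.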

Financial Document Example

Split a financial report into meaningful sections:

from reducto import Reducto

client = Reducto()

categories = [
    {
        "name": "executive_summary",
        "description": "High-level overview and key highlights"
    },
    {
        "name": "financial_statements",
        "description": "Income statement, balance sheet, cash flow"
    },
    {
        "name": "md_and_a",
        "description": "Management discussion and analysis"
    },
    {
        "name": "footnotes",
        "description": "Accounting notes and disclosures"
    },
    {
        "name": "risk_factors",
        "description": "Risk disclosures and forward-looking statements"
    }
]

response = client.split.run(
    input="https://example.com/annual-report.pdf",
    split_description=categories,
    split_rules="Preserve table integrity. Keep footnote references with their corresponding sections."
)

# Process each section separately
for section in response.sections:
    print(f"\n=== {section.category.upper()} ===")
    print(section.content[:200] + "...")
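To process sections by category rather than one at a time, the contents can be grouped first. This sketch uses a stand-in `FakeSection` class with the same `category` and `content` attributes as the loop above, so it runs without an API call. Whether a response can contain several sections with the same category is an assumption here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeSection:
    """Stand-in for a response section; mirrors the attributes used on this page."""
    category: str
    content: str


def group_by_category(sections):
    """Map each category name to the list of its section contents, in input order."""
    grouped = defaultdict(list)
    for section in sections:
        grouped[section.category].append(section.content)
    return dict(grouped)


# Placeholder data for illustration only.
sections = [
    FakeSection("risk_factors", "Market risk"),
    FakeSection("footnotes", "Note 1"),
    FakeSection("risk_factors", "Credit risk"),
]
grouped = group_by_category(sections)
for category, contents in grouped.items():
    print(category, len(contents))
```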

Reusing Parsed Documents

Split a document that was previously parsed:

from reducto import Reducto

client = Reducto()

# First parse the document
parse_response = client.parse.run(
    input="https://example.com/document.pdf",
    formatting={"add_page_markers": True}
)

# Then split using the job ID (no re-parsing needed)
split_response = client.split.run(
    input=f"jobid://{parse_response.job_id}",
    split_description=[
        {"name": "part1", "description": "First part"},
        {"name": "part2", "description": "Second part"}
    ]
)
