Configuration Types - Reducto Python SDK

Settings

General document processing settings.

document_password

string

Password to decrypt password-protected documents.

embed_pdf_metadata

boolean

If True, embed OCR metadata into the returned PDF. Defaults to False.

extraction_mode

enum

The mode to use for text extraction from PDFs. One of:

ocr - Uses optical character recognition only
hybrid - Combines OCR with embedded PDF text for best accuracy (default)

force_file_extension

string

Force the URL to be downloaded as a specific file extension (e.g. .png).

force_url_result

boolean

Force the result to be returned in URL form.

ocr_system

enum

OCR system to use. One of:

standard - Best multilingual OCR system
legacy - Only supports Germanic languages (for backwards compatibility)

page_range

Union[PageRange, List[PageRange], List[int]]

The page range to process (1-indexed). By default, the entire document is processed.

persist_results

boolean

If True, persist the results indefinitely. Defaults to False.

return_images

List[enum]

Block types for which to return images. Options: figure, table. By default, no images are returned.

return_ocr_data

boolean

If True, return OCR data in the result. Defaults to False.

timeout

float

The timeout for the job in seconds.
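As a rough illustration, the Settings fields can be mirrored as a Python TypedDict so that a type checker catches misspelled keys or unsupported enum values. This is a sketch that assumes the options are passed as plain dictionaries keyed by the field names above. The class name and the dict form are assumptions, not the SDK's own type definitions. PageRange is not defined in this section, so only the List[int] form of page_range is modeled.

```python
from typing import List, Literal, TypedDict


# Hypothetical typed mirror of the documented Settings fields.
# total=False because every field is optional.
class Settings(TypedDict, total=False):
    document_password: str
    embed_pdf_metadata: bool
    extraction_mode: Literal["ocr", "hybrid"]
    force_file_extension: str
    force_url_result: bool
    ocr_system: Literal["standard", "legacy"]
    page_range: List[int]  # only the List[int] variant is modeled here
    persist_results: bool
    return_images: List[Literal["figure", "table"]]
    return_ocr_data: bool
    timeout: float


# Process only pages 1-3 with hybrid extraction, return table images,
# and allow the job two minutes.
settings: Settings = {
    "extraction_mode": "hybrid",
    "ocr_system": "standard",
    "page_range": [1, 2, 3],
    "return_images": ["table"],
    "timeout": 120.0,
}
```

A TypedDict adds no runtime checks. It only helps static tools such as mypy flag an invalid value like extraction_mode="fast".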

Enhance

Configuration for enhancing extraction accuracy using AI models.

agentic

List[Union[TableAgentic, FigureAgentic, TextAgentic]]

Agentic mode uses vision language models to improve the accuracy of different types of extraction. Enabling it increases both cost and latency.

summarize_figures

boolean

If True, summarize figures using a small vision language model. Defaults to True.

TableAgentic

scope

literal

required

Always set to “table”.

prompt

string

Custom prompt for table agentic processing.

FigureAgentic

scope

literal

required

Always set to “figure”.

advanced_chart_agent

boolean

If True, use the advanced chart agent. Defaults to False.

prompt

string

Custom prompt for figure agentic processing.

return_overlays

boolean

If True, return overlays for the figure. This allows you to verify the quality of the extraction.

TextAgentic

scope

literal

required

Always set to “text”.

prompt

string

Custom instructions for text agentic processing. Note: this applies only to form (key-value) regions.
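The agentic list holds one entry per extraction type, and each entry's required scope field selects TableAgentic, FigureAgentic, or TextAgentic. The sketch below assumes these entries are written as plain dicts. The prompt strings and the agentic_scopes helper are hypothetical, included only to show how the scope discriminator can be checked before a request is sent.

```python
from typing import Any, Dict, List

# Hypothetical Enhance configuration. Each "agentic" entry is a dict whose
# required "scope" field picks TableAgentic, FigureAgentic, or TextAgentic.
enhance: Dict[str, Any] = {
    "agentic": [
        {"scope": "table", "prompt": "Preserve merged header cells."},
        {"scope": "figure", "advanced_chart_agent": True, "return_overlays": True},
        {"scope": "text", "prompt": "Treat checkbox fields as true/false values."},
    ],
    "summarize_figures": True,
}

VALID_SCOPES = {"table", "figure", "text"}


def agentic_scopes(config: Dict[str, Any]) -> List[str]:
    """Return the scopes requested in an Enhance dict, rejecting unknown ones."""
    scopes = []
    for entry in config.get("agentic", []):
        scope = entry["scope"]  # scope is required on every agentic entry
        if scope not in VALID_SCOPES:
            raise ValueError(f"unknown agentic scope: {scope!r}")
        scopes.append(scope)
    return scopes


print(agentic_scopes(enhance))  # ['table', 'figure', 'text']
```

Because each agentic entry adds cost and latency, listing only the scopes you actually need is usually the better choice.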

Formatting

Configuration for output formatting options.

add_page_markers

boolean

If True, add page markers to the output. Defaults to False. Useful when extracting data that depends on page-specific information.

include

List[enum]

A list of formatting elements to include in the output. Options:

change_tracking
highlight
comments
hyperlinks
signatures

merge_tables

boolean

If True, merge consecutive tables that have the same number of columns. Defaults to False.

table_output_format

enum

The mode to use for table output. Options:

html - HTML table format
json - JSON array format
md - Markdown table format
jsonbbox - JSON with bounding boxes
dynamic - Returns md for simpler tables and html for complex tables (default)
csv - CSV format
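A small builder can reject values that are not among the documented options before a request is made. This is a sketch assuming Formatting is passed as a plain dict keyed by the field names above. The make_formatting helper and its keyword defaults are hypothetical; the defaults simply mirror the documented ones (dynamic tables, no page markers, no table merging).

```python
from typing import Any, Dict, Iterable

TABLE_FORMATS = {"html", "json", "md", "jsonbbox", "dynamic", "csv"}
INCLUDE_OPTIONS = {"change_tracking", "highlight", "comments", "hyperlinks", "signatures"}


def make_formatting(
    table_output_format: str = "dynamic",
    include: Iterable[str] = (),
    add_page_markers: bool = False,
    merge_tables: bool = False,
) -> Dict[str, Any]:
    """Build a hypothetical Formatting dict, rejecting undocumented values."""
    if table_output_format not in TABLE_FORMATS:
        raise ValueError(f"unsupported table_output_format: {table_output_format!r}")
    include = list(include)
    bad = [item for item in include if item not in INCLUDE_OPTIONS]
    if bad:
        raise ValueError(f"unsupported include options: {bad}")
    return {
        "table_output_format": table_output_format,
        "include": include,
        "add_page_markers": add_page_markers,
        "merge_tables": merge_tables,
    }


# Markdown tables, tracked changes and comments kept, page markers enabled.
formatting = make_formatting("md", include=["change_tracking", "comments"], add_page_markers=True)
```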

Retrieval

Configuration for retrieval and chunking behavior.

chunking

Chunking

Chunking configuration for the document.

embedding_optimized

boolean

If True, use embedding-optimized mode. Defaults to False.

filter_blocks

List[enum]

A list of block types to filter out from ‘content’ and ‘embed’ fields. By default, no blocks are filtered. Options:

Header
Footer
Title
Section Header
Page Number
List Item
Figure
Table
Key Value
Text
Comment
Signature

Chunking

Configuration for document chunking behavior.

chunk_mode

enum

Choose how to partition chunks. Options:

variable - Chunks by character length and visual context
section - Chunks by section headers
page - Chunks according to pages
page_sections - Chunks first by page, then by sections within each page
disabled - Returns one single chunk
block - Chunks by individual blocks

chunk_size

integer

The approximate size of chunks (in characters) that the document will be split into. Defaults to None, in which case the chunk size varies between 250 and 1500 characters.
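The Retrieval type nests a Chunking object and adds block filtering. The sketch below assumes both are written as plain dicts keyed by the documented field names. The make_retrieval helper is hypothetical. It leaves chunk_size out entirely when it is None, so the documented default (a variable size between 250 and 1500 characters) applies.

```python
from typing import Any, Dict, Iterable, Optional

CHUNK_MODES = {"variable", "section", "page", "page_sections", "disabled", "block"}
BLOCK_TYPES = {
    "Header", "Footer", "Title", "Section Header", "Page Number", "List Item",
    "Figure", "Table", "Key Value", "Text", "Comment", "Signature",
}


def make_retrieval(
    chunk_mode: str = "variable",
    chunk_size: Optional[int] = None,
    filter_blocks: Iterable[str] = (),
    embedding_optimized: bool = False,
) -> Dict[str, Any]:
    """Build a hypothetical Retrieval dict with a nested Chunking dict."""
    if chunk_mode not in CHUNK_MODES:
        raise ValueError(f"unsupported chunk_mode: {chunk_mode!r}")
    filter_blocks = list(filter_blocks)
    unknown = [block for block in filter_blocks if block not in BLOCK_TYPES]
    if unknown:
        raise ValueError(f"unsupported block types: {unknown}")
    chunking: Dict[str, Any] = {"chunk_mode": chunk_mode}
    # Omit chunk_size when None so the default variable sizing applies.
    if chunk_size is not None:
        chunking["chunk_size"] = chunk_size
    return {
        "chunking": chunking,
        "filter_blocks": filter_blocks,
        "embedding_optimized": embedding_optimized,
    }


# Chunk by section headers and drop running headers, footers and page numbers.
retrieval = make_retrieval(
    "section",
    filter_blocks=["Header", "Footer", "Page Number"],
    embedding_optimized=True,
)
```

Filtering repetitive blocks such as headers and footers keeps them from being duplicated in every chunk's content and embed fields.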

ChunkingConfig

Alternate chunking configuration (similar to Chunking).

chunk_mode

enum

Choose how to partition chunks. Options:

variable - Chunks by character length and visual context
section - Chunks by section headers
page - Chunks according to pages
page_sections - Chunks first by page, then by sections within each page
disabled - Returns one single chunk
block - Chunks by individual blocks

chunk_size

integer

The approximate size of chunks (in characters) that the document will be split into. Defaults to None, in which case the chunk size varies between 250 and 1500 characters.

EnrichConfig

Configuration for content enrichment using AI models.

enabled

boolean

If True, a large language or vision model post-processes the extracted content. Note: enabling enrichment requires tables to be output in Markdown format (table_output_format set to md). Defaults to False.

mode

enum

The mode to use for enrichment. Options:

standard - Standard enrichment (default)
page - Page-level enrichment
table - Table-level enrichment

prompt

string

Additional information to include in the enrichment prompt.
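Since enrichment requires Markdown table output, it helps to check the two settings together before sending a request. This section does not show where an EnrichConfig is attached in a request, so the hypothetical check_enrich helper below takes the enrich and formatting dicts separately. It treats a missing table_output_format as the documented default (dynamic), which does not satisfy the requirement.

```python
from typing import Any, Dict

ENRICH_MODES = {"standard", "page", "table"}


def check_enrich(enrich: Dict[str, Any], formatting: Dict[str, Any]) -> None:
    """Raise if an EnrichConfig dict conflicts with the Formatting dict."""
    mode = enrich.get("mode", "standard")
    if mode not in ENRICH_MODES:
        raise ValueError(f"unsupported enrich mode: {mode!r}")
    # A missing table_output_format means the default "dynamic", not "md".
    table_format = formatting.get("table_output_format", "dynamic")
    if enrich.get("enabled", False) and table_format != "md":
        raise ValueError("enrich requires table_output_format='md'")


# Table-level enrichment with extra guidance for the model.
enrich = {"enabled": True, "mode": "table", "prompt": "Expand abbreviations in column headers."}
check_enrich(enrich, {"table_output_format": "md"})  # passes silently
```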

Spreadsheet

Configuration for spreadsheet processing.

clustering

enum

Controls how a spreadsheet that contains multiple tables is split into separate tables. Options:

accurate - Applies more powerful models for superior accuracy, at 5× the default per-cell rate
fast - Default clustering mode
disabled - Disables clustering; the sheet is treated as one large table

exclude

List[enum]

Elements to exclude from the output. Options:

hidden_sheets
hidden_rows
hidden_cols
styling
spreadsheet_images

include

List[enum]

Elements to include in the output. Options:

cell_colors
formula

split_large_tables

SplitLargeTables

Configuration for splitting large tables.

SplitLargeTables

enabled

boolean

If True, split large tables into smaller tables. Defaults to True.

size

integer

The size of each table after splitting. Defaults to 50.
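A Spreadsheet configuration combines clustering, include/exclude lists, and a nested SplitLargeTables object. The sketch below assumes these are written as plain dicts keyed by the documented field names. The make_spreadsheet helper is hypothetical; its defaults mirror the documented ones (fast clustering, splitting enabled with size 50). Note that accurate clustering is billed at 5x the default per-cell rate, so it is not used by default here.

```python
from typing import Any, Dict, Iterable

CLUSTERING_MODES = {"accurate", "fast", "disabled"}
EXCLUDE_OPTIONS = {"hidden_sheets", "hidden_rows", "hidden_cols", "styling", "spreadsheet_images"}
INCLUDE_OPTIONS = {"cell_colors", "formula"}


def make_spreadsheet(
    clustering: str = "fast",
    exclude: Iterable[str] = (),
    include: Iterable[str] = (),
    split_enabled: bool = True,
    split_size: int = 50,
) -> Dict[str, Any]:
    """Build a hypothetical Spreadsheet dict with a nested SplitLargeTables dict."""
    if clustering not in CLUSTERING_MODES:
        raise ValueError(f"unsupported clustering mode: {clustering!r}")
    exclude, include = list(exclude), list(include)
    bad = (set(exclude) - EXCLUDE_OPTIONS) | (set(include) - INCLUDE_OPTIONS)
    if bad:
        raise ValueError(f"unsupported options: {sorted(bad)}")
    return {
        "clustering": clustering,
        "exclude": exclude,
        "include": include,
        "split_large_tables": {"enabled": split_enabled, "size": split_size},
    }


# Skip hidden content, keep formulas, and use the default table splitting.
spreadsheet = make_spreadsheet(
    exclude=["hidden_sheets", "hidden_rows", "hidden_cols"],
    include=["formula"],
)
```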

ArrayExtractConfig

Configuration for array extraction of long lists.

enabled

boolean

If True, enable array extraction, which lets you extract long lists of information from lengthy documents by making parallel calls on overlapping sections of the document.

mode

enum

The array extraction mode to use. Options:

auto - Automatically selects the best mode
legacy - Legacy extraction mode
streaming - Streaming extraction mode
no_overlap - No overlap between segments

pages_per_segment

integer

Length of each segment, in pages, for parallel calls with array extraction.

streaming_extract_item_density

integer

Number of items to extract in each stream call. Lower numbers increase quality but are much slower. 50 works well for most documents with tables.
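To make "parallel calls on overlapping sections" concrete, the sketch below splits a document's pages into overlapping windows of pages_per_segment pages. This is an illustration only: the one-page overlap and the overlapping_segments helper are assumptions, since the amount of overlap and the way segments are actually formed are not documented here. The array_extract dict likewise assumes the documented field names are passed as plain dict keys.

```python
from typing import List, Tuple

# Hypothetical ArrayExtractConfig written as a plain dict.
array_extract = {
    "enabled": True,
    "mode": "streaming",
    "pages_per_segment": 4,
    "streaming_extract_item_density": 50,
}


def overlapping_segments(
    num_pages: int, pages_per_segment: int, overlap: int = 1
) -> List[Tuple[int, int]]:
    """Split pages 1..num_pages into inclusive (start, end) windows.

    Consecutive windows share `overlap` pages (an assumed value), so an
    item that straddles a segment boundary appears in two windows.
    """
    if pages_per_segment <= overlap:
        raise ValueError("pages_per_segment must exceed overlap")
    if num_pages < 1:
        return []
    step = pages_per_segment - overlap
    segments = []
    start = 1
    while True:
        end = min(start + pages_per_segment - 1, num_pages)
        segments.append((start, end))
        if end == num_pages:
            break
        start += step
    return segments


print(overlapping_segments(10, array_extract["pages_per_segment"]))
# [(1, 4), (4, 7), (7, 10)]
```

Overlap trades some duplicated work for resilience at segment boundaries. Duplicated items would then need to be deduplicated when results are merged.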

ParseOptions

Complete parsing options combining all configuration types.

enhance

Enhance

Enhancement configuration.

formatting

Formatting

Formatting configuration.

retrieval

Retrieval

Retrieval configuration.

settings

Settings

General settings configuration.

spreadsheet

Spreadsheet

Spreadsheet-specific configuration.
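ParseOptions simply groups the five configuration sections under fixed keys. The sketch below assumes the whole object is a plain nested dict. The make_parse_options helper is hypothetical: it rejects any section name not listed above and drops empty sections so that server-side defaults apply.

```python
from typing import Any, Dict

PARSE_OPTION_KEYS = {"enhance", "formatting", "retrieval", "settings", "spreadsheet"}


def make_parse_options(**sections: Dict[str, Any]) -> Dict[str, Any]:
    """Combine hypothetical section dicts into one ParseOptions dict."""
    unknown = set(sections) - PARSE_OPTION_KEYS
    if unknown:
        raise ValueError(f"unknown ParseOptions sections: {sorted(unknown)}")
    # Empty sections are dropped so their defaults are left untouched.
    return {name: cfg for name, cfg in sections.items() if cfg}


options = make_parse_options(
    settings={"page_range": [1, 2], "timeout": 60.0},
    formatting={"table_output_format": "html"},
    retrieval={"chunking": {"chunk_mode": "page"}},
    enhance={},
)
```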

WebhookConfigNew

Configuration for webhook delivery.

channels

List[string]

A list of Svix channels the message will be delivered to. Omit to send to all channels.

metadata

object

JSON metadata included in webhook request body.

mode

enum

The mode to use for webhook delivery. Options:

disabled - No webhook delivery (default)
svix - Use Svix for webhook delivery (recommended for production)
direct - Direct webhook delivery

url

string

The URL to send the webhook to (if using direct webhook).
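The webhook fields interact by mode: channels are Svix-specific, while url is used for direct delivery. The sketch below assumes a plain dict keyed by the documented field names. Treating url as required for direct mode, and rejecting channels outside Svix mode, are assumptions drawn from the field descriptions, not documented validation rules. The check_webhook helper, the example URL, and the metadata values are hypothetical.

```python
from typing import Any, Dict

WEBHOOK_MODES = {"disabled", "svix", "direct"}


def check_webhook(cfg: Dict[str, Any]) -> None:
    """Raise if a hypothetical WebhookConfigNew dict is inconsistent."""
    mode = cfg.get("mode", "disabled")  # disabled is the documented default
    if mode not in WEBHOOK_MODES:
        raise ValueError(f"unsupported webhook mode: {mode!r}")
    # Assumption: direct delivery has nowhere to go without a URL.
    if mode == "direct" and not cfg.get("url"):
        raise ValueError("direct webhook delivery needs a url")
    # Assumption: channels only make sense when delivering through Svix.
    if cfg.get("channels") and mode != "svix":
        raise ValueError("channels are only used with svix delivery")


direct = {
    "mode": "direct",
    "url": "https://example.com/hooks/reducto",
    "metadata": {"batch": "invoices-7"},  # echoed back in the webhook body
}
check_webhook(direct)  # passes silently
```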
