Excel → CSV · CSV → Parquet · JSON → Excel · or anything else
Effective Date: March 7, 2026
We never use your files or data in any way, except to process your request. Period.
The files you submit are stored on our servers only for the duration of your visit to the site, plus fifteen minutes. File contents are never retained beyond that window, nor are they ever shared with any third party.
We log your email address, the types of conversions you perform, incidental file metadata, and timestamps for performance monitoring and abuse prevention. Logs are retained for 90 days and are never shared with any third party.
No user information beyond originating IP address and file metadata is retained for users of the free service. No credit card or email information is required to use the free service.
We reserve the right to rate-limit requests from IP addresses that appear to be abusing or attacking the service.
All payments are processed by Stripe. We never see nor have access to your credit card information; Stripe simply informs us that your payment was valid.
Absolutely none of your information is ever shared outside this site: no logs, no file metadata, no email addresses, no file contents. Your data resides on our servers for the duration of your visit plus fifteen minutes, then is permanently deleted.
We protect user data with standard industry best-practices to prevent unauthorized information collection and disclosure.
There is a contact/feedback form on the Contacts page. We encourage you to reach out if you have questions, concerns, or suggestions. We read every message.
This page will always contain our current security information, with a date at the top for the last time it was updated.
No account required to try. Convert up to 100 rows free. Register for more.
Free
$0
no account required
Free · Verified
$0
register + verify email
Standard
$29
per month · auto-renews
Professional
$79
per month · auto-renews
Conversion Credits — $10 for 25 conversions
Pay as you go. One credit converts one file, no expiry. Gets you Standard-tier features (unlimited rows, 500 MB, all formats). Credits stack — buy more anytime. No API, CLI, MCP, Fixed-Width, or EBCDIC access.
Subscriptions renew monthly. Cancel anytime — access continues through the paid period. All prices in USD.
Create a free account
Register with your email to unlock 250 rows and 25 conversions per day. We'll send a verification link — no password needed.
Already have an access key?
Lost your key? Resend it to your email.
Upload a file, preview it, transform it, and download it in any supported format. Files are processed in memory and held only in the 15-minute session cache, never stored permanently.
Professional subscribers can also use Reparatio from the command line without writing raw HTTP calls.
CLI — reparatio-cli
Install once with pipx or uv; use the reparatio command from any terminal.
pipx install reparatio-cli
reparatio key set rp_YOUR_KEY
reparatio convert sales.csv sales.parquet
reparatio inspect report.xlsx

View on GitHub →
In early alpha — interfaces may change. Requires a Professional subscription API key.
Input
CSV · TSV · GZ (any supported format inside) · ZIP (any supported format inside) · BZ2 (any supported format inside) · ZST (any supported format inside)
Excel (.xlsx, .xls) · ODS
JSON · JSON Lines · GeoJSON
YAML (.yaml, .yml) · XML · BSON
SQL dump (.sql) · SQLite (.sqlite, .db)
Parquet · Feather · Arrow · ORC · Avro
PDF (text-layer tables) · HTML tables
Markdown tables (.md, .markdown)
Subtitles (.srt, .vtt)
Output
CSV · TSV · CSV.GZ · CSV.BZ2 · CSV.ZST · CSV.ZIP · TSV.GZ · TSV.BZ2 · TSV.ZST · TSV.ZIP
Excel (.xlsx) · ODS
JSON · JSON.GZ · JSON.BZ2 · JSON.ZST · JSON.ZIP · JSON Lines · JSON Lines GZ · JSON Lines BZ2 · JSON Lines ZST · JSON Lines ZIP
GeoJSON · GeoJSON.GZ · GeoJSON.BZ2 · GeoJSON.ZST · GeoJSON.ZIP
YAML · BSON
SQLite
Parquet · Feather · Arrow · ORC · Avro
Subtitles (.srt, .vtt)
Gzip (.csv.gz, .tsv.gz, etc.), bzip2 (.csv.bz2, etc.), Zstandard (.csv.zst, etc.), and ZIP archives are decompressed automatically on input — the first file inside a ZIP is used. Hive-partitioned Parquet: upload a ZIP containing only .parquet files to load them as a single merged dataset; partition columns are automatically extracted from key=value directory names (e.g. year=2024/region=west/data.parquet).
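The key=value extraction described above can be sketched in Python (the path is a hypothetical example, not the site's actual implementation):

```python
import pathlib

def partition_columns(path: str) -> dict:
    """Extract Hive-style partition columns from key=value directory names."""
    parts = pathlib.PurePosixPath(path).parts
    return dict(p.split("=", 1) for p in parts if "=" in p)

print(partition_columns("year=2024/region=west/data.parquet"))
# {'year': '2024', 'region': 'west'}
```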
The main tab. Upload a file, preview the schema and first 8 rows, then download in any output format.
Convert tab options:
Encoding override: force a specific code page with the encoding_override parameter (e.g. cp037 for EBCDIC US, cp500 for EBCDIC International, cp1026 for EBCDIC Turkish, cp1140 for EBCDIC US with Euro sign) via the CLI, SDK, or MCP server.
No header: treat the first row as data; columns are named col_0, col_1, etc.
Delimiter: custom column separator (e.g. | or ;). Leave blank to auto-detect.
Schema export: the preview includes a Polars schema= dict, a Pandas dtype= dict, and a CREATE TABLE SQL statement. Each block has a copy button.
Cast columns: for Date or Datetime columns, an optional format field appears (e.g. %d/%m/%Y). Unsupported values are silently coerced to null rather than aborting the conversion.
Null values: strings to treat as null (e.g. N/A, NULL, -, none). Passed as Polars null_values=. Useful for files where missing data is encoded as a placeholder rather than an empty cell.
Filter: a WHERE-style expression to keep only matching rows. No SQL keyword needed, just the condition, e.g. amount > 100 and region = 'EU'. Column names are case-sensitive.
Sampling: a row count (e.g. 1000) or a fraction (e.g. 0.1 for 10%). Only one is applied; row count takes priority if both are set.
Preview Result: calls /convert with preview_only=true and renders the result as a table. No credit is consumed for previews.

Merge: combine two or more files into one. Select both files; shared columns are detected automatically for join operations.
Use Preview Result to inspect the first 8 rows before downloading. A warning is shown if columns are mismatched on append.
Stack rows from two or more files into a single output. Available as a dedicated tab in the web UI (between Merge and SQL). Columns are matched by name; columns missing from some files are filled with null.
Also available via /api/v1/append, the CLI (reparatio append), and the Python SDK (client.append()).

SQL Query: run a SQL query directly against an uploaded file. The file is loaded as a table named data.
Supports SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, aggregations, and most scalar functions.

Flatten nested JSON: available for all users; unlimited nesting depth requires a Standard or Professional subscription. After previewing a .json or .jsonl file, a Flatten nested JSON checkbox appears in the options row. When checked, nested objects are expanded into flat columns using _ as a separator (e.g. address_city).
["a","b"]). This is the safe default — no rows are added or lost.tags or items field). Remaining list columns are serialized as JSON strings..json files without the checkbox checked, the existing behaviour is preserved: one level of normalization is applied automatically..jsonl files without the checkbox, each line is read as a native Polars row, preserving Struct and List column types.Standard or Professional subscribers only. After previewing a file on the Convert tab, a Clean This File button appears below the preview. Click it to apply the selected cleaning operations in one step. The preview updates immediately and the download button uses the cleaned data.
Fix placeholders: convert common placeholder strings (N/A, NULL, null, None, -, nan, missing, etc.) to proper null values.
Parse numbers: strip currency symbols and thousands separators (e.g. $1,234.56 → 1234.56). Only applied to columns where more than 50% of non-null values parse as a number. Unchecked by default.

After cleaning, a summary is shown, e.g. "Removed 2,347 duplicate rows · Dropped 3 empty columns · Fixed 45 placeholders." The cleaned file is held in the 15-minute cache; use Download Converted File to save it in your chosen format.
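The placeholder and currency fixes can be sketched in plain Python (illustrative only; the actual cleaning runs server-side in Polars, and the numeric conversion is applied per column, not per cell as here):

```python
PLACEHOLDERS = {"n/a", "null", "none", "-", "nan", "missing", ""}

def clean_cell(value: str):
    """Map placeholder strings to None; strip currency formatting from numbers."""
    if value.strip().lower() in PLACEHOLDERS:
        return None
    # Strip a currency symbol and thousands separators, then try to parse.
    stripped = value.replace("$", "").replace(",", "").strip()
    try:
        return float(stripped)
    except ValueError:
        return value  # leave non-numeric text unchanged

print(clean_cell("N/A"))        # None
print(clean_cell("$1,234.56"))  # 1234.56
```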
Standard or Professional subscribers only. Upload a large file and split it into multiple smaller files, downloaded as a single ZIP archive.
Output files are named filename_part0001.ext, filename_part0002.ext, etc.

YAML files are expected to contain a list of records at the top level (the same shape as a JSON array of objects). A single-record YAML document is also accepted.
Nested keys are flattened with _ separators on read (e.g. address_city). On write, each row becomes a YAML mapping in a top-level sequence.
XML is parsed via xmltodict and then flattened with pd.json_normalize. This works reliably for simple data-export XML (e.g. a root element containing a list of repeated child elements).
⚠ XML is a document format, not a tabular one. The parser makes a best-effort attempt to find a list of records inside your document. Results depend entirely on the shape of the XML:
XML attributes are prefixed with @. If the result looks wrong, inspect the raw XML structure and consider flattening it manually before uploading.
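The "root element containing a list of repeated child elements" shape the parser looks for can be illustrated with the standard library (the site itself uses xmltodict; this is only a sketch of the idea):

```python
import xml.etree.ElementTree as ET

doc = """<rows>
  <row><id>1</id><name>Ada</name></row>
  <row><id>2</id><name>Bob</name></row>
</rows>"""

root = ET.fromstring(doc)
# Each repeated <row> child becomes one record; its children become columns.
records = [{child.tag: child.text for child in row} for row in root]
print(records)  # [{'id': '1', 'name': 'Ada'}, {'id': '2', 'name': 'Bob'}]
```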
Accepts mysqldump, phpMyAdmin exports, and similar MySQL-dialect dumps. The file is parsed with sqlparse, MySQL-specific syntax is stripped, and the selected table's CREATE TABLE + INSERT INTO statements are replayed into a temporary in-memory SQLite database.
Use the Sheet / Table picker to choose which table to extract. The picker shows the number of INSERT statements per table as a rough row-count proxy (multi-row INSERTs count as one statement).
⚠ PostgreSQL pg_dump and dumps that use stored procedures, custom types, or COPY statements are not fully supported. Basic INSERT-based pg_dump output usually works.
Tables are extracted from the PDF's text layer using pdfplumber. The Sheet / Table picker lists every table found across all pages, labelled by page number.
⚠ PDF extraction has hard limits — it reads text that is already present in the file. It does not perform OCR. Scanned PDFs (images of pages with no text layer) will return no tables. OCR support is planned for a future release.
⚠ Table detection quality depends on the PDF. Tables with merged cells, rotated text, irregular spacing, or drawn without visible rules may produce garbled or incomplete results. Always preview before downloading.
Every <table> element in the page is extracted via pandas.read_html. Use the Sheet / Table picker to choose which table to use.
Headers are inferred from <th> elements. Multi-level column headers are joined with a space. Inline formatting and links are ignored.
GitHub-flavoured Markdown pipe tables are extracted using a pure-Python regex parser (no dependencies). If the file contains more than one table, the Sheet / Table picker appears so you can choose which one to use.
All cell values are treated as plain text strings. Column alignment markers in the separator row are ignored.
Subtitle files are parsed into a four-column table: index, start_time, end_time, text. Timestamps are normalised to HH:MM:SS.mmm. Inline HTML and VTT cue tags are stripped from the text.
On output, the index, start_time, end_time, and text columns are used if present; missing columns are filled with defaults. This means you can translate subtitles by editing the text column (e.g. via the SQL Query tab) and then export back to SRT or VTT.
SRT uses comma as the millisecond separator (00:00:01,500); VTT uses a dot (00:00:01.500) and includes a WEBVTT header.
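Because both formats share the HH:MM:SS.mmm shape and differ only in the millisecond separator, timestamp handling can be sketched as:

```python
import re

def parse_timestamp(ts: str) -> int:
    """Parse 'HH:MM:SS,mmm' (SRT) or 'HH:MM:SS.mmm' (VTT) into milliseconds."""
    h, m, s, ms = map(int, re.match(r"(\d+):(\d+):(\d+)[,.](\d+)", ts).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def format_srt(ms: int) -> str:
    """Render milliseconds as an SRT timestamp (comma separator)."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

print(parse_timestamp("00:00:01.500"))  # 1500
print(format_srt(1500))                 # 00:00:01,500
```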
Reads MongoDB dump files produced by mongodump (raw concatenated BSON document format). Each document becomes a row; nested fields are flattened with _ separators.
On output, each row is encoded as a separate BSON document and written sequentially — compatible with mongorestore.
⚠ BSON-specific types (ObjectId, Decimal128, Binary, etc.) are converted to their string representations when flattening. They are not restored on round-trip write.
No account or payment is required to try anything. Anonymous users get up to 100 rows per conversion and 10 conversions per day. Register and verify your email for 250 rows and 25 conversions per day — still free.
If the output exceeds the row limit and you have not entered an access key, the download will contain only the allowed rows. This applies to Convert, Merge, Append, and SQL Query alike.
To download full files without row limits, see the Pricing page.
Plans: Standard ($29/mo) for unlimited rows and all formats; Professional ($79/mo) adds API, CLI, MCP, Fixed-Width, and EBCDIC; Credits ($10 = 25 conversions) for pay-as-you-go use. Credits never expire.
The file is loaded as a table named data. SQLite input reads the first table by default; use the sheet picker to choose another.

When something goes wrong, Reparatio shows the original technical error from the processing engine alongside a plain-English explanation. The technical message is selectable so you can copy it for a support request; the explanation beneath it describes what likely went wrong and what to try.
For SQL errors, remember the file is loaded as a table named data; check column names in the preview, as they are case-sensitive. GeoJSON conversions require a column named geometry containing WKT strings (e.g. POINT(-73.9 40.7)).

Some conversions work well in every case. Others depend on the shape of your data.
JSON / JSON Lines / YAML / BSON → CSV, TSV, or Excel
Works well only if your data is a flat list of objects with consistent keys. Nested objects are flattened with _ separators (e.g. address_city). Fields that contain arrays will likely produce unreadable results or errors.
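The flattening rule can be sketched with a small recursive helper (the service's own implementation is Polars-based; this only illustrates the naming and the list handling):

```python
import json

def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts with '_' separators; serialize lists as JSON strings."""
    out = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "_"))
        elif isinstance(value, list):
            out[name] = json.dumps(value)  # lists stay as JSON text
        else:
            out[name] = value
    return out

print(flatten({"id": 1, "address": {"city": "Oslo"}, "tags": ["a", "b"]}))
# {'id': 1, 'address_city': 'Oslo', 'tags': '["a", "b"]'}
```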
Any format → GeoJSON
Requires a column named geometry containing valid WKT geometry strings (e.g. POINT (0 0)). If no such column exists, the conversion will fail. Converting from GeoJSON always produces a geometry column, so GeoJSON → GeoJSON always works.
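If your source has plain longitude/latitude columns, you can synthesize the required geometry column yourself before uploading; a minimal sketch:

```python
def wkt_point(lon: float, lat: float) -> str:
    """Build a WKT POINT string of the shape GeoJSON output expects."""
    return f"POINT ({lon} {lat})"

print(wkt_point(-73.9, 40.7))  # POINT (-73.9 40.7)
```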
GeoJSON → CSV, TSV, or Excel
Works, but geometry is stored as a plain WKT text string. The spatial information is preserved as text, but the output file will not be recognized as a spatial format by GIS tools.
Avro — Boolean columns
Boolean columns do not survive an Avro round-trip cleanly. They are read back as integers (0 / 1). If boolean semantics matter, convert to a different format first.
The Reparatio REST API lets you convert, inspect, merge, query, append, and parse fixed-width and EBCDIC mainframe files programmatically. A Professional subscription is required for all endpoints that produce output files.
Base URL: https://reparatio.app/api/v1
Interactive reference (ReDoc): https://reparatio.app/api/redoc
Try-it console (Swagger UI): https://reparatio.app/api/docs
OpenAPI schema (JSON): https://reparatio.app/api/openapi.json
If you prefer not to write raw HTTP calls, Reparatio provides SDKs for Python, JavaScript/TypeScript, Common Lisp, Java, C#, and Ruby.
Python SDK — reparatio-sdk-py
github.com/jfrancis42/reparatio-sdk-py
pip install reparatio
from reparatio import Reparatio
with Reparatio(api_key="rp_...") as client:
    result = client.inspect("data.csv")
    out = client.convert("data.csv", "parquet")
    open(out.filename, "wb").write(out.content)
JavaScript / TypeScript SDK — reparatio-sdk-js
github.com/jfrancis42/reparatio-sdk-js
npm install reparatio
import { Reparatio } from "reparatio";
const client = new Reparatio("rp_YOUR_KEY");
const result = await client.inspect(file, "data.csv");
const { data, filename } = await client.convert(file, "parquet", "data.csv");
const { data: zip } = await client.batchConvert(zipFile, "parquet");
Common Lisp SDK — reparatio-sdk-cl
github.com/jfrancis42/reparatio-sdk-cl
; Requires SBCL + Quicklisp
(ql:quickload "reparatio")
(use-package :reparatio)
(let ((c (make-client :api-key "rp_...")))
  (inspect-file c #p"data.csv")
  (let ((r (convert c #p"data.csv" "parquet")))
    (reparatio-result-content r)))
Java SDK — reparatio-sdk-java
github.com/jfrancis42/reparatio-sdk-java
// Requires Java 11+ (no external deps)
import com.reparatio.ReparatioClient;
var client = new ReparatioClient("rp_...");
var info = client.inspect(Path.of("data.csv"));
var result = client.convert(Path.of("data.csv"), "parquet");
Files.write(Path.of(result.filename()), result.content());
C# SDK — reparatio-sdk-cs
github.com/jfrancis42/reparatio-sdk-cs
// Requires .NET 6+ or Mono
using Reparatio;
var client = new ReparatioClient("rp_...");
var info = await client.InspectAsync("data.csv");
var result = await client.ConvertAsync("data.csv", "parquet");
File.WriteAllBytes(result.Filename, result.Content);
Ruby SDK — reparatio-sdk-rb
github.com/jfrancis42/reparatio-sdk-rb
# gem install reparatio
require "reparatio"
client = Reparatio.new(api_key: "rp_...")
info = client.inspect("data.csv")
result = client.convert("data.csv", "parquet")
File.binwrite(result.filename, result.content)
Requires a Professional plan API key ($79/mo).
Pass your API key in the X-API-Key request header. Keys are issued after purchase on the Pricing tab.
API access requires a Professional subscription. See Pricing.
X-API-Key: rp_xxxxxxxxxxxxxxxxxxxx
The /health, /formats, and /inspect endpoints do not require authentication. All other endpoints require a Professional subscription.
Requests that accept a file use multipart/form-data. Parameters are form fields unless noted otherwise.
| Parameter | Type | Default | Description |
|---|---|---|---|
no_header | bool | false | Treat the first row of CSV/TSV as data, not a header. Columns will be named col_0, col_1, … |
fix_encoding | bool | true | Auto-detect and repair character encoding (mojibake). Recommended for files from legacy systems. |
delimiter | string | "" | Column separator for CSV-like files. Leave empty to auto-detect. |
sheet | string | "" | Sheet or table name for Excel, ODS, SQLite, HTML, PDF, and SQL dump files. Leave empty for the first sheet. |
| Header | When present | Description |
|---|---|---|
Content-Disposition | All file downloads | Suggested filename for the converted output. |
X-Reparatio-Warning | /merge, /append | Human-readable warning, e.g. "Column mismatch — missing values filled with null." |
| Status | Meaning |
|---|---|
| 400 | Bad request — e.g. SQL syntax error, unknown merge operation. |
| 401 | Missing or invalid X-API-Key header. |
| 403 | Subscription inactive or plan does not include API access (Professional plan required). |
| 413 | File too large — maximum 2 GB per file (Professional) or 500 MB (Standard). |
| 422 | File could not be parsed — check format and encoding. |
Error responses are JSON: {"detail": "human readable message"}.
/health
No auth required
Liveness check. Returns immediately with no side effects.
{"status": "healthy", "version": "1"}
/formats
No auth required
Returns the lists of supported input and output format identifiers.
{"input": ["csv", "tsv", "xlsx", ...], "output": ["csv", "tsv", "parquet", ...]}
/me
Pro
Returns subscription details for the provided API key.
{"email": "you@example.com", "plan": "pro", "tier": "pro", "expires_at": "2026-04-14T00:00:00+00:00", "api_access": true, "active": true, "credits_balance": 0, "email_verified": true}
credits_balance — remaining pay-as-you-go credits (credits plans only; 0 for subscription plans).
/inspect
No auth required
Inspect a file without converting it. Returns schema, detected encoding, row count, sheet names, per-column stats, and a data preview.
| Parameter | Type | Default | Description |
|---|---|---|---|
file | file (required) | — | The data file to inspect. |
no_header | bool | false | See Common Parameters. |
fix_encoding | bool | true | See Common Parameters. |
delimiter | string | "" | See Common Parameters. |
sheet | string | "" | See Common Parameters. |
preview_rows | int | 8 | Number of preview rows to return (1–100). |
Response: filename, detected_encoding, rows, sheets, columns[] (name, dtype, null_count, unique_count), preview[].
/convert
Pro
Convert a file to a different format. Returns the converted file as a binary download.
| Parameter | Type | Default | Description |
|---|---|---|---|
file | file (required) | — | The file to convert. |
target_format | string (required) | — | Output format: csv, tsv, csv.gz, csv.bz2, csv.zst, csv.zip, tsv.gz, tsv.bz2, tsv.zst, tsv.zip, xlsx, ods, json, json.gz, json.bz2, json.zst, json.zip, jsonl, jsonl.gz, jsonl.bz2, jsonl.zst, jsonl.zip, geojson, geojson.gz, geojson.bz2, geojson.zst, geojson.zip, parquet, feather, arrow, orc, avro, sqlite. See /formats. |
no_header | bool | false | See Common Parameters. |
fix_encoding | bool | true | See Common Parameters. |
delimiter | string | "" | See Common Parameters. |
sheet | string | "" | See Common Parameters. |
columns | JSON array | [] | Rename columns. Supply an array of new names in the same order as the source columns. Must match the column count exactly, or is ignored. |
select_columns | JSON array | [] | Include only the named columns in the output. Order is preserved. |
deduplicate | bool | false | Remove duplicate rows, keeping the first occurrence. |
deduplicate_on | JSON array | [] | Column names to use for deduplication key. Empty array deduplicates on all columns. |
sample_n | int | 0 | Return a random sample of exactly N rows. Takes priority over sample_frac. |
sample_frac | float | 0.0 | Return a random fraction of rows, e.g. 0.1 for 10%. |
geometry_column | string | "geometry" | Column containing WKT or GeoJSON geometry, used when outputting to GeoJSON. |
cast_columns | JSON object | {} | Override inferred column types. Keys are column names; values are objects with "type" (required) and optional "format" for date parsing. Example: {"price":{"type":"Float64"},"date":{"type":"Date","format":"%d/%m/%Y"}}. Supported types: String, Int8–Int64, UInt8–UInt64, Float32, Float64, Boolean, Date, Datetime, Time. Failed casts are silently coerced to null. |
null_values | string (JSON array) | [] | Strings to treat as null at load time, e.g. ["N/A","NULL","-"]. Passed as Polars null_values=. |
encoding_override | string | "" | Force a specific encoding, bypassing chardet auto-detection. Pass any Python codec name, e.g. cp037 (EBCDIC US), cp500 (EBCDIC International), cp1026 (EBCDIC Turkish), cp1140 (EBCDIC US+Euro). Leave empty to auto-detect. |
preview_only | bool | false | If true, return the first 8 rows as JSON instead of the converted file. Useful for previewing before downloading. |
webhook_url | string | "" | Optional HTTPS URL to receive a completion notification. When provided, the converted file is cached for 15 minutes and a JSON job receipt is returned immediately (see Webhooks). Must be a public HTTPS endpoint — private and loopback IPs are blocked. |
Response (normal): binary file stream with Content-Disposition: attachment; filename="…". When preview_only=true, returns a JSON array of row objects instead. When webhook_url is set, returns a JSON job receipt instead of the file (see Webhooks).
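The deduplicate / deduplicate_on semantics (keep the first occurrence, optionally keyed on a subset of columns) can be sketched as:

```python
def deduplicate(rows, on=None):
    """Keep the first occurrence of each key; an empty key list uses all columns."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[c] for c in on) if on else tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "a"}]
print(deduplicate(rows, on=["id"]))  # [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'a'}]
```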
/batch-convert
Pro
Accept a ZIP of input files, convert each file to target_format, and return a ZIP of converted files. Files that cannot be parsed are skipped; their names and error messages are returned in the X-Reparatio-Errors response header as a URL-encoded JSON array of {"file","error"} objects.
| Parameter | Type | Default | Description |
|---|---|---|---|
zip_file | file (required) | — | ZIP archive containing the input files. |
target_format | string (required) | — | Output format for every file. Same values as /convert. |
no_header | bool | false | See Common Parameters. |
fix_encoding | bool | true | See Common Parameters. |
delimiter | string | "" | See Common Parameters. |
select_columns | JSON array | [] | Include only the named columns in the output. |
deduplicate | bool | false | Remove duplicate rows from each file. |
deduplicate_on | JSON array | [] | Deduplication key columns. |
sample_n | int | 0 | Random sample of N rows per file. |
sample_frac | float | 0.0 | Random sample fraction per file. |
cast_columns | JSON object | {} | Column type overrides applied to every file. Same format as /convert. |
webhook_url | string | "" | Optional HTTPS URL to receive a completion notification. When provided, the converted ZIP is cached for 15 minutes and a JSON job receipt is returned immediately (see Webhooks). Must be a public HTTPS endpoint. |
Response (normal): application/zip with Content-Disposition: attachment; filename="converted.zip". If any files were skipped, X-Reparatio-Errors contains a URL-encoded JSON array. When webhook_url is set, returns a JSON job receipt instead (see Webhooks).
/download/{token}
Pro
Download a converted file using the job_id returned by a webhook-mode conversion. Tokens are valid for 15 minutes after the conversion completes. Requires the same API key that submitted the job.
Response: binary file stream with appropriate Content-Type and Content-Disposition headers. Returns HTTP 404 if the token is expired or unknown.
When webhook_url is supplied to /convert or /batch-convert, the conversion runs synchronously and the server makes a single POST to your URL after the job finishes. The converted file is cached for 15 minutes and retrievable via GET /api/v1/download/{job_id}.
The same JSON object is returned in the API response and POSTed to your webhook endpoint, so you can treat the initial response as the receipt and use the webhook as a secondary notification.
Webhook request headers
| Header | Value |
|---|---|
Content-Type | application/json |
User-Agent | Reparatio-Webhook/1.0 |
X-Reparatio-Signature | sha256=<hex> — HMAC-SHA256 of the raw request body, keyed with your WEBHOOK_SECRET. Only present if WEBHOOK_SECRET is configured on the server. Verify this to confirm the request originated from Reparatio. |
Payload schema — conversion.completed
| Field | Type | Description |
|---|---|---|
event | string | Always "conversion.completed". |
job_id | string (UUID) | Unique identifier for this job. Use with GET /api/v1/download/{job_id} to retrieve the file. |
status | string | Always "completed" (errors that prevent any output raise HTTP 4xx before the webhook fires). |
download_url | string | Full URL to retrieve the converted file. Valid for 15 minutes. |
filename | string | Suggested download filename, e.g. "sales.parquet" or "converted.zip". |
input_format | string | Detected input format, e.g. "csv". "zip" for batch jobs. |
output_format | string | Requested output format. |
input_bytes | int | Size of the uploaded file in bytes. |
output_bytes | int | Size of the converted output in bytes. |
elapsed_ms | int | Server-side conversion time in milliseconds. |
files_converted | int | Number of files successfully converted. Batch-convert only. |
timestamp | string (ISO 8601) | UTC timestamp of job completion. |
errors | array | Array of {"file","error"} objects for files that could not be converted. Empty for single-file jobs. A non-empty array does not prevent the webhook from firing, as long as at least one file converted successfully. |
Example payload
{
"event": "conversion.completed",
"job_id": "a3f7c2d1-84b0-4e19-9f6a-0c1d2e3f4a5b",
"status": "completed",
"download_url": "https://reparatio.app/api/v1/download/a3f7c2d1-...",
"filename": "sales.parquet",
"input_format": "csv",
"output_format": "parquet",
"input_bytes": 204800,
"output_bytes": 51200,
"elapsed_ms": 312,
"timestamp": "2026-03-14T18:42:00.123456+00:00",
"errors": []
}
Signature verification (Python example)
import hmac, hashlib
def verify_signature(body: bytes, header: str, secret: str) -> bool:
    expected = "sha256=" + hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, header)
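To test a verifier end-to-end, you can compute the header value the same way the sender does (the secret and body below are made-up examples):

```python
import hmac, hashlib, json

secret = "whsec_example"  # hypothetical secret, for illustration only
body = json.dumps({"event": "conversion.completed"}).encode()

# The sender signs the raw request body:
header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

# The receiver recomputes the same HMAC and compares in constant time:
expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(hmac.compare_digest(expected, header))  # True
```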
/merge
Pro
Merge or join exactly two files. Returns the merged file as a binary download.
| Parameter | Type | Default | Description |
|---|---|---|---|
file1 | file (required) | — | First (left) file. |
file2 | file (required) | — | Second (right) file. |
operation | string (required) | — | One of: append, inner, left, right, outer. Use append to stack rows; the join variants require join_on. |
join_on | string | "" | Comma-separated column names to join on, e.g. id,date. Required for join operations. |
target_format | string (required) | — | Output format identifier. |
no_header | bool | false | See Common Parameters. |
fix_encoding | bool | true | See Common Parameters. |
geometry_column | string | "geometry" | See /convert. |
Response: binary file stream. Check X-Reparatio-Warning for column-mismatch notices.
/append
Pro
Stack rows from two or more files into a single output. Columns are matched by name; missing values are filled with null.
| Parameter | Type | Default | Description |
|---|---|---|---|
files | file[] (required) | — | Two or more files to stack. Send multiple files fields in the same form. |
target_format | string (required) | — | Output format identifier. |
no_header | bool | false | See Common Parameters. |
fix_encoding | bool | true | See Common Parameters. |
Response: binary file stream named appended.<ext>. Check X-Reparatio-Warning for column-mismatch notices.
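The column alignment rule (match by name, fill missing values with null) can be sketched as:

```python
def append_rows(*tables):
    """Stack row-dicts from several tables; missing columns become None."""
    columns = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:  # preserve first-seen column order
                    columns.append(col)
    return [{c: row.get(c) for c in columns} for table in tables for row in table]

a = [{"id": 1, "name": "Ada"}]
b = [{"id": 2, "city": "Oslo"}]
print(append_rows(a, b))
# [{'id': 1, 'name': 'Ada', 'city': None}, {'id': 2, 'name': None, 'city': 'Oslo'}]
```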
/query
Pro
Execute a SQL SELECT query against a file and return the result as a download. The file is loaded as a table named data.
| Parameter | Type | Default | Description |
|---|---|---|---|
file | file (required) | — | The file to query. |
sql | string (required) | — | A SQL query. Reference the file as data, e.g. SELECT * FROM data WHERE amount > 100. |
target_format | string | "csv" | Output format for the query result. |
no_header | bool | false | See Common Parameters. |
fix_encoding | bool | true | See Common Parameters. |
delimiter | string | "" | See Common Parameters. |
sheet | string | "" | See Common Parameters. |
Response: binary file stream named <original_name>_query.<ext>.
Reparatio can decode IBM EBCDIC mainframe files and parse fixed-width column layouts. Both features require a Professional subscription and are accessed through dedicated endpoints.
POST /api/v1/convert
EBCDIC encoding override
Pass encoding_override to force an EBCDIC code page. Reparatio will auto-detect EBCDIC in most cases, but explicit override is recommended for production pipelines.
| Code page | Description |
|---|---|
cp037 | IBM US/Canada — most common North American mainframe encoding |
cp500 | IBM International — common in European mainframe installs |
cp1047 | IBM z/OS Unix Services |
cp1140 | IBM US with Euro sign (€) — modern billing systems |
cp273 | IBM Germany / Austria |
cp285 | IBM United Kingdom |
cp297 | IBM France |
cp875 | IBM Greece |
curl -X POST https://reparatio.app/api/v1/convert \
-H "X-API-Key: rp_xxxxxxxxxxxxxxxxxxxx" \
-F "file=@mainframe_export.dat" \
-F "target_format=csv" \
-F "encoding_override=cp037" \
-o output.csv
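What an encoding override actually does can be illustrated with Python's built-in codecs (the bytes below are a hand-written example):

```python
# 'HELLO' encoded in EBCDIC code page 037 (IBM US/Canada)
raw = b"\xc8\xc5\xd3\xd3\xd6"

print(raw.decode("cp037"))    # HELLO
print(raw.decode("latin-1"))  # mojibake: the same bytes misread as Latin-1
```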
POST /api/v1/fwf-detect
Auto-detect column boundaries
Upload a fixed-width file; Reparatio analyses whitespace density to suggest column split positions. Returns a JSON array of boundary offsets and suggested column names.
| Parameter | Type | Default | Description |
|---|---|---|---|
file | file (required) | — | Fixed-width text file to analyse. |
encoding_override | string | "" | Force encoding (e.g. cp037). Leave empty to auto-detect. |
skip_rows | int | 0 | Number of header/metadata rows to skip before data begins. |
curl -X POST https://reparatio.app/api/v1/fwf-detect \
-H "X-API-Key: rp_xxxxxxxxxxxxxxxxxxxx" \
-F "file=@report.txt" \
-F "encoding_override=cp037"
POST /api/v1/fwf-convert
Parse and convert a fixed-width file
Convert a fixed-width file to any supported output format using explicit column boundaries. Typically called after /fwf-detect to confirm boundaries, then again with user-adjusted positions and column names.
| Parameter | Type | Default | Description |
|---|---|---|---|
file | file (required) | — | Fixed-width text file. |
boundaries | string (required) | — | JSON array of integer column start positions, e.g. [0,10,25,40]. |
col_names | string | "" | JSON array of column names. If omitted, columns are named col_0, col_1, etc. |
target_format | string | "csv" | Output format (csv, xlsx, parquet, json, tsv, …). |
encoding_override | string | "" | Force encoding (e.g. cp037). |
skip_rows | int | 0 | Rows to skip before data. |
strip_char | string | "" | Character to strip from cell values (e.g. | for pipe-padded files). |
curl -X POST https://reparatio.app/api/v1/fwf-convert \
-H "X-API-Key: rp_xxxxxxxxxxxxxxxxxxxx" \
-F "file=@report.txt" \
-F 'boundaries=[0,10,25,40,60]' \
-F 'col_names=["id","name","dept","salary","hire_date"]' \
-F "target_format=parquet" \
-F "encoding_override=cp037" \
-o report.parquet
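How boundary offsets map to column slices can be sketched as follows (the sample record is invented):

```python
def parse_fixed_width(line: str, boundaries: list) -> list:
    """Slice LINE at each boundary offset; the last column runs to end of line."""
    stops = boundaries[1:] + [len(line)]
    return [line[start:stop].strip() for start, stop in zip(boundaries, stops)]

# Build a padded sample record matching boundaries [0, 10, 25, 40]
record = f"{'42':<10}{'Ada Lovelace':<15}{'Math':<15}120000.00"
print(parse_fixed_width(record, [0, 10, 25, 40]))
# ['42', 'Ada Lovelace', 'Math', '120000.00']
```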
All examples convert data.csv to Parquet using the /api/v1/convert endpoint.
curl -X POST https://reparatio.app/api/v1/convert \
-H "X-API-Key: rep_live_xxxxxxxxxxxxxxxxxxxx" \
-F "file=@data.csv" \
-F "target_format=parquet" \
-o data.parquet
import httpx
API_KEY = "rep_live_xxxxxxxxxxxxxxxxxxxx"
BASE = "https://reparatio.app/api/v1"
with open("data.csv", "rb") as f:
resp = httpx.post(
f"{BASE}/convert",
headers={"X-API-Key": API_KEY},
data={"target_format": "parquet"},
files={"file": ("data.csv", f, "text/csv")},
timeout=120,
)
resp.raise_for_status()
with open("data.parquet", "wb") as out:
out.write(resp.content)
print("Saved data.parquet")
const API_KEY = "rep_live_xxxxxxxxxxxxxxxxxxxx";
const BASE = "https://reparatio.app/api/v1";
async function convertFile(file, targetFormat) {
const form = new FormData();
form.append("file", file);
form.append("target_format", targetFormat);
const resp = await fetch(`${BASE}/convert`, {
method: "POST",
headers: { "X-API-Key": API_KEY },
body: form,
});
if (!resp.ok) {
// Error bodies are JSON; fall back to the status text in case
// a proxy returns a non-JSON error page.
const err = await resp.json().catch(() => ({ detail: resp.statusText }));
throw new Error(err.detail);
}
const blob = await resp.blob();
const url = URL.createObjectURL(blob);
const a = document.createElement("a");
a.href = url;
a.download = "data.parquet";
a.click();
URL.revokeObjectURL(url);
}
// Usage (browser — pass a File object from an <input> element):
// convertFile(fileInput.files[0], "parquet");
;;; Requires: drakma
;;; (ql:quickload :drakma)
(defparameter *api-key* "rep_live_xxxxxxxxxxxxxxxxxxxx")
(defparameter *base* "https://reparatio.app/api/v1")
(defun convert-file (input-path target-format output-path)
"Convert INPUT-PATH to TARGET-FORMAT and write the result to OUTPUT-PATH.
INPUT-PATH may be a string or pathname; OUTPUT-PATH is written as binary."
(let ((path (pathname input-path)))
(multiple-value-bind (body status)
(drakma:http-request
(format nil "~A/convert" *base*)
:method :post
:form-data t
:force-binary t
:additional-headers `(("X-API-Key" . ,*api-key*))
;; A file parameter's value is a list: (pathname &key content-type filename).
;; Plain string parameters are (field-name . value) conses.
:parameters `(("file" ,path :content-type "application/octet-stream"
:filename ,(file-namestring path))
("target_format" . ,target-format)))
(unless (= status 200)
(error "API error ~A" status))
(with-open-file (out output-path
:direction :output
:element-type '(unsigned-byte 8)
:if-exists :supersede)
(write-sequence body out))
(format t "Saved ~A~%" output-path))))
(convert-file "data.csv" "parquet" "data.parquet")
using System.Net.Http;
using System.Net.Http.Headers;
const string ApiKey = "rep_live_xxxxxxxxxxxxxxxxxxxx";
const string BaseUrl = "https://reparatio.app/api/v1";
using var client = new HttpClient();
client.DefaultRequestHeaders.Add("X-API-Key", ApiKey);
client.Timeout = TimeSpan.FromSeconds(120);
await using var fileStream = File.OpenRead("data.csv");
var content = new MultipartFormDataContent();
content.Add(new StreamContent(fileStream), "file", "data.csv");
content.Add(new StringContent("parquet"), "target_format");
var response = await client.PostAsync($"{BaseUrl}/convert", content);
response.EnsureSuccessStatusCode();
var bytes = await response.Content.ReadAsByteArrayAsync();
await File.WriteAllBytesAsync("data.parquet", bytes);
Console.WriteLine("Saved data.parquet");
// Requires Java 11+ (java.net.http is in the standard library)
import java.net.URI;
import java.net.http.*;
import java.net.http.HttpRequest.BodyPublishers;
import java.nio.file.*;
public class ReparatioExample {
static final String API_KEY = "rep_live_xxxxxxxxxxxxxxxxxxxx";
static final String BASE_URL = "https://reparatio.app/api/v1";
public static void main(String[] args) throws Exception {
byte[] fileBytes = Files.readAllBytes(Path.of("data.csv"));
String boundary = "----ReparatioBoundary";
String body = "--" + boundary + "\r\n"
+ "Content-Disposition: form-data; name=\"file\"; filename=\"data.csv\"\r\n"
+ "Content-Type: application/octet-stream\r\n\r\n";
byte[] bodyStart = body.getBytes();
byte[] fieldPart = ("\r\n--" + boundary + "\r\n"
+ "Content-Disposition: form-data; name=\"target_format\"\r\n\r\n"
+ "parquet\r\n--" + boundary + "--\r\n").getBytes();
byte[] requestBody = new byte[bodyStart.length + fileBytes.length + fieldPart.length];
System.arraycopy(bodyStart, 0, requestBody, 0, bodyStart.length);
System.arraycopy(fileBytes, 0, requestBody, bodyStart.length, fileBytes.length);
System.arraycopy(fieldPart, 0, requestBody, bodyStart.length + fileBytes.length, fieldPart.length);
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create(BASE_URL + "/convert"))
.header("X-API-Key", API_KEY)
.header("Content-Type", "multipart/form-data; boundary=" + boundary)
.POST(BodyPublishers.ofByteArray(requestBody))
.build();
HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
if (response.statusCode() != 200) throw new RuntimeException("API error: " + response.statusCode());
Files.write(Path.of("data.parquet"), response.body());
System.out.println("Saved data.parquet");
}
}
curl -X POST https://reparatio.app/api/v1/inspect \
-F "file=@data.xlsx" \
-F "preview_rows=5"
curl -X POST https://reparatio.app/api/v1/query \
-H "X-API-Key: rep_live_xxxxxxxxxxxxxxxxxxxx" \
-F "file=@sales.csv" \
-F "sql=SELECT region, SUM(amount) AS total FROM data GROUP BY region ORDER BY total DESC" \
-F "target_format=csv" \
-o summary.csv
curl -X POST https://reparatio.app/api/v1/merge \
-H "X-API-Key: rep_live_xxxxxxxxxxxxxxxxxxxx" \
-F "file1=@customers.csv" \
-F "file2=@orders.csv" \
-F "operation=left" \
-F "join_on=customer_id" \
-F "target_format=parquet" \
-o customers_with_orders.parquet
Alpha Software
The Reparatio MCP server is in early alpha. Interfaces, tool names, and behaviour may change without notice. Use it in production at your own risk and report issues via the Support tab.
MCP access requires a Professional subscription ($79/mo). See Pricing.
reparatio-mcp on GitHub
MCP server — Python 3.11+, MIT-licensed. Requires uv or pip.
reparatio-cli on GitHub (alpha)
Command-line tool — pipx install reparatio-cli · MIT-licensed.
reparatio-sdk-py on GitHub (alpha)
Python SDK — pip install reparatio · MIT-licensed.
The MCP server runs locally on your machine. When your AI assistant calls a tool, the server reads the file from your disk, sends it to the Reparatio API for processing, and writes the result back to disk — returning the output path to the assistant. Binary data never passes through the MCP protocol layer; only file paths and options travel as JSON.
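To make that concrete, here is roughly what a conversion looks like on the wire: a tool call carrying only a path and options. The message shape below is illustrative, simplified from MCP's JSON-RPC framing, and not an exact transcript:

```json
{
  "method": "tools/call",
  "params": {
    "name": "convert_file",
    "arguments": {
      "input_path": "/home/you/data.csv",
      "target_format": "parquet"
    }
  }
}
```

The result returned to the assistant is likewise just text containing the output path (e.g. /home/you/data.parquet); the Parquet bytes themselves never enter the conversation.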
Check your Python version with python3 --version (the server requires Python 3.11+). inspect_file works without a key; get one on the Pricing tab (Professional plan) for the other tools. uvx (from uv) is the recommended way to run the server — no install step required:
curl -LsSf https://astral.sh/uv/install.sh | sh
Option A — run without installing (recommended):
uvx reparatio-mcp
Option B — install permanently:
pip install reparatio-mcp
MCP is a client-side protocol. Your AI assistant application needs to support MCP tool calls. The LLM backend (Claude, GPT-4o, Grok, DeepSeek, Gemini, a local Ollama model, etc.) is configured separately inside each client — the MCP server works with any of them.
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json ·
Windows: %APPDATA%\Claude\claude_desktop_config.json ·
Linux: ~/.config/Claude/claude_desktop_config.json
{
"mcpServers": {
"reparatio": {
"command": "uvx",
"args": ["reparatio-mcp"],
"env": { "REPARATIO_API_KEY": "rp_YOUR_KEY" }
}
}
}
Restart Claude Desktop after saving.
Global: ~/.cursor/mcp.json · Per-project: .cursor/mcp.json in your project root
{
"mcpServers": {
"reparatio": {
"command": "uvx",
"args": ["reparatio-mcp"],
"env": { "REPARATIO_API_KEY": "rp_YOUR_KEY" }
}
}
}
Reload the MCP server list in Cursor Settings → MCP after saving.
Edit ~/.codeium/windsurf/mcp_config.json
{
"mcpServers": {
"reparatio": {
"command": "uvx",
"args": ["reparatio-mcp"],
"env": { "REPARATIO_API_KEY": "rp_YOUR_KEY" }
}
}
}
VS Code 1.99+. Add to settings.json (Ctrl+Shift+P → "Open User Settings (JSON)"):
{
"mcp": {
"servers": {
"reparatio": {
"type": "stdio",
"command": "uvx",
"args": ["reparatio-mcp"],
"env": { "REPARATIO_API_KEY": "rp_YOUR_KEY" }
}
}
}
}
Edit ~/.continue/config.json. Continue supports OpenAI, Anthropic, DeepSeek, Grok, Gemini, Ollama, and more.
{
"mcpServers": [
{
"name": "reparatio",
"command": "uvx",
"args": ["reparatio-mcp"],
"env": { "REPARATIO_API_KEY": "rp_YOUR_KEY" }
}
]
}
Edit ~/.config/zed/settings.json
{
"context_servers": {
"reparatio": {
"command": {
"path": "uvx",
"args": ["reparatio-mcp"],
"env": { "REPARATIO_API_KEY": "rp_YOUR_KEY" }
}
}
}
}
Go to Settings → Tools → MCP Servers, add a new server with command uvx reparatio-mcp and environment variable REPARATIO_API_KEY=rp_YOUR_KEY. Works with any Ollama model.
Add to your project's .mcp.json file:
{
"mcpServers": {
"reparatio": {
"command": "uvx",
"args": ["reparatio-mcp"],
"env": { "REPARATIO_API_KEY": "rp_YOUR_KEY" }
}
}
}
| Variable | Default | Description |
|---|---|---|
| REPARATIO_API_KEY | — | Your rp_… API key from the Pricing tab. |
inspect_file
No API key required
Detect encoding, count rows, list column types and statistics, and return a data preview. No API key required — works on the free tier.
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | string (required) | — | Local file path. |
| no_header | bool | false | Treat first row as data (CSV/TSV). |
| fix_encoding | bool | true | Auto-detect and repair encoding. |
| delimiter | string | "" | Custom delimiter (auto-detected if blank). |
| sheet | string | "" | Sheet or table name for Excel, ODS, SQLite. |
| preview_rows | int | 8 | Preview rows to return (1–100). |
convert_file
Pro
Convert a file to a different format. Optionally select or rename columns, deduplicate rows, or sample a subset. Saves the result to disk and returns the output path.
| Parameter | Type | Default | Description |
|---|---|---|---|
| input_path | string (required) | — | Source file path. |
| target_format | string (required) | — | Output format: csv, tsv, csv.gz, csv.bz2, csv.zst, csv.zip, tsv.gz, tsv.bz2, tsv.zst, tsv.zip, xlsx, ods, json, json.gz, json.bz2, json.zst, json.zip, jsonl, jsonl.gz, jsonl.bz2, jsonl.zst, jsonl.zip, geojson, geojson.gz, geojson.bz2, geojson.zst, geojson.zip, yaml, bson, srt, vtt, parquet, feather, arrow, orc, avro, sqlite. |
| output_path | string | auto | Where to save the result (default: same directory, extension changed). |
| no_header | bool | false | Treat first row as data. |
| fix_encoding | bool | true | Repair encoding. |
| delimiter | string | "" | Custom delimiter for CSV-like input. |
| sheet | string | "" | Sheet or table to read. |
| columns | array | [] | Rename columns — new names in order (must match column count). |
| select_columns | array | [] | Columns to include in output (all if omitted). |
| deduplicate | bool | false | Remove duplicate rows. |
| sample_n | int | — | Random sample of N rows. |
| sample_frac | float | — | Random sample fraction, e.g. 0.1 for 10%. |
| geometry_column | string | "geometry" | WKT geometry column for GeoJSON output. |
| cast_columns | object | {} | Override column types. E.g. {"price": "Float64", "date": "Date:%d/%m/%Y"}. |
| null_values | array | [] | Strings to treat as null at load time, e.g. ["N/A", "NULL", "-"]. |
| encoding_override | string | "" | Force a specific encoding, bypassing auto-detection. E.g. cp037 (EBCDIC US), cp500 (EBCDIC International), cp1026 (EBCDIC Turkish), cp1140 (EBCDIC US+Euro). Leave blank to auto-detect. |
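As a worked example, a convert_file call combining several of the options above might carry arguments like these (paths and values are illustrative):

```json
{
  "input_path": "~/data/prices.csv",
  "target_format": "parquet",
  "select_columns": ["date", "sku", "price"],
  "cast_columns": { "price": "Float64", "date": "Date:%d/%m/%Y" },
  "null_values": ["N/A", "-"],
  "deduplicate": true
}
```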
batch_convert
Pro
Convert every file inside a ZIP archive to a common format and return a ZIP of converted files. Files that cannot be parsed are skipped; their names and errors are listed in a warning string returned alongside the output path.
| Parameter | Type | Default | Description |
|---|---|---|---|
| zip_path | string (required) | — | Path to ZIP archive containing input files. |
| target_format | string (required) | — | Output format for every file in the ZIP. |
| output_path | string | auto | Path for the output ZIP (default: converted.zip in same directory). |
| select_columns | array | [] | Columns to include from every file. |
| deduplicate | bool | false | Remove duplicate rows from each file. |
| sample_n | int | — | Random sample of N rows per file. |
| sample_frac | float | — | Random sample fraction per file. |
| cast_columns | object | {} | Column type overrides applied to every file. |
merge_files
Pro
Merge two files using a SQL-style join or append (row-stacking). Saves the result to disk and returns the output path.
| Parameter | Type | Default | Description |
|---|---|---|---|
| file1_path | string (required) | — | First (left) file path. |
| file2_path | string (required) | — | Second (right) file path. |
| operation | string (required) | — | append, inner, left, right, or outer. |
| target_format | string (required) | — | Output format. |
| output_path | string | auto | Where to save the result. |
| join_on | string | "" | Comma-separated column name(s) to join on (required for non-append operations). |
| no_header | bool | false | Treat first row as data. |
| fix_encoding | bool | true | Repair encoding. |
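For instance, the customers/orders left join from the REST examples maps onto a merge_files call with arguments like these (paths illustrative):

```json
{
  "file1_path": "~/data/customers.csv",
  "file2_path": "~/data/orders.csv",
  "operation": "left",
  "join_on": "customer_id",
  "target_format": "parquet",
  "output_path": "~/data/customers_with_orders.parquet"
}
```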
append_files
Pro
Stack rows from two or more files vertically. Columns missing from some files are filled with null. Saves the result to disk.
| Parameter | Type | Default | Description |
|---|---|---|---|
| paths | array (required) | — | List of local file paths to append (minimum 2). |
| target_format | string (required) | — | Output format. |
| output_path | string | auto | Default: appended.{format} in directory of first file. |
| no_header | bool | false | Treat first row as data. |
| fix_encoding | bool | true | Repair encoding. |
query_file
Pro
Run a SQL query against a file. The file is loaded as a table named data. Supports SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, aggregations, and most scalar functions.
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | string (required) | — | Source file path. |
| sql | string (required) | — | SQL query; reference the file as data. |
| target_format | string | "csv" | Output format. |
| output_path | string | auto | Default: {stem}_query.{format}. |
| no_header | bool | false | Treat first row as data. |
| fix_encoding | bool | true | Repair encoding. |
| delimiter | string | "" | Custom delimiter for CSV-like input. |
| sheet | string | "" | Sheet or table to read. |
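An illustrative query_file call; the file is exposed as a table named data, so the SQL references it directly (paths are hypothetical):

```json
{
  "path": "~/data/events.parquet",
  "sql": "SELECT region, SUM(amount) AS total FROM data GROUP BY region ORDER BY total DESC",
  "target_format": "json",
  "output_path": "~/data/revenue_by_region.json"
}
```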
parse_fixed_width
Pro
Parse a fixed-width or EBCDIC mainframe file and convert it to a structured format. Reparatio auto-detects column boundaries; pass explicit boundaries to override. Supports all IBM EBCDIC code pages via encoding_override.
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | string (required) | — | Path to the fixed-width or EBCDIC file. |
| target_format | string | "csv" | Output format (csv, xlsx, parquet, json, tsv, …). |
| output_path | string | auto | Default: {stem}_parsed.{format}. |
| boundaries | array | auto | Integer column start positions, e.g. [0, 10, 25, 40]. If omitted, Reparatio detects boundaries automatically. |
| col_names | array | auto | Column names. If omitted, columns are named col_0, col_1, etc. |
| encoding_override | string | "" | Force EBCDIC or other encoding. E.g. cp037 (US), cp500 (International), cp1047 (z/OS), cp1140 (US+Euro). Leave blank to auto-detect. |
| skip_rows | int | 0 | Metadata rows at the top of the file to skip. |
| strip_char | string | "" | Character to remove from cell values (e.g. \| for pipe-padded exports). |
| Format | Input | Output |
|---|---|---|
| CSV | ✓ | ✓ |
| TSV | ✓ | ✓ |
| CSV.GZ (gzip) | ✓ | ✓ |
| CSV.BZ2 (bzip2) | ✓ | ✓ |
| CSV.ZST (zstd) | ✓ | ✓ |
| CSV.ZIP | ✓ | ✓ |
| TSV.GZ (gzip) | ✓ | ✓ |
| TSV.BZ2 (bzip2) | ✓ | ✓ |
| TSV.ZST (zstd) | ✓ | ✓ |
| TSV.ZIP | ✓ | ✓ |
| GZ (any supported format inside) | ✓ | — |
| ZIP (any supported format inside) | ✓ | — |
| BZ2 (any supported format inside) | ✓ | — |
| ZST (any supported format inside) | ✓ | — |
| Excel (.xlsx) | ✓ | ✓ |
| Excel (.xls, legacy) | ✓ | — |
| ODS | ✓ | ✓ |
| JSON | ✓ | ✓ |
| JSON.GZ (gzip) | ✓ | ✓ |
| JSON.BZ2 (bzip2) | ✓ | ✓ |
| JSON.ZST (zstd) | ✓ | ✓ |
| JSON.ZIP | ✓ | ✓ |
| JSON Lines | ✓ | ✓ |
| JSON Lines GZ (gzip) | ✓ | ✓ |
| JSON Lines BZ2 (bzip2) | ✓ | ✓ |
| JSON Lines ZST (zstd) | ✓ | ✓ |
| JSON Lines ZIP | ✓ | ✓ |
| GeoJSON | ✓ | ✓ |
| GeoJSON.GZ (gzip) | ✓ | ✓ |
| GeoJSON.BZ2 (bzip2) | ✓ | ✓ |
| GeoJSON.ZST (zstd) | ✓ | ✓ |
| GeoJSON.ZIP | ✓ | ✓ |
| Parquet | ✓ | ✓ |
| Feather | ✓ | ✓ |
| Arrow | ✓ | ✓ |
| ORC | ✓ | ✓ |
| Avro | ✓ | ✓ |
| SQLite | ✓ | ✓ |
| YAML | ✓ | ✓ |
| BSON | ✓ | ✓ |
| SRT (subtitles) | ✓ | ✓ |
| VTT (subtitles) | ✓ | ✓ |
| HTML / HTM | ✓ | — |
| Markdown | ✓ | — |
| XML | ✓ | — |
| SQL dump | ✓ | — |
| PDF (text layer) | ✓ | — |
| Fixed-Width / EBCDIC (Pro) | ✓ | — |
Once connected, use natural language with your AI assistant:
Inspect ~/data/sales.csv and tell me about the schema.
Convert ~/data/sales.csv to Parquet.
Convert ~/data/report.xlsx, sheet "Q3", to CSV, selecting only the
"date", "region", and "revenue" columns.
Merge orders.csv and customers.csv on customer_id using a left join,
output as Parquet.
Append all the CSV files in ~/data/monthly/ into one file and save it
as ~/data/all_months.parquet.
Query ~/data/events.parquet: give me total revenue by region for 2025,
sorted descending, as a JSON file.
I have ~/data/legacy.csv from an old Windows system — fix the encoding
and convert it to Excel.
Convert ~/data/mainframe_export.dat — it's an EBCDIC cp037 fixed-width
file. Detect the column boundaries and convert it to Parquet.
| Tool | Default output filename |
|---|---|
convert_file | {input_stem}.{target_format} |
merge_files | {file1_stem}_{operation}_{file2_stem}.{target_format} |
append_files | appended.{target_format} |
query_file | {input_stem}_query.{target_format} |
parse_fixed_width | {input_stem}_parsed.{target_format} |
"REPARATIO_API_KEY is not set"
The key is missing from the env block in your client config. The variable name must be exactly REPARATIO_API_KEY.
"uvx: command not found"
Install uv: curl -LsSf https://astral.sh/uv/install.sh | sh, then restart your terminal and your AI client.
"Insufficient plan"
The tool requires a Professional plan key ($79/mo). inspect_file is always free.
"File not found"
Use an absolute path starting with / or ~. Relative paths are resolved from the server process's working directory.
"Parse failure"
Try setting fix_encoding: true, specifying a delimiter, or confirming the file extension matches the actual format.
Tools not appearing in the client
Restart the client after editing the config. Confirm uvx is on the PATH visible to the client process — check with which uvx.
Privacy
Your files are sent to the Reparatio API at reparatio.app for processing. Files are handled in memory and never stored — see the Privacy Policy.
Send a message and we'll get back to you as soon as possible.
Stack rows from two or more files, or join two files on a shared column.
File 1
File 2
Operation
Stack rows from two or more files vertically. Column mismatches are handled gracefully — missing values are filled with null.
Multi-Sheet Excel Workbook
Write each file above as a separate sheet in a single .xlsx workbook (sheet name = filename).
Upload a fixed-width text file. Reparatio will detect column boundaries automatically. Drag the dividers to adjust, click the ruler to add a column, double-click a divider to remove it, then export to any format.
Each column occupies a fixed range of character positions on every line. There is no delimiter between columns — you must know where each column starts and ends. Common in mainframe exports, government data, COBOL output, and legacy banking systems.
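To make the slicing concrete, here is a minimal local sketch in Python (an illustration of the mechanics, not Reparatio's implementation). Each boundary is a column start position, matching the boundaries parameter above; the last column runs to the end of the line:

```python
def slice_fixed_width(line, boundaries):
    """Split one line into cells using column start positions.

    Each boundary marks where a column begins; the final column
    extends to the end of the line. Cells are stripped of padding.
    """
    ends = boundaries[1:] + [len(line)]
    return [line[start:end].strip() for start, end in zip(boundaries, ends)]

line = "00042     Ada Lovelace   ENG  125000"
print(slice_fixed_width(line, [0, 10, 25, 30]))
# → ['00042', 'Ada Lovelace', 'ENG', '125000']
```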
Reparatio auto-detects column boundaries by finding character positions where spaces (or a separator character) appear consistently across rows. Drag the dividers left/right to adjust, click the ruler to add a boundary, and double-click a divider to remove it.
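A toy sketch of that idea in Python (an assumption about the general approach, not Reparatio's actual detector): treat a character position as a column start when it is non-blank in some row but the position before it is blank in every row.

```python
def detect_boundaries(rows):
    """Guess column start positions from whitespace that is
    consistent across every row."""
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]
    # blank[i] is True when position i is a space in every row
    blank = [all(r[i] == " " for r in padded) for i in range(width)]
    return [i for i in range(width)
            if not blank[i] and (i == 0 or blank[i - 1])]

rows = [
    "00042     Ada Lovelace   ENG  125000",
    "00043     Grace Hopper   NAV  131000",
]
print(detect_boundaries(rows))
# → [0, 10, 25, 30]
```

Spaces inside values (like the ones between first and last names) are not blank in every row, so they do not trigger a boundary; that is why detection improves with more rows.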
Some fixed-width files include a visual separator (e.g. comma) between columns. Enter that character in Strip character to remove it from every line before parsing. The strip character is removed after preview and column detection.
If the data rows contain a consistent separator character (e.g. comma or pipe), entering it in Data separator helps the column boundary detector align columns more accurately.
Sometimes the header row is delimited (e.g. comma-separated) while the data rows are fixed-width. Enter the header delimiter and click Detect — Reparatio will extract column names from the header and map them over the data columns.
Use Skip rows to skip metadata lines at the top of the file (title lines, record counts, etc.) before the actual column header or data begins.
Files exported from IBM mainframes are often encoded in EBCDIC rather than ASCII or UTF-8. Reparatio will attempt to detect EBCDIC automatically. If the preview looks like garbled characters, select the correct EBCDIC variant from the Encoding dropdown:
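You can reproduce the symptom with Python's built-in EBCDIC codecs (a local illustration, independent of Reparatio): the same bytes decode to mojibake under Latin-1 but cleanly under cp037.

```python
# "HELLO" encoded in EBCDIC cp037
raw = "HELLO".encode("cp037")
print(raw)                    # b'\xc8\xc5\xd3\xd3\xd6'
print(raw.decode("latin-1"))  # mojibake: 'ÈÅÓÓÖ'
print(raw.decode("cp037"))    # 'HELLO'
```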
Run a SQL query against an uploaded file. The table is always named data.
Split a large file into multiple smaller files, downloaded as a ZIP archive. Standard or Professional subscribers only.
Split by
Public Beta
Reparatio is in public beta. You're using it before most people have heard of it.
That means two things: you get free access to everything, and we genuinely need your help making it better.
Ready to get started? Register for free — no credit card, no password.
This isn't a marketing term. The product works — it handles real files, real formats, and real edge cases every day. But data is endlessly weird, and no amount of internal testing catches everything that real users throw at a real tool.
There are bugs we haven't found yet. There are formats that almost work but not quite. There are error messages that are confusing. There are features that should exist but don't. You will find some of these things. We want you to tell us when you do.
We read every report. When something is broken, we fix it — usually within 48 hours for serious issues. When a feature request makes sense for the product, we build it. When we fix something you reported, we'll tell you.
We're not a large team with a ticket queue and a six-week sprint cycle. We're a small shop that moves fast. If you report a real bug today, there's a good chance it's fixed by tomorrow.
These limits exist to help us understand load before we open things up fully. They will increase over the course of the beta — we'll announce increases as they happen.
| Limit | Value |
|---|---|
| File size | 100 MB per file |
| Standard operations (inspect, convert, batch convert) | 50 per day |
| Professional operations (SQL query, merge, join, append) | 20 per day |
Limits reset at midnight MDT. If you're hitting limits doing legitimate work, tell us — that's exactly the kind of feedback that helps us calibrate.
The more specific you can be, the more useful it is. "The CSV export is broken" is hard to act on. "Converting a 40MB TSV with a pipe delimiter to Parquet returns a 422 error" is something we can fix today.
The beta runs through April 15, 2026. Users who have been active will receive an offer for 50% off their first three months of a paid plan. No obligation, no pressure — but if Reparatio has been solving a real problem for you, it's a meaningful discount at the moment it matters most.
We'll send details before the beta ends.
Reparatio is the first product of Ordo Artificum LLC. We're building tools for people who work with messy data. If that's you, we're glad you're here.