distil API v1
API Documentation
The distil API converts PDF, DOCX, MD, TXT, and HTML documents into schema-valid, RAG-ready Markdown chunks. All requests are authenticated. Base URL: https://getdistil.app
01
Quick Start
Four steps from signup to your first RAG-ready ZIP.
1. Get an API key
Sign in → Dashboard → API Keys → Generate key. Copy the key immediately; it is shown only once.
2. Upload a document
POST your PDF, DOCX, MD, TXT, or HTML file to /api/v1/convert. You receive a job_id immediately.
3. Poll for completion
GET /api/v1/jobs/:id every few seconds. When status is "done", the response includes an output_url.
4. Download your ZIP
Fetch output_url directly. It is a Supabase signed URL for a ZIP of your Markdown chunks, valid for your plan's retention window.
# Step 2 — upload
curl -X POST https://getdistil.app/api/v1/convert \
-H "Authorization: Bearer distil_sk_..." \
-F "file=@report.pdf"
# → {"job_id":"abc123","status":"pending","estimated_seconds":30}
# Step 3 — poll
curl https://getdistil.app/api/v1/jobs/abc123 \
-H "Authorization: Bearer distil_sk_..."
02
Authentication
All API requests require an API key passed as a Bearer token.
Authorization: Bearer distil_sk_...
API keys have the format distil_sk_ followed by 32 random characters (42 characters total).
Keys are shown once at creation and cannot be retrieved later. Generate and manage keys in the Dashboard → API Keys.
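A client can catch copy-paste mistakes before sending a request by checking the documented key shape. The following is a minimal sketch: the helper names are hypothetical, and because the character set of the random part is not specified here, only the prefix and total length are checked.

```python
KEY_PREFIX = "distil_sk_"
KEY_RANDOM_LENGTH = 32  # documented: 32 random characters after the prefix


def looks_like_distil_key(key: str) -> bool:
    """Return True if `key` has the documented prefix and total length (42)."""
    return key.startswith(KEY_PREFIX) and len(key) == len(KEY_PREFIX) + KEY_RANDOM_LENGTH


def auth_header(key: str) -> dict:
    """Build the Authorization header described above (hypothetical helper)."""
    if not looks_like_distil_key(key):
        raise ValueError("API key does not match the distil_sk_ + 32 characters format")
    return {"Authorization": f"Bearer {key}"}
```

A shape check like this only rejects obviously malformed keys; the server remains the authority on whether a key is valid.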
03
POST /api/v1/presign
Get a signed Supabase Storage URL so your browser can upload a file directly — bypassing your server entirely. Use this instead of sending the file through /convert when building a web UI.
Request (JSON)
{
"filename": "report.pdf",
"content_type": "application/pdf",
"size": 1048576
}
Response — 200 OK
{
"upload_url": "https://xxxx.supabase.co/storage/v1/object/sign/...",
"storage_path": "inputs/user_id/job_id.pdf",
"job_id": "3f7a1c2d-..."
}
Upload the file with a PUT request to upload_url. Then pass job_id and storage_path in the body of /convert to start processing.
| Status | Error | Condition |
|---|---|---|
| 401 | Unauthorized | Missing or invalid API key |
| 413 | File too large | size exceeds 20 MB |
| 415 | Unsupported format | filename extension is not supported |
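The presign request body can be assembled and pre-checked on the client so that the 413 and 415 cases are caught before any network call. This is a sketch under stated assumptions: the function name is hypothetical, "20 MB" is taken to mean 20 × 1024 × 1024 bytes, and the MIME types are the standard ones for each extension rather than values confirmed by this API.

```python
import os

MAX_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" is interpreted as 20 MiB

# Assumption: standard MIME types for the five supported extensions.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".md": "text/markdown",
    ".txt": "text/plain",
    ".html": "text/html",
}


def build_presign_body(filename: str, size: int) -> dict:
    """Return the JSON body for POST /api/v1/presign, or raise ValueError."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in CONTENT_TYPES:
        # The server would answer 415 for this file.
        raise ValueError(f"unsupported format: {filename}")
    if size > MAX_BYTES:
        # The server would answer 413 for this file.
        raise ValueError(f"file too large: {size} bytes")
    return {"filename": filename, "content_type": CONTENT_TYPES[ext], "size": size}
```

For example, `build_presign_body("report.pdf", 1048576)` produces the request body shown above.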
04
POST /api/v1/convert
Submit a document for conversion. Returns a job_id immediately — processing happens asynchronously.
Request
Content-Type: multipart/form-data
| Field | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | PDF, DOCX, MD, TXT, or HTML. Max 20 MB. |
| domain | string | No | One of hr, legal, it, sales, support, product, finance, ops, marketing. Skips LLM domain classification. |
| language | string | No | de or en. Skips language detection. |
| pii_level | string | No | none, low, or high. Skips PII classification. |
| governance_level | string | No | green, yellow, or red. Set directly, no LLM call. |
| chunk_size | string | No | 200–2000 tokens per chunk. Overrides the default of 800. Pro and Scale only. |
| webhook_url | string | No | HTTPS URL to notify on completion. Pro and Scale only. See Webhook. |
Response — 202 Accepted
{
"job_id": "3f7a1c2d-...",
"status": "pending",
"estimated_seconds": 30
}
Error responses
| Status | Error | Condition |
|---|---|---|
| 401 | Unauthorized | Missing or invalid API key |
| 402 | Monthly page limit reached / upgrade required | Monthly quota exhausted, or a Pro-only param (chunk_size, webhook_url) was used on a Free plan |
| 413 | File too large. Maximum size is 20 MB. | File exceeds 20 MB |
| 415 | Unsupported format. | File extension is not .pdf, .docx, .md, .txt, or .html |
curl -X POST https://getdistil.app/api/v1/convert \
-H "Authorization: Bearer distil_sk_..." \
-F "file=@report.pdf" \
-F "webhook_url=https://example.com/hook"
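The optional form fields in the request table can be validated before upload. The sketch below builds the extra multipart fields from keyword arguments; the function name is hypothetical, and the checks simply mirror the allowed values listed above (the server may apply further rules, such as plan restrictions).

```python
ALLOWED_VALUES = {
    "domain": {"hr", "legal", "it", "sales", "support", "product", "finance", "ops", "marketing"},
    "language": {"de", "en"},
    "pii_level": {"none", "low", "high"},
    "governance_level": {"green", "yellow", "red"},
}


def build_convert_fields(**overrides) -> dict:
    """Validate optional /convert parameters and return them as form fields."""
    fields = {}
    for name, value in overrides.items():
        if value is None:
            continue  # parameter not set; let the service classify it
        if name in ALLOWED_VALUES:
            if value not in ALLOWED_VALUES[name]:
                raise ValueError(f"{name} must be one of {sorted(ALLOWED_VALUES[name])}")
            fields[name] = value
        elif name == "chunk_size":
            size = int(value)
            if not 200 <= size <= 2000:
                raise ValueError("chunk_size must be between 200 and 2000 tokens")
            fields[name] = str(size)  # form fields are sent as strings
        elif name == "webhook_url":
            if not str(value).startswith("https://"):
                raise ValueError("webhook_url must be an HTTPS URL")
            fields[name] = value
        else:
            raise ValueError(f"unknown parameter: {name}")
    return fields
```

The returned dictionary can be passed as the non-file part of a multipart/form-data request by whichever HTTP client you use.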
05
POST /api/v1/batches
Upload up to 5 files in one request. Each file becomes its own job, processed sequentially. Returns a batch_id you can poll.
Request
Content-Type: multipart/form-data
Fields: files[] (up to 5 files) + optional overrides domain, language, pii_level, governance_level, chunk_size, webhook_url — applied to all files.
Response — 202 Accepted
{
"batch_id": "b1c2d3e4-...",
"jobs": [
{
"job_id": "3f7a1c2d-...",
"filename": "report.pdf",
"status": "pending",
"poll_url": "/api/v1/jobs/3f7a1c2d-..."
}
]
}
GET /api/v1/batches/:id
Poll the batch as a whole. Returns aggregated counts.
{
"batch_id": "b1c2d3e4-...",
"total": 3,
"done": 2,
"failed": 0,
"pending": 1,
"jobs": [ /* same shape as /convert response per job */ ]
}
| Status | Error | Condition |
|---|---|---|
| 401 | Unauthorized | Missing or invalid API key |
| 402 | Plan upgrade required | Batch upload is Pro/Scale only |
| 400 | Too many files | More than 5 files submitted |
| 413 | File too large | One or more files exceed 20 MB |
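Because a batch accepts at most 5 files, larger file sets have to be split across several requests, and the aggregated counts tell you when a batch has finished. The helpers below are a sketch with hypothetical names; treating a batch as finished when done + failed equals total is an assumption based on the example response above.

```python
from typing import Iterator, List, Sequence

MAX_FILES_PER_BATCH = 5  # documented limit per /api/v1/batches request


def split_into_batches(paths: Sequence[str], size: int = MAX_FILES_PER_BATCH) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` file paths."""
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


def batch_finished(status: dict) -> bool:
    """True once every job in a GET /api/v1/batches/:id response is done or failed."""
    return status["done"] + status["failed"] == status["total"]
```

With 12 files, `split_into_batches` yields groups of 5, 5, and 2, each of which fits in one batch request.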
06
GET /api/v1/jobs/:id
Poll for job status. Poll every 2–5 seconds until status is done or failed.
Response — pending or processing
{
"job_id": "3f7a1c2d-...",
"status": "pending", // or "processing"
"created_at": "2026-04-02T10:00:00Z"
}
Response — done
{
"job_id": "3f7a1c2d-...",
"status": "done",
"result_status": "success", // or "partial"
"output_url": "https://...", // signed URL, fetch directly
"total_chunks": 42,
"total_tokens": 18400,
"total_pages": 14,
"completed_at": "2026-04-02T10:00:35Z",
"expires_at": "2026-04-03T10:00:35Z",
"source_filename": "report.pdf"
}
"success" means all chunks passed validation. "partial" means some chunks failed schema validation; the ZIP still contains the valid chunks, and failed_chunks in the webhook payload tells you how many were dropped.
output_url is a Supabase Storage signed URL. Fetch it directly with a GET request; no auth header is needed. The URL is valid for your plan's retention window: Free 24 h · Pro 7 days · Scale 30 days.
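Since the signed URL stops working after the retention window, a client may want to check expires_at before attempting a download. The following sketch uses hypothetical helper names and assumes expires_at is always an ISO 8601 UTC timestamp with a trailing "Z", as in the example response.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2026-04-03T10:00:35Z"."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_until_expiry(job: dict, now: Optional[datetime] = None) -> float:
    """Seconds left before the job's output_url expires (negative if expired)."""
    now = now or datetime.now(timezone.utc)
    return (parse_timestamp(job["expires_at"]) - now).total_seconds()
```

A non-positive result means the output has most likely expired, in which case the document has to be converted again.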
Response — failed
{
"job_id": "3f7a1c2d-...",
"status": "failed",
"error_message": "string",
"created_at": "2026-04-02T10:00:00Z"
}
Error responses
| Status | Error | Condition |
|---|---|---|
| 401 | Unauthorized | Missing or invalid API key |
| 404 | Job not found | Job does not exist or belongs to a different user |
| 410 | Job output has expired | Output ZIP has expired (Free: 24 h · Pro: 7 days · Scale: 30 days). Re-convert the document. |
curl https://getdistil.app/api/v1/jobs/3f7a1c2d \
-H "Authorization: Bearer distil_sk_..."
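The polling behaviour described in this section can be sketched as a small loop that stops on done or failed. The function names, the 3-second default interval, and the 10-minute timeout are illustrative choices, not part of the API; the loop takes the fetch function as a parameter so it can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Callable

TERMINAL_STATUSES = {"done", "failed"}


def fetch_job(job_id: str, api_key: str) -> dict:
    """GET /api/v1/jobs/:id and return the decoded JSON body."""
    request = urllib.request.Request(
        f"https://getdistil.app/api/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    # urlopen raises urllib.error.HTTPError for 401, 404, and 410 responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(
    fetch: Callable[[], dict],
    interval: float = 3.0,   # within the recommended 2 to 5 seconds
    timeout: float = 600.0,  # assumption: give up after 10 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` repeatedly until the job status is done or failed."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

In practice the loop would be called as `wait_for_job(lambda: fetch_job(job_id, api_key))`, after which a done job carries the output_url to download.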
07
Schema Reference
Each chunk in the output ZIP is a Markdown file with YAML frontmatter. Every field is purpose-built for RAG retrieval.
Example chunk
---
id: a1b2c3d4
source_file: contract-2024.pdf
source_pages: [11, 12]
title: Limitation of Liability
os_component: governance
context_tier: task
domain: legal
language: en
pii_level: none
summary: Caps total liability at 12 months of fees paid.
keywords: [liability, indemnity, consequential damages, cap]
token_count: 187
governance_level: yellow
---
## 9.2 Limitation of Liability
Neither party shall be liable for indirect, incidental, or consequential damages...
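When loading chunks into a retrieval pipeline, the frontmatter usually has to be separated from the Markdown body first. The sketch below does only that split, assuming each chunk starts with a line containing `---` and closes the frontmatter with a second such line; the function name is hypothetical.

```python
from typing import Tuple


def split_chunk(text: str) -> Tuple[str, str]:
    """Split a chunk file into (frontmatter_text, body_text)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("chunk does not start with a frontmatter delimiter")
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).strip()
            return frontmatter, body
    raise ValueError("frontmatter is not closed")
```

The frontmatter text can then be handed to a YAML parser of your choice, for example `yaml.safe_load` from the third-party PyYAML package, to obtain the metadata fields.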
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | 8-character SHA-256 derived from content and position. Stable across re-runs if content is unchanged. |
| source_file | string | Yes | Original filename of the uploaded document. |
| source_pages | number[] | Yes | Page numbers in the source document this chunk covers. |
| title | string | Yes | Human-readable title derived from the section heading or LLM-generated if none exists. |
| os_component | one of skill, knowledge, governance, context_profile | Yes | Semantic category. skill = procedural/how-to. knowledge = factual/reference. governance = policy/rule. context_profile = identity/persona. |
| context_tier | one of core, task, background | Yes | Relevance tier for context budgeting. core = always include. task = include for specific tasks. background = include only when needed. |
| domain | one of hr, legal, it, sales, support, product, finance, ops, marketing | Yes | Business domain. Use for filtered retrieval. |
| language | one of de, en | Yes | Primary language of the chunk content. |
| pii_level | one of none, low, high | Yes | PII risk flag. none = no personal data. low = non-sensitive personal data present. high = sensitive personal data present. |
| summary | string | Yes | 1–3 sentence summary of the chunk in the same language as the content. |
| keywords | string[] (3–7) | Yes | Key terms and phrases for keyword-based retrieval. |
| token_count | number | Yes | Estimated token count of the chunk content. Use for context window budgeting. |
| governance_level | one of green, yellow, red | No | Access control classification. green = public. yellow = internal. red = restricted. Defaults to green if omitted. |
| questions_answered | string[] (2–5) | No | Questions this chunk answers. Useful as retrieval hints for question-answering systems. |
ZIP contents
output.zip
├── summary.md # human-readable conversion summary
├── index.yaml # document index: all chunks listed with metadata
├── chunk_001-2.md
├── chunk_002-4.md
└── ...
PDF files include page-range suffixes (chunk_001-2.md = ends on page 2). DOCX files use plain names (chunk_001.md) — python-docx does not provide page boundaries.
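The naming convention above can be parsed to recover a chunk's position and, for PDFs, the page it ends on. This is a sketch with a hypothetical function name; it assumes the names always follow the `chunk_<index>` or `chunk_<index>-<page>` pattern shown in the listing.

```python
import re
from typing import Optional, Tuple

# chunk_001-2.md -> index 1, ends on page 2; chunk_001.md -> index 1, no page info
CHUNK_NAME = re.compile(r"^chunk_(\d+)(?:-(\d+))?\.md$")


def parse_chunk_name(name: str) -> Tuple[int, Optional[int]]:
    """Return (chunk_index, end_page); end_page is None for names without a suffix."""
    match = CHUNK_NAME.match(name)
    if match is None:
        raise ValueError(f"not a chunk file name: {name}")
    index = int(match.group(1))
    end_page = int(match.group(2)) if match.group(2) is not None else None
    return index, end_page
```

Files such as summary.md and index.yaml do not match the pattern and raise ValueError, which makes it easy to filter them out when iterating over the ZIP.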
08
Webhook
Pass a webhook_url when uploading to receive a POST notification on completion.
When it fires
The webhook fires when a job completes with status success, partial, or failed. The error_message field is present only on failed; download_url is present only on success and partial.
Payload
// POST to your webhook_url
// Content-Type: application/json
// X-Distil-Signature: sha256=<hmac>
{
"job_id": "3f7a1c2d-...",
"status": "success", // "success" | "partial" | "failed"
"completed_at": "2026-04-04T08:00:00.000Z",
"download_url": "https://...", // success/partial only
"total_chunks": 42,
"total_pages": 14,
"error_message": "..." // failed only
}
Signature verification
Every request carries an X-Distil-Signature header. The value is sha256=<HMAC-SHA256(raw_body, secret)>. Verify it to reject spoofed payloads.
// Node.js: Express example
const crypto = require('crypto')
const express = require('express')
const app = express()

app.post('/hook', express.raw({ type: 'application/json' }), (req, res) => {
  const sig = Buffer.from(req.headers['x-distil-signature'] ?? '')
  const mac = Buffer.from('sha256=' + crypto
    .createHmac('sha256', process.env.DISTIL_WEBHOOK_SECRET)
    .update(req.body) // raw request body (Buffer), not parsed JSON
    .digest('hex'))
  // timingSafeEqual throws if the lengths differ, so compare lengths first
  if (sig.length !== mac.length || !crypto.timingSafeEqual(sig, mac)) {
    return res.status(401).send('Invalid signature')
  }
  const payload = JSON.parse(req.body)
  // handle payload ...
  res.sendStatus(200)
})
# Python: Flask example
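import hashlib
import hmac
import os

from flask import Flask, abort, request

app = Flask(__name__)

@app.route('/hook', methods=['POST'])
def webhook():
    sig = request.headers.get('X-Distil-Signature', '')
    secret = os.environ['DISTIL_WEBHOOK_SECRET'].encode()
    # request.data is the raw body; sign it before any JSON parsing
    mac = 'sha256=' + hmac.new(secret, request.data, hashlib.sha256).hexdigest()
    # compare as bytes so a non-ASCII header value cannot raise TypeError
    if not hmac.compare_digest(sig.encode(), mac.encode()):
        abort(401)
    payload = request.get_json(force=True)
    # handle payload ...
    return '', 200
09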
Usage Quotas
Quotas are measured in pages per month and reset at the start of each calendar month. Exceeding your quota returns a 402 response.
| Plan | Pages / month | Output retention | Bulk upload | Price |
|---|---|---|---|---|
| Free | 50 | 24 hours | — | €0 |
| Pro | 1,000 | 7 days | Up to 5 files | €29 / month |
| Scale | 10,000 | 30 days | Up to 5 files | €199 / month |
Upgrade your plan in the Dashboard → Billing.
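To avoid a 402 response, a client can track its own page usage against the plan limits in the table above. The sketch below is illustrative only: the function names are hypothetical, and it assumes you count used pages yourself (for example by summing total_pages from completed jobs), since no usage endpoint is described here.

```python
# Plan limits copied from the quota table above.
PLANS = {
    "free": {"pages_per_month": 50, "retention_hours": 24},
    "pro": {"pages_per_month": 1000, "retention_hours": 7 * 24},
    "scale": {"pages_per_month": 10000, "retention_hours": 30 * 24},
}


def pages_remaining(plan: str, pages_used: int) -> int:
    """Pages left this calendar month for the given plan (never negative)."""
    return max(0, PLANS[plan]["pages_per_month"] - pages_used)


def fits_in_quota(plan: str, pages_used: int, document_pages: int) -> bool:
    """True if a document of `document_pages` pages fits in the remaining quota."""
    return document_pages <= pages_remaining(plan, pages_used)
```

For example, a Free account that has already converted 42 pages has 8 pages left, so a 14-page document would not fit.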