distil API v1

API Documentation

The distil API converts PDF, DOCX, MD, TXT, and HTML documents into schema-valid, RAG-ready Markdown chunks. All requests are authenticated. Base URL: https://getdistil.app


01

Quick Start

Four steps from signup to your first RAG-ready ZIP.

1

Get an API key

Sign in → Dashboard → API Keys → Generate key. Copy the key — it is shown once.

2

Upload a document

POST your PDF, DOCX, MD, TXT, or HTML file to /api/v1/convert. You receive a job_id immediately.

3

Poll for completion

GET /api/v1/jobs/:id every few seconds. When status is "done", an output_url is included.

4

Download your ZIP

Fetch output_url directly. It is a Supabase signed URL for a ZIP of your Markdown chunks, and it stays valid for the duration of your plan's retention window.

# Step 2 — upload
curl -X POST https://getdistil.app/api/v1/convert \
  -H "Authorization: Bearer distil_sk_..." \
  -F "file=@report.pdf"

# → {"job_id":"abc123","status":"pending","estimated_seconds":30}

# Step 3 — poll
curl https://getdistil.app/api/v1/jobs/abc123 \
  -H "Authorization: Bearer distil_sk_..."
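
For readers not using curl, the upload in Step 2 could be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper name build_convert_request is hypothetical, and sending the file part as application/octet-stream is an assumption (this page does not say which part content type the server expects).

```python
import urllib.request
import uuid

BASE_URL = "https://getdistil.app"  # base URL stated at the top of this page


def build_convert_request(api_key: str, filename: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /api/v1/convert."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        # Assumption: a generic part content type is accepted.
        b"Content-Type: application/octet-stream\r\n\r\n",
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/convert",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Usage (performs a real network call, so it is left commented out):
# import json
# with open("report.pdf", "rb") as fh:
#     req = build_convert_request("distil_sk_...", "report.pdf", fh.read())
# with urllib.request.urlopen(req) as resp:
#     job = json.load(resp)  # keys shown above: job_id, status, estimated_seconds
```

Building the request separately from sending it keeps the upload logic easy to inspect and test without touching the network.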

02

Authentication

All API requests require an API key passed as a Bearer token.

Authorization: Bearer distil_sk_...

API keys have the format distil_sk_ followed by 32 random characters (42 characters total).

Keys are shown once at creation and cannot be retrieved later. Generate and manage keys in the Dashboard → API Keys.

Browser clients (e.g. your own web app using distil) may also authenticate via a Supabase session cookie — the API accepts both.
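
A client can catch copy-paste mistakes before sending a request by checking the key against the format described above. The sketch below checks only the documented prefix and total length; the helper name auth_headers is hypothetical, and no assumption is made about which characters the 32-character part may contain.

```python
KEY_PREFIX = "distil_sk_"
KEY_LENGTH = 42  # 10-character prefix + 32 random characters, as stated above


def auth_headers(api_key: str) -> dict:
    """Return the Authorization header, or raise if the key is malformed."""
    if not api_key.startswith(KEY_PREFIX) or len(api_key) != KEY_LENGTH:
        raise ValueError("API key must be 'distil_sk_' followed by 32 characters")
    return {"Authorization": f"Bearer {api_key}"}
```

A key that passes this check can still be rejected by the server with a 401, for example after it has been revoked.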

03

POST /api/v1/presign

Get a signed Supabase Storage URL so your browser can upload a file directly — bypassing your server entirely. Use this instead of sending the file through /convert when building a web UI.

Request (JSON)

{
  "filename":     "report.pdf",
  "content_type": "application/pdf",
  "size":         1048576
}

Response — 200 OK

{
  "upload_url":    "https://xxxx.supabase.co/storage/v1/object/sign/...",
  "storage_path": "inputs/user_id/job_id.pdf",
  "job_id":        "3f7a1c2d-..."
}

Upload the file with a PUT request to upload_url. Then pass job_id and storage_path in the body of /convert to start processing.

| Status | Error | Condition |
| --- | --- | --- |
| 401 | Unauthorized | Missing or invalid API key |
| 413 | File too large | size exceeds 20 MB |
| 415 | Unsupported format | filename extension is not supported |
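
The request body for /api/v1/presign can be assembled and checked on the client before any network call, which avoids a round trip for the 413 and 415 cases. The following is a sketch under stated assumptions: the helper name presign_body is hypothetical, "20 MB" is read as 20 × 1024 × 1024 bytes, and the extension-to-content-type mapping is a guess (this page lists the formats but not their MIME types).

```python
import os

MAX_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" means 20 MiB

# Assumption: conventional MIME types for the five documented formats.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".md": "text/markdown",
    ".txt": "text/plain",
    ".html": "text/html",
}


def presign_body(filename: str, size: int) -> dict:
    """Build the JSON body for POST /api/v1/presign after local checks."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in CONTENT_TYPES:
        raise ValueError(f"unsupported format: {filename}")  # server would return 415
    if size > MAX_BYTES:
        raise ValueError(f"file too large: {size} bytes")  # server would return 413
    return {"filename": filename, "content_type": CONTENT_TYPES[ext], "size": size}
```

After a successful presign call, the flow described above continues with a PUT of the file to upload_url and a /convert request carrying job_id and storage_path.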

04

POST /api/v1/convert

Submit a document for conversion. Returns a job_id immediately — processing happens asynchronously.

Request

Content-Type: multipart/form-data

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | File | Yes | PDF, DOCX, MD, TXT, or HTML. Max 20 MB. |
| domain | string | No | One of hr, legal, it, sales, support, product, finance, ops, marketing. Setting it skips LLM domain classification. |
| language | string | No | de or en. Setting it skips language detection. |
| pii_level | string | No | none, low, or high. Setting it skips PII classification. |
| governance_level | string | No | green, yellow, or red. The value is set directly, with no LLM call. |
| chunk_size | string | No | 200–2000 tokens per chunk. Overrides the default of 800. Pro and Scale only. |
| webhook_url | string | No | HTTPS URL to notify on completion. Pro and Scale only. See Webhook. |

Response — 202 Accepted

{
  "job_id":           "3f7a1c2d-...",
  "status":           "pending",
  "estimated_seconds": 30
}

Error responses

| Status | Error | Condition |
| --- | --- | --- |
| 401 | Unauthorized | Missing or invalid API key |
| 402 | Monthly page limit reached / upgrade required | Monthly quota exhausted, or a Pro-only param (chunk_size, webhook_url) was used on a Free plan |
| 413 | File too large. Maximum size is 20 MB. | File exceeds 20 MB |
| 415 | Unsupported format. | File extension is not .pdf, .docx, .md, .txt, or .html |

curl -X POST https://getdistil.app/api/v1/convert \
  -H "Authorization: Bearer distil_sk_..." \
  -F "file=@report.pdf" \
  -F "webhook_url=https://example.com/hook"
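
The optional override fields accept a small, fixed set of values, so they are easy to validate locally before uploading. The sketch below mirrors the allowed values and the 200–2000 range from the field list above; the helper name convert_fields is hypothetical, and the server remains the final authority on what it accepts.

```python
# Allowed values copied from the request field list above.
ALLOWED = {
    "domain": {"hr", "legal", "it", "sales", "support", "product", "finance", "ops", "marketing"},
    "language": {"de", "en"},
    "pii_level": {"none", "low", "high"},
    "governance_level": {"green", "yellow", "red"},
}


def convert_fields(chunk_size=None, webhook_url=None, **overrides) -> dict:
    """Return validated form fields to send alongside `file`."""
    fields = {}
    for name, value in overrides.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown field: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"invalid {name}: {value!r}")
        fields[name] = value
    if chunk_size is not None:
        if not 200 <= int(chunk_size) <= 2000:
            raise ValueError("chunk_size must be between 200 and 2000")
        fields["chunk_size"] = str(chunk_size)  # multipart form values are strings
    if webhook_url is not None:
        if not webhook_url.startswith("https://"):
            raise ValueError("webhook_url must be an HTTPS URL")
        fields["webhook_url"] = webhook_url
    return fields
```

Note that chunk_size and webhook_url still trigger a 402 on a Free plan even when their values are well formed.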

05

POST /api/v1/batches

Upload up to 5 files in one request. Each file becomes its own job, processed sequentially. Returns a batch_id you can poll.

Pro feature — Bulk upload is available on Pro and Scale plans only.

Request

Content-Type: multipart/form-data

Fields: files[] (up to 5 files) + optional overrides domain, language, pii_level, governance_level, chunk_size, webhook_url — applied to all files.

Response — 202 Accepted

{
  "batch_id": "b1c2d3e4-...",
  "jobs": [
    {
      "job_id":    "3f7a1c2d-...",
      "filename":  "report.pdf",
      "status":    "pending",
      "poll_url":  "/api/v1/jobs/3f7a1c2d-..."
    }
  ]
}

GET /api/v1/batches/:id

Poll the batch as a whole. Returns aggregated counts.

{
  "batch_id": "b1c2d3e4-...",
  "total":    3,
  "done":     2,
  "failed":   0,
  "pending":  1,
  "jobs": [ /* same shape as /convert response per job */ ]
}

| Status | Error | Condition |
| --- | --- | --- |
| 401 | Unauthorized | Missing or invalid API key |
| 402 | Plan upgrade required | Batch upload is Pro/Scale only |
| 400 | Too many files | More than 5 files submitted |
| 413 | File too large | One or more files exceed 20 MB |
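
Because a batch accepts at most 5 files, a larger set has to be split across several requests, and the aggregated counts can be turned into a simple progress check. Both helpers below are hypothetical names for illustration; the second one assumes a batch is finished once done plus failed equals total, which this page implies but does not state outright.

```python
def split_into_batches(paths, max_files=5):
    """Group file paths into lists of at most `max_files` (the documented limit)."""
    return [paths[i:i + max_files] for i in range(0, len(paths), max_files)]


def batch_progress(batch: dict):
    """Return (finished, fraction_complete) from a GET /api/v1/batches/:id response."""
    total = batch["total"]
    finished_jobs = batch["done"] + batch["failed"]
    fraction = finished_jobs / total if total else 1.0
    return finished_jobs >= total, fraction
```

For example, 12 files would be sent as three batches of 5, 5, and 2 files.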

06

GET /api/v1/jobs/:id

Poll for job status. Poll every 2–5 seconds until status is done or failed.

Response — pending or processing

{
  "job_id":     "3f7a1c2d-...",
  "status":     "pending",  // or "processing"
  "created_at": "2026-04-02T10:00:00Z"
}

Response — done

{
  "job_id":        "3f7a1c2d-...",
  "status":        "done",
  "result_status": "success",  // or "partial"
  "output_url":    "https://...",  // signed URL, fetch directly
  "total_chunks":  42,
  "total_tokens":  18400,
  "total_pages":   14,
  "completed_at":  "2026-04-02T10:00:35Z",
  "expires_at":    "2026-04-03T10:00:35Z",
  "source_filename": "report.pdf"
}

result_status: "success" means all chunks passed validation. "partial" means some chunks failed schema validation — the ZIP still contains the valid chunks, and failed_chunks in the webhook payload tells you how many were dropped.

output_url is a Supabase Storage signed URL. Fetch it directly with a GET request — no auth header needed. The URL is valid for your plan's retention window: Free 24 h · Pro 7 days · Scale 30 days.
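
Since the signed URL stops working after the retention window, a client may want to check expires_at before attempting a download. The following is a small sketch using the expires_at field from the done response above; the helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps such as "2026-04-03T10:00:35Z" into aware datetimes."""
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so it is normalised to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_until_expiry(job: dict, now=None) -> float:
    """Seconds left before output_url expires; negative once it has expired."""
    now = now or datetime.now(timezone.utc)
    return (parse_timestamp(job["expires_at"]) - now).total_seconds()
```

A negative result means the output has to be regenerated by converting the document again.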

Response — failed

{
  "job_id":        "3f7a1c2d-...",
  "status":        "failed",
  "error_message": "string",
  "created_at":    "2026-04-02T10:00:00Z"
}

Error responses

| Status | Error | Condition |
| --- | --- | --- |
| 401 | Unauthorized | Missing or invalid API key |
| 404 | Job not found | Job does not exist or belongs to a different user |
| 410 | Job output has expired | Output ZIP has expired (Free: 24 h · Pro: 7 days · Scale: 30 days). Re-convert the document. |

curl https://getdistil.app/api/v1/jobs/3f7a1c2d \
  -H "Authorization: Bearer distil_sk_..."
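
The polling rule described in this section (repeat every 2–5 seconds until status is done or failed) could be wrapped in a loop like the one below. This is a sketch, not an official client: fetch_status is a hypothetical caller-supplied function that performs GET /api/v1/jobs/:id and returns the decoded JSON, and the 10-minute timeout is an arbitrary choice.

```python
import time


def wait_for_job(fetch_status, job_id, interval=3.0, timeout=600.0, sleep=time.sleep):
    """Poll until the job reaches "done"; raise on "failed" or on timeout."""
    waited = 0.0
    while True:
        job = fetch_status(job_id)
        if job["status"] == "done":
            return job  # contains output_url, total_chunks, ...
        if job["status"] == "failed":
            raise RuntimeError(job.get("error_message", "job failed"))
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout} s")
        sleep(interval)
        waited += interval
```

Passing the HTTP call and the sleep function in as arguments keeps the loop independent of any particular HTTP library and makes it straightforward to test.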

07

Schema Reference

Each chunk in the output ZIP is a Markdown file with YAML frontmatter. Every field is purpose-built for RAG retrieval.

Example chunk

---
id:               a1b2c3d4
source_file:      contract-2024.pdf
source_pages:     [11, 12]
title:            Limitation of Liability
os_component:     governance
context_tier:     task
domain:           legal
language:         en
pii_level:        none
summary:          Caps total liability at 12 months of fees paid.
keywords:         [liability, indemnity, consequential damages, cap]
token_count:      187
governance_level: yellow
---

## 9.2 Limitation of Liability

Neither party shall be liable for indirect, incidental,
or consequential damages...

Fields

id (string): 8-character ID derived from a SHA-256 hash of the chunk's content and position. Stable across re-runs if content is unchanged.
source_file (string): Original filename of the uploaded document.
source_pages (number[]): Page numbers in the source document this chunk covers.
title (string): Human-readable title derived from the section heading, or LLM-generated if none exists.
os_component ("skill" | "knowledge" | "governance" | "context_profile"): Semantic category. skill = procedural/how-to. knowledge = factual/reference. governance = policy/rule. context_profile = identity/persona.
context_tier ("core" | "task" | "background"): Relevance tier for context budgeting. core = always include. task = include for specific tasks. background = include only when needed.
domain ("hr" | "legal" | "it" | "sales" | "support" | "product" | "finance" | "ops" | "marketing"): Business domain. Use for filtered retrieval.
language ("de" | "en"): Primary language of the chunk content.
pii_level ("none" | "low" | "high"): PII risk flag. none = no personal data. low = non-sensitive personal data present. high = sensitive personal data present.
summary (string): 1–3 sentence summary of the chunk in the same language as the content.
keywords (string[], 3–7 items): Key terms and phrases for keyword-based retrieval.
token_count (number): Estimated token count of the chunk content. Use for context window budgeting.
governance_level ("green" | "yellow" | "red", optional): Access control classification. green = public. yellow = internal. red = restricted. Defaults to green if omitted.
questions_answered (string[], 2–5 items, optional): Questions this chunk answers. Useful as retrieval hints for question-answering systems.
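
To show how these fields might be consumed, the sketch below parses a chunk's frontmatter and then picks chunks for a prompt using context_tier and token_count. It is a deliberately minimal parser that handles only flat key: value lines and [a, b] lists like the example chunk; a full YAML parser is the safer choice in production. The greedy core, task, background ordering is one possible reading of "context budgeting", not a documented algorithm, and both function names are hypothetical.

```python
INT_KEYS = {"token_count"}
INT_LIST_KEYS = {"source_pages"}
TIER_ORDER = {"core": 0, "task": 1, "background": 2}


def parse_chunk(text: str):
    """Split a chunk file into (metadata dict, Markdown body)."""
    _, front, body = text.split("---", 2)
    meta = {}
    for line in front.strip().splitlines():
        key, _, raw = line.partition(":")
        key, raw = key.strip(), raw.strip()
        if raw.startswith("[") and raw.endswith("]"):
            items = [item.strip() for item in raw[1:-1].split(",") if item.strip()]
            meta[key] = [int(i) for i in items] if key in INT_LIST_KEYS else items
        else:
            meta[key] = int(raw) if key in INT_KEYS else raw
    return meta, body.strip()


def select_for_budget(chunks, max_tokens):
    """Greedily keep chunks in tier order while they fit within max_tokens."""
    picked, used = [], 0
    for meta in sorted(chunks, key=lambda m: TIER_ORDER[m["context_tier"]]):
        if used + meta["token_count"] <= max_tokens:
            picked.append(meta)
            used += meta["token_count"]
    return picked
```

Filtering on domain, pii_level, or governance_level before the budget step works the same way, since each is a plain string in the parsed metadata.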

ZIP contents

output.zip
├── summary.md          # human-readable conversion summary
├── index.yaml          # document index — all chunks listed with metadata
├── chunk_001-2.md
├── chunk_002-4.md
└── ...

PDF files include page-range suffixes (chunk_001-2.md = ends on page 2). DOCX files use plain names (chunk_001.md) — python-docx does not provide page boundaries.
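
The naming scheme can be read back programmatically when unpacking a download. The sketch below lists the chunk files in a ZIP held in memory and extracts the chunk index plus the page suffix, treating the suffix as the end page as described above. The function name list_chunks is hypothetical, and the regular expression is inferred from the two example names rather than from a published naming specification.

```python
import io
import re
import zipfile

# Inferred from "chunk_001-2.md" (PDF) and "chunk_001.md" (DOCX).
CHUNK_NAME = re.compile(r"chunk_(\d+)(?:-(\d+))?\.md$")


def list_chunks(zip_bytes: bytes):
    """Return (name, index, end_page) per chunk file, ordered by index.

    end_page is None when the name carries no page suffix.
    """
    found = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            match = CHUNK_NAME.search(name)
            if match:
                end_page = int(match.group(2)) if match.group(2) else None
                found.append((name, int(match.group(1)), end_page))
    return sorted(found, key=lambda item: item[1])
```

Files such as summary.md and index.yaml do not match the pattern and are skipped.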

08

Webhook

Pass a webhook_url when uploading to receive a POST notification on completion.

Pro feature — Webhooks are available on Pro and Scale plans. distil makes 2 POST attempts (10 s timeout each) before giving up. Always implement polling as a fallback.

When it fires

On success, partial, and failed completion. The error_message field is only present on failed; download_url is only present on success / partial.

Payload

// POST to your webhook_url
// Content-Type: application/json
// X-Distil-Signature: sha256=<hmac>

{
  "job_id":        "3f7a1c2d-...",
  "status":        "success",     // "success" | "partial" | "failed"
  "completed_at":  "2026-04-04T08:00:00.000Z",
  "download_url":  "https://...",  // success/partial only
  "total_chunks":  42,
  "total_pages":   14,
  "error_message": "..."          // failed only
}
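
Because the payload shape depends on status, a receiver typically branches on it before touching download_url or error_message. The following is a minimal sketch of that branching; the function name next_action and the returned action labels are hypothetical, and it should run only after the signature has been verified.

```python
def next_action(payload: dict):
    """Map a webhook payload to an (action, detail) pair."""
    status = payload["status"]
    if status in ("success", "partial"):
        # download_url is present on success and partial only.
        return ("download", payload["download_url"])
    if status == "failed":
        # error_message is present on failed only.
        return ("report_error", payload.get("error_message", "unknown error"))
    raise ValueError(f"unexpected status: {status!r}")
```

Treating unknown status values as errors makes it obvious when the payload format changes.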

Signature verification

Every request carries an X-Distil-Signature header. The value is sha256=<HMAC-SHA256(raw_body, secret)>. Verify it to reject spoofed payloads.

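// Node.js — Express example
const crypto = require('crypto')
const express = require('express')

const app = express()

// express.raw() keeps req.body as a Buffer, so the HMAC covers the exact bytes sent.
app.post('/hook', express.raw({ type: 'application/json' }), (req, res) => {
  const sig = Buffer.from(req.headers['x-distil-signature'] ?? '')
  const mac = Buffer.from('sha256=' + crypto
    .createHmac('sha256', process.env.DISTIL_WEBHOOK_SECRET)
    .update(req.body)
    .digest('hex'))

  // timingSafeEqual throws if the lengths differ, so compare lengths first.
  if (sig.length !== mac.length || !crypto.timingSafeEqual(sig, mac)) {
    return res.status(401).send('Invalid signature')
  }
  const payload = JSON.parse(req.body.toString('utf8'))
  // handle payload ...
  res.sendStatus(200)
})

# Python — Flask example
import hmac, hashlib, os

from flask import Flask, abort, request

app = Flask(__name__)

@app.route('/hook', methods=['POST'])
def webhook():
    sig     = request.headers.get('X-Distil-Signature', '')
    secret  = os.environ['DISTIL_WEBHOOK_SECRET'].encode()
    # request.data holds the raw body bytes, which is what the HMAC covers.
    mac     = 'sha256=' + hmac.new(secret, request.data, hashlib.sha256).hexdigest()
    # Compare as bytes: compare_digest raises on non-ASCII str input.
    if not hmac.compare_digest(sig.encode(), mac.encode()):
        abort(401)
    payload = request.get_json(force=True)
    # handle payload ...
    return '', 200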

09

Usage Quotas

Quotas are measured in pages per month and reset at the start of each calendar month. Exceeding your quota returns a 402 response.

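| Plan | Pages / month | Output retention | Bulk upload | Price |
| --- | --- | --- | --- | --- |
| Free | 50 | 24 hours | Not available | €0 |
| Pro | 1,000 | 7 days | Up to 5 files | €29 / month |
| Scale | 10,000 | 30 days | Up to 5 files | €199 / month |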
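
A client that tracks its own page usage can estimate whether a document still fits the monthly quota before uploading, rather than waiting for a 402. The sketch below hard-codes the limits from the table above; the function names are hypothetical, and the pages_used figure has to come from your own bookkeeping, since this page does not describe a usage endpoint.

```python
# Monthly page limits copied from the plan table above.
PAGES_PER_MONTH = {"free": 50, "pro": 1000, "scale": 10000}


def pages_remaining(plan: str, pages_used: int) -> int:
    """Pages left this calendar month for the given plan."""
    return max(0, PAGES_PER_MONTH[plan] - pages_used)


def fits_quota(plan: str, pages_used: int, document_pages: int) -> bool:
    """True if a document of `document_pages` pages fits the remaining quota."""
    return document_pages <= pages_remaining(plan, pages_used)
```

For instance, a Free account that has already converted 40 pages has room for a 10-page document but not for the 14-page report used in the examples on this page.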

Upgrade your plan in the Dashboard → Billing.