Upload Your First Document

Uploading a document creates a Source, the platform’s representation of your file. After upload, an asynchronous extraction pipeline converts the file into structured page texts. You’ll poll for status and then retrieve the extracted content.

Prerequisites:

Authentication configured (see Authentication guide)

Upload a file

Send a multipart form-data request with your file. The server starts extraction automatically and returns the source metadata.

Endpoint: POST /api/sources/upload

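import requests

# BASE_URL and API_KEY come from your authentication setup (see Authentication guide).
headers = {"apikey": API_KEY, "Authorization": f"Bearer {API_KEY}"}

with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/api/sources/upload",
        headers=headers,
        files={"file": ("document.pdf", f, "application/pdf")},
    )
response.raise_for_status()  # stop here on an HTTP error status
source = response.json()
source_id = source["id"]
print(f"Source created: {source_id}")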

const formData = new FormData();
formData.append("file", fileBlob, "document.pdf");

const response = await fetch(`${BASE_URL}/api/sources/upload`, {
  method: "POST",
  headers: { apikey: API_KEY, Authorization: `Bearer ${API_KEY}` },
  body: formData,
});
const source = await response.json();
console.log("Source created:", source.id);

curl -X POST '{BASE_URL}/api/sources/upload' \
  -H "apikey: {API_KEY}" \
  -H "Authorization: Bearer {API_KEY}" \
  -F "file=@document.pdf"

Response:

{
  "id": "source-uuid",
  "name": "document.pdf",
  "file_type": "application/pdf",
  "storage_path": "sources-{org}-{project}/{source_id}/document.pdf",
  "extraction_status": "pending",
  "task_id": "celery-task-uuid"
}
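Before polling, it can help to pull out just the fields the later steps rely on. The following is a minimal sketch, assuming the response has the shape shown above; `UploadResult` and `parse_upload_response` are hypothetical names for illustration, not part of the platform's SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UploadResult:
    """Hypothetical container for the upload fields used by later steps."""
    source_id: str
    name: str
    extraction_status: str
    task_id: Optional[str]


def parse_upload_response(payload: dict) -> UploadResult:
    """Validate the upload response and keep the fields needed for polling."""
    # "id" and "extraction_status" are the two fields the polling step depends on.
    missing = [key for key in ("id", "extraction_status") if key not in payload]
    if missing:
        raise ValueError(f"upload response missing fields: {missing}")
    return UploadResult(
        source_id=payload["id"],
        name=payload.get("name", ""),
        extraction_status=payload["extraction_status"],
        task_id=payload.get("task_id"),
    )
```

Calling `parse_upload_response(response.json())` right after the upload fails early with a clear error if the body is not the expected source metadata.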

Check extraction status

Poll the source until extraction_status reaches a terminal state: extracted (success), attention_required (partial: some pages failed but the source is still indexable), failed, or cancelled. Small documents typically take a few seconds.

Endpoint: GET /api/sources/{id}

import time

import requests

# Reuses BASE_URL, API_KEY, and source_id from the upload step.
headers = {"apikey": API_KEY, "Authorization": f"Bearer {API_KEY}"}
TERMINAL = {"extracted", "attention_required", "failed", "cancelled"}

while True:
    response = requests.get(
        f"{BASE_URL}/api/sources/{source_id}",
        headers=headers,
    )
    body = response.json()
    status = body["extraction_status"]
    if status in TERMINAL:
        break
    time.sleep(2)  # wait before polling again

if status == "extracted":
    print("Extraction complete!")
elif status == "attention_required":
    print("Extraction partial:", body.get("error_message"))
else:  # failed or cancelled
    print(f"Extraction {status}:", body.get("error_message"))

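// Assumes headers = { apikey: API_KEY, Authorization: `Bearer ${API_KEY}` } and sourceId = source.id from the upload step.
const TERMINAL = new Set(["extracted", "attention_required", "failed", "cancelled"]);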
let status = "pending";
while (!TERMINAL.has(status)) {
  await new Promise((r) => setTimeout(r, 2000));
  const res = await fetch(`${BASE_URL}/api/sources/${sourceId}`, { headers });
  const data = await res.json();
  status = data.extraction_status;
}
console.log("Extraction ended with status:", status);

curl '{BASE_URL}/api/sources/{source_id}' \
  -H "apikey: {API_KEY}" \
  -H "Authorization: Bearer {API_KEY}"
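The polling loops above run until a terminal state arrives, so a stuck extraction would keep them running indefinitely. Here is a minimal sketch of the same loop with an upper time bound. The names `wait_for_extraction`, `fetch_source`, `timeout_s`, and `interval_s` are hypothetical, and the 300-second default is an arbitrary assumption rather than a documented limit.

```python
import time
from typing import Callable

TERMINAL = frozenset({"extracted", "attention_required", "failed", "cancelled"})


def wait_for_extraction(
    fetch_source: Callable[[], dict],
    timeout_s: float = 300.0,   # assumed default, not a platform limit
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_source() until extraction_status is terminal or time runs out."""
    deadline = clock() + timeout_s
    while True:
        body = fetch_source()
        if body["extraction_status"] in TERMINAL:
            return body  # caller inspects the status and error_message
        if clock() >= deadline:
            raise TimeoutError(
                f"extraction still {body['extraction_status']!r} after {timeout_s}s"
            )
        sleep(interval_s)
```

With requests, `fetch_source` could be `lambda: requests.get(f"{BASE_URL}/api/sources/{source_id}", headers=headers).json()`. The `sleep` and `clock` parameters exist only so the loop can be tested without waiting in real time.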

View extracted content

Retrieve the extracted text. GET /page-texts returns { "page_texts": [string, ...], "count": N }, where page_texts is an array of strings, one per page, in order. To fetch a single page, pass ?page=N, which returns { "text": string, "page": N, "count": N }.

Endpoint: GET /api/sources/{id}/page-texts

import requests

# Reuses BASE_URL, headers, and source_id from the previous steps.
response = requests.get(
    f"{BASE_URL}/api/sources/{source_id}/page-texts",
    headers=headers,
)
body = response.json()
for i, text in enumerate(body["page_texts"], start=1):
    print(f"Page {i}: {text[:100]}...")

const res = await fetch(
  `${BASE_URL}/api/sources/${sourceId}/page-texts`,
  { headers },
);
const body = await res.json();
body.page_texts.forEach((text: string, i: number) =>
  console.log(`Page ${i + 1}: ${text.slice(0, 100)}...`),
);

curl '{BASE_URL}/api/sources/{source_id}/page-texts' \
  -H "apikey: {API_KEY}" \
  -H "Authorization: Bearer {API_KEY}"
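The examples above fetch every page at once. The sketch below covers the single-page variant as well. It builds the URL with the optional ?page=N parameter and normalizes both response shapes into (page number, text) pairs. `page_texts_url` and `pages_from_response` are hypothetical helper names, and numbering the full listing from 1 is an assumption that mirrors the display loop above.

```python
from typing import List, Optional, Tuple


def page_texts_url(base_url: str, source_id: str, page: Optional[int] = None) -> str:
    """Build the page-texts URL; `page` adds the ?page=N query parameter."""
    url = f"{base_url.rstrip('/')}/api/sources/{source_id}/page-texts"
    return url if page is None else f"{url}?page={page}"


def pages_from_response(body: dict) -> List[Tuple[int, str]]:
    """Normalize either response shape into (page_number, text) pairs."""
    if "page_texts" in body:
        # Full listing: number pages from 1 (assumption, see lead-in).
        return list(enumerate(body["page_texts"], start=1))
    # Single-page shape: use the page number the server reports.
    return [(body["page"], body["text"])]
```

For example, `requests.get(page_texts_url(BASE_URL, source_id, page=2), headers=headers).json()` can be passed straight to `pages_from_response`, so downstream code handles one shape only.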

What’s Next

Create a Knowledge Base

Index your extracted content for semantic search.

Sources & Extraction

Understand the extraction pipeline in depth.

Sources API Reference

Full endpoint documentation.
