
Documentation Index

Fetch the complete documentation index at: https://docs.powabase.ai/llms.txt

Use this file to discover all available pages before exploring further.

Sources represent uploaded documents in the platform. Each source goes through an asynchronous extraction pipeline that converts files into structured derivatives (page texts, markdown, per-page images). Sources are the raw material for knowledge bases — once extracted, their content can be chunked and indexed for semantic search.

Common Patterns

The typical flow is: upload a file (POST /api/sources/upload), poll for completion (GET /api/sources/{id} until extraction_status is 'extracted' or 'attention_required'), then retrieve extracted text (GET /api/sources/{id}/page-texts). For files already in project storage, use import-from-storage. For web pages, use import-url. To swap extraction backends after the fact, POST /api/sources/{id}/reextract with a new extraction_model.
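The flow above can be sketched as a small polling helper. This is a minimal sketch, not the platform's client library: `BASE_URL` and `headers` are placeholders you must fill in, and the terminal status names are taken from the status list documented under GET /api/sources.

```python
import time

import requests

BASE_URL = "https://YOUR-PROJECT.example.com"  # placeholder, not a real endpoint
headers = {"apikey": "YOUR_API_KEY", "Authorization": "Bearer YOUR_API_KEY"}

# Statuses after which polling should stop, per the extraction_status values
# documented for GET /api/sources.
TERMINAL_STATUSES = {"extracted", "attention_required", "failed", "cancelled"}


def wait_for_extraction(source_id, timeout=300, interval=5):
    """Poll GET /api/sources/{id} until extraction reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(f"{BASE_URL}/api/sources/{source_id}", headers=headers)
        resp.raise_for_status()
        status = resp.json()["extraction_status"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)
    raise TimeoutError(f"extraction of source {source_id} did not finish in {timeout}s")
```

Once `wait_for_extraction` returns 'extracted', fetch the text with GET /api/sources/{id}/page-texts.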

GET /api/sources

List all sources with optional status filter.
status (string): Filter by extraction_status. One of: pending, extracting, extracted, attention_required, failed, cancelled.
response = requests.get(f"{BASE_URL}/api/sources", headers=headers)
print(response.json())

POST /api/sources/upload

Upload a file for extraction. Accepts PDF, DOCX, PPTX, XLSX, images (PNG/JPG/WebP/GIF/TIFF), and plain text. Uses multipart/form-data. Optional fields: name (display name), metadata (JSON string, preserved through indexing), extraction_model (PDF only — one of auto, mistral, paddleocr, lighton, opendataloader, fitz, pdfplumber).
with open("file.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/api/sources/upload",
        headers={"apikey": API_KEY, "Authorization": f"Bearer {API_KEY}"},
        files={"file": ("file.pdf", f, "application/pdf")},
        data={"extraction_model": "mistral"},
    )

POST /api/sources/import-from-storage

Import a file already in project storage as a source.
{
  "bucket": "documents",
  "path": "reports/q4.pdf",
  "name": "Q4 Report"
}
response = requests.post(
    f"{BASE_URL}/api/sources/import-from-storage",
    headers=headers,
    json={"bucket": "documents", "path": "reports/q4.pdf"},
)

POST /api/sources/import-url

Import content from web URLs. mode='urls' imports a fixed list, mode='crawl' spiders from a seed URL, mode='sitemap' parses a sitemap XML. Requires a Firecrawl API key to be configured in project settings.
{
  "mode": "urls",
  "urls": ["https://example.com/page1", "https://example.com/page2"],
  "max_pages": 50
}
response = requests.post(
    f"{BASE_URL}/api/sources/import-url",
    headers=headers,
    json={"mode": "urls", "urls": ["https://example.com/page1"]},
)
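A small builder for the request body can keep the three modes straight. One caveat: the documented example covers only mode='urls', so the assumption that 'crawl' and 'sitemap' modes pass their seed or sitemap URL through the same urls field is exactly that, an assumption to verify against your project.

```python
def import_url_body(mode, urls, max_pages=50):
    """Build a JSON body for POST /api/sources/import-url (sketch).

    Assumption: 'crawl' and 'sitemap' modes take their seed/sitemap URL in
    the same 'urls' field used by 'urls' mode; verify this for your project.
    """
    if mode not in {"urls", "crawl", "sitemap"}:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"mode": mode, "urls": list(urls), "max_pages": max_pages}


# e.g. requests.post(f"{BASE_URL}/api/sources/import-url", headers=headers,
#                    json=import_url_body("crawl", ["https://example.com"], max_pages=100))
```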

GET /api/sources/{id}

Get source details including extraction status.
id (string, required): Source ID
response = requests.get(f"{BASE_URL}/api/sources/{source_id}", headers=headers)

GET /api/sources/{id}/page-texts

Get extracted text content organized by page.
id (string, required): Source ID
page (integer): Specific page number
response = requests.get(f"{BASE_URL}/api/sources/{source_id}/page-texts", headers=headers)
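To fetch a single page rather than the whole document, the optional page parameter goes in the query string. A small helper (a sketch; only the url and params construction is shown, the request itself is the same `requests.get` call as above) keeps that explicit:

```python
def page_texts_request(base_url, source_id, page=None):
    """Return (url, params) for GET /api/sources/{id}/page-texts.

    'page' is the optional page number documented above; omit it to
    retrieve text for every page.
    """
    url = f"{base_url}/api/sources/{source_id}/page-texts"
    params = {} if page is None else {"page": page}
    return url, params


# e.g. url, params = page_texts_request(BASE_URL, source_id, page=3)
#      response = requests.get(url, headers=headers, params=params)
```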

PATCH /api/sources/{id}

Update a source’s display name or metadata.
id (string, required): Source ID
{
  "name": "New Display Name",
  "metadata": { "author": "alice" }
}
response = requests.patch(f"{BASE_URL}/api/sources/{source_id}", headers=headers, json={"name": "New Display Name"})

POST /api/sources/{id}/reextract

Re-run extraction on an existing source, optionally with a different extraction_model.
id (string, required): Source ID
{
  "extraction_model": "paddleocr"
}
response = requests.post(f"{BASE_URL}/api/sources/{source_id}/reextract", headers=headers, json={"extraction_model": "paddleocr"})

POST /api/sources/{id}/cancel

Cancel an in-progress extraction. Sets extraction_status to 'cancelled'.
id (string, required): Source ID
response = requests.post(f"{BASE_URL}/api/sources/{source_id}/cancel", headers=headers)

GET /api/sources/{id}/download

Download the original uploaded file (as stored in project storage).
id (string, required): Source ID
response = requests.get(f"{BASE_URL}/api/sources/{source_id}/download", headers=headers)
with open("source.pdf", "wb") as f:
    f.write(response.content)

GET /api/sources/{id}/derivatives/{type}/download

Download a derivative artifact. type is one of: markdown, text, page_text, image. For per-page types (page_text, image) pass index=N (0-based) in the query string.
id (string, required): Source ID
type (string, required): Derivative type: markdown, text, page_text, or image
index (integer): 0-based index for per-page derivatives (page_text, image)
response = requests.get(f"{BASE_URL}/api/sources/{source_id}/derivatives/markdown/download", headers=headers)
print(response.text)
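The example above covers the whole-document markdown derivative. For per-page derivatives, the 0-based index goes in the query string; the helper below (a sketch, building only the url and params for the same `requests.get` call) also guards against forgetting the index:

```python
# Derivative types that are per-page and therefore require index=N.
PER_PAGE_TYPES = {"page_text", "image"}


def derivative_request(base_url, source_id, dtype, index=None):
    """Return (url, params) for GET /api/sources/{id}/derivatives/{type}/download.

    Per-page types (page_text, image) require a 0-based index passed as a
    query parameter; whole-document types (markdown, text) take none.
    """
    if dtype in PER_PAGE_TYPES and index is None:
        raise ValueError(f"derivative type {dtype!r} requires a 0-based index")
    params = {} if index is None else {"index": index}
    return f"{base_url}/api/sources/{source_id}/derivatives/{dtype}/download", params


# e.g. url, params = derivative_request(BASE_URL, source_id, "image", index=0)
#      response = requests.get(url, headers=headers, params=params)
```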

DELETE /api/sources/{id}

Delete a source and its associated storage files (original + derivatives).
id (string, required): Source ID
response = requests.delete(f"{BASE_URL}/api/sources/{source_id}", headers=headers)

Error Responses

Status | Code | Description
400 | invalid_file | The uploaded file type is not supported or the file is corrupted
404 | source_not_found | No source exists with the given ID
413 | file_too_large | The uploaded file exceeds the maximum allowed size