File formats for AI upload

Wisteria’s AI upload feature reads a document and proposes a course. This page lists which formats work, which don’t, and what to do if your source is in an unsupported format.

Supported formats

Format	Extension	Extraction quality
Adobe PDF	`.pdf`	Good for text-heavy PDFs; scanned PDFs yield no text (no OCR)
Microsoft PowerPoint 97-2003	`.ppt`	Good for text content; ignores most styling
Microsoft PowerPoint 2007+	`.pptx`	Good for text content; ignores most styling

Maximum file size: 25 MB per upload.
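
If you prepare many files at once, a quick local check can catch the “Unsupported file format” and “File too large” rejections (see Common errors) before you upload. The Python sketch below is hypothetical: `check_upload` is not part of Wisteria, it assumes extensions are matched case-insensitively, and because this page doesn’t say whether 25 MB means 25,000,000 or 26,214,400 bytes, it uses the stricter 25,000,000 so a file that passes here should pass under either reading.

```python
from pathlib import Path

# Supported extensions, from the table above.
SUPPORTED_EXTENSIONS = {".pdf", ".ppt", ".pptx"}
# Assumption: "25 MB" taken as 25,000,000 bytes, the stricter reading.
MAX_BYTES = 25_000_000


def check_upload(path: str) -> list[str]:
    """Return the problems that would block an AI upload (empty list if none)."""
    file = Path(path)
    problems = []
    # Assumption: the extension check is case-insensitive (".PDF" == ".pdf").
    ext = file.suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"Unsupported file format: {ext or '(no extension)'}")
    size = file.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"File too large: {size / 1_000_000:.1f} MB")
    return problems
```

For example, `check_upload("handbook.pdf")` returns an empty list when the file passes both checks.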

Why these specifically

Wisteria uses officeparser for PPT/PPTX extraction and its built-in PDF parser for PDFs. Both are mature and reliable for text-heavy content.

Each additional format would need its own extraction library, and extraction quality varies widely between libraries. We’ve prioritised extraction quality over the number of supported formats.

What’s NOT supported

Format	Workaround
Word (`.docx`, `.doc`)	Export to PDF first
Apple Keynote (`.key`)	In Keynote, File → Export To → PowerPoint (`.pptx`)
Google Slides	File → Download → Microsoft PowerPoint (.pptx)
Google Docs	File → Download → PDF Document (.pdf)
Markdown (`.md`)	Export or print to PDF (most Markdown editors can)
HTML pages	Print the page to PDF from your browser
Plain text (`.txt`)	Paste into a word processor (e.g. Word or Google Docs) and export as PDF
Images (`.png`, `.jpg`)	Not supported: there’s no OCR, so images can’t go through AI upload. Author the course manually.
Audio / video	Not supported.
Lark Docs (native)	Supported by the scanner, but not by direct AI upload. Export to PDF for now.

Quality of extraction by format

PDFs

Text-rich PDFs (such as SOPs and manuals): high quality. Most text is captured cleanly.
PDFs made up mostly of images or diagrams: low quality. Wisteria can’t read images, so if your document’s value is in the visuals, the AI proposal will be thin.
Scanned PDFs (photos of paper pages saved as PDF): no text is extracted at all, and Wisteria sees an empty document. Run OCR software first to convert the file into a text-searchable PDF.
Right-to-left languages (Arabic, Hebrew): extraction works, but the reading order may be jumbled.
Tables: text inside table cells is captured; the table structure is sometimes preserved and sometimes flattened.

PowerPoint

Title + bullet slides: high quality.
Speaker notes: captured.
Tables on slides: text captured; layout often flattened.
Images / charts: not captured. The AI proposal won’t reflect them.
Animations, transitions: ignored.
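
To see ahead of time how much of a deck is text that extraction can use, you can look inside the `.pptx` yourself: a `.pptx` is a ZIP package of XML parts, with visible text in `<a:t>` elements and embedded pictures in `<p:pic>` elements. The Python sketch below uses only the standard library and is not how officeparser works; it ignores charts and SmartArt, and it skips the relationship files that link notes pages to slides, so notes numbers may not match slide numbers. `preview_pptx` and `total_words` are hypothetical helpers.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"       # DrawingML (text runs)
P = "{http://schemas.openxmlformats.org/presentationml/2006/main}"  # PresentationML (pictures)

# Slide and speaker-notes parts inside the ZIP package.
PARTS = (
    ("slide", re.compile(r"ppt/slides/slide(\d+)\.xml")),
    ("notes", re.compile(r"ppt/notesSlides/notesSlide(\d+)\.xml")),
)


def preview_pptx(path: str) -> list[dict]:
    """Rough preview of which parts of a .pptx are text (extractable) vs pictures."""
    rows = []
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            for kind, pattern in PARTS:
                match = pattern.fullmatch(name)
                if not match:
                    continue
                root = ET.fromstring(package.read(name))
                rows.append({
                    "kind": kind,
                    "number": int(match.group(1)),
                    # Every run of visible text is an <a:t> element.
                    "text": [t.text for t in root.iter(A + "t") if t.text],
                    # Each <p:pic> is an embedded picture with no text to extract.
                    "pictures": sum(1 for _ in root.iter(P + "pic")),
                })
    # Slides first, then notes, each in numeric order (slide10 after slide2).
    rows.sort(key=lambda row: (row["kind"] != "slide", row["number"]))
    return rows


def total_words(rows: list[dict]) -> int:
    """Approximate word count across all slides and notes (whitespace split)."""
    return sum(len(run.split()) for row in rows for run in row["text"])
```

Slides that report pictures but few text runs are candidates for the transcription tip under Improving extraction quality, and a deck totalling fewer than 200 words is likely to hit the “AI couldn’t propose a course” error.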

Improving extraction quality

If the AI proposal is thinner than you expected:

1. Convert images to text

If your source has critical content in diagrams, transcribe the diagrams into text yourself (for example in a Word document), export that to PDF, and upload it.

2. Combine multiple files

Wisteria handles one file per upload. If your training content spans five PDFs, either:

Combine them into one PDF first (Adobe Acrobat, macOS Preview, and most PDF tools can do this); the combined file must still be under 25 MB
Upload them separately, create one course per file, then merge the courses in the editor

3. Use the source for inspiration, not as the only input

The AI proposal is a starting point. Once you have it, use AI Write with the source as context, but also write some cards by hand. The best courses mix both.

Common errors

“Unsupported file format”

The file extension isn’t `.pdf`, `.ppt`, or `.pptx`. Convert the file to one of these formats.

“File too large”

The file is over 25 MB. Compress it (Adobe Acrobat and most PDF tools have a “reduce file size” option) or split it into smaller files.

“Could not extract text”

The file is corrupt, password-protected, or empty. Open it in another application to confirm it has content, and remove any password protection.
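
Before retrying, you can check whether a file is what its extension claims by reading its first bytes. PDFs begin with `%PDF-` (many readers accept it anywhere in the first 1,024 bytes), `.pptx` files are ZIP packages beginning with `PK\x03\x04`, and `.ppt` files are OLE compound files. A password-protected `.pptx` is also stored as an OLE compound file, so a `.pptx` with that signature is probably encrypted (or is a renamed `.ppt`). This Python sketch is a hypothetical diagnostic, not Wisteria’s own check, and it can’t detect password-protected or scanned PDFs.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"
ZIP_MAGIC = b"PK\x03\x04"  # .pptx files are ZIP packages
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file: .ppt, or encrypted OOXML


def diagnose(path: str) -> str:
    """Hypothetical helper: guess why a file might fail text extraction."""
    file = Path(path)
    ext = file.suffix.lower()
    with open(file, "rb") as f:
        head = f.read(1024)  # the first 1 KiB is enough for these signatures
    if not head:
        return "empty file"
    if ext == ".pdf":
        # Look for the header anywhere in the first 1,024 bytes.
        return "looks like a PDF" if PDF_MAGIC in head else "no PDF header (corrupt or renamed?)"
    if ext == ".pptx":
        if head.startswith(ZIP_MAGIC):
            return "looks like a .pptx"
        if head.startswith(CFB_MAGIC):
            return "probably password-protected (or a renamed .ppt)"
        return "not a valid .pptx (corrupt or renamed?)"
    if ext == ".ppt":
        return "looks like a .ppt" if head.startswith(CFB_MAGIC) else "not a valid .ppt (corrupt or renamed?)"
    return "unsupported extension"
```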

“Upload timed out”

The network connection was interrupted during upload. Retry. If it keeps happening, upload a smaller file to test your connection.

“AI couldn’t propose a course”

Claude, the AI model behind AI upload, returned no usable proposal. Possible causes:

The extracted text is too short (fewer than 200 words).
The content isn’t training material (e.g. a financial report).
A temporary issue with the AI provider.

Retry. If it fails again, author the course manually or use a different source file.

What the AI does with the file

When you upload:

The file is stored in Supabase Storage (private bucket).
Text is extracted server-side using the format-specific library.
The extracted text and your AI Training Profile are sent to Claude.
Claude proposes the course structure.
The file stays in storage after the course is created, so it’s available if you want to regenerate the course later.

The source file is associated with every module in the resulting course. From the course detail page, click “Source file” to download it again.
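
The flow above amounts to a small pipeline in which the file is stored first and the stored copy is never deleted. This Python sketch is purely illustrative: `UploadPipeline` and its three callables are hypothetical stand-ins for Supabase Storage, the format-specific extractor, and the call to Claude, not Wisteria’s actual code or any real SDK.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UploadPipeline:
    """Illustrative model of the upload steps above; all three steps are stand-ins."""

    store: Callable[[bytes, str], str]         # save to private storage, return a key
    extract_text: Callable[[bytes, str], str]  # format-specific text extraction
    propose: Callable[[str, str], dict]        # send text + training profile to the model

    def run(self, data: bytes, filename: str, training_profile: str) -> dict:
        key = self.store(data, filename)                 # 1. file stored first
        text = self.extract_text(data, filename)         # 2. text extracted server-side
        proposal = self.propose(text, training_profile)  # 3-4. model proposes a structure
        # 5. Nothing deletes the stored file; its key travels with the proposal,
        #    so the course can link back to (and regenerate from) the source.
        return {**proposal, "source_file": key}
```

Passing the steps in as functions keeps the ordering testable with stubs, without touching real storage or an AI provider.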

File size optimisation tips

If you have a 30 MB PDF you need to upload:

Adobe Acrobat: File → Compress PDF
macOS Preview: File → Export, then choose Reduce File Size from the Quartz Filter menu
Online tools such as smallpdf.com and ilovepdf.com (be careful with sensitive content)

For PowerPoint:

Compress images: File → Compress Pictures
Remove unused slide layouts
Save as `.pptx` rather than `.ppt`, which is typically smaller
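
For the 30 MB example, the file has to lose at least 5 MB, a reduction of 1 − 25/30 ≈ 17%; if you split it instead, it needs at least ⌈30/25⌉ = 2 parts. The hypothetical Python helpers below do the same arithmetic for any size. The split count is a lower bound, since pages rarely divide evenly by size.

```python
import math

LIMIT_MB = 25  # upload limit from the top of this page


def reduction_needed(size_mb: float, limit_mb: float = LIMIT_MB) -> float:
    """Fraction a file must shrink by to fit the limit (0.0 if it already fits)."""
    return max(0.0, 1 - limit_mb / size_mb)


def parts_needed(size_mb: float, limit_mb: float = LIMIT_MB) -> int:
    """Fewest pieces needed if the file could be split into equal-sized parts."""
    return math.ceil(size_mb / limit_mb)


if __name__ == "__main__":
    size = 30
    # prints "30 MB: shrink by 17%, or split into 2 files"
    print(f"{size} MB: shrink by {reduction_needed(size):.0%}, or split into {parts_needed(size)} files")
```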

What if your source is sensitive

When you upload a file, the extracted text is sent to Anthropic’s API. By default, Anthropic doesn’t use API data to train its models, but the data does leave your tenant for processing.

If your file contains content you can’t share with an external AI provider, don’t use AI upload; author the course manually instead. If your organisation has its own agreement with an AI provider, you can also configure BYOK (bring your own key) in Settings → API & Keys to route AI calls through your own provider account.

For very sensitive content, contact us about a private deployment option (currently on our roadmap; details available on request).