File formats for AI upload
Wisteria’s AI upload feature reads documents and proposes courses. This page lists what formats work, what doesn’t, and what to do if your source is in an unsupported format.
Supported formats
| Format | Extension | Extraction quality |
|---|---|---|
| Adobe PDF | .pdf | Good for text-heavy PDFs; scanned PDFs yield no text (no OCR) |
| Microsoft PowerPoint 97-2003 | .ppt | Good for text content; ignores most styling |
| Microsoft PowerPoint 2007+ | .pptx | Good for text content; ignores most styling |
Maximum file size: 25 MB per upload.
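If you are preparing several files, a quick local check against the limits above can save a failed upload. The following is a minimal Python sketch and not part of Wisteria: the names `check_upload` and `check_file` are hypothetical, and it assumes that 25 MB means 25 × 1024 × 1024 bytes, which this page does not specify.

```python
from pathlib import Path

# Extensions listed in the supported-formats table.
SUPPORTED_EXTENSIONS = {".pdf", ".ppt", ".pptx"}

# Assumption: "25 MB" is treated as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file too large: {size_bytes / (1024 * 1024):.1f} MB")
    return problems


def check_file(path: str) -> list[str]:
    """Convenience wrapper that reads the name and size from disk."""
    p = Path(path)
    return check_upload(p.name, p.stat().st_size)
```

Calling `check_file("handbook.pdf")` on a local file returns an empty list when both checks pass. The check only looks at the extension and size, so a scanned or password-protected PDF can still pass here and fail at extraction.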
Why these specifically
Wisteria uses officeparser for PPT/PPTX extraction and built-in PDF parsing for PDFs. Both are mature and reliable for text-heavy content.
Adding more formats means choosing extraction libraries with varying quality. We’ve prioritised quality of extraction over breadth of supported formats.
What’s NOT supported
| Format | Workaround |
|---|---|
| Word (.docx, .doc) | Export to PDF first |
| Apple Keynote (.key) | Export to PowerPoint (.pptx) from Keynote |
| Google Slides | File → Download → PowerPoint (.pptx) |
| Google Docs | File → Download → PDF |
| Markdown (.md) | Convert to PDF (most editors can) |
| HTML pages | Save as PDF |
| Plain text (.txt) | Paste into a Doc, save as PDF |
| Images (.png, .jpg) | OCR not supported; no path through AI upload. Manual authoring only. |
| Audio / video | Not supported. |
| Lark Docs (native) | Currently supported by the scanner, not by direct AI upload. Export to PDF for now. |
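When sorting through a folder of mixed source files, the table above can be expressed as a simple lookup. This is an illustrative Python sketch only: `suggest` is a hypothetical helper, the workaround strings are copied from the table, and mapping "HTML pages" to the `.html` and `.htm` extensions is an assumption.

```python
from pathlib import Path

# Formats AI upload accepts directly.
SUPPORTED = {".pdf", ".ppt", ".pptx"}

# Workarounds copied from the table above, keyed by file extension.
# Assumption: "HTML pages" corresponds to .html and .htm files.
WORKAROUNDS = {
    ".docx": "Export to PDF first",
    ".doc": "Export to PDF first",
    ".key": "Export to PowerPoint (.pptx) from Keynote",
    ".md": "Convert to PDF (most editors can)",
    ".html": "Save as PDF",
    ".htm": "Save as PDF",
    ".txt": "Paste into a Doc, save as PDF",
    ".png": "No path through AI upload; manual authoring only",
    ".jpg": "No path through AI upload; manual authoring only",
}


def suggest(filename: str) -> str:
    """Return the suggested next step for a file, based on its extension."""
    extension = Path(filename).suffix.lower()
    if extension in SUPPORTED:
        return "Upload as-is"
    # Anything not in the table falls back to the general advice.
    return WORKAROUNDS.get(extension, "Not listed; convert to PDF or PowerPoint if possible")
```

Google Slides, Google Docs, and Lark Docs have no local file extension, so they are left out of the lookup; follow the export steps in the table for those.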
Quality of extraction by format
PDFs
- Text-rich PDFs (like SOPs, manuals) — high quality. Most text is captured cleanly.
- PDFs with mostly images / diagrams — low quality. Wisteria can’t see images; if your document’s value is in the visuals, the AI proposal will be thin.
- Scanned PDFs (a photo of paper, converted to PDF) — no text extraction at all. Wisteria sees an empty document. Use OCR software first to convert to a text-searchable PDF.
- Right-to-left languages (Arabic, Hebrew) — extraction works but reading order may be jumbled.
- Tables — text inside table cells is captured; the table structure is sometimes preserved, sometimes flattened.
PowerPoint
- Title + bullet slides — high quality.
- Speaker notes — captured.
- Tables on slides — text captured; layout often flattened.
- Images / charts — not captured. AI proposal won’t reflect them.
- Animations, transitions — ignored.
Improving extraction quality
If the AI proposal is thinner than you expected:
1. Convert images to text
If your source has critical content in diagrams, transcribe the diagrams into text by hand (for example, in a Word document that you then export to PDF) and upload that.
2. Combine multiple files
Wisteria handles one file per upload. If your training content spans 5 PDFs, either:
- Combine them into one PDF first (many tools do this)
- Upload separately and create one course per file, then merge in the editor
3. Use the source for inspiration, not as the only input
The AI proposal is a starting point. After the proposal, use AI Write with the source as context, but write some cards manually too. The best courses are a mix.
Common errors
“Unsupported file format”
The file extension isn’t .pdf, .ppt, or .pptx. Convert to one of these.
“File too large”
Over 25 MB. Compress (Adobe and most PDF tools have a “reduce file size” option) or split into smaller files.
“Could not extract text”
The file is corrupt, password-protected, or empty. Try opening the file in another tool to confirm it has content; remove any password protection.
“Upload timed out”
Network blip during upload. Retry. If it keeps happening, try a smaller file first to test the connection.
“AI couldn’t propose a course”
Claude returned without a usable proposal. Causes:
- The extracted text is too short (< 200 words).
- The content is genuinely not training-relevant (e.g. a financial report).
- A transient AI provider issue.
Retry. If it fails repeatedly, author the course manually or use a different source file.
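To rule out the first cause, you can copy the text out of your source (for example, select all in a PDF viewer) and count the words. The sketch below is a rough approximation in Python: `count_words` and `likely_too_short` are hypothetical names, and splitting on whitespace is an assumption, since this page does not say how Wisteria counts words.

```python
# Threshold quoted in the list of causes above.
MIN_WORDS = 200


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an approximation of a word count)."""
    return len(text.split())


def likely_too_short(text: str) -> bool:
    """True when the text falls below the 200-word threshold."""
    return count_words(text) < MIN_WORDS
```

If the copied text comes out well under 200 words, the source is probably image-heavy or scanned, and the advice under “Improving extraction quality” applies.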
What the AI does with the file
When you upload:
- The file is stored in Supabase Storage (private bucket).
- Text is extracted server-side using the format-specific library.
- The extracted text and your AI Training Profile are sent to Claude.
- Claude proposes the course structure.
- The file STAYS in storage even after course creation — useful for re-generation later.
The source file is associated with every module in the resulting course. From the course detail page, click “Source file” to download it again.
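The order of the steps above can be sketched as follows. This is an illustration of the flow as described on this page, not Wisteria's actual code: every name here (`process_upload`, `UploadResult`, and the three injected functions standing in for storage, extraction, and the model call) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UploadResult:
    storage_key: str  # where the original file was stored
    proposal: dict    # course structure returned by the model


def process_upload(
    filename: str,
    data: bytes,
    training_profile: str,
    store: Callable[[str, bytes], str],
    extract_text: Callable[[str, bytes], str],
    propose_course: Callable[[str, str], dict],
) -> UploadResult:
    # Step 1: store the original file so it stays available later.
    storage_key = store(filename, data)
    # Step 2: extract text with the parser for this format.
    text = extract_text(filename, data)
    if not text.strip():
        raise ValueError("Could not extract text")
    # Steps 3 and 4: send the text and training profile, receive a proposal.
    proposal = propose_course(text, training_profile)
    # Step 5: nothing is deleted; the storage key travels with the proposal.
    return UploadResult(storage_key=storage_key, proposal=proposal)
```

Because the file is stored before extraction in this sketch, a file that fails extraction would still be in storage; whether Wisteria keeps files from failed uploads is not stated here.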
File size optimisation tips
If you have a 30 MB PDF you need to upload:
- Adobe Acrobat — File → Compress PDF
- macOS Preview — File → Export → Reduce File Size
- Online tools (be careful with sensitive content): smallpdf.com, ilovepdf.com
For PowerPoint:
- Compress images — File → Compress Pictures
- Remove unused slide layouts
- Save as .pptx instead of .ppt, which is typically smaller
What if your source is sensitive
When you upload a file, the extracted text is sent to Anthropic’s API. Anthropic doesn’t retain API data for training, but the data does leave your tenant for processing.
If your file contains content you can’t share with an external AI provider, don’t use AI upload. Use manual course authoring instead, or wait until BYOK (bring your own key) ships and you can use your own AI account.
For very sensitive content, contact us about a private deployment option (roadmap, available on request).