Why the scanner skipped a file

The AI ambient watcher doesn’t surface every file in your tenant — that would be noisy. It filters by several criteria. If a file you wanted to see as a suggestion didn’t appear, work through these checks.

1. The file type isn’t supported

The scanner only processes formats it can extract text from:

✅ PDF (.pdf)
✅ PowerPoint (.pptx, .ppt)
✅ Word (.docx, .doc) — recently added
✅ Lark Docs (via raw_content endpoint)
❌ Google Docs (native format) — not in the current scan path
❌ Excel / Google Sheets — not yet
❌ Images, videos, audio — not in roadmap
❌ Markdown, plain text — not currently scanned

Workaround: Export the file to PDF and upload manually via AI upload.
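If you want to pre-check a batch of file names against the list above, a minimal sketch follows. The extension set is copied from this article; the function name is hypothetical and this is not the scanner's actual code. Lark Docs are fetched through an API rather than matched by extension, so they are left out.

```python
from pathlib import PurePath

# Extensions copied from the supported list in this article.
# Lark Docs are read through an API endpoint, so they have no extension to check.
SUPPORTED_EXTENSIONS = {".pdf", ".pptx", ".ppt", ".docx", ".doc"}


def is_supported_type(filename: str) -> bool:
    """Return True if the extension is one the article lists as scannable."""
    return PurePath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["Onboarding.PDF", "q3-budget.xlsx", "handbook.docx", "notes.md"]:
    status = "supported" if is_supported_type(name) else "skipped"
    print(f"{name}: {status}")
```

The comparison is case-insensitive, so "Onboarding.PDF" counts as a PDF.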

2. The file is too old

The scanner skips files older than a recency threshold (currently 6 months). The goal is to surface content that’s actively being used, not legacy material.

Workaround: Open the file, make a small edit (even just to whitespace), save. The new modification time brings it into the scanner’s window.
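The recency check can be pictured as a simple date comparison. The sketch below assumes "6 months" means roughly 183 days and that the file's last-modified time is what gets compared; the scanner's exact cutoff logic may differ, and the names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "6 months" approximated as 183 days.
RECENCY_WINDOW = timedelta(days=183)


def is_recent(modified_at: datetime, now: datetime) -> bool:
    """Return True if the file was modified inside the recency window."""
    return now - modified_at <= RECENCY_WINDOW


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(is_recent(datetime(2025, 3, 15, tzinfo=timezone.utc), now))  # True
print(is_recent(datetime(2024, 9, 1, tzinfo=timezone.utc), now))   # False
```

Editing and saving the file resets its modification time, which is why the workaround above brings it back into the window.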

3. The file is in a location the impersonated user can’t see

For Google Workspace and Lark, the scanner operates as one tenant user (the impersonated super-admin you specified at setup). It only sees files that user can access.

For Microsoft 365, the scanner uses tenant-level access (Graph API with admin consent), so it sees more — but private files locked at the file level can still be filtered out.

Workaround:

For Google Workspace or Lark, share the file with the impersonated user.
For Microsoft 365, check the file’s sharing settings in OneDrive/SharePoint.

4. The file passed the type/recency filter but was rejected by Claude

Claude evaluates each candidate file for training relevance based on your AI Training Profile. If the file scored below the confidence threshold, it’s skipped.

Common reasons Claude rejects:

Content doesn’t match the AI Training Profile — the file is about a topic outside your declared training categories.
Content looks too administrative — invoices, financial reports, board minutes are usually filtered out (and your AI Training Profile probably says to exclude them).
Content is too short — under ~200 words; Claude can’t make a confident judgement.
Content is duplicate — Wisteria has seen the same file (or a very similar one) before and already surfaced or dismissed it.

Workaround:

Refine the AI Training Profile. Add the topic to “What kinds of training matter most?” or remove it from “What’s NOT training-relevant?”.
Upload the file manually via AI upload. This bypasses the scanner and creates a course directly from the file.
Run a manual scan — sometimes the next scan catches what the last one missed, especially after profile changes.
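Of the rejection reasons above, length is the one you can check yourself before re-running a scan. The sketch below assumes words are counted by splitting on whitespace and uses 200 as the limit; the article only gives "~200 words", so treat both the counting method and the exact number as approximations.

```python
# Assumption: the article gives the limit as "~200 words"; 200 is used here.
MIN_WORDS = 200


def word_count(text: str) -> int:
    """Count whitespace-separated words in the extracted text."""
    return len(text.split())


def long_enough(text: str) -> bool:
    """Return True if the text meets the approximate minimum length."""
    return word_count(text) >= MIN_WORDS


short_memo = "Reminder: fire drill on Friday at 10am."
print(word_count(short_memo), long_enough(short_memo))
```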

5. The file was previously dismissed or marked “Not training”

If anyone in your workspace previously dismissed a “Wisteria noticed” card for this file, subsequent scans don’t resurface it. If the file was marked “Not training,” the AI evaluator also filters out similar files in future scans.

Workaround: Contact support; we can re-surface a specific file if you want it back.

6. The per-scan cap

Each scan has a cap on the number of files evaluated per run (currently ~25 candidate files per scan). This protects against runaway costs and keeps scan times bounded.

If your tenant has many recently modified files, the scan may have reached the cap before it got to the one you cared about.

Workaround:

The next scan picks up where the last one left off, so the file will be evaluated in a subsequent scan.
Manually trigger a scan from /admin/courses → 🔍 Run scan now.
If you have a specific file in mind, upload it manually via AI upload.
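To see why a file can take more than one scan to appear, here is a small model of a capped scan that resumes from a saved position. The cursor mechanism, function name, and file ordering are assumptions for illustration; the article does not describe how the scanner tracks its position internally.

```python
from typing import List, Tuple

SCAN_CAP = 25  # approximate per-scan cap quoted in this article


def run_scan(candidates: List[str], cursor: int, cap: int = SCAN_CAP) -> Tuple[List[str], int]:
    """Evaluate at most `cap` candidates starting at `cursor`.

    Returns the evaluated batch and the cursor for the next scan.
    """
    batch = candidates[cursor:cursor + cap]
    return batch, cursor + len(batch)


files = [f"file-{i:02d}.pdf" for i in range(60)]
cursor = 0
scan_number = 0
while cursor < len(files):
    batch, cursor = run_scan(files, cursor)
    scan_number += 1
    print(f"scan {scan_number}: evaluated {len(batch)} files, ending at {batch[-1]}")
```

With 60 candidates and a cap of 25, the file in position 40 is not evaluated until the second scan, and the last ten files wait for the third.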

7. The file is in a SharePoint site / shared drive the scanner doesn’t traverse

Some shared storage locations are restricted by default. For Microsoft 365, the scanner traverses:

Each user’s OneDrive (root + subfolders)
SharePoint sites the tenant exposes

For Google Workspace, the scanner traverses:

The impersonated user’s My Drive
Shared Drives the impersonated user can see

If your file is in a shared drive that’s not accessible to the impersonated user, it’s not in the scanner’s reach.

Workaround:

Move the file to an accessible location.
Grant the impersonated user access.
Upload the file manually via AI upload.

8. The scanner hit a provider rate limit

Microsoft Graph, Google API, and Lark all rate-limit API calls. If your tenant has many users or files, a scan might hit a rate limit before processing everything.

The scanner respects rate limits gracefully (backs off, retries). Some files might be skipped in the current run but caught in the next.
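"Backs off, retries" usually means exponential backoff: wait a little after the first rate-limit response, longer after the second, and give up after a fixed number of attempts. The sketch below shows that general pattern. The RateLimited exception, the attempt count, and the delays are hypothetical and are not taken from the scanner's implementation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error standing in for a provider's rate-limit response."""


def call_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying with exponential backoff when rate-limited."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts; the file is left for a later scan
            # Delay doubles each attempt (about 1s, 2s, 4s, ...) with random jitter
            # so that parallel workers do not all retry at the same moment.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            sleep(delay)
    raise AssertionError("loop always returns or raises")
```

When the attempts run out, the error propagates and the file is simply not processed in that run, which matches the "skipped now, caught next time" behaviour described above.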

Workaround: Wait for the next scheduled scan, or trigger a manual scan from /admin/courses → 🔍 Run scan now to retry sooner.

9. The integration is in an error state

If the integration’s credentials have rotated, expired, or been revoked, no scan can happen at all.

Check:

Settings → Integrations — look for the status pill on each card. “Needs re-verification” means there’s an auth issue.
Re-verify — click the card and re-run setup.

10. The scheduled scan didn’t run

The nightly scan runs via Vercel cron at midnight UTC. If a cron run failed (rare but possible), no scan happens that night.

Workaround: Manually trigger a scan from /admin/courses → 🔍 Run scan now.
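Before concluding that a nightly scan was missed, it helps to know when midnight UTC falls in your own time zone. The sketch below converts the scheduled time using a fixed UTC offset that you supply; the function name is illustrative, and daylight-saving changes are not handled.

```python
from datetime import datetime, timedelta, timezone


def nightly_scan_local(utc_date: datetime, utc_offset_hours: float) -> datetime:
    """Return the local wall-clock time of the 00:00 UTC scan on the given UTC date."""
    midnight_utc = datetime(utc_date.year, utc_date.month, utc_date.day, tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return midnight_utc.astimezone(local_zone)


scan_day = datetime(2025, 6, 2)
print(nightly_scan_local(scan_day, 8).strftime("%Y-%m-%d %H:%M"))   # 2025-06-02 08:00
print(nightly_scan_local(scan_day, -5).strftime("%Y-%m-%d %H:%M"))  # 2025-06-01 19:00
```

For a workspace at UTC-5, the scan for a given UTC date actually runs on the previous local evening.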

Diagnosing: what to do

If you can’t figure out why a specific file wasn’t surfaced:

Check the basics — is it a supported file type? Modified recently?
Run a manual scan — sometimes timing alone is the cause.
Adjust the AI Training Profile if the topic might be falling outside the declared scope.
Email support@getwisteria.com with the file name (or share link), workspace, and integration provider. We can run a diagnostic scan that traces what happened to that specific file.

When all else fails: manual upload

If a file isn’t surfacing despite multiple scan attempts, upload it manually via /admin/courses → +New course → AI upload. Wisteria treats it identically to a scanner-surfaced file once you confirm the proposal.

The scanner is for ambient discovery — it surfaces content you didn’t know to look for. For content you know about, manual upload is faster.