Content Moderation
File content written via `write_file`, `create_file`, and staged uploads (`upload_session` with action: "complete") is asynchronously scanned for harmful content using Cloudflare Workers AI (`@cf/meta/llama-guard-3-8b`).
How it works: After a successful write, content is enqueued for background moderation. The scan runs asynchronously — writes are never blocked or delayed by moderation.
What agents see: Moderation is transparent by default — `read_file` includes moderation metadata only when content is flagged. Pending and approved statuses are internal bookkeeping and are never surfaced. When content is flagged, the response includes `moderationStatus: "flagged"` and a `moderationCategory` indicating the hazard type.
What happens when content is flagged: Flagged content is not blocked from reads — it remains accessible. Flagged files cannot be shared via `share_with_public`. To dispute a flag, contact support@undisk.app.
Hazard categories: `violent_crimes`, `non_violent_crimes`, `sex_related_crimes`, `child_sexual_exploitation`, `defamation`, `specialized_advice`, `privacy`, `intellectual_property`, `indiscriminate_weapons`, `hate`, `suicide_self_harm`, `sexual_content`.
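Since `moderationCategory` is drawn from this fixed list, an agent can validate it before branching on it. The constant below just transcribes the categories from this document; `is_known_category` is an illustrative helper, not part of the API:

```python
# The twelve hazard categories listed in this document.
HAZARD_CATEGORIES = frozenset({
    "violent_crimes", "non_violent_crimes", "sex_related_crimes",
    "child_sexual_exploitation", "defamation", "specialized_advice",
    "privacy", "intellectual_property", "indiscriminate_weapons",
    "hate", "suicide_self_harm", "sexual_content",
})

def is_known_category(category: str) -> bool:
    # Guards against typos or categories the agent does not yet handle.
    return category in HAZARD_CATEGORIES
```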
Deduplication: Moderation is keyed by content SHA-256 hash. Identical content is only scanned once regardless of how many files reference it.
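Hash-keyed deduplication amounts to memoizing the scan on the content's SHA-256 digest. A minimal sketch, where `scan` stands in for the actual Workers AI call:

```python
import hashlib

_scanned: dict[str, str] = {}  # content SHA-256 hash -> moderation verdict

def moderate_once(content: bytes, scan) -> str:
    """Run scan(content) at most once per distinct content blob."""
    digest = hashlib.sha256(content).hexdigest()
    if digest not in _scanned:
        _scanned[digest] = scan(content)  # only reached for new content
    return _scanned[digest]

calls = []
def fake_scan(content: bytes) -> str:
    calls.append(content)
    return "approved"

moderate_once(b"same bytes", fake_scan)
moderate_once(b"same bytes", fake_scan)  # cache hit: no second scan
```

Two files containing identical bytes hash to the same digest, so only the first write triggers a scan.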
Note: `append_log` and `restore_version` do not trigger moderation. Content moderation is optional — it degrades gracefully if the Workers AI binding is not configured.
Recommended agent behavior on flag
- Stop share/export actions for the flagged file.
- Record the event in audit context (for example with `append_log`).
- Surface `moderationCategory` to a human reviewer.
- Continue unrelated non-flagged tasks to avoid full pipeline stalls.
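The recommended behavior above can be wired up as a single handler. This is a sketch under stated assumptions: `blocked_shares`, `notify_reviewer`, and the injected `append_log` callable are hypothetical stand-ins; only `append_log` is named in this document.

```python
blocked_shares: set[str] = set()  # paths the agent will refuse to share/export

def handle_flag(path: str, category: str, append_log, notify_reviewer) -> None:
    """Apply the recommended response to a moderation flag."""
    blocked_shares.add(path)                              # stop share/export
    append_log(f"moderation flag: {path} ({category})")   # audit trail
    notify_reviewer(path, category)                       # human review

# Usage with simple stand-ins for the real tools:
log: list[str] = []
reviewed: list[tuple[str, str]] = []
handle_flag("/notes/a.txt", "hate", log.append,
            lambda p, c: reviewed.append((p, c)))
```

Unrelated tasks keep running; only shares of paths in `blocked_shares` are stopped.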