User-Generated Content Safety Policy
Effective: Conflab v0.1.12+
This document defines Conflab's posture on user-generated content: what we accept, how we moderate it, and what recourse exists for users and operators.
UGC Surfaces
Conflab accepts user-generated content on these surfaces:
| Surface | Resource | Constraint | Since |
|---|---|---|---|
| Reviews | Review | One per user per entry | ST0067/WP-08 |
| Ratings | Rating | One per user per entry (1-5) | ST0067/WP-08 |
| Published lenses and shapes | Entry + EntryContent | Via publish flow | ST0067/WP-11 |
| Flags | Flag | One per user per target | ST0072/WP-02 |
Future surfaces (threaded comments, reply-to-review, review-on-theme) will follow the same moderation primitives.
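The "one per user per entry" and "one per user per target" constraints in the table amount to a uniqueness rule on a (user, target) pair. The following is a minimal Python sketch of that rule, not Conflab's implementation; `OnePerUserStore` and `DuplicateError` are hypothetical names, and a real deployment would more likely enforce this with a database unique index.

```python
class DuplicateError(Exception):
    """Raised when a user already has a record for the given target."""


class OnePerUserStore:
    """In-memory stand-in for a unique (user_id, target_id) constraint."""

    def __init__(self):
        self._records = {}

    def create(self, user_id, target_id, payload):
        key = (user_id, target_id)
        if key in self._records:
            raise DuplicateError(f"{user_id} already has a record for {target_id}")
        self._records[key] = payload
        return payload


reviews = OnePerUserStore()
reviews.create("alice", "entry-1", {"body": "Useful lens."})
reviews.create("bob", "entry-1", {"body": "Works as described."})

try:
    # A second review by the same user on the same entry is refused.
    reviews.create("alice", "entry-1", {"body": "Second attempt."})
    duplicate_rejected = False
except DuplicateError:
    duplicate_rejected = True
```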
Threat Model
T1: Review Spam
Scenario: A user posts promotional or nonsense reviews on popular entries to drive traffic or degrade quality.
Mitigations: One-review-per-entry identity constraint (exists). Per-user daily review creation quota of 20/day (WP-04). Community flagging with threshold-based hiding (WP-02). Admin moderation queue (WP-05).
T2: Review Bombing
Scenario: Coordinated users post many negative reviews on a single entry to suppress its rating.
Mitigations: One-review-per-entry constraint limits each attacker to one review. Flag system lets legitimate users flag abusive reviews (WP-02). Admin can unflag incorrectly suppressed reviews. The aggregate rating is decoupled from review text, so hiding a review does not remove its rating contribution; admins can bulk-delete ratings from flagged accounts if needed.
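To illustrate why the bulk-delete remedy matters, here is a small Python sketch with made-up data. It assumes ratings are stored apart from review text and that the aggregate is a plain mean; the record shape and the helper names `aggregate` and `bulk_delete_ratings` are hypothetical.

```python
from statistics import mean

# Hypothetical records: ratings live apart from review text.
ratings = [
    {"user": "alice", "entry": "e1", "stars": 5},
    {"user": "bob", "entry": "e1", "stars": 4},
    {"user": "mallory", "entry": "e1", "stars": 1},
]
# Accounts whose review text has been hidden by flags.
hidden_review_authors = {"mallory"}


def aggregate(entry, rows):
    """Average stars for an entry; review visibility is never consulted."""
    stars = [r["stars"] for r in rows if r["entry"] == entry]
    return round(mean(stars), 2) if stars else None


def bulk_delete_ratings(rows, accounts):
    """Admin remedy: drop every rating left by the given accounts."""
    return [r for r in rows if r["user"] not in accounts]


before = aggregate("e1", ratings)
after = aggregate("e1", bulk_delete_ratings(ratings, hidden_review_authors))
```

In this example the hidden review still pulls the average down to 3.33 until the admin removes the account's rating, after which it is 4.5.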
T3: Coordinated Flag Abuse
Scenario: A group flags legitimate reviews or entries to silence them.
Mitigations: Admin moderation queue (WP-05) provides human oversight. Per-user daily flag quota of 50/day (WP-04). Flag counts are visible to admins, so patterns of coordinated flagging are detectable. Unflag action allows admins to clear flags. Future: automated flag-abuse detection (deferred).
T4: Spam Publishes
Scenario: A user publishes many low-quality or spam entries to pollute the catalog.
Mitigations: Publish quarantine (WP-03) -- untrusted users' publishes land in moderation_status: :pending, invisible in the public catalog until admin-approved. Per-user daily publish quota of 10/day (WP-04). Trusted users bypass quarantine (see Trusted User Criteria below).
T5: Malicious Lens Content
Scenario: A published lens contains Lua code designed to exfiltrate data, harvest credentials, or abuse the LLM context.
Mitigations: conflabd's Lua runtime sandbox constrains execution (no filesystem access, no network access, no system calls). Publish quarantine (WP-03) adds a human checkpoint before untrusted content enters the catalog. Content scanning at publish time is deferred (see Scope Exclusions below).
T6: PII Leakage
Scenario: A user inadvertently includes personal information in a review body or lens content.
Mitigations: No automated PII detection in this phase. Admin moderation queue (WP-05) provides a manual checkpoint. The privacy policy covers user responsibility for content they submit.
Lifecycle States
Flag States (Reviews and Entries)
Flags use dynamic threshold semantics: a target is considered flagged when flag_count >= threshold. Unflagging (withdrawing a flag) decreases the count and can restore visibility. There is no sticky "flagged" state -- the system recalculates based on current flag count.
Rationale: Dynamic semantics are simpler, more forgiving of false flags, and avoid the need for admin intervention on every threshold crossing. The trade-off (a coordinated group could flag and unflag repeatedly) is acceptable at current scale and mitigated by per-user flag quotas.
- flag_count < threshold -- visible in public reads
- flag_count >= threshold -- hidden from public reads, visible in admin moderation queue
- Admin can override: explicitly approve (resets flags) or remove content
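The dynamic semantics above can be sketched as follows. This is an illustrative Python model, not Conflab's code: the threshold value of 3 is an assumption (the policy does not state one), and `Target` with its methods is a hypothetical name.

```python
FLAG_THRESHOLD = 3  # assumed for illustration; the policy does not state a value


class Target:
    """A review or entry that users can flag."""

    def __init__(self):
        self._flaggers = set()  # one flag per user per target

    def flag(self, user_id):
        self._flaggers.add(user_id)

    def unflag(self, user_id):
        self._flaggers.discard(user_id)

    def admin_approve(self):
        self._flaggers.clear()  # explicit approval resets flags

    @property
    def flag_count(self):
        return len(self._flaggers)

    @property
    def publicly_visible(self):
        # Recomputed on every read; no sticky "flagged" state is stored.
        return self.flag_count < FLAG_THRESHOLD


review = Target()
history = []
for user in ("u1", "u2", "u3"):
    review.flag(user)
    history.append(review.publicly_visible)

review.unflag("u3")  # withdrawing one flag drops the count below the threshold
history.append(review.publicly_visible)
```

Because visibility is derived from the current count, the third flag hides the review and the withdrawal restores it without any admin action.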
Moderation States (Published Entries)
moderation_status is a separate attribute from visibility, because they represent orthogonal concerns:
- visibility is the author's intent: :private, :unlisted, :public
- moderation_status is the platform's decision: :approved, :pending, :rejected
An entry is publicly discoverable only when visibility == :public AND moderation_status == :approved.
| State | Meaning | Visible in catalog? |
|---|---|---|
| :approved | Passed moderation (or auto-approved for trusted users) | Yes (if visibility is :public) |
| :pending | Awaiting moderation review | No |
| :rejected | Rejected by moderator, with reason | No |
Seed/curated entries are created with moderation_status: :approved. User-published entries default to :pending unless the author is trusted.
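As a sketch of how the two attributes combine, the Python snippet below enumerates every pairing and keeps those that are publicly discoverable. The enum and function names are hypothetical; only the state values and the AND rule come from this policy.

```python
from enum import Enum
from itertools import product


class Visibility(Enum):
    PRIVATE = "private"
    UNLISTED = "unlisted"
    PUBLIC = "public"


class ModerationStatus(Enum):
    APPROVED = "approved"
    PENDING = "pending"
    REJECTED = "rejected"


def publicly_discoverable(visibility, status):
    """Both the author's intent and the platform's decision must agree."""
    return visibility is Visibility.PUBLIC and status is ModerationStatus.APPROVED


# Check all nine combinations of the two orthogonal attributes.
discoverable = [
    (v.value, s.value)
    for v, s in product(Visibility, ModerationStatus)
    if publicly_discoverable(v, s)
]
```

Of the nine combinations, only public plus approved passes, which is why the two attributes are kept separate rather than folded into a single state.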
Flag Reasons
Flags carry a reason enum to support triage in the moderation queue:
- :spam -- promotional, off-platform advertising, SEO manipulation
- :abuse -- harassment, hate speech, threats, personal attacks
- :off_topic -- content unrelated to the entry being reviewed
- :malicious -- suspected malware, credential harvesting, data exfiltration
- :other -- anything not covered above (requires note text)
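A validation step for these reasons might look like the following Python sketch. The function name `validate_flag` and the error labels `unknown_reason` and `note_required` are hypothetical; the policy only specifies the five reasons and that :other requires note text.

```python
FLAG_REASONS = {"spam", "abuse", "off_topic", "malicious", "other"}


def validate_flag(reason, note=None):
    """Return ("ok", flag) or ("error", label) for a proposed flag."""
    if reason not in FLAG_REASONS:
        return ("error", "unknown_reason")
    # "other" gives moderators nothing to triage on without a note.
    if reason == "other" and not (note and note.strip()):
        return ("error", "note_required")
    return ("ok", {"reason": reason, "note": note})


spam_flag = validate_flag("spam")
other_without_note = validate_flag("other", "   ")
other_with_note = validate_flag("other", "Duplicate of another review")
unknown = validate_flag("boring")
```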
Trusted User Criteria
A user is considered "trusted" for the purpose of publish auto-approval when any of the following are true:
- User has role :admin or :superadmin
- User has verified_publisher: true (set by admin)
The verified_publisher flag is the primary trust signal. It is deliberately manual -- an admin must grant it. Automated trust escalation (N prior approved publishes, account age, etc.) is deferred until operational data justifies the thresholds.
Rationale: At launch scale, the number of publishers is small enough that manual verification is practical. Premature automation risks creating a trust ladder that attackers can climb.
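The trust check and its effect on a new publish can be sketched in Python as below. The `User` shape, the default role name "user", and the function names are assumptions for illustration; the criteria themselves are the two listed above.

```python
from dataclasses import dataclass

TRUSTED_ROLES = {"admin", "superadmin"}


@dataclass
class User:
    role: str = "user"               # hypothetical default role name
    verified_publisher: bool = False  # granted manually by an admin


def is_trusted(user):
    """Trusted when either criterion holds."""
    return user.role in TRUSTED_ROLES or user.verified_publisher


def initial_moderation_status(user):
    """Trusted authors bypass quarantine; everyone else starts pending."""
    return "approved" if is_trusted(user) else "pending"


newcomer = initial_moderation_status(User())
verified = initial_moderation_status(User(verified_publisher=True))
admin = initial_moderation_status(User(role="admin"))
```

Keeping the check to two explicit signals means there is no score or counter for an attacker to accumulate.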
Self-Flag Policy
Users can flag their own reviews. This serves as a self-retraction mechanism: a user who regrets a review can flag it rather than deleting it, which preserves the audit trail. Self-flagging an entry is also permitted for the same reason, though the more common path for entry authors is to set visibility to :private.
Rate Limits
Per-user creation quotas enforced at the service layer:
| Action | Limit | Window |
|---|---|---|
| Review creation | 20 | 24 hours |
| Flag creation | 50 | 24 hours |
| Lens/shape publish | 10 | 24 hours |
Rate limit hits return {:error, :rate_limited} and surface as a toast notification in the UI. Limits are configurable via application config. Admin and superadmin users are exempt.
Backend: Hammer (in-memory, ETS-backed). Chosen over Oban-backed counters because rate limiting is a hot path that benefits from in-memory speed, and the deployment target is single-node for the foreseeable future. Rate limit state is lost on restart, which is acceptable -- the worst case is that counters reset after a deploy, so a user may exceed a daily quota within that window.
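For intuition, the quota behavior can be modeled with a small in-memory counter. This Python sketch is a stand-in, not Hammer's API or algorithm: it assumes a fixed 24-hour window, and the action keys, the `QuotaCounter` class, and the tuple return shape (mirroring {:error, :rate_limited}) are hypothetical.

```python
import time

LIMITS = {"review_create": 20, "flag_create": 50, "publish": 10}  # per window
WINDOW_SECONDS = 24 * 60 * 60
EXEMPT_ROLES = {"admin", "superadmin"}


class QuotaCounter:
    """Fixed-window counter held in process memory (lost on restart)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._counts = {}  # (user_id, action, window_index) -> hits

    def check(self, user_id, role, action):
        """Return ("ok", remaining) or ("error", "rate_limited")."""
        if role in EXEMPT_ROLES:
            return ("ok", None)  # exempt users are never counted
        window = int(self._clock() // WINDOW_SECONDS)
        key = (user_id, action, window)
        used = self._counts.get(key, 0)
        if used >= LIMITS[action]:
            return ("error", "rate_limited")
        self._counts[key] = used + 1  # old windows are not evicted in this sketch
        return ("ok", LIMITS[action] - used - 1)


now = [0.0]  # injectable clock so the example is deterministic
quota = QuotaCounter(clock=lambda: now[0])
results = [quota.check("alice", "user", "publish") for _ in range(11)]
admin_result = quota.check("root", "admin", "publish")

now[0] += WINDOW_SECONDS  # the next window starts with a fresh count
after_window = quota.check("alice", "user", "publish")
```

The eleventh publish in a window is refused, an admin is never counted, and creating a new `QuotaCounter` (the equivalent of a restart) would start every user from zero again.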
Notification Channel
Moderation state changes (approval, rejection) notify the publisher via in-app notification only in this phase. Email notifications are deferred until the platform has a transactional email pipeline.
Rejection notifications include the moderator's reason text.
Scope Exclusions
The following are explicitly out of scope for the initial UGC hardening and will be addressed in dedicated steel threads if operational data warrants:
- Automated content classification (LLM-based spam/abuse/PII detection)
- Community trust levels (Discourse-style new/member/regular/leader)
- Structured appeals flow for rejected content
- Image/attachment moderation
- Federated reputation or external trust signals
- Lua content analysis at publish time (conflabd sandbox is the safety layer)
- Automated flag-abuse detection