crowd-source-faq

Public Guest FAQ Page — Architecture Plan

Standalone, no-auth public discovery page at /explore that reads from the existing FAQ DB and ships its own anonymous analytics.


1. Goals & non-goals

Goals

Non-goals


2. Database schema changes

2.1 New fields on FAQ (additive, non-breaking)

popularityScore     : Number  // recomputed every 5 min by background job
guestViewCount     : Number  // anonymous view count (separate from auth views)
avgReadCompletion  : Number  // 0..1 — mean scroll depth of guests
avgTimeSpentRatio  : Number  // 0..1 — actual time / expected reading time
guestViewLast24h   : Number  // rolling counter, drives "trending"
wordCount          : Number  // cached word count of (question + answer)
expectedReadMs     : Number  // wordCount / 200 wpm × 60 000 ms (cached for fast scoring)
popularityUpdatedAt: Date    // last score recompute
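
As a rough illustration of how the cached read-time fields relate, here is a minimal sketch. The helper names (countWords, expectedReadMs, timeSpentRatio) are hypothetical, and the whitespace-based word count is an assumption; only the 200 wpm figure and the 0..1 range come from the field comments above.

```typescript
// Reading speed taken from the schema comment above.
const WORDS_PER_MINUTE = 200;

/** Counts whitespace-separated words in question + answer (assumed tokenisation). */
function countWords(question: string, answer: string): number {
  const text = `${question} ${answer}`.trim();
  return text === "" ? 0 : text.split(/\s+/).length;
}

/** Expected reading time in ms: (words / wpm) minutes, converted to milliseconds. */
function expectedReadMs(wordCount: number): number {
  return Math.round((wordCount / WORDS_PER_MINUTE) * 60_000);
}

/** Actual dwell time divided by expected reading time, clamped to 0..1. */
function timeSpentRatio(dwellMs: number, expectedMs: number): number {
  if (expectedMs <= 0) return 0;
  return Math.min(1, Math.max(0, dwellMs / expectedMs));
}
```

For a 240-word FAQ this gives an expected read time of 72 000 ms, so a guest who stays 18 000 ms contributes a ratio of 0.25.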

Indexes added:

2.2 New collection GuestEvent (raw event buffer)

{
  faqId      : ObjectId,
  guestId    : String,    // random UUID stored in httpOnly cookie
  sessionId  : String,    // per-tab session for view-dedup
  type       : 'view' | 'read' | 'completion' | 'scroll',
  dwellMs    : Number,    // for read events
  scrollPct  : Number,    // 0..1
  faqLength  : Number,    // word count, snapshotted at event time
  createdAt  : Date
}
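
The event shape above could be mirrored in TypeScript and checked before anything is written to the buffer. This is a sketch only: the GuestEventInput type and validateGuestEvent helper are hypothetical names, and the specific checks (24-character hex faqId, non-negative dwellMs) are assumptions rather than rules stated in this plan.

```typescript
type GuestEventType = "view" | "read" | "completion" | "scroll";

// Mirrors the GuestEvent fields; createdAt is assumed to be set server-side.
interface GuestEventInput {
  faqId: string;
  guestId: string;
  sessionId: string;
  type: GuestEventType;
  dwellMs?: number;
  scrollPct?: number;
  faqLength?: number;
}

const EVENT_TYPES: readonly string[] = ["view", "read", "completion", "scroll"];

/** Returns a list of problems; an empty list means the event looks storable. */
function validateGuestEvent(e: GuestEventInput): string[] {
  const errors: string[] = [];
  if (!/^[0-9a-f]{24}$/i.test(e.faqId)) errors.push("faqId must be a 24-char hex ObjectId");
  if (e.guestId.trim() === "") errors.push("guestId is required");
  if (e.sessionId.trim() === "") errors.push("sessionId is required");
  if (EVENT_TYPES.indexOf(e.type) === -1) errors.push("unknown event type");
  if (e.dwellMs !== undefined && !(Number.isFinite(e.dwellMs) && e.dwellMs >= 0)) {
    errors.push("dwellMs must be a non-negative number");
  }
  if (e.scrollPct !== undefined && !(e.scrollPct >= 0 && e.scrollPct <= 1)) {
    errors.push("scrollPct must be within 0..1");
  }
  return errors;
}
```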

Indexes:

2.3 No changes to existing collections

The FAQ.views field stays as the authenticated-user view counter. The new guestViewCount is the anonymous equivalent — we do not merge them, so admin analytics remain meaningful.


3. Popularity scoring algorithm

Implemented in backend/utils/popularityScore.ts as a pure function over aggregated metrics. It runs every 5 min in a background job, never on the read path.

popularityScore = (view_weight       * viewScore)
                + (recency_weight    * recencyBoost)
                + (engagement_weight * engagementScore)
                + (trust_weight      * trustBoost)

Component details (each component is normalised to 0..1 and the default weights sum to 1.0, so the score also falls in 0..1):

| Component | Weight | Formula | Default weight |
|---|---|---|---|
| viewScore | view_weight | min(1, log10(1 + guestViewCount) / log10(500)) | 0.40 |
| recencyBoost | recency_weight | exp(-ageDays / 30) (half-life ~21 days) | 0.20 |
| engagementScore | engagement_weight | 0.5 * avgReadCompletion + 0.5 * avgTimeSpentRatio | 0.30 |
| trustBoost | trust_weight | expert=1.0, high=0.8, medium=0.5, low=0.2 | 0.10 |
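
The weighted sum can be sketched as a pure function. This is one reading of the plan, not the contents of popularityScore.ts: each component is normalised with the formulas from the component table and combined with the default weights. The FaqMetrics shape and the trust field name are assumptions.

```typescript
type TrustLevel = "expert" | "high" | "medium" | "low";

// Hypothetical input shape; `trust` and `ageDays` are assumed to be derived elsewhere.
interface FaqMetrics {
  guestViewCount: number;
  avgReadCompletion: number; // 0..1
  avgTimeSpentRatio: number; // 0..1
  ageDays: number;
  trust: TrustLevel;
}

// Default weights from the component table (they sum to 1.0).
const WEIGHTS = { view: 0.4, recency: 0.2, engagement: 0.3, trust: 0.1 };
const TRUST_BOOST: Record<TrustLevel, number> = { expert: 1.0, high: 0.8, medium: 0.5, low: 0.2 };

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function popularityScore(m: FaqMetrics): number {
  // Log-scaled views, saturating at 1.0 once 1 + views reaches 500.
  const viewScore = Math.min(1, Math.log10(1 + Math.max(0, m.guestViewCount)) / Math.log10(500));
  // Exponential decay with a 30-day time constant (half-life of about 21 days).
  const recencyBoost = Math.exp(-Math.max(0, m.ageDays) / 30);
  // Equal blend of scroll completion and time-spent ratio.
  const engagementScore = 0.5 * clamp01(m.avgReadCompletion) + 0.5 * clamp01(m.avgTimeSpentRatio);
  return (
    WEIGHTS.view * viewScore +
    WEIGHTS.recency * recencyBoost +
    WEIGHTS.engagement * engagementScore +
    WEIGHTS.trust * TRUST_BOOST[m.trust]
  );
}
```

Because the view term is logarithmic and capped, going from 0 to 50 views moves the score far more than going from 500 to 5000, which is the property the anti-gaming argument relies on.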

Why this is robust to gaming:

Re-compute trigger: every 5 min via setInterval (reuses the existing runRetention-style pattern in server.ts).

Re-compute query: single Mongo aggregation pipeline over FAQ collection using $set with arithmetic expressions on the cached metric fields — O(N) once per 5 min, no per-request work on the hot path.
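
A possible shape for that pipeline is sketched below as a plain object, so it can be read without a driver installed. It assumes the component formulas and default weights from the table in this section, a trustLevel string field and a createdAt date on every FAQ document (both field names are assumptions), and MongoDB 4.2 or later for pipeline-style updates and $$NOW.

```typescript
const MS_PER_DAY = 86_400_000;

// Treats a missing metric field as 0 so new FAQs still get a score.
const num = (field: string) => ({ $ifNull: [`$${field}`, 0] });

// min(1, log10(1 + views) / log10(500))
const viewScore = {
  $min: [1, { $divide: [{ $log10: { $add: [1, num("guestViewCount")] } }, Math.log10(500)] }],
};

// exp(-ageDays / 30): (createdAt - now) is a negative millisecond difference.
const recencyBoost = {
  $exp: { $divide: [{ $subtract: ["$createdAt", "$$NOW"] }, 30 * MS_PER_DAY] },
};

// 0.5 * avgReadCompletion + 0.5 * avgTimeSpentRatio
const engagementScore = {
  $add: [
    { $multiply: [0.5, num("avgReadCompletion")] },
    { $multiply: [0.5, num("avgTimeSpentRatio")] },
  ],
};

// `trustLevel` is a hypothetical field name; unknown values fall back to "low".
const trustBoost = {
  $switch: {
    branches: [
      { case: { $eq: ["$trustLevel", "expert"] }, then: 1.0 },
      { case: { $eq: ["$trustLevel", "high"] }, then: 0.8 },
      { case: { $eq: ["$trustLevel", "medium"] }, then: 0.5 },
    ],
    default: 0.2,
  },
};

const recomputePipeline = [
  {
    $set: {
      popularityScore: {
        $add: [
          { $multiply: [0.4, viewScore] },
          { $multiply: [0.2, recencyBoost] },
          { $multiply: [0.3, engagementScore] },
          { $multiply: [0.1, trustBoost] },
        ],
      },
      popularityUpdatedAt: "$$NOW",
    },
  },
];
```

With the Node.js MongoDB driver this would be passed as the update argument, for example collection.updateMany({}, recomputePipeline), from inside the 5-minute setInterval callback.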


4. API design — /api/public/*

All routes are public: no auth, no PII. Soft per-IP rate limits apply: 200 req/min on read endpoints (30 req/min on search, see section 7) and a tighter 60 req/min on tracking endpoints to keep analytics write volume bounded.

| Method | Path | Purpose |
|---|---|---|
| GET | /api/public/popular-faqs | Top N by popularityScore |
| GET | /api/public/recent-faqs | Newest N by createdAt |
| GET | /api/public/categories | All categories with counts (and per-category top 3) |
| GET | /api/public/search?q=&category= | Text search, optional category filter |
| GET | /api/public/faqs/:id | Single FAQ (anonymous-safe view) |
| POST | /api/public/track-view | Page open event (idempotent within 30 min/guest) |
| POST | /api/public/track-reading | Dwell time + scroll depth + completion event |

Tracking payloads — PII-free by construction:

// POST /api/public/track-view
{ "faqId": "6655...", "sessionId": "tab-uuid" }

// POST /api/public/track-reading
{ "faqId": "6655...", "sessionId": "tab-uuid", "dwellMs": 18200,
  "scrollPct": 0.83, "faqLength": 240 }

guestId comes from an httpOnly, SameSite=Lax cookie set on first public-page hit. The server never reads IP for analytics, never stores User-Agent beyond what’s already in standard request logs.
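
As a sketch of the cookie described above, the Set-Cookie value could be assembled like this. The cookie name guestId, the buildGuestCookie helper, and the choice to add Secure only in production are assumptions; the 90-day lifetime comes from section 7. The UUID is passed in so the helper stays independent of any particular generator.

```typescript
const GUEST_COOKIE = "guestId"; // hypothetical cookie name
const NINETY_DAYS_S = 90 * 24 * 60 * 60;

/** Builds a Set-Cookie header value for the anonymous guest id. */
function buildGuestCookie(guestId: string, isProduction: boolean): string {
  const parts = [
    `${GUEST_COOKIE}=${encodeURIComponent(guestId)}`,
    `Max-Age=${NINETY_DAYS_S}`,
    "Path=/",
    "HttpOnly",
    "SameSite=Lax",
  ];
  // Assumption: Secure is only left out for local development over plain HTTP.
  if (isProduction) parts.push("Secure");
  return parts.join("; ");
}
```

Re-sending the same header on each public-page hit would give the sliding expiry, since every response restarts the 90-day Max-Age.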

Response shape (popular):

{
  "faqs": [
    { "_id": "...", "question": "...", "answer": "...",
      "category": "...", "tags": [...], "createdAt": "...",
      "popularityScore": 0.69, "guestViewCount": 217,
      "avgReadCompletion": 0.72, "wordCount": 240 }
  ],
  "generatedAt": "2026-06-10T..."
}

Caching: the popular and recent endpoints are wrapped in an in-process LRU (5-min TTL) plus an optional Redis cache (already wired via utils/cache.ts). Categories and search are not cached, because their query and filter combinations are too varied for a useful hit rate.
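
The in-process layer could look roughly like the following. This is a generic TTL-plus-LRU sketch built on Map insertion order, not the contents of utils/cache.ts; the class name and constructor parameters are hypothetical.

```typescript
class TtlLruCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;
  private ttlMs: number;

  constructor(maxEntries: number, ttlMs: number) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= now) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    // Re-insert so the key becomes the most recently used (Map keeps insertion order).
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

A handler for popular-faqs would check something like cache.get("popular:10") first and only query the database on a miss, so at most one query per key runs in each 5-minute window per process.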


5. Tracking strategy

Two client-side hooks, both fire-and-forget. Events still pending at pagehide are sent with navigator.sendBeacon so they survive tab close:

  1. useViewTracker(faqId): fires on mount, idempotent per (guestId, faqId, 30-min-bucket). The server checks GuestEvent for a recent view event with the same key and skips the increment if one exists.

  2. useReadingTracker(faqId): samples scroll depth at most every 250 ms (rAF-throttled), tracks total dwell from mount, and computes scrollPct = min(1, (maxScrollY + viewportHeight) / articleHeight), so reaching the end of the article yields 1.0. On pagehide or visibilitychange→hidden it POSTs one read event with { dwellMs, scrollPct, faqLength }.
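
The arithmetic behind the read event can be kept in pure functions, separate from the DOM listeners. The sketch below assumes scroll depth is measured to the bottom edge of the viewport, so a reader who reaches the end of the article scores 1.0; scrollCompletion, buildReadEvent and their parameter names are hypothetical.

```typescript
interface ReadEventPayload {
  faqId: string;
  sessionId: string;
  dwellMs: number;
  scrollPct: number; // 0..1
  faqLength: number; // word count
}

/** Fraction of the article that has passed through the viewport, clamped to 0..1. */
function scrollCompletion(maxScrollY: number, viewportHeight: number, articleHeight: number): number {
  if (articleHeight <= 0) return 0;
  return Math.min(1, Math.max(0, (maxScrollY + viewportHeight) / articleHeight));
}

/** Assembles the single read event posted when the page is hidden. */
function buildReadEvent(args: {
  faqId: string;
  sessionId: string;
  mountedAtMs: number;
  nowMs: number;
  maxScrollY: number;
  viewportHeight: number;
  articleHeight: number;
  faqLength: number;
}): ReadEventPayload {
  return {
    faqId: args.faqId,
    sessionId: args.sessionId,
    dwellMs: Math.max(0, args.nowMs - args.mountedAtMs),
    scrollPct: scrollCompletion(args.maxScrollY, args.viewportHeight, args.articleHeight),
    faqLength: args.faqLength,
  };
}
```

In the browser, the payload would typically be serialised with JSON.stringify and handed to navigator.sendBeacon("/api/public/track-reading", ...) from the pagehide handler.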

Both endpoints accept the event but never echo back per-user data and never require identification. Events buffer to GuestEvent and are folded into FAQ metrics by the next aggregation tick.
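
The 30-minute view dedup can be illustrated with a bucketed key. This sketch uses an in-memory Set as a stand-in for the GuestEvent lookup; viewDedupKey and ViewDeduper are hypothetical names, and fixed time buckets are an assumption about how the 30-min-bucket is derived.

```typescript
const DEDUP_WINDOW_MS = 30 * 60 * 1000;

/** Key that is identical for all views by one guest of one FAQ within the same 30-min bucket. */
function viewDedupKey(guestId: string, faqId: string, atMs: number): string {
  const bucket = Math.floor(atMs / DEDUP_WINDOW_MS);
  return `${guestId}:${faqId}:${bucket}`;
}

// In-memory stand-in for "check GuestEvent for a recent view with the same key".
class ViewDeduper {
  private seen = new Set<string>();

  /** Returns true only for the first view in a bucket; repeats are skipped. */
  shouldCount(guestId: string, faqId: string, atMs: number): boolean {
    const key = viewDedupKey(guestId, faqId, atMs);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

Fixed buckets are cheap to index but not a true sliding window: two views a minute apart can both count if they straddle a bucket boundary. A lookup on createdAt within the last 30 minutes would close that gap at the cost of a range query.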


6. Frontend architecture

frontend/src/
├── pages/
│   └── ExplorePage.tsx              // route: /explore
└── components/
    └── explore/
        ├── ExploreHero.tsx          // title + search + tags
        ├── ExploreSearchBar.tsx     // debounced, highlights
        ├── PopularFaqsCard.tsx      // "Most Popular" column
        ├── RecentFaqsCard.tsx       // "Recent FAQs" column
        ├── CategoriesCard.tsx       // "Browse Categories" column
        ├── FaqListItem.tsx          // shared numbered row
        ├── CategoryAccordion.tsx    // collapsible section
        ├── CategoryFaqList.tsx      // list inside accordion
        ├── ReadingTracker.tsx       // mount-time scroll/dwell observer
        ├── ExploreSkeleton.tsx      // loading placeholders
        ├── ExploreEmpty.tsx         // empty states
        ├── highlightMatch.ts        // <mark> for search results
        └── usePublicFaqApi.ts       // cached fetcher with abort
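
For highlightMatch.ts, one possible approach is sketched below. It returns an HTML string with matches wrapped in <mark> and everything else escaped; this signature is an assumption, and a React version would more likely return elements instead of a string.

```typescript
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

/** Escapes regex metacharacters so the query is matched literally. */
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/** Wraps case-insensitive matches of `query` in <mark>, escaping all other text. */
function highlightMatch(text: string, query: string): string {
  const q = query.trim();
  if (q === "") return escapeHtml(text);
  const re = new RegExp(escapeRegExp(q), "gi");
  let out = "";
  let last = 0;
  // q is non-empty, so every match has length >= 1 and the loop always advances.
  for (let m = re.exec(text); m !== null; m = re.exec(text)) {
    out += escapeHtml(text.slice(last, m.index)) + "<mark>" + escapeHtml(m[0]) + "</mark>";
    last = m.index + m[0].length;
  }
  return out + escapeHtml(text.slice(last));
}
```

Escaping before wrapping matters here because the highlighted string would be injected as HTML, and the search query itself is guest-controlled input.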

Matching the existing FAQ Hive aesthetic:

State / data flow:


7. Security & abuse prevention

| Concern | Mitigation |
|---|---|
| View count inflation | 30-min dedup per (guestId, faqId) in track-view |
| Read-event spam | 60 req/min/IP rate limit on /track-* |
| Search abuse | 30 req/min/IP rate limit on /search |
| DDoS amplification | Helmet + global IP limiter (already in server.ts) |
| PII collection | No IP, no UA, no fingerprint stored; only a UUID cookie |
| XSS via answer body | sanitizeHtml already applied to FAQ.answer (admin write path) |
| Cookie scope | httpOnly, SameSite=Lax; Secure in production (omitted in dev) |
| Cookie expiry | 90 days, sliding |
| Score manipulation | Score uses log + engagement, not raw views |
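
The per-IP limits in the table could be enforced with a fixed-window counter along these lines. This is an illustrative in-memory sketch, not the existing limiter in server.ts; FixedWindowLimiter and the LIMITS map are hypothetical names. The IP is used only as a transient map key and is never persisted, in keeping with the PII row above.

```typescript
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Returns true if the request is allowed, false once the key has used up its window. */
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(key);
    if (entry === undefined || now - entry.windowStart >= this.windowMs) {
      // First request, or the previous window has elapsed: start a fresh window.
      this.counts.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// Requests per minute per IP, taken from sections 4 and 7.
const LIMITS = { read: 200, track: 60, search: 30 };
const ONE_MINUTE_MS = 60_000;

const trackLimiter = new FixedWindowLimiter(LIMITS.track, ONE_MINUTE_MS);
```

This version never evicts idle keys, so a real deployment would need periodic cleanup or a shared store if more than one server process handles traffic.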

8. Performance & scalability

Read path (popular / recent / categories / search):

Write path (tracking events):

Background aggregation (every 5 min):

Scale targets (designed for):


9. Folder structure (final)

backend/
├── models/
│   ├── FAQ.ts                        (extended)
│   └── GuestEvent.ts                 (new)
├── utils/
│   └── popularityScore.ts            (new)
├── controllers/
│   └── publicFaqController.ts        (new)
├── routes/
│   └── publicFaq.ts                  (new)
└── server.ts                         (mount + scheduler)

frontend/src/
├── pages/
│   └── ExplorePage.tsx               (new)
├── components/explore/               (new dir, 13 files)
└── App.tsx                           (add /explore route)

10. Rollout sequence

  1. Backend models + utils.
  2. Backend controller + routes.
  3. Mount + scheduler.
  4. Smoke-test all 7 endpoints with curl.
  5. Frontend ExplorePage + components.
  6. Add route, build, type-check.
  7. Verify in browser via Playwright if available.