crowd-source-faq

Project Context — Crowd Source FAQ FAQ & Community Platform

Semantic search-powered FAQ and community Q&A platform targeting 10,00,000 users (10 lakh / 1 million). Project name: Crowd Source FAQ (also known internally as “yaksha-faq-portal”).


1. Overview

What it does: Resolves FAQs and manages community Q&A for internship students at scale. Users search for answers semantically; unanswered questions flow into a community board; admins moderate and respond.

Target scale: 10 lakh (1 million) registered users, high concurrent search load.

Current status: MVP complete. TypeScript migration done. Local embeddings (@xenova/transformers) working. 130 FAQs seeded. Admin features (role edit, notification persistence) implemented. Notification system live. Vercel-deploy skill created. Production: Atlas autoEmbed (Option B) pending. Runtime smoke test pending Atlas setup.


2. Tech Stack

Layer Technology Notes
Frontend React 18, Tailwind CSS, Vite SPA; no SSR; deployed on Vercel
Backend Node.js, Express.js ES modules; deployed on Vercel serverless
Database MongoDB Atlas M0 free tier at dev; M10+ for production
Search MongoDB Atlas Vector Search (cosine similarity) + MongoDB $text keyword search Hybrid merge via Reciprocal Rank Fusion (RRF_K=60)
Embeddings @xenova/transformers Xenova/multi-qa-mpnet-base-dot-v1 (768-dim, singleton pipeline) Local dev: no API key needed. Production: switch to Atlas autoEmbed (Option B)
Auth JWT (7d expiry) + bcrypt (salt factor 12) Passwords hashed pre-save via Mongoose pre-hook
Rate limiting express-rate-limit 300 req/15min general; 1000 req/15min admin routes
Security Helmet.js, CORS (whitelist + Vercel subdomain auto-allow)  

3. Project Structure

crowd-source-faq/
├── apps/
│   ├── backend/               # Express + TypeScript API
│   │   ├── src/
│   │   │   ├── bootstrap/     # App startup & routes registration
│   │   │   ├── config/        # Environment and DB loaders
│   │   │   ├── core/          # Shared core infrastructure
│   │   │   ├── integrations/  # Zoom, Discord, and Cloudinary integrations
│   │   │   ├── middleware/    # Auth, RBAC, and validation middlewares
│   │   │   ├── modules/       # Modular features (FAQ, Community, Search, Auth, etc.)
│   │   │   │   ├── admin/     # Admin routes and controller
│   │   │   │   ├── community/ # Community posts & comments controller and routes
│   │   │   │   ├── faq/       # FAQ logic, freshness & review queues
│   │   │   │   └── search/    # Hybrid semantic vector search controller and routes
│   │   │   ├── scripts/       # Seeding and migration scripts
│   │   │   └── utils/         # Common utility helpers
│   │   ├── Dockerfile
│   │   └── package.json
│   └── frontend/              # React + Vite SPA
│       ├── src/
│       │   ├── admin/         # Admin pages and sub-components
│       │   ├── components/    # Reusable UI & modular feature components (FAQ, community, search)
│       │   ├── context/       # Program, Auth, and Batch React contexts
│       │   ├── hooks/         # Custom React hooks (notifications, auth, etc.)
│       │   ├── pages/         # User pages (HomePage, FAQPage, CommunityPage, etc.)
│       │   ├── routes/        # App routing definitions and route guards
│       │   └── styles/        # Global styles & Tailwind tokens
│       ├── index.html
│       └── package.json
├── packages/                  # Workspace shared packages
│   ├── config/                # Shared typescript & workspace configs
│   ├── types/                 # Shared TypeScript models and API types
│   ├── utils/                 # Common shared javascript utilities
│   └── validation/            # Centralized Zod request schemas
├── docs/                      # Centralized documentation
│   ├── design/                # System design plans & blueprints
│   ├── reference/             # Database schemas & route lists
│   └── ARCHITECTURE.md        # Technical architecture document
├── run.sh                     # Full-stack developer runner
├── docker-compose.yml         # Local development Docker Compose
└── pnpm-workspace.yaml        # Workspace package layout

4. Features

4.1 Semantic Search (Core Feature)

4.2 FAQ System

4.3 Community Q&A Board

4.4 Admin Dashboard

4.5 Notifications (SpillTheTea)

4.6 Zoom Integration

4.7 AI Duplicate Detection

4.8 Analytics


5. Data Models

User

{ name, email, password (hashed, select:false), role: 'user'|'moderator'|'admin'|'ai_moderator' }
// Pre-save: bcrypt salt factor 12
// comparePassword() instance method

FAQ

{ question, answer, category, embedding (select:false), searchCount, views, helpfulVotes, unhelpfulVotes,
  status: 'pending'|'approved'|'rejected', createdBy,
  // Freshness system
  freshnessTier: 'evergreen'|'seasonal'|'volatile',
  reviewIntervalDays, reviewStatus: 'verified'|'pending_review'|'update_requested',
  lastVerifiedDate, flaggedAt, flagType: 'auto'|'manual'|null, flagReason, flaggedBy, reviewCycle,
  // Promotion system
  trustLevel: 'low'|'medium'|'high'|'expert', sourceType: 'manual'|'community_promotion'|'expert_verified'|'zoom_transcript',
  sourceMeetingId, sourceCommunityPostId, sourceCommentId, promotedAt, objectionStatus: 'none'|'objected'|'resolved',
  promotionMetadata: { upvotesAtPromotion, helpfulVotesAtPromotion, communityAnswerAuthorId, promotedBy, objectionReason, objectionRaisedBy, objectionRaisedAt }
}
// Text index on question + answer
// Collection: yaksha_faq_faqs

CommunityPost

{ title, body, tags[], author, status: 'unanswered'|'answered', answer, answerAuthorId, answerIsExpert,
  upvotes[], embedding (select:false),
  comments: [{ author, body, upvotes[], downvotes[], verified, isExpertAnswer, isFirstResponder,
    firstResponderAwardedAt, parentId, depth, replies, solutionDNA: { keyPoints, summary, tags } }],
  dna: { steps[], tools[], timeToComplete, difficulty: 'Easy'|'Moderate'|'Tricky' },
  reports: [{ reportedBy, reason, createdAt }],
  escalationStatus: 'none'|'escalated'|'resolved'|'dismissed',
  escalatedAt, escalationReason, escalatedBy, escalationResolvedAt, escalationResolvedBy, escalationOutcome,
  answeredFromKnowledgeId, // ID from Zoom transcript knowledge base
  timeTrialStatus: 'none'|'pending'|'awarded', timeTrialStartedAt, timeTrialFirstResponder, timeTrialFirstResponderAt,
  // Promotion
  eligibleForPromotion, promotionPendingAt, promotionCandidateCommentId,
  promotionObjectedBy, promotionObjectedAt, promotionObjectionReason
}
// Text index on title + body
// Collection: yaksha_faq_communityposts

Notification

{ recipient, type: 'post_resolved'|'comment_received'|'faq_match_found'|'mention'|'accepted_answer',
  title, message, link, read, createdAt }

TeaNotification (SpillTheTea)

{ userId, eventType: 'faq_published'|'post_answered'|'post_deleted'|'post_answered_user'|'post_upvoted'|'comment_received',
  faqId, faqQuestion, postId, postTitle, triggeredBy, triggeredByName, content, read, createdAt }

SearchLog

{ query, resultsCount, topResultId, topResultSource: 'faq'|'community'|null }
// TTL index: auto-delete after 90 days
// Collection: yaksha_faq_searchlogs

AdminLog

{ adminId, action, targetId, targetType, details }
// Collection: yaksha_faq_adminlogs

6. API Reference (Summary)

Public

| Method | Route | Description | |——–|——-|————-| | POST | /api/auth/register | Create account, returns JWT | | POST | /api/auth/login | Login, returns JWT |

User (Authenticated)

| Method | Route | Description | |——–|——-|————-| | GET | /api/auth/me | Current user profile | | GET | /api/faq | All FAQs grouped by category | | GET | /api/faq/paginated | Paginated flat FAQ list | | GET | /api/faq/:id | Single FAQ | | POST | /api/faq/check-match | Check if query has FAQ duplicate | | GET | /api/community | Paginated community posts (cursor-based) | | GET | /api/community/:id | Single post with comments | | POST | /api/community | Create post | | POST | /api/community/:id/upvote | Toggle upvote | | PATCH | /api/community/:id/resolve | Mark post answered (mod/admin) | | PATCH | /api/community/:id/comments/:commentId/accept-answer | Accept comment as official answer (post author) | | PATCH | /api/community/:id/comments/:commentId/verify | Mark comment verified (mod/admin) | | PATCH | /api/community/:id/comments/:commentId/upvote | Toggle comment upvote | | POST | /api/community/:id/comments/:commentId/downvote | Toggle comment downvote (auto-deletes at net −5) | | POST | /api/community/:id/comments | Add comment | | GET | /api/community/solved | Posts resolved in last 24h | | GET | /api/community/search?q= | Hybrid search of community posts | | GET | /api/notifications | User notifications | | PATCH | /api/notifications/:id/read | Mark notification read | | PATCH | /api/notifications/read-all | Mark all read | | GET | /api/notifications/tea | SpillTheTea events | | PATCH | /api/notifications/tea/read-all | Mark all tea read | | GET | /api/community/review-queue | FAQs pending peer review | | POST | /api/search | Hybrid semantic search | | GET | /api/search/trending | Top 6 trending queries |

Admin / Moderator

| Method | Route | Description | |——–|——-|————-| | GET | /api/admin/stats | Dashboard summary | | GET | /api/admin/faq-growth | FAQ creation trend | | GET | /api/admin/top-categories | Category breakdown | | GET | /api/admin/search-insights | Search analytics | | GET | /api/admin/users | User list (paginated) | | GET | /api/admin/faqs | FAQ list (paginated, filterable) | | GET | /api/admin/reports | Date-range report export | | GET | /api/admin/activity-feed | Recent admin actions | | GET | /api/admin/user-activity-chart | Daily activity chart | | POST | /api/admin/faq | Create FAQ | | POST | /api/admin/faq/approve | Approve FAQ | | POST | /api/admin/faq/reject | Reject FAQ | | PUT | /api/admin/faq/:id | Update FAQ | | DELETE | /api/admin/faq/:id | Delete FAQ | | PATCH | /api/auth/users/:id/role | Update user role | | GET | /api/analytics | Search log analytics | | GET | /api/community | Paginated community posts | | DELETE | /api/community/:id | Delete community post | | PATCH | /api/community/:id/tags | Update post tags | | PATCH | /api/community/:id/dna | Set solution DNA | | GET | /api/admin/escalated | Escalated posts | | PATCH | /api/admin/escalated/:id/verify | Mod verifies escalated FAQ | | PATCH | /api/admin/escalated/:id/dismiss | Mod dismisses | | POST | /api/admin/community/:id/object | Object to promotion | | GET | /api/admin/community/pending-promotions | Pending promotions list | | POST | /api/admin/community/pending-promotions/:id/promote | Promote to admin-approved/official | | GET | /api/zoom/health | Zoom integration health | | GET | /api/zoom/meetings | Paginated Zoom meetings | | POST | /api/zoom/transcripts | Ingest transcript | | GET | /api/admin/zoom-insights | Zoom insights (FAQ/Announcements) |

Test Credentials

| Role | Email | Password | |——|——-|———-| | Student | user@yaksha.com | password123 | | Admin | admin@yaksha.com | admin123 |


7. Architecture Decisions

7.1 Embedding: Local Transformers (not OpenAI)

7.2 Hybrid Search — RRF over naive union

7.3 Embedding field select: false

7.4 Lazy DB connection

7.5 Community posts go live immediately


8. Scale Readiness — What’s Done vs. What’s Pending

✅ Done (v0.2)

Feature Details
Pagination Community posts paginated (20/page); FAQ paginated endpoint (/faq/paginated)
SearchLog TTL 90-day auto-expiry via MongoDB TTL index; prevents unbounded growth
Compound indexes { category, status, createdAt } on FAQs; { status, createdAt } on posts; { query, createdAt } for search logs
Local embeddings @xenova/transformers runs Xenova/multi-qa-mpnet-base-dot-v1 in-process in Node.js — no API key needed
Redis shared cache Upstash Redis (TTL cache for search results and Zoom data, shared across serverless instances)
Graceful shutdown SIGTERM/SIGINT handlers flush search log buffer before exit
Sentry Error tracking wired to server.ts
Cursor-based pagination Community posts use keyset cursor on _id desc; nextCursor base64 encoded
FAQ Freshness system Backend fully implemented; frontend FreshnessBadge on FAQ cards
Solution DNA Post-level and comment-level structured answer metadata (steps, tools, time, difficulty)
Community FAQ promotion Auto-promotion: answered posts with 10+ upvotes → Community Approved FAQ after 24h review window

⚠️ Partially Done

Feature Status
Zoom OAuth OAuth flow implemented, scopes configured, Zoom insights extraction working, circuit breaker + fallback cache active
Time-Trial 16h countdown → first top-level comment wins FirstResponder badge + 20 points; atomic findOneAndUpdate prevents race
Escalation Posts with 3+ unanswered comments can be escalated; admin resolves/dismisses via escalationController

❌ Not Yet Done (blocking 1M users)

Priority Feature Why It Matters
P0 User-level rate limiting IP-based 300 req/15min breaks legitimate users behind corporate NAT; need per-user JWT-based limits
P0 Auth endpoint rate limiting /api/auth/login and /api/auth/register have NO per-IP rate limit — brute-force possible
P0 XSS sanitization User-generated answer and body fields have no HTML sanitization; stored XSS possible
P0 JWT token refresh revocation refreshTokenHash set on login but NOT verified on refresh — stolen token valid for 7 days
P1 Email domain restriction Anyone can register; no spam guard
P1 Admin 2FA (TOTP) No 2FA for admin accounts
P1 Freshness cron wired runFreshnessCheck() defined in freshnessController.ts but NOT called by any scheduler
P1 FAQ FreshnessBadge on public FAQ page Only visible to admins; public FAQ page does not show freshness status
P2 GDPR data export No GET /api/auth/export endpoint
P2 Failed search → FAQ workflow Failed queries identify gaps but not routed to admin FAQ creation
P2 Bulk CSV FAQ import Admins must create FAQs one-by-one
P3 Sentry / error tracking Wired but not verified end-to-end
P3 Load testing suite No k6/Artillery tests
P3 Multi-language support Embedding model + UI are English-only; India user base needs Indic language support

9. MongoDB Atlas — Required Setup

9.1 Vector Search Index

Each collection (faqs, communityposts) needs a search index named vector_index:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 768,
      "similarity": "cosine"
    }
  ]
}

Note: Vector search is free on M0 (free tier) as of 2024+. No cluster upgrade needed. The apps/backend/src/scripts/addIndexes.ts creates non-vector indexes only — vector index must be created manually in Atlas UI.

9.2 Running the Migration

After pulling, set env vars and run:

cd apps/backend
export MONGODB_URI="mongodb+srv://<user>:***@cluster0.xxxxx.mongodb.net/yaksha_faq"
npm run migrate        # Creates TTL + compound indexes
npm run backfill:embeddings  # Regenerate stored embeddings (if switching models)

10. Environment Variables

Backend (apps/backend/.env)

Variable Required Default Description
MONGODB_URI Yes MongoDB Atlas connection string
JWT_SECRET Yes JWT signing secret
JWT_EXPIRES_IN No 7d Token expiry
PORT No 6767 Server port (Vercel overrides automatically)
CLIENT_URL No Frontend URL for CORS

Frontend (apps/frontend/.env)

Variable Default Description
VITE_API_URL http://localhost:6767/api Backend API base URL

11. Database Collections

Collection Purpose
yaksha_faq_users User accounts
yaksha_faq_faqs FAQ entries with embeddings
yaksha_faq_communityposts Community Q&A posts with embedded comments
yaksha_faq_searchlogs Search analytics (90-day TTL)
yaksha_faq_adminlogs Admin action audit log

12. Role-Based Access Control

Role Access
user Browse FAQs, community, search
moderator All user access + resolve posts, delete posts, verify comments, manage FAQs
admin All moderator access + user management, all admin endpoints
ai_moderator Placeholder for future AI auto-moderation integration

13. Key Implementation Notes


14. Glossary

Term Definition
RRF Reciprocal Rank Fusion — algorithm for merging ranked lists from different rankers
Embedding Dense numerical vector representation of text for semantic similarity comparison
Cosine similarity Similarity metric between vectors; 1.0 = identical, 0.0 = orthogonal
TTL index MongoDB feature that auto-deletes documents after a set time
select:false Mongoose option that excludes a field from default queries (privacy + perf)
LRU Least Recently Used — cache eviction strategy
RRF_K Constant in RRF formula controlling rank smoothing (k=60 is standard)

15. Modular Refactoring Details (v0.3)

In Version 0.3, monolithic controllers and pages were split into focused, single-responsibility files to enhance code maintainability and layout clarity.

15.1 Backend Controller Split

The original communityController.ts grew too large and covered two distinct sets of database operations (Posts and Comments). It was split into:

15.2 FAQ Page Decomposition

To improve readability and simplify state management, the monolithic FAQPage.tsx was decomposed into modular components in components/faq/:

15.3 Community Page Dialogs Extraction

The dialogs for creating posts and viewing detailed discussions were extracted into components/community/: