Senior SDE review of the 29 Mongoose models in backend/models/*.ts
plus the live data in the production cluster.
Audited at commit 9708249 on what is now main (post-model-swap to
mxbai-embed-large-v1, post-1024-dim vector index recreation).
- (... interface IUser and the actual schema, double unique: true + index: true)
- (... AttendanceGuidance still in the codebase, inconsistent targetId types across log models)
- (... details field no maxlength on AdminLog, etc.)
- 0 data-quality blockers for production: the existing
  migrate-and-clean.ts and migrateTierNames.ts already
  cover the big historical migrations. New fixes are
  mostly preventive (constraints + indexes for the next
  data that comes in).
The fix-pass below is split into 5 commits:
AttendanceGuidance
Plus a data-quality script (scripts/auditData.ts) that the
operator can run on demand to print a per-collection summary
of orphan refs, stale flags, and inconsistent state.
User.suspendidoUntil is in the interface but not the schema
File: backend/models/User.ts:60-63 (interface), 180-186 (schema)
The TS interface declares:
suspendidoUntil?: Date;
The schema has no matching suspendidoUntil field. In its
default strict mode, Mongoose does not persist paths that
are missing from the schema, so a write to
user.suspendidoUntil = ... raises no validation error but
the value is never saved. Any read from the database
returns undefined. If any code path uses this field for
the suspension gate, it always evaluates to “not
suspended”: a silent authz bypass.
Fix: Add the field to the schema (or remove from the interface if suspension isn’t actually a feature).
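To make the failure mode concrete, the sketch below models a suspension gate over a plain object. The function isCurrentlySuspended and the SuspendableUser shape are assumptions for illustration, not code from the repository; only the field name suspendidoUntil comes from the audit.

```typescript
// Hypothetical minimal shape: only the field the gate reads.
interface SuspendableUser {
  suspendidoUntil?: Date;
}

// Hypothetical gate: suspended while the stored timestamp is in the future.
function isCurrentlySuspended(user: SuspendableUser, now: Date): boolean {
  if (user.suspendidoUntil === undefined) {
    return false; // a missing field reads as "not suspended"
  }
  return user.suspendidoUntil.getTime() > now.getTime();
}

const now = new Date("2024-01-15T00:00:00Z");

// What the caller intended to store.
const intended: SuspendableUser = { suspendidoUntil: new Date("2024-02-01T00:00:00Z") };

// What a reload looks like when the path is not in the schema: the field is absent.
const reloaded: SuspendableUser = {};

console.log(isCurrentlySuspended(intended, now)); // true
console.log(isCurrentlySuspended(reloaded, now)); // false
```

The second call is the bypass: the gate itself is correct, but it never sees the value that was written.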
User.role enum mismatch between TS type and schema
File: backend/models/User.ts:8 (type) vs schema enum
TS type:
export type UserRole = 'user' | 'moderator' | 'admin' | 'ai_moderator' | 'expert';
Schema enum is the same (verified at the IUser interface), so
this is just a reminder to keep the two in sync as new
roles are added. Will add a runtime assertion in the
pre('save') hook that the value is in the TS union.
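One way to keep the union and the runtime check from drifting apart is to derive both from a single tuple. The following is a minimal sketch under that assumption; USER_ROLES, isUserRole, and assertUserRole are hypothetical names, and the role values are the ones listed in the TS type.

```typescript
// Single source of truth: the tuple drives both the TS union and the runtime check.
const USER_ROLES = ["user", "moderator", "admin", "ai_moderator", "expert"] as const;
type UserRole = (typeof USER_ROLES)[number];

// Type guard: narrows an unknown value to UserRole.
function isUserRole(value: unknown): value is UserRole {
  return typeof value === "string" && (USER_ROLES as readonly string[]).indexOf(value) !== -1;
}

// Assertion form, suitable for calling from a save hook.
function assertUserRole(value: unknown): asserts value is UserRole {
  if (!isUserRole(value)) {
    throw new Error("Invalid user role: " + String(value));
  }
}

console.log(isUserRole("expert"));     // true
console.log(isUserRole("superadmin")); // false
```

A copy of the same tuple could also feed the schema enum, so that adding a role means editing one line.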
User.bookmarks double-nested type
File: backend/models/User.ts:195
bookmarks: { type: [{ type: MongooseSchema.Types.ObjectId, ref: 'CommunityPost' }], default: [] },
The outer { type: [...] } wrapper exists only to carry
default: []. Mongoose unwraps it and stores a plain array
of ObjectId refs, and array paths already default to [],
so the wrapper is redundant. Queries such as
find({ bookmarks: { $size: 3 } }) behave the same under
either declaration; the flattened form is simply easier
to read.
Fix: Flatten to:
bookmarks: [{ type: MongooseSchema.Types.ObjectId, ref: 'CommunityPost' }],
User.isSuspended and suspendidoUntil in interface only
Same as C1: both fields are interface-only. This is
compounded by the identifier suspendidoUntil itself, which
mixes Spanish (suspendido = suspended) with English and
suggests a partial implementation. Either ship it (add
both fields to the schema) or remove them from the
interface.
RevokedToken.jti is unique: true AND index: true
File: backend/models/RevokedToken.ts:36-37
jti: { type: String, required: true, unique: true, index: true },
unique: true already creates a unique index. The
explicit index: true is redundant. Mongoose accepts
the duplicate declaration but it pollutes the schema
metadata.
Fix: Drop index: true, keep unique: true.
CommunityPost.author has no index
File: backend/models/CommunityPost.ts
The “all posts by user X” endpoint is a common hot path
(my-profile → my posts, admin → user’s history). The
author field is not indexed, so every such query is a
full collection scan. At 31 posts today the cost is
negligible; at 100k+ it becomes a serious bottleneck.
Fix: communityPostSchema.index({ author: 1, createdAt: -1 })
(common pagination + sort).
Notification has no TTL on read: true records
File: backend/models/Notification.ts
Read notifications are kept forever. For a high-traffic
app this collection grows unbounded. The user-unread
count query (per recipient, read=false) gets slower
as the collection grows.
Fix: Add a TTL index on createdAt for read: true
notifications (~30 days). Don’t expire unread.
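Expiring only read notifications needs a TTL index combined with a partial filter. The sketch below writes the index definition as plain data and adds a small helper that mirrors the rule for reasoning and tests. The names readNotificationTtl, isExpirable, and notificationSchema are assumptions; expireAfterSeconds and partialFilterExpression are the standard MongoDB index options.

```typescript
const THIRTY_DAYS_IN_SECONDS = 30 * 24 * 60 * 60; // 2,592,000

// Index definition as plain data (TTL + partial filter).
const readNotificationTtl = {
  keys: { createdAt: 1 as const },
  options: {
    expireAfterSeconds: THIRTY_DAYS_IN_SECONDS,
    // Unread docs are left out of the index, so they never expire.
    partialFilterExpression: { read: true },
  },
};
// Intended wiring (assumed schema variable name, not executed here):
// notificationSchema.index(readNotificationTtl.keys, readNotificationTtl.options);

interface NotificationLike {
  read: boolean;
  createdAt: Date;
}

// Mirrors the rule the index expresses; for reasoning and unit tests only.
function isExpirable(n: NotificationLike, at: Date): boolean {
  const ageSeconds = (at.getTime() - n.createdAt.getTime()) / 1000;
  return n.read && ageSeconds >= THIRTY_DAYS_IN_SECONDS;
}

const checkTime = new Date("2024-03-01T00:00:00Z");
const oldRead = { read: true, createdAt: new Date("2024-01-01T00:00:00Z") };
const oldUnread = { read: false, createdAt: new Date("2024-01-01T00:00:00Z") };

console.log(isExpirable(oldRead, checkTime));   // true
console.log(isExpirable(oldUnread, checkTime)); // false
```

Note that the clock starts at createdAt, not at the moment of reading: a notification first read after 30 days becomes eligible for deletion right away. Expiring "30 days after reading" would need a separate timestamp field, which is not part of the proposed fix.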
SearchLog has no TTL
File: backend/models/SearchLog.ts
Search logs accumulate forever. The trending-topics
aggregation only needs the last N days. A TTL of 90
days on createdAt bounds the collection size without
losing analytical value.
Fix: Add index({ createdAt: 1 }, { expireAfterSeconds: 90 * 24 * 60 * 60 }).
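ModerationLog and AdminLog lack an updatedBy field
Both log models have moderatorId / adminId but no way
to track who edited a log entry. The audit trail itself
should be append-only, so enforce that in the schema, for
example with immutable: true on the logged fields. Note
that strict: 'throw' only rejects paths that are missing
from the schema; it does not block edits to existing
fields.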
SupportRequest.status enum has redundant values
File: backend/models/SupportRequest.ts
export type SupportStatus = 'Pending' | 'In Review' | 'Resolved' | 'Rejected' | 'open' | 'closed';
Pending and open are both “initial state”. The
‘open’ and ‘closed’ casing is inconsistent with the rest
of the enum. The supportInbox controller filters on
status: 'Pending' — but new tickets could land in
either bucket depending on the route.
Fix: Pick one casing scheme (lowercase + state machine make the most sense) and deprecate the other.
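A lowercase scheme with explicit transitions could look like the sketch below. The canonical names, the legacy mapping, and the transition table are all assumptions made for illustration; in particular, mapping 'closed' to 'resolved' is a guess that would need product sign-off.

```typescript
type LegacyStatus = "Pending" | "In Review" | "Resolved" | "Rejected" | "open" | "closed";
type CanonicalStatus = "pending" | "in_review" | "resolved" | "rejected";

// Assumed mapping: 'open' joins 'pending' because both are initial states.
// 'closed' -> 'resolved' is a placeholder decision, not confirmed by the audit.
const LEGACY_TO_CANONICAL: Record<LegacyStatus, CanonicalStatus> = {
  "Pending": "pending",
  "open": "pending",
  "In Review": "in_review",
  "Resolved": "resolved",
  "Rejected": "rejected",
  "closed": "resolved",
};

// Assumed state machine: terminal states have no outgoing transitions.
const ALLOWED_TRANSITIONS: Record<CanonicalStatus, CanonicalStatus[]> = {
  pending: ["in_review"],
  in_review: ["resolved", "rejected"],
  resolved: [],
  rejected: [],
};

function canTransition(from: CanonicalStatus, to: CanonicalStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

console.log(LEGACY_TO_CANONICAL["open"]);            // "pending"
console.log(canTransition("pending", "in_review"));  // true
console.log(canTransition("pending", "resolved"));   // false
```

With a single initial state, a filter on the pending status sees every new ticket regardless of which route created it.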
Category.slug has no schema-level kebab-case enforcement
File: backend/models/Category.ts
slugifyCategoryName() exists as a helper but the schema
allows any string up to 140 chars. An admin could create
/category/MyCategory! and the URL helper would explode
later.
Fix: Add match: /^[a-z0-9-]+$/ to the slug field.
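The proposed pattern can be checked in isolation before it goes into the schema. The sketch below does that; isValidSlug and STRICT_SLUG_PATTERN are hypothetical, and the stricter variant is an optional extra rather than part of the proposed fix.

```typescript
const SLUG_MAX_LENGTH = 140;          // existing cap per the audit
const SLUG_PATTERN = /^[a-z0-9-]+$/;  // pattern proposed in the fix

// Stricter variant (an assumption, not part of the proposed fix):
// no leading, trailing, or doubled hyphens.
const STRICT_SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isValidSlug(slug: string, pattern: RegExp = SLUG_PATTERN): boolean {
  return slug.length <= SLUG_MAX_LENGTH && pattern.test(slug);
}

console.log(isValidSlug("my-category"));             // true
console.log(isValidSlug("MyCategory!"));             // false
console.log(isValidSlug("--"));                      // true: the proposed pattern allows hyphen-only slugs
console.log(isValidSlug("--", STRICT_SLUG_PATTERN)); // false
```

The third call shows the one gap in the proposed pattern: it rejects uppercase and punctuation, but a slug made only of hyphens still passes.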
DocumentInsight.sources sub-doc has no min validation
The sources: [{ id, title, type }] array is allowed to
be empty. The type field has no enum.
Fix: Add an enum to type (default: [] already exists) and
turn the sub-doc into a named sub-schema with validation.
ReputationLog.targetType is a free string
{ type: String } accepts any value. It should be
restricted to a literal union:
'faq' | 'comment' | 'post' | 'support'.
FeatureFlag.key is string in schema but FeatureFlagKey in TS
The schema declares key: { type: String, ..., unique: true }.
The TS interface uses the narrow FeatureFlagKey union
(‘sessionSupport’ | string). Mongoose accepts any string;
a typo in the controller bypasses the type system at
runtime.
Fix: Validate the key at write time against a known set.
AttendanceGuidance is deprecated but still in the schema
File: backend/models/AttendanceGuidance.ts
The model file’s own comment says it’s superseded by
SupportCategory. The script seedSupportCategories.ts
never imports it. The model is read by no one in the
current codebase.
Fix: Add a deprecation banner + a one-line deprecation
flag. Delete on the next major version.
ReputationLog.userId has no compound index with action
The “show me all answer_accepted events for user X” query
is a common moderation view. The current single
(userId, createdAt) index doesn’t help that.
Fix: reputationLogSchema.index({ userId: 1, action: 1, createdAt: -1 }).
AdminLog.details has no maxlength
Could be megabytes. Add maxlength: 2000.
Batch.endDate is not validated against startDate
{ required: true } on both, but no validate function
to ensure end > start.
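The missing check is small enough to sketch as a pure function. isValidBatchRange is a hypothetical helper name; the wiring shown in the comment is an assumption about how it might be attached, not the repository's code.

```typescript
// True only when both dates are valid and endDate is strictly after startDate.
function isValidBatchRange(startDate: Date, endDate: Date): boolean {
  const startMs = startDate.getTime();
  const endMs = endDate.getTime();
  if (isNaN(startMs) || isNaN(endMs)) {
    return false; // an Invalid Date never forms a valid range
  }
  return endMs > startMs;
}

// Assumed wiring inside the Batch schema (not executed here):
// endDate: {
//   type: Date,
//   required: true,
//   validate: {
//     validator: function (this: { startDate: Date }, value: Date) {
//       return isValidBatchRange(this.startDate, value);
//     },
//     message: "endDate must be after startDate",
//   },
// }

const jan = new Date("2024-01-01T00:00:00Z");
const feb = new Date("2024-02-01T00:00:00Z");

console.log(isValidBatchRange(jan, feb)); // true
console.log(isValidBatchRange(feb, jan)); // false
console.log(isValidBatchRange(jan, jan)); // false: equal dates are rejected
```

A document-level validator like this runs on save(). Update queries do not bind this to the document, so any update path that touches the dates would need its own check.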
GuestEvent.scrollPct has no min: 0, max: 1
Should be bounded.
UnresolvedSearch.resolution enum includes null literally
The enum: [..., null] declaration in a Mongoose schema
adds null to the allowed values. The TS type already
allows null. Just inconsistent.
DocumentRecord.rawExtractedText has no maxlength
A 25MB PDF’s extracted text could be 1MB+. Add a sane cap.
FreshReviewVote.voterId has no index
“My votes” view needs it.
The scripts/auditData.ts script reports the following
snapshot. Run it on demand:
$ npm run audit:data
Output is a per-collection summary of:
- Orphan refs (e.g. a User._id referenced by SupportRequest that
  doesn’t exist)
- Stale flags (isGolden=true with no goldenConvertedAt,
  isBanned=true with no bannedBy, etc.)
- Inconsistent state (tier doesn’t match points per
  the calculateTier ladder, embedding arrays that
  aren’t the expected EMBEDDING_DIM length, etc.)

Fixes for live data:
- scripts/migrate-and-clean.ts already handles the
  historical data migrations
- Tier drift (e.g. points=200 and tier='newcomer')
  is auto-corrected by the user-save hook in
  models/User.ts (computes tier from points on save)

| # | Commit | What | Risk |
|---|---|---|---|
| 1 | schema-fix-security | Add suspendidoUntil + isSuspended to User schema, fix bookmarks double-nest, RevokedToken.jti redundant index, Category.slug regex, ReputationLog.targetType enum, GuestEvent.scrollPct bounds | Low (additive + tightening) |
| 2 | schema-fix-indexes | CommunityPost.author idx, ReputationLog(userId, action), FreshReviewVote.voterId | Low (additive) |
| 3 | schema-fix-ttls | SearchLog 90d, Notification(read=true) 30d | Medium (once the index is built, the TTL monitor also deletes existing docs that are already past the threshold) |
| 4 | schema-fix-cleanup | AttendanceGuidance deprecation banner, AdminLog.details maxlength, DocumentRecord.rawExtractedText maxlength, Batch.endDate>startDate validate, SupportRequest.status enum cleanup | Low |
| 5 | data-audit-script | Add scripts/auditData.ts + npm run audit:data | None (read-only) |
Each commit keeps the build clean (npx tsc --noEmit) and
re-runs the audit script before + after to show the
delta.
The current seed.ts only seeds 130 FAQs. For a Yaksha-class
app to look “alive” in screenshots / demos / investor decks,
a few more collections need realistic data:
That’s the third track, separated as seedLiveData.ts. It
runs idempotently: detects existing data and skips.
After every fix:
cd backend && npx tsc --noEmit # type check
cd backend && npm run audit:data # data quality snapshot
The audit script will print “(no changes)” if the data matches expectations, or a delta if anything drifted.
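Two of the audit checks described earlier (orphan refs and embedding length) reduce to simple set and length comparisons. The sketch below runs them over in-memory rows rather than live collections; the row shapes, the userId field name, and the function names are assumptions, and it is not the contents of scripts/auditData.ts. EMBEDDING_DIM is set to 1024 to match the vector index dimension stated at the top of this review.

```typescript
const EMBEDDING_DIM = 1024; // matches the 1024-dim vector index noted in the audit header

interface UserRow { _id: string }
interface SupportRequestRow { _id: string; userId: string } // userId is an assumed field name
interface EmbeddedRow { _id: string; embedding: number[] }

// Ids of support requests whose userId points at no existing user.
function findOrphanRequests(users: UserRow[], requests: SupportRequestRow[]): string[] {
  const known: Record<string, boolean> = {};
  for (const u of users) {
    known[u._id] = true;
  }
  return requests
    .filter((r) => !Object.prototype.hasOwnProperty.call(known, r.userId))
    .map((r) => r._id);
}

// Ids of documents whose embedding length differs from EMBEDDING_DIM.
function findBadEmbeddings(rows: EmbeddedRow[]): string[] {
  return rows.filter((d) => d.embedding.length !== EMBEDDING_DIM).map((d) => d._id);
}

const sampleUsers: UserRow[] = [{ _id: "u1" }, { _id: "u2" }];
const sampleRequests: SupportRequestRow[] = [
  { _id: "r1", userId: "u1" },
  { _id: "r2", userId: "u9" }, // orphan: no user "u9"
];
const sampleDocs: EmbeddedRow[] = [
  { _id: "d1", embedding: new Array(1024).fill(0) },
  { _id: "d2", embedding: new Array(768).fill(0) }, // wrong dimension
];

console.log(findOrphanRequests(sampleUsers, sampleRequests)); // [ 'r2' ]
console.log(findBadEmbeddings(sampleDocs));                   // [ 'd2' ]
```

Both functions are read-only and return ids only, which keeps the before/after delta easy to diff.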