CipherRank
A gamified CompTIA exam prep platform covering three certification tracks (Security+, Network+, and SecAI+). It ships with 648 curated scenario-based missions and RPG progression; an AI generation pipeline with two-stage validation is planned for v2.0.
Passing CompTIA certification exams — Security+, Network+, and the new SecAI+ — requires mastering dozens of complex domains across security, networking, and AI governance. Existing study tools are flashcard decks and sterile practice tests that offer no narrative, no stakes, and no sense of progress. Learners study in isolation, disengage quickly, and either cram ineffectively or abandon the certification entirely.
The deeper problem is pedagogical. These are decision-making disciplines, but every exam prep tool on the market tests recall. Pocket Prep, Jason Dion courses, and Professor Messer are linear, passive, and detached from real-world context. They ask "which protocol uses port 443?" when the real exam — and the real job — asks "your network was breached and here are the logs, what do you do next?"
CipherRank transforms certification exam preparation into a progressive RPG experience across three tracks. Learners are field operatives advancing through a career arc — from Recruit to Command Sentinel — by completing scenario-based missions that map directly to each exam's blueprint. Every session is 3–7 minutes. Every wrong answer has consequences and teaches through scenario-contextual feedback. Study stops feeling like work and starts feeling like progress.
Event-sourced architecture: Mission Attempts are the append-only source of truth. All player state (XP, rank, mastery, streak) is derived from them. The server stores nothing — it proxies AI generation and validates receipts.
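To make the event-sourcing idea concrete, here is a minimal sketch of deriving player state by folding over an append-only attempt log. The field names, the rank thresholds, and the streak rule (consecutive calendar days ending at the most recent attempt) are assumptions for illustration, not CipherRank's actual schema or tuning.

```typescript
// Hypothetical event shape; field names are assumptions, not CipherRank's schema.
interface MissionAttempt {
  subdomainId: string;
  completedOn: string; // ISO date such as "2025-01-03"
  xpAwarded: number;
  correct: number; // decision points answered correctly
  total: number; // decision points in the mission
}

interface PlayerState {
  xp: number;
  rank: string;
  streakDays: number;
  mastery: Map<string, number>; // subdomainId -> fraction correct
}

// Illustrative thresholds; only the first and last rank names come from the text.
const RANKS: Array<[number, string]> = [
  [0, "Recruit"],
  [500, "Command Sentinel"],
];
const DAY_MS = 24 * 60 * 60 * 1000;

// Pure function: the same attempt log always yields the same state.
function deriveState(attempts: MissionAttempt[]): PlayerState {
  let xp = 0;
  const correct = new Map<string, number>();
  const total = new Map<string, number>();
  const days = new Set<string>();
  for (const a of attempts) {
    xp += a.xpAwarded;
    correct.set(a.subdomainId, (correct.get(a.subdomainId) ?? 0) + a.correct);
    total.set(a.subdomainId, (total.get(a.subdomainId) ?? 0) + a.total);
    days.add(a.completedOn);
  }

  const mastery = new Map<string, number>();
  total.forEach((t, id) => {
    mastery.set(id, t === 0 ? 0 : (correct.get(id) ?? 0) / t);
  });

  // Streak = consecutive calendar days ending at the most recent attempt.
  const newestFirst = Array.from(days).sort().reverse();
  let streakDays = 0;
  let prev: number | null = null;
  for (const d of newestFirst) {
    const t = Date.parse(d);
    if (prev !== null && prev - t !== DAY_MS) break;
    streakDays++;
    prev = t;
  }

  let rank = "Recruit";
  for (const [threshold, name] of RANKS) {
    if (xp >= threshold) rank = name;
  }
  return { xp, rank, streakDays, mastery };
}
```

Because nothing but the log is stored, a change to the XP or rank rules only requires replaying the attempts through the updated function.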
Six-step design-before-build
Before writing any code, I completed six sequential design phases: data model and progression, UX flow and screen map, XP simulation, backend architecture, content authoring, and AI generation system — plus visual design as a parallel track. Each phase produced a versioned spec document. Each step's output became the input for the next.
This meant I had 120 curated missions validated against the schema before the AI generation prompt was designed, because the AI system needed those missions as few-shot examples in its prompts. I reordered the original step sequence when I recognized this dependency. The engineering phase then executed against locked specs rather than discovering design questions mid-build.
Tradeoff: weeks of design work before any visible output vs. zero rework during engineering and no discovered dependencies blocking progress
Three few-shot examples per generation
The AI generation pipeline includes three dynamically selected few-shot examples in every prompt — one per difficulty tier, at least one with branching logic. This costs roughly 7,150 input tokens per call. Two examples would have saved 30% on tokens, but testing showed that dropping below three degraded schema compliance on complex branching missions.
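One way to satisfy the selection constraint (one example per difficulty tier, at least one with branching) is sketched below. The tier labels, the field names, and the preference for examples from the target subdomain are assumptions; the text does not describe how the dynamic selection actually works.

```typescript
// Tier labels and field names are assumptions; the text only says there are three tiers.
type Tier = "easy" | "medium" | "hard";
const TIERS: Tier[] = ["easy", "medium", "hard"];

interface CuratedMission {
  id: string;
  subdomainId: string;
  tier: Tier;
  hasBranching: boolean;
}

// Prefer an example from the target subdomain, else take the first candidate.
function prefer(candidates: CuratedMission[], target: string): CuratedMission | undefined {
  return candidates.find((m) => m.subdomainId === target) ?? candidates[0];
}

function selectFewShot(pool: CuratedMission[], target: string): CuratedMission[] {
  // Reserve one slot for a branching example so the constraint always holds.
  const branching = prefer(pool.filter((m) => m.hasBranching), target);
  if (!branching) throw new Error("pool has no branching mission");

  const picks: CuratedMission[] = [];
  for (const tier of TIERS) {
    const pick = tier === branching.tier
      ? branching
      : prefer(pool.filter((m) => m.tier === tier), target);
    if (!pick) throw new Error("no curated mission for tier " + tier);
    picks.push(pick);
  }
  return picks; // always three examples, ordered easy to hard
}
```

Picking the branching example first and then filling the remaining tiers avoids a second pass to repair a selection that happens to contain no branching mission.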
The system is provider-agnostic — a config change on the Cloudflare Worker switches between models without an app update. This gives a documented fallback path if per-generation costs need to come down as usage scales.
Tradeoff: higher per-generation token cost vs. reliable schema compliance on the hardest mission type, with a provider-agnostic fallback path
Two-stage validation with retry
Every AI-generated mission passes through two validation stages before reaching the user. Stage 1 checks schema compliance — correct JSON structure, valid subdomain IDs, decision point counts within range. A Stage 1 failure triggers one retry with a diagnostic hint appended to the prompt. Stage 2 checks quality and safety — content safety blocklist (no working exploit code, no real entities in defamatory roles), text-length floors, placeholder detection, and XP recalculation.
The Worker recalculates and overwrites XP on every mission regardless of what the LLM returns. The LLM's XP output is treated as unreliable by design. If both attempts fail validation, the device falls back to a curated mission — the user never sees a broken AI response.
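The flow described above can be sketched as two pure validators and a small orchestrator. Everything concrete here is an assumption for illustration: the field names, the numeric limits, the blocklist pattern, and the XP formula. The sketch also assumes a Stage 2 failure goes straight to fallback, which the text does not specify, and it uses a synchronous `generate` callback where a real Worker would await an LLM call.

```typescript
// Field names, limits, and the XP formula are illustrative assumptions.
interface GeneratedMission {
  subdomainId: string;
  difficulty: string;
  decisionPoints: { prompt: string }[];
  xp: number;
}
interface StageResult {
  mission: GeneratedMission | null; // null means the stage rejected the output
  reason: string;
}

const VALID_SUBDOMAINS = new Set<string>(["sec-1.1", "sec-1.2"]); // the real list has 28 entries
const XP_PER_POINT = new Map<string, number>([["easy", 20], ["medium", 35], ["hard", 50]]);
const MIN_POINTS = 2;
const MAX_POINTS = 5;
const MIN_PROMPT_CHARS = 40;
const PLACEHOLDER = /lorem ipsum|\bTODO\b|\bTBD\b/i;
const BLOCKLIST = [/reverse shell one-liner/i]; // stand-in for a real safety blocklist

const reject = (reason: string): StageResult => ({ mission: null, reason });

// Stage 1: schema compliance.
function stage1(raw: string): StageResult {
  let data: any;
  try {
    data = JSON.parse(raw);
  } catch {
    return reject("response is not valid JSON");
  }
  if (typeof data !== "object" || data === null) return reject("response is not an object");
  if (!VALID_SUBDOMAINS.has(data.subdomainId)) return reject("unknown subdomainId");
  if (!XP_PER_POINT.has(data.difficulty)) return reject("unknown difficulty");
  const points = data.decisionPoints;
  if (!Array.isArray(points) || points.length < MIN_POINTS || points.length > MAX_POINTS) {
    return reject("decision point count out of range");
  }
  if (!points.every((p: any) => p && typeof p.prompt === "string")) {
    return reject("decision point missing prompt text");
  }
  return { mission: data as GeneratedMission, reason: "" };
}

// Stage 2: quality and safety, then the XP overwrite.
function stage2(m: GeneratedMission): StageResult {
  for (const p of m.decisionPoints) {
    if (p.prompt.length < MIN_PROMPT_CHARS) return reject("prompt below text-length floor");
    if (PLACEHOLDER.test(p.prompt)) return reject("placeholder text detected");
    if (BLOCKLIST.some((re) => re.test(p.prompt))) return reject("blocklisted content");
  }
  // Overwrite XP with a server-side value regardless of what the model returned.
  const xp = (XP_PER_POINT.get(m.difficulty) ?? 0) * m.decisionPoints.length;
  return { mission: { ...m, xp }, reason: "" };
}

// `generate` stands in for the LLM call; a null result means "use a curated mission".
function generateValidated(generate: (hint?: string) => string): GeneratedMission | null {
  let hint: string | undefined;
  for (let attempt = 0; attempt < 2; attempt++) {
    const s1 = stage1(generate(hint));
    if (s1.mission === null) {
      hint = "Previous output was rejected: " + s1.reason;
      continue; // one retry with a diagnostic hint
    }
    // Assumption: a Stage 2 failure goes straight to fallback without a retry.
    return stage2(s1.mission).mission;
  }
  return null;
}
```

Returning `null` rather than throwing keeps the fallback decision with the caller, which matches the description of the device substituting a curated mission.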
Tradeoff: added latency and complexity vs. guaranteed content quality with a graceful degradation path
Zero free-text input to the LLM
The AI generation endpoint accepts exactly two parameters from the device: subdomain_id and difficulty. Both are whitelist-validated — 28 valid subdomain values and 3 valid difficulty values. No free-text user input is ever injected into the prompt. This eliminates the entire category of prompt injection attacks by design rather than by detection.
Three failed whitelist validation attempts from the same device within 10 minutes trigger a 1-hour block. App Attest verifies device authenticity. StoreKit 2 JWS verifies subscription entitlement. Once parameters pass validation, only two enumerated values reach the prompt, so the endpoint's remaining attack surface is minimal.
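A minimal sketch of the parameter gate follows, using the limits stated above (three failures within 10 minutes, then a 1-hour block). The class name, the allowed values shown, and the in-memory `Map` are assumptions; the text does not say where the failure counter lives, and an in-memory map would not persist across Worker instances. App Attest and StoreKit checks are out of scope here.

```typescript
// Allowed values are placeholders; the real lists hold 28 subdomains and 3 difficulties.
const ALLOWED_SUBDOMAINS = new Set<string>(["sec-1.1", "sec-1.2"]);
const ALLOWED_DIFFICULTIES = new Set<string>(["easy", "medium", "hard"]);

const WINDOW_MS = 10 * 60 * 1000; // failures are counted within 10 minutes
const BLOCK_MS = 60 * 60 * 1000; // a block lasts 1 hour
const MAX_FAILURES = 3;

interface DeviceRecord {
  failures: number[]; // timestamps of recent validation failures
  blockedUntil: number;
}

type GateResult = "ok" | "rejected" | "blocked";

// Hypothetical helper; in-memory state is for illustration only.
class RequestGate {
  private devices = new Map<string, DeviceRecord>();

  check(deviceId: string, subdomainId: unknown, difficulty: unknown, now: number): GateResult {
    const rec = this.devices.get(deviceId) ?? { failures: [], blockedUntil: 0 };
    this.devices.set(deviceId, rec);
    if (now < rec.blockedUntil) return "blocked";

    // Only exact matches against the whitelists pass; nothing else reaches the prompt.
    const valid =
      typeof subdomainId === "string" &&
      ALLOWED_SUBDOMAINS.has(subdomainId) &&
      typeof difficulty === "string" &&
      ALLOWED_DIFFICULTIES.has(difficulty);
    if (valid) return "ok";

    // Drop failures outside the window, record this one, and block on the third.
    rec.failures = rec.failures.filter((t) => now - t < WINDOW_MS);
    rec.failures.push(now);
    if (rec.failures.length >= MAX_FAILURES) {
      rec.blockedUntil = now + BLOCK_MS;
      rec.failures = [];
    }
    return "rejected";
  }
}
```

Typing the inputs as `unknown` forces the gate to prove each value is a whitelisted string before use, so malformed or non-string payloads are rejected the same way as unknown IDs.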
Tradeoff: users can't request custom topics (only their weakest subdomain is targeted) vs. an AI endpoint that cannot be manipulated through its inputs
AI generation pipeline
The AI generation pipeline is built and functional — prompt assembly, LLM integration via Cloudflare Worker, two-stage validation (schema compliance + quality/safety), retry logic with diagnostic hints, and per-generation cost logging. The system is provider-agnostic with a config-level migration path between models.
The current bottleneck is pass rate. Missions sometimes require more than one generation attempt to clear the two-stage validation gate, which drives the effective cost per delivered mission above target. The pipeline works, but it needs optimisation before it is production-ready at scale.
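To show how pass rate feeds into effective cost, the sketch below models each attempt as an independent trial with the same pass probability, capped at two attempts. That independence is a simplifying assumption (a retry with a diagnostic hint may well pass more often), and the function name and any figures passed to it are illustrative, not measured values from the pipeline.

```typescript
interface CostEstimate {
  expectedAttempts: number; // average LLM calls per generation request
  deliveryRate: number; // share of requests that yield a validated AI mission
  costPerDelivered: number; // spend per mission that actually reaches a user
}

// Hypothetical model: independent attempts with a fixed pass rate.
function effectiveCost(costPerAttempt: number, passRate: number, maxAttempts = 2): CostEstimate {
  if (passRate <= 0 || passRate > 1) throw new RangeError("passRate must be in (0, 1]");
  let expectedAttempts = 0;
  let reachProbability = 1; // chance that attempt i is needed at all
  for (let i = 0; i < maxAttempts; i++) {
    expectedAttempts += reachProbability;
    reachProbability *= 1 - passRate;
  }
  const deliveryRate = 1 - reachProbability;
  return {
    expectedAttempts,
    deliveryRate,
    costPerDelivered: (costPerAttempt * expectedAttempts) / deliveryRate,
  };
}
```

Under this model the cost per delivered mission works out to the per-attempt cost divided by the pass rate, so a 50% pass rate doubles the effective cost while a 90% pass rate adds about 11%.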
v1.0 ships with 648 curated missions across three certification tracks. The AI generation system is scheduled for v2.0 once pass-rate refinement is complete.
Project continuity across 65+ chats
CipherRank was designed and built across 65+ individual chats with Claude as an engineering partner. The challenge: Claude has no memory between chats. The chats themselves have limited capacity for context. Every new conversation starts cold. A project this complex — with interdependent specs, design decisions that ripple across documents, and engineering work that depends on locked design choices — cannot survive context loss.
The solution was a structured session briefing system. Each working session — which might span several chats — closes with an updated briefing document that carries the full project state: current status of every component, what was done, next priorities, a cumulative design decisions log, a discrepancy tracker for cross-document conflicts, and a ready-to-paste opening prompt for the next chat. The briefing is versioned and every section carries forward — the iron rule is that sections are updated but never dropped.
This is a context architecture problem. The briefing system is to multi-chat AI collaboration what a well-maintained internal wiki is to a distributed engineering team — the institutional memory that prevents decisions from being revisited and dependencies from being missed.
The design-before-build approach worked well for architectural coherence, but it front-loaded all design decisions into a period when I had the least context about how the system would actually feel in use. Some decisions made during the UX Flow phase (Step 2) were later contradicted by implementation reality — the free-tier weekly cap, the streak multiplier values, and the feedback mode naming all changed during engineering. The discrepancy tracker in the session briefing caught these, but each one was a small rework.
The XP simulation (Step 3) was valuable for validating progression pacing, but I'd run it again after the content library hit 278 missions instead of only against the original 120. The difficulty distribution shifted as I authored more content, and the simulation was calibrated against the earlier mix.
If starting again, I'd keep the phased approach but build a minimal playable prototype after Step 2, before even the backend architecture. Playing through two or three missions in a real UI would have surfaced the feedback mode naming issue, the weekly cap friction, and the need for sequential ordering months earlier.