Case Study

CipherRank

A gamified CompTIA exam prep platform covering three certification tracks: Security+, Network+, and SecAI+. It ships with 648 curated scenario-based missions and RPG progression, plus an AI generation pipeline with two-stage validation designed for v2.0.

Swift 6 · SwiftUI · SwiftData · CloudKit · Cloudflare Workers · StoreKit 2 · AI Generation

648 Missions

2,242 Decisions

Subdomains

Ranks

Tracks

65+ Chats

The Problem

Passing CompTIA certification exams — Security+, Network+, and the new SecAI+ — requires mastering dozens of complex domains across security, networking, and AI governance. Existing study tools are flashcard decks and sterile practice tests that offer no narrative, no stakes, and no sense of progress. Learners study in isolation, disengage quickly, and either cram ineffectively or abandon the certification entirely.

The deeper problem is pedagogical. These are decision-making disciplines, but the mainstream exam prep tools test recall. Pocket Prep, Jason Dion courses, and Professor Messer are linear, passive, and detached from real-world context. They ask "which protocol uses port 443?" when the real exam, and the real job, asks "your network was breached and here are the logs, what do you do next?"

CipherRank transforms certification exam preparation into a progressive RPG experience across three tracks. Learners are field operatives advancing through a career arc — from Recruit to Command Sentinel — by completing scenario-based missions that map directly to each exam's blueprint. Every session is 3–7 minutes. Every wrong answer has consequences and teaches through scenario-contextual feedback. Study stops feeling like work and starts feeling like progress.

Architecture

On-Device (iOS)

SwiftData: 7 entities, local-first
State Pipeline: XP, rank, mastery, streak
Mission Engine: Play loop, dual feedback
StoreKit 2: 3 tiers, offline grace
CloudKit Sync: Cross-device, event-sourced
648 Missions: Bundled in app binary
AI Targeting: Weakest subdomain logic
Notifications: Daily study reminders

App Attest + StoreKit JWS

Cloudflare Worker (Stateless Edge)

AI Generation: Prompt assembly, LLM call, two-stage validation
Receipt Validation: App Store Server API verification
Rate Limiting: 10/day, 150/month per device

Provider-Agnostic Abstraction

Claude Sonnet 4 (Primary): Best schema compliance for nested JSON
Claude Haiku 4.5 (Fallback): Cost fallback if usage exceeds targets

Local-First · No Server Database · No User Accounts · Offline-Capable

Event-sourced architecture: Mission Attempts are the append-only source of truth. All player state (XP, rank, mastery, streak) is derived from them. The server stores nothing — it proxies AI generation and validates receipts.
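
To make the event-sourced idea concrete, here is a minimal sketch of deriving player state from the attempt log. It is shown in TypeScript for illustration, although the app itself is written in Swift. The field names, the rank thresholds, and the streak rule (consecutive active days ending today) are all assumptions made for the example, not the app's actual schema or tuning.

```typescript
// Hypothetical event shape; field names are assumptions, not the app's schema.
interface MissionAttempt {
  subdomainId: string;
  completedOn: string; // calendar day, "YYYY-MM-DD"
  xpEarned: number;
  correctDecisions: number;
  totalDecisions: number;
}

interface PlayerState {
  totalXp: number;
  rankIndex: number;
  masteryBySubdomain: Record<string, number>; // share of correct decisions, 0..1
  streakDays: number;
}

// Illustrative XP thresholds; the real rank table is not given in this case study.
const RANK_THRESHOLDS: readonly number[] = [0, 500, 1500, 3000];
const MS_PER_DAY = 86400000;

function dayNumber(isoDay: string): number {
  return Math.floor(Date.parse(`${isoDay}T00:00:00Z`) / MS_PER_DAY);
}

// Pure fold over the append-only log: the same attempts always yield the same state.
function deriveState(attempts: readonly MissionAttempt[], today: string): PlayerState {
  let totalXp = 0;
  const tallies = new Map<string, { correct: number; total: number }>();
  const activeDays = new Set<number>();

  for (const a of attempts) {
    totalXp += a.xpEarned;
    const t = tallies.get(a.subdomainId) ?? { correct: 0, total: 0 };
    t.correct += a.correctDecisions;
    t.total += a.totalDecisions;
    tallies.set(a.subdomainId, t);
    activeDays.add(dayNumber(a.completedOn));
  }

  const masteryBySubdomain: Record<string, number> = {};
  tallies.forEach((t, id) => {
    masteryBySubdomain[id] = t.total > 0 ? t.correct / t.total : 0;
  });

  // Assumed streak rule: consecutive active days ending today.
  let streakDays = 0;
  for (let d = dayNumber(today); activeDays.has(d); d--) streakDays++;

  // Rank is the highest threshold the XP total has reached.
  let rankIndex = 0;
  RANK_THRESHOLDS.forEach((threshold, i) => {
    if (totalXp >= threshold) rankIndex = i;
  });

  return { totalXp, rankIndex, masteryBySubdomain, streakDays };
}
```

Because nothing is stored except the attempts, a second device that syncs the same log can recompute the same XP, rank, mastery, and streak without any merge logic for the derived values.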

Key Decisions

Six-step design-before-build

Before writing any code, I completed six sequential design phases: data model and progression, UX flow and screen map, XP simulation, backend architecture, content authoring, and AI generation system — plus visual design as a parallel track. Each phase produced a versioned spec document. Each step's output became the input for the next.

This meant I had 120 curated missions validated against the schema before the AI generation prompt was designed, because the AI system needed those missions as few-shot examples in the prompt. I reordered the original step sequence when I recognized this dependency. The engineering phase then executed against locked specs rather than discovering design questions mid-build.

Tradeoff: weeks of design work before any visible output vs. minimal rework during engineering and no late-discovered dependencies blocking progress

Three few-shot examples per generation

The AI generation pipeline includes three dynamically selected few-shot examples in every prompt — one per difficulty tier, at least one with branching logic. This costs roughly 7,150 input tokens per call. Two examples would have saved 30% on tokens, but testing showed that dropping below three degraded schema compliance on complex branching missions.
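
A minimal sketch of the selection rule described above (one example per difficulty tier, at least one with branching) could look like this. The tier names, the `CuratedMission` fields, and the preference for examples from the target subdomain are assumptions for illustration; the actual selection heuristics are not described here.

```typescript
type Difficulty = "easy" | "medium" | "hard"; // tier names are assumed
const TIERS: readonly Difficulty[] = ["easy", "medium", "hard"];

// Hypothetical metadata for a curated mission in the few-shot pool.
interface CuratedMission {
  id: string;
  subdomainId: string;
  difficulty: Difficulty;
  hasBranching: boolean;
}

function selectFewShot(
  pool: readonly CuratedMission[],
  targetSubdomain: string,
): CuratedMission[] {
  // Prefer an example from the target subdomain, else the first match in the pool.
  const pick = (tier: Difficulty, branchingOnly: boolean): CuratedMission | undefined => {
    const candidates = pool.filter(
      (m) => m.difficulty === tier && (!branchingOnly || m.hasBranching),
    );
    return candidates.find((m) => m.subdomainId === targetSubdomain) ?? candidates[0];
  };

  // One example per difficulty tier.
  const chosen: CuratedMission[] = [];
  for (const tier of TIERS) {
    const mission = pick(tier, false);
    if (!mission) throw new Error(`no curated mission for tier "${tier}"`);
    chosen.push(mission);
  }
  if (chosen.some((m) => m.hasBranching)) return chosen;

  // Guarantee at least one branching example, trying the hardest tier first.
  for (const tier of TIERS.slice().reverse()) {
    const branching = pick(tier, true);
    if (branching) {
      chosen[TIERS.indexOf(tier)] = branching;
      return chosen;
    }
  }
  throw new Error("pool contains no branching mission");
}
```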

The system is provider-agnostic — a config change on the Cloudflare Worker switches between models without an app update. This gives a documented fallback path if per-generation costs need to come down as usage scales.
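
As a rough illustration of the config-level switch, the sketch below resolves the active provider from a Worker environment variable. The variable name `AI_PROVIDER`, the `ProviderConfig` shape, and the placeholder model IDs are hypothetical; only the two model names come from the architecture above.

```typescript
type ProviderKey = "primary" | "fallback";

interface ProviderConfig {
  label: string;   // model name as listed in the architecture
  modelId: string; // placeholder, not a real API model identifier
}

const PROVIDERS: Record<ProviderKey, ProviderConfig> = {
  primary: { label: "Claude Sonnet 4", modelId: "<primary-model-id>" },
  fallback: { label: "Claude Haiku 4.5", modelId: "<fallback-model-id>" },
};

// Hypothetical Worker environment; AI_PROVIDER is an assumed variable name.
interface WorkerEnv {
  AI_PROVIDER?: string;
}

// Anything other than "fallback" resolves to the primary provider,
// so a missing or mistyped value never disables generation.
function resolveProvider(env: WorkerEnv): ProviderConfig {
  return env.AI_PROVIDER === "fallback" ? PROVIDERS.fallback : PROVIDERS.primary;
}
```

Because the app only ever sends a subdomain and a difficulty, changing this value on the Worker changes the model without touching the shipped binary.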

Tradeoff: higher per-generation token cost vs. reliable schema compliance on the hardest mission type, with a provider-agnostic fallback path

Two-stage validation with retry

Every AI-generated mission passes through two validation stages before reaching the user. Stage 1 checks schema compliance — correct JSON structure, valid subdomain IDs, decision point counts within range. A Stage 1 failure triggers one retry with a diagnostic hint appended to the prompt. Stage 2 checks quality and safety — content safety blocklist (no working exploit code, no real entities in defamatory roles), text-length floors, placeholder detection, and XP recalculation.

The Worker recalculates and overwrites XP on every mission regardless of what the LLM returns. The LLM's XP output is treated as unreliable by design. If both attempts fail validation, the device falls back to a curated mission — the user never sees a broken AI response.
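
The flow above can be sketched roughly as follows. Every constant (subdomain IDs, decision-point range, length floor, blocklist, XP formula) is an illustrative assumption, `generate` is a synchronous stand-in for the LLM call, and retrying a Stage 2 failure without a hint is my reading of "both attempts" rather than a documented rule.

```typescript
type Difficulty = "easy" | "medium" | "hard"; // tier names are assumed
interface Mission {
  subdomainId: string;
  difficulty: Difficulty;
  briefing: string;
  decisionPoints: { prompt: string }[];
  xp: number;
}
type Check = { ok: true; mission: Mission } | { ok: false; reason: string };

// All constants below are illustrative assumptions, not the production values.
const VALID_SUBDOMAINS = new Set(["net-1.1", "sec-2.3"]);
const DIFFICULTIES: readonly string[] = ["easy", "medium", "hard"];
const DECISION_RANGE = { min: 2, max: 6 };
const MIN_BRIEFING_CHARS = 40;
const BLOCKLIST = [/working exploit/i];
const PLACEHOLDER = /lorem ipsum|\bTODO\b|\[insert/i;
const XP_PER_DECISION: Record<Difficulty, number> = { easy: 10, medium: 20, hard: 30 };

const fail = (reason: string): Check => ({ ok: false, reason });

// Stage 1: schema compliance.
function stage1(raw: string): Check {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return fail("response was not valid JSON");
  }
  if (typeof data !== "object" || data === null) return fail("top level must be an object");
  const { subdomainId, difficulty, briefing, decisionPoints } = data as Record<string, unknown>;
  if (typeof subdomainId !== "string" || !VALID_SUBDOMAINS.has(subdomainId)) return fail("unknown subdomainId");
  if (typeof difficulty !== "string" || DIFFICULTIES.indexOf(difficulty) < 0) return fail("invalid difficulty");
  if (typeof briefing !== "string") return fail("briefing must be a string");
  if (!Array.isArray(decisionPoints)) return fail("decisionPoints must be an array");
  const count = decisionPoints.length;
  if (count < DECISION_RANGE.min || count > DECISION_RANGE.max) return fail("decision point count out of range");
  if (decisionPoints.some((d) => typeof d?.prompt !== "string")) return fail("decision point missing prompt");
  return { ok: true, mission: { subdomainId, difficulty: difficulty as Difficulty, briefing, decisionPoints, xp: 0 } };
}

// Stage 2: quality and safety checks, then the XP overwrite.
function stage2(m: Mission): Check {
  const text = [m.briefing, ...m.decisionPoints.map((d) => d.prompt)].join("\n");
  if (BLOCKLIST.some((re) => re.test(text))) return fail("blocked content");
  if (m.briefing.length < MIN_BRIEFING_CHARS) return fail("briefing too short");
  if (PLACEHOLDER.test(text)) return fail("placeholder text detected");
  // The model's XP is never trusted; it is recomputed from difficulty and length.
  const xp = XP_PER_DECISION[m.difficulty] * m.decisionPoints.length;
  return { ok: true, mission: { ...m, xp } };
}

// Two attempts at most; null tells the caller to serve a curated mission.
function generateValidated(generate: (hint?: string) => string): Mission | null {
  let hint: string | undefined;
  for (let attempt = 0; attempt < 2; attempt++) {
    const parsed = stage1(generate(hint));
    if (!parsed.ok) {
      hint = `Previous output failed schema validation: ${parsed.reason}`;
      continue;
    }
    const checked = stage2(parsed.mission);
    if (checked.ok) return checked.mission;
    hint = undefined; // assumption: Stage 2 failures retry without a hint
  }
  return null;
}
```

Note that Stage 1 discards whatever XP the model returned, so the only XP value that can reach the device is the one computed in Stage 2.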

Tradeoff: added latency and complexity vs. guaranteed content quality with a graceful degradation path

Zero free-text input to the LLM

The AI generation endpoint accepts exactly two parameters from the device: subdomain_id and difficulty. Both are whitelist-validated against 28 valid subdomain values and 3 valid difficulty values. No free-text user input is ever injected into the prompt. This rules out prompt injection through user input by design rather than by detection.

Three failed whitelist validation attempts from the same device within 10 minutes trigger a 1-hour block. App Attest verifies device authenticity. StoreKit 2 JWS verifies subscription entitlement. With those checks in place, the input attack surface of the AI endpoint is reduced to two whitelisted values.
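
A minimal sketch of the parameter whitelist and the failure block might look like this. The subdomain IDs and difficulty names are placeholders (the real lists have 28 and 3 entries), the in-memory maps are a stand-in because this case study does not say where the Worker keeps these counters, and `checkRequest` is a hypothetical helper rather than the Worker's actual handler.

```typescript
// Illustrative whitelists; the real endpoint accepts 28 subdomain IDs and 3 difficulty values.
const SUBDOMAIN_IDS = new Set(["sec-1.1", "sec-1.2", "net-2.1"]);
const DIFFICULTY_VALUES = new Set(["easy", "medium", "hard"]);

const WINDOW_MS = 10 * 60 * 1000; // failures are counted over 10 minutes
const BLOCK_MS = 60 * 60 * 1000;  // a block lasts 1 hour
const MAX_FAILURES = 3;

// Accepts exactly the two expected keys, each with a whitelisted value.
function isWhitelisted(body: unknown): boolean {
  if (typeof body !== "object" || body === null) return false;
  const { subdomain_id, difficulty, ...rest } = body as Record<string, unknown>;
  return (
    Object.keys(rest).length === 0 &&
    typeof subdomain_id === "string" && SUBDOMAIN_IDS.has(subdomain_id) &&
    typeof difficulty === "string" && DIFFICULTY_VALUES.has(difficulty)
  );
}

// In-memory stand-in for per-device failure tracking.
class FailureTracker {
  private failures = new Map<string, number[]>();
  private blockedUntil = new Map<string, number>();

  isBlocked(deviceId: string, nowMs: number): boolean {
    return nowMs < (this.blockedUntil.get(deviceId) ?? 0);
  }

  recordFailure(deviceId: string, nowMs: number): void {
    // Keep only failures that are still inside the 10-minute window.
    const recent = (this.failures.get(deviceId) ?? []).filter((t) => nowMs - t < WINDOW_MS);
    recent.push(nowMs);
    if (recent.length >= MAX_FAILURES) {
      this.blockedUntil.set(deviceId, nowMs + BLOCK_MS);
      this.failures.delete(deviceId);
    } else {
      this.failures.set(deviceId, recent);
    }
  }
}

// Returns an HTTP-style status: 429 blocked, 400 rejected, 200 accepted.
function checkRequest(
  tracker: FailureTracker,
  deviceId: string,
  body: unknown,
  nowMs: number,
): number {
  if (tracker.isBlocked(deviceId, nowMs)) return 429;
  if (!isWhitelisted(body)) {
    tracker.recordFailure(deviceId, nowMs);
    return 400;
  }
  return 200;
}
```

Since only whitelisted values pass, the prompt can be assembled from server-side templates keyed by those values, and no string from the request body ever needs to be interpolated into it.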

Tradeoff: users can't request custom topics (only their weakest subdomain is targeted) vs. an AI endpoint that cannot be manipulated through its inputs

Technical Depth

AI generation pipeline

Designed & Prototyped — Scheduled for v2.0

The AI generation pipeline is built and functional — prompt assembly, LLM integration via Cloudflare Worker, two-stage validation (schema compliance + quality/safety), retry logic with diagnostic hints, and per-generation cost logging. The system is provider-agnostic with a config-level migration path between models.

The current bottleneck is pass rate. Missions sometimes require more than one generation attempt to clear the two-stage validation gate, which drives the effective cost above target. The pipeline works, but it needs optimisation before it is production-ready at scale.

v1.0 ships with 648 curated missions across three certification tracks. The AI generation system is scheduled for v2.0 once pass-rate refinement is complete.

Project continuity across 65+ chats

CipherRank was designed and built across 65+ individual chats with Claude as an engineering partner. The challenge: Claude has no memory between chats. The chats themselves have limited capacity for context. Every new conversation starts cold. A project this complex — with interdependent specs, design decisions that ripple across documents, and engineering work that depends on locked design choices — cannot survive context loss.

The solution was a structured session briefing system. Each working session — which might span several chats — closes with an updated briefing document that carries the full project state: current status of every component, what was done, next priorities, a cumulative design decisions log, a discrepancy tracker for cross-document conflicts, and a ready-to-paste opening prompt for the next chat. The briefing is versioned and every section carries forward — the iron rule is that sections are updated but never dropped.

This is a context architecture problem. The briefing system is to multi-chat AI collaboration what a well-maintained internal wiki is to a distributed engineering team — the institutional memory that prevents decisions from being revisited and dependencies from being missed.

What I'd Do Differently

The design-before-build approach worked well for architectural coherence, but it front-loaded all design decisions into a period when I had the least context about how the system would actually feel in use. Some decisions made during the UX Flow phase (Step 2) were later contradicted by implementation reality — the free-tier weekly cap, the streak multiplier values, and the feedback mode naming all changed during engineering. The discrepancy tracker in the session briefing caught these, but each one was a small rework.

The XP simulation (Step 3) was valuable for validating progression pacing, but I'd run it again after the content library hit 278 missions instead of only against the original 120. The difficulty distribution shifted as I authored more content, and the simulation was calibrated against the earlier mix.

If starting again, I'd keep the phased approach but build a minimal playable prototype after Step 2, even before the backend architecture. Playing through two or three missions in a real UI would have surfaced the feedback mode naming issue, the weekly cap friction, and the need for sequential mission ordering months earlier.
