# PAI Skills Deep Dive — All 52, Individually Explained + The 80/20

> Generated 2026-06-11 from each skill's actual SKILL.md (mechanics, workflows, gotchas). Companion to SKILLS_GUIDE.md (the quick tier map). Same priority order. Every entry ends with **The 20%** — the core understanding that unlocks 80% of the skill's value.

---

# Tier 1 — Earn + Verify

### 1. Research

**What it is:** Your single most-used skill (13 logged runs). A multi-engine research machine that fans your question out to several AIs — Claude, Gemini, Grok, Perplexity — and cross-checks what comes back.
**How it works:** Four depth modes. Quick = 1 Perplexity agent, ~15 seconds. Standard (the default) = 4 different AI researchers cross-checked, ~30-60s. Extensive = 7 explorers plus 2 independent verifiers. Deep Investigation = progressive iteration with a persistent research vault that builds over hours. Findings come back tagged [HIGH], [MED], [LOW], or [CONFLICT] so you know what to trust. Every URL is verified before delivery — a hallucinated link is treated as a catastrophic failure.
**Use it when:** "research X," "find out about Y," prospect research, market questions, anything where being wrong costs money.
**Watch out:** Even [HIGH]-tagged stats deserve a re-read of the source page when a number is load-bearing — a real audit caught 15 mis-quoted figures in one big run.
**The 20%:** Say "research X" and you get four AIs cross-checking each other with verified links. Escalate depth when the stakes rise; never settle for one model's opinion on anything that matters.

### 2. Sales

**What it is:** Turns dry product documentation into a sales-ready story package — narrative, talking points, and a signature visual.
**How it works:** A pipeline: extract the narrative arc → pick the emotional register (wonder, determination, hope) → derive a visual scene → generate assets. Three workflows: CreateSalesPackage (everything), CreateNarrative (8-24 numbered first-person story points capturing *why it matters*, not what it does), CreateVisual (charcoal gestural sketch, transparent background).
**Use it when:** Building the roofer/HVAC pitch, Arete proposals, any "turn this into a pitch."
**Watch out:** It tells stories, not feature lists, and the charcoal sketch style is the deliberate brand. Hormozi-style offer math belongs to a different tool.
**The 20%:** Feed it what you're selling; it returns the story version. People buy the why — this skill manufactures the why.

### 3. Interceptor

**What it is:** Remote control for your REAL Chrome — the browser you're logged into — driven from inside via an extension, so there's no automation fingerprint. Passes every major bot-detection test.
**How it works:** Compound commands collapse multi-step flows: `interceptor open <url>`, `read`, `act`, `inspect`. It auto-captures all network traffic, can record your manual actions and replay them as scripts (monitor/replay), and understands rich editors like Google Docs via a scene graph. On this WSL box: run `interceptor-up` once per session first.
**Use it when:** Verifying ANY web deploy or UI claim (this is doctrine — "it works" requires an Interceptor screenshot), reproducing reported bugs before reading code, posting to Nextdoor/Facebook with your real session, automating anything behind a login.
**Watch out:** Screenshots always capture from the top of the page — for content below the fold, give the section its own short page. Duplicate tabs at the same URL confuse routing; close extras.
**The 20%:** It's your actual browser, your actual logins, invisible to bot detection. One rule to remember: nothing on the web counts as "verified" until Interceptor has seen it.

### 4. Browser

**What it is:** Interceptor's headless sibling — a fast Rust-based browser daemon (agent-browser) built for volume, not verification.
**How it works:** Persistent auth profiles (log in headed once, run headless forever after), parallel isolated sessions via `--session`, batch command execution, network interception, device emulation. Workflows: ReviewStories fans YAML user stories out to parallel reviewers; Automate runs parameterized recipe templates.
**Use it when:** Scraping 20 pages in parallel, batch screenshots, background data extraction, dev-server testing at speed.
**Watch out:** Never use it to verify deploys — it misses rendering issues real Chrome catches. Bot-detected sites need Interceptor or BrightData instead.
**The 20%:** Volume = Browser, verification = Interceptor. When the job is "do this 50 times fast," this is the tool.

### 5. Apify

**What it is:** Structured scraping of the platforms that matter commercially — Instagram, LinkedIn, TikTok, YouTube, Facebook, Google Maps, Amazon — through purpose-built cloud actors.
**How it works:** TypeScript wrappers (searchGoogleMaps, scrapeInstagramProfile, scrapeAmazonProduct...) call the right actor and filter the data in code before it reaches the conversation — 95-99% token savings. Parallel multi-platform queries supported. The lead pipeline: Google Maps search → qualify filter → optional LinkedIn enrichment.
**Use it when:** Building contractor lead lists, competitor social analysis, review mining, business contact extraction.
**Watch out:** Each platform has a dedicated actor — generic scrapers lose to specialized ones. Output schemas differ per actor.
**The 20%:** Google Maps → filtered business list with contacts is the move that feeds LeadRadar's sell side. One sentence in, structured lead data out.

### 6. BrightData

**What it is:** The escalation ladder for sites that block you.
**How it works:** Four tiers, always starting at the bottom: Tier 1 WebFetch (free, instant) → Tier 2 curl with Chrome headers → Tier 3 headless rendered browser → Tier 4 Bright Data residential proxies with CAPTCHA solving (costs real money). A Crawl workflow maps multi-page sites up to ~50 pages.
**Use it when:** 403s, CAPTCHAs, "can't access this site," whole-site crawls.
**Watch out:** Never start at Tier 4 — most blocks fall to Tier 2 or 3. Playwright is banned across PAI.
**The 20%:** Climb the ladder one rung at a time and stop at the first rung that works; you'll rarely pay for Tier 4.
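
The ladder logic can be sketched in a few lines. This is a minimal illustration in Python, not the skill's actual code: the `climb` function, the tier labels, and the stand-in fetchers are all hypothetical, and real tiers would make network calls.

```python
from typing import Callable, Optional

# A fetcher takes a URL and returns page text, or None when the site blocks it.
Fetcher = Callable[[str], Optional[str]]

def climb(url: str, ladder: list[tuple[str, Fetcher]]) -> tuple[str, str]:
    """Try each tier in order and stop at the first one that returns content."""
    for tier_name, fetch in ladder:
        page = fetch(url)
        if page is not None:
            return tier_name, page
    raise RuntimeError(f"every tier failed for {url}")

calls: list[str] = []  # records which tiers were actually attempted

def blocked(name: str) -> Fetcher:
    """Hypothetical stand-in for a tier the site blocks."""
    def fetch(url: str) -> Optional[str]:
        calls.append(name)
        return None
    return fetch

def working(name: str) -> Fetcher:
    """Hypothetical stand-in for a tier that gets through."""
    def fetch(url: str) -> Optional[str]:
        calls.append(name)
        return f"<html>{url}</html>"
    return fetch

ladder = [
    ("tier1-webfetch", blocked("tier1-webfetch")),
    ("tier2-curl", working("tier2-curl")),
    ("tier3-headless", working("tier3-headless")),
    ("tier4-proxy", working("tier4-proxy")),
]
tier, page = climb("https://example.com", ladder)
print(tier)   # tier2-curl
print(calls)  # ['tier1-webfetch', 'tier2-curl']
```

The paid tier is never touched here because the loop returns as soon as a cheaper rung succeeds.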

### 7. ISA

**What it is:** The "definition of done" machine — the single most important primitive in PAI. An ISA (Ideal State Artifact) is one document that defines what done means; ISCs (Ideal State Criteria) are its checklist lines, each a binary yes/no test. No partial credit — exactly like making weight.
**How it works:** Up to twelve fixed sections (Problem, Vision, Out of Scope, Constraints, Goal, Criteria, Test Strategy, Features, Decisions, Changelog, Verification...). Six workflows: Scaffold (create from a prompt), Interview (deepen by asking you questions), CheckCompleteness (audit), Reconcile (merge parallel work back), Seed (bootstrap from existing code), Append (canonical log entries). Project ISAs live in the project repo as its system of record; one-off task ISAs live in working memory.
**Use it when:** Any serious work — the Algorithm creates one automatically. Directly: "scaffold an ISA for X," "check this ISA's completeness."
**Watch out:** Criteria IDs never renumber once written — splits become ISC-7.1, ISC-7.2. That stability is what lets parallel agents merge work back safely.
**The 20%:** One document is simultaneously the spec, the test suite, and the receipt. If you understand the ISA, you understand how PAI decides anything is true.
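
The ID-stability rule is easy to see in a small sketch. The following Python is illustrative only: `split_criterion` and the sample criteria are hypothetical, and replacing the parent entry with its children (rather than keeping it as a heading) is an assumption.

```python
def split_criterion(criteria: dict[str, str], isc_id: str, parts: list[str]) -> dict[str, str]:
    """Replace one criterion with numbered children, leaving every other ID untouched."""
    result: dict[str, str] = {}
    for existing_id, text in criteria.items():
        if existing_id == isc_id:
            for n, part in enumerate(parts, start=1):
                result[f"{isc_id}.{n}"] = part
        else:
            result[existing_id] = text
    return result

criteria = {
    "ISC-6": "Login page loads",
    "ISC-7": "Checkout works and emails a receipt",
    "ISC-8": "Logout clears the session",
}
updated = split_criterion(criteria, "ISC-7", ["Checkout completes", "Receipt email is sent"])
print(list(updated))  # ['ISC-6', 'ISC-7.1', 'ISC-7.2', 'ISC-8']
```

ISC-8 keeps its ID after the split, so an agent that was working against ISC-8 in parallel can still merge back without a lookup table.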

---

# Tier 2 — Thinking Multipliers

### 8. FirstPrinciples

**What it is:** The Musk-style reasoning method: stop copying how everyone does it, rebuild from what's physically true. Your #2 most-used skill.
**How it works:** Three steps. DECONSTRUCT — what is this actually made of, and what does each part actually cost? CHALLENGE — classify every "constraint" as hard (physics), soft (someone's policy), or assumption (unvalidated belief). RECONSTRUCT — build the optimal solution from only the hard constraints, ignoring inherited form.
**Use it when:** "Is this a real constraint?", pricing decisions, "why does this cost so much," anything where the inherited answer smells lazy.
**Watch out:** Decompose to axioms, not just smaller pieces. It's also strictly an analysis tool: once the assumption is exposed, fixing it is normal work.
**The 20%:** Only physics is immutable. Run the classification table on any "we have to" and watch most of them turn out to be choices.

### 9. RedTeam

**What it is:** A 32-agent adversarial firing squad for your ideas, plans, and offers — not for networks or servers.
**How it works:** Five phases: decompose the idea into ~24 atomic claims → 32 specialized agents (engineers, architects, pentesters, interns) attack in parallel → synthesis → steelman (the strongest version of your idea) → counter-argument. Findings ranked by severity, each with a remediation path. A second workflow, AdversarialValidation, makes competing proposals fight and synthesizes the winner.
**Use it when:** Before the $250 offer goes out, before a pivot, "poke holes in this," "what's the strongest objection."
**Watch out:** 32 agents generate volume — the ranked severity list is the signal, not the raw count.
**The 20%:** Let it break your plan in private before a roofer, a professor, or the market breaks it in public. The output isn't "no" — it's the fixed version.

### 10. SystemsThinking

**What it is:** The structure-finder, built on Donella Meadows and Peter Senge: recurring problems aren't bad luck, they're loops.
**How it works:** Five workflows: Iceberg (walk Events → Patterns → Structures → Mental Models), CausalLoop (draw reinforcing/balancing loops), FindArchetype (match to ~10 canonical patterns like Shifting the Burden, each with a documented intervention), FindLeverage (Meadows' 12 leverage points, weakest to strongest), ConceptMap.
**Use it when:** "Why does this keep happening," motivation cycles, business flywheels, anything with feedback over time.
**Watch out:** If you can't draw at least one loop, you have a list, not a system. Delays between action and feedback cause most of the pain — mark them.
**The 20%:** Behavior comes from structure. Find the loop generating the recurring problem, then intervene at the highest leverage point you can reach — not at the event.

### 11. Council

**What it is:** A real debate between custom-built experts — visible transcripts, genuine friction, then synthesis.
**How it works:** Members are composed per topic via the Agents skill (never generic stock personas) with expertise, personality, and a stake in disagreeing. DEBATE = 3 rounds plus synthesis (~40-90s); QUICK = 1 round. 4-6 well-composed agents beat 12 generic ones.
**Use it when:** Two-sided decisions ("take the internship or go all-in"), strategy weighing, anywhere you'd want a panel instead of one advisor.
**Watch out:** If everyone would agree anyway, the topic doesn't deserve a Council.
**The 20%:** Use it for decisions, not validation — the value is watching positions collide before you commit.

### 12. BeCreative

**What it is:** Research-backed divergence (Verbalized Sampling, arXiv 2510.01171): forces the model away from its most-probable answer.
**How it works:** Single-shot mode generates 5 internally diverse candidates — each deliberately under 10% probability — and surfaces the strongest. Multi-turn mode expands a small seed set into a full diverse corpus (for tests, evals, content variants). Workflows include TreeOfThoughts and MaximumCreativity. Measured: 1.6-2.1x diversity gain, +25.7% quality.
**Use it when:** Names, angles, hooks, "the obvious idea feels stale," generating test datasets.
**Watch out:** Quick divergence only — multi-cycle evolution is Ideate's job. No lift on questions with one right answer.
**The 20%:** The first idea any model gives you is the most statistically average one. This skill is the anti-average button.

### 13. Science

**What it is:** The scientific method as a debugging and decision loop — guesses become hypotheses, hypotheses get falsified.
**How it works:** DefineGoal → GenerateHypotheses (minimum THREE — one hypothesis is just confirmation bias) → DesignExperiment (falsifiable) → Measure → Analyze → Iterate. QuickDiagnosis is the 15-minute fast lane for everyday debugging.
**Use it when:** "Why is this happening," A/B questions, optimizing anything measurable, content experiments.
**Watch out:** "It seems better" is not a measurement. Skipping the hypothesis step is just trial-and-error with extra steps.
**The 20%:** Three guesses minimum, then design the test that could prove you wrong. That single habit separates investigation from flailing.

### 14. RootCauseAnalysis

**What it is:** Structured incident investigation — Toyota's 5 Whys, Ishikawa fishbones, blameless postmortems, fault trees.
**How it works:** Five workflows matched to the failure shape: FiveWhys (linear chains), Fishbone (multiple suspect areas), Postmortem (timeline + contributing factors + actions), FaultTree (safety-critical AND/OR logic), IS/IS-NOT (subtle hard-to-reproduce defects).
**Use it when:** Recurring bugs, "what actually broke," post-launch incidents, pre-launch risk inversion.
**Watch out:** "Human error" is where investigation starts, never where it ends — if a human could make the mistake, the system allowed it. Stop at the deepest *actionable* level, not at thermodynamics.
**The 20%:** The first answer to "why" is never the root. Fix where the bad state *enters* the system, not where it becomes visible.

### 15. IterativeDepth

**What it is:** The blind-spot eliminator — the same problem examined through 2-8 systematically different lenses, one pass each.
**How it works:** Each pass (stakeholder lens, failure lens, temporal lens, constraint-inversion lens...) surfaces requirements invisible from the others; passes stop when they start repeating. A 4-lens pass routinely finds 30-50% more requirements than direct analysis. Fast mode = 2 lenses.
**Use it when:** Before building anything important, "what am I missing," novel work where you don't know what you don't know.
**Watch out:** Diminishing returns after ~5 passes; if a pass repeats earlier findings, stop early.
**The 20%:** Mid-project surprises cost 5-10x what upfront passes cost. Three lenses before you build is the cheapest insurance PAI sells.

### 16. ApertureOscillation

**What it is:** A zoom discipline: examine the thing you're building (narrow), then the system it lives in (wide), then hunt the divergence.
**How it works:** Exactly 3 passes — never more. Pass 1: the component's own internal logic. Pass 2: what the larger system needs it to be. Pass 3: where those views disagree — that delta is the entire output, surfaced as design tensions and scope recommendations.
**Use it when:** "Build this inside that," feature-fits-system questions, scope disputes, architecture reviews.
**Watch out:** Needs two genuinely distinct inputs (component + system) — if they're the same thing, use IterativeDepth instead. "No divergence found" is itself a valuable result.
**The 20%:** What the part wants and what the whole needs are usually different — the gap between them is where designs quietly go wrong.

### 17. Ideate

**What it is:** Evolution for ideas — not a brainstorm, a multi-generation breeding program.
**How it works:** Nine phases: CONSUME → DREAM (noise 0.9) → DAYDREAM (0.5) → CONTEMPLATE (0.1) → STEAL (cross-domain borrowing) → MATE (recombination via true random shuffle — deliberately not LLM-picked pairs, because the LLM's taste is the bias being defeated) → TEST (fitness scoring, RedTeam inside) → EVOLVE → META-LEARN (strategies adjust between cycles). A loop controller decides continue/pivot/stop.
**Use it when:** Hard problems where good-enough ideas keep failing, "I need something genuinely new."
**Watch out:** Expensive by design. For a fast idea burst, BeCreative is the right size.
**The 20%:** Generations beat sessions — ideas that survive testing get bred together, and the breeding strategy itself learns. Save it for problems worth that machinery.
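
The MATE step's random pairing might look something like the sketch below. It is a guess at the shape, not the skill's implementation: `mate_pairs` is a hypothetical name, and using `random.SystemRandom` (OS entropy) as the "true random" source is an assumption.

```python
import random

def mate_pairs(ideas: list[str], rng: random.Random) -> list[tuple[str, str]]:
    """Shuffle, then pair neighbours, so no one's taste (human or LLM) picks the parents."""
    pool = list(ideas)  # copy so the caller's list is not reordered
    rng.shuffle(pool)
    # With an odd number of ideas, the last one sits this round out.
    return [(pool[i], pool[i + 1]) for i in range(0, len(pool) - 1, 2)]

ideas = ["A", "B", "C", "D", "E", "F"]
pairs = mate_pairs(ideas, random.SystemRandom())
print(pairs)  # three pairs, different on each run
```

The point of the design is that the pairing function never looks at the ideas' content, so it cannot favor the combinations a model would find most plausible.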

---

# Tier 3 — Self-Training (You, Not the System)

### 18. PitchRehearsal

**What it is:** Your public-speaking anxiety killer — G0 is a stated goal, and this attacks it with data instead of vibes.
**How it works:** BuildOutline pulls evidence from your knowledge vault into a 5-7 minute beat-based outline (beats only — you choose words live) with a callback close that returns to the hook. You record on your phone, drop the audio + transcript in the session folder; ScoreRecording computes fillers-per-minute, words-per-minute, and flags your three weakest 20-second windows. ReviewProgress draws the trend lines across sessions.
**Use it when:** Class presentations, sales calls, any rep where you'd otherwise just "wing it again."
**Watch out:** The hook is never "Hi, my name is" — open with the observation. Give the scorer the recording duration for accurate rates.
**The 20%:** Outline → record once → score → repeat. Anxiety retires when fillers-per-minute visibly falls week over week — the graph does what willpower can't.
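
To make the two rate metrics concrete, here is a minimal Python sketch. The filler list, the tokenizing regex, and the `speaking_rates` name are assumptions for illustration; the real ScoreRecording workflow may count differently.

```python
import re

FILLERS = {"um", "uh", "like", "basically"}  # hypothetical filler list

def speaking_rates(transcript: str, duration_seconds: float) -> dict[str, float]:
    """Compute words-per-minute and fillers-per-minute from a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    minutes = duration_seconds / 60
    filler_count = sum(1 for w in words if w in FILLERS)
    return {
        "words_per_minute": len(words) / minutes,
        "fillers_per_minute": filler_count / minutes,
    }

rates = speaking_rates("Um so the roof is, like, basically done. Uh, questions?", 30)
print(rates)  # {'words_per_minute': 20.0, 'fillers_per_minute': 8.0}
```

Both numbers divide by the duration, which is why the scorer needs the recording length: the transcript alone gives counts, not rates.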

### 19. Defend

**What it is:** A sparring ring for arguments. Stake a one-sentence thesis; a hostile AI attacks it for 6+ turns; an independent judge scores you.
**How it works:** The adversary never praises — it concedes in one neutral clause and attacks your next weakest premise. The judge (a separate call, blind to the adversary's stance) scores premise integrity, counter-response quality, concession honesty, and citation accuracy. Below 6/10 the thesis re-queues for tomorrow; above 8 it retires.
**Use it when:** Prepping for negotiations, pitches, debates, any room where someone will push back hard.
**Watch out:** Unverifiable claims get penalized even if they sound right — cite real numbers or don't cite.
**The 20%:** It's live goes for your mouth. Lose in private on Tuesday so you win in the room on Friday.

### 20. ThesisVault

**What it is:** An append-only record of your investment calls — the artifact that replaces a resume in a finance interview.
**How it works:** Every Sunday: one ~400-word thesis, three falsifiable *dated* predictions, and a mandatory inversion paragraph (what would prove me wrong — the CLI refuses to save without it). Nothing is editable after writing. At 90 days it auto-schedules a post-mortem: live price/news pulled in, four structured questions, verdict tagged hit/miss/mixed. Stats computes your batting average.
**Use it when:** Weekly, as a discipline. "Log a thesis on NVDA," "what's due," "my batting average."
**Watch out:** "Stripe wins" is not a prediction. "Stripe TPV exceeds $1.5T trailing-twelve by Q4" is. Lazy predictions surface at post-mortem time.
**The 20%:** Dated, uneditable calls + measured accuracy = the thing you hand a managing director at 25 that nobody else in the room has.
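
The two hard rules (no save without an inversion, nothing editable afterward) plus the 90-day post-mortem can be modeled roughly as follows. This is a sketch under assumptions: the `Thesis` class, its field names, and the sample values are hypothetical, not the CLI's actual schema.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date, timedelta

@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Thesis:
    ticker: str
    body: str
    predictions: tuple[str, ...]
    inversion: str
    written_on: date

    def __post_init__(self) -> None:
        if not self.inversion.strip():
            raise ValueError("refusing to save: inversion paragraph is required")
        if len(self.predictions) != 3:
            raise ValueError("exactly three dated predictions are required")

    @property
    def postmortem_due(self) -> date:
        return self.written_on + timedelta(days=90)

thesis = Thesis(
    ticker="NVDA",
    body="placeholder thesis text",
    predictions=("dated prediction 1", "dated prediction 2", "dated prediction 3"),
    inversion="What would prove me wrong: placeholder",
    written_on=date(2026, 6, 14),
)
print(thesis.postmortem_due)  # 2026-09-12
```

Trying `thesis.body = "revised"` raises `FrozenInstanceError`, which is the in-memory analogue of the append-only rule.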

### 21. TenK

**What it is:** A daily 25-minute trainer that turns you into someone who reads a 10-K in 12 minutes instead of 45.
**How it works:** Three blocks: RSVP overspeed reading at 115% of your target speed (words flash one at a time), a comfortable-pace re-read with a thesis freewrite, then 8 adversarial recall questions generated fresh. Speed ratchets +5 WPM after two straight ≥85% recall sessions, -10 on a miss. Corpus: pre-cached real 10-Ks (Apple, Berkshire, Tesla...) from SEC EDGAR.
**Use it when:** Daily — it's a workout, not a tool.
**Watch out:** Quiz scoring is fuzzy token-overlap — sloppy answers correctly fail. The speed cap that matters is comprehension, not the terminal.
**The 20%:** Speed is earned by recall, never claimed — the 85% gate is the whole system. Target: 230 → 520 WPM in 60 days.
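
The ratchet rule can be written out directly. The Python below is a sketch with two labeled assumptions: a "miss" is taken to mean any session under 85% recall, and the streak is assumed to reset after each speed increase. `next_speed` is a hypothetical name.

```python
def next_speed(wpm: int, streak: int, recall: float) -> tuple[int, int]:
    """Return (new_wpm, new_streak) after one session."""
    if recall >= 0.85:
        streak += 1
        if streak == 2:
            return wpm + 5, 0  # two straight passes: ratchet up, restart the streak
        return wpm, streak
    return wpm - 10, 0  # assumed meaning of "a miss": recall under 85%

wpm, streak = 230, 0
for recall in [0.90, 0.88, 0.70, 0.85]:
    wpm, streak = next_speed(wpm, streak, recall)
    print(wpm, streak)
# 230 1
# 235 0
# 225 0
# 225 1
```

The asymmetry (+5 for two passes, -10 for one miss) means a single sloppy session costs four good ones, which is what makes the gate hard to game.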

### 22. MajorLeague

**What it is:** Daily 10-minute training on the Major System — the phonetic code that turns numbers into images.
**How it works:** Digits map to consonant sounds (3=M, 4=R...), sounds become words, words chain into vivid stories. Three stages per session: Encode (2-digit → word, scored on match + speed), Chain (6-30 digit sequences → image story), Recall (re-quiz yesterday's and 3-day-old chains). Chain length grows only after a perfect previous session.
**Use it when:** Daily drill; instant utilities: "encode 472," "decode mirror."
**Watch out:** The parser reads spelling, not pronunciation — pick phonetically-spelled words or the score lies.
**The 20%:** Numbers become pictures, pictures don't fade. End state: 15 key figures from a 10-K, quoted cold three days after one read.
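
A spelling-based decoder shows both how the code works and why the parser can mislead. This Python sketch uses the standard Major System consonant mapping; the digraph handling and the `decode` function itself are assumptions, not the skill's actual parser.

```python
# Two-letter spellings that stand for a single sound.
DIGRAPHS = {"sh": "6", "ch": "6", "th": "1", "ph": "8", "ck": "7"}
LETTERS = {
    "s": "0", "z": "0", "t": "1", "d": "1", "n": "2", "m": "3", "r": "4",
    "l": "5", "j": "6", "k": "7", "g": "7", "c": "7", "q": "7",
    "f": "8", "v": "8", "p": "9", "b": "9",
}

def decode(word: str) -> str:
    """Turn a word into digits by reading its spelling; vowels, w, h, y are ignored."""
    word = word.lower()
    digits = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            digits.append(DIGRAPHS[pair])
            i += 2
        elif len(pair) == 2 and pair[0] == pair[1]:
            i += 1  # doubled letter: skip the first copy so it counts once
        else:
            if word[i] in LETTERS:
                digits.append(LETTERS[word[i]])
            i += 1
    return "".join(digits)

print(decode("mirror"))   # 344
print(decode("raccoon"))  # 472
print(decode("city"))     # 71
```

The last line is the trap: "city" is pronounced with an S sound (0), but a spelling-based reader sees a C and returns 7. Words like "raccoon", where spelling and sound agree, score honestly.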

### 23. PalaceArchitect

**What it is:** Daily 8-minute memory-palace training — the method of loci on real places you know (dorm, wrestling room, parents' house).
**How it works:** Designate a location, list 10-20 stations in walking order, encode a rotating content pack onto them, quiz forward/backward/random-access. Every drill auto-schedules a 48-hour retention quiz. Progression: 5 stations week 1 → 25 stations across 3 palaces by week 8.
**Use it when:** Daily drill; `palace retention` when a quiz comes due.
**Watch out:** The 48-hour number is the real score — initial recall is mostly working memory. Don't reuse the same location twice in 48 hours; the associations interfere.
**The 20%:** Spatial memory is the strongest memory you own. The 48-hour quiz is the only number that counts.

### 24. AIChampionLesson

**What it is:** Turns any AI concept you just learned into a 10-minute teachable for peers — and teaching is the strongest way to lock learning in.
**How it works:** Strict five-part structure: (1) why this matters to YOU, anchored in the audience's life, (2) the push-mode wrong way most people use AI, (3) the pull-mode right way in a relatable scenario (student paper, sales call, wrestling cut), (4) three copy-paste-ready prompts, (5) one named trap. Outputs lesson.md (600-900 words) plus a 60-90 second video script with an on-screen hook.
**Use it when:** Friday ritual: "make a lesson on what clicked this week." Direct: "teach this," "AI for normies."
**Watch out:** ONE concept per lesson — if the draft covers three things, it's three lessons. Three prompts means three, all copy-paste ready.
**The 20%:** Learn → teach → post. Every concept you master becomes mission content (helping people catch up on AI) and funnel material at the same time.

---

# Tier 4 — Memory & Life OS

### 25. Knowledge

**What it is:** Your curated second brain — a typed graph, not a notes pile.
**How it works:** Four entity types only: People, Companies, Ideas, Research. Every note carries typed links — supports, contradicts, extends, caused-by — so the graph can argue with itself. Operations: 3-pass search, add, harvest (auto-pull from PAI activity), ingest (URL/file → note + ripple updates to related notes), contradictions (find conflicting claims), graph traversal, BM25 retrieval.
**Use it when:** "What do we know about X," archiving a great article, "find contradictions in my notes on Y."
**Watch out:** The lookup test gates entry: would you ever look this up by name? If not, it's not knowledge.
**The 20%:** Typed links are the difference between a pile and a brain — `contradicts` edges surface conflicts your notes were silently carrying.
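
A typed graph can be as simple as a list of labeled edges. The sketch below is illustrative Python: the `Link` tuple, the `contradictions` function, and the sample notes are hypothetical, and only the four link types come from the description above.

```python
from typing import NamedTuple

LINK_TYPES = {"supports", "contradicts", "extends", "caused-by"}

class Link(NamedTuple):
    source: str
    kind: str
    target: str

def contradictions(links: list[Link]) -> list[tuple[str, str]]:
    """Return every pair of notes joined by a 'contradicts' edge."""
    for link in links:
        if link.kind not in LINK_TYPES:
            raise ValueError(f"unknown link type: {link.kind}")
    return [(link.source, link.target) for link in links if link.kind == "contradicts"]

links = [
    Link("idea-cold-outreach", "supports", "research-reply-rates"),
    Link("idea-cold-outreach", "contradicts", "idea-referrals-only"),
    Link("idea-referrals-only", "extends", "research-trust-study"),
]
print(contradictions(links))  # [('idea-cold-outreach', 'idea-referrals-only')]
```

With untyped links, all three edges would look the same and the conflict would stay buried.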

### 26. ContextSearch (`/cs`)

**What it is:** The cold-start killer — instant recovery of any past thread of work.
**How it works:** Phase 1 scans four indexes in parallel (session registry, session names, work-directory names, ISA titles) and loads the top 3 ISA summaries — first 10 lines only, under 40 lines of output. Phase 2 (git histories) fires only if Phase 1 found fewer than 3 hits.
**Use it when:** "What did we do with X," "pick up where we left off," before starting anything that smells like it happened before.
**Watch out:** Session descriptions are AI-generated summaries; if results look thin, ask for the deep phase (Phase 2, the git-history search).
**The 20%:** Thirty seconds of `/cs` before any task beats re-explaining context for ten minutes. Your past work is indexed; use the index.

### 27. Telos

**What it is:** The reader/writer for your life files — mission, goals, beliefs, wisdom, books, challenges, predictions — the context every other skill silently aims by.
**How it works:** Update workflow edits any TELOS file with timestamped backups and change logging. A second mode analyzes whole project directories: dependency chains (PROBLEMS→GOALS→STRATEGIES→PROJECTS), McKinsey-style reports, slide-ready narrative points, interactive dashboards.
**Use it when:** "Add a goal," "update my beliefs," "what am I wrong about," quarterly reviews.
**Watch out:** This data is deeply personal. It never goes into public repos or outputs.
**The 20%:** Stale goals = misaimed AI. Keep TELOS current and every skill in this document automatically points at what you actually want.

### 28. Interview (`/interview`)

**What it is:** The conversational deep-fill — a phased walkthrough that updates the system's entire model of you.
**How it works:** Four phases in leverage order: foundational TELOS (mission, goals, problems...) → ideal-state files (health, money, freedom...) → preferences (books, music, food...) → identity. Files ≥80% complete get Review mode (targeted "still accurate?" questions); sparse files get Fill mode. One question at a time, natural-language answers, automatic backups before bulk edits.
**Use it when:** Quarterly, after major life changes, or when answers feel "off."
**The 20%:** The system is only as good as its model of you — this is the maintenance ritual that keeps the model honest.

### 29. Migrate

**What it is:** The intake pipe for your pre-PAI life — old notes, journals, Notion/Obsidian exports, other AI configs.
**How it works:** Scans and chunks the source, classifies each chunk against the PAI taxonomy (TELOS files, knowledge, AI rules, identity), and presents a routing table with per-target confidence. ≥70% auto-approves, 40-70% asks, <40% walks through one by one. Every committed chunk carries a provenance comment; duplicates are detected.
**Use it when:** "Import my old notes," bringing in CLAUDE.md/.cursorrules from another setup, journal dumps.
**The 20%:** Years of scattered notes become system context in one approval session — the routing table is the whole interaction.
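
The three confidence bands reduce to a small routing function. This is a Python sketch, with `route` and the sample chunks as hypothetical names; the thresholds are the ones stated above, with exactly 70% treated as auto-approve and exactly 40% as ask.

```python
def route(confidence: float) -> str:
    """Map a classification confidence (0.0 to 1.0) to a review mode."""
    if confidence >= 0.70:
        return "auto-approve"
    if confidence >= 0.40:
        return "ask"
    return "walk-through"

# Hypothetical chunks with their best-target confidence.
chunks = {"2019 goals list": 0.91, "untitled journal page": 0.55, "random snippet": 0.20}
table = {name: route(conf) for name, conf in chunks.items()}
print(table)
# {'2019 goals list': 'auto-approve', 'untitled journal page': 'ask', 'random snippet': 'walk-through'}
```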

### 30. Daemon

**What it is:** Your public digital presence, generated from PAI and sanitized by code.
**How it works:** An aggregator reads PAI sources (missions, goals, books, project themes, ideas) into a profile; a deterministic SecurityFilter — pattern-matching code, not AI judgment — strips names, paths, and credentials, and structurally excludes contacts/finances/health entirely. Deploys as a static VitePress site to Cloudflare.
**Use it when:** "Update my daemon," "deploy daemon," sharing what you're building publicly.
**Watch out:** Site is static — changes require redeploying; new privacy patterns go in the filter code, not in prompts.
**The 20%:** A living "what I'm working on" page where privacy is enforced by code, not by hoping the AI is careful.
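
"Sanitized by code" means something like the following sketch: fixed patterns and a fixed exclusion list, with no model in the loop. Everything here is an assumption for illustration, including the two regexes, the section names, and the `sanitize` function; the real SecurityFilter's patterns are not shown in the source.

```python
import re

# Hypothetical patterns; the real filter's rules are not documented here.
PATTERNS = [
    (re.compile(r"/home/\S+"), "[path]"),
    (re.compile(r"sk-[A-Za-z0-9]{8,}"), "[credential]"),
]
EXCLUDED_SECTIONS = {"contacts", "finances", "health"}

def sanitize(profile: dict[str, str]) -> dict[str, str]:
    """Drop excluded sections entirely, then redact pattern matches in the rest."""
    clean = {}
    for section, text in profile.items():
        if section in EXCLUDED_SECTIONS:
            continue  # structurally excluded: never reaches the pattern step
        for pattern, replacement in PATTERNS:
            text = pattern.sub(replacement, text)
        clean[section] = text
    return clean

profile = {
    "projects": "Notes live in /home/user/vault and use key sk-abc12345XYZ.",
    "health": "private details",
}
print(sanitize(profile))
# {'projects': 'Notes live in [path] and use key [credential].'}
```

Because the same input always yields the same output, a new leak is fixed by adding a pattern and rerunning, not by rewording a prompt.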

---

# Tier 5 — Builder & Meta

### 31. CreateSkill

**What it is:** The skill that makes skills — and proves they actually work.
**How it works:** Two tracks. Structure: scaffold new skills in canonical format, validate, canonicalize old ones. Effectiveness (the special part): TestSkill runs a with-skill agent vs a baseline agent on the same task in parallel and compares outputs; ImproveSkill diagnoses root causes; OptimizeDescription generates 20 should-trigger/shouldn't-trigger queries to tune activation.
**Use it when:** "Create a skill for X," "this skill isn't triggering," "test if this skill helps."
**Watch out:** Your own curriculum's warning applies: shipping skills nobody needs is the dopamine trap. Build skills for tasks you actually repeat.
**The 20%:** A/B the skill against no-skill before trusting it — most "improvements" don't survive that test, and the ones that do are real.

### 32. CreateCLI

**What it is:** Generator for production-grade TypeScript command-line tools — the form factor most PAI tooling takes.
**How it works:** Three tiers: Tier 1 zero-dependency manual parsing (~300-400 lines — fits 80% of cases), Tier 2 Commander.js (10+ subcommands), Tier 3 oclif (reference only). Every output ships complete: README, QUICKSTART, strict TypeScript, JSON output, proper exit codes.
**Use it when:** "Wrap this API in a CLI," "build a command-line tool for X."
**The 20%:** Start at Tier 1 — most real tools never need more than 300 dependency-free lines. Code before prompts is a PAI founding principle; this is how that code gets made.

### 33. Prompting

**What it is:** The meta-prompting library — prompts that write prompts. Structure is code, content is data.
**How it works:** Three pillars: Standards (Anthropic best practices distilled from 1,500+ papers), Handlebars templates (Briefing, Judge, Rubric, DynamicAgent...), and RenderTemplate.ts to render them. Cut PAI's own prompt weight 65% (53K → 18K tokens). Output is always a prompt for use elsewhere, never final content.
**Use it when:** "Write a system prompt for X," "optimize this prompt," building agent briefings or eval judges.
**Watch out:** Test generated prompts before declaring them ready — looking good and performing well are different things.
**The 20%:** Stop hand-writing prompts — render them from proven templates. (Your Amp project is this philosophy as a product.)

### 34. Delegation

**What it is:** The parallelization playbook — when one AI should become five.
**How it works:** Six patterns: built-in specialist agents, worktree-isolated agents (parallel file edits without conflicts), background agents (non-blocking), custom agents, coordinated teams with shared task lists, and parallel dispatch of N identical jobs. The decision rule: agents that must talk to each other → team; independent one-shots → subagents.
**Use it when:** 3+ genuinely independent workstreams. Below that, direct work wins.
**Watch out:** The logged lesson: "one agent that can read code AND write JSX beats three specialists who can't coordinate." Delegate breadth, never depth.
**The 20%:** Parallelism pays only when streams are independent — coordination overhead eats everything else.

### 35. Agents

**What it is:** The agent factory — composes custom AI personas from traits, and manages predefined teams.
**How it works:** ComposeAgent.ts merges expertise (security, research...), personality (skeptical, enthusiastic...), and approach (thorough, rapid...) into a unique prompt with its own ElevenLabs voice. Predefined teams (engineering, security, marketing...) come YAML-configured with roles and built-in tensions. Observer-team variant: read-only overseers that vote continue/halt against the audit log for risky unattended runs.
**Use it when:** "Spin up a skeptical security reviewer," "get the marketing team on this." Council uses this under the hood.
**Watch out:** Three different systems share the word "agent": this skill (custom personas), the Agent tool (one-off subagents), TeamCreate (coordination). Naming the right one matters.
**The 20%:** Personality + expertise composition is what makes multi-agent debate real instead of one model talking to itself in different fonts.

### 36. Evals

**What it is:** The measurement system — if you can't score it, you can't improve it.
**How it works:** Three grader types: code-based (deterministic, fast, cheap), model-based (LLM rubric for nuance), human (gold standard, for calibration). Scores via pass@k (capability: at least one of k attempts succeeds, ~70% target) and pass^k (regression: all k attempts succeed, ~99% target). Evaluates whole agent transcripts and tool-call sequences, not just final outputs. Hooks into ISA criteria for automated verification.
**Use it when:** "Compare these prompts," "did the model change regress anything," "create a judge for X."
**Watch out:** Single runs prove nothing — pass@3 minimum. Transcript capture must be on *before* the run.
**The 20%:** Code graders first, LLM judges second, humans to calibrate the judges. That ordering keeps evals cheap, fast, and honest.
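
The difference between the two scores shows up clearly on a small result set. This Python sketch computes the empirical version of each (the share of tasks where any trial passed versus where every trial passed); the `pass_rates` name and the sample results are hypothetical, and the skill may use a different estimator.

```python
def pass_rates(results: dict[str, list[bool]]) -> tuple[float, float]:
    """Return (pass@k, pass^k) over tasks that were each run k times."""
    n = len(results)
    at_k = sum(any(trials) for trials in results.values()) / n   # at least one success
    hat_k = sum(all(trials) for trials in results.values()) / n  # every trial succeeds
    return at_k, hat_k

# Hypothetical results: four tasks, three trials each.
results = {
    "task-a": [True, True, True],
    "task-b": [True, False, True],
    "task-c": [False, False, False],
    "task-d": [True, True, True],
}
print(pass_rates(results))  # (0.75, 0.5)
```

Task B is the interesting one: it counts toward capability (the agent can do it) but against regression (it cannot be relied on to do it every time). A single run of task B would have reported either answer, depending on luck.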

### 37. Loop (`/loop`)

**What it is:** Iterative improvement with you in the loop — the same target refined across multiple full Algorithm cycles.
**How it works:** `/loop --target <path> --iterations N`. Each iteration is a complete OBSERVE→LEARN pass; each cycle's LEARN feeds the next cycle's OBSERVE; criteria evolve between passes. You review and redirect between iterations. `--resume` and `--status` manage long loops.
**Use it when:** Targets that get better in passes — a skill file, a design, a document — where each round deserves human judgment.
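**Watch out:** Each cycle is a full Algorithm run, which is expensive. Choose `--iterations` deliberately and set a clear exit condition, or the loop keeps spending cycles long after the target stops improving.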
**The 20%:** Iteration with checkpoints: the machine does full passes, you steer between them. For unsupervised improvement, that's Optimize.

### 38. Optimize (`/optimize`)

**What it is:** Autonomous hill-climbing — give it a number, walk away.
**How it works:** Two modes. Metric mode: any shell command that produces a number (Lighthouse score, bundle size, latency) becomes the fitness function. The loop is modify, measure, keep if better, discard if worse, repeat within budget. Eval mode: skills/prompts/agents judged by LLM-as-judge binary evals instead of a number. A regression tolerance guards secondary metrics so they do not degrade while the primary one climbs.
**Use it when:** "Get this page under 2 seconds," "improve this skill's quality," any target with a measurable score.
**Watch out:** Hill-climbers get stuck on local peaks — if the score plateaus, restart from different conditions.
**The 20%:** Keep-if-better is the entire algorithm. Your only job is choosing a metric that actually means what you want.
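
The keep-if-better loop can be sketched in a few lines. This is an illustration under stated assumptions, not the skill's implementation: `hill_climb`, `mutate`, and `score` are hypothetical names, `score` stands in for whatever command produces the number, and higher is treated as better.

```python
import random
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def hill_climb(
    start: T,
    mutate: Callable[[T, random.Random], T],
    score: Callable[[T], float],
    budget: int,
    seed: int = 0,
) -> Tuple[T, float]:
    """Try `budget` modifications, keeping a candidate only if it scores higher."""
    rng = random.Random(seed)
    best, best_score = start, score(start)
    for _ in range(budget):
        candidate = mutate(best, rng)          # modify
        candidate_score = score(candidate)     # measure
        if candidate_score > best_score:       # keep if better
            best, best_score = candidate, candidate_score
        # otherwise the candidate is discarded and the loop continues
    return best, best_score


if __name__ == "__main__":
    # Toy metric with a single peak at x = 3 (purely illustrative).
    result, value = hill_climb(
        start=0.0,
        mutate=lambda x, rng: x + rng.uniform(-1.0, 1.0),
        score=lambda x: -(x - 3.0) ** 2,
        budget=500,
    )
    print(result, value)
```

The toy metric has one peak, so the loop finds it. A metric with several peaks is where the plateau warning applies: the loop never accepts a worse score, so it cannot walk downhill to reach a higher peak elsewhere.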

### 39. PAIUpgrade (`/pu`)

**What it is:** The system's self-improvement scanner — what's new in AI, filtered to what YOUR setup should adopt.
**How it works:** Four parallel threads: a prior-work audit (reads your actual hooks/settings/ISAs and tags every idea NEW/PARTIAL/DONE with file:line evidence), your goals and projects, external sources (Anthropic releases, GitHub trending, your X bookmarks), and mined self-reflections. Output: a ranked discoveries table + tiered recommendations; already-implemented items are auto-skipped, never re-pitched.
**Use it when:** "/pu," "check my bookmarks," "what should I upgrade," monthly.
**Watch out:** A full check takes 5-7 minutes — run it in the background.
**The 20%:** It audits what you already have before recommending — so every suggestion is genuinely new to your system, with evidence.

### 40. BitterPillEngineering

**What it is:** The over-prompting auditor. Named for the bitter pill: as models get smarter, most of your clever instructions become dead weight.
**How it works:** Runs every rule through five questions — does Claude already do this? contradiction? redundant? one-off fix? vague? — and classifies each CUT / RESOLVE / MERGE / SHARPEN / KEEP, with token-savings estimates. Knows what's anti-fragile (verification harnesses, data pipelines, routing rules) vs fragile (format parsers, personality scales, abstract values).
**Use it when:** "Audit my CLAUDE.md," "is this rule dead weight," after any model upgrade.
**Watch out:** A rule that looks redundant may exist because the default kept failing — check the failure history before cutting.
**The 20%:** Every unnecessary rule competes for attention against the rules that matter. The test: would a smarter model make this rule pointless? Then cut it now.

---

# Tier 6 — Content & Media

### 41. Art

**What it is:** The visual factory — 20+ static formats through three image models (Flux 1.1 Pro, Nano Banana Pro, GPT-Image-1).
**How it works:** Named workflows per format: essay headers, Mermaid flowcharts, technical architecture diagrams, D3 dashboards, timelines, 2x2 matrices, comparisons, quote cards, comics, YouTube thumbnails, icons. Supports up to 14 reference images; `--remove-bg` for transparency. Everything stages to ~/Downloads/ for your preview before touching any project.
**Use it when:** "Make a diagram of X," thumbnails, blog art, "visualize this."
**Watch out:** Never write an image straight into a repo; preview it first (a hard rule born from real failures). The image gets visually confirmed with a Read before "done" is claimed.
**The 20%:** Name the format + the subject and let the workflow pick the model. Downloads is the airlock; nothing ships unseen.

### 42. Fabric

**What it is:** 240+ battle-tested prompt patterns from Daniel Miessler's Fabric project, runnable natively.
**How it works:** Patterns live as system.md files applied directly — extract_wisdom, summarize, create_threat_model, analyze_claims, improve_writing, review_code, judge_output... The CLI is only used for YouTube transcript extraction (`fabric -y URL`) and URL fallback. Patterns auto-sync from upstream.
**Use it when:** "Run extract_wisdom on this," "threat model this," "rate this content" — any moment a proven pattern beats improvising.
**Watch out:** Pattern names are exact: `extract_wisdom`, not `extractwisdom`.
**The 20%:** It's a prompt library you didn't have to write. Learn five patterns (extract_wisdom, summarize, analyze_claims, improve_writing, youtube_summary) and you've got most of the value.

### 43. ExtractWisdom

**What it is:** The smarter cousin of Fabric's extract_wisdom — it reads the content FIRST, then builds sections around what's actually in it.
**How it works:** Detects which wisdom domains are present: a security talk gets "Threat Model Insights," a business podcast gets "Contrarian Business Takes." Five depth levels (Instant → Comprehensive). Always includes a One-Sentence Takeaway and an "If You Only Have 2 Minutes" block. Spicy/contrarian takes are mandatory, never softened. For YouTube links, the transcript is pulled with `fabric -y` first.
**Use it when:** "What's interesting in this video," podcast/article distillation, deciding if a 2-hour interview is worth your time.
**The 20%:** Sections fit the content, not a template — paste a link, read the 2-minute block, decide if the rest deserves you.

### 44. Webdesign

**What it is:** The web/UI design pipeline, driven by Anthropic's Claude Design tool.
**How it works:** Claude Design is web-only (no API), so this skill drives it through Interceptor in your authenticated session. Output is a handoff bundle — PROMPT.md, design tokens, components, assets — which the frontend-design plugin picks up automatically for production code. Works on existing apps, not just new ones.
**Use it when:** "Design a landing page," "redesign this dashboard," "polish this UI," design audits.
**Watch out:** Treat the handoff bundle as one unit (it's a directory); don't manually invoke the frontend plugin — it auto-activates.
**The 20%:** Design in Claude Design, hand the bundle to code. Interceptor is the only bridge, and the bundle is the contract.

### 45. Remotion

**What it is:** Video as React code — compositions, sequences, and motion graphics rendered to MP4.
**How it works:** All animation derives from `useCurrentFrame()` — never CSS animations. PAI theme constants keep visuals consistent with your Art aesthetic. Render: `bunx remotion render <id> ~/Downloads/<name>.mp4`. Supports ElevenLabs captions and Lambda cloud rendering.
**Use it when:** Explainer videos, animated shorts, content-to-video, anything that moves.
**Watch out:** Rendering is CPU-heavy — always background it. Static images belong to Art.
**The 20%:** If you can write a React component, you can ship motion graphics. Frame-based thinking is the only new idea to learn.
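
Frame-based thinking means every visual property is a pure function of the frame number. The sketch below shows that idea in Python rather than React, so it is not Remotion code; `interpolate_clamped` is a hypothetical helper illustrating a clamped linear mapping from frame to value.

```python
from typing import Tuple


def interpolate_clamped(
    frame: int,
    frame_range: Tuple[int, int],
    value_range: Tuple[float, float],
) -> float:
    """Map a frame number linearly onto a value, holding the end values outside the range."""
    first, last = frame_range
    start, end = value_range
    if last <= first:
        raise ValueError("frame_range must be increasing")
    progress = (frame - first) / (last - first)
    progress = min(max(progress, 0.0), 1.0)  # clamp before and after the range
    return start + progress * (end - start)


if __name__ == "__main__":
    # A fade-in over the first 30 frames: opacity depends only on the frame.
    for frame in (0, 15, 30, 45):
        print(frame, interpolate_clamped(frame, (0, 30), (0.0, 1.0)))
```

Because the value depends only on the frame, rendering frame 15 always gives the same picture no matter which frames were rendered before it, which is what makes frame-by-frame video rendering deterministic.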

### 46. AudioEditor

**What it is:** AI cleanup for your recordings — fillers, false starts, stutters, and dead air removed without sounding robotic.
**How it works:** Pipeline: Whisper word-level transcription → Claude classifies every segment (KEEP / CUT_FILLER / CUT_FALSE_START / CUT_STUTTER / CUT_DEAD_AIR) → ffmpeg executes cuts with 40ms crossfades and room-tone fill. It distinguishes rhetorical pauses from accidents; breaths get quieted to 50%, not deleted. `--preview` shows proposed cuts first; optional Cleanvoice cloud polish for mouth sounds.
**Use it when:** Pitch recordings, audio for lesson videos, interview cleanup.
**Watch out:** Always preview — automated cuts can grab an intentional pause. Cloud polish uploads audio externally; confirm first for anything sensitive.
**The 20%:** `--preview` before commit is the whole discipline. Pairs naturally with PitchRehearsal: record, clean, score.
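
The middle of the pipeline, going from classified segments to a cut list, can be sketched as follows. The `Segment` type, the `keep_intervals` helper, and the timestamps are hypothetical; only the labels come from the description above, and the crossfade and room-tone steps are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Segment:
    start: float  # seconds
    end: float    # seconds
    label: str    # "KEEP" or one of the CUT_* labels


def keep_intervals(segments: Iterable[Segment]) -> List[Tuple[float, float]]:
    """Return the spans of audio to keep, merging KEEP segments that touch."""
    spans: List[Tuple[float, float]] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.label != "KEEP":
            continue  # anything labelled CUT_* is dropped
        if spans and abs(spans[-1][1] - seg.start) < 1e-9:
            spans[-1] = (spans[-1][0], seg.end)  # extend the previous span
        else:
            spans.append((seg.start, seg.end))
    return spans


if __name__ == "__main__":
    # Made-up timestamps for illustration.
    demo = [
        Segment(0.0, 1.0, "KEEP"),
        Segment(1.0, 1.4, "CUT_FILLER"),
        Segment(1.4, 3.0, "KEEP"),
        Segment(3.0, 3.5, "KEEP"),
    ]
    print(keep_intervals(demo))
```

A preview step amounts to showing the user the gaps between these spans before any audio is touched.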

### 47. WriteStory

**What it is:** Fiction engineering — seven simultaneous narrative layers, built on Will Storr's Science of Storytelling.
**How it works:** Plans Meaning, Character Change, Plot, Mystery, World, Relationships, and Prose together in a Story Bible before any chapter gets written. Character arcs follow Storr's model: sacred flaw → crisis → transformation. Forsyth's rhetorical figures style the prose; an anti-cliché system bans stock AI patterns. Workflows: Interview → BuildBible → Explore → WriteChapter → Revise.
**Use it when:** Stories, novels, narrative content with real craft requirements.
**Watch out:** The Bible is the source of truth for continuity — read it before writing anything new in a series.
**The 20%:** Storr's insight runs the show: stories are about flawed characters' beliefs breaking. Build the Bible first; chapters are just execution.

### 48. Aphorisms

**What it is:** A curated quote database with memory — themes, attribution, context, and a record of where each quote was already used.
**How it works:** Four workflows: FindAphorism (match quotes to content themes, ranked with rationale), AddAphorism (parse, theme-tag, dedupe), ResearchThinker (deep-dive a philosopher, add sourced quotes), SearchAphorisms. Twelve+ theme categories from Stoicism to Risk.
**Use it when:** "Find a quote about discipline," speech openers, content garnish.
**Watch out:** Same idea in different words still counts as a duplicate; unattributed quotes are useless.
**The 20%:** Usage tracking is the killer feature — the right quote, never repeated to the same audience.

---

# Tier 7 — Situational

### 49. ArXiv

**What it is:** Direct line to academic AI research — search, retrieve, and digest papers without drowning.
**How it works:** Queries the arXiv API by title/abstract/author/category (cs.AI, cs.LG, cs.CL, cs.CR...) with boolean operators. The differentiator: AlphaXiv enrichment — AI-generated overviews per paper for fast triage. Workflows: Latest (new papers by category), Search, Paper (deep-dive one ID).
**Use it when:** "Latest LLM papers," "find the paper on X," checking if a claim has academic backing.
**Watch out:** Returns XML not JSON; 3-second rate limit between calls; AlphaXiv summaries are AI-generated — verify before citing.
**The 20%:** AlphaXiv summary first, full paper only if it survives triage. Don't read papers; filter them.
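
Since the API answers in Atom XML, a little parsing is needed before any triage. The sketch below builds a category query against the public arXiv endpoint and parses a feed with the standard library; it is not the skill's code, the helper names are illustrative, and `SAMPLE_FEED` is a placeholder with a made-up ID so the example runs offline.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

API_URL = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the feed


def build_query(category: str, max_results: int = 5) -> str:
    """Build a URL asking for the newest submissions in one category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{API_URL}?{urlencode(params)}"


def parse_feed(xml_text: str) -> List[Dict[str, str]]:
    """Extract id and whitespace-normalized title from each Atom entry."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="")
        papers.append({
            "id": entry.findtext(f"{ATOM}id", default="").strip(),
            "title": " ".join(title.split()),
        })
    return papers


# Placeholder feed (not a real paper), shaped like an Atom response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>Placeholder title
      wrapped across lines</title>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(build_query("cs.AI"))
    print(parse_feed(SAMPLE_FEED))
    # When fetching for real, pause about 3 seconds between calls (time.sleep(3)).
```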

### 50. USMetrics

**What it is:** 68 live US economic and social indicators from five government APIs — a finance major's superpower.
**How it works:** Pulls FRED, EIA, Treasury, BLS, and Census across ten categories: growth, inflation, employment, housing, consumer finance, markets, trade, fiscal, demographics, health. UpdateData refreshes the dataset; GetCurrentState produces a multi-timeframe (10y/5y/2y/1y) trend report with cross-category patterns.
**Use it when:** "How's the economy," macro context for a thesis (feeds ThesisVault well), class arguments needing real numbers.
**Watch out:** Publication lag is real — GDP is quarterly and revised. Correlation across metrics is suggestive, never causal.
**The 20%:** One command turns "I feel like the economy is X" into dated government data across 68 indicators.
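
A multi-timeframe trend is just the same percent-change calculation run over several lookback windows. The sketch below is an assumption about the general shape of such a report, not the skill's actual output; `trend_report` is a hypothetical helper and the series values are synthetic.

```python
from typing import Dict, Iterable


def trend_report(
    series: Dict[int, float],
    latest_year: int,
    windows: Iterable[int] = (10, 5, 2, 1),
) -> Dict[str, float]:
    """Percent change from each lookback year to the latest year, rounded to 0.1."""
    latest = series[latest_year]
    report: Dict[str, float] = {}
    for years in windows:
        base = series.get(latest_year - years)
        if base:  # skip windows with no data point (or a zero base)
            report[f"{years}y"] = round((latest - base) / base * 100, 1)
    return report


if __name__ == "__main__":
    # Synthetic index values, not real government data.
    index = {2014: 100.0, 2019: 110.0, 2022: 120.0, 2023: 125.0, 2024: 130.0}
    print(trend_report(index, latest_year=2024))
```

Reading the windows side by side is what exposes a change in pace: in the synthetic series the 1-year gain is 4.0% against 30.0% over ten years.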

### 51. WorldThreatModel

**What it is:** A stress-test of any idea, strategy, or investment against 11 time horizons from 6 months to 50 years.
**How it works:** Each horizon is a maintained ~10-page world model (geopolitics, tech, economics, society, wildcards). TestIdea runs your input across all 11 and returns a probability-weighted scenario matrix. Three tiers: Fast (~2 min, one agent), Standard (~10 min, 11 parallel horizon agents + RedTeam + FirstPrinciples), Deep (up to an hour). Internally orchestrates four other thinking skills.
**Use it when:** Career bets, long-horizon investments, "does this business still make sense in 2035."
**Watch out:** These are scenarios with probability ranges, not predictions. Models decay — update after major world events.
**The 20%:** Anything you'll spend years on deserves ten minutes against eleven futures. The matrix shows which scenarios kill the idea.

### 52. PrivateInvestigator

**What it is:** Ethical people-finding — 15 parallel research agents across people-search aggregators, social platforms, public records, and reverse lookups.
**How it works:** Five agent types × 3 each = 15 concurrent agents: TruePeopleSearch/Spokeo-class aggregators, social x-ray searches, county records and court portals, phone/email/image reverse lookups, username enumeration. Results are confidence-scored and require 3+ matching identifiers before anything counts as found.
**Use it when:** Reconnecting with lost contacts, verifying someone is who they claim, "who owns this number," due-diligence basics.
**Watch out:** Hard ethical stops — the skill terminates if intent shifts toward harassment. Single-source results are unreliable by policy.
**The 20%:** It's a verification machine: three independent matching identifiers or it isn't a match. Use it before trusting strangers with money.

---

# Platform Built-ins (came with Claude Code, not in ~/.claude/skills/)

- **superpowers** — disciplined engineering workflows that auto-fire: brainstorming before building, test-driven development, systematic debugging, plan writing/execution, verification-before-completion, parallel dispatch, git worktrees. **The 20%:** they enforce the discipline you'd skip under pressure; let them fire.
- **code-review / pr-review-toolkit** — multi-agent code review: correctness, test coverage, silent failures, type design. **The 20%:** run before any merge that matters.
- **Docs plugins (context7, microsoft-docs, claude-api)** — current official docs on demand instead of stale training data. **The 20%:** library questions go to docs, not memory.
- **Harness utilities** — /verify (prove a change works), /run (launch the app), /simplify (clean changed code), /schedule (cron cloud agents), update-config (settings/hooks). **The 20%:** /verify after every change you care about.
- **Shortcuts** — `/cs` (ContextSearch), `/pu` (PAIUpgrade), `/e1`–`/e5` (force effort tier on any message). **The 20%:** `/e1` when you want fast, `/e3`+ when you want thorough.

---

## How to actually learn these (from your own curriculum)

Interleaved daily exposure beats deep-diving one skill a week — 3x retention (Rohrer & Taylor 2007). Touch a different tier each day, always on a REAL task: a prospect, the offer, a recording, a thesis. The only honest metric is artifacts you actually use. Start with Tier 1 — it pays for the rest.
