Keep benchmark pages (homepage snapshots + dedicated pages) synced with external leaderboards: CodeClash for goal-oriented coding and SWE-bench for real-world GitHub issues.
What it includes
🎯 CodeClash
Source: https://codeclash.ai/ (mirrored into data/benchmark.json).
Methodology: "Goals, not tasks" — real software development is goal-driven, not isolated issue-solving.
Two-phase approach: Edit phase (models improve codebase) + Compete phase (arena battles).
Scale: 8 models across 6 arenas; 1,680 tournaments of 15 rounds each (25,200 rounds total), generating over 50k agent trajectories.
Arenas: Halite, Poker, CoreWar, RobotRumble, Robocode, BattleSnake (each testing different strategic and coding skills).
Insights: models accumulate tech debt rapidly; humans still beat the best LLMs in some arenas; a "progress over perfection" mindset prevails.
Data structure: Overall ELO + per-arena breakdown + methodology details + key insights.
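The mirrored file's exact schema isn't shown in this doc; a minimal TypeScript sketch of what data/benchmark.json could hold, with every field name assumed from the description above:

```ts
// Illustrative shape for data/benchmark.json.
// All field names here are assumptions, not a confirmed schema.
interface CodeClashEntry {
  model: string;                      // e.g. "gpt-5" (hypothetical)
  overallElo: number;                 // aggregate Elo across all arenas
  arenaElo: Record<string, number>;   // per-arena breakdown, keyed by arena name
}

interface CodeClashData {
  lastUpdated: string;    // ISO date, so viewers can judge staleness
  source: string;         // "https://codeclash.ai/"
  methodology: string;    // short "goals, not tasks" summary
  insights: string[];     // key takeaways rendered on the page
  leaderboard: CodeClashEntry[];
}
```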
🔧 SWE-bench
Source: https://www.swebench.com/ (mirrored into data/swebench.json).
Benchmark: Evaluates models on 2,294 real-world software engineering problems from 12 popular Python repositories.
Variants: Full (2,294), Verified (500, human-filtered), Lite (300, cost-efficient), Bash Only (500, mini-SWE-agent), Multimodal (517, with visuals).
Metric: % Resolved — percentage of GitHub issues successfully fixed by the model.
Real tasks: Actual issues from Django, Flask, Matplotlib, Scikit-learn, SymPy, Requests, etc.
Context understanding: Tests models' ability to navigate complex codebases and make appropriate changes.
Data structure: Rank + model + % resolved + organization + date + release version.
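Under the same caveat (field names assumed from the columns above, not a confirmed schema), a matching sketch for data/swebench.json:

```ts
// Illustrative shape for data/swebench.json; all names are assumptions.
interface SWEBenchEntry {
  rank: number;
  model: string;
  pctResolved: number;    // "% Resolved", e.g. 65.4
  organization: string;
  date: string;           // submission date, ISO format
  release: string;        // model release version
}

interface SWEBenchData {
  lastUpdated: string;
  source: string;         // "https://www.swebench.com/"
  variant: "full" | "verified" | "lite" | "bash-only" | "multimodal";
  leaderboard: SWEBenchEntry[];
}
```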
⚙️ Update Process:
To refresh CodeClash: update data/benchmark.json (or wire getBenchmarkData to live JSON) and rebuild.
To refresh SWE-bench: update data/swebench.json (or wire getSWEBenchData to live JSON) and rebuild.
Track lastUpdated + the source URL so viewers can judge staleness.
Do not scrape automatically without permission; prefer a published JSON feed or manual update.
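As a sketch of the refresh path: only getBenchmarkData and data/benchmark.json are named in this doc, so the signature, the inline type, and the validation below are assumptions about how the loader might read the mirrored file.

```ts
import { readFile } from "node:fs/promises";

// Minimal inline stand-in for the CodeClashData shape sketched earlier.
type CodeClashData = {
  lastUpdated: string;
  source: string;
  leaderboard: unknown[];
};

// Reads the mirrored CodeClash snapshot. To go live, swap the file read
// for a fetch() of a published JSON feed; never scrape HTML.
export async function getBenchmarkData(): Promise<CodeClashData> {
  const data = JSON.parse(
    await readFile("data/benchmark.json", "utf8"),
  ) as CodeClashData;

  // Fail loudly if staleness metadata is missing, since the page is
  // expected to render lastUpdated + source next to the leaderboard.
  if (!data.lastUpdated || !data.source) {
    throw new Error("benchmark.json is missing lastUpdated/source");
  }
  return data;
}
```

getSWEBenchData would follow the same pattern against data/swebench.json.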
Both benchmarks appear on the homepage as condensed leaderboards with bar charts and logos.
Full details live at /benchmark (CodeClash); SWE-bench entries link out to swebench.com.
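The chart code itself isn't part of this doc; if a helper is needed, a purely illustrative normalizer for the condensed bars could look like this:

```ts
// Map scores to percentage widths relative to the current leader, so
// the top model's bar renders full-width. Purely illustrative helper.
function barWidths(scores: number[]): number[] {
  const max = Math.max(0, ...scores);
  return scores.map((s) => (max > 0 ? (s / max) * 100 : 0));
}

// barWidths([71.3, 65.4, 60.2]) -> [100, 91.7..., 84.4...]
```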
Trigger: when asked to "update codeclash", "update swebench", or "update benchmarks", refresh the appropriate data files and redeploy.
CodeClash: goal-oriented coding benchmark with ELO ratings across competitive arenas.
SWE-bench: real-world GitHub issue resolution benchmark across Python repositories.
Scope: leaderboard content on homepage snapshots + dedicated pages; keep formatting consistent.
Data files: data/benchmark.json (CodeClash) and data/swebench.json (SWE-bench).