📐 The Rules Behind The AI

How this workspace is governed

Rules structure in .cursor/rules. Why it exists. How to use it without reading 47 MDC files.

(Yes, rules are the reason AI doesn't break your code every 5 minutes. You're welcome. 😌)

Common + Tech + CI/CD

How it's organized

🗂️
Common · CI/CD · Tech-specific

Common rules stay always-on; CI/CD covers releases, cost, and scaling; tech rules switch on via project-config.

What it includes

Common: bugfix/refactor workflows and the Definition of Done, applied to everyone.
CI/CD: release management and checklists, pipeline templates, cost & scaling guidance.
Tech: per-stack best practices (React, .NET, Java, etc.), loaded only when the stack is enabled.
Config: project-config.mdc decides what loads.
Handler: activates only the enabled stacks (see the sketch after this list).
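To make the Config/Handler split concrete, here is a minimal TypeScript sketch of how a handler could resolve which rule files load. The `enabled:` line format in project-config.mdc and the folder names `common`, `ci-cd`, and `tech/<stack>` are assumptions for illustration; the workspace's actual handler may work differently.

```typescript
// Hypothetical sketch only: resolves always-on rules plus the tech rules for enabled stacks.
// The "enabled:" convention and folder names are assumptions, not the workspace's real handler.
import { existsSync, readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

const RULES_ROOT = ".cursor/rules";

// Assume project-config.mdc contains a line such as "enabled: react, dotnet".
function enabledStacks(configPath = join(RULES_ROOT, "project-config.mdc")): string[] {
  const text = readFileSync(configPath, "utf8");
  const match = text.match(/^enabled:\s*(.+)$/m);
  return match ? match[1].split(",").map((s) => s.trim().toLowerCase()) : [];
}

// List the .mdc files in a rules folder, or nothing if the folder does not exist.
function listMdc(dir: string): string[] {
  return existsSync(dir)
    ? readdirSync(dir).filter((f) => f.endsWith(".mdc")).map((f) => join(dir, f))
    : [];
}

// Common and CI/CD rules always load; tech rules load only for enabled stacks.
export function activeRuleFiles(): string[] {
  const always = ["common", "ci-cd"].flatMap((dir) => listMdc(join(RULES_ROOT, dir)));
  const tech = enabledStacks()
    .map((stack) => join(RULES_ROOT, "tech", stack))
    .flatMap(listMdc);
  return [...always, ...tech];
}
```

Run from the repo root, activeRuleFiles() would return every always-on rule plus only the tech rules your config enables.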

Consistency & speed

Why this setup

🚀
Consistency · Safety · Speed

Shared workflows cut regressions; optional tech rules keep noise low; pipelines and cost guides avoid wasted time.

What it includes

Consistency: the same review steps and expectations for every change and every collaborator.
Safety: bugfix/refactor flows enforce tests and linting; refactor guidance prevents regressions.
Speed: ready-made templates and scaling patterns shorten setup and ship changes faster.

Simple flow

Everyday use

📅
Bugfix · Refactor · Pipelines

Check active stacks, follow the workflow, adapt a pipeline, and apply the matching tech standards.

What it includes

Before coding: verify the enabled stacks in project-config.
Bugfix: reproduce → locate → minimal fix → test → validate (a reproduction-test sketch follows this list).
Refactor: plan impact → small increments → keep tests green → validate.
Pipelines: pick the CI/CD template for your platform, then tailor it.
Standards: apply the rules of the stacks you enabled.
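As a concrete, hypothetical illustration of the bugfix flow (assuming a Vitest setup), the sketch below captures the bug as a failing test first and then applies the smallest possible fix. formatPrice and its truncation bug are invented for this example; only the reproduce → fix → test ordering comes from the rule.

```typescript
// Hypothetical example of the bugfix flow: reproduce first, then the smallest possible fix.
// formatPrice and the truncation bug are invented for illustration.
import { describe, expect, it } from "vitest";

// Step 2 + 3: the located bug and its minimal fix, touching only the faulty expression.
export function formatPrice(value: number): string {
  return value.toFixed(2); // was: String(Math.trunc(value * 100) / 100), which truncated
}

// Step 1: reproduce the report as a test that fails before the fix and passes after it.
describe("formatPrice", () => {
  it("rounds to two decimals instead of truncating", () => {
    expect(formatPrice(19.999)).toBe("20.00");
  });
});

// Steps 4 + 5: run the full test suite and linting, then validate against the original report.
```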

Add, don't break

Extend safely

🧩
Extensible · Documented

Add new tech rules under `.cursor/rules/tech/<stack>/`, list them in project-config, and document overrides.

What it includes

Add tech folders under `.cursor/rules/tech/<stack>/` without touching the common rules.
Update project-config.mdc to activate the new stack (a consistency-check sketch follows this list).
Document overrides on the /rules page so the team stays aligned.
Prefer additive changes over destructive ones to preserve shared assumptions.
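The sketch below is a hypothetical consistency check for the additive approach: every stack enabled in project-config.mdc should have a matching folder, and new tech rules should not reuse a common rule's filename. The `enabled:` format is the same assumed convention as in the earlier handler sketch.

```typescript
// Hypothetical check: enabled stacks must have rule folders, and tech rules should not
// shadow common rule filenames. The "enabled:" format is an assumed convention.
import { existsSync, readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

const RULES_ROOT = ".cursor/rules";

const config = readFileSync(join(RULES_ROOT, "project-config.mdc"), "utf8");
const enabled = (config.match(/^enabled:\s*(.+)$/m)?.[1] ?? "")
  .split(",")
  .map((s) => s.trim().toLowerCase())
  .filter(Boolean);

const commonDir = join(RULES_ROOT, "common");
const commonFiles = new Set(existsSync(commonDir) ? readdirSync(commonDir) : []);

for (const stack of enabled) {
  const dir = join(RULES_ROOT, "tech", stack);
  if (!existsSync(dir)) {
    console.warn(`Enabled stack "${stack}" has no folder at ${dir}`);
    continue;
  }
  for (const file of readdirSync(dir)) {
    if (commonFiles.has(file)) {
      console.warn(`Tech rule ${stack}/${file} reuses a common rule's filename; prefer a new name`);
    }
  }
}
```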

Leaderboard updates

Benchmark refresh rule

🏆
CodeClash · SWE-bench · Update

Keep benchmark pages (homepage snapshots + dedicated pages) synced with external leaderboards: CodeClash for goal-oriented coding and SWE-bench for real-world GitHub issues.

What it includes

🎯 CodeClash Source: https://codeclash.ai/ (mirrored into data/benchmark.json).
Methodology: "Goals, not tasks" — real software development is goal-driven, not isolated issue-solving.
Two-phase approach: Edit phase (models improve codebase) + Compete phase (arena battles).
Scale: 8 models across 6 arenas; 1,680 tournaments × 15 rounds each = 25,200 rounds in total, generating 50k agent trajectories.
Arenas: Halite, Poker, CoreWar, RobotRumble, Robocode, BattleSnake (each testing different strategic and coding skills).
Insights: Models accumulate tech debt rapidly; humans still beat best LLMs in some arenas; progress over perfection mindset.
Data structure: Overall ELO + per-arena breakdown + methodology details + key insights.
🔧 SWE-bench Source: https://www.swebench.com/ (mirrored into data/swebench.json).
Benchmark: Evaluates models on 2,294 real-world software engineering problems from 12 popular Python repositories.
Variants: Full (2,294), Verified (500, human-filtered), Lite (300, cost-efficient), Bash Only (500, mini-SWE-agent), Multimodal (517, with visuals).
Metric: % Resolved — percentage of GitHub issues successfully fixed by the model.
Real tasks: Actual issues from Django, Flask, Matplotlib, Scikit-learn, Sympy, Requests, etc.
Context understanding: Tests models' ability to navigate complex codebases and make appropriate changes.
Data structure: Rank + model + % resolved + organization + date + release version.
⚙️ Update Process (data-shape and loader sketches follow this block):
To refresh CodeClash: update data/benchmark.json (or wire getBenchmarkData to a live JSON feed) and rebuild.
To refresh SWE-bench: update data/swebench.json (or wire getSWEBenchData to a live JSON feed) and rebuild.
Track lastUpdated and the source URL so viewers can tell how stale the data is.
Do not scrape automatically without permission; prefer a published JSON feed or manual update.
Both benchmarks appear on the homepage as condensed leaderboards with bar charts and logos.
Full CodeClash details live at /benchmark; SWE-bench links out to the external swebench.com.
Trigger: when asked "update codeclash", "update swebench", or "update benchmarks", refresh the appropriate data files and redeploy.
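The interfaces below are a hypothetical TypeScript sketch of the two mirrored data files, inferred only from the "data structure" bullets above; the real data/benchmark.json and data/swebench.json may use different keys.

```typescript
// Hypothetical shapes for the two mirrored data files, inferred from the bullets above.
// Field names are assumptions; check the actual JSON before relying on them.

// data/benchmark.json (CodeClash): overall ELO plus per-arena breakdown and context.
export interface CodeClashData {
  lastUpdated: string;            // ISO date of the last manual refresh
  source: string;                 // e.g. "https://codeclash.ai/"
  overall: { model: string; elo: number }[];
  arenas: {
    name: string;                 // "Halite", "Poker", "CoreWar", ...
    rankings: { model: string; elo: number }[];
  }[];
  methodology: string;
  insights: string[];
}

// data/swebench.json (SWE-bench): one row per leaderboard entry.
export interface SWEBenchEntry {
  rank: number;
  model: string;
  resolved: number;               // "% Resolved", e.g. 65.4
  organization: string;
  date: string;
  release: string;
}

export interface SWEBenchData {
  lastUpdated: string;
  source: string;                 // e.g. "https://www.swebench.com/"
  variant: "Full" | "Verified" | "Lite" | "Bash Only" | "Multimodal";
  entries: SWEBenchEntry[];
}
```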
Scope: leaderboard content on the homepage snapshots and the dedicated pages; keep formatting consistent across both.
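getBenchmarkData and getSWEBenchData are named above but not shown; the sketch below is one minimal, assumed way to wire them to the mirrored JSON files and surface staleness. The 30-day warning threshold is arbitrary.

```typescript
// Minimal, assumed wiring for the two getters named above. Reads the mirrored JSON
// files and warns when the data looks stale; shapes beyond lastUpdated/source are
// left loose on purpose because the real keys are not shown here.
import { readFileSync } from "node:fs";

interface MirroredFile {
  lastUpdated: string; // ISO date of the last refresh
  source: string;      // upstream leaderboard URL
  [key: string]: unknown;
}

const STALE_AFTER_DAYS = 30; // arbitrary threshold for the staleness warning

function loadMirrored(path: string): MirroredFile {
  const data = JSON.parse(readFileSync(path, "utf8")) as MirroredFile;
  const ageDays = (Date.now() - Date.parse(data.lastUpdated)) / 86_400_000;
  if (Number.isFinite(ageDays) && ageDays > STALE_AFTER_DAYS) {
    console.warn(`${path} is ${Math.round(ageDays)} days old; consider a manual refresh from ${data.source}`);
  }
  return data;
}

export const getBenchmarkData = () => loadMirrored("data/benchmark.json");
export const getSWEBenchData = () => loadMirrored("data/swebench.json");
```

Both getters stay synchronous and file-based, which matches the manual-update policy above: no automatic scraping, just a refreshed JSON and a rebuild.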