An AI can navigate this codebase. Here is where it still cannot be trusted

An AI coding assistant is only as good as the codebase you point it at. Aim one at a sprawling, layered system with hidden conventions and implicit boundaries, and you get confident, plausible code that compiles and is quietly wrong. Aim it at a codebase with an obvious shape and fast feedback, and it lands changes you can review in minutes. The catch, and the part the hype skips, is that the feedback only verifies a narrow slice of what “correct” means. This post is about both halves: where an agent genuinely thrives in this codebase, and exactly where the automated guardrails stop and your judgment has to take over.

Why predictable structure helps a model

The single organizing idea is the vertical slice. Instead of smearing one feature across a stack of horizontal layers (controllers, then services, then repositories, then models), Slicekit keeps a feature in one folder you can read, change, or delete as a unit. A new engineer learns one slice and infers the rest. A coding agent gets the same gift: a concrete, working template to mirror instead of a blank page to guess at. The best prompt in this codebase is a pointer, not a paragraph.

Add a feature like Features/ApiKeys/CreateApiKey for X

That one line does more than a careful description would, because it hands the model an example with the wiring already correct. And because Slicekit is full-stack, the slice does not stop at the API boundary. The same feature has a mirrored shape on the frontend, so the layout an agent learned on one side is the layout it finds on the other.

api/ · Features/ApiKeys/CreateApiKey/

Command.cs: the request shape
Handler.cs: domain logic
Validator.cs: input rules
Result.cs: the response shape

frontend/ · features/api-keys/

api.ts: typed client call
hooks.ts: query + mutation
schemas.ts: zod, localized
components/: the UI

Learn one slice and you can navigate every slice; the layout is the same on either side.

When every feature lives in the same place with the same parts, the search space collapses. The model spends its tokens on your actual problem instead of reverse-engineering where things go. On top of the layout, AGENTS.md router files load the right per-side conventions before the agent writes a line, and the codebase’s types give it a contract to follow rather than guess at. None of this is AI-specific magic. It is the same legibility that helps the next human, pointed at a reader with a context window instead of a pulse.
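To make "a contract to follow" concrete, here is a minimal sketch of a hand-mirrored request and response pair with a typed client call. The field names (`label`, `expiresInDays`, `id`, `secret`), the `/api/api-keys` path, and the injected `post` transport are all hypothetical, chosen for illustration; the real slice's fields are not shown in this post, and a real client call would be asynchronous.

```typescript
// Hypothetical shapes, mirrored by hand from what Command.cs and Result.cs might declare.
interface CreateApiKeyCommand {
  label: string;
  expiresInDays: number;
}

interface CreateApiKeyResult {
  id: string;
  secret: string;
}

// The transport is injected so the sketch runs without a network.
type Post = (path: string, body: unknown) => unknown;

function createApiKey(post: Post, command: CreateApiKeyCommand): CreateApiKeyResult {
  // The cast is where hand-mirroring shows: nothing verifies the wire shape here.
  return post("/api/api-keys", command) as CreateApiKeyResult;
}

// A fake transport standing in for the real API.
const fakePost: Post = (_path, body) => {
  const command = body as CreateApiKeyCommand;
  return { id: "key_1", secret: `secret-for-${command.label}` };
};

const created = createApiKey(fakePost, { label: "ci-bot", expiresInDays: 30 });
console.log(created.id);

// Uncommenting the next line fails to compile, because 'name' is not a field of the command:
// createApiKey(fakePost, { name: "ci-bot", expiresInDays: 30 });
```

The compiler rejects the invented `name` field before anything runs, which is the kind of mistake types are good at. The cast inside `createApiKey` is the part they cannot vouch for.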

What the guardrails actually catch

Predictability gets a model moving in the right direction. Three mechanical guardrails then catch its mistakes fast, and it is worth being precise about which mistakes:

Types catch wrong and renamed fields. Within each side, TypeScript and C# mean a hallucinated property or a stale field name fails at compile time, not in production. Fast, precise, and free. (The client’s types mirror the API by hand rather than by codegen, so the wire itself is not auto-checked; that is its own honest caveat, covered in the typed client post.)
Architecture tests catch boundary violations. NetArchTest-backed checks fail the build the moment a slice reaches across a boundary it should not. An agent has no feel for your architecture, and with these it does not need one. The technique itself is worth its own read: architecture tests that guard boundaries.
The build catches what does not compile. The lowest bar, but a real one. A surprising amount of confident AI output does not survive contact with the compiler.
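The boundary check is the least familiar of the three, so here is a rough sketch of the idea in TypeScript. It is not NetArchTest and not Slicekit's actual test: the import graph, the `shared/` folder, the `billing` slice, and the rule that a slice may not import from another slice's folder are all assumptions made for the example.

```typescript
// Hypothetical import graph: file path -> modules it imports.
const importGraph: Record<string, string[]> = {
  "features/api-keys/hooks.ts": ["features/api-keys/api.ts", "shared/http.ts"],
  "features/api-keys/api.ts": ["shared/http.ts"],
  "features/billing/hooks.ts": ["features/api-keys/api.ts"], // reaches into another slice
};

// Returns the slice name for a path under features/, or null for shared code.
function sliceOf(path: string): string | null {
  const match = /^features\/([^/]+)\//.exec(path);
  return match?.[1] ?? null;
}

// Assumed rule: a file inside one slice must not import a file from a different slice.
function findViolations(graph: Record<string, string[]>): string[] {
  const found: string[] = [];
  for (const file of Object.keys(graph)) {
    const from = sliceOf(file);
    for (const dep of graph[file] ?? []) {
      const to = sliceOf(dep);
      if (from !== null && to !== null && from !== to) {
        found.push(`${file} -> ${dep}`);
      }
    }
  }
  return found;
}

const violations = findViolations(importGraph);
console.log(violations);
```

A check like this fails the build if `violations` is non-empty. It says where code may reach, and nothing about what the code does once it gets there.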

Notice the shape of that list. Every item is a structural check. They verify that the code is well-formed, wired correctly, and inside the lines you drew. That is genuinely valuable, because it makes an agent’s output cheap to check instead of expensive to untangle. But cheap to check is not the same as correct.

The honest limit

Here is the part the demos leave out. A green build tells you the code is structurally sound. It tells you nothing about whether the code is right.

No automated check in this repo verifies your domain logic. Architecture tests will happily wave through a handler that computes the wrong total, applies a discount twice, or transitions an aggregate into a state your business rules forbid. The types are satisfied; the math is not.
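A small sketch of the double-discount case shows how little the types have to say about it. The `OrderLine` shape and the rule "apply the percentage discount once, to the subtotal" are hypothetical, invented for this example.

```typescript
interface OrderLine {
  unitPriceCents: number;
  quantity: number;
}

// Assumed business rule: apply the percentage discount once, to the subtotal.
function totalCents(lines: OrderLine[], discountPercent: number): number {
  const subtotal = lines.reduce((sum, l) => sum + l.unitPriceCents * l.quantity, 0);
  return Math.round(subtotal * (1 - discountPercent / 100));
}

// Same signature, same types, clean build. Discounts each line, then the sum again.
function totalCentsBuggy(lines: OrderLine[], discountPercent: number): number {
  const factor = 1 - discountPercent / 100;
  const discounted = lines.reduce(
    (sum, l) => sum + l.unitPriceCents * l.quantity * factor,
    0,
  );
  return Math.round(discounted * factor);
}

const order: OrderLine[] = [{ unitPriceCents: 1000, quantity: 2 }];
const expectedTotal = totalCents(order, 10);
const buggyTotal = totalCentsBuggy(order, 10);
console.log({ expectedTotal, buggyTotal }); // 1800 versus 1620
```

Both functions satisfy every structural check. Only a reviewer, or a behavioral test written by someone who knows the rule, can tell that 1620 is the wrong answer.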

Nothing catches the missing edge case. The agent handled the happy path and three of the five failure modes, and the two it skipped are exactly the ones that page you at 2 a.m. The compiler has no opinion on the cases you never wrote.
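As a small illustration of a case the compiler never asks about, consider an average over a list that can be empty. The function names are hypothetical, and returning `null` for the empty case is only one possible decision.

```typescript
// Happy path only: type-checks, but an empty list divides 0 by 0.
function averageCents(amounts: number[]): number {
  const sum = amounts.reduce((total, a) => total + a, 0);
  return sum / amounts.length;
}

// The same calculation with the empty case made an explicit decision in the type.
function averageCentsOrNull(amounts: number[]): number | null {
  if (amounts.length === 0) return null;
  return averageCents(amounts);
}

const typicalAverage = averageCents([100, 300]); // 200
const emptyAverage = averageCents([]); // NaN, still typed as number
const guardedAverage = averageCentsOrNull([]); // null
console.log(typicalAverage, emptyAverage, guardedAverage);
```

`NaN` is a perfectly valid `number`, so the first version compiles and passes any test that only feeds it non-empty input. Someone has to think of the empty list; the toolchain will not.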

Nothing reasons about security for you. Whether an endpoint checks the right permission, whether a query leaks another tenant’s rows, whether user input reaches somewhere it should not: these are judgment calls. A type system cannot tell you that an authorization check is correct, only that it compiles.
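The tenant leak is easy to sketch. The row shape, tenant names, and function names below are hypothetical; the point is that the leaky and the scoped versions have identical signatures.

```typescript
interface ApiKeyRow {
  id: string;
  tenantId: string;
  label: string;
}

const rows: ApiKeyRow[] = [
  { id: "k1", tenantId: "acme", label: "ci" },
  { id: "k2", tenantId: "globex", label: "deploy" },
];

// Compiles cleanly, but ignores the caller's tenant and returns every row.
function listKeysLeaky(all: ApiKeyRow[], _tenantId: string): ApiKeyRow[] {
  return all;
}

// Same signature, scoped to the caller's tenant.
function listKeysScoped(all: ApiKeyRow[], tenantId: string): ApiKeyRow[] {
  return all.filter((row) => row.tenantId === tenantId);
}

const leaked = listKeysLeaky(rows, "acme");
const scoped = listKeysScoped(rows, "acme");
console.log(leaked.length, scoped.length); // 2 versus 1
```

Nothing in the type system distinguishes the two. The difference only shows up when a reviewer asks whose rows come back, or when a test asserts that every returned row belongs to the caller.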

Nothing flags a performance regression. The N+1 query the agent introduced passes every test in the suite. It just gets slow once real data shows up.
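One way to make an N+1 visible is to count round trips against a fake repository. This is a sketch under assumptions: the `OwnerRepo` class, its methods, and the data are invented for the example and stand in for whatever data access the real handler uses.

```typescript
interface ApiKey { id: string; ownerId: string; }
interface Owner { id: string; name: string; }

const owners: Owner[] = [{ id: "u1", name: "Ada" }, { id: "u2", name: "Lin" }];
const apiKeys: ApiKey[] = [
  { id: "k1", ownerId: "u1" },
  { id: "k2", ownerId: "u2" },
  { id: "k3", ownerId: "u1" },
];

// Fake repository that counts round trips instead of hitting a database.
class OwnerRepo {
  queries = 0;
  findById(id: string): Owner | undefined {
    this.queries += 1;
    return owners.filter((o) => o.id === id)[0];
  }
  findByIds(ids: string[]): Owner[] {
    this.queries += 1;
    return owners.filter((o) => ids.indexOf(o.id) !== -1);
  }
}

// N+1 shape: one lookup per key.
function ownerNamesPerKey(repo: OwnerRepo, ks: ApiKey[]): string[] {
  return ks.map((k) => repo.findById(k.ownerId)?.name ?? "unknown");
}

// Batched shape: one lookup for all keys.
function ownerNamesBatched(repo: OwnerRepo, ks: ApiKey[]): string[] {
  const byId: Record<string, string> = {};
  for (const o of repo.findByIds(ks.map((k) => k.ownerId))) byId[o.id] = o.name;
  return ks.map((k) => byId[k.ownerId] ?? "unknown");
}

const slowRepo = new OwnerRepo();
const fastRepo = new OwnerRepo();
const slowNames = ownerNamesPerKey(slowRepo, apiKeys);
const fastNames = ownerNamesBatched(fastRepo, apiKeys);
console.log(slowRepo.queries, fastRepo.queries); // 3 versus 1
```

Both functions return the same names, so a test that only checks output passes either way. The query count is the only signal, and it exists only if someone thought to measure it.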

And no check on earth tells you whether this is the right feature. An agent will build precisely what you asked for, including when what you asked for was a mistake. Direction is yours alone.

So the boundary is sharp. The guardrails catch a specific, narrow class of errors: wrong fields, crossed boundaries, broken builds. They do not catch domain-logic errors, missing edge cases, security flaws, performance regressions, or a feature that should not have been built. Those still require a human who understands the behavior reading the diff and asking whether it is correct, not just whether it is green. The structure makes that review fast. It does not make it optional.

In the loop, not on autopilot

This is why Slicekit is built with AI in the loop and never on autopilot. The workflow that holds up is narrow and repeatable: the agent lands a slice, the type system and architecture tests catch the structural mistakes within seconds, and a human reviews the behavior and the design. The agent does the typing; the engineer owns the correctness. Each does the part it is actually good at.

The failure mode is treating the green build as the finish line. An agent turned loose on a shapeless codebase with no boundaries and no tests is a liability dressed as velocity, because there is nothing to catch the structural mistakes and nothing to make the behavioral review fast. But an agent turned loose on a well-structured codebase, with a human who mistakes “it compiles” for “it works,” is the quieter version of the same liability. The guardrails make the output cheap to check. They never make it automatically correct, and pretending otherwise is how the plausible-but-wrong code ships.

Used honestly, the combination is genuinely faster than either half alone. The structure points the model in the right direction, the checks catch the mechanical errors before they reach you, and your review lands where it actually matters: on the logic, the edges, the security, and the question of whether you are building the right thing at all.

If that is how you want to work, read AI-assisted development for how to set your agent up to get the most out of this codebase, with the same caveats kept firmly in view.