Architecture tests that fail CI, and the violations they cannot catch

Every codebase starts with an architecture diagram everyone agrees on. Six months later the diagram is a lie. Not because anyone decided to abandon it, but because conventions decay one reasonable-looking commit at a time. A vertical-slice codebase is especially exposed: each slice is supposed to be an island, but the moment one slice imports a type from a sibling because it was right there and saving a few minutes felt harmless, the islands grow a bridge. Do that a dozen times and the clean set of independent slices is quietly a tangle, and nobody can point to the commit where it happened.

The usual defenses leak for the same reason. A doc that says slices must not depend on each other is the cheapest defense and the first to rot, because a document cannot fail a pull request. Code review catches the cross-slice import right up until the reviewer is busy, the diff is large, or the violation is buried three files deep in an otherwise good change. Both are advisory, and the whole problem is that eventually nobody is paying attention.

Boundaries as a build-failing test

Slicekit takes the boundary rules and writes them as executable tests in Slicekit.Architecture.Tests, using NetArchTest to assert against the compiled dependency graph. This is the architectural fitness function idea from Building Evolutionary Architectures: an automated check that measures whether a change moved the system closer to or further from a desired architectural characteristic. Here the characteristic is dependency direction, and the check fails the build when the direction is wrong.
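Stripped of tooling, a fitness function for dependency direction is a function from the dependency graph to pass or fail. The Python sketch below illustrates that idea independently of any language. It is not Slicekit's code. The layer names, their ranking, and the edge list are hypothetical, and a real NetArchTest rule reads its edges from compiled assemblies rather than from a hand-written list.

```python
# Hypothetical ranking from the innermost layer (0) outward. A dependency is
# allowed only when it points inward: toward an equal or lower rank.
LAYER_RANK = {"Domain": 0, "Application": 1, "Infrastructure": 2, "Api": 3}


def direction_violations(edges):
    """Return every (source, target) edge that points outward, from an inner layer to an outer one.

    Each endpoint is a (layer, type_name) pair.
    """
    return [
        (src, dst)
        for src, dst in edges
        if LAYER_RANK[src[0]] < LAYER_RANK[dst[0]]
    ]


def fitness_check(edges):
    """Report each violation and return a process exit code: 0 passes the build, 1 fails it."""
    violations = direction_violations(edges)
    for (src_layer, src_type), (dst_layer, dst_type) in violations:
        print(f"FAIL: {src_type} ({src_layer}) depends on {dst_type} ({dst_layer})")
    return 1 if violations else 0


edges = [
    (("Api", "InvoicesEndpoint"), ("Application", "CreateInvoiceHandler")),
    (("Application", "CreateInvoiceHandler"), ("Domain", "Invoice")),
    (("Domain", "Invoice"), ("Infrastructure", "AppDbContext")),  # points outward
]
print(fitness_check(edges))
# FAIL: Invoice (Domain) depends on AppDbContext (Infrastructure)
# 1
```

A CI step would call `sys.exit(fitness_check(edges))`, so any outward-pointing edge fails the job. That exit code is the whole difference between a fitness function and a guideline.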

You do not write a new test per feature. The existing rules scan every type in Slicekit.Core and fail if your slice breaks one. The two that bite most often:

Feature_Slices_Must_Not_Depend_On_Each_Other walks each feature namespace and asserts it has no dependency on any other feature. Reference a type from a sibling slice and this fails, naming the offending type. The fix is to share through Slicekit.Core.Domain, not across feature folders.
Domain_Must_Not_Depend_On keeps the domain model free of EF Core, Identity, Wolverine and HTTP, with one case per forbidden dependency.
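Both rules reduce to set checks over a map from each type to the types it references. NetArchTest builds that map from the compiled assembly and exposes it through a fluent chain (`Types.InAssembly`, `That().ResideInNamespace`, `ShouldNot().HaveDependencyOn`, `GetResult`). The Python sketch below only mirrors the logic and is not Slicekit's test code. The `Slicekit.Core.Features` namespace root and every type name in the map are hypothetical.

```python
# Hypothetical namespace roots; only Slicekit.Core.Domain is named in the text.
FEATURES_ROOT = "Slicekit.Core.Features."
DOMAIN_ROOT = "Slicekit.Core.Domain."

# One entry per forbidden dependency, mirroring one test case each.
FORBIDDEN_IN_DOMAIN = [
    "Microsoft.EntityFrameworkCore",
    "Microsoft.AspNetCore.Identity",
    "Wolverine",
    "Microsoft.AspNetCore.Http",
]

# Hypothetical dependency map: type -> the types it references.
DEPS = {
    "Slicekit.Core.Features.Invoices.CreateInvoice": {"Slicekit.Core.Domain.Invoice"},
    "Slicekit.Core.Features.Orders.PlaceOrder": {
        "Slicekit.Core.Domain.Order",
        "Slicekit.Core.Features.Invoices.CreateInvoice",  # reaches into a sibling slice
    },
    "Slicekit.Core.Domain.Order": {"Microsoft.EntityFrameworkCore.DbSet"},  # domain leaks EF Core
}


def feature_of(type_name):
    """Return the slice a type lives in (e.g. 'Orders'), or None outside the feature namespaces."""
    if not type_name.startswith(FEATURES_ROOT):
        return None
    return type_name[len(FEATURES_ROOT):].split(".")[0]


def in_namespace(type_name, namespace):
    """True if the type is in the namespace or one of its children."""
    return type_name == namespace or type_name.startswith(namespace + ".")


def cross_slice_failures(deps):
    """Feature types that reference a type in a different feature slice."""
    failing = []
    for source, targets in deps.items():
        own = feature_of(source)
        if own is not None and any(feature_of(t) not in (None, own) for t in targets):
            failing.append(source)
    return sorted(failing)


def domain_failures(deps, forbidden):
    """Domain types that reference anything in the forbidden namespace."""
    return sorted(
        source
        for source, targets in deps.items()
        if source.startswith(DOMAIN_ROOT) and any(in_namespace(t, forbidden) for t in targets)
    )


print("cross-slice:", cross_slice_failures(DEPS))
for namespace in FORBIDDEN_IN_DOMAIN:
    print(namespace, "->", domain_failures(DEPS, namespace))
```

As with the real rules, a failure reports the offending type by name rather than just signalling that something somewhere is wrong.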

The shift is small but total. A boundary violation is no longer a comment you might get in review three days later, or a paragraph in a doc nobody opened. It is a red build that points at the exact type. The convention stops depending on goodwill, because the only way to merge is to respect the boundary, and the architecture is now guarded by the same machinery that guards correctness.

Where these tests sit, and why one layer needs a real database

Architecture tests are only useful if they run constantly, which means they have to be fast. Slicekit splits its tests across four projects, and the split is the classic test-pyramid tradeoff: many cheap tests at the wide base, a few expensive ones at the narrow top.

Slicekit.Unit.Tests: pure logic (aggregates, handlers with mocked ports, validators). Milliseconds; run on every save.
Slicekit.Architecture.Tests: NetArchTest rules that keep slices isolated and layers clean. Milliseconds; no Docker. A crossed layer or slice boundary turns the CI build red and names the offending type.
Slicekit.Feature.Tests: integration against real Postgres via Testcontainers, no HTTP. Seconds; a container per fixture.
Slicekit.Api.Tests: HTTP end to end through the host. Slowest; fewest.

A convention only holds if the build enforces it: the wide base of fast tests is where you live, and the Architecture band is the one that fails CI the moment a boundary is crossed.

The base is Slicekit.Unit.Tests: pure logic, aggregate invariants, handlers with mocked ports, validators. Milliseconds each, no infrastructure. Directly above sit the architecture tests, also Docker-free and millisecond-fast, which is the point: because they live in the fast loop you run on every change, the boundary feedback is immediate rather than something CI surfaces hours later.

dotnet test api/tests/Slicekit.Unit.Tests api/tests/Slicekit.Architecture.Tests --nologo

Higher up, the tests get slower and you write fewer of them. Slicekit.Feature.Tests runs integration tests against a real Postgres through Testcontainers, and Slicekit.Api.Tests exercises the HTTP path end to end through the host.

The reason to pay for a real container is honesty. When a slice touches the database, the cheap option is to mock AppDbContext and assert against a fiction, and the trouble with that fiction is that it always agrees with you. A mock does not enforce a unique constraint, does not run a migration, and never exercises the LINQ-to-SQL translation that EF Core performs in production. A test that only talks to a mock proves your code is consistent with your own assumptions, and those assumptions are precisely what is most likely to be wrong. So DatabaseFixture spins up one real Postgres container per xUnit collection, applies every migration, and seeds the permission catalog. You still mock the ports you do not own, like IAuditService, and run the real thing for what you do. That is the difference between a test that passes and a test that tells the truth, and it is why you want exactly enough of these, not hundreds.
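The claim that a mock always agrees with you is easy to demonstrate. This Python sketch uses an in-memory SQLite database from the standard library as a stand-in for Postgres. It is not Testcontainers, and the store classes and `register_twice` are hypothetical. The same buggy logic passes against a hand-rolled fake and fails against a table that declares a UNIQUE constraint.

```python
import sqlite3


class FakeUserStore:
    """Hand-rolled test double: it records whatever it is given and never objects."""

    def __init__(self):
        self.rows = []

    def add(self, email):
        self.rows.append(email)


class SqlUserStore:
    """The same port backed by a real database whose schema declares a UNIQUE constraint."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute("CREATE TABLE users (email TEXT NOT NULL UNIQUE)")

    def add(self, email):
        with self.conn:  # commits on success, rolls back if the insert raises
            self.conn.execute("INSERT INTO users (email) VALUES (?)", (email,))


def register_twice(store, email):
    """Hypothetical slice logic with a bug: it never checks whether the user already exists."""
    store.add(email)
    store.add(email)


fake = FakeUserStore()
register_twice(fake, "ada@example.com")  # the fake accepts the duplicate silently
print("fake rows:", fake.rows)

real = SqlUserStore(sqlite3.connect(":memory:"))
try:
    register_twice(real, "ada@example.com")
except sqlite3.IntegrityError as exc:
    print("real database rejected the duplicate:", exc)
```

SQLite is only a stand-in here. The argument in the text is for the production engine specifically, because provider-specific behavior is exactly what a substitute database can also get wrong.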

What these tests cannot catch

Here is the part the tooling will not tell you, and the part that matters most. A green architecture test is a floor, not proof of clean architecture. NetArchTest reasons about the static, compile-time dependency graph: which type references which other type after the compiler is done. That is a narrow question, and several real coupling problems sit entirely outside it.

It cannot see coupling introduced at runtime. If one slice reaches another through reflection, resolves a sibling’s implementation from the DI container through an interface declared in shared code, dispatches dynamically, or sends it a message on the bus, there is no static reference from one slice to the other for the rule to find. The dependency is real, your slices are entangled, and the build stays green. Messaging indirection in particular is easy to read as decoupling when it is just coupling you cannot see in the type graph.
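The same blind spot shows up in any static scanner, not just NetArchTest. As a minimal analogue in Python (the `features.*` module layout and both snippets are hypothetical), the standard `ast` module can find every import statement in a slice's source, yet it finds nothing when the sibling is loaded by a string at runtime.

```python
import ast


def static_imports(source):
    """Module names referenced by import statements: the only edges a static scan can see."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


def sibling_slice_imports(source, own_slice, root="features"):
    """Imports that point into a different slice under the (hypothetical) features package."""
    return sorted(
        name
        for name in static_imports(source)
        if name.startswith(root + ".") and name.split(".")[1] != own_slice
    )


# A direct import from the orders slice into the invoices slice: the scan catches it.
DIRECT = "from features.invoices.handlers import CreateInvoice\n"

# The same coupling, resolved by name at runtime: the scan sees only 'importlib'.
INDIRECT = (
    "import importlib\n"
    "def place_order():\n"
    "    invoices = importlib.import_module('features.' + 'invoices.handlers')\n"
    "    return invoices.CreateInvoice()\n"
)

print(sibling_slice_imports(DIRECT, own_slice="orders"))    # ['features.invoices.handlers']
print(sibling_slice_imports(INDIRECT, own_slice="orders"))  # []
```

Resolving a sibling from a DI container, or publishing a message whose contract lives in shared code, has the same shape. The dependency is expressed through something the type graph does not record.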

It also says nothing about whether the design inside a slice is any good. Feature_Slices_Must_Not_Depend_On_Each_Other passing means no slice imports another slice. It does not mean the handler is cohesive, the aggregate protects its invariants, the names make sense, or the abstraction earns its keep. A slice can be a 600-line god handler with five responsibilities and still satisfy every architecture rule, because none of those are dependency-direction questions.

So read a passing fitness function for exactly what it asserts: no obvious, statically visible boundary violation. That is genuinely valuable, because static cross-slice imports are the most common way these architectures rot, and catching them automatically removes a whole class of drift. But it is not a substitute for design review, for the unit tests that pin down behavior, or for the integration tests that prove the slice works against a real schema. The architecture tests guard one boundary well. The judgment about whether the thing inside the boundary is worth keeping stays yours.

The payoff still compounds. Every new slice inherits the static guard for free, and the integration tests stay honest because they never stopped talking to a real database. The codebase that would usually drift into a tangle stays a set of clean, independent slices, as long as you remember what the green checkmark does and does not promise. For the full walkthrough, see the feature testing guide.