.NET: Testing strategy and integration testing

Code Coverage, Useful Assertions, Flaky Test Prevention, and CI Test Execution

Overview

Code coverage, assertions, flaky test prevention, and CI test execution are core parts of a practical testing strategy. They help teams understand how much code is exercised, whether tests verify meaningful behavior, whether test results are trustworthy, and whether automated checks can protect the codebase during pull requests and deployments.

Code coverage measures which parts of the code run during tests. It can show lines, branches, methods, and sometimes conditions covered by tests. However, coverage only tells whether code was executed. It does not prove that the test checked the right behavior. A test can execute many lines and still assert almost nothing useful.

Useful assertions are what make tests valuable. Good assertions check observable behavior, outputs, state changes, HTTP responses, database effects, events, exceptions, logs when relevant, and important boundary cases. Weak assertions only check that code does not throw or that a response is 200 OK without verifying the actual result.

Flaky tests are tests that sometimes pass and sometimes fail without a relevant code change. They are dangerous because they reduce trust in the test suite. Once developers believe CI is unreliable, they may ignore real failures. Flaky tests are common in integration and end-to-end tests, but they can also happen in unit tests when tests depend on time, randomness, shared state, parallel execution, external services, or ordering.

CI test execution is the process of running tests automatically in a continuous integration pipeline. A good CI pipeline restores dependencies, builds the solution, runs tests, collects test results, collects code coverage, publishes reports, enforces quality gates, and provides useful failure diagnostics. A poor CI pipeline is slow, unreliable, hard to debug, or too easy to bypass.

This topic matters because automated tests are only useful when they are trustworthy. Interviewers often ask:

  • What does code coverage measure?
  • Is high code coverage always good?
  • What is the difference between line coverage and branch coverage?
  • What makes an assertion useful?
  • How do you avoid brittle assertions?
  • What is a flaky test?
  • How do you prevent flaky tests?
  • Should CI retry failed tests?
  • How do you run .NET tests in CI?
  • How do you publish TRX and coverage reports?
  • How do you decide which tests run on every pull request?
  • How do you make integration and E2E tests reliable in CI?
  • How do you handle slow test suites?

A strong answer should explain that coverage is a signal, not a goal by itself. Good tests assert meaningful behavior. CI should fail for real problems, produce useful diagnostics, and keep the feedback loop fast enough for developers to trust it.

Core Concepts

What Code Coverage Means

Code coverage measures how much production code was executed by tests.

Common coverage types:

Coverage type | Meaning
Line coverage | Percentage of executable lines run by tests
Branch coverage | Percentage of decision branches run by tests
Method coverage | Percentage of methods called by tests
Statement coverage | Percentage of statements executed
Condition coverage | Percentage of boolean sub-conditions that have evaluated to both true and false
Path coverage | Percentage of possible execution paths covered

Example:

Code
public decimal CalculateDiscount(decimal total)
{
    if (total >= 1000)
    {
        return total * 0.10m;
    }

    return total * 0.02m;
}

A test that checks only total = 1500 executes the true branch but not the false branch. Line coverage may look decent, but branch coverage shows that one path is missing.

Better tests:

Code
public sealed class DiscountCalculatorTests
{
    [Fact]
    public void CalculateDiscount_WhenTotalIsAtLeast1000_ReturnsTenPercent()
    {
        var calculator = new DiscountCalculator();

        var discount = calculator.CalculateDiscount(1500m);

        Assert.Equal(150m, discount);
    }

    [Fact]
    public void CalculateDiscount_WhenTotalIsLessThan1000_ReturnsTwoPercent()
    {
        var calculator = new DiscountCalculator();

        var discount = calculator.CalculateDiscount(500m);

        Assert.Equal(10m, discount);
    }
}

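These tests cover both branches and verify the expected behavior. A third test at exactly 1000m would also pin down the >= boundary, which full branch coverage alone does not check: changing >= to > would still leave both branches covered.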

What Code Coverage Does Not Prove

Code coverage does not prove correctness.

A test suite can reach high coverage with poor assertions.

Bad test:

Code
[Fact]
public void CalculateDiscount_DoesNotThrow()
{
    var calculator = new DiscountCalculator();

    calculator.CalculateDiscount(1500m);
}

This test executes code but does not verify the result.

Another weak test:

Code
[Fact]
public void CalculateDiscount_ReturnsSomeValue()
{
    var calculator = new DiscountCalculator();

    var result = calculator.CalculateDiscount(1500m);

    Assert.True(result >= 0);
}

This assertion is too broad. Many wrong implementations would pass.

Better:

Code
[Fact]
public void CalculateDiscount_WhenTotalIs1500_Returns150()
{
    var calculator = new DiscountCalculator();

    var result = calculator.CalculateDiscount(1500m);

    Assert.Equal(150m, result);
}

Coverage tells you what was executed. Assertions tell you whether the behavior was verified.

Line Coverage vs Branch Coverage

Line coverage measures whether lines executed.

Branch coverage measures whether decision paths executed.

Example:

Code
public string GetRiskLevel(int score)
{
    if (score >= 80)
    {
        return "High";
    }

    return "Normal";
}

One test:

Code
[Fact]
public void GetRiskLevel_WhenScoreIs90_ReturnsHigh()
{
    var service = new RiskService();

    var result = service.GetRiskLevel(90);

    Assert.Equal("High", result);
}

This test may cover most lines, but it covers only one branch. It does not check the Normal path.

Add another test:

Code
[Fact]
public void GetRiskLevel_WhenScoreIs50_ReturnsNormal()
{
    var service = new RiskService();

    var result = service.GetRiskLevel(50);

    Assert.Equal("Normal", result);
}

Branch coverage is often more useful than line coverage for business logic because it reveals untested decision paths.

Coverage Thresholds

A coverage threshold is a minimum coverage percentage required by the build.

Example goals:

Code
Line coverage >= 80%
Branch coverage >= 70%
No decrease in coverage for changed code
Critical modules >= 90%

Coverage thresholds can be useful, but they can also be harmful if used blindly.

Benefits:

  • Prevents coverage from silently dropping.
  • Encourages testing new code.
  • Gives a measurable quality signal.
  • Helps identify untested areas.
  • Makes test discipline visible in CI.

Risks:

  • Developers may write shallow tests just to satisfy a percentage.
  • High coverage can create false confidence.
  • Some code is hard or low-value to test directly.
  • Generated code can distort metrics.
  • Integration code may be covered differently from domain logic.
  • Teams may optimize for numbers instead of risk reduction.

A good strategy is to use coverage thresholds as a guardrail, not as the only measure of test quality.

Meaningful Coverage Targets

Not all code deserves the same coverage target.

High coverage is valuable for:

  • Domain rules.
  • Financial calculations.
  • Authorization rules.
  • Validation logic.
  • Security-sensitive code.
  • Complex branching logic.
  • Data transformations.
  • Error handling.
  • Public API contract behavior.
  • Regression-prone areas.

Lower direct coverage may be acceptable for:

  • Simple DTOs.
  • Auto-generated code.
  • Thin framework glue.
  • Configuration-only code.
  • Boilerplate.
  • Migrations.
  • UI layout details.
  • Code better verified through integration tests.

A mature testing strategy focuses coverage expectations by risk.

Example:

Code
Payment calculation: high branch coverage expected.
DTO property class: no direct unit test required.
Controller routing: integration test preferred.
External gateway adapter: fake/mocked unit tests plus contract/integration tests.

Collecting Code Coverage in .NET

A common .NET approach is to use dotnet test with Coverlet's cross-platform coverage collector.

Command:

Code
dotnet test --collect:"XPlat Code Coverage"

This produces a coverage.cobertura.xml file (Cobertura XML format) inside a GUID-named subfolder of the TestResults directory, which is why report tools usually locate it with a ** wildcard.

You can also produce TRX test results:

Code
dotnet test \
  --configuration Release \
  --logger "trx" \
  --results-directory ./TestResults \
  --collect:"XPlat Code Coverage"

A test project usually references packages like:

Code
<ItemGroup>
  <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.14.0" />
  <PackageReference Include="coverlet.collector" Version="6.0.4" />
  <PackageReference Include="xunit" Version="2.9.3" />
  <PackageReference Include="xunit.runner.visualstudio" Version="3.1.1" />
</ItemGroup>

Package versions change over time, so keep them aligned with the SDK and test framework version used by the project.

Generating Human-Readable Coverage Reports

Raw coverage XML is useful for tools, but developers usually need readable reports.

A common tool is ReportGenerator.

Install:

Code
dotnet tool install --global dotnet-reportgenerator-globaltool

Generate report:

Code
reportgenerator \
  -reports:"./TestResults/**/coverage.cobertura.xml" \
  -targetdir:"./CoverageReport" \
  -reporttypes:"Html;Cobertura"

This can produce:

  • HTML report for local review.
  • Cobertura XML for CI systems.
  • Summary reports.
  • Badges or history if configured.

A useful coverage workflow is:

  1. Run tests.
  2. Collect coverage.
  3. Generate a report.
  4. Publish the report in CI.
  5. Enforce reasonable thresholds.
  6. Review uncovered high-risk areas.

Excluding Code from Coverage

Some code may be excluded from coverage when direct testing provides little value.

Example:

Code
using System.Diagnostics.CodeAnalysis;

[ExcludeFromCodeCoverage]
public sealed class GeneratedDto
{
    public string Name { get; set; } = string.Empty;
}

Common exclusions:

  • Generated code.
  • Designer files.
  • Simple DTOs.
  • Migrations.
  • Program startup boilerplate, depending on strategy.
  • Code covered indirectly through integration tests but noisy in unit coverage.
  • Third-party generated clients.

Use exclusions carefully. Do not exclude difficult code just to improve coverage numbers.

Useful Assertions

An assertion should verify behavior that matters.

Useful assertions check:

  • Return values.
  • State changes.
  • Exceptions.
  • Error messages when part of the contract.
  • HTTP status codes.
  • Response bodies.
  • Response headers.
  • Database effects.
  • Published events.
  • Calls to important dependencies.
  • Logs when logs are part of the operational contract.
  • Time-sensitive behavior through a fake clock.
  • Authorization outcomes.
  • Validation errors.
  • Idempotency behavior.
  • Boundary cases.

Example service:

Code
public sealed class OrderService
{
    public Order Submit(Order order)
    {
        if (order.Lines.Count == 0)
        {
            throw new InvalidOperationException("An order must have at least one line.");
        }

        order.Status = OrderStatus.Submitted;
        order.SubmittedAtUtc = DateTime.UtcNow;

        return order;
    }
}

Weak test:

Code
[Fact]
public void Submit_WithValidOrder_DoesNotThrow()
{
    var service = new OrderService();
    var order = OrderFactory.CreateValid();

    service.Submit(order);
}

Better test:

Code
[Fact]
public void Submit_WithValidOrder_MarksOrderAsSubmitted()
{
    var service = new OrderService();
    var order = OrderFactory.CreateValid();

    var result = service.Submit(order);

    Assert.Equal(OrderStatus.Submitted, result.Status);
    Assert.NotNull(result.SubmittedAtUtc);
}

Good tests should fail when important behavior breaks.
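The better test above still only checks that a timestamp exists, because OrderService reads DateTime.UtcNow directly. A minimal sketch of a version that lets a test assert the exact value, assuming .NET 8 or later for System.TimeProvider. The simplified Order type and the FixedTimeProvider class are hypothetical stand-ins for this example; in a real suite, FakeTimeProvider from the Microsoft.Extensions.TimeProvider.Testing package serves the same purpose.

```csharp
using System;
using System.Collections.Generic;

public enum OrderStatus { Draft, Submitted }

// Simplified, hypothetical order model for this sketch.
public sealed class Order
{
    public List<string> Lines { get; } = new();
    public OrderStatus Status { get; set; } = OrderStatus.Draft;
    public DateTimeOffset? SubmittedAtUtc { get; set; }
}

// Hypothetical test double: always returns the same instant.
public sealed class FixedTimeProvider : TimeProvider
{
    private readonly DateTimeOffset _utcNow;

    public FixedTimeProvider(DateTimeOffset utcNow) => _utcNow = utcNow;

    public override DateTimeOffset GetUtcNow() => _utcNow;
}

public sealed class OrderService
{
    private readonly TimeProvider _timeProvider;

    public OrderService(TimeProvider timeProvider) => _timeProvider = timeProvider;

    public Order Submit(Order order)
    {
        if (order.Lines.Count == 0)
        {
            throw new InvalidOperationException("An order must have at least one line.");
        }

        order.Status = OrderStatus.Submitted;

        // The clock is injected, so a test can predict this value exactly.
        order.SubmittedAtUtc = _timeProvider.GetUtcNow();

        return order;
    }
}
```

Production code would pass TimeProvider.System, while the test passes a fixed instant and replaces Assert.NotNull with an exact Assert.Equal on SubmittedAtUtc.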

Arrange-Act-Assert

The Arrange-Act-Assert pattern gives tests a clear structure.

Code
[Fact]
public void ApplyDiscount_WhenCustomerIsPremium_AppliesPremiumDiscount()
{
    // Arrange
    var calculator = new DiscountCalculator();
    var customer = new Customer
    {
        IsPremium = true
    };

    // Act
    var discount = calculator.ApplyDiscount(customer, 100m);

    // Assert
    Assert.Equal(15m, discount);
}

Benefits:

  • Easy to read.
  • Separates setup from behavior from verification.
  • Helps avoid multiple actions in one test.
  • Makes failures easier to diagnose.
  • Supports consistent test style.

For very small tests, comments may not be necessary, but the structure should still be visible.

Assert One Behavior, Not Always One Assertion

A common guideline says one test should verify one behavior. This does not always mean one assertion.

Example:

Code
[Fact]
public async Task CreateOrder_WithValidRequest_ReturnsCreatedOrder()
{
    using var client = _factory.CreateClient();

    var response = await client.PostAsJsonAsync("/api/orders", new
    {
        CustomerId = 1,
        ProductId = 10,
        Quantity = 2
    });

    Assert.Equal(HttpStatusCode.Created, response.StatusCode);
    Assert.Equal("application/json", response.Content.Headers.ContentType?.MediaType);

    var body = await response.Content.ReadFromJsonAsync<OrderDto>();

    Assert.NotNull(body);
    Assert.Equal(1, body.CustomerId);
    Assert.Equal(10, body.ProductId);
    Assert.Equal(2, body.Quantity);
}

This test has several assertions, but they all verify one behavior: creating an order returns the expected HTTP response.

Avoid unrelated assertions in the same test.

Bad:

Code
[Fact]
public async Task OrderApi_Works()
{
    // Tests create, update, delete, permissions, validation, and email sending.
}

This is hard to debug and maintain.

Assertion Specificity

Assertions should be specific enough to catch real bugs.

Weak:

Code
Assert.NotNull(result);

Better:

Code
Assert.Equal(OrderStatus.Submitted, result.Status);
Assert.Equal(3, result.LineCount);
Assert.Equal(120.50m, result.Total);

Weak:

Code
Assert.True(response.IsSuccessStatusCode);

Better:

Code
Assert.Equal(HttpStatusCode.Created, response.StatusCode);
Assert.Equal("/api/orders/123", response.Headers.Location?.OriginalString);

Weak:

Code
Assert.Contains("error", body);

Better:

Code
var problem = await response.Content.ReadFromJsonAsync<ValidationProblemDetails>();

Assert.NotNull(problem);
Assert.True(problem.Errors.ContainsKey("Email"));
Assert.Contains("Email is required.", problem.Errors["Email"]);

Specific assertions make failures more useful.

Avoiding Over-Specified Assertions

Assertions can also be too specific.

Over-specified tests check implementation details instead of behavior.

Example:

Code
mockRepository.Verify(r => r.GetByIdAsync(1, It.IsAny<CancellationToken>()), Times.Once);
mockRepository.Verify(r => r.SaveChangesAsync(It.IsAny<CancellationToken>()), Times.Once);
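mockLogger.Verify(l => l.Log(LogLevel.Information, It.IsAny<EventId>(), It.IsAny<It.IsAnyType>(),
    It.IsAny<Exception?>(), It.IsAny<Func<It.IsAnyType, Exception?, string>>()), Times.Once);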

This may be useful for some tests, but if every test verifies internal calls, refactoring becomes painful.

Prefer behavior assertions:

Code
Assert.Equal(OrderStatus.Submitted, order.Status);

Use interaction verification when the interaction is the behavior.

Good examples for interaction verification:

  • Email was sent.
  • Message was published.
  • Payment gateway was called.
  • Cache invalidation happened.
  • Audit log was written.
  • Repository save was required by the use case.

Avoid verifying every internal method call just because mocking makes it possible.
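When the interaction is the behavior, a small hand-written fake can make the assertion read like the requirement. A minimal sketch; IEmailSender, FakeEmailSender, and RegistrationService are hypothetical names invented for this example, and a mocking library could verify the same call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical port that the use case depends on.
public interface IEmailSender
{
    void Send(string to, string subject);
}

// Test double that records what was sent instead of sending it.
public sealed class FakeEmailSender : IEmailSender
{
    public List<(string To, string Subject)> Sent { get; } = new();

    public void Send(string to, string subject) => Sent.Add((to, subject));
}

public sealed class RegistrationService
{
    private readonly IEmailSender _emailSender;

    public RegistrationService(IEmailSender emailSender) => _emailSender = emailSender;

    public void Register(string email)
    {
        if (string.IsNullOrWhiteSpace(email))
        {
            throw new ArgumentException("Email is required.", nameof(email));
        }

        // Sending the welcome email is the observable behavior of this use case.
        _emailSender.Send(email, "Welcome!");
    }
}
```

A test then asserts on the recorded message (one email, to the right recipient) rather than on how many internal calls happened along the way.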

Testing Exceptions

Good exception tests verify the type and sometimes the message or properties if they are part of the contract.

Example:

Code
[Fact]
public void Submit_WhenOrderHasNoLines_ThrowsInvalidOperationException()
{
    var service = new OrderService();
    var order = new Order();

    var exception = Assert.Throws<InvalidOperationException>(
        () => service.Submit(order));

    Assert.Equal("An order must have at least one line.", exception.Message);
}

For async methods:

Code
[Fact]
public async Task SubmitAsync_WhenOrderDoesNotExist_ThrowsNotFoundException()
{
    var service = CreateService();

    var exception = await Assert.ThrowsAsync<NotFoundException>(
        () => service.SubmitAsync(999, CancellationToken.None));

    Assert.Equal("Order was not found.", exception.Message);
}

Avoid catching exceptions manually unless needed.

Bad:

Code
try
{
    service.Submit(order);
}
catch
{
    Assert.True(true);
}

This passes for any exception type, and it also passes when no exception is thrown at all.

Testing Collections

For collections, assert both count and content when relevant.

Example:

Code
Assert.Collection(result,
    first =>
    {
        Assert.Equal("Alice", first.Name);
        Assert.Equal("Admin", first.Role);
    },
    second =>
    {
        Assert.Equal("Bob", second.Name);
        Assert.Equal("User", second.Role);
    });

If order does not matter:

Code
Assert.Contains(result, user => user.Email == "alice@example.com");
Assert.Contains(result, user => user.Email == "bob@example.com");
Assert.Equal(2, result.Count);

Avoid relying on ordering unless ordering is part of the contract.

Bad:

Code
Assert.Equal("Alice", result[0].Name);

Good if ordering is intentional:

Code
Assert.Equal(
    ["Alice", "Bob", "Charlie"],
    result.Select(user => user.Name).ToArray());

Testing HTTP APIs with Useful Assertions

For API tests, useful assertions often include:

  • Status code.
  • Content type.
  • Response DTO.
  • Validation errors.
  • Headers.
  • Cookies.
  • Database state.
  • Side effects.
  • Authorization behavior.

Example:

Code
[Fact]
public async Task CreateProduct_WhenRequestIsInvalid_ReturnsValidationProblemDetails()
{
    using var client = _factory.CreateClient();

    var response = await client.PostAsJsonAsync("/api/products", new
    {
        Name = "",
        Price = -1
    });

    Assert.Equal(HttpStatusCode.BadRequest, response.StatusCode);

    var problem = await response.Content
        .ReadFromJsonAsync<ValidationProblemDetails>();

    Assert.NotNull(problem);
    Assert.Contains("Name", problem.Errors.Keys);
    Assert.Contains("Price", problem.Errors.Keys);
}

This is more useful than:

Code
Assert.False(response.IsSuccessStatusCode);

Snapshot Testing

Snapshot testing compares output against a stored approved result.

It can be useful for:

  • Large JSON responses.
  • Generated documents.
  • UI component output.
  • API contract snapshots.
  • Complex serialization output.

Benefits:

  • Easy to verify large outputs.
  • Helps detect unexpected changes.
  • Useful for contract-like responses.

Risks:

  • Snapshots can become too large.
  • Developers may approve changes without reviewing them.
  • Snapshots can include unstable data.
  • Tests can be brittle if output changes frequently.
  • Snapshots are poor for behavior that needs targeted assertions.

Best practice:

  • Normalize dynamic values.
  • Keep snapshots focused.
  • Review snapshot diffs carefully.
  • Combine snapshots with targeted assertions.
  • Avoid snapshotting huge unrelated objects.
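Normalizing dynamic values before comparison can be as simple as replacing GUIDs and timestamps with stable placeholders. A minimal sketch, assuming the snapshot is a JSON string and that standard-format GUIDs and ISO 8601 UTC timestamps are the only unstable values. SnapshotScrubber is a hypothetical helper; dedicated snapshot libraries typically offer built-in scrubbing.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: replaces values that change on every run with stable placeholders.
public static class SnapshotScrubber
{
    private static readonly Regex GuidPattern = new(
        @"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b",
        RegexOptions.Compiled);

    // Matches UTC timestamps such as 2026-05-17T10:00:00Z or 2026-05-17T10:00:00.123Z.
    private static readonly Regex UtcTimestampPattern = new(
        @"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z",
        RegexOptions.Compiled);

    public static string Scrub(string json)
    {
        string withoutGuids = GuidPattern.Replace(json, "{guid}");
        return UtcTimestampPattern.Replace(withoutGuids, "{utc-timestamp}");
    }
}
```

The scrubbed string is what gets compared with, and stored as, the approved snapshot, so reviewers see only meaningful differences.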

Flaky Tests

A flaky test is a test that sometimes passes and sometimes fails without a relevant code change.

Common causes:

Cause | Example
Time dependency | Test expects the current date/time
Randomness | Test uses random data without controlling the seed
Shared state | Tests modify the same database rows
Test order dependency | Test B depends on Test A
Parallel execution | Tests interfere when run together
Async race condition | Test asserts before async work finishes
Threading bug | Non-thread-safe shared object
External service dependency | Real API is slow or unavailable
Network instability | E2E test depends on an unstable network
UI timing | Element not ready yet
Hard waits | Task.Delay or fixed sleeps
Resource limits | CI has less CPU/memory than the local machine
Time zones | Local and CI use different time zones
Culture settings | Parsing/formatting differs by culture
File system dependency | Tests use the same file path
Port conflicts | Tests bind the same port
Caching | Shared cache state leaks between tests

Flaky tests are dangerous because they reduce trust in CI.

Preventing Flaky Unit Tests

Unit tests should be deterministic.

Avoid:

Code
[Fact]
public void Token_IsNotExpired()
{
    var token = new Token
    {
        ExpiresAtUtc = DateTimeOffset.UtcNow.AddMinutes(5)
    };

    Assert.True(token.ExpiresAtUtc > DateTimeOffset.UtcNow);
}

Better: inject time through TimeProvider (built into .NET 8 and later).

Code
public sealed class TokenService
{
    private readonly TimeProvider _timeProvider;

    public TokenService(TimeProvider timeProvider)
    {
        _timeProvider = timeProvider;
    }

    public bool IsExpired(Token token)
    {
        return token.ExpiresAtUtc <= _timeProvider.GetUtcNow();
    }
}

Test, using FakeTimeProvider from the Microsoft.Extensions.TimeProvider.Testing package:

Code
[Fact]
public void IsExpired_WhenTokenExpiresBeforeNow_ReturnsTrue()
{
    var timeProvider = new FakeTimeProvider(
        new DateTimeOffset(2026, 5, 17, 10, 0, 0, TimeSpan.Zero));

    var service = new TokenService(timeProvider);

    var token = new Token
    {
        ExpiresAtUtc = new DateTimeOffset(2026, 5, 17, 9, 59, 0, TimeSpan.Zero)
    };

    var result = service.IsExpired(token);

    Assert.True(result);
}

Other unit test stability practices:

  • Avoid real time.
  • Avoid uncontrolled randomness.
  • Avoid real network.
  • Avoid shared static mutable state.
  • Avoid file paths shared by tests.
  • Avoid relying on test order.
  • Use deterministic data.
  • Use explicit cultures/time zones where relevant.
  • Keep tests small and isolated.
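Controlled randomness and explicit cultures can be sketched together. This assumes tests want varied data but must be reproducible: the seed is chosen once, exposed so it can be written to test output, and passed back in to replay a failure. SeededTestData is a hypothetical helper, not a library type.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: reproducible random test data.
public sealed class SeededTestData
{
    private readonly Random _random;

    public SeededTestData(int? seed = null)
    {
        // Reuse a pinned seed when replaying a failure; otherwise pick a new one.
        // Seeded sequences are only guaranteed to repeat on the same .NET version.
        Seed = seed ?? Random.Shared.Next();
        _random = new Random(Seed);
    }

    // Write this value to test output so a failing run can be replayed.
    public int Seed { get; }

    public int Quantity() => _random.Next(1, 100);

    public decimal Price() => Math.Round((decimal)_random.NextDouble() * 1000m, 2);

    // Format with the invariant culture so "12.50" never becomes "12,50" on a CI agent.
    public static string FormatPrice(decimal price) =>
        price.ToString("0.00", CultureInfo.InvariantCulture);
}
```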

Preventing Flaky Integration Tests

Integration tests commonly fail due to shared state or environment differences.

Best practices:

  • Use a clean database or reset database state.
  • Seed deterministic test data.
  • Avoid depending on test order.
  • Disable parallelization for tests sharing mutable resources.
  • Use unique names/IDs per test.
  • Replace external services with fakes or test containers.
  • Use TimeProvider or a fake clock.
  • Avoid real queues unless the test is specifically about queue integration.
  • Wait for eventual consistency with bounded polling, not fixed sleeps.
  • Capture logs and test output on failure.
  • Clean up files, containers, and database state.
  • Avoid global mutable configuration.
  • Use a realistic database provider rather than an in-memory substitute when the test depends on database behavior.
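Unique names per test keep parallel or repeated runs from colliding on the same rows. A minimal sketch; TestIds is a hypothetical helper, and a prefix plus a truncated GUID is one reasonable format among many.

```csharp
using System;

// Hypothetical helper: identifiers that should not collide across tests or runs.
public static class TestIds
{
    // 12 hex characters from a new GUID is plenty of uniqueness for test data.
    private static string Suffix() => Guid.NewGuid().ToString("N")[..12];

    public static string Email(string prefix = "user") =>
        $"{prefix}-{Suffix()}@example.test";

    public static string Sku(string prefix = "SKU") =>
        $"{prefix}-{Suffix().ToUpperInvariant()}";
}
```

Each test creates its own customer or product with these values and queries by them, so it never depends on rows that another test inserted.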

Example bounded polling:

Code
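using System.Diagnostics;

public static async Task EventuallyAsync(
    Func<Task<bool>> condition,
    TimeSpan timeout,
    TimeSpan interval)
{
    // Stopwatch is monotonic, so system clock changes cannot distort the timeout.
    var stopwatch = Stopwatch.StartNew();

    while (true)
    {
        if (await condition())
        {
            return;
        }

        // Check the deadline only after evaluating the condition, so the state
        // reached during the final delay is still inspected once more.
        if (stopwatch.Elapsed >= timeout)
        {
            throw new TimeoutException($"Condition was not met within {timeout}.");
        }

        await Task.Delay(interval);
    }
}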

Use this for eventual asynchronous side effects, not for ordinary synchronous logic.

Preventing Flaky E2E Tests

E2E tests are often flaky because real UI and browser behavior is asynchronous.

Good practices:

  • Prefer stable locators.
  • Use role/text/test-id locators instead of brittle CSS when possible.
  • Use web-first assertions.
  • Avoid fixed sleeps.
  • Wait for meaningful UI state.
  • Avoid relying on animation timing.
  • Avoid relying on test order.
  • Isolate test users and test data.
  • Reset backend state.
  • Keep E2E tests short and focused.
  • Run fewer critical E2E tests rather than many brittle workflows.
  • Capture trace, screenshot, video, and console/network logs on failure.
  • Configure CI workers based on available CPU.
  • Use browser/container versions consistently.
  • Avoid testing third-party systems directly in normal E2E runs.

Bad Playwright-style example:

Code
await page.click('#submit');
await page.waitForTimeout(3000);
expect(await page.isVisible('.success')).toBeTruthy();

Better:

Code
await page.getByRole('button', { name: 'Submit' }).click();

await expect(page.getByText('Order submitted successfully')).toBeVisible();

A web-first assertion waits for the expected UI condition instead of sleeping for a fixed time.

Retries and Flaky Tests

Retries can reduce noise, especially for E2E tests, but they can also hide real problems.

Benefits of retries:

  • Reduces temporary CI noise.
  • Helps with rare infrastructure issues.
  • Gives time to collect trace/video on retry.
  • Can keep deployment pipelines moving.

Risks:

  • Masks real bugs.
  • Normalizes instability.
  • Makes test results harder to trust.
  • Increases test duration.
  • Delays root-cause analysis.
  • Can let flaky tests remain for months.

A good retry policy:

  • Avoid retries for unit tests.
  • Use limited retries for E2E tests if needed.
  • Track flaky tests separately.
  • Do not treat "passed after retry" as fully healthy.
  • Fail or alert on repeated flaky tests.
  • Quarantine only with ownership and expiry.
  • Fix or delete unreliable tests.
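The rule "do not treat passed after retry as fully healthy" can be made concrete by classifying each test from all of its attempts rather than only the last one. A minimal sketch; the TestOutcome names and classification rules are assumptions, not the behavior of any particular test runner.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum TestOutcome
{
    Passed,        // passed on the first attempt
    PassedOnRetry, // failed at least once, then passed: report as flaky
    Failed         // never passed
}

public static class RetryClassifier
{
    // attempts: results in execution order, true = passed.
    public static TestOutcome Classify(IReadOnlyList<bool> attempts)
    {
        if (attempts.Count == 0)
        {
            throw new ArgumentException("At least one attempt is required.", nameof(attempts));
        }

        if (attempts[0])
        {
            return TestOutcome.Passed;
        }

        return attempts.Any(passed => passed)
            ? TestOutcome.PassedOnRetry
            : TestOutcome.Failed;
    }
}
```

CI can let the build continue on PassedOnRetry while still sending those tests to a flaky-test report or alert, so instability stays visible.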

Example Playwright configuration:

Code
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 2 : undefined,
  use: {
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});

This captures useful diagnostics when a retry happens.

Quarantining Flaky Tests

Quarantining means temporarily separating known flaky tests from the required CI gate.

It should be a controlled process, not a hiding place.

Good quarantine policy:

  • Create a ticket for every quarantined test.
  • Assign an owner.
  • Record the reason.
  • Set an expiry date.
  • Run quarantined tests separately.
  • Track failure rate.
  • Fix, rewrite, or delete the test.
  • Do not allow indefinite quarantine.
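Ownership and expiry are easier to enforce when the quarantine list is data the pipeline can check. A minimal sketch, assuming quarantined tests are listed in code (or loaded from a file) with an owner, ticket, reason, and expiry date. QuarantineEntry and QuarantinePolicy are hypothetical types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record QuarantineEntry(
    string TestName,
    string Owner,
    string Ticket,
    string Reason,
    DateOnly ExpiresOn);

public static class QuarantinePolicy
{
    // Entries past their expiry date; CI can fail when this list is not empty.
    public static IReadOnlyList<QuarantineEntry> FindExpired(
        IEnumerable<QuarantineEntry> entries,
        DateOnly today) =>
        entries.Where(entry => entry.ExpiresOn < today).ToList();

    // An entry without an owner, ticket, and reason should not be accepted.
    public static bool IsComplete(QuarantineEntry entry) =>
        !string.IsNullOrWhiteSpace(entry.Owner)
        && !string.IsNullOrWhiteSpace(entry.Ticket)
        && !string.IsNullOrWhiteSpace(entry.Reason);
}
```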

Bad quarantine policy:

Code
Move flaky tests to ignored category forever.

This reduces confidence and creates test debt.

Useful Failure Diagnostics

CI failures should provide enough information to debug quickly.

Useful artifacts:

  • TRX test result files.
  • JUnit XML files.
  • Coverage reports.
  • Console logs.
  • Application logs.
  • Screenshots.
  • Playwright traces.
  • Videos.
  • Browser console logs.
  • Network logs.
  • Database logs.
  • Container logs.
  • Test output files.
  • Failed request/response payloads.
  • Environment details.
  • Random seed values.
  • Correlation IDs.

A failure that says only "test failed" is not enough.

A useful failure should answer:

  • Which test failed?
  • What was expected?
  • What was actual?
  • What input was used?
  • What environment ran the test?
  • What logs are available?
  • What screenshot or trace exists?
  • Was this a first-run failure or retry success?
  • Did the failure happen before or after deployment?
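Several of these questions can be answered by the assertion message itself. A minimal sketch of a hypothetical DiagnosticAssert.Equal helper that reports expected, actual, the input, and extra context such as a random seed or correlation ID. xUnit's Assert.Equal already reports expected and actual, so a helper like this only pays off when that extra context matters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class DiagnosticAssertException : Exception
{
    public DiagnosticAssertException(string message) : base(message) { }
}

// Hypothetical helper: equality assertion whose failure message carries debugging context.
public static class DiagnosticAssert
{
    public static void Equal<T>(
        T expected,
        T actual,
        object? input,
        IReadOnlyDictionary<string, string>? context = null)
    {
        if (EqualityComparer<T>.Default.Equals(expected, actual))
        {
            return;
        }

        var message = new StringBuilder()
            .AppendLine("Values differ.")
            .AppendLine($"  Expected: {expected}")
            .AppendLine($"  Actual:   {actual}")
            .AppendLine($"  Input:    {input}");

        // Extra context such as seed, environment, or correlation ID.
        foreach (var (key, value) in context ?? new Dictionary<string, string>())
        {
            message.AppendLine($"  {key}: {value}");
        }

        throw new DiagnosticAssertException(message.ToString());
    }
}
```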

CI Test Execution

A CI pipeline should run tests automatically on pull requests and important branches.

A typical .NET CI flow:

Code
checkout
setup .NET SDK
restore
build
run unit tests
run integration tests
collect test results
collect coverage
publish reports
enforce quality gates
upload failure artifacts

Example GitHub Actions workflow:

Code
name: build-and-test

on:
  pull_request:
  push:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '10.0.x'

      - name: Restore
        run: dotnet restore

      - name: Build
        run: dotnet build --configuration Release --no-restore

      - name: Test
        run: |
          dotnet test \
            --configuration Release \
            --no-build \
            --logger "trx" \
            --results-directory ./TestResults \
            --collect:"XPlat Code Coverage"

      - name: Upload test results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: test-results
          path: ./TestResults

This keeps test results available even when tests fail.

Azure Pipelines Test Execution

Example Azure Pipelines YAML:

Code
trigger:
  branches:
    include:
      - main

pool:
  vmImage: ubuntu-latest

variables:
  buildConfiguration: Release

steps:
  - task: UseDotNet@2
    inputs:
      packageType: sdk
      version: 10.0.x

  - script: dotnet restore
    displayName: Restore

  - script: dotnet build --configuration $(buildConfiguration) --no-restore
    displayName: Build

  - script: |
      dotnet test \
        --configuration $(buildConfiguration) \
        --no-build \
        --logger "trx" \
        --results-directory "$(Agent.TempDirectory)/TestResults" \
        --collect:"XPlat Code Coverage"
    displayName: Test
    continueOnError: false

  - task: PublishTestResults@2
    condition: always()
    inputs:
      testResultsFormat: VSTest
      testResultsFiles: '$(Agent.TempDirectory)/TestResults/**/*.trx'
      failTaskOnFailedTests: true

  - task: PublishCodeCoverageResults@2
    condition: always()
    inputs:
      summaryFileLocation: '$(Agent.TempDirectory)/TestResults/**/coverage.cobertura.xml'

The exact coverage publishing task and inputs may vary by pipeline version and report format, but the principle is the same: publish results even when tests fail so developers can inspect the failure.

Splitting Test Suites in CI

Not every test needs to run at the same frequency.

Example categories:

Test type | Run frequency
Fast unit tests | Every pull request
Integration tests | Every pull request or important branches
Database container tests | Pull request and main branch, depending on speed
E2E smoke tests | Pull request or pre-merge for critical flows
Full E2E suite | Nightly or before release
Load tests | Scheduled or release candidate
Security tests | Scheduled, PR for critical checks, release pipeline

A practical strategy:

  • Pull request: fast unit tests + important integration tests.
  • Main branch: all PR tests + broader integration/E2E tests.
  • Nightly: full E2E, mutation tests, long-running compatibility tests.
  • Release: smoke tests against deployed environment.

This balances fast feedback with broad confidence.

Test Filtering and Traits

Test frameworks support categories or traits to control which tests run.

xUnit example:

Code
[Trait("Category", "Integration")]
public sealed class OrdersApiTests
{
    [Fact]
    public async Task CreateOrder_ReturnsCreated()
    {
        // ...
    }
}

Run only non-integration tests:

Code
dotnet test --filter "Category!=Integration"

Run only integration tests:

Code
dotnet test --filter "Category=Integration"

This is useful for separating fast unit tests from slower integration tests.

Parallel Test Execution

Parallel execution can make CI faster, but it can also create flakiness.

Parallelization is safe when tests are independent.

Parallelization is risky when tests share:

  • Database state.
  • Files.
  • Static mutable variables.
  • Ports.
  • External services.
  • Test users.
  • Queues.
  • Caches.
  • Browser contexts.
  • Containers with shared state.

xUnit supports test collections to control shared context and parallelization.

Example collection:

Code
[CollectionDefinition("Database collection", DisableParallelization = true)]
public sealed class DatabaseCollection
{
}

Usage:

Code
[Collection("Database collection")]
public sealed class OrdersDatabaseTests
{
}

Use this when tests share a database fixture and cannot safely run in parallel.

Better long-term solution:

  • Isolate test data.
  • Use unique databases or schemas.
  • Reset state per test.
  • Avoid shared mutable state.
  • Then enable parallelization safely.
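The "unique databases or schemas" step can start with a per-test connection string. A minimal sketch using DbConnectionStringBuilder from System.Data.Common, assuming the provider accepts a Database keyword (as SQL Server and PostgreSQL connection strings commonly do) and that the test fixture creates and drops the database itself. TestDatabaseNames is a hypothetical helper.

```csharp
using System;
using System.Data.Common;

// Hypothetical helper: gives each test (or test class) its own database name.
public static class TestDatabaseNames
{
    public static string ForTest(string baseConnectionString, string testName)
    {
        var builder = new DbConnectionStringBuilder
        {
            ConnectionString = baseConnectionString
        };

        // Keep only identifier-safe characters and cap the length.
        string safeName = new string(Array.FindAll(testName.ToCharArray(), char.IsLetterOrDigit));
        if (safeName.Length > 40)
        {
            safeName = safeName[..40];
        }

        // The GUID suffix prevents collisions between parallel workers and repeated runs.
        builder["Database"] = $"test_{safeName}_{Guid.NewGuid():N}";
        return builder.ConnectionString;
    }
}
```

Once every test class gets its own database, the shared collection with DisableParallelization is no longer needed for isolation.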

CI Resource Constraints

CI is often slower and more constrained than a developer machine.

Common issues:

  • Fewer CPU cores.
  • Less memory.
  • Slower disk.
  • Noisy neighbors on shared runners.
  • Cold dependency cache.
  • Docker image pulls.
  • Browser tests competing for CPU.
  • Database containers starting slowly.
  • Network variability.
  • Different time zone or culture.
  • Different environment variables.

Do not assume CI is just "local but slower."

Stability practices:

  • Set explicit timeouts.
  • Limit parallel workers.
  • Cache dependencies.
  • Use deterministic environment variables.
  • Pin SDK versions.
  • Pin container image versions.
  • Use CI-specific test configuration.
  • Capture artifacts.
  • Avoid fixed sleeps.
  • Separate slow tests.
  • Monitor test duration trends.
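Two of the practices above, explicit timeouts and avoiding fixed sleeps, combine naturally in a polling helper. Instead of sleeping for a guessed duration, the test checks a condition repeatedly until it holds or a deadline passes. This is a minimal sketch; Eventually.WaitUntilAsync and its parameters are hypothetical names, not a test framework API.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: polls a condition until it holds or an explicit timeout expires.
public static class Eventually
{
    public static async Task<bool> WaitUntilAsync(
        Func<bool> condition,
        TimeSpan timeout,
        TimeSpan pollInterval,
        CancellationToken cancellationToken = default)
    {
        var stopwatch = Stopwatch.StartNew();

        while (true)
        {
            // Check first, so an already-true condition returns immediately.
            if (condition())
            {
                return true;
            }

            // Give up once the explicit timeout is spent instead of hanging the CI job.
            if (stopwatch.Elapsed >= timeout)
            {
                return false;
            }

            await Task.Delay(pollInterval, cancellationToken);
        }
    }
}
```

Compared with Thread.Sleep(5000), the test finishes as soon as the condition holds on a fast machine, and on a slow CI runner it fails with a clear timeout rather than a confusing assertion about half-finished work.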

Build Once, Test Many

A CI pipeline should avoid rebuilding unnecessarily.

Example:

Code
dotnet restore
dotnet build --configuration Release --no-restore
dotnet test --configuration Release --no-build

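This avoids duplicate work and ensures tests run against exactly the binaries that were built. Note that --no-build only works when dotnet test uses the same --configuration as the build step; otherwise it looks for output that was never produced.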

For large solutions:

Code
dotnet test MySolution.sln \
  --configuration Release \
  --no-build \
  --filter "Category!=E2E"

For test projects separately:

Code
dotnet test tests/UnitTests/UnitTests.csproj --configuration Release --no-build
dotnet test tests/IntegrationTests/IntegrationTests.csproj --configuration Release --no-build

Keep CI scripts explicit and predictable.

Quality Gates

A quality gate is a rule that must pass before merging or deploying.

Common gates:

  • Build succeeds.
  • Unit tests pass.
  • Integration tests pass.
  • No critical static analysis issues.
  • Coverage does not drop below threshold.
  • Changed code coverage meets threshold.
  • No high-severity vulnerabilities.
  • E2E smoke tests pass.
  • No flaky tests in required suite.
  • Test results are published.
  • Required artifacts are uploaded.

Example coverage gate rules (conceptual, not tool syntax):

Code
Line coverage must be >= 80%.
Branch coverage must be >= 70%.
Coverage must not decrease by more than 1%.
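The rules above can be expressed as a small check that a CI step runs after coverage is collected. This is a sketch under assumptions: CoverageGate and CoverageResult are hypothetical names, the baseline is taken to be the target branch's last coverage run, and "decrease by more than 1%" is read as one percentage point of line coverage.

```csharp
using System.Collections.Generic;

// Coverage percentages (0-100) for one run.
public sealed record CoverageResult(double LinePercent, double BranchPercent);

// Hypothetical gate mirroring the example rules.
public static class CoverageGate
{
    private const double MinLinePercent = 80.0;
    private const double MinBranchPercent = 70.0;

    // Assumption: "1%" means one percentage point, not 1% of the baseline value.
    private const double MaxLineDropPoints = 1.0;

    public static IReadOnlyList<string> Evaluate(CoverageResult current, CoverageResult baseline)
    {
        var failures = new List<string>();

        if (current.LinePercent < MinLinePercent)
        {
            failures.Add($"Line coverage {current.LinePercent:F1}% is below {MinLinePercent}%.");
        }

        if (current.BranchPercent < MinBranchPercent)
        {
            failures.Add($"Branch coverage {current.BranchPercent:F1}% is below {MinBranchPercent}%.");
        }

        // Compare against the baseline, e.g. the main branch.
        var drop = baseline.LinePercent - current.LinePercent;
        if (drop > MaxLineDropPoints)
        {
            failures.Add($"Line coverage dropped by {drop:F1} points.");
        }

        return failures;
    }
}
```

Returning a list of reasons rather than a single bool keeps the failed CI output actionable: the log says which rule broke and by how much.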

Use gates carefully. Gates should encourage good behavior, not create meaningless checklists.

Handling Failing Tests in CI

When tests fail in CI:

  1. Do not immediately rerun without looking.
  2. Read the failure message.
  3. Check expected vs actual.
  4. Check logs and artifacts.
  5. Check whether the failure is deterministic.
  6. Run the test locally if needed.
  7. Reproduce under CI-like settings if possible.
  8. Identify whether it is a product bug, test bug, or environment issue.
  9. Fix the root cause.
  10. Add regression coverage if the product had a bug.

Avoid treating flaky tests as harmless.

A flaky test can still reveal real regressions, for example a race condition or timing bug in the product code rather than in the test.
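The step "check whether the failure is deterministic" can be approached by running the same test body many times in one process and counting outcomes. This is a minimal sketch; FlakinessProbe and FailureKind are hypothetical names, and an in-process loop will not reproduce failures caused by ordering with other tests or by CI resource pressure.

```csharp
using System;

public enum FailureKind { AlwaysPasses, AlwaysFails, Intermittent }

// Hypothetical diagnostic: runs a test body repeatedly and classifies the outcome.
public static class FlakinessProbe
{
    public static (int Passed, int Failed, FailureKind Kind) Run(Action testBody, int iterations)
    {
        if (iterations <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(iterations));
        }

        var passed = 0;
        var failed = 0;

        for (var i = 0; i < iterations; i++)
        {
            try
            {
                testBody();
                passed++;
            }
            catch (Exception)
            {
                // Any exception counts as a failure, as test runners treat assertion errors.
                failed++;
            }
        }

        var kind = failed == 0 ? FailureKind.AlwaysPasses
                 : passed == 0 ? FailureKind.AlwaysFails
                 : FailureKind.Intermittent;

        return (passed, failed, kind);
    }
}
```

An AlwaysFails result suggests a real, reproducible bug; an Intermittent result points at shared state, timing, or a race, and is worth fixing rather than retrying.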

Useful Test Naming

Good test names explain the scenario and expected result.

Common pattern:

Code
MethodName_WhenCondition_ExpectedResult

Example:

Code
[Fact]
public void CalculateTotal_WhenOrderHasMultipleLines_ReturnsSumOfLineTotals()
{
}

Another readable style:

Code
[Fact]
public void Should_return_bad_request_when_email_is_missing()
{
}

Good names help CI failures become understandable without opening the test file.

Bad:

Code
[Fact]
public void Test1()
{
}

Test Data Builders

Test data builders reduce noisy arrange code.

Example:

Code
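public sealed class OrderBuilder
{
    // Centralized default; tests override only the data they care about.
    private readonly Order _order = new()
    {
        CustomerId = 1
    };

    // Adds a line and returns the builder so calls can be chained.
    public OrderBuilder WithLine(decimal price, int quantity)
    {
        _order.Lines.Add(new OrderLine
        {
            UnitPrice = price,
            Quantity = quantity
        });

        return this;
    }

    // Returns the same instance on every call, so use one builder per test.
    public Order Build()
    {
        return _order;
    }
}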

Usage:

Code
var order = new OrderBuilder()
    .WithLine(10m, 2)
    .WithLine(5m, 1)
    .Build();

Benefits:

  • Tests focus on relevant data.
  • Defaults are centralized.
  • Reduces duplication.
  • Makes test intent clearer.

Be careful not to hide too much. Test readers should still understand the important setup.
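To make the builder concrete, here is a self-contained sketch that adds the Order and OrderLine types the snippet relies on, a hypothetical OrderPricing.CalculateTotal method, and a test that uses the builder in Arrange-Act-Assert form. The property shapes are assumptions; in a real suite the test method would carry [Fact] and use the framework's Assert.Equal instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shapes for the types the builder example relies on.
public sealed class OrderLine
{
    public decimal UnitPrice { get; init; }
    public int Quantity { get; init; }
}

public sealed class Order
{
    public int CustomerId { get; init; }

    // Initialized here so builders and tests can add lines directly.
    public List<OrderLine> Lines { get; } = new();
}

// Hypothetical production code under test.
public static class OrderPricing
{
    public static decimal CalculateTotal(Order order) =>
        order.Lines.Sum(line => line.UnitPrice * line.Quantity);
}

public sealed class OrderBuilder
{
    // Default that most tests do not care about.
    private readonly Order _order = new() { CustomerId = 1 };

    public OrderBuilder WithLine(decimal price, int quantity)
    {
        _order.Lines.Add(new OrderLine { UnitPrice = price, Quantity = quantity });
        return this;
    }

    public Order Build() => _order;
}

public static class OrderPricingTests
{
    // Written without a test framework so the sketch runs standalone.
    public static void CalculateTotal_WhenOrderHasMultipleLines_ReturnsSumOfLineTotals()
    {
        // Arrange: only the line data matters to this test.
        var order = new OrderBuilder()
            .WithLine(10m, 2)
            .WithLine(5m, 1)
            .Build();

        // Act
        var total = OrderPricing.CalculateTotal(order);

        // Assert: 10 * 2 + 5 * 1 = 25
        if (total != 25m)
        {
            throw new Exception($"Expected 25, got {total}.");
        }
    }
}
```

The test never mentions CustomerId because the builder supplies it; a reader sees only the two lines that determine the expected total.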

Mutation Testing

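Mutation testing tools (Stryker.NET is a common choice for .NET) make small, deliberate changes to production code, called mutants, and rerun the tests against each one. If a test fails, the mutant is "killed"; if every test still passes, the mutant "survived."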

Example mutation:

Code
if (total >= 1000)

Changed to:

Code
if (total > 1000)

If tests still pass, the mutant survived: no test checks the boundary value of exactly 1000, so the suite would not catch this off-by-one mistake.

Mutation testing is useful because it measures test effectiveness better than line coverage. It asks: "Would the tests catch real mistakes?"
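The >= to > example can be simulated by hand to show what killing a mutant means. This is a sketch, not how a mutation tool works internally: the free-shipping rule, Shipping, and ShippingTests are hypothetical, and the mutant is written out manually instead of being generated.

```csharp
using System;

// Hypothetical rule: orders totalling 1000 or more ship for free.
public static class Shipping
{
    // Original production logic.
    public static readonly Func<decimal, bool> Original = total => total >= 1000m;

    // The mutant a tool might generate: >= replaced with >.
    public static readonly Func<decimal, bool> Mutant = total => total > 1000m;
}

public static class ShippingTests
{
    // Weak test: 1500 is far from the boundary, so it passes for both versions.
    // Against this test alone, the mutant survives.
    public static bool WeakTestPasses(Func<decimal, bool> isFreeShipping) =>
        isFreeShipping(1500m);

    // Boundary test: exactly 1000 must qualify and just below must not.
    // It fails against the mutant, so the mutant is killed.
    public static bool BoundaryTestPasses(Func<decimal, bool> isFreeShipping) =>
        isFreeShipping(1000m) && !isFreeShipping(999.99m);
}
```

Both versions have the same line coverage under the weak test, which is why mutation testing is a stronger signal: only the boundary test tells them apart. A mutation tool does this automatically across many kinds of changes and reports which mutants survived.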

Trade-offs:

  • Slower than normal tests.
  • More complex to configure.
  • Not usually run on every pull request.
  • Better for critical modules or scheduled checks.

Mutation testing is often used as an advanced quality signal, not a replacement for coverage.

Common Mistakes

Common mistakes include:

  • Treating high coverage as proof of correctness.
  • Writing tests with no meaningful assertions.
  • Asserting only NotNull or IsSuccessStatusCode.
  • Testing implementation details instead of behavior.
  • Making tests depend on current time.
  • Using random data without controlling it.
  • Sharing database state between tests.
  • Depending on test execution order.
  • Ignoring flaky tests.
  • Using retries to hide test problems.
  • Running too many slow tests on every pull request.
  • Not publishing test results in CI.
  • Not collecting failure artifacts.
  • Letting integration tests call real external services.
  • Using fixed sleeps in async or UI tests.
  • Overusing mocks and verifying every internal call.
  • Not testing negative cases.
  • Not testing boundary cases.
  • Not running tests in Release configuration in CI.
  • Not pinning SDK or container versions.
  • Not separating unit, integration, and E2E test stages.
  • Allowing skipped tests without ownership.
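Two of the mistakes above, depending on the current time and using uncontrolled random data, share one fix: inject the source of time and randomness so a test can pin it. This is a minimal sketch assuming .NET 8 or later, where System.TimeProvider exists; TrialService, its 14-day rule, and FixedTimeProvider are hypothetical. The Microsoft.Extensions.TimeProvider.Testing package also offers a ready-made FakeTimeProvider.

```csharp
using System;

// Hypothetical production code: time and randomness come in through the constructor.
public sealed class TrialService
{
    private readonly TimeProvider _time;
    private readonly Random _random;

    public TrialService(TimeProvider time, Random random)
    {
        _time = time;
        _random = random;
    }

    // Assumed rule: a trial is active for 14 days after it starts.
    public bool IsTrialActive(DateTimeOffset trialStart) =>
        _time.GetUtcNow() < trialStart.AddDays(14);

    // Picks a discount between 5% and 20% (the upper bound of Next is exclusive).
    public int NextDiscountPercent() => _random.Next(5, 21);
}

// Test double: a clock frozen at a chosen instant.
public sealed class FixedTimeProvider : TimeProvider
{
    private readonly DateTimeOffset _now;

    public FixedTimeProvider(DateTimeOffset now)
    {
        _now = now;
    }

    public override DateTimeOffset GetUtcNow() => _now;
}
```

Production code passes TimeProvider.System and an unseeded Random; tests pass a FixedTimeProvider and a seeded Random, so a test about day 14 behaves the same on every run and on every CI agent.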

Best Practices

Use code coverage as a signal, not the only goal.

Prefer branch coverage for complex decision logic.

Review uncovered high-risk code.

Write assertions that verify meaningful behavior.

Use Arrange-Act-Assert.

Prefer behavior assertions over implementation-detail assertions.

Test success, failure, boundary, and edge cases.

Keep tests deterministic.

Control time, randomness, culture, and external dependencies.

Isolate test data.

Avoid test order dependencies.

Treat flaky tests as defects.

Use retries carefully and track retry-based passes.

Publish TRX/JUnit results and coverage reports in CI.

Upload failure artifacts such as logs, screenshots, traces, and coverage reports.

Run fast tests on every pull request.

Run slower tests in separate stages, nightly jobs, or release gates when needed.

Keep CI scripts explicit and reproducible.

Use test categories or traits to split suites.

Limit parallelism when tests share resources.

Make failed CI output actionable.

