Overview
Rate limiting, memory/GC awareness, and runtime diagnostics are practical .NET skills used to keep applications stable, scalable, and observable in production.
Rate limiting controls how many requests or operations are allowed during a period of time. It protects APIs from abuse, accidental overload, brute-force attempts, expensive endpoint misuse, and downstream dependency pressure. In ASP.NET Core, rate limiting is commonly applied through middleware and endpoint policies.
Memory and garbage collection awareness means understanding how .NET manages memory, how object allocations affect performance, why garbage collection pauses happen, and how to avoid allocation-heavy code in hot paths. A developer does not usually manage memory manually in C#, but they still need to design code that does not create avoidable pressure on the managed heap.
Runtime diagnostics means using logs, metrics, traces, counters, dumps, and profiling tools to understand what a running .NET application is doing. This matters because many production issues cannot be solved by reading code alone. Slow requests, high CPU, high allocation rate, thread pool starvation, memory leaks, and GC pressure need runtime evidence.
This topic matters in interviews because it shows whether a developer can think beyond writing features. Interviewers often want to know if you can build APIs that survive real traffic, diagnose performance problems, explain memory behavior, and use production-safe troubleshooting techniques.
Core Concepts
Rate limiting
Rate limiting is the practice of restricting how many requests or operations can happen within a defined boundary.
Common boundaries include:
- per IP address
- per authenticated user
- per API key
- per tenant
- per endpoint
- globally for the entire application
- per downstream dependency, such as a payment gateway or external API
Rate limiting is not the same as authentication or authorization. Authentication identifies the caller. Authorization decides what the caller is allowed to do. Rate limiting decides how often the caller or system can perform an action.
Typical reasons to use rate limiting include:
- protecting public APIs from abuse
- reducing brute-force login attempts
- preventing one tenant from affecting other tenants
- avoiding overload on expensive endpoints
- protecting downstream services with strict quotas
- keeping system behavior predictable under traffic spikes
Common rate limiting algorithms
Since .NET 7, ASP.NET Core includes built-in limiters for four common strategies: fixed window, sliding window, token bucket, and concurrency.
Fixed window
A fixed window limiter allows a maximum number of requests in a fixed time window.
Example:
- 100 requests per minute
- when the minute resets, the caller gets a new allowance
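This is simple to understand and cheap to run, but it can allow traffic bursts around the boundary between two windows. For example, a caller can send 100 requests at the end of one minute and another 100 at the start of the next, getting 200 requests through in a few seconds.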
Example:
builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("fixed", limiterOptions =>
    {
        limiterOptions.PermitLimit = 100;
        limiterOptions.Window = TimeSpan.FromMinutes(1);
        limiterOptions.QueueLimit = 0;
    });
});
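The boundary burst can be reproduced with a minimal, hand-written fixed window counter. This is a sketch of the algorithm only, not the ASP.NET Core implementation: FixedWindowCounter and TryAcquire are hypothetical names, the class is single-threaded, and the caller passes the current time so the behavior is deterministic.

```csharp
using System;

// Hypothetical, single-threaded fixed window counter illustrating the algorithm.
public sealed class FixedWindowCounter
{
    private readonly int _permitLimit;
    private readonly TimeSpan _window;
    private long _currentWindow = -1; // index of the window that _count belongs to
    private int _count;               // requests accepted in the current window

    public FixedWindowCounter(int permitLimit, TimeSpan window)
    {
        _permitLimit = permitLimit;
        _window = window;
    }

    public bool TryAcquire(DateTimeOffset now)
    {
        // Every instant maps to exactly one window; entering a new window resets the count.
        long windowIndex = now.UtcTicks / _window.Ticks;
        if (windowIndex != _currentWindow)
        {
            _currentWindow = windowIndex;
            _count = 0;
        }

        if (_count >= _permitLimit)
        {
            return false; // limit reached: reject (no queueing, like QueueLimit = 0)
        }

        _count++;
        return true;
    }
}
```

With a limit of 100 per minute, 100 calls at second 59 and 100 more at second 60 are all accepted, because the second batch falls into a fresh window.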
Sliding window
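A sliding window limiter divides each window into smaller segments and moves the window forward one segment at a time. When a segment falls out of the window, the requests counted in it become available again. With a one-minute window and six segments, as below, the window advances every 10 seconds. This smooths request distribution better than a fixed window.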
Example:
builder.Services.AddRateLimiter(options =>
{
    options.AddSlidingWindowLimiter("sliding", limiterOptions =>
    {
        limiterOptions.PermitLimit = 100;
        limiterOptions.Window = TimeSpan.FromMinutes(1);
        limiterOptions.SegmentsPerWindow = 6;
        limiterOptions.QueueLimit = 0;
    });
});
This is useful when you want to reduce sudden bursts at time boundaries.
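A minimal, single-threaded sketch of a segmented sliding window counter shows why the boundary burst disappears. SlidingWindowCounter is a hypothetical class, not the ASP.NET Core type; it keeps one count per segment in a ring buffer and releases a segment's count when the window slides past it.

```csharp
using System;

// Hypothetical, single-threaded segmented sliding window counter.
public sealed class SlidingWindowCounter
{
    private readonly int _permitLimit;
    private readonly long _segmentTicks;
    private readonly int[] _segments;              // accepted requests per segment (ring buffer)
    private long _currentSegment = long.MinValue;  // absolute index of the newest segment
    private int _total;                            // accepted requests across the whole window

    public SlidingWindowCounter(int permitLimit, TimeSpan window, int segmentsPerWindow)
    {
        _permitLimit = permitLimit;
        _segmentTicks = window.Ticks / segmentsPerWindow;
        _segments = new int[segmentsPerWindow];
    }

    public bool TryAcquire(DateTimeOffset now)
    {
        Advance(now.UtcTicks / _segmentTicks);

        if (_total >= _permitLimit)
        {
            return false;
        }

        _segments[(int)(_currentSegment % _segments.Length)]++;
        _total++;
        return true;
    }

    // Slides the window forward, releasing counts from segments that fell out of it.
    private void Advance(long segment)
    {
        if (segment <= _currentSegment)
        {
            return; // same segment (or clock moved backwards): nothing expires
        }

        if (_currentSegment == long.MinValue || segment - _currentSegment >= _segments.Length)
        {
            Array.Clear(_segments, 0, _segments.Length); // the whole window expired
            _total = 0;
        }
        else
        {
            for (long s = _currentSegment + 1; s <= segment; s++)
            {
                int slot = (int)(s % _segments.Length);
                _total -= _segments[slot];
                _segments[slot] = 0;
            }
        }

        _currentSegment = segment;
    }
}
```

With 100 per minute and six segments, 100 calls at second 59 fill the window, so the next 100 calls at second 60 are rejected. Those first requests sit in the 50-60 second segment, which only leaves the window once it covers seconds 60-120, that is, at second 110.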
Token bucket
A token bucket limiter holds up to TokenLimit tokens and adds TokensPerPeriod tokens every ReplenishmentPeriod. Each request consumes one or more tokens. If not enough tokens are available, the request is rejected or queued.
Example:
builder.Services.AddRateLimiter(options =>
{
    options.AddTokenBucketLimiter("token", limiterOptions =>
    {
        limiterOptions.TokenLimit = 100;
        limiterOptions.TokensPerPeriod = 20;
        limiterOptions.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        limiterOptions.AutoReplenishment = true;
        limiterOptions.QueueLimit = 0;
    });
});
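Token bucket is useful when you want to allow controlled bursts while still enforcing an average rate. With the settings above, a caller can burst up to 100 requests at once, but the sustained rate is 20 requests every 10 seconds, or 2 requests per second.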
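A minimal token bucket sketch makes the burst-versus-average behavior concrete. TokenBucket is a hypothetical, single-threaded class; unlike the built-in limiter with AutoReplenishment = true, which uses a timer, it refills lazily from the elapsed time on each call, adding only whole periods and never exceeding the capacity.

```csharp
using System;

// Hypothetical, single-threaded token bucket with lazy, whole-period refills.
public sealed class TokenBucket
{
    private readonly int _tokenLimit;
    private readonly int _tokensPerPeriod;
    private readonly TimeSpan _replenishmentPeriod;
    private int _tokens;
    private DateTimeOffset _lastRefill;

    public TokenBucket(int tokenLimit, int tokensPerPeriod, TimeSpan replenishmentPeriod, DateTimeOffset now)
    {
        _tokenLimit = tokenLimit;
        _tokensPerPeriod = tokensPerPeriod;
        _replenishmentPeriod = replenishmentPeriod;
        _tokens = tokenLimit; // start full, so an initial burst is allowed
        _lastRefill = now;
    }

    public bool TryAcquire(DateTimeOffset now, int tokens = 1)
    {
        Refill(now);
        if (_tokens < tokens)
        {
            return false; // not enough tokens: reject
        }

        _tokens -= tokens;
        return true;
    }

    private void Refill(DateTimeOffset now)
    {
        long periods = (now - _lastRefill).Ticks / _replenishmentPeriod.Ticks;
        if (periods <= 0)
        {
            return;
        }

        // Add tokens for each elapsed period, capped at the bucket capacity.
        long refilled = _tokens + periods * _tokensPerPeriod;
        _tokens = (int)Math.Min(_tokenLimit, refilled);
        _lastRefill += TimeSpan.FromTicks(periods * _replenishmentPeriod.Ticks);
    }
}
```

With the same numbers as the ASP.NET Core example, 150 immediate calls get 100 acceptances, and 10 seconds later only 20 more tokens are available.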
Concurrency limiter
A concurrency limiter limits how many operations can run at the same time. It does not limit total requests per time window.
Example:
builder.Services.AddRateLimiter(options =>
{
    options.AddConcurrencyLimiter("concurrency", limiterOptions =>
    {
        limiterOptions.PermitLimit = 20;
        limiterOptions.QueueLimit = 10;
    });
});
This is useful for expensive operations such as file processing, report generation, image conversion, or calls to a slow downstream service.
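The same idea can be illustrated outside ASP.NET Core. This sketch swaps in SemaphoreSlim, a BCL primitive, rather than the ConcurrencyLimiter type itself; ConcurrencyGate is a hypothetical name, and the peak-concurrency tracking exists only to show that no more than the permit limit ever runs at once.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Concurrency limiting sketched with SemaphoreSlim; not the ASP.NET Core ConcurrencyLimiter.
public sealed class ConcurrencyGate
{
    private readonly SemaphoreSlim _semaphore;
    private readonly object _statsLock = new();
    private int _inFlight;
    private int _maxObserved;

    public ConcurrencyGate(int permitLimit) =>
        _semaphore = new SemaphoreSlim(permitLimit, permitLimit);

    public int MaxObservedConcurrency
    {
        get { lock (_statsLock) { return _maxObserved; } }
    }

    // Runs work once a slot is free. Waiting callers form an unbounded queue here,
    // unlike a limiter with a QueueLimit; WaitAsync(TimeSpan.Zero) could be used to reject instead.
    public async Task RunAsync(Func<Task> work, CancellationToken cancellationToken = default)
    {
        await _semaphore.WaitAsync(cancellationToken);
        try
        {
            lock (_statsLock)
            {
                _inFlight++;
                _maxObserved = Math.Max(_maxObserved, _inFlight);
            }

            await work();
        }
        finally
        {
            lock (_statsLock)
            {
                _inFlight--;
            }

            _semaphore.Release();
        }
    }
}
```

Starting 100 slow operations through a gate of 20 completes all of them, but never runs more than 20 at the same time; the total number of requests is not limited.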
Applying rate limiting in ASP.NET Core
A basic ASP.NET Core setup usually registers rate limiting services and enables the middleware in the request pipeline.
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.AddFixedWindowLimiter("standard-api", limiterOptions =>
    {
        limiterOptions.PermitLimit = 60;
        limiterOptions.Window = TimeSpan.FromMinutes(1);
        limiterOptions.QueueLimit = 0;
    });
});

var app = builder.Build();

app.UseRateLimiter();

app.MapGet("/api/products", () => Results.Ok())
    .RequireRateLimiting("standard-api");

app.Run();
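For APIs, rejected requests should return 429 Too Many Requests. The middleware's default RejectionStatusCode is 503 Service Unavailable, which is why the example above sets it explicitly.
A production API should also tell the client when it can retry, for example through a Retry-After header: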
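using System.Globalization;
using System.Threading.RateLimiting; // MetadataName

builder.Services.AddRateLimiter(options =>
{
    options.OnRejected = async (context, cancellationToken) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

        // Not every lease carries RetryAfter metadata, so check before using it.
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            // Round up so a 0.5-second wait is not reported as "0".
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)Math.Ceiling(retryAfter.TotalSeconds)).ToString(CultureInfo.InvariantCulture);
        }

        await context.HttpContext.Response.WriteAsync(
            "Too many requests. Please try again later.",
            cancellationToken);
    };
});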
Partitioned rate limiting
Partitioned rate limiting creates separate limits for different callers or groups.
A common example is per-user or per-IP rate limiting:
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("per-user", httpContext =>
    {
        var userName = httpContext.User.Identity?.Name;
        var partitionKey = !string.IsNullOrWhiteSpace(userName)
            ? userName
            : httpContext.Connection.RemoteIpAddress?.ToString() ?? "anonymous";

        return RateLimitPartition.GetFixedWindowLimiter(
            partitionKey,
            _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 30,
                Window = TimeSpan.FromMinutes(1),
                QueueLimit = 0
            });
    });
});
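Partitioning is powerful, but it must be designed carefully. If every random input value creates a new partition, an attacker can cause memory growth by creating many unique keys. Good partition keys should be bounded, normalized, and meaningful. Per-user partitions also depend on middleware order: httpContext.User is only populated if the authentication middleware runs before the rate limiting middleware; otherwise every caller falls back to the IP-based key.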
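One way to keep keys bounded and normalized is to derive them only from identities the application already trusts. The helper below is a hypothetical sketch: PartitionKeys.For and the set of validated API key IDs are assumptions, not ASP.NET Core APIs. It could be called from the AddPolicy partitioner in place of the inline key logic.

```csharp
using System.Collections.Generic;
using System.Net;

// Hypothetical helper that builds bounded, normalized rate limit partition keys.
public static class PartitionKeys
{
    public static string For(string? userName, string? apiKeyId, IPAddress? remoteIp,
                             IReadOnlySet<string> knownApiKeyIds)
    {
        if (!string.IsNullOrWhiteSpace(userName))
        {
            // Normalize so "Alice" and " alice " share one partition.
            return "user:" + userName.Trim().ToLowerInvariant();
        }

        if (apiKeyId is not null && knownApiKeyIds.Contains(apiKeyId))
        {
            // Only known keys get their own partition; random values cannot create new ones.
            return "key:" + apiKeyId;
        }

        if (remoteIp is not null)
        {
            // Map IPv4-mapped IPv6 addresses to IPv4 so one client gets one key.
            var ip = remoteIp.IsIPv4MappedToIPv6 ? remoteIp.MapToIPv4() : remoteIp;
            return "ip:" + ip;
        }

        return "anonymous"; // one shared partition for callers that cannot be identified
    }
}
```

The number of partitions is then bounded by the number of accounts, known API keys, and client addresses, rather than by whatever values a caller chooses to send.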
Rate limiting in distributed systems
Built-in in-memory rate limiting works well for a single application instance. In a multi-instance environment, each instance keeps its own counters, so every limit is enforced per instance unless the rate limit state is stored in a shared system.
For distributed applications, common approaches include:
- applying rate limits at an API gateway or reverse proxy
- using a shared cache or data store such as Redis
- enforcing limits in an external identity/API management layer
- combining application-level limits with infrastructure-level limits
Example problem:
- instance A allows 100 requests per minute
- instance B allows 100 requests per minute
- the real global limit becomes 200 requests per minute if traffic is balanced across both instances
For interviews, it is important to explain whether the limit is per instance or global across the system.
Rate limiting trade-offs and mistakes
Common trade-offs:
Common mistakes include:
- treating rate limiting as authentication
- applying one global limit to all endpoints
- forgetting that expensive endpoints need stricter limits
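- using the client IP incorrectly behind proxies or load balancers, where RemoteIpAddress is the proxy's address unless forwarded headers from trusted proxies are processed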
- allowing unlimited rate limiter partitions
- enabling queueing without considering latency and memory
- not returning 429 Too Many Requests
- not load testing rate limit behavior
- using only application-local limits when a distributed global limit is required
.NET managed memory model
.NET uses automatic memory management. Objects are allocated on the managed heap and cleaned up by the garbage collector when they are no longer reachable.
Important terms:
The GC is optimized around the idea that most objects are short-lived. For web applications, request-scoped objects should usually become unreachable after the request completes.
Generational garbage collection
The .NET GC divides objects into generations.
Generation 0
Generation 0 contains newly allocated small objects. Collections are frequent and usually fast.
Example short-lived allocations:
public string FormatCustomerName(Customer customer)
{
    return $"{customer.FirstName} {customer.LastName}";
}
The resulting string may be short-lived if used only for one response.
Generation 1
Generation 1 acts as a middle area for objects that survived a Gen 0 collection but may still be short-lived.
Generation 2
Generation 2 contains longer-lived objects. A Gen 2 collection is a full collection: it also collects Gen 0, Gen 1, and the Large Object Heap, so it is more expensive than a Gen 0 or Gen 1 collection.
Long-lived objects often include:
- static caches
- singleton service state
- large object graphs retained by references
- long-lived collections
- accidental memory leaks caused by event handlers or static references
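Promotion between generations can be observed with GC.GetGeneration. The sketch forces collections with GC.Collect() purely for demonstration, which is exactly what normal application code should avoid. GenerationDemo is a hypothetical name, and the exact generation numbers are a runtime implementation detail, so treat the comments as typical rather than guaranteed.

```csharp
using System;

// Demonstration only: forces collections to show a surviving object being promoted.
public static class GenerationDemo
{
    public static int[] ObservePromotion()
    {
        var survivor = new object();
        var generations = new int[3];

        generations[0] = GC.GetGeneration(survivor); // newly allocated: typically 0
        GC.Collect();                                 // survivor is still referenced
        generations[1] = GC.GetGeneration(survivor); // typically promoted to 1
        GC.Collect();
        generations[2] = GC.GetGeneration(survivor); // typically promoted to 2, the oldest

        GC.KeepAlive(survivor); // keep the object reachable until this point
        return generations;
    }
}
```

An object that keeps surviving ends up in Gen 2, where it is only reclaimed by the more expensive full collections; this is why long-lived caches and accidental references matter.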
Large Object Heap
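Objects of 85,000 bytes or more are allocated on the Large Object Heap (LOH). The LOH is collected only during Gen 2 collections and is not compacted by default, so frequent large arrays, strings, and buffers can increase memory pressure and fragmentation and trigger more expensive collections.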
Examples that can create large allocations:
var bytes = new byte[100_000];
var json = await File.ReadAllTextAsync("large-file.json");
var allRows = await dbContext.Orders.ToListAsync();
Better approaches can include:
- streaming instead of loading everything into memory
- paging large result sets
- using buffers carefully
- reusing arrays with ArrayPool<T> in hot paths
- avoiding unnecessary large string concatenation
- compressing or chunking large payloads when appropriate
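Streaming keeps memory proportional to one unit of work instead of the whole payload. A minimal sketch, assuming the input is line-oriented text; CountMatchingLinesAsync is a hypothetical helper that reads one line at a time instead of loading the whole file with File.ReadAllTextAsync.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class StreamingExample
{
    // Counts lines containing a term while holding only the current line in memory.
    // Note: disposing the StreamReader also disposes the underlying stream.
    public static async Task<int> CountMatchingLinesAsync(
        Stream input, string term, CancellationToken cancellationToken = default)
    {
        using var reader = new StreamReader(input);
        int matches = 0;
        string? line;
        while ((line = await reader.ReadLineAsync()) is not null)
        {
            cancellationToken.ThrowIfCancellationRequested();
            if (line.Contains(term, StringComparison.Ordinal))
            {
                matches++;
            }
        }

        return matches;
    }
}
```

The same principle applies to HTTP request and response bodies and to database results: process items as they arrive rather than materializing everything first.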
Example using ArrayPool<byte>:
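using System.Buffers;

public async Task CopyWithPooledBufferAsync(Stream input, Stream output)
{
    // 64 KB stays below the 85,000-byte LOH threshold. Rent may return a larger
    // array than requested, so only the first bytesRead bytes are valid.
    byte[] buffer = ArrayPool<byte>.Shared.Rent(64 * 1024);
    try
    {
        int bytesRead;
        while ((bytesRead = await input.ReadAsync(buffer)) > 0)
        {
            await output.WriteAsync(buffer.AsMemory(0, bytesRead));
        }
    }
    finally
    {
        // Always return the buffer, and never use it after returning it.
        ArrayPool<byte>.Shared.Return(buffer);
    }
}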
Pooling is useful in hot paths, but it adds complexity. It should be used when measurements show allocation pressure.
Allocation awareness in C#
Allocation awareness means writing code that avoids unnecessary object creation, especially in hot paths.
Common allocation sources include:
- LINQ chains in performance-sensitive loops
- repeated string concatenation
- boxing value types
- closures and captured variables
- creating arrays repeatedly
- materializing large collections with ToList()
- unnecessary exceptions in normal control flow
- large JSON serialization/deserialization operations
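Allocation differences can be measured instead of guessed. The sketch below uses GC.GetAllocatedBytesForCurrentThread to compare a plain loop with a LINQ chain; AllocationProbe and its methods are hypothetical names, and exact byte counts vary by runtime version, so only the relative result matters. For real benchmarks, BenchmarkDotNet's memory diagnoser is the more reliable tool.

```csharp
using System;
using System.Linq;

public static class AllocationProbe
{
    // Returns the managed bytes allocated on the current thread while running the action.
    public static long Measure(Action action)
    {
        action(); // warm-up run so JIT compilation and one-time setup are not counted
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Plain loop over an array: no heap allocations.
    public static int SumEvensLoop(int[] values)
    {
        int sum = 0;
        foreach (int value in values)
        {
            if (value % 2 == 0)
            {
                sum += value;
            }
        }

        return sum;
    }

    // LINQ version: allocates an iterator object on every call.
    public static int SumEvensLinq(int[] values) => values.Where(v => v % 2 == 0).Sum();
}
```

A few bytes per call rarely matter, but inside a loop that runs millions of times per minute they add up to a measurable allocation rate.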
Example of avoidable repeated allocation:
foreach (var item in items)
{
    var message = "Item: " + item.Name + ", Status: " + item.Status;
    logger.LogInformation(message);
}
Better logging pattern:
foreach (var item in items)
{
    logger.LogInformation(
        "Item: {ItemName}, Status: {Status}",
        item.Name,
        item.Status);
}
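Structured logging defers formatting, so no message string is built when the log level is disabled, and it keeps ItemName and Status as separate, searchable fields. Each call still allocates an argument array (and boxes value-type arguments), so very hot paths can use LoggerMessage-generated methods or check logger.IsEnabled first.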
Memory leaks in managed code
Managed code can still leak memory. A memory leak happens when objects are no longer needed but are still reachable from active references.
Common causes include:
- static collections that keep growing
- event subscriptions that are never removed
- long-lived services holding references to scoped services or request data
- unbounded caches
- background queues without limits
- timers that are not disposed
- large object graphs attached to singleton services
- storing HttpContext or request objects after a request ends
Example event subscription leak:
public sealed class OrderListener
{
    private readonly OrderService _orderService;

    public OrderListener(OrderService orderService)
    {
        _orderService = orderService;
        _orderService.OrderCreated += OnOrderCreated;
    }

    private void OnOrderCreated(object? sender, OrderCreatedEventArgs e)
    {
        // Handle event
    }
}
If OrderService is long-lived and OrderListener is expected to be short-lived, the event subscription can keep OrderListener alive.
A safer approach is to unsubscribe when appropriate:
public sealed class OrderListener : IDisposable
{
    private readonly OrderService _orderService;

    public OrderListener(OrderService orderService)
    {
        _orderService = orderService;
        _orderService.OrderCreated += OnOrderCreated;
    }

    private void OnOrderCreated(object? sender, OrderCreatedEventArgs e)
    {
        // Handle event
    }

    public void Dispose()
    {
        _orderService.OrderCreated -= OnOrderCreated;
    }
}
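To make the lifetime visible, the sketch below gives hypothetical versions of OrderService and OrderCreatedEventArgs (the example above does not define them), plus a SubscriberCount property added only for verification. It shows that the service holds one delegate per subscriber, and that delegate keeps the listener reachable until Dispose removes it.

```csharp
using System;

// Hypothetical event args and service, defined only to make the example runnable.
public sealed class OrderCreatedEventArgs : EventArgs
{
    public OrderCreatedEventArgs(int orderId) => OrderId = orderId;

    public int OrderId { get; }
}

public sealed class OrderService
{
    public event EventHandler<OrderCreatedEventArgs>? OrderCreated;

    // Each subscription is a delegate that references its subscriber object.
    public int SubscriberCount => OrderCreated?.GetInvocationList().Length ?? 0;

    public void CreateOrder(int orderId) =>
        OrderCreated?.Invoke(this, new OrderCreatedEventArgs(orderId));
}

public sealed class OrderListener : IDisposable
{
    private readonly OrderService _orderService;

    public OrderListener(OrderService orderService)
    {
        _orderService = orderService;
        _orderService.OrderCreated += OnOrderCreated;
    }

    public int HandledCount { get; private set; }

    private void OnOrderCreated(object? sender, OrderCreatedEventArgs e) => HandledCount++;

    // Removing the handler drops the service's reference to this listener.
    public void Dispose() => _orderService.OrderCreated -= OnOrderCreated;
}
```

In ASP.NET Core, a scoped or transient service that implements IDisposable is disposed by the container when its scope ends, so registering the listener that way makes the unsubscribe happen automatically.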
IDisposable and unmanaged resources
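The GC manages memory, but it does not release unmanaged resources promptly. Finalizers run at an unpredictable time after an object becomes unreachable, so handles and connections can stay open much longer than needed.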
Examples of unmanaged or external resources include:
- file handles
- sockets
- database connections
- streams
- timers
- native handles
Use using or await using to release resources promptly:
using Microsoft.Data.SqlClient;

await using var stream = File.OpenRead("report.pdf");
await using var connection = new SqlConnection(connectionString);
await connection.OpenAsync();
Failing to dispose resources can create production issues even when managed memory appears normal.
GC performance habits
Good GC habits include:
- avoid unnecessary allocations in hot paths
- avoid loading large data sets into memory
- use streaming for large files and responses
- use pagination for database results
- prefer StringBuilder for repeated string building
- use Span<T> and Memory<T> only where they simplify or improve measured performance
- reuse buffers carefully when appropriate
- keep caches bounded
- avoid calling GC.Collect() manually in normal application code
- measure before optimizing
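As an example of Span<T> simplifying a hot path, the sketch below parses a comma-separated list of integers by slicing a ReadOnlySpan<char>, where string.Split would allocate an array plus one string per value. SumCsv is a hypothetical helper and assumes well-formed input; an empty value would throw FormatException.

```csharp
using System;
using System.Globalization;

public static class SpanParsing
{
    // Sums a comma-separated list of integers without allocating substrings.
    public static int SumCsv(ReadOnlySpan<char> input)
    {
        int sum = 0;
        while (!input.IsEmpty)
        {
            int comma = input.IndexOf(',');
            ReadOnlySpan<char> token = comma >= 0 ? input[..comma] : input;
            sum += int.Parse(token.Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture);

            // Move past the consumed token (and its comma), or stop at the end.
            input = comma >= 0 ? input[(comma + 1)..] : ReadOnlySpan<char>.Empty;
        }

        return sum;
    }
}
```

Passing a string to SumCsv converts it to a span without copying, so the whole parse runs without heap allocations.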
Bad habits include:
- assuming GC means memory never matters
- blaming GC before checking allocation rate
- using object pooling everywhere without measurement
- keeping references longer than needed
- storing request-scoped data in singletons
- ignoring large object allocations
- using exceptions for expected control flow in high-volume paths
Runtime diagnostics
Runtime diagnostics is the process of observing and analyzing a running application.
Common diagnostic signals:
A practical investigation starts with symptoms and evidence.
Example symptoms:
- API latency increased
- CPU is high
- memory keeps growing
- many requests time out
- error rate increased
- Gen 2 collections are frequent
- thread pool queue length is high
- database calls are slow
dotnet-counters
dotnet-counters monitors live runtime counters.
Example:
dotnet-counters monitor --process-id 12345 --counters System.Runtime
Useful counters include:
- CPU usage
- working set
- GC heap size
- allocation rate
- Gen 0/1/2 GC count
- time in GC
- exception count
- thread pool thread count
- thread pool queue length
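Several of these values are also available in-process through BCL APIs, which can be handy for a health endpoint or a periodic log line. A minimal sketch; RuntimeSnapshot is a hypothetical type, and its values roughly correspond to, but are not computed identically to, the dotnet-counters System.Runtime counters.

```csharp
using System;
using System.Threading;

// Hypothetical in-process snapshot of a few runtime values.
public sealed record RuntimeSnapshot(
    long GcHeapBytes,
    long TotalAllocatedBytes,
    int Gen0Collections,
    int Gen1Collections,
    int Gen2Collections,
    double GcPauseTimePercentage,
    int ThreadPoolThreads,
    long ThreadPoolQueueLength)
{
    public static RuntimeSnapshot Capture() => new(
        GcHeapBytes: GC.GetTotalMemory(forceFullCollection: false),
        TotalAllocatedBytes: GC.GetTotalAllocatedBytes(),
        Gen0Collections: GC.CollectionCount(0), // counts every collection that included Gen 0
        Gen1Collections: GC.CollectionCount(1),
        Gen2Collections: GC.CollectionCount(2),
        GcPauseTimePercentage: GC.GetGCMemoryInfo().PauseTimePercentage,
        ThreadPoolThreads: ThreadPool.ThreadCount,
        ThreadPoolQueueLength: ThreadPool.PendingWorkItemCount);
}
```

Comparing two snapshots taken a minute apart gives an allocation rate (the difference in TotalAllocatedBytes) and the number of Gen 2 collections in that interval.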
Example interpretation:
dotnet-trace
dotnet-trace collects runtime events for deeper analysis.
Example:
dotnet-trace collect --process-id 12345
It is useful for:
- CPU investigation
- runtime event analysis
- GC events
- thread pool behavior
- request processing activity when combined with other telemetry
A trace is better than guessing when the question is, "Where is time being spent?"
dotnet-dump
dotnet-dump collects and analyzes process dumps.
Example:
dotnet-dump collect --process-id 12345 --output app.dmp
dotnet-dump analyze app.dmp
A dump is useful for:
- memory leak investigation
- high memory usage
- deadlocks
- thread inspection
- object heap analysis
- understanding what is still referenced
Common dump analysis questions include:
- What object types consume the most memory?
- Why are these objects still alive?
- Are many threads blocked?
- Is the application waiting on locks, I/O, or tasks?
dotnet-gcdump
dotnet-gcdump collects a GC heap snapshot.
Example:
dotnet-gcdump collect --process-id 12345 --output app.gcdump
It is useful for comparing heap usage over time and identifying which object types are growing.
Important caution: collecting a GC dump can trigger a full garbage collection, which may affect a performance-sensitive application. In production, use diagnostics carefully and follow operational safety procedures.
Logs, metrics, and traces
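Modern production diagnostics usually combines three pillars:
- logs: individual events with context, such as a failed request and its error
- metrics: numeric measurements aggregated over time, such as request rate, error rate, or latency
- traces: the path and timing of a single request across components and services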
Example:
- logs show a request failed
- metrics show error rate increased after deployment
- traces show most latency is inside a downstream payment API call
In .NET applications, common observability tools include:
- ILogger
- built-in .NET metrics
- ASP.NET Core metrics
- OpenTelemetry
- Application Insights
- Prometheus/Grafana
- distributed tracing with ActivitySource
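ActivitySource is the System.Diagnostics API that OpenTelemetry's .NET tracing builds on. A minimal sketch of creating a span and observing it with an in-process ActivityListener; the source name "MyApp.Orders", the operation name, and the tag are hypothetical, and in a real service an OpenTelemetry exporter would take the listener's place.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public static class TracingExample
{
    private static readonly ActivitySource Source = new("MyApp.Orders");

    public static void CreateOrder(int orderId)
    {
        // StartActivity returns null when nothing is listening, so use ?. on the result.
        using Activity? activity = Source.StartActivity("CreateOrder");
        activity?.SetTag("order.id", orderId);
        // ... order creation work would happen here ...
    }

    // Minimal in-process listener that records completed activities.
    public static ActivityListener ListenForCompletedActivities(List<Activity> completed)
    {
        var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "MyApp.Orders",
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllDataAndRecorded,
            ActivityStopped = activity => completed.Add(activity),
        };

        ActivitySource.AddActivityListener(listener);
        return listener;
    }
}
```

Because Activity.Current flows across async calls, logs written inside the activity can be correlated with the same trace ID.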
Example custom metric:
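using System.Diagnostics.Metrics;

// Register as a singleton, for example: builder.Services.AddSingleton<OrderMetrics>();
// IMeterFactory (.NET 8 and later) is supplied by the host's dependency injection container.
public sealed class OrderMetrics
{
    private readonly Counter<int> _ordersCreated;

    public OrderMetrics(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("MyApp.Orders");
        _ordersCreated = meter.CreateCounter<int>("orders.created");
    }

    public void OrderCreated()
    {
        _ordersCreated.Add(1);
    }
}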
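A counter like this can be observed without any exporter by using MeterListener, which is useful in tests. To stay self-contained, this sketch creates the Meter directly with new Meter(...) instead of through IMeterFactory; StandaloneOrderMetrics and ListenToOrdersCreated are hypothetical names.

```csharp
using System;
using System.Diagnostics.Metrics;

// Same counter as above, created without dependency injection.
public sealed class StandaloneOrderMetrics : IDisposable
{
    private readonly Meter _meter = new("MyApp.Orders");
    private readonly Counter<int> _ordersCreated;

    public StandaloneOrderMetrics() =>
        _ordersCreated = _meter.CreateCounter<int>("orders.created");

    public void OrderCreated() => _ordersCreated.Add(1);

    public void Dispose() => _meter.Dispose();
}

public static class MetricsListenerExample
{
    // Forwards every "orders.created" measurement to a callback; exporters
    // subscribe to instruments in a similar way.
    public static MeterListener ListenToOrdersCreated(Action<int> onMeasurement)
    {
        var listener = new MeterListener
        {
            InstrumentPublished = (instrument, l) =>
            {
                if (instrument.Meter.Name == "MyApp.Orders" && instrument.Name == "orders.created")
                {
                    l.EnableMeasurementEvents(instrument);
                }
            }
        };

        listener.SetMeasurementEventCallback<int>((instrument, value, tags, state) => onMeasurement(value));
        listener.Start();
        return listener;
    }
}
```

Recording three orders delivers three measurements of 1 to the callback, synchronously on the calling thread.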
Diagnosing high memory usage
A practical memory investigation might look like this:
- Check process memory and GC heap size.
- Check allocation rate.
- Check Gen 2 GC frequency.
- Capture multiple snapshots over time.
- Compare which object types are growing.
- Find roots that keep objects alive.
- Review caches, static references, queues, event handlers, and long-lived services.
- Fix the root cause.
- Validate with the same metrics after the fix.
Example diagnosis:
- memory grows continuously
- GC heap grows with it
- dotnet-gcdump shows many OrderReport objects
- dump analysis shows a singleton service holds a list of generated reports
- fix by adding expiration, size limits, or external storage
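For the OrderReport scenario above, a fix could replace the singleton's unbounded list with a store that enforces both a size limit and an expiration. A minimal sketch, assuming an in-process store is acceptable; BoundedStore is a hypothetical type, and the caller passes the current time to keep it testable. In ASP.NET Core, MemoryCache with a SizeLimit and per-entry expiration is a common ready-made alternative.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bounded store: at most maxEntries items, oldest evicted first,
// and items older than timeToLive dropped. Thread-safe via a single lock.
public sealed class BoundedStore<TKey, TValue> where TKey : notnull
{
    private sealed record Entry(TKey Key, TValue Value, DateTimeOffset CreatedAt);

    private readonly int _maxEntries;
    private readonly TimeSpan _timeToLive;
    private readonly Dictionary<TKey, LinkedListNode<Entry>> _index = new();
    private readonly LinkedList<Entry> _byAge = new(); // oldest entry first
    private readonly object _gate = new();

    public BoundedStore(int maxEntries, TimeSpan timeToLive)
    {
        _maxEntries = maxEntries;
        _timeToLive = timeToLive;
    }

    public int Count { get { lock (_gate) { return _index.Count; } } }

    public void Add(TKey key, TValue value, DateTimeOffset now)
    {
        lock (_gate)
        {
            if (_index.Remove(key, out var existing))
            {
                _byAge.Remove(existing); // a replaced entry moves to the newest position
            }

            _index[key] = _byAge.AddLast(new Entry(key, value, now));
            RemoveExpired(now);
            while (_index.Count > _maxEntries)
            {
                RemoveOldest(); // size limit: the store can no longer grow without bound
            }
        }
    }

    public bool TryGet(TKey key, DateTimeOffset now, out TValue value)
    {
        lock (_gate)
        {
            RemoveExpired(now);
            if (_index.TryGetValue(key, out var node))
            {
                value = node.Value.Value;
                return true;
            }

            value = default!;
            return false;
        }
    }

    private void RemoveExpired(DateTimeOffset now)
    {
        while (_byAge.First is { } oldest && now - oldest.Value.CreatedAt >= _timeToLive)
        {
            RemoveOldest();
        }
    }

    private void RemoveOldest()
    {
        var oldest = _byAge.First!;
        _byAge.RemoveFirst();
        _index.Remove(oldest.Value.Key);
    }
}
```

After the fix, the same gcdump comparison should show the number of OrderReport objects leveling off instead of growing with uptime.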
Diagnosing high CPU
A practical CPU investigation might look like this:
- Confirm CPU is high using metrics.
- Check whether high CPU is constant or spike-based.
- Collect a trace or CPU profile.
- Identify hot methods.
- Check for tight loops, expensive serialization, regex usage, inefficient LINQ, repeated database query materialization, or lock contention.
- Optimize the measured hot path.
- Validate improvement with load testing.
Avoid optimizing random code without evidence. The slowest code path is often not where developers expect.
Diagnosing thread pool starvation
Thread pool starvation happens when work items wait too long because available thread pool threads are blocked or exhausted.
Common causes include:
- blocking async code with .Result or .Wait()
- sync-over-async calls
- long-running CPU work on request threads
- blocking I/O
- lock contention
- too much parallelism
- thread pool misuse
Problematic example:
public IActionResult Get()
{
    var result = _service.GetDataAsync().Result;
    return Ok(result);
}
Better approach:
public async Task<IActionResult> Get(CancellationToken cancellationToken)
{
    var result = await _service.GetDataAsync(cancellationToken);
    return Ok(result);
}
Thread pool issues often show up as high latency, request timeouts, and growing queue length even when CPU is not fully used.
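A lightweight way to watch for starvation from inside the application is to measure how long a trivial work item waits before a thread pool thread picks it up. A minimal sketch; MeasureSchedulingDelayAsync is a hypothetical probe that could run periodically from a background service, with its result recorded as a metric next to the thread pool queue length.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadPoolProbe
{
    // Returns the time between queueing a trivial work item and a thread pool
    // thread starting it. On a healthy pool this is tiny; when threads are
    // blocked (for example by .Result), it can grow to seconds even with low CPU.
    public static async Task<TimeSpan> MeasureSchedulingDelayAsync(
        CancellationToken cancellationToken = default)
    {
        var stopwatch = Stopwatch.StartNew();
        TimeSpan delay = await Task.Run(() => stopwatch.Elapsed, cancellationToken);
        return delay;
    }
}
```

A steadily rising scheduling delay together with a growing queue length and moderate CPU points toward blocked threads rather than insufficient compute.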
Production-safe diagnostics habits
Good production diagnostics habits include:
- collect the least invasive evidence first
- start with metrics and logs before dumps
- avoid collecting heavy dumps during peak traffic unless necessary
- protect dumps because they can contain sensitive data
- use staging or replica environments when possible
- capture baseline metrics before an incident
- add correlation IDs to logs and traces
- keep dashboards for request rate, latency, errors, CPU, memory, GC, and dependency calls
- document known performance limits and rate limits
- validate fixes with load testing
Rate limiting and diagnostics together
Rate limiting should be observable.
Track:
- total requests
- rejected requests
- rejection rate by endpoint
- rejection rate by user/tenant/API key
- queue length if queueing is enabled
- downstream dependency latency
- 429 response rate
- whether rate limits are too strict or too loose
A rate limiter that silently rejects requests without monitoring can create confusing production behavior.
Example:
options.OnRejected = async (context, cancellationToken) =>
{
    var logger = context.HttpContext.RequestServices
        .GetRequiredService<ILoggerFactory>()
        .CreateLogger("RateLimiting");

    logger.LogWarning(
        "Rate limit rejected request. Path: {Path}, User: {User}",
        context.HttpContext.Request.Path,
        context.HttpContext.User.Identity?.Name ?? "anonymous");

    context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

    await context.HttpContext.Response.WriteAsync(
        "Too many requests.",
        cancellationToken);
};
Common interview comparison: performance, scalability, and diagnostics
Best practices
Use rate limiting deliberately:
- apply different limits to different endpoint costs
- partition by authenticated identity when possible
- handle anonymous users carefully
- return 429 Too Many Requests
- include retry information when appropriate
- load test rate limit behavior
- use gateway or distributed rate limiting for multi-instance global limits
- monitor rejected requests
Write memory-aware code:
- avoid unnecessary allocations in hot paths
- stream large files and responses
- paginate large database queries
- bound caches and queues
- dispose resources
- avoid keeping request data in singleton services
- avoid manual GC collection in normal code
- measure before optimizing
Use diagnostics professionally:
- use logs, metrics, and traces together
- start with lightweight tools
- use counters for live runtime behavior
- use traces for CPU and timing problems
- use dumps for deep memory/thread investigation
- protect diagnostic files
- validate fixes with repeatable measurements