I am confident that after reading this blog, you will be able to implement rate limiting in .NET Core 8 in an optimized and efficient way. I have written out all the explanations, so don't skip them; understanding is the first ingredient for implementing anything in your software. Read in depth, then implement.
Rate limiting is an essential practice in modern API design to prevent overload and abuse. Request throttling in a Web API keeps malicious or accidental traffic spikes from degrading the user experience or crashing the server.
From a developer’s point of view, I’ve seen how an unexpected increase in requests (like a brute-force login attempt or a viral spike of API calls) can bring a defenseless service to its knees. Putting a rate limiter in place ensures that every client gets a reasonable share of the API capacity.
It's also good ASP.NET Core API security: for instance, enforcing a limit of X login attempts per minute can prevent credential stuffing, and limiting public endpoints gives everyone consistent performance. Rate limiting prevents abuse and DoS situations by discarding or deferring excess requests, typically with a 429 Too Many Requests response.
When a request arrives, the system checks how many calls the specific user (or IP/API key) has made in the current time window. If the client is under their allotted limit, the request is processed normally. If they’ve exceeded the limit, the API returns an HTTP 429 response telling the client to slow down. In ASP.NET Core 8, this flow is handled by built-in middleware that you configure at startup (more on that soon).
For now, the takeaway is that rate limiting in .NET Core 8 is about protecting resources, ensuring fairness, and maintaining stability. In 2025’s world of microservices, mobile apps, and IoT clients, having a solid rate-limiting strategy is non-negotiable.

Rate Limiting Strategies in .NET Core 8
Let me tell you about the best strategies before implementing the whole thing, as it’s good to have all these approaches in your mind. ASP.NET Core 8’s built-in rate limiting middleware (from Microsoft.AspNetCore.RateLimiting) supports several common algorithms. Each of these approaches has its advantages:
- Fixed Window Limiter: This is the simplest strategy. Requests are counted in fixed time intervals (windows). For example, a policy might allow 100 requests per minute. Every minute (when the window resets), the count starts over. This can allow bursts at window boundaries but is easy to implement. In .NET, you configure it with AddFixedWindowLimiter, setting PermitLimit (max requests) and Window (timespan).
- Sliding Window Limiter: Sliding window smooths out bursts by effectively using overlapping sub-windows. It divides each window into segments and “carries over” unused capacity from an expired segment. In effect, clients regain some request quota continuously as time passes, instead of all at once. In .NET Core 8 you use AddSlidingWindowLimiter and set PermitLimit, Window, and SegmentsPerWindow to define how the window slides. I’ve used sliding windows for more even throttling, especially on search APIs where I want to avoid big surges right at each minute tick.
- Token Bucket Limiter: Token bucket is great for allowing short bursts while enforcing a long-term rate. Imagine a bucket that fills with tokens at a steady rate; each incoming request consumes a token. If the bucket has tokens, the request proceeds; if empty, the request is rejected or delayed until tokens replenish. The bucket capacity defines the maximum burst size. In .NET 8, AddTokenBucketLimiter is used. You configure TokenLimit (bucket capacity), TokensPerPeriod (tokens added each interval), and ReplenishmentPeriod (how often tokens are added). For example, a bucket might hold 20 tokens and add 5 every second. A login API could use this to allow a few quick retries but then impose a steady refill rate.
- Concurrency Limiter: It limits simultaneous requests instead of total requests over time. You set a maximum number of requests that can run at the same time. Each active request occupies one “slot”; when it finishes, the slot is freed. This is useful for heavy operations where you want to cap parallel executions (e.g. large data exports). In .NET 8, use AddConcurrencyLimiter with PermitLimit (max concurrent) and optional queue settings. Unlike other limiters, concurrency does not enforce a time-based cap – only concurrent count.
Each strategy is built into the .NET 8 rate limiting middleware. (There are also partitioned rate limiting options, where you can segment the limits per user or API key, but that’s an advanced topic. For most cases, the four above suffice.)
Fixed Window Limiter
The fixed window limiter is straightforward: count requests in chunks of time. For example, “5 requests per 60 seconds.” Every 60-second interval resets the count. This is like having a cookie jar of 5 cookies and refilling it every minute. If a client uses up the cookies early in the minute, any further requests until the next minute will be denied.
In .NET 8 you configure it like this (in your Program.cs for a minimal API or similar):
builder.Services.AddRateLimiter(options => options
    .AddFixedWindowLimiter("fixedPolicy", config =>
    {
        config.PermitLimit = 5;
        config.Window = TimeSpan.FromSeconds(60);
        config.QueueLimit = 2; // optional queuing
        config.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    }));
This sets up a policy named "fixedPolicy" allowing 5 requests per 60 seconds. (The extra queue settings let you buffer a couple more requests before they get a 429.) As the Microsoft docs put it, "A maximum of 5 requests per each 60-second window are allowed," after which the count resets.
So when should you use this strategy?
Fixed window is great for simple rate caps (like login attempts per minute). It does allow a client to make up to 10 requests (double the limit) in quick succession if 5 land at the end of one window and 5 at the start of the next; a sliding window or token bucket would handle that burst more smoothly. But fixed window is easy to reason about and works well when you're okay with that edge case.
Sliding Window Limiter
A sliding window limiter refines the fixed window by smoothing burstiness. It still enforces, say, “100 requests per minute,” but it divides that minute into segments. When the window moves forward, it “carries over” unused capacity from the oldest segment into the new window.
For example, imagine a 30-second window divided into three 10-second segments with a limit of 100 requests. If 50 requests were made 20 seconds ago (in the oldest segment), those 50 slots are freed up and added back as the window slides forward. The Microsoft docs include a helpful table that tracks the available permits in a sliding window over time.
Here's how I set it up:
builder.Services.AddRateLimiter(options => options
    .AddSlidingWindowLimiter("slidingPolicy", config =>
    {
        config.PermitLimit = 100;
        config.Window = TimeSpan.FromSeconds(30);
        config.SegmentsPerWindow = 3;
        config.QueueLimit = 2;
        config.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    }));
This gives the same 100 requests per 30 seconds, but smoother across segments. I’ve used sliding window for APIs where I want to avoid a scenario of clients clustering their requests at window boundaries. Sliding windows distribute the allowed calls more evenly, at the cost of slightly more complex bookkeeping.
Token Bucket Limiter
The token bucket algorithm is a common strategy when you want to allow short bursts but limit overall throughput. Conceptually, you have a bucket that refills at a steady rate. Clients consume tokens from the bucket per request. When the bucket is empty, further requests must wait (or get rejected) until tokens refill.
For example, consider a bucket with a capacity of 10 tokens. Every second, 2 tokens are added back (up to the capacity). A client can use up to 10 requests in a burst (consuming the tokens), but afterward they’ll have to slow down to 2 requests per second (matching the refill rate). Microsoft’s docs show a table of this behavior: tokens available, taken, added, and carried over every 10 seconds.
In .NET 8, configuring a token bucket limiter looks like this:
builder.Services.AddRateLimiter(options => options
    .AddTokenBucketLimiter("tokenPolicy", config =>
    {
        config.TokenLimit = 10;         // bucket capacity
        config.TokensPerPeriod = 2;     // tokens added per period
        config.ReplenishmentPeriod = TimeSpan.FromSeconds(1);
        config.AutoReplenishment = true;
        config.QueueLimit = 5;
        config.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    }));
This sets up a policy where 2 tokens are added each second up to a max of 10. You’d get up to 10 requests instantly if unused, but then throttle to 2/sec. The AutoReplenishment flag (set true) means .NET will automatically run a timer to refill tokens. In my experience, token buckets work well for endpoints where a slight burst is fine (e.g. chat messages or search queries), but you still need a sustainable average rate.
Concurrency Limiter
The concurrency limiter is different: it limits simultaneous requests, not requests per time window. Imagine a server that can handle at most 3 heavy operations at once. Once 3 requests are in progress, any additional requests will queue (or be rejected) until a slot frees. This is configured via AddConcurrencyLimiter:
builder.Services.AddRateLimiter(options => options
    .AddConcurrencyLimiter("concurrencyPolicy", config =>
    {
        config.PermitLimit = 3; // max concurrent requests
        config.QueueLimit = 5;
        config.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    }));
Each incoming request grabs one permit. When the request completes, the permit is released. Unlike the others, this does not enforce an overall requests-per-second cap – it only throttles parallelism.
I’ve used concurrency limiting on expensive endpoints (like large file imports) to prevent too many processes from running at once. It’s effectively a way to protect resources like CPU/memory by bounding parallel tasks.
Setting Up Rate Limiting in .NET 8 Minimal APIs
Now let’s put these strategies into practice with a step-by-step .NET 8 example. Suppose we’re creating a minimal API and want to add rate limiting to some endpoints. Here’s how you might do it in Program.cs:
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

// 1. Define rate limit policies:
builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.AddFixedWindowLimiter("fixedPolicy", config =>
    {
        config.PermitLimit = 5;
        config.Window = TimeSpan.FromSeconds(60);
    });

    options.AddSlidingWindowLimiter("slidingPolicy", config =>
    {
        config.PermitLimit = 10;
        config.Window = TimeSpan.FromSeconds(60);
        config.SegmentsPerWindow = 4;
    });

    options.AddTokenBucketLimiter("tokenPolicy", config =>
    {
        config.TokenLimit = 20;
        config.TokensPerPeriod = 5;
        config.ReplenishmentPeriod = TimeSpan.FromSeconds(1);
        config.AutoReplenishment = true;
    });

    options.AddConcurrencyLimiter("concurrencyPolicy", config =>
    {
        config.PermitLimit = 2;
        config.QueueLimit = 5;
    });
});

var app = builder.Build();

// 2. Insert the rate limiting middleware into the pipeline:
app.UseRateLimiter();

// 3. Define endpoints and apply limits:
app.MapGet("/", () => "Hello World")
   .RequireRateLimiting("fixedPolicy"); // only 5 calls/min

app.MapGet("/data", () => "Some data")
   .RequireRateLimiting("slidingPolicy"); // sliding limits

app.MapPost("/upload", () => Results.Ok())
   .RequireRateLimiting("tokenPolicy"); // token bucket limits

app.MapGet("/heavy", async () =>
{
    // simulate a heavy task
    await Task.Delay(5000);
    return "Done";
}).RequireRateLimiting("concurrencyPolicy"); // concurrency-limited

app.Run();
In this example:
- We call builder.Services.AddRateLimiter(…) to register the rate limiting service and define several named policies. (Note: In ASP.NET Core 8, calling AddRateLimiter is mandatory. If you forget it, .NET will throw an error telling you to add the service).
- We call app.UseRateLimiter() after builder.Build() to plug the rate limiter into the request pipeline. This middleware will automatically enforce the policies on requests.
- For each endpoint, we use .RequireRateLimiting(“<policyName>”) to attach a policy to that endpoint’s route. For example, our root endpoint (“/”) uses “fixedPolicy” so it’s limited to 5 requests per minute.
When a client exceeds a limit, the middleware returns a 429 Too Many Requests by default. (You can customize options.RejectionStatusCode as shown above if you want a different code.) You can also send a Retry-After header so the client knows when to try again; more on that below.
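If you want to return something friendlier than a bare 429 body, the rate limiter options expose an OnRejected callback. Here's a minimal sketch of how that might look (the JSON shape and message are my own illustration, not part of the official sample; MetadataName comes from System.Threading.RateLimiting):

builder.Services.AddRateLimiter(options =>
{
    options.OnRejected = async (context, cancellationToken) =>
    {
        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

        // Surface the limiter's retry hint to the client, if one was provided.
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString();
        }

        await context.HttpContext.Response.WriteAsJsonAsync(
            new { error = "Rate limit exceeded. Please retry later." },
            cancellationToken);
    };

    // ... policy registrations (AddFixedWindowLimiter, etc.) as shown above ...
});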
Pro Tip: Place app.UseRateLimiter() before your endpoints and after any middleware that shouldn’t bypass limits. For example, do it before UseAuthentication() or static file middleware if you want all traffic counted. In the example above, it’s right after app = builder.Build().
Using Rate Limiting with MVC Controllers
If you prefer the traditional controller pattern instead of minimal APIs, rate limiting works similarly. You still call AddRateLimiter in Program.cs, then you can either require policies on routes or use attributes in your controller code.
For instance, in Program.cs:
builder.Services.AddControllers();

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("fixedPolicy", config =>
    {
        config.PermitLimit = 10;
        config.Window = TimeSpan.FromSeconds(60);
    });

    options.AddSlidingWindowLimiter("slidingPolicy", config =>
    {
        config.PermitLimit = 20;
        config.Window = TimeSpan.FromSeconds(60);
        config.SegmentsPerWindow = 4;
    });
});
var app = builder.Build();

app.UseRouting();
app.UseRateLimiter(); // for endpoint-specific policies, call this after UseRouting

app.MapControllers()
   .RequireRateLimiting("fixedPolicy"); // apply fixedPolicy to all controllers by default

app.Run();
Then in your controller, you can override or refine policies using attributes. For example:
using Microsoft.AspNetCore.RateLimiting;

[ApiController]
[Route("[controller]")]
[EnableRateLimiting("fixedPolicy")] // Apply fixedPolicy to this controller by default
public class HomeController : ControllerBase
{
    [HttpGet("index")]
    public IActionResult Index() { ... }

    [HttpGet("privacy")]
    [EnableRateLimiting("slidingPolicy")] // Override with slidingPolicy for this action
    public IActionResult Privacy() { ... }

    [HttpGet("nolimit")]
    [DisableRateLimiting] // This action has no rate limit
    public IActionResult NoLimit() { ... }
}
In the snippet above, the whole HomeController uses “fixedPolicy” except for the Privacy action which explicitly uses “slidingPolicy”, and the NoLimit action which disables rate limiting altogether.
Microsoft’s docs show a similar pattern. This attribute-based approach can be useful when different controllers or actions need different policies. Alternatively, you can skip attributes and just apply .RequireRateLimiting(…) when mapping routes in Program.cs.
Configuring Rate Limiter in appsettings.json
Hard-coding limits in code is fine for demos, but in real apps you’ll want to adjust limits without recompiling. In ASP.NET Core, a common practice is to put rate limit settings in appsettings.json and bind them to options.
For example, in appsettings.json you might have:
"MyRateLimitOptions": {
"PermitLimit": 5,
"WindowSeconds": 60,
"SlidingPermitLimit": 10,
"SegmentsPerWindow": 4,
"TokenLimit": 20,
"ReplenishmentPeriodSeconds": 1,
"TokensPerPeriod": 5,
"AutoReplenishment": true,
"QueueLimit": 5
}
Then in your startup code:
builder.Services.Configure<MyRateLimitOptions>(
    builder.Configuration.GetSection("MyRateLimitOptions"));

var myOptions = builder.Configuration.GetSection("MyRateLimitOptions")
    .Get<MyRateLimitOptions>();

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("fixedPolicy", config =>
    {
        config.PermitLimit = myOptions.PermitLimit;
        config.Window = TimeSpan.FromSeconds(myOptions.WindowSeconds);
        config.QueueLimit = myOptions.QueueLimit;
    });

    options.AddSlidingWindowLimiter("slidingPolicy", config =>
    {
        config.PermitLimit = myOptions.SlidingPermitLimit;
        config.Window = TimeSpan.FromSeconds(myOptions.WindowSeconds);
        config.SegmentsPerWindow = myOptions.SegmentsPerWindow;
        config.QueueLimit = myOptions.QueueLimit;
    });

    // ... and so on for token and concurrency
});
Here we define a MyRateLimitOptions class to match the JSON (properties like PermitLimit, WindowSeconds, etc.). We bind it via Configure<T> and/or Bind(…) so that the values come from configuration. This way, you can change the limits (for example, between Development and Production environments) just by editing the JSON or environment variables, without touching code.
Microsoft’s sample demonstrates this pattern with builder.Services.Configure<MyRateLimitOptions>(configuration.GetSection(…)), which is exactly how you externalize rate limit settings.
Using dependency injection for the config also means you could inject IOptions<MyRateLimitOptions> into services or controllers if needed. In practice, I often do this: it’s very handy when I want to toggle limits or integrate with feature flags or a control panel.
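For completeness, here's a minimal sketch of the MyRateLimitOptions class those bindings assume; the property names simply mirror the JSON keys above, and the default values are my own placeholders:

public class MyRateLimitOptions
{
    public int PermitLimit { get; set; } = 5;
    public int WindowSeconds { get; set; } = 60;
    public int SlidingPermitLimit { get; set; } = 10;
    public int SegmentsPerWindow { get; set; } = 4;
    public int TokenLimit { get; set; } = 20;
    public int ReplenishmentPeriodSeconds { get; set; } = 1;
    public int TokensPerPeriod { get; set; } = 5;
    public bool AutoReplenishment { get; set; } = true;
    public int QueueLimit { get; set; } = 5;
}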
Real-World Use Cases
Rate limiting is not just theoretical – it solves concrete problems. Here are some situations where I (and many teams) apply it:
- Login or Auth Endpoints: Preventing brute-force attacks is a classic use case. For example, you might allow only 5 login attempts per minute per user/IP. A fixed window or token bucket limiter can block excessive retries. In one project, adding a simple rate limiter on the login route immediately cut down spam requests and lockouts (see the sketch after this list).
- Public/Open APIs: If you offer a public API (e.g. a free-tier data service), you often need to cap usage per API key. Sliding window limiters are good here to ensure developers don’t hog the service. Twitter and GitHub APIs famously enforce rate limits per user/IP. (In .NET, you could use RateLimitPartition to create per-user buckets if needed.)
- Search and Data APIs: For endpoints that can be called in bursts (like a product search or analytics query), a token bucket is useful. It lets clients burst occasionally (up to the bucket size), but then throttles to a steady rate. This way, casual spikes are allowed but no one can flood the service long-term.
- Heavy Background Jobs: If your API triggers heavy background processing (like image resizing or database reports), a concurrency limiter can help. For instance, AddConcurrencyLimiter with a small PermitLimit ensures only a few jobs run at once, preventing resource exhaustion.
- Fairness Across Clients: If your users belong to different tiers (free vs. premium) or you want to isolate tenants, you can combine rate limiting with partitioning by user ID or key. This ensures one user hitting their limit doesn’t affect others.
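To make the login scenario concrete, here's a minimal sketch of a strict fixed-window policy applied to a hypothetical /login endpoint (the endpoint, handler, and numbers are illustrative, not from a specific project):

using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    // Strict policy for authentication: 5 attempts per minute, rejected immediately when exceeded.
    options.AddFixedWindowLimiter("loginPolicy", config =>
    {
        config.PermitLimit = 5;
        config.Window = TimeSpan.FromMinutes(1);
        config.QueueLimit = 0;
    });
});

var app = builder.Build();
app.UseRateLimiter();

app.MapPost("/login", () => Results.Ok("login attempt accepted"))
   .RequireRateLimiting("loginPolicy");

app.Run();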
In short, anywhere too many calls can harm your service, you put rate limits. Over the years, I’ve integrated rate limiting into APIs for SaaS platforms, public endpoints, and internal microservices. It was always a relief when the first real attack or traffic spike occurred and our endpoints gracefully throttled, rather than crashed.
Visualizing the Rate Limiting Flow
It often helps to visualize how rate limiting fits into your system. Consider this simplified flow (similar to the earlier diagram):
- A client sends a request to your ASP.NET Core API.
- The Rate Limiting Middleware intercepts the request before it reaches your controllers.
- The middleware looks up the applicable policy (e.g. fixed, sliding, etc.) and checks the current usage count for that client (by IP, API key, or a global bucket).
- If under limit: It decrements or notes the usage and allows the request to proceed (call your endpoint).
- If over limit: It short-circuits and immediately returns a 429 response. Optionally it can include a Retry-After header so the client knows when the limit resets.
This flow ensures that your backend endpoints never see more requests than they should. The beauty of using the built-in ASP.NET Core 8 middleware is that it handles all this logic for you. You just define the rules and attach them to endpoints. Conceptually, the check runs per client (per user, per IP, or per user + endpoint):
- User X requests the API.
- The service (with rate limiting enabled) checks “Is User X at or above their limit?”
- If no, returns 200 OK normally.
- If yes, returns 429 Too Many Requests with a retry delay.
Understanding this helps you debug rate limiting issues (e.g. why a particular client is seeing 429’s) and properly test your configuration.
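When testing, I sometimes hammer an endpoint from a small throwaway console app and count the 429s. Here's a sketch of that idea, assuming the minimal API from earlier is running locally on https://localhost:5001 and "/" uses the 5-per-minute fixedPolicy (both assumptions, adjust to your setup):

using System.Net;

using var client = new HttpClient { BaseAddress = new Uri("https://localhost:5001") };

int ok = 0, throttled = 0;
for (var i = 0; i < 10; i++)
{
    var response = await client.GetAsync("/"); // endpoint limited to 5 requests per minute
    if (response.StatusCode == HttpStatusCode.TooManyRequests)
        throttled++;
    else
        ok++;
}

Console.WriteLine($"OK: {ok}, 429s: {throttled}"); // expect roughly 5 OK and 5 throttled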
Project Structure Example
Here’s how a small .NET 8 project using rate limiting might be organized (GitHub-style structure):
RateLimiterDemo/
├── Program.cs
├── appsettings.json
├── Models/
│ └── MyRateLimitOptions.cs // (optionally define your settings class here)
├── Controllers/
│ └── HomeController.cs // example API controller
└── Middleware/
└── (any custom middleware, though not needed for built-in rate limiting)
- Program.cs: Contains the code to set up AddRateLimiter(), call UseRateLimiter(), and map endpoints or controllers as shown above.
- appsettings.json: Holds the JSON configuration for rate limits (as shown in the previous section).
- Models/MyRateLimitOptions.cs: A class to bind your JSON config to a strongly-typed object.
- Controllers/ (or Endpoints): Your API logic. You apply [EnableRateLimiting(“policy”)] on controllers or use .RequireRateLimiting(“policy”) on routes inside Program.cs.
For example, Program.cs might look like the following condensed sketch; it simply mirrors the minimal API and configuration-binding examples from earlier, with a single fixed-window policy as a placeholder:
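using Microsoft.AspNetCore.RateLimiting;
using RateLimiterDemo.Models; // assuming the options class lives in Models/MyRateLimitOptions.cs

var builder = WebApplication.CreateBuilder(args);

// Bind limits from appsettings.json
var limits = builder.Configuration
    .GetSection("MyRateLimitOptions")
    .Get<MyRateLimitOptions>() ?? new MyRateLimitOptions();

builder.Services.AddControllers();
builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.AddFixedWindowLimiter("fixedPolicy", config =>
    {
        config.PermitLimit = limits.PermitLimit;
        config.Window = TimeSpan.FromSeconds(limits.WindowSeconds);
    });
});

var app = builder.Build();

app.UseRouting();
app.UseRateLimiter();

app.MapControllers()
   .RequireRateLimiting("fixedPolicy");

app.Run();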
The key takeaway is to have a clear setup: Program.cs to configure everything, appsettings.json for your limits, and your controllers/endpoints decorated or mapped with those limits.
Best Practices and Tips
Drawing from my experience and community guidance, here are some best practices when implementing rate limiting in .NET Core 8:
- Always Add the Service: Don’t forget builder.Services.AddRateLimiter() on startup. In ASP.NET Core 8 this is required (or you’ll get a clear error). I once migrated a project and spent time debugging an error that ultimately just reminded me to call AddRateLimiter.
- Order of Middleware: Call app.UseRateLimiter() early in the pipeline (before endpoints). You want to catch excessive requests before any heavy processing (or before static files). However, place it after anything that should bypass limits (e.g. a health check endpoint or status page, if needed).
- Use Configuration: As shown, move your numeric limits to config (appsettings or environment). This way you can tweak them per environment (dev vs prod) without redeploying code. I generally use builder.Configuration.GetSection(“MyRateLimitOptions”).Bind(options) or IOptions<MyRateLimitOptions>.
- Choose the Right Strategy: Think about your traffic patterns. For login endpoints, a fixed or token bucket limiter is usually enough. For APIs where fairness is crucial, sliding or partitioned limiters can help. It’s okay to mix: you can apply different policies to different routes or controllers.
- Testing & Monitoring: Before enabling in production, simulate traffic to test the behavior. Also, log or count the number of 429 responses so you can tune limits. ASP.NET Core 8’s limiter can emit metrics (if you hook into RateLimitMetrics), which I recommend doing if you have a monitoring system.
- Graceful Message: Customize the rejection response if needed. The middleware lets you specify the status code (default 429) and you can write custom messages. Personally, I return a friendly JSON telling developers their rate limit was exceeded and when to retry.
- Cache/Storage: The built-in rate limiter in .NET Core uses in-memory stores by default. For multi-instance or scaled-out scenarios, consider using a distributed store (Redis, etc.) via PartitionedRateLimiter or custom RateLimiter providers. This ensures limits are shared across servers. (If you have multiple API instances and only in-memory counting, each instance resets independently which can lead to effectively higher limits than intended.)
- Consider Client Identification: By default, rate limits are global or per-route. You'll usually want to partition by client key or IP. The ASP.NET Core middleware lets you use PartitionedRateLimiter.Create to apply limits per IP or per user ID. For example, PartitionedRateLimiter.Create<HttpContext, string>(ctx => RateLimitPartition.GetFixedWindowLimiter(ctx.Connection.RemoteIpAddress.ToString(), …)) would create a separate fixed window limiter for each IP address (see the sketch after this list).
- Queueing and Ordering: Be careful with queue sizes. The QueueLimit setting lets you buffer extra requests (it’s basically how many requests are allowed to wait when the limit is hit). Setting it too high might lead to lots of queued requests and potential delays. I usually keep it small (5 or 10). Also, QueueProcessingOrder (OldestFirst vs NewestFirst) determines which queued requests get through first; OldestFirst is typically fair.
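Here's a minimal sketch of that per-IP partitioning idea using a global limiter (the limits and the "unknown" fallback key are my own choices for illustration):

using System.Threading.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // Every request is partitioned by client IP; each IP gets its own fixed window.
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
        RateLimitPartition.GetFixedWindowLimiter(
            httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown", // partition key
            _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit = 100,
                Window = TimeSpan.FromMinutes(1),
                QueueLimit = 0
            }));
});

var app = builder.Build();
app.UseRateLimiter();

app.MapGet("/", () => "Hello World");

app.Run();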
Implementing rate limiting can involve a bit of tuning. In one project, I initially set a limit of 100 req/minute but found that in practice too many clients were hitting it. We monitored the rejections and slowly increased the limit while still protecting the backend. Having metrics and logs was critical there.
Common Mistakes to Avoid
When working with rate limiting in ASP.NET Core 8, watch out for these pitfalls:
- Forgetting AddRateLimiter(): As noted earlier, .NET 8 requires builder.Services.AddRateLimiter(). If omitted, you’ll see an error that services are missing. Always double-check that call.
- Incorrect Middleware Order: Placing UseRateLimiter() after endpoints or too late will mean it never actually intercepts requests. Make sure it’s before your MapGet/MapControllers calls.
- Not Applying the Policy: Defining a policy is not enough; you must attach it. Either use .RequireRateLimiting(“policyName”) on each route (or MapDefaultControllerRoute().RequireRateLimiting(…) for controllers) or use [EnableRateLimiting(“policyName”)] on controllers/actions. A common mistake is to define AddRateLimiter but forget to call RequireRateLimiting, which results in no limits being enforced at all (unless you called MapDefaultControllerRoute().RequireRateLimiting(…) globally).
- Multiple AddRateLimiter Calls: If you call AddRateLimiter multiple times (as the Microsoft docs do in some examples for readability), remember that each call adds another configuration delegate to the same RateLimiterOptions; the calls accumulate rather than replace each other. It's easier to reason about, and harder to get wrong, if you register all your .Add*Limiter() policies inside a single AddRateLimiter call, as shown earlier.
- Not Accounting for Clocks: In fixed windows, forgetting that windows reset on the server clock can cause odd behavior near UTC minute boundaries if your windows are aligned (or if servers have drift). Sliding and token bucket mitigate this, but be mindful of timezones if you generate user-visible “reset” times.
- Overly Aggressive Limits: It’s possible to shoot yourself in the foot by setting limits too low. Start lenient (perhaps log 429s at first) then tighten. If your app has occasional legitimate high traffic (like report generation), an overly strict limiter could break functionality.
- Ignoring Partitioning: By default, limits are usually global across all clients. If you want per-client fairness, implement partitioning by IP or API key. Otherwise, one “noisy neighbor” could exhaust the limit for everyone else.
- Missing Retry Headers: The middleware doesn't add a Retry-After header for you. If you want clients to know how long to wait, set it yourself in options.OnRejected using the lease's RetryAfter metadata (as in the rejection-handler sketch earlier). Otherwise, all your clients see is a generic 429.
By avoiding these mistakes, you’ll have a smoother rate limiting rollout. In my own projects, forgetting the UseRateLimiter() call or mixing up policy names have been the quickest ways I’ve tripped up – so I put reminders or comments in code to double-check them.
Conclusion
Rate limiting in ASP.NET Core 8 (aka .NET 8) is now easier than ever thanks to built-in middleware. In this blog, we covered what rate limiting in .NET 8 is and why it matters (protecting API security and performance), explored the main rate limiting algorithms (fixed window, sliding window, token bucket, concurrency) and how to configure them, and walked through example code for minimal APIs and MVC controllers. We also discussed external configuration, practical use cases (like throttling login attempts or public API access), and best practices from personal experience.
As a senior developer, I encourage you to apply these techniques in your next ASP.NET Core project. Start by deciding which endpoints need protection, choose appropriate limits, and monitor the results. Rate limiting is not just a “nice to have” – it’s essential for ASP.NET Core API security and reliability. By following the examples and tips here, you can prevent downtime, ensure fair usage, and keep your API running smoothly for all clients.
Resources for you: Microsoft's official ASP.NET Core rate limiting documentation.
FAQ
What is rate limiting?
Rate limiting (or request throttling) is a technique to control how many requests a client or user can make to your API in a given time frame. It helps prevent abuse and ensures fair access and consistent performance. For example, you might limit users to 5 requests per second on an endpoint. In .NET Core 8, you can implement this via built-in middleware with various algorithms (fixed window, token bucket, etc.).
How do I enable rate limiting in .NET Core 8?
Add the rate limiter services in Program.cs: call builder.Services.AddRateLimiter(…) to define your policies, then call app.UseRateLimiter() before mapping your endpoints. Finally, attach policies to endpoints using .RequireRateLimiting(“policyName”) or [EnableRateLimiting(“policyName”)]. Don’t forget AddRateLimiter() – .NET 8 will not activate rate limiting without it.
What’s the difference between fixed window and sliding window limiters?
A fixed window limiter counts requests in discrete intervals (e.g. per minute). It’s simple but can have bursts at window edges. A sliding window limiter divides the window into overlapping segments and carries unused quota forward, smoothing out bursts. Use sliding window when you want a more even request distribution; use fixed when simplicity is enough.
When should I use a token bucket limiter?
Use a token bucket limiter when you want to allow short bursts of traffic but enforce a sustained rate. It’s defined by a bucket capacity and a refill rate. For example, you might allow 20 quick requests if your bucket holds 20 tokens, but then only 5 more per second after that. This is great for scenarios like user actions or messaging where occasional bursts are normal.
How does a concurrency limiter work?
A concurrency limiter only controls simultaneous in-flight requests. You set a max number of concurrent tasks. When that limit is reached, additional requests queue up or fail until a slot frees. Unlike time-window limiters, this does not restrict the total requests over time – it just caps parallel execution. It’s useful for expensive operations (e.g. file processing) where you want to avoid too many running at once.
What happens when a client exceeds the rate limit?
By default, ASP.NET Core’s rate limiting middleware returns an HTTP 429 Too Many Requests status. You can customize this code in options if needed. It’s good practice to include a Retry-After header so the client knows how long to wait. For most APIs, 429 is the standard response indicating “slow down, you’ve hit the limit.”
Can I apply different limits per user or API key?
Yes. ASP.NET Core 8 supports partitioned rate limiting. You can define your policy with a partition key (like user ID or IP address) so each user gets their own bucket. This ensures one user’s behavior doesn’t affect others. For example, you can create a separate fixed-window limiter for each remote IP address. This is especially useful for multi-tenant or public APIs.
For any query, you can contact us