
Efficient queries are essential to the performance and scalability of APIs that handle large volumes of data. In this post, we will explore best practices and strategies for optimizing backend queries using ASP.NET Core.

In backend applications that consume relational databases, the way queries are structured can be crucial to application performance. Inefficient queries, excessive data loading and the lack of appropriate filters are some of the pitfalls that can compromise performance, especially in APIs exposed to high access volumes.

In this post, we will cover techniques for implementing efficient queries, such as projecting only the necessary properties, eliminating looped queries, using cursor-based pagination, applying dynamic filters and more.

🔍 The Importance of Optimizing Data Searches in Web APIs

Efficient data processing is one of the main challenges faced by backend developers today. With the growing demand for data-driven applications, it’s become common to handle requests involving large volumes of information, requiring solutions that balance performance, scalability and usability.

Scenarios such as generating analytical reports, building real-time dashboards and managing large product catalogs in marketplaces are just a few examples where backend performance can become a bottleneck if not properly addressed.

In this post, we’ll explore some performance-improving resources and strategies we can adopt when creating web APIs that will process large volumes of data.

All code examples covered in this post are available in this GitHub repository: TurboNote Source Code.

1. 🎯 Projecting Only the Necessary Properties

Using Entity Framework Core (EF Core) to query entity instances is very useful because it abstracts away the complexity of database interactions, allowing the use of strongly typed objects instead of raw SQL commands. However, it’s necessary to consider what data should be extracted in queries to optimize them.

Consider the example below. Although this code only requests the Slug property of each Post, the entire entity is retrieved. Consequently, unnecessary columns are transferred from the database during the query.

await foreach (var post in db.Posts.AsAsyncEnumerable())
{
    post.Slug.Validate();
}
The generated SQL retrieves every column:

SELECT [p].[PostId], [p].[AuthorId], [p].[CategoryId], [p].[Content], [p].[CreatedAt], [p].[PublishedAt], [p].[Slug], [p].[Title], [p].[UpdatedAt]
FROM [Posts] AS [p]

It is possible to optimize this query with the Select extension method by telling EF which columns to project:

await foreach (var postSlug in db.Posts.Select(p => p.Slug).AsAsyncEnumerable())
{
    postSlug.Validate();
}

Now the resulting SQL retrieves only the desired column:

SELECT [p].[Slug]
FROM [Posts] AS [p]
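Projection isn't limited to a single column. When an endpoint needs a handful of fields, we can project into a lightweight DTO so only those columns travel over the wire. A minimal sketch, assuming a hypothetical PostSummaryDto type (not part of the sample project):

// Hypothetical DTO holding only the columns a listing page needs.
public record PostSummaryDto(string Title, string Slug, DateTime? PublishedAt);

// Only Title, Slug and PublishedAt appear in the generated SELECT.
var summaries = await db.Posts
    .Select(p => new PostSummaryDto(p.Title, p.Slug, p.PublishedAt))
    .ToListAsync();

EF Core translates the constructor call into a projection, so the generated SELECT lists only the three columns.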

2. 🔄️ Eliminating Looped Queries

The problem known as N+1 queries is common when using EF: code iterates over a list and issues a new query for each item.

Below is an example that uses multiple queries:

var posts = await db.Posts.ToListAsync();

foreach (var post in posts)
{
    var comments = await db.Comments
        .Where(c => c.PostId == post.PostId)
        .ToListAsync();

    post.Comments = comments;
}
For each post in the loop, EF issues a separate query:

SELECT [c].[CommentId], [c].[AuthorId], [c].[Content], [c].[CreatedAt], [c].[PostId]
FROM [Comments] AS [c]
WHERE [c].[PostId] = @__post_PostId_0

SELECT [c].[CommentId], [c].[AuthorId], [c].[Content], [c].[CreatedAt], [c].[PostId]
FROM [Comments] AS [c]
WHERE [c].[PostId] = @__post_PostId_0

(...successive queries)

If there are 100 posts, the code will do 1 + 100 queries (1 for the posts and 1 for each group of comments).

Instead, we can do the following:

var posts = await db.Posts.Include(p => p.Comments).ToListAsync();

Which would result in the following SQL:

SELECT [p].[PostId], [p].[AuthorId], [p].[CategoryId], [p].[Content], [p].[CreatedAt], [p].[PublishedAt], [p].[Slug], [p].[Title], [p].[UpdatedAt], [c].[CommentId], [c].[AuthorId], [c].[Content], [c].[CreatedAt], [c].[PostId]
FROM [Posts] AS [p]
LEFT JOIN [Comments] AS [c] ON [p].[PostId] = [c].[PostId]
ORDER BY [p].[PostId]

Note that when using .Include(p => p.Comments), EF already brings the comments related to each post in a single query with LEFT JOIN, avoiding the N+1 problem.
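One caveat: a single JOIN repeats each post's columns once per comment, which can inflate the result set when posts have many comments. Since EF Core 5, the same Include can be split into one query per collection while still avoiding N+1; a short sketch:

// Two queries total (posts, then all their comments), regardless of post count.
var posts = await db.Posts
    .Include(p => p.Comments)
    .AsSplitQuery()
    .ToListAsync();

Whether a single JOIN or a split query is faster depends on the data shape, so it's worth measuring both.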

3. 🚀 Using Cursor-based Pagination

Cursor-based pagination is a pagination technique where results are fetched from a marker (the cursor), usually represented by an item’s unique key, such as an ID or a creation date.

It works as follows: a single reference value, the cursor, marks where the previous page ended, and the next records are retrieved starting from that point.

Instead of using a page number or an offset as in the OFFSET method, the client sends this cursor in the request (?after=105, for example), and the server returns the records that come after that value, limiting the number with LIMIT or Take. This method is more efficient and reliable, especially in large databases or those that undergo frequent changes, as it avoids the problems of duplicate or lost records that can occur with offset-based pagination.

One point to note is that, although more efficient than the OFFSET method, cursor-based pagination does not allow for direct jumps to a specific page; it requires sequential navigation.

The code below demonstrates how to implement cursor-based pagination:

app.MapGet("/posts/pagination", async (
    [FromQuery] int? after,
    [FromQuery] int pageSize,
    BlogDbContext db) =>
{
    const int MaxPageSize = 100;
    pageSize = pageSize > MaxPageSize ? MaxPageSize : pageSize;

    var query = db.Posts
        .AsNoTracking()
        .OrderBy(p => p.PostId)
        .Where(p => !after.HasValue || p.PostId > after.Value)
        .Take(pageSize + 1); // Get 1 more to know if there is a next page

    var posts = await query.ToListAsync();

    var hasNextPage = posts.Count > pageSize;
    var results = posts.Take(pageSize).ToList();

    var nextCursor = hasNextPage ? results.Last().PostId : (int?)null;

    return Results.Ok(new
    {
        Data = results.Select(p => new
        {
            p.PostId,
            p.Title,
            p.Slug,
            p.PublishedAt
        }),
        NextCursor = nextCursor
    });
});

If we execute the GET endpoint /posts/pagination?pageSize=10&after=50, EF will generate the following SQL:

SELECT TOP(11) [p].[PostId], [p].[AuthorId], [p].[CategoryId], [p].[Content], [p].[CreatedAt], [p].[PublishedAt], [p].[Slug], [p].[Title], [p].[UpdatedAt]
FROM [Posts] AS [p]
WHERE [p].[PostId] > 50
ORDER BY [p].[PostId];

Note that here we define the after parameter, which acts as a cursor, allowing the client to request the next records from a specific PostId.

Also note that one more item than the requested pageSize is fetched (Take(pageSize + 1)), which lets the server know whether there are more pages without issuing an additional count query. This extra item is not returned to the client; it only determines whether the NextCursor field is populated with the PostId of the last visible item, enabling sequential navigation.

Finally, we use AsNoTracking() to improve performance by preventing EF Core from tracking the returned objects, something unnecessary in read-only scenarios.
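On the consumer side, navigation is simply a matter of following NextCursor until it comes back null. A minimal client sketch, assuming hypothetical PostItem/PostPage types that mirror the endpoint's response shape and a local base address:

using System.Net.Http.Json;

var client = new HttpClient { BaseAddress = new Uri("https://localhost:5001") };
int? cursor = null;

do
{
    // Append the cursor from the previous page, if any.
    var url = $"/posts/pagination?pageSize=10{(cursor is null ? "" : $"&after={cursor}")}";
    var page = await client.GetFromJsonAsync<PostPage>(url);

    foreach (var post in page!.Data)
        Console.WriteLine($"{post.PostId}: {post.Title}");

    cursor = page.NextCursor; // null when there is no next page
} while (cursor is not null);

// Hypothetical types mirroring the endpoint's response.
public record PostItem(int PostId, string Title, string Slug, DateTime? PublishedAt);
public record PostPage(List<PostItem> Data, int? NextCursor);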

4. 🏓 Using Dynamic Filters with IQueryable

A common mistake in database queries is loading all records into memory and only then applying filters. This approach is inefficient, as it consumes unnecessary resources and impacts application performance.

Instead, it’s recommended to apply filters directly to the query. One way to do this is through the IQueryable interface, which allows the filter to be translated into SQL and executed in the database, returning only the data actually needed.

Note the example below:

// Inefficient: loads everything from the database
var allPosts = await db.Posts.ToListAsync();

// Filtering in memory (LINQ to Objects)
var filteredPosts = allPosts;

if (published == false)
{
    filteredPosts = filteredPosts.Where(p => p.PublishedAt == null).ToList();
}

if (categoryId.HasValue)
{
    filteredPosts = filteredPosts.Where(p => p.CategoryId == categoryId.Value).ToList();
}

filteredPosts = filteredPosts
    .OrderByDescending(p => p.PublishedAt)
    .ToList();

Here we load all the data into memory and only then filter it.

An alternative would be to use IQueryable, dynamically assembling the filter and finally applying the query to the database:

IQueryable<Post> query = db.Posts;

// Filter applied to the SQL query
if (published == false)
{
    query = query.Where(p => p.PublishedAt == null);
}

if (categoryId.HasValue)
{
    query = query.Where(p => p.CategoryId == categoryId.Value);
}

// Sorting applied in the SQL query
query = query.OrderByDescending(p => p.PublishedAt);

// Execute the query in the database
var posts = await query.ToListAsync();

Assuming categoryId = 3 and published = false, the following SQL will be generated:

SELECT [p].[PostId], [p].[AuthorId], [p].[CategoryId], [p].[Content],
       [p].[CreatedAt], [p].[PublishedAt], [p].[Slug], [p].[Title], [p].[UpdatedAt]
FROM [Posts] AS [p]
WHERE [p].[PublishedAt] IS NULL AND [p].[CategoryId] = 3
ORDER BY [p].[PublishedAt] DESC

This way, a more efficient query is executed than the first example, as it filters the records we need directly in SQL, without the need to work with the data in memory.
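This conditional-filter pattern repeats often enough that it is worth factoring into a small extension method. A sketch of one common way to do it (WhereIf is a naming convention, not a built-in EF Core API):

using System.Linq.Expressions;

public static class QueryableExtensions
{
    // Applies the predicate only when the condition is true,
    // keeping the whole chain translatable to SQL.
    public static IQueryable<T> WhereIf<T>(
        this IQueryable<T> query, bool condition, Expression<Func<T, bool>> predicate)
        => condition ? query.Where(predicate) : query;
}

// Usage: the filters still compose into a single SQL query.
var posts = await db.Posts
    .WhereIf(published == false, p => p.PublishedAt == null)
    .WhereIf(categoryId.HasValue, p => p.CategoryId == categoryId.Value)
    .OrderByDescending(p => p.PublishedAt)
    .ToListAsync();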

5. 🙅 Do Not Transfer Unnecessary Functions to the Query

Sometimes we need to transform the data we are working with. A common example is converting text to uppercase or lowercase, which in C# can be done using the ToLower() and ToUpper() methods. However, if we're not careful, these functions end up pushed into the query itself and compromise performance. Note the example below:

var user = db.Users.Where(u => u.UserName.ToUpper() == userName.ToUpper());

This code will generate the following SQL query:

SELECT [u].[UserId], [u].[CreatedAt], [u].[Email], [u].[UserName]
FROM [Users] AS [u]
WHERE UPPER([u].[UserName]) = @__ToUpper_0

The problem here is that the UPPER function was transferred to the query. It runs against every row scanned and, because the function is applied to the column itself, it prevents the database from using an index on UserName, which can seriously hurt performance on large tables.

An alternative in this case is to rely solely on the collation of the UserName column. If the column uses a case-insensitive collation (such as SQL_Latin1_General_CP1_CI_AS in SQL Server), then no transformation is necessary:

var user = db.Users.Where(u => u.UserName == userName);

The generated SQL will be:

SELECT [u].[UserId], [u].[CreatedAt], [u].[Email], [u].[UserName]
FROM [Users] AS [u]
WHERE [u].[UserName] = @__userName_0

Therefore, it is not necessary to use ToUpper(); SQL Server itself will perform the conversion using the SQL_Latin1_General_CP1_CI_AS collation. In this case, the name “john” will be the same as “John” and “JOHN.”

To check if the column has the SQL_Latin1_General_CP1_CI_AS collation, simply run the query:

SELECT COLUMN_NAME, COLLATION_NAME 
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE TABLE_NAME = 'Users' AND COLUMN_NAME = 'UserName';

If it has the collation, it will return:

COLUMN_NAME    COLLATION_NAME
UserName       SQL_Latin1_General_CP1_CI_AS
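If you control the schema through EF Core migrations, the collation can also be declared in the model itself (EF Core 5 and later), which keeps the decision visible in code. A sketch, assuming a typical OnModelCreating in BlogDbContext:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Case-insensitive collation so u.UserName == userName
    // ignores casing without any ToUpper()/ToLower() calls.
    modelBuilder.Entity<User>()
        .Property(u => u.UserName)
        .UseCollation("SQL_Latin1_General_CP1_CI_AS");
}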

6. 📦 Reduce the Number of Queries with Caching

Imagine an endpoint that returns post categories. They rarely change but are accessed thousands of times a day. In this case, querying the database every time is a waste of resources. To address this, we can use caching strategies, such as the local in-memory cache (IMemoryCache).

Note the example below:

app.MapGet("/posts/categories", async (
    [FromServices] IMemoryCache cache,
    [FromServices] BlogDbContext db) =>
{
    const string cacheKey = "categories_list";

    if (!cache.TryGetValue(cacheKey, out List<Category> categories))
    {
        categories = await db.Categories
            .OrderBy(c => c.Name)
            .ToListAsync();

        var cacheOptions = new MemoryCacheEntryOptions()
            .SetSlidingExpiration(TimeSpan.FromMinutes(10))
            .SetAbsoluteExpiration(TimeSpan.FromHours(1));

        cache.Set(cacheKey, categories, cacheOptions);
    }

    return Results.Ok(categories);
});

Instead of hitting the database on every request, the endpoint first checks whether the data is present in the cache. If it isn't, the query is executed and the result is stored temporarily in the in-memory cache, so subsequent requests are served directly from it.

Using a caching strategy reduces unnecessary database accesses, resulting in improved performance and enabling scalability.
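Two practical details worth remembering, sketched below: IMemoryCache must be registered at startup to be injectable, and if categories can ever change, the cached entry should be evicted on writes so readers don't see stale data:

// Program.cs: make IMemoryCache available for injection.
builder.Services.AddMemoryCache();

// In a hypothetical endpoint that creates or updates a category:
// drop the cached list so the next read repopulates it.
cache.Remove("categories_list");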

7. ✨ Using Optimized Raw SQL

Let’s analyze a situation where writing a manual SQL query (raw SQL) is more advantageous than using LINQ alone.

Imagine an endpoint that displays a blog summary through a dashboard with aggregated statistics. This dashboard should display: total posts, total published posts, total comments and last published date.

Using the LINQ approach and in-memory manipulation, we could do the following:

var totalPosts = await db.Posts.CountAsync();

var publishedPosts = await db.Posts.CountAsync(p => p.PublishedAt != null);

var totalComments = await db.Comments.CountAsync();

var lastPublished = await db.Posts
    .Where(p => p.PublishedAt != null)
    .MaxAsync(p => p.PublishedAt);

return Results.Ok(new
{
    totalPosts,
    publishedPosts,
    totalComments,
    lastPublished
});

Which will generate the following SQL:

SELECT COUNT(*)
FROM [Posts] AS [p]

SELECT COUNT(*)
FROM [Posts] AS [p]
WHERE [p].[PublishedAt] IS NOT NULL

SELECT COUNT(*)
FROM [Comments] AS [c]

SELECT MAX([p].[PublishedAt])
FROM [Posts] AS [p]
WHERE [p].[PublishedAt] IS NOT NULL

The problem with this approach is that four separate queries are executed, meaning four round trips to the database, which can cause overhead in high-load environments. Furthermore, the database optimizer sees each query in isolation and cannot plan them as a whole.

A more performant approach would be to create a single query using raw SQL:

var result = await db
    .DashboardStats
    .FromSqlRaw(@"
        SELECT
            (SELECT COUNT(*) FROM Posts) AS TotalPosts,
            (SELECT COUNT(*) FROM Posts WHERE PublishedAt IS NOT NULL) AS PublishedPosts,
            (SELECT COUNT(*) FROM Comments) AS TotalComments,
            (SELECT MAX(PublishedAt) FROM Posts WHERE PublishedAt IS NOT NULL) AS LastPublished
    ")
    .AsNoTracking()
    .FirstAsync();

return Results.Ok(result);

Note that in this approach, we use raw SQL to hand the database a single custom query that returns all the results at once: one round trip instead of four, and a single statement the query optimizer can plan as a whole.

The generated SQL will look like this:

SELECT TOP(1) [t].[LastPublished], [t].[PublishedPosts], [t].[TotalComments], [t].[TotalPosts]
FROM (
    SELECT
        (SELECT COUNT(*) FROM Posts) AS TotalPosts,
        (SELECT COUNT(*) FROM Posts WHERE PublishedAt IS NOT NULL) AS PublishedPosts,
        (SELECT COUNT(*) FROM Comments) AS TotalComments,
        (SELECT MAX(PublishedAt) FROM Posts WHERE PublishedAt IS NOT NULL) AS LastPublished
) AS [t]
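One detail worth noting: FromSqlRaw on db.DashboardStats requires DashboardStats to be mapped as a keyless entity whose property names match the column aliases in the SQL. A sketch of how that mapping might look (the actual sample project may differ):

public class DashboardStats
{
    public int TotalPosts { get; set; }
    public int PublishedPosts { get; set; }
    public int TotalComments { get; set; }
    public DateTime? LastPublished { get; set; }
}

// In BlogDbContext:
public DbSet<DashboardStats> DashboardStats => Set<DashboardStats>();

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Keyless and not mapped to any table: queried via raw SQL only.
    modelBuilder.Entity<DashboardStats>().HasNoKey().ToView(null);
}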

🌱 Conclusion

Good performance is a prerequisite for a good API. To achieve it, it's essential to adopt practices that optimize data access and querying. Throughout this post, we've explored seven tips on how to make the most of ASP.NET Core's features and the database, reducing bottlenecks and improving the efficiency of backend applications.

I hope these tips help you develop even faster and more scalable APIs capable of efficiently handling large volumes of data. See you next time!


About the Author

Assis Zang

Assis Zang is a software developer from Brazil, developing in the .NET platform since 2017. In his free time, he enjoys playing video games and reading good books. You can follow him at: LinkedIn and Github.
