Common mistake and fix in Entity Framework

Entity Framework is an Object Relational Mapper (ORM) which is a type of tool that simplifies mapping between objects in your software and columns of relational database. Entity Framework implements the IQueryable interface to create dynamic sql queries for many RDBMS like MySQL, MSSQL, POSTGRES. It is commonly used in implementing the Repository Pattern.

However, I often find developers use it incorrectly. It is mainly because working with IQueryable is very similar to working with the more common IEnumerable type. Both types have same extension methods like Where, First, Select, etc. that look the same but work very differently under the hood.

entityframework_common_mistake

Incorrect usage will result in bugs that might cause huge performance issues down the road. This is especially true when deploying to Production — with huge data — after testing on an environment where there is limited data.

To demonstrate this, let us take a hypothetical example of a Catalog Application. This application needs to create, remove, and query products. To support this, you design a ProductRepository class that uses Entity Framework to perform CRUD operations.

public class ProductRepository {
    private readonly DbContext _context;

    public ProductRepository(DbContext context){
        _context = context;
    }

    public IEnumerable<Product> All(){
        return _context.Products.AsEnumerable();
    }
    public Product GetById(int id){
        return _context.Products.Find(id);
    }
    public void Remove(Product product){
        _context.Products.Remove(product);
        _context.SaveChanges();
    }
}

Problem

The above code is deployed, and everything is working as expected. A few weeks later, you are asked to introduce another feature that allows filtering of Products using its Title. To achieve this, you add the below code snippet to the ProductRepository.

...
public IEnumerable<Product> FilterByTitle(string title){
    return All().Where(e=>e.Title.BeginsWith(title));
}

The code is tested, and everything seems to be working correctly and marked ready for deployment. Once you deploy this to production, problems begin to emerge. After running for a while, your application consumes a huge amount of memory, slowing the system down. Why does this happen?

Analysis

To understand this, you need to understand the difference between IQueryable and IEnumerable. The main difference between them is where the filter logic is executed. IEnumerable executes on the client side (in memory) whereas IQueryable executes on the database.

In the snippet that was deployed to production, we are invoking the Where() method after invoking the All() method. By invoking the All() method, we convert the result set to IEnumerable, which works in memory. What is essentially happening is that all the Products are firstly loaded into memory from the database. Only then is the required filtering done in memory. Millions of products in Production gets loaded into memory causing the high usage that slows the application down. In some cases, it might also lead to application crashes.

Imagine your wife asks you to get some eggs from the store. To get the eggs, you buy everything in the store, just so that you could filter the eggs to give to your wife. That is what the above code is comparable to. Besides being extremely wasteful, it is also stupid and expensive.

Solution

The best way to accomplish the filtering requirement is to use the IQueryable Where method. To use that, we just skip the AsEnumerable call.

public class ProductRepository {
    private readonly DbContext _context;

    public ProductRepository(DbContext context){
        _context = context;
    }
    ...
    public IEnumerable<Product> FilterByTitle(string title){
        return _context.Products.Where(p=>p.Title.BeginsWith(title)).AsEnumerable();
    }
}

In the above code snippet, note that we skipped the call to the All() that returns an IEnumerable. Instead we use the Where method of the IQueryable. This invokes a sql command to the underlying database that does the filtering and only returns the products that match the criteria, significantly improving the performance as no unnecessary products are loaded into memory.

Takeaway

I hope after reading this, developers understand the difference IEnumerable and IQueryable, and also avoid this common mistake when using Entity Framework.

Kshitiz

Kshitiz Shakya

Principal Software Engineer at LIS Nepal