Cachnig of huge data - strategy and performance considerations

Thread is closed for posting
2 posts, 1 answers
  1. Zdeněk
    Zdeněk avatar
    27 posts
    Member since:
    May 2007

    Posted 07 Mar 2014 Link to this post

    I'm long time fan of Telerik OpenAccess/Data Access (TDA) and I used it frequently in many sucessfull projects. I like it!
    2nd level cache is very handy for usual tasks. But now I felt in problem with huge data. I'm preparing new ASP.NET/WebForms application - eCommerce store. Nothing unusual, core of application are three tables with n:m relation: Products (ProductId, Name...), ProductsInCategories (ProductId, CategoryId...) and Categories (CategoryId, ParentCategoryId, Name...).
    First problem is that there are over 100.000 rows in Products table - it means we have nearly million records in ProductsCategories table (every product can be in one or more categories). Second problem is that Categories are in tree structure and I want to use queries like "get all products from category XYZ and from all its childrens".
    Using standard ASP.NET caching and "POCO" objects is quick and easy solution but now I want to utilise TDA and its 2nd level cache. Are there some general recommendations for using TDA internal caching with huge data? I tried both approaches with about 500.000 records:
    1) I loaded data through TDA with LINQ, projected them into simple objects and IList of those objects cached. Caching took about 7 seconds and page working with those cached data is generated in about 10-20ms. Nice.
    2) Second approach: I load data with TDA and let its second level cache store them (I increased limit of storable objects in backend TDA configuration to 1M). Caching List of ProductsCategories took a little more time (about 10 seconds, it is still acceptable), but working with those data and page generation (same operations as before) took about 100 ms. It is too much :( Also I found some strange behavior: if you did not specify number of records you want to retrieve from database in first call (LINQs .Take), TDA did not cache any data. If you specify number of records (eg. .Take(1000000)) TDA will cache it. Strange.

    Can you reccomend me general approach for working with Telerik DataAccess with huge tables? Thank you!
  2. Answer
    Kaloyan Nikolov
    Kaloyan Nikolov avatar
    118 posts

    Posted 12 Mar 2014 Link to this post

    Hi Zdenek,

    The Telerik Data Access Second Level cache is doing some extra work for you like invalidating objects in case there are insert, delete or update operations that might influence the result of some select statements. Also the cache maintains the relation between a query, the query parameters and its result so the next time you execute the same query you will receive the cached objects and do not hit the database. Of course those operations will cost some extra time but I think in your scenario the cache performs a bit slower than expected.

    One of the following reasons could influence the cache performance:
    - If you have projections the results are not cached. The reason is that the result is anonymous type and we cannot compare them from different parts of your application. Also using a projection over a cached object will be slower than using directly the cached object
    - The queries you execute could fill up the cache too often and thus purge operations to happen too often.You could validate this by profiling the queries that are executed against the DB and play with the cache limit settings.
    - The result for queries that return too many objects and we consider will fill up the cache is not cached. If you specify Take it is cached even we think they will fill up the cache fast. This is why you see the difference in the behavior with and without Take() in your linq queries.

    It is perfectly fine to use your own cache implementation if you do not expect too many changes in the cached objects. It will most probably work faster because it will be designed just for you scenario. 

    Also you could profile the queries you execute and see which is the one that takes most of the time. If you can list those queries and will be happy to help you to improve them if possible.

    I hope this helps. 

    Kaloyan Nikolov
    OpenAccess ORM is now Telerik Data Access. For more information on the new names, please, check out the Telerik Product Map.
Back to Top