Best strategy for bulk insert and ignoring duplicate key exceptions

4 posts, 0 answers
  1. Mathieu
    Mathieu avatar
    21 posts
    Member since:
    Feb 2014

    Posted 08 Apr 2015 Link to this post

    Hey,

     I'm using Telerik Data Access and have a use case where I need to bulk insert thousands of rows (as part of XML parsing logic) into a SQL server database table.

     The records I'm inserting do not have a natural primary key, so I created a surrogate (simple autoincrement int). I do however have declared a unique key index on the table (spanning about 6 columns) to determine as much as possible the uniqueness of the record.

     Now I want to do the following with Telerik Data Access:

    1) Insert thousands of records as fast as possible

    2) Whenever a DuplicateKeyException is detected, ignore that entity and move on to the next one.

     

    What I currently have is a foreach looping over all entities to save, and then call a "Save" method (all using the same context).

    Inside the save method I "try" to add the entity to the context, call savechanges and then "catch" for DuplicateKeyException, but I'm not sure what to write in the catch to "ignore" the exception and remove the entity from the context. It seems that if I just catch the exception, all subsequent calls to savechanges rethrow the same exception as the entity is kept into the context.

     

    Any advise on this ?

    Any other approaches you would recommend ?

     

    thanks

    Mathieu

  2. Boyan
    Admin
    Boyan avatar
    100 posts

    Posted 09 Apr 2015 Link to this post

    Hi Mathieu,

    While similar approach to the one you describe is also possible given the requirements and set-up you provided it seems that there could be a more appropriate way. To achieve such bulk insert you could take advantage of the Telerik Data Access ADO API to execute a custom SQL statement and enable the IGNORE_DUP_KEY=ON setting of the SQL Server (please find more information on it from this MSDN article). This setting could be specified on a unique index and would make any inserted values that violate the index be skipped without causing any errors.
    Using the ADO API you could construct and execute an insert statement inserting a number of rows that best matches your requirements. Please refer to this documentation article that demonstrates how a parameterized query could be executed through the ADO API.
    Is such approach feasible in your situation?

    Please note that an exception (though a different one) would still occur if an unique index if violated through the context api (adding an entity to the context) as the expected number of objects to be inserted would be different with those actually inserted.  

    I hope this is helpful. Should you need any further assistance or have any more questions do not hesitate to get back to us.

    Regards,
    Boyan
    Telerik
     
    OpenAccess ORM is now Telerik Data Access. For more information on the new names, please, check out the Telerik Product Map.
     
  3. DevCraft banner
  4. Mathieu
    Mathieu avatar
    21 posts
    Member since:
    Feb 2014

    Posted 10 Apr 2015 in reply to Boyan Link to this post

    Dear Boyan,

     thank you for your reply. For the time being, the workaround I did was to create an INSTEAD OF INSERT trigger on the affected table, and inside the trigger check if the inserted record already exists, and only effectively inserting it when it does not.

     But your solution seems very viable too and might be better performance wise. I however would have liked to avoid having to dive into the low level ADO API and wondered if there was no elegant solution with the standard Telerik Data Access API ?

     

    thanks

  5. Boyan
    Admin
    Boyan avatar
    100 posts

    Posted 14 Apr 2015 Link to this post

    Hi Mathieu,

    This is a viable solution as well. Telerik Data Access does not provide such functionality out of the box though. In order to achieve this with the context API you would need to implement such custom bulk insert logic.

    What you could try do to is to insert the objects in groups of given number of items. Then if a group fails you could try and identify which item caused it and react on it by removing it from the originally inserted group of items. Unfortunately you would need to identify the problematic item based on the exception.
    In such scenario Telerik Data Access would provide transport batching for each group of insert statements. You could see how you to modify those settings from this documentation article

    This is why I would recommend you to use one of previously discussed approaches using SQL Server functionality to aid you.

    Do not hesitate to get back to us with any more questions.

    Regards,
    Boyan
    Telerik
     
    OpenAccess ORM is now Telerik Data Access. For more information on the new names, please, check out the Telerik Product Map.
     
Back to Top