Most memory efficient way to convert csv to excel in .NET Core API

1 Answer 50 Views

SpreadProcessing

Martin Herløv asked on 02 Nov 2023, 12:02 PM

I have data like this in CLOB field in the database. When a user downloads the data, it's converted to an Excel file using OpenXML.

Customer name¤Active¤customer id¤agreement name¤agreement id¤fund isin¤fund id¤rate active¤agreement active¤agreement vp key type¤agreement vp number¤rate type¤rate commision

I am converting the site so it can run in Kubernetes using the Blazor UI components. Today it's an old MVC site.

So how can I convert all the lines of code to an Excel sheet without using too much memory?

Each CLOB is around 32mb.

1 Answer, 1 is accepted

answered on 02 Nov 2023, 12:36 PM

Hello Martin,

Our SpreadStreamProcessing library is created specifically for efficiently processing XLSX and CSV documents. It allows you to create or read big spreadsheet documents with great performance and minimal memory footprint by writing the spreadsheet content directly to a stream without creating and preserving the document model in memory. Each time an exporter object is disposed of, the set values are written into the stream which allows you to create large documents with excellent performance.

To get started you can check out the following articles:

I hope this helps. Let me know if you have any further questions.

Regards,
Yoan
Progress Telerik

Love the Telerik and Kendo UI products and believe more people should try them? Invite a fellow developer to become a Progress customer and each of you can get a $50 Amazon gift voucher.

Martin Herløv

commented on 02 Nov 2023, 12:56 PM

I am aware of the components. Do you have an example creating a spreadsheet from a list of properties?

Martin Herløv

commented on 05 Nov 2023, 08:11 PM

I read the documentation and looked at the git repo. Not very helpful

Why don't you have an example of exporting some business data? same process as it was from a grid?

I have a list of persons. How do I stream them as an excel sheet?

Martin Herløv

commented on 05 Nov 2023, 08:14 PM | edited

I created a solution using CsvHelper.Excel It uses a lot of memory but works. with very little code

Yoan

commented on 07 Nov 2023, 09:33 AM

Hello, Martin

I am glad to hear that you have managed to find a solution on your own. Thank you for bringing this to our attention, but you are also correct that the examples in our documentation cover very few scenarios and I can assure you the matter will be looked into.

That said, if you are willing to share with us some sample business data that you are using, I can try to create a custom implementation that fits your scenario. Bare in mind that this is a Q&A Forum and anything you share will be made public so make sure you remove any sensitive data or replace it with some dummy entries instead.

I remain at your disposal if any assistance is required.

Regards,

Yoan

Martin Herløv

commented on 07 Nov 2023, 12:01 PM

Thanks, I would really appreciate that.

I read a about 130.000 lines from a CLOB in the database having this format:
It's legacy code. And not the best implementation.

Customer name¤Active¤customer id¤agreement name¤agreement id¤fund isin¤fund id¤rate active¤agreement active¤agreement vp key type¤agreement vp number¤rate type¤rate commision
Maj Bank A/S¤True¤66ff3dd7-d747-e511-801a-0050569702cb¤Aftale om markedsføring, distribution og investeringsrådgivning¤d773fdff-a962-e511-a49a-0050569702cb¤DK0060516854¤1013¤True¤True¤CD_IDENT¤07574¤GuaranteeComm¤0
Maj Bank A/S¤True¤66ff3dd7-d747-e511-801a-0050569702cb¤Aftale om markedsføring, distribution og investeringsrådgivning¤d773fdff-a962-e511-a49a-0050569702cb¤DK0060516854¤1013¤True¤True¤CD_IDENT¤07574¤PurchaseComm¤0
Maj Bank A/S¤True¤66ff3dd7-d747-e511-801a-0050569702cb¤Aftale om markedsføring, distribution og investeringsrådgivning¤d773fdff-a962-e511-a49a-0050569702cb¤DK0060516854¤1013¤True¤True¤CD_IDENT¤07574¤StockComm¤0,75
Maj Bank A/S¤True¤66ff3dd7-d747-e511-801a-0050569702cb¤Aftale om markedsføring, distribution og investeringsrådgivning¤d773fdff-a962-e511-a49a-0050569702cb¤DK0060622967¤1042¤True¤True¤CD_IDENT¤07574¤GuaranteeComm¤0

So first I convert it

This is the current code:

    [AllowAnonymous]
    [HttpGet("DownloadFile")]
    public async Task<ActionResult> DownloadFile(string token)
    {
        var id = _protector.Unprotect(token);
        long.TryParse(id, out var errorId);
        if (errorId == 0)
        {
            return NotFound();
        }

        var attachment = await _dbContext.ProshareErrors.FindAsync(errorId);
        if (attachment == null)
        {
            return NotFound();
        }

        var filename = $"ProshareErrors_{attachment.Stamp:dd-MM-yyyy}_{attachment.Id}.xlsx";
        var contentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
        
        var agreements = CsvToExcelHelpers.ReadCsvFile(attachment.UploadedData);
        var excelStream = CsvToExcelHelpers.CreateExcelStream(agreements);
        Console.WriteLine(excelStream.Length);
        excelStream.Position = 0;
        return File(excelStream, contentType, filename);
    }

To be more efficient the code should start to stream the excel sheet in chunks while the sheet is being populated.
is it possible?

Yoan

commented on 10 Nov 2023, 11:51 AM

Hi Martin,

Thank you for the shared resources. I have created a sample SpreadStreamProcessing implementation specific to this case that imports a TXT file (could also be CSV) with values and exports it as an XLSX file. The key element here is setting the delimiter to the specific symbol so the values can be split appropriately:

 SpreadCsvSettings csvSettings = new SpreadCsvSettings()
{
    Delimiter = '¤'
};

using (FileStream fs = new FileStream("input.txt", FileMode.Open))
{
    using (FileStream stream = File.OpenWrite("output.xlsx"))
    {
        using (IWorkbookImporter workBookImporter = SpreadImporter.CreateCsvImporter(fs, csvSettings))
        {
            using (IWorkbookExporter workbookExporter = SpreadExporter.CreateWorkbookExporter(SpreadDocumentFormat.Xlsx, stream))
            {
                foreach (IWorksheetImporter worksheetImporter in workBookImporter.WorksheetImporters)
                {
                    using (IWorksheetExporter worksheetExporter = workbookExporter.CreateWorksheetExporter("First sheet"))
                    {
                        foreach (IRowImporter rowImporter in worksheetImporter.Rows)
                        {
                            using (IRowExporter rowExporter = worksheetExporter.CreateRowExporter())
                            {
                                foreach (ICellImporter cellImporter in rowImporter.Cells)
                                {
                                    using (ICellExporter cellExporter = rowExporter.CreateCellExporter())
                                    {
                                        string value = cellImporter.Value;
                                        double doubleValue;
                                        if (double.TryParse(value, out doubleValue))
                                        {
                                            cellExporter.SetValue(doubleValue);
                                        }
                                        else
                                        {
                                            cellExporter.SetValue(value);
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

Regards,

Yoan

Martin Herløv

commented on 10 Nov 2023, 11:57 AM

Thanks for making the example 😊

I am doing this in an ASP.NET Core API controller. So no reading or writing to files.

I need to see how you start streaming to the client while creating the document. Very much like yield return in C#

Anna

commented on 13 Nov 2023, 08:25 AM

Hi Martin,

RadSpreadStreamProcessing is designed to work with the Stream class, so as long as the API Controller works with a stream, the logic should work the same as it does with a FileStream. So I don't expect you will have any issues. :) Of course, please, let us know if you encounter any along the way.

Regards,

Anna

Martin Herløv

commented on 23 Nov 2023, 02:41 PM

This works but I am still using the same amount of memory.
I need to find a way to stream in chunks.

The following code is long and dirty. The download starts after 2.3 seconds and the user can follow the download process.
unfortunately the file that is downloaded is corrupted.

Can you spot the error?

    var filename = $"ProshareUpload_{attachment.Stamp:dd-MM-yyyy}_{attachment.Id}.xlsx";
        var contentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

        var csvSettings = new SpreadCsvSettings
        {
            Delimiter = '¤'
        };

        Response.StatusCode = 200;
        Response.ContentType = contentType;
        Response.Headers.Add("Content-Disposition", $"attachment; filename={filename}");
        
        var i = 0;
        const int bufferSize = 1024 * 1024 * 10;

        var excelStream = new MemoryStream(bufferSize);
        using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(attachment.UploadedData)))
        {
            using (IWorkbookImporter workBookImporter = SpreadImporter.CreateCsvImporter(ms, csvSettings))
            {
                using (IWorkbookExporter workbookExporter = SpreadExporter.CreateWorkbookExporter(SpreadDocumentFormat.Xlsx, excelStream))
                {
                    foreach (IWorksheetImporter worksheetImporter in workBookImporter.WorksheetImporters)
                    {
                        using (IWorksheetExporter worksheetExporter = workbookExporter.CreateWorksheetExporter("Log"))
                        {
                            foreach (IRowImporter rowImporter in worksheetImporter.Rows)
                            {
                                i++;
                                if(i % 10000 == 0)
                                {
                                    var buffer = new byte[bufferSize];
                                    excelStream.Position = 0;
                                    var bytesRead = await excelStream.ReadAsync(buffer, 0, bufferSize);
                                    if (bytesRead > 0)
                                    {
                                        await Response.Body.WriteAsync(buffer, 0, bytesRead);
                                        excelStream.Flush();
                                    }
                                }
                                using (IRowExporter rowExporter = worksheetExporter.CreateRowExporter())
                                {
                                    foreach (ICellImporter cellImporter in rowImporter.Cells)
                                    {
                                        using (ICellExporter cellExporter = rowExporter.CreateCellExporter())
                                        {
                                            var value = cellImporter.Value;
                                            if (double.TryParse(value, out var doubleValue))
                                            {
                                                cellExporter.SetValue(doubleValue);
                                            }
                                            else
                                            {
                                                cellExporter.SetValue(value);
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }

        var buffer2 = new byte[bufferSize];
        excelStream.Position = 0;
        var bytesRead2 = await excelStream.ReadAsync(buffer2, 0, bufferSize);
        if(bytesRead2 > 0)
            await Response.Body.WriteAsync(buffer2, 0, bytesRead2);
        
        await excelStream.DisposeAsync();
        
        return new EmptyResult();
    }

Nikolay Demirev

commented on 28 Nov 2023, 02:00 PM

Hi Martin,

In theory, you should be able to export and stream a document in chunks, but the chunks have to be separated when the writing in each file contained in the ZIP archive is finished. The XLSX document is a ZIP archive containing many XML files. This limitation comes from the way the ZIP archive is structured.

Each file inside the ZIP archive has its file header, followed by its data. The header gives information on the data, such as length and other things. So, its header can't be written without knowing how long the file will be. That is why there is no way to send it through a network connection before writing an entire XML file inside the ZIP archive.

The SpreadStreamProcessing simplifies the generation of those XML files, and if you don't have information on how the XLSX file is organized, there is no way to know when you can send something over the network. I am explaining all that background information so you can understand why you can't achieve what you are trying to. It is obvious that in an Excel document the largest chunks of information are stored in the Worksheets, but sadly each worksheet is a separate XML file inside the ZIP archive, so the only possible way to split the document while exporting it and send it through the network is by sending the data from the output stream each time a IWorksheetExporter is disposed of. That is when the XML file is written in the ZIP archive.

Martin Herløv

commented on 28 Nov 2023, 02:05 PM

Thanks this make totally sense.

I will keep the current code using the SpreadStreamProcessing. It's 30% faster than using CsvHelper.Excel api.

Thanks for super service

Product Bundles

DevCraft

Web

Mobile

Document Management

Desktop

Reporting & Mocking

Automated Testing

CMS

UI/UX Tools

Debugging

Free Tools

Most memory efficient way to convert csv to excel in .NET Core API

1 Answer, 1 is accepted