Hi,
I want to be able to extract embedded images and text from a PDF. Is this even possible?
If so, can you point me in the right direction?
Thanks... Ed
Hi Ed,
May I ask what your application type is? If you are using the .NET Framework, this can be achieved with the following approach:
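using System;
using System.IO;
using System.Windows.Media.Imaging;
// Telerik namespaces assumed from RadPdfProcessing; verify against your installed version.
using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf;
using Telerik.Windows.Documents.Fixed.Model.Objects;
using Telerik.Windows.Documents.Fixed.Model.Text;

class Program
{
    static void Main(string[] args)
    {
        var pdfProvider = new PdfFormatProvider();
        var document = pdfProvider.Import(File.ReadAllBytes(@"..\..\sampledoc.pdf"));
        int count = 0;

        foreach (var page in document.Pages)
        {
            foreach (var item in page.Content)
            {
                // Text fragments carry the text drawn on the page.
                if (item is TextFragment)
                {
                    var text = ((TextFragment)item).Text;
                    Console.WriteLine(text);
                }

                // Embedded images are written out as numbered PNG files.
                // The target folder must already exist.
                if (item is Image)
                {
                    var image = (Image)item;
                    BitmapSource source = image.ImageSource.GetBitmapSource();
                    SaveClipboardImageToFile(@"C:\my_temp\image" + count++ + ".png", source);
                }
            }
        }
    }

    // Encodes the bitmap as PNG and writes it to filePath.
    public static void SaveClipboardImageToFile(string filePath, BitmapSource image)
    {
        using (var fileStream = new FileStream(filePath, FileMode.Create))
        {
            BitmapEncoder encoder = new PngBitmapEncoder();
            encoder.Frames.Add(BitmapFrame.Create(image));
            encoder.Save(fileStream);
        }
    }
}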
Regards, Dimitar Progress Telerik
Looks like almost exactly what the doctor ordered! Many thanks. I will play with it and get back to you.
Thanks again ... Ed