 Creating PDF+OCR

TOPIC REVIEW
Merlin Posted - Mar 07 2025 : 06:51:39
Hello,

Is it possible to convert an existing PDF directly into a searchable PDF (with OCR text), or do the pages of the PDF file have to be exported as images and then reassembled using TIEVisionSearchablePDFGenerator?

An example program would be great :)

Thanks

18 LATEST REPLIES (Newest First)
xequte Posted - Jun 09 2026 : 00:50:19
Hi

I don't think there is a requirement for DevExpress here. It should all be possible within ImageEn:

http://www.imageen.com/help/TIEPdfViewer.html

The code will be something like (converting your code):
function PdfExtractDocumentText(const AFileName: string;
                                const AMaxPages: integer = -1;
                                ALog: TLogCallback = nil): TArray<string>;
var LPDF: TIEPdfViewer;
    i, LCount: integer;
    LPageText: string;
    LOCR: TIEVisionOCR;
    LieBitmap: TIEBitmap;
begin
  LPDF := TIEPdfViewer.Create();
  try
    LPDF.LoadFromFile(AFileName);
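    // Limit to AMaxPages, but never read past the last page
    LCount := LPDF.PageCount;
    if (AMaxPages >= 0) and (AMaxPages < LCount) then
      LCount := AMaxPages;
    SetLength(result, LCount);
    for i := 0 to LCount-1 do begin
      LPDF.PageIndex := i; // select the page before reading its text
      LPageText := LPDF.GetText();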
      if LPageText.IsEmpty then
      try
        if Assigned(ALog) then
          ALog('Page has no text, but contains image(s). Running OCR text recognition.');
        LieBitmap := TIEBitmap.Create;
        try
          LPDF.DrawTo( LieBitmap );
          LOCR := IEVisionLib.createOCR(IEOCRLanguageList[OCR_English_language].Code);
          LPageText := LOCR.recognize(LIEBitmap.GetIEVisionImage, IEVisionRect(0, 0, 0, 0)).c_str();
        finally
          LieBitmap.Free;
        end;
      except
        on E:Exception do
          if Assigned(ALog) then
            ALog('Error while running OCR: ' + E.Message);
      end;
      result[i] := LPageText;
    end;
  finally
    LPDF.Free;
  end;
end;


Nigel
Xequte Software
www.imageen.com
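
The same non-visual approach could also be used to build the searchable PDF itself rather than only extracting text. The following is a minimal sketch, assuming that TIEPdfViewer can be created standalone as in the code above, that its PageIndex property selects the page that DrawTo renders, and that the procedure name and parameters (PdfToSearchablePdf, AInFile, AOutFile) are purely illustrative. The uses clause is omitted because the required ImageEn unit names depend on the installed version. Whether several of these conversions can safely run in parallel threads is not answered in this thread.

```pascal
// Sketch only: render each PDF page without a visual control and pass it
// to the searchable PDF generator used elsewhere in this thread.
procedure PdfToSearchablePdf(const AInFile, AOutFile: string);
var
  LPDF: TIEPdfViewer;
  LGen: TIEVisionSearchablePDFGenerator;
  LBitmap: TIEBitmap;
  i: Integer;
begin
  LPDF := nil;
  LBitmap := nil;
  try
    LPDF := TIEPdfViewer.Create();
    LBitmap := TIEBitmap.Create;
    LPDF.LoadFromFile(AInFile);

    LGen := IEVisionLib.createSearchablePDFGenerator('./',
      IEOCRLanguageList[OCR_English_language].Code);
    LGen.beginDocument(PAnsiChar(AnsiString(AOutFile)),
      PAnsiChar(AnsiString('title')));

    for i := 0 to LPDF.PageCount - 1 do
    begin
      LPDF.PageIndex := i;   // assumption: selects the page to render
      LPDF.DrawTo(LBitmap);  // render the page image into the bitmap
      LGen.addPage(LBitmap.GetIEVisionImage());
    end;

    LGen.endDocument();
  finally
    LBitmap.Free;
    LPDF.Free;
  end;
end;
```

The size at which DrawTo renders a page is likely to affect OCR quality, so it is worth checking the rendered bitmap dimensions on a sample document before processing a batch.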
Harald Posted - Jun 06 2026 : 07:31:31
For now, I've found a solution that works for me. The PDF is read using a DevExpress component, exported as images, and those images are then processed using IEVision OCR. The text isn't yet written back into the PDF, but this is sufficient for my purposes regarding the subsequent AI analysis of the documents.

function PdfExtractDocumentText(const AFileName: string;
                                const AMaxPages: integer = -1;
                                ALog: TLogCallback = nil): TArray<string>;
var LPDF: TdxPDFDocument;
    i, LCount: integer;
    LPageText: string;
    LOCR: TIEVisionOCR;
    LdxImage: TdxSmartImage;
    LieBitmap: TIEBitmap;
begin
  LPDF := TdxPDFDocument.Create;
  try
    LPDF.LoadFromFile(AFileName);
    // Limit to AMaxPages, but never read past the last page
    LCount := LPDF.PageCount;
    if (AMaxPages >= 0) and (AMaxPages < LCount) then
      LCount := AMaxPages;
    SetLength(result, LCount);
    for i:=0 to LCount-1 do begin
      LPageText := LPDF.PageInfo[i].Text;
      if LPageText.IsEmpty and (LPDF.PageInfo[i].Images.Count > 0) then
      try
        if Assigned(ALog) then
          ALog('Page has no text, but contains image(s). Running OCR text recognition.');
        LdxImage  := TdxSmartImage.Create;
        LieBitmap := TIEBitmap.Create;
        try
          if not dxPDFDocumentExportToImageEx(LPDF, i, 1, LdxImage) then
            Continue;
          LieBitmap.Assign(LdxImage.GetAsBitmap);
          LOCR := IEVisionLib.createOCR(IEOCRLanguageList[OCR_English_language].Code);
          LPageText := LOCR.recognize(LIEBitmap.GetIEVisionImage, IEVisionRect(0, 0, 0, 0)).c_str();
        finally
          LdxImage.Free;
          LieBitmap.Free;
        end;
      except
        on E:Exception do
          if Assigned(ALog) then
            ALog('Error while running OCR: ' + E.Message);
      end;
      result[i] := LPageText;
    end;
  finally
    LPDF.Free;
  end;
end;


Document Management https://www.officemanager.de/en
Harald Posted - Jun 05 2026 : 03:37:02
I would like to run this function in a background thread. I have scanned PDFs and want to convert them into searchable PDFs.
Is IEVision capable of multithreading in this scenario, where multiple threads are converting PDFs simultaneously?
What is the best way to read the images from the PDF if I don’t want to use a visual TImageEnMView in the thread?
Thank you very much and best regards, Harald

Document Management http://www.officemanager.de/en
xequte Posted - May 31 2026 : 16:48:03
Sorry, the information is not returned by that process.

Nigel
Xequte Software
www.imageen.com
AndNit Posted - May 31 2026 : 16:45:06
Good evening

How can I measure the OCR confidence score within this process?

pdfGen := IEVisionLib.createSearchablePDFGenerator('./', IEOCRLanguageList[OCR_Portuguese_Language].Code);

pdfGen.beginDocument(PAnsiChar(AnsiString(CaminhoPDF)), PAnsiChar(AnsiString('title')));

for i := 0 to imgMPdf.ImageCount - 1 do
begin
  imgMPdf.SelectedImage := i; // Show the image being processed
  pdfGen.addPage(imgMPdf.IEBitmap.GetIEVisionImage());
end;

pdfGen.endDocument();
AndNit Posted - May 11 2026 : 11:50:30
I appreciate the information from the forum; I implemented the metadata processing and conversion to PDF/A using Ghostscript.
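
The exact Ghostscript command used was not posted. As a rough sketch of what such a post-conversion step can look like from Delphi, the small program below only builds the argument string for a PDF/A-2 conversion with Ghostscript's pdfwrite device. The function name BuildGhostscriptPdfAArgs and the file names are hypothetical. PDFA_def.ps is the template shipped with Ghostscript, which also carries document info such as the title via pdfmark. Depending on the Ghostscript version, additional switches may be needed.

```pascal
program GsPdfAArgs;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Builds the Ghostscript arguments for converting AInFile to PDF/A-2.
// ADefFile is the PDFA_def.ps template from the Ghostscript distribution.
// Hypothetical helper for illustration, not part of ImageEn.
function BuildGhostscriptPdfAArgs(const AInFile, AOutFile, ADefFile: string): string;
begin
  Result :=
    '-dPDFA=2 -dBATCH -dNOPAUSE ' +
    '-sDEVICE=pdfwrite ' +
    '-sColorConversionStrategy=RGB ' +
    '-dPDFACompatibilityPolicy=1 ' +
    '-sOutputFile="' + AOutFile + '" ' +
    '"' + ADefFile + '" "' + AInFile + '"';
end;

begin
  // gswin64c is the 64-bit Windows console build of Ghostscript
  WriteLn('gswin64c ' +
    BuildGhostscriptPdfAArgs('ocr.pdf', 'ocr_pdfa.pdf', 'PDFA_def.ps'));
end.
```

The resulting string can be passed to whatever process-launching routine the application already uses, after which the output should be checked with a PDF/A validator.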
xequte Posted - May 10 2026 : 17:33:44
Hi

No, it is not PDF/A, you would need to use a post converter for that. I'm afraid I don't know what third party tool would be best for that.

Nigel
Xequte Software
www.imageen.com
AndNit Posted - May 10 2026 : 17:21:34
Okay, thank you for your reply.

The generated PDF is not a PDF/A; if I'm not mistaken, ImageEn doesn't generate PDF/A, correct?

Do you suggest any way to include the metadata and convert the PDF to PDF/A after creating it?

Thank you.
xequte Posted - May 09 2026 : 17:04:27
Unfortunately PDFium does not support meta-data at this time.



Nigel
Xequte Software
www.imageen.com
AndNit Posted - May 09 2026 : 14:35:17
Perfect, everything worked out, thank you Nigel.

Now, how do I create this PDF with all the metadata?

PDF_Title
PDF_Author
PDF_Subject
PDF_Keywords
PDF_Creator
PDF_Producer
etc.
xequte Posted - May 09 2026 : 07:59:02
Hi

TIEVisionOCREngine should generally just be left as ievOCRDefault.

The main thing is which language files you use:

- LSTM - Standard
- LSTM - Slow, Highest Quality
- LSTM + Legacy


Naturally, the second one should generally give the best results.

Nigel
Xequte Software
www.imageen.com
xequte Posted - May 09 2026 : 02:37:52
Did you try using the JPEG format (passing the ievPDFImgFmt_JPEG parameter to addPage):

http://www.imageen.com/help/TIEVisionSearchablePDFGenerator.addPage.html

Nigel
Xequte Software
www.imageen.com
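
Applied to the addPage loop from AndNit's code in this thread, the suggestion would look roughly like the fragment below. It reuses the imgMPdf and pdfGen variables from that code. That the image format is passed as the second argument is an assumption; the addPage help page linked above has the exact parameter list.

```pascal
for i := 0 to imgMPdf.ImageCount - 1 do
begin
  imgMPdf.SelectedImage := i; // Show the image being processed
  // Assumption: the image format is the second parameter of addPage
  pdfGen.addPage(imgMPdf.IEBitmap.GetIEVisionImage(), ievPDFImgFmt_JPEG);
end;
```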
AndNit Posted - May 07 2026 : 22:04:58
I'd like to take this opportunity to ask which OCR is most accurate for all situations... running text, tables, text under images, etc...

TIEVisionOCREngine::ievOCRFAST
AndNit Posted - May 07 2026 : 22:00:28
First, I'd like to thank you for the explanation; it's very simple and I've already implemented it in my code. However, I noticed that the file with OCR is MUCH larger. How can I solve this?

Here's my code. Thank you.

   for i := 0 to imgMPdf.MIO.ParamsCount - 1 do begin
      imgMPdf.MIO.Params[i].PDF_PaperSize   := iepAuto;
      imgMPdf.MIO.Params[i].PDF_Compression := ioPDF_JPEG; // or ioPDF_G4FAX for monochrome images
   end;
   IEGlobalSettings().PDFEngine := ieenLegacy;
   imgMPdf.MIO.SaveToFilePDF(CaminhoPDF);
   IEGlobalSettings().PDFEngine := ieenAuto;
   //
   pdfGen := IEVisionLib.createSearchablePDFGenerator('./', IEOCRLanguageList[OCR_Portuguese_Language].Code);
   pdfGen.beginDocument(PAnsiChar(AnsiString(CaminhoPDF)), PAnsiChar(AnsiString('title')));
   for i := 0 to imgMPdf.ImageCount - 1 do
   begin
      imgMPdf.SelectedImage := i; // Show the image being processed
      pdfGen.addPage(imgMPdf.IEBitmap.GetIEVisionImage());
   end;
   pdfGen.endDocument();
Merlin Posted - Mar 12 2025 : 08:20:53
Hello Nigel,


thank you, I will give it a try :)
xequte Posted - Mar 10 2025 : 19:46:39
Why not do it as follows:


// Convert "in.pdf" (pages are images) to "out.pdf" (text in pages now selectable)
ImageEnMView1.MIO.LoadFromFile( 'D:\in.pdf' );
pdfGen := IEVisionLib.createSearchablePDFGenerator('./', IEOCRLanguageList[OCR_English_language].Code);
pdfGen.beginDocument(PAnsiChar(AnsiString(langPath + 'out')), PAnsiChar(AnsiString('title')));
for i := 0 to ImageEnMView1.ImageCount - 1 do
begin
  ImageEnMView1.SelectedImage := i; // Show the image being processed
  pdfGen.addPage(ImageEnMView1.IEBitmap.GetIEVisionImage());
end;
pdfGen.endDocument();


You will need to add iepdf32.dll to your EXE folder.

Nigel
Xequte Software
www.imageen.com
Merlin Posted - Mar 10 2025 : 04:46:37
Hello

Yes, I want to apply text recognition to a PDF file that does not contain any text. To do this, the file must be loaded, the individual pages exported as images, and the text content then determined by text recognition via pdfGen: TIEVisionSearchablePDFGenerator.

Perhaps there is a small example available that does not require external libraries to export the individual PDF pages?

Thanks
xequte Posted - Mar 07 2025 : 19:07:06
Sorry, do you mean that you have a PDF that contains images of text (not text itself), and you want to convert it into a PDF where the text is available (text has been OCR'ed)?

Nigel
Xequte Software
www.imageen.com