Need help with VB Script to detect bad PDF files

Status
Not open for further replies.

lusolth

New Member
I have an OCR system that is fed files by a VB Script. The OCR program locks up when it gets a bad PDF. I tried opening one of these PDFs, and sure enough, Adobe Reader reports "File is damaged and could not be repaired." I know how many pages are supposed to be in each file, so I wrote a quick script to count pages, but unfortunately it returned the expected page count. So I suspect there is something like a missing EOF flag, or some other damage I can't identify. I don't need to fix the files, just identify them so I can pull them out of the queue to the OCR machine. Any method that will identify a bad file would be greatly appreciated. VB Script is the only language I can deal with, but my son could handle a test written in PHP, Java, JavaScript, or Python.
 
You can follow the long guide below or try the resources listed here, whichever you prefer; either approach should help.


https://onlinefilerepair.com/en/pdf-repair-online.html PDF On-line repair service
https://www.instructables.com/answers/How-to-repair-corrupted-PDF-files/
https://www.answerbag.com/q_view/3302387


Use Preflight to check the PDF for syntax issues (requires Acrobat Pro or above - and I haven't found out what to do with this information):

Click Advanced -> Preflight... in Adobe Acrobat Pro

Expand PDF Analysis, select Report PDF syntax issues, and click Execute.


Remove the Tags in the document (requires Acrobat Pro or above):

Right-click the Navigation Pane and click Tags

Click the root of the Tags tree

Right-click Tags and click Delete Tag

Save


Reduce File Size (Acrobat Standard or above):

Click Document -> Reduce File Size...

(you can adjust the Compatibility level of the file)

Click OK

Save as a new file name and click Save


Use the PDF Optimizer to remove features (one at a time, or all at once, then all but one, then all but two, and so on) (Adobe Acrobat Pro or above):

Click Advanced -> PDF Optimizer...

Click the checkbox next to unnecessary settings or settings you would like to try to remove for troubleshooting purposes

Click OK

Save as a new file name and click Save


Try resaving the document using Nitro PDF Reader (this has worked for me many times):

Open the document in Nitro PDF Reader

Click File -> Save As -> PDF Document

Save the file using a different name and click Save


Delete bad pages (either one by one or as a group, if necessary) (Adobe Acrobat Standard or above):

Open the problem PDF in Adobe Acrobat

Click the tool "Click to show one page at a time"

Page through the PDF and note any page that gives you an error message.

In the Navigation Pane, click Pages. Select the problem pages, right-click, and click Delete Pages...

Click File -> Save As, save the file using a different name, and click Save


Export as PostScript without comments, export comments, convert PostScript to PDF, and import comments (you lose bookmarks using this method) (Adobe Acrobat Standard or above):

Click Comments -> Export Comments to Data File..., give it a name, and click Save

Click File -> Export -> PostScript -> PostScript

Click Settings and uncheck Include Comments, click OK, give it a different name, and click Save.

Open the PS file in Adobe Acrobat.

Click Comments -> Import Comments..., select Adobe FDF File from the file types drop-down, click the FDF file, and click Select.

Click File -> Save.
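
The steps above repair a file by hand; they do not flag a damaged file automatically, which is what the original question asks for. As a rough sketch only, here is a Python function (Python being one of the languages the poster said would be acceptable) that checks the structural markers a PDF is supposed to have: a `%PDF-` header near the start, a `%%EOF` marker near the end, and a `startxref` keyword followed by a byte offset that lands on a cross-reference table or stream. The name `check_pdf`, the `tail_size` parameter, and the message strings are illustrative choices, not part of any library. It is an assumption that the damaged files in question fail one of these checks; a file with a corrupted page stream could still pass.

```python
import os


def check_pdf(path, tail_size=2048):
    """Return a list of structural problems; an empty list means none were found.

    This only inspects the header and trailer. It does not parse objects
    or decode page content, so it cannot prove that a file is valid.
    """
    problems = []
    size = os.path.getsize(path)

    with open(path, "rb") as f:
        head = f.read(1024)
        f.seek(max(0, size - tail_size))
        tail = f.read()

        # The file should begin with a version header such as %PDF-1.4.
        if b"%PDF-" not in head:
            problems.append("missing %PDF- header")

        # The file should end with an end-of-file marker.
        if b"%%EOF" not in tail:
            problems.append("missing %%EOF marker")

        # startxref is followed by the byte offset of the last xref section.
        sx_pos = tail.rfind(b"startxref")
        if sx_pos == -1:
            problems.append("missing startxref")
        else:
            tokens = tail[sx_pos + len(b"startxref"):].split()
            if not tokens or not tokens[0].isdigit():
                problems.append("startxref has no numeric offset")
            else:
                offset = int(tokens[0])
                if offset >= size:
                    problems.append("startxref offset is past end of file")
                else:
                    # The offset should land on an "xref" table or on an
                    # xref stream object, which starts with its object number.
                    f.seek(offset)
                    target = f.read(32).lstrip()
                    if not (target.startswith(b"xref") or target[:1].isdigit()):
                        problems.append("startxref offset does not point at an xref section")

    return problems
```

Calling `check_pdf` on each file before it goes to the OCR machine and setting aside any file that returns a non-empty list would be one way to keep the suspect files out of the queue. A truncated download or a file cut off mid-write typically shows up here as a missing `%%EOF` or `startxref`.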
 