I have a number of documents in PDF format that were created from scans and OCR. They are Vietnam-era documents, originally typed on thin paper, and are not very legible. Can your software enhance these PDF files to make them more legible?


Unfortunately, improving the legibility of digital scans is most easily done at the time of the scan. Depending on the type of issue, enlarging the image by increasing the resolution settings, placing blank paper behind the page, or adjusting the image settings of the scanner or camera to increase contrast can all help. Quality checks on each scan are also important, particularly if the original may become hard to access again. If you have access to professional desktop editing software such as Photoshop, it is possible to import and edit a PDF page by page as JPGs to increase legibility. However, the quality of the image can suffer in the conversion process, and if you have many pages it can be very time consuming. Another option we have come across is to print out all the pages, photocopy them to increase contrast, and re-scan the resulting copies. While some OCR (optical character recognition) software copes better than others with bleed-through and other page markings, almost all scanned output requires some re-formatting and manual correction to produce completely accurate text. So unfortunately there are no easy solutions here. There are some professional services that will do the correction for you, but they are likely to be an expensive option.
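For anyone comfortable with a little scripting, the contrast adjustment mentioned above can be sketched in a few lines. This is only a minimal illustration, not a feature of our software: it assumes each page has already been exported as a grid of grayscale values (0 = black, 255 = white), and the function name `stretch_contrast`, its parameters, and the sample `page` data are all made up for the example. It linearly stretches the range between the darkest and lightest values so that faded ink moves toward black and greyish paper moves toward white.

```python
def stretch_contrast(pixels, low_pct=2.0, high_pct=98.0):
    """Linearly rescale grayscale values (0-255) so that the low_pct
    percentile maps to 0 and the high_pct percentile maps to 255.

    `pixels` is a list of rows, each row a list of integers.
    """
    flat = sorted(v for row in pixels for v in row)
    if not flat:
        return []
    # Choose the clip points by index into the sorted values.
    lo = flat[int((len(flat) - 1) * low_pct / 100)]
    hi = flat[int((len(flat) - 1) * high_pct / 100)]
    if hi <= lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [list(row) for row in pixels]
    scale = 255.0 / (hi - lo)
    # Rescale each value and clamp it to the valid 0-255 range.
    return [
        [max(0, min(255, round((v - lo) * scale))) for v in row]
        for row in pixels
    ]


# Hypothetical faded page: "ink" near 120, "paper" near 180.
page = [
    [180, 178, 121, 182],
    [179, 120, 122, 181],
]
enhanced = stretch_contrast(page, low_pct=0, high_pct=100)
```

On real scans the default 2 and 98 percent clip points are usually a safer starting place than 0 and 100, since a few stray specks would otherwise set the range. This will not recover text that the scan never captured, so it complements rather than replaces a careful re-scan.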

As well as professional services, some memory institutions offer the ability to crowdsource text correction from rough OCR. A good example is the Australian Newspapers project.

The above-mentioned OCR correction project is part of Trove:

