Quietly launching in the last week of June was a major update to Papers Past, New Zealand's largest free online digitised resource. Papers Past, featuring newspapers from the nineteenth and early twentieth centuries, is now bigger, faster and fully text searchable thanks to Optical Character Recognition (OCR) technology.
Papers Past was first launched in 2001 with a quarter of a million digitised pages of historic New Zealand newspapers. It now has more than five times that many pages (1.3 million), covering 52 different New Zealand publications from as far back as 1839. In fact, Papers Past currently has more digitised pages online than Chronicling America, an equivalent project in the U.S. for historic newspapers.
NZ Truth is one of 52 publications now available on Papers Past
Old newspapers make for a really interesting resource to digitise, and it's perhaps not surprising that they have been the focus of some of the earliest digitisation efforts around the world.
Because of the poor quality of the paper they were printed on, newspapers are prime candidates for copying, so that people can use surrogates instead of the fragile originals. Without microfilm, and now digitisation, many old newspapers would simply not be available to view at all.
Many New Zealand newspapers have been microfilmed, or are being microfilmed, for preservation purposes. Digitising microfilm is a lot simpler and cheaper than dealing with the paper originals. It is only recently that directly digitising high-volume, large-format materials like newspapers has become possible. The National Archives and Records Administration (NARA) in the United States has recently invested in 10 large-format scanners at a cost of around NZ$250,000 apiece, but it's likely to be some time before that kind of technology is widely available and affordable. A lot of preparation work to sort, unbend and repair old newspapers is also required, and where a paper has been microfilmed this work has already been done.
From a copyright perspective, old newspapers are often less complicated than other resources, as a large proportion of the articles from the nineteenth and early twentieth centuries are unattributed. In New Zealand, where the author of a published work cannot be identified after reasonable enquiry, the copyright term is only 50 years.
Being text-based and complex in structure, newspapers lend themselves extremely well to full-text searching. OCR is the quickest way to achieve this, and while it is by no means perfect, it can get very good results, certainly far better than scrolling through pages of microfilm. As a way of improving on OCR results, the National Library of Australia is testing a very cool newspaper service that allows users to easily correct and tag newspaper content, so that search results get better over time.
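For the technically curious, here is a rough sketch of why a user correction can improve search. It is not how Papers Past or the National Library of Australia service actually works. The `ArticleIndex` class, the article id and the sample headline are all made up for illustration, and a real system would be far more sophisticated. The idea is simply that a full-text index can only find the words the OCR produced, so a misread word is invisible to search until someone fixes it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ArticleIndex:
    """Toy inverted index (hypothetical): word -> ids of articles containing it."""

    def __init__(self):
        self.articles = {}             # article id -> current text
        self.index = defaultdict(set)  # word -> set of article ids

    def remove(self, article_id):
        """Forget an article and all of its index entries, if present."""
        old_text = self.articles.pop(article_id, None)
        if old_text is None:
            return
        for word in tokenize(old_text):
            self.index[word].discard(article_id)

    def add(self, article_id, text):
        """Index an article; re-adding an id replaces its earlier text."""
        self.remove(article_id)
        self.articles[article_id] = text
        for word in tokenize(text):
            self.index[word].add(article_id)

    def search(self, query):
        """Return ids of articles containing every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        results = set(self.index.get(words[0], set()))
        for word in words[1:]:
            results &= self.index.get(word, set())
        return results


index = ArticleIndex()

# Invented OCR output: the second "l" in "Wellington" was misread as "1".
index.add("a1", "Arrival of the barque Wel1ington at Port Nicholson")
before = index.search("Wellington")

# A reader corrects the text, and the article is re-indexed.
index.add("a1", "Arrival of the barque Wellington at Port Nicholson")
after = index.search("Wellington")

print(before, after)
```

Before the correction, a search for "Wellington" finds nothing. After it, the same search finds the article. Multiply that by thousands of readers fixing the articles they care about, and the searchable text steadily improves.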
Newspapers are likely to continue to be a highly prized target for digitisation, both public domain editions and more recent in-copyright ones. Engaging with newspaper publishers and encouraging them to open up their more recent back catalogues for digitisation and public access is a challenge that still lies ahead of us. In the meantime, there has already been considerable interest on our Make it Digital voting tool in having a variety of newspaper editions digitised. We've invited the Papers Past manager to join in and post newspaper titles, so you can have a say in what they should be digitising next. Get voting now!