Preserving digital content
This page includes guidance for digital preservation projects in museums, libraries, archives, and other organisations in Aotearoa New Zealand.
Understanding digital preservation
Making sure you preserve your digital content for future use is an important final step in your digitisation project. We present here some principles for even the smallest digitisation project, as well as guidance for larger-scale institutional digitisation work.
As a component of digital content management, digital preservation aims to keep digital content usable in the long term. Loss of non-digital items is often visible, gradual, and partial. Digital content, and the physical media it is stored on, can instead be lost completely and very quickly, with no visible signs of decay to warn of a problem. The sheer volume of digital content being created today also makes it difficult to assess and prioritise. Together, these factors make digital preservation challenging.
Back-up strategies are a vital first step to avoiding short-term content loss. In the longer term, storage for preservation involves planning for archival copies of your content to be migrated and contingencies for transfer to a new owner, should your project, organisation, or service face closure. This is where your choices of appropriate formats, descriptions, collections policy, and rights statements really come into their own.
Following the basics
Any organisation or person responsible for keeping digital content for any reasonable length of time needs to follow the basic steps for preventing information loss. Even without access to expensive archival software or repositories, there are three practices to follow wherever digital content is kept.
1. Make regular back-up copies
Despite the claims of some manufacturers, no digital storage medium currently available can be considered safe for long-term storage. All storage media, such as hard disk drives or optical disks, are prone to failure or corruption over time. In the short term, the way to manage this risk is through back-up copies. A simple guide to back-up is the one used by the American Society of Media Photographers, known as the 3-2-1 back-up rule:
keep 3 copies of any important file: the primary (or master) file and two back-ups
keep 2 of the copies on different media types, or at least use physically separate media and brands
keep 1 other back-up copy stored off-site.
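The 3-2-1 rule above can be written as a simple check. This is a minimal Python sketch, not a recommended tool. The `BackupCopy` record, its fields, and `meets_3_2_1` are hypothetical names for illustration. The check models only the "different media types" form of the second rule, not the looser "separate media and brands" alternative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of a file or collection (hypothetical record for illustration)."""
    label: str        # e.g. "master", "backup-1"
    media_type: str   # e.g. "hard disk", "LTO tape", "cloud"
    off_site: bool    # True if stored at a different physical location


def meets_3_2_1(copies: list[BackupCopy]) -> bool:
    """Return True if the copies satisfy the 3-2-1 rule: at least 3 copies,
    on at least 2 different media types, with at least 1 stored off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media_type for c in copies}) >= 2
    has_off_site = any(c.off_site for c in copies)
    return enough_copies and enough_media and has_off_site


copies = [
    BackupCopy("master", "hard disk", off_site=False),
    BackupCopy("backup-1", "hard disk", off_site=False),
    BackupCopy("backup-2", "cloud", off_site=True),
]
print(meets_3_2_1(copies))  # True
```

Dropping the off-site copy, or keeping all three copies on hard disks, would make the check return `False`.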
New back-up copies should be made whenever significant changes or additions are made to your content. As a guide, a change or addition is significant if it could not easily be re-created. A regular back-up schedule or routine is the simplest way to minimise the disruption and information loss caused by storage failure.
2. Use file integrity checking for archived copies
If you have back-up copies of digital content that does not change over time, such as archived files, it is important to be able to verify the integrity and completeness of both the master file and the copies. A file integrity check, such as generating MD5 checksums, is a good way of gaining a level of assurance about the integrity of your content. An MD5 checksum is the equivalent of a digital fingerprint: you can compare it between copies, or over time, to check that a digital file has not been modified or corrupted. Many cheap or free software programs, such as FastSum, will generate a checksum for each file or folder that can be saved for later checking.
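The checksum workflow described above can be sketched with Python's standard `hashlib` module. This is a minimal example under stated assumptions. `md5_of_file`, `build_manifest`, and `verify_manifest` are hypothetical helper names. The manifest is kept as a plain dictionary that you would save alongside, but separately from, the archived files.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 checksum, reading in chunks so that large
    files do not need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its MD5 checksum."""
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Re-check files against a saved manifest and describe any problems."""
    problems = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
        elif md5_of_file(path) != expected:
            problems.append(f"changed or corrupted: {rel_path}")
    return problems
```

You would build the manifest when content is archived, save it (for example as JSON), and re-run `verify_manifest` on each copy at later checks. An empty list means every file matched. MD5 is adequate for detecting accidental corruption but not deliberate tampering. If stronger assurance is needed, you can replace `hashlib.md5` with `hashlib.sha256` without other changes.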
If you are copying to optical media using a CD or DVD burner, always finalise the disk and verify the data you have written, in addition to generating a checksum. Avoid re-writable CDs and DVDs (CD-RW and DVD-RW), as they are significantly less reliable than write-once disks (CD-R and DVD-R).
3. Have a workflow for managing your content
Back-up strategies can fail if your digital workflow does not address the different stages of the digital content lifecycle. Use the good practice principles identified in our Managing Digital Content guide to establish a process for managing your digital content from the point of creation through to digital archiving. Good inventories, file-naming schemes, and having someone responsible for entering information and undertaking back-ups all minimise the risk of unintended loss of your digital content.
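An inventory is one of the simplest workflow tools to automate. The sketch below assumes a collection held in a single folder. `write_inventory` and its three columns are hypothetical choices. A real inventory would usually add descriptive fields such as title, creator, and rights.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def write_inventory(root: Path, inventory_csv: Path) -> int:
    """Write a simple CSV inventory of every file under root and return the
    number of files listed. The columns are illustrative only."""
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            rows.append({
                "path": path.relative_to(root).as_posix(),
                "size_bytes": stat.st_size,
                "modified_utc": modified.isoformat(timespec="seconds"),
            })
    with inventory_csv.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "size_bytes", "modified_utc"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Write the inventory outside the folder being listed. Otherwise, each previous inventory file will appear in the next one.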
Refreshing and migration strategies
Storage media generally don't last as long as manufacturers claim. Hard disk drives may be warranted to last between 12 months and 5 years under normal operation, but some fail very quickly, resulting in permanent loss of information. Optical disks such as CDs and DVDs can easily lose information through improper handling and storage, even if they are rated for 10, 20, or 100 years. Disks created by standard computer burners also have a shorter life than commercially manufactured disks, as they are more prone to oxidisation or dye fading. Solid State Drives (SSDs) have no moving parts, but they are prone to wear and, like hard disk drives, may lose data through cosmic radiation.
The only current solution to data corruption caused by storage media is to refresh your media periodically. This means copying your digital content to fresh storage media, generally of the same type as the previous media (for example, from one hard disk drive to another). Errors detected by regular file integrity checks on your archived content indicate that media refreshing is required.
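Refreshing media and checking the result can be combined in one step. This Python sketch assumes the old and new media are both mounted as ordinary folders. `refresh_media` is a hypothetical helper. `shutil.copytree` requires that the destination folder does not already exist.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """MD5 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_media(source: Path, destination: Path) -> list[str]:
    """Copy every file from the old media (source) to fresh media
    (destination), then re-read both and return the relative paths of any
    files whose checksums differ. An empty list means the copy verified."""
    shutil.copytree(source, destination)  # fails if destination already exists
    mismatches = []
    for src in sorted(source.rglob("*")):
        if src.is_file():
            rel = src.relative_to(source)
            dst = destination / rel
            if not dst.is_file() or md5_of_file(src) != md5_of_file(dst):
                mismatches.append(rel.as_posix())
    return mismatches
```

Comparing against the source only proves that the new copy matches the old media. If checksums were recorded when the content was archived, comparing the new copy against those is a stronger check. It also catches corruption that had already occurred on the old media.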
Over time, a bigger issue than media failure is technology obsolescence. Software and hardware systems continue to change and evolve rapidly, making it harder to keep using the same media. This may require periodic migration to a new storage type and software environment. The migration from analogue tape to digital tape or hard disk storage has been one of the more significant in recent years, as media stored on magnetic tape from the 1950s through to the 1990s has suffered from both obsolescence and physical decay.
A migration strategy needs careful consideration and planning in advance, as you may need to:
format-shift content to be readable on new software
develop software emulation or 'virtual machines' to run older software
preserve obsolete hardware in order to keep older content accessible.
The best way to manage the issues you are likely to face is to seek advice from professionals or experts in content and systems migration. Attention to using open standards and formats for content creation along with widely used storage media types will also help limit your exposure to obsolescence.
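Before planning a migration, it helps to know which formats you hold. This sketch counts files by extension and flags formats from a hypothetical at-risk list. The `AT_RISK_EXTENSIONS` set is an assumption for illustration, and deciding which formats are at risk is a judgement to make with expert advice. Extensions are also only a rough guide to format, since dedicated format identification tools examine file contents instead.

```python
from collections import Counter
from pathlib import Path

# Hypothetical example only: proprietary formats that an organisation might
# decide to prioritise for format-shifting (WordPerfect, older Access).
AT_RISK_EXTENSIONS = {".wpd", ".mdb"}


def format_profile(root: Path) -> tuple[Counter, list[str]]:
    """Count files by (lower-cased) extension and list the files whose
    extension is flagged as at risk, as a starting point for migration planning."""
    counts: Counter = Counter()
    at_risk = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            if ext in AT_RISK_EXTENSIONS:
                at_risk.append(path.relative_to(root).as_posix())
    return counts, at_risk
```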
Digital continuity and contingency planning
If the digital content you are creating or collecting is important or worth keeping, chances are that it will need to be kept for longer than the careers of most individuals in your organisation. Thinking and planning ahead to what future staff or volunteers need to know about your digital content will help ensure it remains usable and accessible for as long as it is needed.
Things to consider include:
having written policies and workflow processes for managing your digital content, including processes for back-ups, media refreshing, migration, and disaster recovery
identifying long-term solutions for securely storing digital content, including possible provisions for transfer to external repositories for archiving
making provisions for what will happen in the event of an organisation wind-up, including ensuring continued ownership of the content.
Human error, poor record-keeping, and accidental loss are significant contributors to loss of content or information, and are as likely to occur as storage media corruption or technology obsolescence.
Trusted digital repositories
It seems inevitable that collections of the future will be increasingly digital. Blogs, websites, digital publications, photographs, and video are all born-digital content that may have no analogue original or copy to fall back on. This makes preserving digital content in the long term a serious challenge for cultural institutions and research bodies. A key solution is to develop trusted digital repositories that can hold a diverse range of cultural and research knowledge indefinitely, in open, accessible formats. OCLC has described trusted digital repositories as requiring:
compliance with the Reference Model for an Open Archival Information System (OAIS)
administrative responsibility
organisational viability
financial sustainability
technological and procedural suitability
system security
procedural accountability.
If your digital collection continues to grow, your organisation may eventually need to consider building its own repository. Good planning will help you identify, ahead of time, the kinds of capacity you may need to develop. However, not every collecting organisation will be capable of building its own repository with the characteristics OCLC recommends. It may instead be useful to develop relationships with regional, national, or sector-based digital repositories, and to consider whether they could manage your long-term archival content.
Formal policies for digital preservation
If you are working for an institution that needs an organisation-wide digital preservation policy, look for preservation guidance that follows good practice by addressing the following areas:
maintenance of descriptions of digital content, particularly relating to creation information and change history
backing up digital content through duplication onto separate storage media
refreshing storage media by periodic copying of digital content to new media of the same type
migration of digital content to new hardware, software, and storage environments to avoid obsolescence
maintenance of access by making access copies or through emulation of software and hardware environments
organisational continuity through provisions for ownership of digital content in the event of organisational change or transfer of responsibilities.
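Maintaining a change history, the first area in the list above, can start as simply as an append-only log. In this sketch, each preservation action is written as one JSON object per line. `record_event`, `history_for`, and the field names are hypothetical. Established schemas such as PREMIS define much richer event records.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_event(log_path: Path, file_id: str, action: str, agent: str,
                 note: str = "") -> dict:
    """Append one change-history event to a JSON Lines log (one JSON object
    per line) and return it. Field names are illustrative, not a standard."""
    event = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "file_id": file_id,
        "action": action,  # e.g. "created", "backed up", "refreshed", "migrated"
        "agent": agent,    # person or system responsible for the action
        "note": note,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event


def history_for(log_path: Path, file_id: str) -> list[dict]:
    """Read back every event recorded for one file, oldest first."""
    with log_path.open(encoding="utf-8") as f:
        return [e for e in map(json.loads, f) if e["file_id"] == file_id]
```

An append-only log is chosen here because earlier entries are never rewritten. This suits a record of change history, and the log itself is easy to back up alongside the content.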
Strategies such as migration and emulation may require specialised resources and staff that are not available to smaller organisations. In such cases it may be appropriate to focus on descriptions, back-up, and refreshing as primary strategies until items are able to be transferred to a larger archive or specialist resources are found.
Minimising the risk of loss is central to any preservation strategy. Preventing information loss for digital content can be difficult, as poorly executed preservation strategies can themselves cause loss (for example, reformatting using lossy compression). It is important to identify early on, for instance, whether a digital record also has an analogue form, or whether source versions or masters are held or owned by someone else. Understanding which digital content is irreplaceable and which is merely a copy is key to managing risk and preventing loss. A basic knowledge of records or collection management, as covered in our Managing Digital Content guide, will greatly assist.
Disclaimer: We believe the information on this page is accurate, but it does not constitute legal advice and DigitalNZ is not responsible for loss or damage caused as a result of following it.
Further reading and useful links
File integrity software
There are many free or cheap software programs available on the internet for file integrity checking using the MD5 checksum algorithm. Some that you may want to try are:
FastSum - for Windows platforms
ImageIngester - for Windows or Mac
Exactfile - for Windows
Most Linux distributions include an MD5 checker, md5sum, as part of GNU coreutils. The Ubuntu documentation describes its version in more detail.
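If you generate checksums with your own scripts, writing them in the same layout that `md5sum` prints lets you check them later with the standard tool. This Python sketch assumes that layout: the hash, two spaces, then the file path. It also assumes that file names contain no newlines or backslashes, which `md5sum` would escape. `write_md5sum_manifest` and `md5_hex` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def md5_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5sum_manifest(root: Path, manifest_path: Path) -> int:
    """Write one "<hash>  <relative/path>" line per file under root (the
    layout md5sum prints) and return the number of files listed."""
    manifest = manifest_path.resolve()
    lines = []
    for path in sorted(root.rglob("*")):
        # Skip folders, and skip the manifest itself in case it sits inside root.
        if path.is_file() and path.resolve() != manifest:
            lines.append(f"{md5_hex(path)}  {path.relative_to(root).as_posix()}\n")
    # newline="\n" keeps Unix line endings even when run on Windows.
    with manifest_path.open("w", encoding="utf-8", newline="\n") as f:
        f.writelines(lines)
    return len(lines)
```

With the manifest saved outside the collection folder, you can run `md5sum -c` on it from inside that folder. It should report each file as `OK` or `FAILED`.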
New Zealand resources
Best practice guidance on digital storage and preservation - Archives New Zealand website
Digital Preservation Statement - Archives New Zealand, 2020
Digital Preservation at the National Library of New Zealand
National Library of New Zealand Digital preservation team - research papers and presentations
Institutional Repositories in New Zealand: Comparing Institutional Strategies for Digital Preservation and Discovery - Brenda Chawner, Rowena Cullen, 2008
The Future of the Past, the Future in the Present and Beyond - Jocelyn Cummings & Ingrid Mason, LIANZA, 2004
Public Sector Readiness for Digital Preservation - Dr Daniel G. Dorner, Dr Chern Li Liew and Mark Crookston, Victoria University of Wellington, 2006
Overseas resources
Care and Handling of CDs and DVDs - a Guide for Librarians and Archivists - Fred R Byers, National Institute of Standards and Technology, U.S.A.
UKOLN (UK Office for Library and Information Networking) - a resource centre jointly funded by MLA and JISC in the U.K.
Preservation Management of Digital Material Handbook - Digital Preservation Coalition Handbook
Preserving Access to Digital Information Resource Gateway - a preservation portal for international resources provided by the National Library of Australia
Digital Preservation: Managing Digital Collections - a page of resources about initiatives and innovative projects relating to Digital Preservation
Changing Trains in Wigan: Digital Preservation and the Future of Scholarship - Dr Seamus Ross, National Preservation Office, U.K., 2000