Preserving digital content

Making sure you preserve your digital content for future use is an important final step in your digitisation project. We present here some principles for even the smallest digitisation project, as well as guidance for larger-scale institutional digitisation work.

Make it Digital has a detailed Preserving Digital Content guide: Preservation resources

Understanding digital preservation

As a component of digital content management, digital preservation aims to keep digital content usable in the long term. Loss of non-digital items is often visible, gradual, and partial. Digital content, and the physical media it is stored on, can be lost completely and very quickly, without any visible signs of decay to indicate a problem. The volume of digital content being created today is also very large, which can make it difficult to assess and prioritise. Together these create a challenge for digital preservation practice.

Back-up strategies are a vital first step to avoiding short-term content loss. In the longer term, storage for preservation involves planning for archival copies of your content to be migrated and contingencies for transfer to a new owner, should your project, organisation, or service face closure. This is where your choices of appropriate formats, descriptions, collections policy, and rights statements really come into their own.

Following the basics

Any organisation or person responsible for keeping digital content for any reasonable length of time needs to follow the basic steps involved in preventing information loss. Even without access to expensive archival software or repositories, there are three practices to always follow wherever digital content is kept.

1. Make regular back-up copies

Despite the claims of some manufacturers, no digital storage medium currently exists that can be considered safe for long-term storage. All storage media, such as hard disk drives or optical disks, are prone to failure or corruption over time. In the short term, the way to manage this risk is through back-up copies. A simple guide for back-up is the one used by the American Society of Media Photographers, known as the 3-2-1 backup rule:

  • keep 3 copies of any important file: the primary (or master) file and two back-ups

  • keep 2 of the copies on different media types, or at least use physically separate media and brands

  • keep 1 other back-up copy stored off-site.

New back-up copies should be made whenever significant changes or additions are made to your content. As a guide, a significant change is likely to be one where changes or additions are not easily re-created. Having a regular back-up schedule or routine is the simplest way to minimise disruption and information loss caused by storage failure.
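
The 3-2-1 rule can be checked mechanically against a simple list of where your copies live. The following is a minimal sketch only: the `Copy` record, its field names, and the example locations are hypothetical, and the check simplifies the second point of the rule to "at least two distinct media types".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a file (hypothetical record, for illustration only)."""
    location: str     # free-text label for where the copy is kept
    media_type: str   # e.g. "internal hard disk", "optical disk"
    off_site: bool    # True if stored away from the primary location


def meets_3_2_1(copies):
    """Return True if the copies satisfy a simplified reading of the 3-2-1 rule."""
    enough_copies = len(copies) >= 3
    two_media_types = len({c.media_type for c in copies}) >= 2
    one_off_site = any(c.off_site for c in copies)
    return enough_copies and two_media_types and one_off_site


copies = [
    Copy("office workstation", "internal hard disk", off_site=False),
    Copy("office shelf", "optical disk", off_site=False),
    Copy("second site", "external hard disk", off_site=True),
]
print(meets_3_2_1(copies))
```

A check like this is only as good as the list it is given, so it works best alongside a written record of where each back-up is actually kept.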

2. Use file integrity checking for archived copies

If you have back-up copies of digital content that does not change over time, such as archived files, it is important to be able to verify the integrity and completeness of both the master file and the copies. A file integrity check, such as generating MD5 checksums, is a good way of gaining a level of assurance about the integrity of your content. An MD5 checksum is the equivalent of a digital fingerprint for a file: you can compare it between copies, or at different points in time, to check that the file has not been modified or corrupted. Many software programs are available cheaply or freely, such as FastSum, that will generate a checksum for each file or folder and save it for later checking.

If you are copying to optical media using a CD or DVD burner, in addition to generating a checksum, always finalise the disk and verify the data you have written. Avoid using re-writable CDs and DVDs, as they are significantly less reliable than write-once (recordable) CDs and DVDs.
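
As an illustration of how checksum-based integrity checking works, here is a minimal sketch using Python's standard `hashlib` module. The function names (`file_checksum`, `build_manifest`, `find_problems`) are made up for this example and are not part of any particular tool. A "manifest" here simply means a saved list of file names and their checksums.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex checksum of one file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder, algorithm="md5"):
    """Map each file's path (relative to folder) to its checksum."""
    folder = Path(folder)
    return {
        path.relative_to(folder).as_posix(): file_checksum(path, algorithm)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def find_problems(saved_manifest, current_manifest):
    """Report files that are missing or whose checksum has changed."""
    problems = {}
    for name, checksum in saved_manifest.items():
        if name not in current_manifest:
            problems[name] = "missing"
        elif current_manifest[name] != checksum:
            problems[name] = "changed"
    return problems
```

In use, you would build a manifest when the files are archived, save it somewhere safe, and later build a fresh manifest of the master or a back-up and pass both to `find_problems`. An empty result means every archived file is still present and unchanged. Passing `algorithm="sha256"` uses a stronger checksum in the same way.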

3. Have a workflow for managing your content

Back-up strategies can fail if your digital workflow does not address the different stages of the digital content lifecycle. Use the good practice principles identified in our Managing Digital Content guide to establish a process for managing your digital content from the point of creation through to digital archiving. Good inventories, file-naming schemes, and having someone responsible for entering information and undertaking back-ups all minimise the risk of unintended loss of your digital content.
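
A basic inventory can be as simple as a spreadsheet listing every file you hold. The sketch below shows one possible way to generate such a list as a CSV file. The function name `write_inventory` and the column names are assumptions made for this example, not a prescribed format.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Column names are illustrative only; adapt them to your own inventory.
COLUMNS = ["file", "size_bytes", "modified_utc"]


def write_inventory(folder, inventory_path):
    """List every file under folder in a CSV file and return the number of files."""
    folder = Path(folder)
    inventory_path = Path(inventory_path)
    rows = []
    for path in sorted(folder.rglob("*")):
        # Skip sub-folders and the inventory file itself.
        if not path.is_file() or path.resolve() == inventory_path.resolve():
            continue
        info = path.stat()
        modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
        rows.append({
            "file": path.relative_to(folder).as_posix(),
            "size_bytes": info.st_size,
            "modified_utc": modified.isoformat(timespec="seconds"),
        })
    with open(inventory_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A generated list like this records only what a computer can see. Descriptions, rights information, and the name of the person responsible still need to be added by hand.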

Refreshing and migration strategies

Storage media generally don't last as long as their manufacturers claim. Hard disk drives may be warranted to last between 12 months and 5 years under normal operation, but some fail very quickly, with the cost being permanent loss of information. Optical disks such as CDs and DVDs can easily lose information due to improper handling and storage, even if they are rated for 10, 20 or 100 years. Those created by standard computer burners also have a shorter life than commercially made disks, as they are more prone to oxidisation or dye fading. Solid State Drives (SSDs), while not having moving parts, are prone to wear and, like hard disk drives, may lose data through cosmic radiation.

The only current way to avoid data corruption caused by storage media is to periodically refresh your media. This means copying your digital content to fresh storage media that is generally of the same nature as the previous media, for example from one hard disk drive to another. Errors found during regular file integrity checking of your archived content are an indicator that media refreshing is required.
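
When refreshing media, it is worth confirming that each file arrived intact on the new media. The following minimal sketch copies a file and compares MD5 checksums of the original and the copy. The function names `md5_of` and `refresh_file` are hypothetical, and a real refresh would normally work through whole folders rather than one file at a time.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file as a hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_file(source, destination):
    """Copy source to destination and confirm the copy matches the original."""
    source, destination = Path(source), Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copy2 also keeps file timestamps where possible
    if md5_of(source) != md5_of(destination):
        raise OSError(f"Copy of {source} does not match the original")
    return destination
```

Only retire the old media once every file has been copied and verified in this way.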

Over time, a bigger issue than media failure is technology obsolescence. Software and hardware systems continue to rapidly change and evolve, making it harder to keep using the same media. This may require periodic migration to a new storage type and software environment. The migration from analogue tape to digital tape or hard disk storage is one of the more significant ones in recent years, as media stored on magnetic tape from the 1950s through to the 1990s has suffered from both obsolescence and physical decay.

A migration strategy is something to carefully consider and plan beforehand as issues may arise around the need to:

  • format-shift content to be readable on new software

  • develop software emulation or 'virtual machines' to run older software

  • preserve obsolete hardware in order to keep older content accessible.

The best way to manage the issues you are likely to face is to seek advice from professionals or experts in content and systems migration. Attention to using open standards and formats for content creation along with widely used storage media types will also help limit your exposure to obsolescence.
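
Before seeking advice on migration, it can help to know which file formats you actually hold. As a rough sketch, the following counts files by file extension. The function name `count_formats` is made up for this example, and a file extension is only a hint about a file's format, not proof of it.

```python
from collections import Counter
from pathlib import Path


def count_formats(folder):
    """Count the files under folder by lower-cased file extension."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts
```

Calling `count_formats` on your archive folder and then `.most_common()` on the result lists the formats you hold from most to least frequent, which gives a starting point for discussing which content may need format-shifting.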

Digital continuity and contingency planning

If the digital content you are creating or collecting is important or worth keeping, chances are that it will need to be kept for longer than the careers of most individuals in your organisation. Thinking and planning ahead to what future staff or volunteers need to know about your digital content will help ensure it remains usable and accessible for as long as it is needed.

Things to consider include:

  • having written policies and workflow processes for managing your digital content, including processes for back-ups, media refreshing, migration, and disaster recovery

  • identifying long-term solutions for securely storing digital content, including possible provisions for transfer to external repositories for archiving

  • making provisions for what will happen in the event of an organisation wind-up, including ensuring continued ownership of the content.

Human error, poor record-keeping, and accidental loss are significant contributors to loss of content or information, and are as likely to occur as storage media corruption or technology obsolescence.

Trusted digital repositories

It seems inevitable that collections of the future will be increasingly digital. Blogs, websites, digital publishing, photographs, and video are all born-digital content that may have no analogue original or copy to fall back on. This makes the task of preserving digital content in the long term a serious challenge for cultural institutions and research bodies. A key solution to this challenge is to develop and build trusted digital repositories that can indefinitely hold a diverse range of cultural and research knowledge in open, accessible formats. The OCLC has described the characteristics of trusted digital repositories as needing:

  • compliance with the Reference Model for an Open Archival Information System (OAIS)

  • administrative responsibility

  • organisational viability

  • financial sustainability

  • technological and procedural suitability

  • system security

  • procedural accountability.

If your digital collection continues to grow, your organisation may eventually need to consider building its own repository. With good planning you can be aware ahead of time of the kinds of capacity you may need to develop. However, not every collecting organisation is likely to be capable of building its own repository with the kind of characteristics OCLC recommends. It may instead be useful to develop relationships with regional, national, or sector-based digital repositories, and to consider whether your long-term archival content is able to be managed by them.

Formal policies for digital preservation

If you are working for an institution that needs an organisation-wide digital preservation policy, look out for preservation guidance that follows good practice by addressing the following areas:

  • maintenance of descriptions of digital content, particularly relating to creation information and change history

  • backing up digital content through duplication onto separate storage media

  • refreshing storage media by periodic copying of digital content to new media of the same type

  • migration of digital content to new hardware, software, and storage environments to avoid obsolescence

  • maintenance of access by making access copies or through emulation of software and hardware environments

  • organisational continuity through provisions for ownership of digital content in the event of organisational change or transfer of responsibilities.

Strategies such as migration and emulation may require specialised resources and staff that are not available to smaller organisations. In such cases it may be appropriate to focus on descriptions, back-up, and refreshing as primary strategies until items are able to be transferred to a larger archive or specialist resources are found.

Minimising the risk of loss is central to any preservation strategy. Preventing informational loss for digital content can be difficult, as poorly executed preservation strategies can themselves be the cause of loss (e.g. reformatting using lossy compression). It is important to identify early on, for instance, whether a digital record also has an analogue form, or whether source versions or masters are held or owned by someone else. Understanding which digital content is irreplaceable and which is a mere copy is key to managing risk and preventing loss. A basic knowledge of records or collection management, as covered in our Managing Digital Content guide, will greatly assist.