What's in a (file)name?

Posted on 10 August 2009 by Thomasin

In the news recently was the revelation that researchers in Japan are working on a digital 'Rosetta stone' to back up your digital data for 1000 years.  If all that content is poorly named and organised, however, the nightmare won't be hardware failure but the inability to find anything - and it won't take 1000 years to become a problem.  It has been predicted that by 2011 over 60 billion digital photos will be taken annually.  That's a lot of content.

The moment you have more items of digital content than can be scanned with the eye in a few seconds, discovery becomes limited by how well your digital content is organised.  One of the easiest practices for keeping content usable and accessible is having a consistent naming scheme for filenames and folders.  Yet like many good habits, it's one that people most commonly fail to follow.

Arguably it all started around the introduction of Windows 95.  For the first time in many computer users' experience, long file names up to 255 characters meant you could let rip with what you named digital files. 'SMITHLET.DOC' could become 'Letter to J Smith re postage costs.doc'.  Microsoft Office 95 even helpfully suggested you name your new file 'Document1' and your new folders 'New Folder', a pattern carried on with digital scanners and cameras.

[Image: camera filenames]

As digital files such as MP3s grew in popularity, many software programmes took advantage of long file names and applied default names using conventions such as 'Track#_Trackname_Artist_Albumname'.  On the face of it, this seems like a really good naming convention, until you get to a track such as Pink Floyd's 'Several Species of Small Furry Animals Gathered Together in a Cave and Grooving with a Pict'.  Bury this a few folders into a directory tree and you will find some software, including some CD-burning software, that simply can't read the file without an error.  For that software, the 255 character limit includes the folder path of the file, not just the filename.

There are three rules of thumb for ensuring your content is organised consistently.  If you follow them, you will be taking a significant step towards keeping your digital content discoverable over time.

Rule 1: Have a simple and unique name for every file

It can take a bit of thinking about, but devising a system that will create a unique name for each file you create, while keeping that name simple, will help avoid accidental deletion or overwriting.  Filenames with more than 30 characters and obscure abbreviations make it hard to order the content over time, so less complexity is better.  Special characters, dashes or extra dots can create compatibility problems, so stay with the basic alphanumeric characters, underscores and one three-letter extension.

If date of creation is important to the content or its sequence, this can be usefully included as part of a filename.  Where the date will be used to order records in sequence, starting the filename with year, then month, then day, will enable sorting chronologically and avoid confusion between day and month order.  In many cases record order is important, so a serial number or lettering can be included, allowing enough leading zeros (e.g. 001) to capture the entire series.  Some other basic descriptive metadata can also be added (such as file origin, type or subject, e.g. 'film', 'scan' or 'drama'), while large projects may require an item ID.  A digital photograph can be labelled 'YYYYMMDD_type_sequence', e.g. 20090801_dphoto_281.tif.

It's worth considering including folder names as part of a filename.  This helps ensure files can be linked back to their origin in the directory, and can help ensure uniqueness where dates as descriptors are not useful.  Pink Floyd's track could be labelled 'Track#_Artist_Foldername' or 'Track#_Foldername', where the folder name identifies the artist and album number, e.g. 07_PinkFloyd_Ummagumma.wav or 07_PinkFloyd1969_01.wav.
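For anyone comfortable with a little scripting, the 'YYYYMMDD_type_sequence' pattern can be generated rather than typed by hand.  The following is only a minimal sketch in Python: the function name build_filename, its parameters and the default of three digits for the sequence number are assumptions made for illustration, not part of any standard tool.

```python
import re
from datetime import date

# Basic alphanumerics and underscores only, then one three-letter extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+\.[A-Za-z0-9]{3}$")


def build_filename(created, kind, sequence, extension, width=3):
    """Return a name of the form YYYYMMDD_type_sequence.ext.

    'width' is the number of digits in the sequence number; choose it
    large enough to cover the whole series (3 allows 001 to 999).
    """
    name = f"{created:%Y%m%d}_{kind}_{sequence:0{width}d}.{extension}"
    if not SAFE_NAME.match(name):
        raise ValueError(f"unsafe characters in filename: {name!r}")
    if len(name) > 30:
        raise ValueError(f"filename longer than 30 characters: {name!r}")
    return name


print(build_filename(date(2009, 8, 1), "dphoto", 281, "tif"))
```

Because the year comes first and the sequence number is padded with zeros, an ordinary alphabetical sort of names built this way is also a chronological sort.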

Rule 2: Make use of folder structures to keep your content organised

Although it's not always possible, using a controlled folder structure for your file storage improves organisation and aids back-up and future migration.  Like any filing system, design your folder structure to allow your content to expand over time without duplicating descriptions or nesting folders too deeply.  Aim for a maximum of five folder levels from your drive letter to reduce compatibility problems and aid navigation.  The same kinds of unique naming approaches for filenames can be used for folders.

If you create different versions or derivatives of files, you may need a folder system that allows you to distinguish original 'master' versions from altered versions.  It may be useful to pair folders as master and copy, or keep them in completely separate structures to avoid accidental alteration.  Both the master and the copy need to be backed up.

Even if you mainly use a software library or database to navigate through your content, a good folder structure will help minimise accidental loss over time and as software systems change.
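The five-level guideline, and the 255 character limit that some software applies to the whole path, can both be checked with a short script.  This is a rough sketch in Python that assumes Windows-style paths with a drive letter; the function name check_path and the wording of the warnings are invented for illustration.

```python
from pathlib import PureWindowsPath

MAX_PATH_CHARS = 255   # some software counts the folder path plus the filename
MAX_FOLDER_LEVELS = 5  # suggested maximum depth below the drive letter


def check_path(path_text):
    """Return a list of warnings for one full file path (empty if none)."""
    path = PureWindowsPath(path_text)
    warnings = []

    # Count the folders between the drive letter and the file itself.
    folder_levels = len(path.parent.parts) - (1 if path.anchor else 0)
    if folder_levels > MAX_FOLDER_LEVELS:
        warnings.append(f"{folder_levels} folder levels (aim for {MAX_FOLDER_LEVELS} or fewer)")

    total_chars = len(str(path))
    if total_chars > MAX_PATH_CHARS:
        warnings.append(f"{total_chars} characters in full path (limit {MAX_PATH_CHARS})")

    return warnings


print(check_path(r"C:\Music\PinkFloyd1969_01\07_PinkFloyd1969_01.wav"))
```

An empty list means the path passed both checks.  PureWindowsPath only parses the text of the path, so the check can be run on any computer without the files being present.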

Rule 3: Embed important metadata into the file

Even with your filenames and folders perfected, don't rely on those names to carry the important metadata for your digital content.  Names can be changed by other users and timestamps can be altered just by copying folders or opening the file.  The best way of describing your content is to embed the metadata into the file itself using an open and accepted metadata format.  Many digital files, such as PDFs, TIFFs and JPGs, allow the embedding of metadata in the file header, while other files such as MPEG4 and WAV can be placed in 'wrappers' that carry metadata.  Embedded metadata allows you to verify the contents of a file even if the filename or folder location has changed.  If you are unable to embed metadata, then it is important to have a metadata record that links your unique file and folder structure to a description of what the file contains.  If you don't have software to help you do this, you can save a plain text file in each folder of content listing what the contents are.
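The plain text listing is the simplest of these options to automate.  As one possible sketch in Python, the function below writes a tab-separated list of filenames and descriptions into a folder.  The listing name 'contents.txt', the function name write_manifest and the layout of one file per line are all assumptions for illustration; any consistent layout would serve.

```python
from pathlib import Path

MANIFEST_NAME = "contents.txt"  # hypothetical name for the listing file


def write_manifest(folder, descriptions):
    """Write a plain text listing of the files in 'folder'.

    'descriptions' maps a filename to a short description of its contents.
    Files without an entry are still listed, so gaps are easy to spot.
    """
    folder = Path(folder)
    lines = []
    for item in sorted(folder.iterdir()):
        if item.is_file() and item.name != MANIFEST_NAME:
            note = descriptions.get(item.name, "no description recorded")
            lines.append(f"{item.name}\t{note}")
    manifest = folder / MANIFEST_NAME
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Running it again after adding files simply rewrites the listing, and because the result is plain text it should stay readable whatever software is used to open it later.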

Digital technology is accused of creating many risks, including the risk of rapid loss or obsolescence of any content stored in digital form.  While this may have been a very real risk in earlier years, today we have available to us the open standards and the knowledge to manage that risk to an acceptable degree.  It's up to us to apply them in a way that will work for the future, whatever technology systems may eventuate.
