One of the certainties of scientific research, no matter what your field, is that you will create data of one kind or another. And usually that data is stored in the form of computer files. Another certainty of scientific research is that you must be able to find your data files, sometimes years after you first created them. Could you easily find those original microscope images from 2012?
To be able to quickly and easily find your data when you want it, you need a good file naming system.
What are the features of a good file naming system? Well, they vary depending on your specific needs and those of your research group but here are a few things you should consider:
1. Be Consistent!
This is the most important point. When you need to find a file relating to an experiment you did three years ago, you can’t rely on your memory to know what you called it. Having a system that is consistent will allow you to locate that crucial piece of data when you need it. If you can instil a consistent file naming system across your whole group that would be ideal, but even if it is just you, your colleagues will soon notice and appreciate your organisation, and may begin to adopt it of their own accord. One thing to note: it would obviously be best to start this system at the beginning of your PhD, post-doc or project but it doesn’t matter if you haven’t. Just start as soon as you can and build from there.
2. Use the YYMMDD date format (preferably at the beginning of each file and folder name)
This is the single biggest insight that has helped me with my file organisation. When you sort by name, computers order files alphabetically, and if you use dates such as 23-Jan-2015 or 07-05-13 then you will soon find that the files do not end up in chronological order. A reliable way to ensure this is to put the year first, then the month, then the day: the YYMMDD format. So 23 January 2015 becomes 150123. It seems strange at first but once you get used to it, it has immense benefits. I personally start almost every single file and folder name with a 6-digit date code like this – it means everything is chronological and makes it much easier to find. I then use the rest of the filename to describe something about the experiment or data but at the very least I can reconcile the date with my lab book to work out which experiment that data belongs to.
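To see the sorting effect for yourself, here is a minimal Python sketch. The dates and the "_western.tif" suffix are made up for illustration; the only point is to compare how day-first and year-first names sort alphabetically.

```python
from datetime import date


def date_code(d: date) -> str:
    """Return a 6-digit YYMMDD code for a date."""
    return d.strftime("%y%m%d")


# Hypothetical experiment dates, deliberately out of order.
dates = [date(2015, 1, 23), date(2013, 5, 7), date(2014, 11, 2)]

# Day-first names: alphabetical order does not match chronological order.
day_first = sorted(d.strftime("%d-%m-%y") + "_western.tif" for d in dates)

# YYMMDD names: alphabetical order is also chronological order.
year_first = sorted(date_code(d) + "_western.tif" for d in dates)

print(day_first)   # ['02-11-14_western.tif', '07-05-13_western.tif', '23-01-15_western.tif']
print(year_first)  # ['130507_western.tif', '141102_western.tif', '150123_western.tif']
```

The day-first list puts the 2014 file before the 2013 one, while the YYMMDD list comes out in true date order.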
3. Use a logical file structure
How to divide up your files depends on your research but pick something that makes logical sense for you. Here are a few things to consider:
By project – if the research projects in your lab are distinct and don’t cross over much then a separate file location for each project is a logical option.
By researcher – a common solution is to have a folder for each lab member, especially if you all use a shared drive for your data. This works well if each person has their own distinct research, but if Debbie and Jack work together on a project which file do they save it in? And how will anyone know later? This might not be a major issue but it is worth considering.
By date – a purely chronological system has some advantages. Top-level folders can be for years or months, with subfolders for days.
By technique or equipment – you may find it easiest to store your files organised by source or data type. For example: all images produced on a particular microscope, all real-time PCR files, or all flow-cytometry data together.
In actual fact, a good filing system will likely be a combination of the above. Remember, consistency is key.
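As one possible combination, the sketch below builds a folder path by project, then year, then a dated folder named after the technique. The layout, the function name data_folder and the example values are all assumptions for illustration, not a recommendation for every lab.

```python
from pathlib import PurePosixPath


def data_folder(root: str, project: str, year: int,
                date_code: str, technique: str) -> PurePosixPath:
    """Build a path of the form root/project/year/YYMMDD_technique."""
    return PurePosixPath(root) / project / str(year) / f"{date_code}_{technique}"


# Hypothetical example: qPCR data from 23 January 2015 for "projectA".
print(data_folder("lab_data", "projectA", 2015, "150123", "qPCR"))
# lab_data/projectA/2015/150123_qPCR
```

Generating paths from one small function like this is one way to keep the structure consistent, since nobody has to remember the order of the levels.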
4. Use Leading Zeros
If using a sequential numbering system, estimate the number of files you are likely to create (overestimating is better than underestimating!) and use leading zeros to fill in the lower numbers so that each has the same number of digits (for example 001, 002 and 010 rather than 1, 2 and 10). This ensures files will sort in the correct order.
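Here is a short Python sketch of the idea. The helper numbered_name and the "image" prefix are hypothetical; the padding width is taken from the estimated total number of files.

```python
def numbered_name(prefix: str, index: int, expected_total: int, ext: str) -> str:
    """Pad index with leading zeros to the width of expected_total."""
    width = len(str(expected_total))
    return f"{prefix}_{index:0{width}d}.{ext}"


# Without padding, file 10 sorts before file 2.
unpadded = sorted(f"image_{i}.tif" for i in (1, 2, 10))

# With padding (allowing for up to 999 files), the order is correct.
padded = sorted(numbered_name("image", i, 999, "tif") for i in (1, 2, 10))

print(unpadded)  # ['image_1.tif', 'image_10.tif', 'image_2.tif']
print(padded)    # ['image_001.tif', 'image_002.tif', 'image_010.tif']
```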
5. Make sure filenames are unique but simple
When choosing how to label a file relating to a particular experiment, include as much (and only as much) information as is necessary to uniquely identify it. If you repeat that experiment how will you distinguish between the two sets of data? If you include the date then that should be enough.
6. Ensure filenames can stand alone
Mostly your files will be neatly organised in folders as described above. But sometimes you will need to separate a file from its folder to email to a collaborator or transfer to a different computer. Make sure you include enough information in the individual filename to identify it on its own. Don’t rely on the folder structure to know which experiment it belongs to. Again, this is where I find that beginning every filename with the date (YYMMDD) is particularly useful.
A couple of final tips: learn how to use the automatic naming functions of whatever instrument/software you are using. Often you can set it to automatically name files the way you want them which saves hours! And if you can’t do that, learn how to use batch renaming tools to accomplish the same task. I’ll be covering these topics in later posts.
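In the meantime, here is a rough Python sketch of what a batch rename can look like. It is only an illustration, not a substitute for a dedicated renaming tool: the function names are my own, and it assumes you want to prepend a single YYMMDD code to every file in one folder that does not already start with six digits and an underscore. It builds a list of planned renames first so you can check them before anything is changed.

```python
import re
from pathlib import Path

# Matches names that already begin with a 6-digit date code and underscore.
DATE_PREFIX = re.compile(r"^\d{6}_")


def plan_renames(folder: Path, date_code: str) -> list[tuple[Path, Path]]:
    """List (old, new) pairs for files in folder lacking a YYMMDD_ prefix."""
    plan = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and not DATE_PREFIX.match(path.name):
            plan.append((path, path.with_name(f"{date_code}_{path.name}")))
    return plan


def apply_renames(plan: list[tuple[Path, Path]]) -> None:
    """Carry out the planned renames, refusing to overwrite existing files."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(f"{new} already exists")
        old.rename(new)


# Example use (the folder name is hypothetical):
#   plan = plan_renames(Path("microscope_export"), "150123")
#   for old, new in plan:
#       print(old.name, "->", new.name)   # review before renaming
#   apply_renames(plan)
```

Try it on a copy of your data first; renaming files in place is hard to undo.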