From time to time people ask me how to deal with cases of data loss. Usually I point them to a tutorial that was written for Private Disk, about the subtleties of backing up encrypted data. The problem is that data loss can happen to anyone, not only to a Private Disk user, so I decided to write another, more generic guide about data safety in general. It is true that there are more ways in which things can go wrong with encrypted data (if you forget the encryption key, you have effectively lost all the data), but there are still plenty of scenarios that can make someone unhappy even if they don't use encryption.
This is a story for those who are aware of the fact that we live in a world where Murphy's laws rule, those who understand the importance of backups, and are looking for a good backup strategy.
If you don't know why backups are important and why you need them, check out the reading material section at the end, or read about Murphy's laws when you have some time.
A good backup mechanism is:
- Automatic - copies must be made even if we are asleep, or tired, or not in the mood to do copies. A backup strategy that depends on our not forgetting to do an action is doomed to fail sooner or later, because forgetting is a feature we all have;
- Reliable - a copy that was made must be the equivalent of the original; i.e. if things go wrong, we must be certain that the copy will be able to replace the original (not "this copy will do", but "this copy is as good as the original and I will not feel the difference after restoring");
- Invisible - the copy must be made in the background so that the "files are being copied" window doesn't get in the way. If it does, we will feel tempted to close it because "ah, it will never happen to me anyway" or "just this one time I will close it". Making the copy in the background excludes the possibility that a human will accidentally interrupt the process or interfere with it.
These are not mandatory, but having them is a bonus:
- Keeping several old copies, so that you can revert to the state in which the files were at different points in time;
- Limiting the file copy speed, so that the system stays responsive if the copy is made while you are using the computer;
- Compressing the backup, so that less space is used on the storage device where the copy is kept;
- Integrity checks - after the copy is made, it is compared with the original (using a checksum or hash), and you are warned if there is a difference between the two. Such checks will take more time, and most of us will do fine without them; but if you're dealing with mission-critical data, such a check can save lives or millions.
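The integrity check from the last bullet can be sketched in a few lines of Python. This is only an illustration of the idea, not the way any particular backup program does it: the function names `file_digest` and `find_mismatches` are made up for this example, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(original_dir, copy_dir):
    """List relative paths whose copy is missing or differs from the original."""
    original_dir, copy_dir = Path(original_dir), Path(copy_dir)
    mismatches = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        target = copy_dir / relative
        # A missing copy counts as a mismatch, as does a different hash.
        if not target.is_file() or file_digest(source) != file_digest(target):
            mismatches.append(relative.as_posix())
    return mismatches
```

An empty list means every file in the original directory has an identical twin in the copy. Note that the sketch works on plain directory copies; a compressed backup would have to be unpacked (or verified by the backup program itself) first.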
Make a list of files and directories that you need to back up.
If you have plenty of storage, you will probably want to back up the entire partition or hard disk, but this is not a good idea because:
- it takes more time to do such a backup
- if the data is on the system disk you will most likely have to reboot the system in a special mode (which is against the lazy nature of mankind)
- the backup will contain a lot of data you do not really need (operating system modules, temporary files, programs that you can always download from the Internet, etc)
Decide where you will keep the backups. These rules must be taken into account:
- it is better to keep the backup on another partition
- even better: on another hard disk
- even better: on another computer
- even better: on another computer in another physical location (preferably on a different planet, to make sure that your data will survive the impact of Earth with an asteroid)
Estimate how fast your data grows and how much space you will need.
This is very important, because if it is not done right you might be tempted to interrupt the backup processes in the future because you ran out of space.
If your original hard disk is 60 GB in size, it is safe to assume that a single full backup will never be larger than 60 GB. Keeping several versions, or storing backups from other computers on the same device, multiplies that figure. Take into account that you will most likely compress your backups too, so they will take even less space.
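As a rough worked example, the worst case can be computed as data size times the number of versions kept, scaled by whatever compression you expect. The function below is a back-of-the-envelope sketch; its name and the `compression_ratio` parameter (compressed size divided by original size) are assumptions made for this illustration, and it assumes every kept version is a full copy.

```python
def estimate_backup_space_gb(data_gb, versions_kept, compression_ratio=1.0):
    """Worst-case space needed, assuming every kept version is a full copy.

    compression_ratio is compressed size / original size, so 1.0 means
    "no gain from compression" and 0.5 means "files shrink to half".
    """
    if data_gb < 0 or versions_kept < 1 or not 0 < compression_ratio <= 1:
        raise ValueError("expected data_gb >= 0, versions_kept >= 1, 0 < ratio <= 1")
    return data_gb * versions_kept * compression_ratio


# A full 60 GB disk, three versions kept, no compression gain at all.
print(estimate_backup_space_gb(60, 3))        # 180.0
# The same, if the data happens to compress to half its size.
print(estimate_backup_space_gb(60, 3, 0.5))   # 90.0
```

In practice you back up only the directories you care about, so `data_gb` is usually far smaller than the disk itself.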
Here is a sample environment:
Lappie - a laptop which contains the following data I care about
- D:\Soft\Trillian - my instant messaging program, I want to keep my conversation history and the
- D:\Soft\TheBat - my email client and the email archive for several accounts
- D:\Stuff\TXTs - a directory with various texts (essays, poems); this is what I use instead of the standard My Documents folder (the rationale behind this decision is given in one of the discussions in the reading material section)
- D:\Stuff\MyNotes - my OneNote repository (again, notice that it does not reside in My Documents)
- D:\Soft\Palm\Alex - my Palm desktop profile
- Servo - a desktop computer at home, which has a shared folder to which I can write, available as \\Servo\backups. In this folder there are sub-folders for each type of data I am backing up (MyNotes, TXTs, etc).
- Hive - the server at work, which has a shared directory \\Hive\Alex\ to which only I have access; I will store my work-related data there.
- The backup at home must be made once a week, every Sunday at 20:00; and the one at work is done every Monday at 09:30.
- If the target computer is not turned on at the time, the backup will be made automatically the next time there is an opportunity (e.g. instead of waiting for the next Sunday 20:00, attempts will be made to copy the files every now and then, until the backup succeeds).
The availability of \\Servo\backups can be used as a test: if the share is accessible, it means that both machines are on and the network is up; otherwise the backup is postponed.
- I want to keep several old backups too, to make sure I can revert to them in case I want to take a look at the older versions of the files, see the image below.
Each time a backup is made, the old backups are pushed down, the fresh backup becomes #1, and the oldest backup is deleted. Note that even though the image says "new files", the backup will include the old ones too. Keeping three older copies is more than enough for the average user, but if you want to be able to travel back in time and see what your files looked like back in '45, you will obviously have to keep more than three previous copies.
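The "push down" rotation is easy to picture in code. The sketch below is a generic illustration, not a description of how nnBackup stores its versions: the slot names `backup.1`, `backup.2`, ... and the function `rotate_backups` are invented for this example, and `keep` is assumed to be the total number of slots.

```python
import shutil
from pathlib import Path


def rotate_backups(backup_root, fresh_backup, keep=3):
    """Push old backups down one slot and install fresh_backup as slot 1."""
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)

    # The oldest slot falls off the end.
    oldest = backup_root / f"backup.{keep}"
    if oldest.exists():
        shutil.rmtree(oldest)

    # Shift the remaining slots, starting from the oldest so nothing is overwritten.
    for slot in range(keep - 1, 0, -1):
        current = backup_root / f"backup.{slot}"
        if current.exists():
            current.rename(backup_root / f"backup.{slot + 1}")

    # The fresh backup becomes #1.
    shutil.move(str(fresh_backup), str(backup_root / "backup.1"))
```

After four runs with `keep=3`, slot 1 holds the fourth backup, slot 3 holds the second, and the very first backup is gone.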
To get the 3 features a good mechanism must have, only 2 tools are needed:
- task scheduler
- backup program
Windows comes with a task scheduling mechanism; you can access it via Control Panel\Scheduled Tasks. There are alternative programs which offer more features, but you will probably want to start with the standard task scheduler because it is free and already installed. There are reasons that can convince you to choose a different program for this purpose; they will be discussed later.
The tandem is nnBackup and nnCron. Both programs come from the same company; they are light and very flexible, and provide a broad range of features. They are not free (though there is a special offer for ex-USSR folks, who can get them for free), so you might want to take a look at some of the programs described above, or look for alternatives elsewhere.
nnBackup is the program that does the actual copying. You can read about its many settings in the manual; be prepared to take notes on paper or in a temporary text file. Once you are done, you will end up with a set of command line arguments that do what you want, for example:
nnbackup.exe verz -n 2 -sdn "onenote" -i D:\Stuff\MyNotes -o \\Servo\backups\Onenote -s -e -sa -zip -v -pc
nnbackup.exe verz -n 2 -sdn "Documents" -i D:\Stuff\TXTs -o \\Servo\backups\Documents -s -e -sa -zip -v -pc
nnbackup.exe verz -n 2 -sdn "trillian" -i D:\Soft\Trillian -o \\Servo\backups\Trillian -s -e -sa -zip -v -pc
And so on... As you can see, the lines are nearly identical; the only parts that vary are the name given after -sdn, the path of the source (where files are copied from) and the target path (where the files will be copied to).
For the curious minds, here is what the command line arguments mean in the examples above:
- -i: input directory
- -o: output directory
- verz (the mode, written without a leading dash in the commands above): keep several versions of the backup, in compressed files
- -n 2: two backups will be kept
- -s: include the subdirectories too
- -e: include empty directories too
- -sa: copy the access rights (e.g. you have NTFS access rights set for your directories, and you want them to be preserved on the target machine)
- -pc: add a new backup only if differences were found between the current one and the old one
- -v: verbose, this will show you which files are being copied - you might be interested in watching what is going on; besides, it usually impresses the non-tech-savvy folk who happen to be around ;-)
In the same manner, I wrote the commands that will back up my other folders. Whenever I have a new type of data I want to back up, I can copy/paste an existing line and alter it accordingly. All these commands are saved in a BAT file, thus they will be executed one after another. All we need now is to launch this BAT file automatically on a weekly basis.
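Since the lines differ only in the name, the source and the target, the BAT file can also be generated from a small table instead of being copy/pasted by hand. The Python sketch below is one possible way to do that; the helper names `build_command` and `build_script` are made up, and the nnBackup arguments are copied verbatim from the examples above rather than taken from the manual.

```python
# Arguments shared by every line, copied as-is from the examples in the text.
COMMON_FLAGS = "-s -e -sa -zip -v -pc"

# (name, source, target) for each directory that must be backed up.
JOBS = [
    ("onenote", r"D:\Stuff\MyNotes", r"\\Servo\backups\Onenote"),
    ("Documents", r"D:\Stuff\TXTs", r"\\Servo\backups\Documents"),
    ("trillian", r"D:\Soft\Trillian", r"\\Servo\backups\Trillian"),
]


def build_command(name, source, target, versions=2):
    """Return one nnbackup command line in the same shape as the examples."""
    return (
        f'nnbackup.exe verz -n {versions} -sdn "{name}" '
        f"-i {source} -o {target} {COMMON_FLAGS}"
    )


def build_script(jobs):
    """Return the text of a BAT file with one command per job."""
    return "\r\n".join(build_command(*job) for job in jobs) + "\r\n"


if __name__ == "__main__":
    print(build_script(JOBS), end="")
```

Adding a new directory then means adding one row to `JOBS`. The sketch does not quote paths, so a path containing spaces would need extra care.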
Watch out! Some programs lock the files they use, so the files cannot be accessed by other processes (such as nnBackup, trying to make the copy). In these cases you have to make sure that the application is not running (e.g. the mail client must be closed before the backup process is started, otherwise the mail archive cannot be read). To counter this, see if the program in question provides command line arguments (or any other mechanism) that allow you to close it correctly. Once you find out how to do that, perform that action before calling the backup script. If you don't know how to do that, just close the programs by hand - but note that this goes against our philosophy: the backup must not require human intervention of any kind, because we can't trust humans...
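A cheap way to notice locked files before the backup starts is to try opening each important file for reading. The sketch below is an assumption-laden illustration: `unreadable_files` is a made-up helper, it treats a missing file the same way as a locked one, and it will not catch every kind of lock (some only show up when a specific part of the file is read).

```python
def unreadable_files(paths):
    """Return the paths that cannot be opened and read right now."""
    blocked = []
    for path in paths:
        try:
            with open(path, "rb") as handle:
                handle.read(1)  # touch the file, not just its directory entry
        except OSError:
            # Covers "permission denied" (typical for a locked file on Windows)
            # as well as "file not found".
            blocked.append(path)
    return blocked
```

If the returned list is not empty, the script can skip the backup, or close the offending program first and try again.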
nnCron comes into action now. This program will take care of running the backup script at the right time, re-running it if necessary, checking if the network is active, etc. Creating a new task with nnCron is very easy; the screenshots below should be more than enough.
You can play with the other settings too; their names are self-explanatory. You will probably want to use the "host exists" feature, to verify whether the target backup machine is online; there are also various plugins that make it possible to use other conditions when evaluating whether a task has to run or not.
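For readers who use a scheduler without such a feature, the same "run only when the target is reachable, otherwise try again later" logic can be approximated in a script. The Python sketch below is a stand-in for illustration and says nothing about how nnCron implements it; `target_available` and `run_when_available` are hypothetical names, and the retry count and waiting time are arbitrary defaults.

```python
import os
import time


def target_available(path):
    """True if the backup target (a share or a local folder) is reachable."""
    return os.path.isdir(path)


def run_when_available(path, backup, attempts=5, wait_seconds=600, sleep=time.sleep):
    """Run backup() once the target is reachable; give up after `attempts` tries.

    Returns True if the backup was started, False if the target never showed up.
    """
    for attempt in range(attempts):
        if target_available(path):
            backup()
            return True
        if attempt < attempts - 1:
            sleep(wait_seconds)  # postpone and try again later
    return False
```

A call such as `run_when_available(r"\\Servo\backups", run_bat_file)` would then postpone the work until the share appears, where `run_bat_file` is whatever function launches your backup script.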
nnCron can keep track of multiple tasks; in this scenario, you will want two different scripts (one for backing stuff up on \\Servo, the other one for \\Hive), each with its own schedule.
You don't necessarily need another computer for the backups. If you have an external disk, you can use it as the target path (i.e. instead of \\Servo\backups use F:\backups, replacing 'F' with the letter that corresponds to the external disk once it is mounted).
In the beginning you will probably not want to run the tasks in the background, because you want to see the progress of the transfer process and spot errors (if any). But after you do this a couple of times and you're sure everything works as you think it does, you can trust the system and let it work in the background.
The tips above are a set of general guidelines that are supposed to help you understand that good backups are a lot more than just copying and pasting files by hand in Windows Explorer.
A good backup mechanism must be thoroughly analyzed and tested before you can actually trust it. Do not let the apparent complexity dampen your spirits (I refer to finding the right command line arguments); once you get it figured out it is easy, and the most important part is that it is worth it. You will realize that when the first crisis comes and you get over it with no pain, trust me on that one.
Feel free to experiment with other similar tools; there are many of them out there (I will greatly appreciate it if you leave a comment and share your impressions).
- Do not back up what can be easily replaced (e.g. program installers can be downloaded from the Internet).
- Another computer > another HDD > another partition > another directory.
- Keep system data and user data separate.
- Never send a human to do a machine's job.
Other reading material
- The importance of backups, don't wait until disaster strikes - the guide explains why data loss occurs, what the different types of data loss are and how to deal with them. The article also focuses on the aspects that are specific to backing up encrypted data. Although the primary focus is on Private Disk, the tips apply to other types of encryption software as well.
- Keep your system files and your data files separate - this forum discussion explains why keeping personal data in a dedicated directory is a good idea, and how it helps you solve problems more easily if they occur.
Note: all the computer names were made up, coincidences with real world entities are just that - coincidences.