Home Data Backup Strategy In Today’s Computer and Internet Age

Last updated on 6th January 2014

The contents of this post were triggered by a rather stressful event that happened to me a few days ago. My primary backup device, a 1 TB external hard disk, suddenly stopped being recognized by Windows, and so my primary backup data effectively became inaccessible to me [1]. I was somewhat taken aback by all the primary backup data being lost to me, right out of the blue (no warning signs whatsoever), and all at one stroke. I did not have a secondary backup (on separate media/device) to fall back upon.

In today’s computer & Internet age, losing all of one’s data can be a catastrophic event. In my case, I still have the data on my internal hard disks (and the critical data has now been backed up to two flash drives/memory keys of 2 GB and 1 GB). But until I have all my data from the internal disks backed up on at least one backup device, I am very vulnerable: if, God forbid, something happens to my internal disks now, I will have a data-loss catastrophe on my hands.

I should also mention that I am retired from commercial work and do home-based blogging and “email-writing” (writing privately to a mail group) on software development practice in academia, software development, spirituality, religion and some more topics of current and general interest.

I started thinking about a more robust home data backup strategy than the one I had been using, and corresponded with my friends and acquaintances on the matter. I received a lot of suggestions and have learned a lot from them. This post captures my thoughts and my understanding of the various suggestions and input I received from many correspondents, who are based in India and the US. I thank my correspondents for their valuable suggestions and input.

Before one looks at a backup strategy, I think one needs to talk about the data that needs to be backed up. In my case I have a lot of audio files as well as video files, which increase my data size significantly. However, they are typically not changing critical data for me. An overview picture of my data sizes, as of now, is as follows:

1) (Changing) Critical Data: In compressed form slightly less than 4 GB (excluding gmail backup); gmail backup – again slightly less than 4 GB [In the past couple of years or so I have not had the time and resolve to delete all/most old and possibly not useful mail from my gmail account.]

2) Audio: around 100 GB (including some duplication)

3) Video: around 60 GB

4) Other Data: around 50 GB (includes software installables, OS images, documents & books, images etc.)

[* I had lots of movies and documentary videos taking up hundreds of GB of space on the external disk. I did not consider them vital and so did not have a copy anywhere else. With the external disk becoming inaccessible I have lost those movies and documentary videos. But, to be honest, it has not bothered me much. I guess, if I really want to see some movie/documentary that I recall was on that backup, I should be able to get it from somewhere else.]

Now about the backup strategy. I think it would be quite safe to have two backup media/devices: a primary backup and a secondary backup. Given that no hard disk or other medium is guaranteed not to fail/crash, one has to take into account the very real possibility of a backup medium/device suddenly, and without any prior warning whatsoever, becoming inaccessible (as mentioned earlier, that happened to me). If there is only one backup, then there is a window during which there is only the source data and no backup copy of it at all. That window (till a new backup is made) is a tricky one, where another disaster, a mistake or a virus could result in data (and so, work) being completely lost. Having a secondary backup makes the situation far more comfortable. [There is always the danger of a virus getting onto the backup media too, thereby corrupting the backups. I have presumed that the system uses good anti-virus software that catches viruses quickly, i.e. even if a virus affects some source (internal hard disk) data files, it gets detected quickly, after which the virus can be removed by the anti-virus software and the affected source data files replaced by data files from the backup (if available).]

So a safe approach would be to have a 1 source + 2 copies backup strategy with each copy being on separate media/device.
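A simple way to stay honest about the 1 source + 2 copies rule is to periodically verify that both copies still match the source. The sketch below is my own illustration (not from any correspondent): it compares SHA-256 checksums of every file under a source folder against the same relative paths on two backup devices. All paths and function names are hypothetical.

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def verify_copies(source_dir, backup_dirs):
    """Compare every file under source_dir against each backup copy.

    Returns a list of (relative_path, problem) tuples; an empty list
    means both copies match the source.
    """
    source = Path(source_dir)
    problems = []
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        src_hash = file_digest(src_file)
        for backup in map(Path, backup_dirs):
            copy = backup / rel
            if not copy.is_file():
                problems.append((str(rel), f"missing in {backup}"))
            elif file_digest(copy) != src_hash:
                problems.append((str(rel), f"differs in {backup}"))
    return problems
```

Running something like this after each backup pass catches both silently corrupted files and files that never made it onto one of the copies.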

[As I am covering (simple) home data backup I am not getting into more sophisticated RAID solutions. Another point is that I am not looking at having the data copies in separate locations. So if there is a disaster that affects the entire home then the source + 2 copies may all get lost/become inaccessible at one stroke.]

But there may be some cost/other issues. So I think one can look at the following two specific solutions:

A) Two large and separate external hard disks of suitable size (500 GB or 1 TB). This solution allows for all the data mentioned earlier to be covered under a 1 source + 2 copies approach.

B) One large external hard disk of suitable size (500 GB or 1 TB) and a somewhat large flash drive (memory stick/key) say of 32 GB or even 64 GB. This solution will permit the (changing) critical data (and some additional data) to be covered under the 1 source + 2 copies approach. But (most of) audio, video and other data will have only 1 source + 1 copy.

[* Online storage services like Dropbox and Google Drive were suggested as good backup media. I feel that for the large amount of critical data I have (of the order of 8 GB including gmail backup) I will face bandwidth and Internet usage issues on my broadband connection. Further, for sensitive data, there may be some privacy concerns. For more about the online storage option please see Note [3].]

For either solution (A or B), when data is lost/becomes inaccessible on either the source or the backup copy(ies), a new copy (or restoration of the source) must be made as soon as possible/feasible, so that one is back to the situation of 1 source + 1 copy/2 copies.

For the changing critical data, a convenient way to do easy and usually quick regular backups is a full backup followed by incremental backups, using software like Microsoft Backup. This typically requires decent organization of the changing critical data and inclusion of all relevant folders in the Backup program’s folder selection list. I have followed this approach (full backup followed by incremental backups using Microsoft Backup) for changing critical data over the past couple of years or so, and found it to be very convenient and well worth the initial setup effort and the discipline of organizing the critical data folders.
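The full-plus-incremental idea can be sketched in a few lines of Python: copy everything on the first run, then on later runs copy only files modified since the previous run’s timestamp. This is only an illustration of the principle, not how Microsoft Backup itself works, and all names are illustrative.

```python
import shutil
from pathlib import Path

def incremental_backup(source_dir, backup_dir, last_run_epoch=0.0):
    """Copy files from source_dir to backup_dir that changed after
    last_run_epoch (seconds since the epoch).

    Passing 0.0 gives a full backup; passing the previous run's
    timestamp gives an incremental one. Returns the number of
    files copied.
    """
    source = Path(source_dir)
    backup = Path(backup_dir)
    copied = 0
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        if src_file.stat().st_mtime <= last_run_epoch:
            continue  # unchanged since the last run: skip
        dest = backup / src_file.relative_to(source)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dest)  # copy2 preserves timestamps
        copied += 1
    return copied
```

In practice one would record each run’s timestamp somewhere (a small state file, say) and feed it into the next run; real backup programs also handle deletions and keep multiple versions, which this sketch does not.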

Notes:
1. I had bought a Seagate external hard disk FREEAGENT 1TB, model no. ST310005FDM201-RK in May 2009. It worked like a charm till a few days back. Suddenly the disk stopped getting recognized by Windows. The device was recognized by Device Manager but Windows (File) Explorer froze up. Disk Management showed it as a blank filesystem with no data! Seagate Manager software reported that the disk needed servicing.

I looked up some forums via Google Search and tried out some stuff – no luck. Many people seemed to have faced the same problem.

Fortunately the internal disks are working OK which allowed me to take critical data backup on two flash drives (Memory stick/key) that I currently have of 2 GB and 1 GB.

2. A blogpost that may be useful to some readers: Backing Up Gmail Mailbox with Labels to Desktop PC (or Laptop), https://ravisiyer.wordpress.com/2013/05/18/backing-up-gmail-mailbox-with-labels-to-desktop-pc/

3. Online storage is a popular service nowadays and some correspondents suggested it as a backup medium. Dropbox and Google Drive seem to be the main contenders in this space. Dropbox provides a free service of max. 2 GB and higher capacity at a price. Google provides a max. of 15 GB of free space shared between Google Drive, gmail and Google+ photos.

I was told by a correspondent that Google and Dropbox store user-uploaded data in multiple data centers so that even if a whole data center is destroyed by an earthquake or fire or something, unlikely as it may be, you still have your data. And they immediately replicate it over to a new data center, so there are two copies. In comparison onsite backup (local external hard disk, flash drive etc.) is not protected against catastrophic events like fire, earthquake, burglary etc.

I think I have an Internet usage as well as upload bandwidth issue with online storage, but others may not have such connectivity issues. I live in Puttaparthi, Andhra Pradesh, India, and now have an “unlimited” plan which is actually (up to) 4 Mbps till 8 GB (in a month) and then 512 Kbps for the rest of the month (the BSNL plan is BB Home Combo ULD 950). As I extensively watch Internet videos (mainly YouTube) I have been crossing the 8 GB limit, typically by the middle of the month. I come to know when videos start pausing to buffer, which I sometimes confirm by using a network traffic meter tool, BitMeter. BTW the YouTube videos I watch reflect my areas of interest like higher education teaching quality, software, spirituality, religion, world economic challenges, history and current world events. I have found YouTube to have a fantastic collection and variety of videos, including documentary videos from eminent media channels like BBC and PBS, which are of great educational and informative value (to me at least).

Considering the size of my critical data (around 4 + 4 GB) and my Internet connection, I cannot consider online storage as my main backup medium. But I can consider a very critical but non-sensitive subset of that data for online backup. [And I did that. Now I have slightly less than 750 MB uploaded to my new (free) Dropbox account’s online storage area (limit of 2 GB). The upload took a lot of time – maybe around half a day (I think I had already crossed my Internet connection’s high-speed usage limit of 8 GB for the month and so was at 512 Kbps download & upload speeds). One issue I faced is that Dropbox does not seem to take a list of folders as input – so I had to individually copy-paste selected folders from local disks to the Dropbox folder (which gets automatically copied online). Another issue is that I had to manually create the parent directory hierarchy structure for some of the selected folders (as their sibling folders were not to be copied). Dropbox seems to be more of an online folder-storage service, and so seems to lack some of the features of a sophisticated backup program like Microsoft Backup.]
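The two manual chores mentioned above (copying selected folders one by one, and recreating the parent directory hierarchy so that unselected sibling folders stay out) can be automated with a small script. Below is a sketch, assuming the standard setup where the Dropbox client automatically syncs a local Dropbox folder; all paths and the folder list are illustrative, not from my actual setup.

```python
import shutil
from pathlib import Path

# Illustrative paths -- adjust to your own machine.
DROPBOX_ROOT = Path.home() / "Dropbox"
SOURCE_ROOT = Path("C:/Data")  # common ancestor of the selected folders

# Hand-maintained list of folders to mirror (relative to SOURCE_ROOT),
# standing in for the "list of folders" input Dropbox itself lacks.
SELECTED = ["critical/accounts", "critical/writings", "blog/drafts"]

def mirror_selected(source_root=SOURCE_ROOT, dropbox_root=DROPBOX_ROOT,
                    selected=SELECTED):
    """Copy each selected folder into the Dropbox folder, recreating
    its parent hierarchy so that siblings not on the list are left out."""
    for rel in selected:
        src = Path(source_root) / rel
        dest = Path(dropbox_root) / rel
        dest.parent.mkdir(parents=True, exist_ok=True)  # recreate hierarchy
        shutil.copytree(src, dest, dirs_exist_ok=True)  # updates overwrite
```

The Dropbox client then takes care of uploading whatever lands in the Dropbox folder, so re-running the script after edits effectively refreshes the online copy.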

There are some other concerns with online storage which were raised by some other correspondents.

A correspondent wrote that (edited somewhat) Google may not help if anything is lost on Google Drive. Moreover, Google makes everyone sign the disclaimer that they won’t be liable/responsible if docs/files are stolen/hacked away.

Ravi: That is really an important matter. I mean, we take it for granted that it will be available because of our experience. But if there is some great crash/problem, legally and perhaps even morally, we are nowhere. With a backup device the company is responsible legally for its product at least during the warranty period. Yes, it is not legally responsible for the data. But, I mean, there is something – like data recovery service for a price, e.g. http://www.seagate.com/services-software/data-recovery-services/. When I ran Seagate manager software on my problem external disk it detected the problem and asked me to contact Seagate. So there is accountability of some sort here. With the free Google Drive cloud service (and its gmail service) I think the accountability is *ZERO*. That’s the truth and one should not lose sight of this truth no matter how much we use Google’s wonderful free services. [They may still respond to support requests but it is their choice and, I guess, the reality is that Google support simply cannot seriously service all support requests. As far as I know they don’t charge for such data recovery support service. Perhaps even if they charge reasonable money they still may not be able to service all support requests.]

But I should also state that in my experience with gmail over at least five years, I don’t think I ever lost any data! That is an awesome statement of the data reliability that Google has delivered at no charge to users like me. I thank Google for this wonderful free service that they have provided me.

[One correspondent responded that Dropbox provided good support for data recovery for him even when he was using its free service.]

Personally, having the data physically with me gives me a greater feeling of comfort. Yes, it is not as (seemingly) ironclad as the data centers with their (virtually) failsafe mechanisms, but for the usual media problem cases, I have the data physically with me and so have some chance of recovering it. Also, even if some critical Internet backbone undersea cable gets cut, due to which Internet access from India to some country(ies) hosting the data center(s) becomes sluggish, I don’t get affected for my critical data. I think physically having the data within my control is a vital factor for me, even if that does not take care of earthquake, fire, burglary etc.

Perhaps I am wary of cloud data backup services due to lack of in-depth exposure to it though I have quite a bit of stuff in my gmail mailbox anyway. Maybe it is a reluctance to embrace the new-for-me approaches. I think that’s it. Having had some bleeding edge experiences in the past over a variety of tech. and non-tech. stuff I try to avoid getting onto new territory in some areas like tech. and prefer the beaten track for those areas. That colours my view of the online vs. onsite storage debate. It is a feeling (intuition perhaps) + intellectual analysis thing for me when it comes to such decisions. And the feeling about not getting into new-for-me tech. approaches for a critical area like data backup makes me go for more conservative even if somewhat lesser quality solutions.

Another issue, which is a sensitive one but I think must be looked at, is the possibility of political tensions between the country where one is living and the country(ies) having the data centers where one’s cloud data resides. What if these tensions lead to some arm-twisting, like the country(ies) with the data centers blocking access to them from the country one is living in? These decisions, I think, are in the hands of the governments of the countries involved – the tech companies may not have a choice. I mean, even if we have a global Internet, we do not have a global government – we have individual countries’ governments.

A correspondent responded that data centers are spread across the globe to be resilient to natural disasters and politics as well! [Ravi: Perhaps that’s true for some services of some online storage providers but I am not sure whether it would apply to all services of any well-known online storage provider. I mean, I have not seen any statement from cloud service providers promising to adhere to such globally spread redundant datacenters for their free services. However, as of now, I just don’t know enough about the matter to take a considered view.]

Another correspondent mentioned the privacy issue. Ravi: Regarding the data privacy concern I think I am OK even if my backup data gets snooped by some government chaps. What may be a problem is if such data gets passed on to any Tom, Dick and Harry. That is a worrying matter for me. So, I don’t think it is a good idea to put very sensitive data (like personal financial info.) on online storage.

4. A correspondent mentioned a concern with external HDDs: these are still electro-mechanical drives with a relatively higher potential to fail compared with flash drives or solid-state drives (SSDs).

I looked up external SSDs (I have not used any so far) on a Chennai-based online IT store, http://www.theitdepot.com/products-USB+Hard+drives_C25.html (you have to choose External SSD in the Category check boxes on the top left of the page to view only external SSDs). The minimum price for a 120 GB drive is around Rs. 13,500! Whereas a 1 TB external hard disk is available for around Rs. 5,000 to Rs. 6,000.

SSDs seem to be more in use currently as internal disks. A correspondent wrote that their main advantage is that they are superfast. Boot times are cut in half, apps open instantly, and files up to many tens of MB or more copy instantly without a progress bar. Apple’s SSDs can read or write almost 1 GB per second. They also consume less power. But they are very expensive – hundreds of dollars. A 1 TB SSD may cost 500 dollars or more. They make no sense for backup.

5. A correspondent mentioned that he read somewhere that hard disks of all brands are equally unreliable due to cost-cutting.

6. A correspondent provided a good analysis of backup strategies. I have provided that below:

Protecting against disk failure can be as simple as a 1:1 backup or more elaborate, depending on what you want:

    A. Data availability (i.e., your application or OS should not see a disruption if one of the disks fails)
      i. Solution choices center around some kind of redundancy; one of the most efficient mechanisms is RAID, whereby the data is stored across redundant drives, with sufficient additional info to enable reconstruction of the contents of a failed disk using the contents of the others. Both hardware RAID controllers and software RAID stacks are available. You will need more than one drive. [RAID reconstruction on disk failure is not always non-disruptive, depending on the RAID mode/client OS capabilities. Also, there are ways to make a data-replication-based solution non-disruptive as well, by using 2 servers and switching over between them on failure.]
    B. Data protection (i.e., your application or OS should be able to regain access to the data without corruption)
      i. You would use some form of replication, i.e., multiple disks. It can be as simple as full or incremental backups, or as elaborate as automatic replication to a destination drive whenever something on the source drive changes. You could do this at a granularity smaller than a drive, of course, such as selected directories and their sub-dirs/files. Various flavors of rsync are available for this.

You could also do combinations of the above.
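The replication option can be approximated without rsync by a one-way mirror pass: copy files that are new or changed, and delete destination files that no longer exist in the source. A minimal sketch of that idea in Python (rsync itself does this far more efficiently, with delta transfer and many safety options):

```python
import shutil
from pathlib import Path

def mirror(source_dir, dest_dir):
    """One-way mirror of source_dir onto dest_dir, in the spirit of
    `rsync -a --delete`: copy new/changed files, delete stale ones."""
    source, dest = Path(source_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    src_rel = {p.relative_to(source)
               for p in source.rglob("*") if p.is_file()}
    # Copy files that are new, or whose modification time is newer.
    for rel in src_rel:
        s, d = source / rel, dest / rel
        if not d.exists() or s.stat().st_mtime > d.stat().st_mtime:
            d.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(s, d)
    # Delete destination files with no counterpart in the source.
    for p in list(dest.rglob("*")):
        if p.is_file() and p.relative_to(dest) not in src_rel:
            p.unlink()
```

Note that the delete step is exactly what makes a mirror different from a backup: a file deleted by mistake (or by a virus) on the source disappears from the mirror on the next pass, which is why the earlier 1 source + 2 copies discussion still applies.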

He uses the B.i option, using multiple drives, hooked up to a $15 Pogoplug NAS server: http://www.adorama.com/COCPOGOP21.html?gclid=CPKLq7OatbsCFU_NOgod1mUA9A.

Here is a link to a different model of the Pogoplug device with a single USB port, sold in India for Rs. 2,125: http://www.junglee.com/Pogoplug-Backup-and-Sharing-Device/dp/B005GM1Q1O

For reference, the advantages of a box like the POGOPLUG (there are other options, including using a Raspberry Pi based NAS server), are:

  • cheap
  • extremely low power consumption (~5 W for the server)
  • the current generation of drives spin down when idle, saving more power
  • noiseless, no fans
  • no need to have a big PC up and running if only a smartphone or tablet needs access to content on the server
  • flexibility to run other services on the box (like a DLNA server, or hooking up a printer) that can be accessed by clients on the home intranet

7. A software veteran shared a wise rule: I have made a new rule for myself: if I bring anything new into the house, I must get rid of something (or things) that are equivalent. So if I get a new shirt, one or more old shirts must be given away.

I try and follow this rule also in accumulating data. I have tried to keep my Google data usage to about 20% of the permissible free limit. When I cross that, I start to delete old stuff. I have not yet deleted anything that I later regretted (or even noticed!).

—————————————————————————

Update (6th Jan. 2014)

I now have all my data backed up to a 1 TB Western Digital Passport (USB) external disk, and all my critical data plus some other data (some audio & video) to a 64 GB Sandisk USB Flash drive (Memory key/stick). [This corresponds to Solution B in the strategy.]

I have also now got a free Dropbox account with 2GB cloud space on which I have put my very critical data (but not sensitive data due to privacy concerns).

One correspondent had suggested that I try an RMA for the failed Seagate disk. Well, thanks to him, I tried and it worked, even more than four and a half years after I bought it (as it carried a five-year warranty). The Chennai (India) based agency partnering with Seagate for this sent me a 2 TB Seagate GoFlex disk as a replacement, which I received yesterday. [There was a hiccup in the tracking information, as Seagate (Singapore office, going by the email name) mailed me a wrong courier tracking number, but that was very minor stuff in exchange for a 2 TB disk.] I have yet to check out the 2 TB disk.

So now I can go for Solution A as well if I want :).

Well, the key thing is that the vulnerable window I was in without a full backup has been closed, and I am reasonably safe with my data. Sure, it is not fool-proof, but I feel I have done the due diligence for my home data. If, God forbid, I still lose it, I will put it down to destiny/God’s will :).

A correspondent who uses a Macintosh laptop computer wrote (slightly edited) in response (partial extract) to an email similar to the above update:

Clearly, backup is important.

I use Apple’s Time Machine software, which backs everything up automatically on hard disks both at work and at home, so I have two backup copies.

Time Machine works effortlessly, behind the scenes. And it has been a real help. Every once in a while, I overwrite a file that should not be overwritten. To get a back-up copy, I just open Time Machine and go back 1 hour, 1 day, 1 week, whatever, seeing the internal hard drive the way it looked at that time, copy the file I want, get out of Time Machine, and paste the file somewhere.

I wrote back (slightly edited):

Wow! I think that is where Apple is perhaps unbeatable – they make the software/system so easy to use. I was not aware of such an elegant backup solution, but then I am not a backup expert either. I had heard that name (Time Machine) in the computer field with respect to the Internet but not an OS.

A correspondent passed on this very interesting article, “Did You Know Windows 8 Has a Built-In Time Machine Backup?”, http://www.makeuseof.com/tag/did-you-know-windows-8-has-a-built-in-time-machine-backup/. It describes a new feature in Windows 8 called, “… File History, a built-in backup feature that functions similarly to Apple’s much-loved Time Machine.”
