Keep Your Data Safe: Cloud RAIDs

October 24, 2013 8 min read Howto marco

You have probably heard the story: someone has all their data stored in the Cloud, and one morning it’s all gone. Maybe Google disabled your email account and won’t let you back in. Or maybe it’s Dropbox that dropped your files. Or maybe it’s (and this is a real case) box.com that handed somebody’s account to someone else, who promptly cleaned the account and deleted all the other person’s files.

In information security, the concept of safety includes and extends security. Security is what you need to protect your data from malevolent others. Safety encompasses that, but adds protection from accidents and mistakes, like the ones mentioned above.

Here is an analogy: data security is like the walls and doors and alarm systems you use to protect yourself from burglars. Data safety is that plus the smoke detector and the sprinkler system.

Here is what we’ll do: we will use free cloud services to keep an encrypted copy of whatever data we want. We will provide the encryption, and we will make sure the data is “safe,” in that the failure of any single provider will not cause us to lose any data.

How do we do that? For the purpose of this article, I will assume you are running a recent version of Kubuntu. Also, you will need accounts with providers that sync files. Dropbox comes to mind, SpiderOak, OwnCloud, etc. There really is no shortage of them. You will need three accounts in total, one each on three different providers.

The theory behind it is that we will set up a complex cascade of devices that will result in a single directory into which we can push our data. That directory will then be encrypted, munged, and distributed to the three single accounts. Once this is all set up (takes a while), you never look at the setup again. You just use the directory like any other directory you would use, and you monitor if you have any problems with the synchronized accounts.

Even better: you will have access to the same directory on any system onto which you install the same setup. It’s a bit complicated (we’ll simplify, if enough people like this setup), but it keeps you safe and sound against the most common computer issues.

Here is the cascade:

1. Configure separate synchronization directories for your three accounts

That step requires you to choose sync accounts. Any provider that supports Linux will do. You don’t need encryption (you should never trust provider encryption, anyway). You will need the ability to synchronize large files, and you should look out for a provider (like Dropbox) that synchronizes only the differences within files, instead of entire modified files.

You still need three providers (four will do in a pinch; two will not), and not three accounts with one provider (although you can merrily test with that setup), because you want to make sure that if one account is compromised, the other two are not. Even if you lose an account because one of your providers goes belly up, the other two accounts will be enough to make you whole.

The magic behind this is called RAID5. RAID is not a Pirates of the Caribbean reference. It stands for Redundant Array of Inexpensive Disks, and is a technique invented precisely for the purpose described here: to make it possible to lose one chunk of data and still recover all of it.

What happens is that the RAID software splits your data across the three disks and adds parity information, a kind of checksum computed from the data. It does so in an intelligent way, such that if you lose any of the three, its contents can be recomputed from the other two. So, if one of your accounts is toast, you can rebuild the whole set from the other two. The price is capacity: with three pieces, you get the usable space of two.
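Under the hood, RAID 5 gets this property from XOR parity. Here is a toy sketch of the idea with one byte per "disk". The byte values and the parity helper are made up for illustration; real RAID works on whole blocks and rotates the parity across the disks.

```shell
#!/bin/sh
# Toy illustration of RAID 5 parity using one byte per "disk".
# The values are made up; real RAID works on whole blocks, not single bytes.

# parity A B -> prints A XOR B
parity() {
    echo $(( $1 ^ $2 ))
}

a=202                    # byte stored on disk 1
b=77                     # byte stored on disk 2
p=$(parity "$a" "$b")    # parity byte stored on disk 3

# Pretend disk 2 is lost: XOR of the two survivors gives its byte back.
recovered=$(parity "$a" "$p")
echo "original=$b recovered=$recovered"
```

The same trick works no matter which of the three goes missing, which is why any two pieces are enough.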

Of course, if you host two of the pieces onto the same account, or with the same provider, losing one means losing the other, and one single chunk is not enough to make the pie whole. So, three different accounts with three different providers.

2. Create equal-size files in each sync-ed directory

The content of these files doesn’t matter (and will be overwritten anyway). But they should be of equal size, so you probably want to use the same approach to create them.

Let’s assume you are syncing to the directories /home/ego/sync1, /home/ego/sync2, and /home/ego/sync3. You could create a file (as a simple user) by typing:

dd if=/dev/zero of=/home/ego/sync1/piece1.dat bs=1G count=1

(and then the analog for piece2 and piece3)

This will take a while, depending on your system. dd, by the way, is an ancient UNIX utility that copies files. Here it copies from the if (input file) /dev/zero (which is simply full of zeros, who would have thunk) to the file you specified, the of (or output file). You asked for a block size (bs) of 1G and a count of exactly 1 block, so you will generate a single 1G file. If you want to generate a 16G file, you simply change the count to 16.
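If you would rather not type the dd command three times, a small loop does the job. This is a minimal sketch: make_pieces is a hypothetical helper name, and it assumes the sync1, sync2, sync3 layout used in this article. It uses 1M blocks so you can try it with a tiny size first.

```shell
#!/bin/sh
# make_pieces BASE SIZE_MB
# Creates BASE/sync1/piece1.dat ... BASE/sync3/piece3.dat, each SIZE_MB
# mebibytes of zeros. The sync directories are created if they are missing.
make_pieces() {
    base=$1
    size_mb=$2
    for n in 1 2 3; do
        mkdir -p "$base/sync$n" || return 1
        dd if=/dev/zero of="$base/sync$n/piece$n.dat" \
           bs=1M count="$size_mb" 2>/dev/null || return 1
    done
}
```

Calling make_pieces /home/ego 1024 should give you the same three 1G files as the commands above.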

3. Loop the files

Linux has a very interesting ability: it can take a file and pretend it’s a hard drive (a block device, in Linux speak). That can be incredibly useful if you have a copy of a drive (for instance, of a CD or DVD) as a file. You can simply access it by declaring it a loop device.

To create a loop device, you need to specify the name of the device and the name of the file that is being looped into. In our case, it’s the three files we had above, and I will assume you haven’t created any loop devices yet, since you are reading this tutorial, so we can use the loop devices /dev/loop1, /dev/loop2, and /dev/loop3:

sudo losetup /dev/loop1 /home/ego/sync1/piece1.dat

(replace the number 1 with 2 and 3 for the other two loop devices).
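Spelled out for all three pieces, the step looks like the sketch below. loop_commands is a hypothetical helper that only prints the commands, so you can check them before anything runs as root. It assumes the same directory layout and device names as above.

```shell
#!/bin/sh
# loop_commands BASE
# Prints (does not run) the losetup command for each of the three pieces.
loop_commands() {
    base=$1
    for n in 1 2 3; do
        echo "sudo losetup /dev/loop$n $base/sync$n/piece$n.dat"
    done
}

loop_commands /home/ego
```

Once the output looks right, you can pipe it to sh to actually run it. Afterwards, sudo losetup -a lists the loop devices that are attached.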

What this did is create virtual hard drives that the RAID software can use to do its magic.

4. Create the RAID array

Next, we bundle our three devices into a single one. To do so, we need the mdadm utility. We get it by running:

sudo apt-get install mdadm

Once it’s installed, we need to tell it where our “hard drives” are and what we want to call our single access point:

sudo mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/loop1 /dev/loop2 /dev/loop3

That will run for a while. You can check on the progress by looking at the file /proc/mdstat:

cat /proc/mdstat
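In that file, each array has a status map in square brackets, with one U per working member and an underscore for a missing one. If you want a quick yes/no answer instead of reading the file yourself, something like this sketch could help. md_health is a hypothetical helper, and it assumes the usual [UUU] style of status map.

```shell
#!/bin/sh
# md_health reads /proc/mdstat-style text on stdin and prints "degraded"
# when a status map such as [UU_] shows a missing member, "ok" otherwise.
md_health() {
    if grep -q '\[U*_[U_]*\]'; then
        echo degraded
    else
        echo ok
    fi
}
```

You would run it as md_health < /proc/mdstat. Don’t panic if it says degraded while the array is still being built for the first time; it may show a missing member until the initial sync is done.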

5. Create the file system / Format the drive

Once you have your device (here, /dev/md1), you need to make it ready to accept incoming files. To do so, you have to format it (remember, to Linux what you just created is as good as a hard drive). You can pick any file system you like, as long as your version of Kubuntu supports it. I like the default, ext2, because it is not journaling, and the less journaling information in my files shared with the rest of the universe, the happier I am.

Run the command:

sudo mkfs.ext2 /dev/md1

If you think that was too easy, don’t complain!

6. Mount the file system

First, you need to create a directory onto which you will mount the file system. This directory should not be visible to you and should be located in a safe place (one that doesn’t get automatically cleaned up, like /tmp, and one that you wouldn’t clean up yourself). I typically choose /home/ego/.cloudraid, but it’s really totally up to you.

mkdir /home/ego/.cloudraid

Then you mount the file system onto that directory:

sudo mount /dev/md1 /home/ego/.cloudraid -o rw

7. Set up the encryption filter

For this article, I will use encfs. It’s easy to set up and works reliably and is really fast. There are other solutions, like Truecrypt and ecryptfs, and more. If you prefer those, you probably have good reasons and the expertise to make it work.

First, we need to install encfs:

sudo apt-get install encfs

Then we create the decrypted data directory (choose whatever suits you):

mkdir /home/ego/Documents/Cloudraid

At this point, we want to test whether we can write to the encrypted directory. To do so, type

touch /home/ego/.cloudraid/test

Then try to see if the file has been written:

ls /home/ego/.cloudraid/test

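If you get an error, the mounted directory most likely belongs to root, which is what you get with a freshly formatted ext2 file system. Make yourself the owner (for instance with sudo chown $USER /home/ego/.cloudraid) and try again.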

Next we use encfs to link the two directories:

encfs /home/ego/.cloudraid /home/ego/Documents/Cloudraid

You need to use a passphrase when you create the encryption link. Write it down somewhere safe, because if you lose it, there is no recovery. Also, if you choose a weak passphrase, you might as well do without and send your documents to hackers for sale to identity thieves.

You will need this passphrase every time you start up the link.

8. Enjoy!

Everything you write to your Cloudraid directory will now be:

Encrypted by encfs
Distributed to your three sync-ed files by RAID
Synchronized to your three providers by their software

To you, that’s all weird magic. What you know, though, is that the three providers have no access to your data (if the passphrase is strong enough), that evil intermediaries have no access to your data, but that you have access to it. Also, if one of the providers becomes unavailable, you can simply choose a new one and have that new account synchronize the old file.

In the worst case, if the provider deletes your files or corrupts them, you have to rebuild your array from the two other files. Call me if you need help with that! My hourly rates are not entirely disgusting!!!

9. If you want more info, comment

There are tons of things I didn’t touch upon because I wanted to keep this article decent in length. The most obvious one is how to set up the whole chain such that it will be started at boot or login time, so that your “safe” directory is available as soon as you need it. I’ll add more information if you ask for it!
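To give you a rough idea of what that startup chain involves, here is a sketch that prints the commands in the order they would have to run after a reboot. startup_commands is a hypothetical helper, the paths are the ones used in this article, and I am assuming the array is brought back with mdadm --assemble rather than --create. Treat it as a starting point, not a finished boot script.

```shell
#!/bin/sh
# startup_commands BASE
# Prints, in order, the commands that bring the chain back up after a reboot.
startup_commands() {
    base=$1
    for n in 1 2 3; do
        echo "sudo losetup /dev/loop$n $base/sync$n/piece$n.dat"
    done
    # --assemble reuses the existing array; --create is only for the first run.
    echo "sudo mdadm --assemble /dev/md1 /dev/loop1 /dev/loop2 /dev/loop3"
    echo "sudo mount /dev/md1 $base/.cloudraid -o rw"
    # encfs will ask for the passphrase at this point.
    echo "encfs $base/.cloudraid $base/Documents/Cloudraid"
}

startup_commands /home/ego
```

The order matters: loop devices first, then the array, then the mount, and the encryption link last.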