First of all, I must admit that I really care about my privacy. I do my best to scan every document I cannot get in digital form.
I quickly realized that finding a reliable way to back all this up was simply mandatory. I tried a few approaches, including a “space exchange” with a friend of mine, each of us storing backups on the other’s server. (As you might guess from the first sentence, I really trust him.)
A few months ago (well, it could be years by now), I subscribed to several cloud storage services, such as Dropbox, Box and hubiC.
The last one, hubiC, is a service hosted by OVH, a very well-known French service provider. I host a few things with them (this website, for example, at the time of writing 🙂). I used hubiC for free for a long time and the service kept improving. As a French citizen (besides being a GeekCitizen), I really care about where my data is hosted. As everyone knows, my data is not as safe in the USA as I might expect. So hubiC seemed to be a good choice (and it is for me!). I’m not that paranoid, but I really think we should all care about our privacy. (I am thinking of the people who have had their digital identity stolen, and more.)
I will stop here with the privacy blabla, my stuff blabla, … and just move on to the program I wrote. That said, it’s a really interesting topic. Just connect with me; we could discuss it over a beer if you want.
One last thing: I’m not a consultant and I don’t code in my day-to-day job (and I don’t want to; I code only when I need something for personal reasons). But, as with everything I do, I try to develop my skills to increase the trust my customers can place in me. I may well use these dev skills in my daily work with customers, to be credible when I speak with Ops or Dev people.
1. Describing the program I expect
I will start with the easiest: I want to back up any file to the Cloud, with the option to encrypt it. One thing to note is that I truly believe you need to manage your encryption keys yourself. I decided to go with OpenPGP keys in the script.
The program also needs to be easy to configure without editing the program itself, so it must read its configuration from an external file.
It also needs to be “auditable”: anyone should be able to determine easily which files are stored in the backup directory without having to parse every one of them. And since every file will be encrypted, we cannot afford to decrypt each file to find the right one. Have a look at the next requirement.
If we spend (CPU) time encrypting each file, the program should also remove anything that identifies the file from the outside. So it must alter the filesystem metadata of each file, starting with the most revealing one: the file name. The program must therefore know how to rename the files.
And finally, the program has to be open source: if you want to trust it and be sure I am not fooling you, you should expect to be able to look at the code. That way, you can check for yourself that I don’t send any information anywhere.
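To make the encryption requirement concrete, here is a minimal sketch in Python (not the Perl of the actual script) of encrypting one file to an OpenPGP key by calling the `gpg` command line. The function names, file names and recipient address are hypothetical; only standard `gpg` options are used.

```python
import subprocess

def gpg_encrypt_command(source, destination, recipient):
    """Build the gpg argument list that encrypts `source` for `recipient`."""
    return [
        "gpg", "--batch", "--yes",   # never prompt, overwrite an existing output
        "--recipient", recipient,     # public key the file is encrypted to
        "--output", destination,
        "--encrypt", source,
    ]

def encrypt_file(source, destination, recipient):
    # Raises subprocess.CalledProcessError if gpg exits with an error.
    subprocess.run(gpg_encrypt_command(source, destination, recipient), check=True)

# Hypothetical file names and key identity, for illustration only.
command = gpg_encrypt_command("scan.pdf", "scan.pdf.gpg", "me@example.org")
```

Because the file is encrypted to a public key, the machine running the backup never needs the private key or its passphrase.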
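As an illustration of reading settings from an external file, the sketch below parses an INI-style configuration with Python's standard `configparser`. The section name, the keys and the paths are assumptions made for the example; they are not the format used by the actual script.

```python
import configparser

# Hypothetical configuration content; in practice this would live in its own file.
SAMPLE = """
[backup]
sources = /home/me/Documents, /home/me/Scans
repository = /mnt/cloud/backup
encrypt = yes
gpg_recipient = me@example.org
"""

def load_config(text):
    """Parse the configuration text into a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["backup"]
    sources = [p.strip() for p in section["sources"].split(",") if p.strip()]
    return {
        "sources": sources,
        "repository": section["repository"],
        "encrypt": section.getboolean("encrypt", fallback=True),
        "gpg_recipient": section.get("gpg_recipient", fallback=None),
    }

config = load_config(SAMPLE)
```

Changing the directories to back up then only means editing the file, never the program.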
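One possible way to rename files so that the name reveals nothing is to derive it from a checksum of the content. The sketch below assumes SHA-256 and a `.gpg` suffix; both are illustrative choices, and `opaque_name` is a hypothetical helper rather than a function of the actual script.

```python
import hashlib

def opaque_name(path, chunk_size=1 << 20):
    """Return a repository file name that depends only on the file content."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() + ".gpg"
```

The original name and the other metadata then have to be recorded somewhere else, which is the role of the database described in the next section.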
2. How the program works
I will not describe how I wrote the program; just look at it if you want, it is fully open source (GPL v3). It’s written in Perl and, before you ask, it’s in Perl because I decided so.
So, the program first creates the database schema (with the “-n” option), then loops over the source directories listed in the configuration file to find every file.
For each file it finds:
- The program calculates its checksum
- It checks whether the checksum already exists in the DB
  - If yes, it updates in the DB the number of times the file exists (duplicate case) and adds an entry with the file’s metadata
  - If no, it encrypts the file, stores it in the repository, creates an entry for that checksum and creates an entry for the file in the DB with its associated metadata
- It then moves on to the next file
At the end, it closes the DB, encrypts it, moves it to the repository and deletes the unencrypted copy of the DB.
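The loop above can be sketched as follows, in Python rather than Perl. SQLite, the table and column names, SHA-256 and the `store` callback (standing in for "encrypt and copy to the repository") are all assumptions made for the illustration; the real schema may differ.

```python
import hashlib
import os
import sqlite3

def create_schema(db):
    # One row per distinct content, one row per file seen on disk.
    db.execute("CREATE TABLE IF NOT EXISTS blobs ("
               "checksum TEXT PRIMARY KEY, copies INTEGER NOT NULL)")
    db.execute("CREATE TABLE IF NOT EXISTS files ("
               "path TEXT NOT NULL, size INTEGER NOT NULL, "
               "mtime REAL NOT NULL, checksum TEXT NOT NULL)")

def checksum_of(path):
    # Reads the whole file at once, which is fine for a sketch.
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()

def backup(db, source_dirs, store):
    """Walk the sources; call store(path, checksum) once per distinct content."""
    for source in source_dirs:
        for folder, _subdirs, names in os.walk(source):
            for name in sorted(names):
                path = os.path.join(folder, name)
                checksum = checksum_of(path)
                known = db.execute("SELECT 1 FROM blobs WHERE checksum = ?",
                                   (checksum,)).fetchone()
                if known:
                    # Duplicate content: count it, do not store it again.
                    db.execute("UPDATE blobs SET copies = copies + 1 "
                               "WHERE checksum = ?", (checksum,))
                else:
                    store(path, checksum)
                    db.execute("INSERT INTO blobs (checksum, copies) "
                               "VALUES (?, 1)", (checksum,))
                info = os.stat(path)
                db.execute("INSERT INTO files (path, size, mtime, checksum) "
                           "VALUES (?, ?, ?, ?)",
                           (path, info.st_size, info.st_mtime, checksum))
    db.commit()
```

With such a layout, listing what the backup contains is a query on the `files` table, with no need to decrypt anything in the repository. Note that this sketch does not handle a second run over the same directories, which would count every file again.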
In its current state, the program is limited to UNIX-like operating systems. In fact, the Perl module on Mac has some limitations and does not work exactly as expected (some trouble with local environment variables, and the GPG module not working as expected at all on my platform).
3. How I use it
Well, quite simply in fact: I back up some directories (my personal data, …) with it to my Cloud. Currently, I use hubiC’s program.
I also checked the features of ExpanDrive, which is just awesome: you can plug your Cloud folder into your system just like a USB stick, and you do not need the space on your hard drive/SSD that the default hubiC app requires. Storing tons of encrypted files/cold data becomes really easy with this. (By the way, the pause feature, described in the next chapter, will become really important, as the file needs to be uploaded “inline” before it is deleted locally.)
As you may have guessed, I said “backups in the Cloud”, but you can use it with any directory you want, for example an external HDD.
4. What’s next
I plan to modify the program for v1.1 like this:
- Stop using command lines to encrypt files (currently, the GPG module is not working as expected on my Mac platform). This will help it become multi-platform. For the moment, it is compatible with Mac and should work smoothly on Linux.
- Add a command-line option to specify the name of the configuration file
- Add more controls to limit the risk of database corruption.
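A command-line option for the configuration file could look like the sketch below, written with Python's standard `argparse`. The `-n` flag mirrors the schema-creation option mentioned earlier; the `-c`/`--config` and `--new` names and the default file name are assumptions.

```python
import argparse

def parse_args(argv=None):
    """Parse the command line; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(description="Encrypted backup (sketch)")
    parser.add_argument("-c", "--config", default="backup.conf",
                        help="path of the configuration file to read")
    parser.add_argument("-n", "--new", action="store_true",
                        help="create the database schema before backing up")
    return parser.parse_args(argv)

# Hypothetical invocation, for illustration only.
args = parse_args(["--config", "/etc/backup.conf", "-n"])
```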
I plan to release the features below for version 2.0:
- Add a “pause feature”, by adding an option in the configuration file for a pause file and watching whether that file exists
- Add a “lock” file to prevent any misconfiguration in the user’s environment from corrupting the database
- Add the ability to let users choose whether they want to activate OpenPGP encryption
- Add the ability to restore a full repository (or file by file)
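The lock file and the pause file are simple enough to sketch. The version below, in Python, is one possible approach and not a description of the planned implementation; the helper names and the polling interval are assumptions.

```python
import os
import time
from contextlib import contextmanager

@contextmanager
def single_instance(lock_path):
    """Hold a lock file for the duration of the block."""
    # O_CREAT | O_EXCL creates the file atomically and raises
    # FileExistsError if another run already holds the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)

def wait_while_paused(pause_path, poll_seconds=5.0):
    """Block for as long as the pause file exists."""
    while os.path.exists(pause_path):
        time.sleep(poll_seconds)
```

A backup run would wrap its main loop in `with single_instance(...)` and call `wait_while_paused(...)` between two files. If the program is killed abruptly, the lock file stays behind and has to be removed by hand.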
For v3.0, I plan to:
- Check the integrity of the database
- Check the consistency between the database and the repository
For v4.0, I plan to:
- Integrate REST APIs for different Cloud providers (hubiC, Dropbox, Box, …). For the moment, I don’t know how to use REST APIs, so it could be interesting to learn.
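A consistency check between the database and the repository boils down to comparing two sets. The Python sketch below assumes that repository files are named after their checksum with a `.gpg` suffix, which is an illustrative convention and not necessarily the one the script uses.

```python
import os

def compare(db_checksums, repository_dir, suffix=".gpg", ignore=()):
    """Return (missing, orphaned) checksums.

    missing:  recorded in the database but absent from the repository
    orphaned: present in the repository but unknown to the database
    """
    stored = set()
    for name in os.listdir(repository_dir):
        if name.endswith(suffix) and name not in ignore:
            stored.add(name[: -len(suffix)])
    expected = set(db_checksums)
    return sorted(expected - stored), sorted(stored - expected)
```

The `ignore` parameter is there for files such as the encrypted database itself, which lives in the repository but is not a backed-up file.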
One thing to note is that, for the moment, I do not plan to add any other method to encrypt files. OpenPGP is a safe and easy way to encrypt files, and you can use it to build your Web of Trust.
And, to be honest, I’m not sure I will implement any of these features. It takes time, and I do this for my own needs and, sometimes, for pleasure.
5. Where to find it
I started using GitHub within my dev environment. The script can be found here: https://github.com/GeekCitizen/SecureBackupToCloud.
I also added a direct link to my GitHub in the “Useful links” toolbar (on the right). I will upload my new and future developments there.
6. Some recommendations
Here are some key but general recommendations I can provide:
- Never lose your keys:
  - It might be obvious, but with RSA/DSA keys, once they are lost you will never be able to decrypt your files.
  - You might ask some well-known intelligence service; they may have the power to decrypt them for you …
- Always check that your Cloud storage provider allows you to store anything
  - Some forbid certain types of files.
  - If you do not comply with their rules, they may delete the files without any warning and/or close your access to their service.
- Never trust a service that proposes to store your keys for you.
  - If you care about privacy, you must learn to manage your keys yourself and not rely on any third party that you cannot trust. And NO, managing them is not that difficult: create them, export them to a USB stick, move the USB stick to a safe place. When you are more confident, you can improve this policy by yourself.
  - Corollaries could be:
    - Always choose a strong encryption passphrase for your keys
    - Never store your keys on untrusted media
You might want to have a look at CAcert. I also mention it on my Web of Trust on OpenPGP page.
Thanks to a friend of mine, I updated the article to correct some typos and added expected compatibility.