Problem Statement: I work with a lot of freelancers on various small creative and content-related tasks, and they usually share the completed work as a folder in Google Drive. Keep in mind that they are the owners of these folders. After a week or so, the freelancer is no longer working with us and decides to free up their Drive by deleting the folder.
I want to mitigate this problem. There are various ways to approach this:
1. Ask the freelancer to transfer ownership and then keep the folder safe.
2. Download and back up the shared folder whenever you receive a submission.
3. Automate the backup process to secondary storage.
Of course, ‘automation’ wins: not only does it remove the hassle for both parties, but you also get a secondary backup for long-term retrieval and safekeeping.
Let’s get to it.
First, set up a VM on Google Cloud; you can use AWS or any other provider. I used GCP because the f1-micro instance (0.6 GB memory, 1 shared vCPU) is part of the always-free tier. I'm not using Google Cloud Storage because they haven't added a GUI to it yet.
In the GCP Console, go to the VM Instances page. Launch Instance.
Follow this quickstart guide for starting the VM. https://cloud.google.com/compute/docs/quickstart-linux
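If you prefer the command line, a rough sketch of creating the free-tier instance with the gcloud CLI might look like this (the instance name backup-vm, the zone, and the Debian image are placeholders of my choosing; adjust to taste):
# create a small f1-micro VM to run rclone on (name and zone are examples)
gcloud compute instances create backup-vm \
  --machine-type=f1-micro \
  --zone=us-east1-b \
  --image-family=debian-11 \
  --image-project=debian-cloud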
- SSH into the VM and install rclone by running
curl https://rclone.org/install.sh | sudo bash
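As a quick sanity check (not strictly necessary), you can confirm the install worked:
rclone version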
- Create an S3 bucket in us-east-1. For steps, follow: https://docs.aws.amazon.com/quickstarts/latest/s3backup/step-1-create-bucket.html
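If you already have the AWS CLI configured, the bucket can also be created from the terminal; a sketch using the placeholder bucket name from the sync commands later on:
# create the backup bucket in us-east-1 (bucket names must be globally unique)
aws s3api create-bucket --bucket your-bucket-name --region us-east-1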
- Enable versioning on the S3 bucket; this will help you retain previous versions of the same file if they are updated on Google Drive. Follow: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/enable-versioning.html
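The CLI equivalent, again with the placeholder bucket name:
# keep older revisions of files that get overwritten by later syncs
aws s3api put-bucket-versioning --bucket your-bucket-name --versioning-configuration Status=Enabled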
- Create an IAM user for the uploader and attach the AmazonS3FullAccess policy to it.
Note: It is recommended to restrict this user to only the bucket created above. You can skip this if you are in a hurry or will only use this account for backups. Steps to restrict access: https://aws.amazon.com/blogs/security/how-to-restrict-amazon-s3-bucket-access-to-a-specific-iam-role/
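If you'd rather script this part too, a rough AWS CLI sketch (the user name drive-backup-uploader is just an example; swap in a bucket-scoped policy per the link above if you want the restricted version):
# create a dedicated IAM user for rclone and grant it S3 access (example name)
aws iam create-user --user-name drive-backup-uploader
aws iam attach-user-policy --user-name drive-backup-uploader --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
# generate the access key pair that rclone will ask for during configuration
aws iam create-access-key --user-name drive-backup-uploader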
- Start rclone config in the VM
rclone config
- In the config, perform the steps to create a new S3 remote (the resulting config entry is sketched after this list).
- n – new remote
- s3-remote – name of the remote in rclone
- 3 – Amazon S3 compliant storage providers
- 1 – AWS S3
- YOUR_AWS_KEY_ID
- YOUR_AWS_SECRET_KEY
- 1 – us-east-1 (or whichever region you created your bucket in)
- – leave blank
- 1 – will make uploaded objects private
- 1 – no encryption
- 5 – One Zone Infrequent Access storage class, very low price and good enough for backups
- y – yes this is okay
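Once you confirm, rclone saves the remote to its config file; you can inspect it with rclone config show. Assuming the answers above, the entry should look roughly like this:
[s3-remote]
type = s3
provider = AWS
access_key_id = YOUR_AWS_KEY_ID
secret_access_key = YOUR_AWS_SECRET_KEY
region = us-east-1
acl = private
storage_class = ONEZONE_IA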
- Continue with the config to create a Google Drive remote.
- n – new remote
- drive-remote – name of the remote in rclone
- 11 – Google Drive
- – leave blank
- – leave blank
- 2 – read-only access to drive storage
- – leave blank if you want to back up your full drive
- – leave blank
- n – you are running a headless machine
- open the link in a browser, copy the token, then enter it
- n – not a team drive
- y – yes this is okay
- q – quit config
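With both remotes saved, it's worth a quick sanity check before syncing; these commands only read, they don't change anything (the bucket name is the same placeholder used below):
# list configured remotes, then peek into each one
rclone listremotes
rclone lsd drive-remote:
rclone lsd s3-remote:your-bucket-name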
- Start the sync by running the command below, or skip this step if you don't want to sync right now.
rclone sync drive-remote: s3-remote:your-bucket-name
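If you'd rather preview what will be transferred first, rclone's --dry-run flag reports the planned copies without uploading anything:
rclone sync --dry-run drive-remote: s3-remote:your-bucket-name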
- Now you can create a cron job to sync your Drive daily. Edit your crontab by running
crontab -e
- add the following line to your crontab; it runs the sync every day at midnight and logs output to syslog
0 0 * * * rclone sync --syslog drive-remote: s3-remote:your-bucket-name
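Cron runs with a minimal environment, so if the job doesn't seem to fire, check that rclone is on cron's PATH (the install script normally places it in /usr/bin, but verify with which rclone). Since the job logs via --syslog, you can confirm the nightly runs on Debian/Ubuntu images with:
# look for rclone entries from last night's run (log path varies by distro)
grep rclone /var/log/syslog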
That’s it. You have successfully enabled drive backup.
You can also create other combinations for replication/backup from and to various services using rclone; currently, it supports:
- Amazon Drive
- Amazon S3
- Box
- DigitalOcean Spaces
- Dropbox
- FTP
- Google Cloud Storage
- Google Drive
- Microsoft Azure Blob Storage
- Microsoft OneDrive
- The local filesystem
- … and many more