MedStack’s backup system is designed to back up all customer data automatically, so that it can be restored after an unforeseen event such as accidental deletion, data corruption, or a security breach.
Although a high-availability design such as real-time database replication can protect against downtime, it can’t protect against many of the scenarios where backups are useful. For example, if data is accidentally deleted, the deletion may propagate to all replicas before the problem is detected. Complete and regular backups are therefore critical to reliability, disaster recovery, and security. Yet good backups are difficult to achieve, even with industry-standard tools.
MedStack’s target is to provide backups that can be restored to the following points in time:
- Hourly, for the last 24 hours
- Daily, for the last week
- Weekly, for the last month
- Monthly, for an indefinite period
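The retention targets above amount to a pruning rule over a list of snapshot times. The following is a minimal Python sketch, assuming snapshots are taken hourly, that "the last month" means 31 days, and that the most recent snapshot in each hour, day, ISO week, or calendar month is the one retained. The `TIERS` table and the `snapshots_to_keep` function are hypothetical illustrations, not part of MedStack's system.

```python
from datetime import datetime, timedelta

# Hypothetical retention tiers mirroring the targets above: (window, bucket key).
# A window of None means "keep indefinitely". "Last month" is assumed to be 31 days.
TIERS = [
    (timedelta(hours=24), lambda t: (t.year, t.month, t.day, t.hour)),  # hourly
    (timedelta(days=7), lambda t: (t.year, t.month, t.day)),            # daily
    (timedelta(days=31), lambda t: tuple(t.isocalendar()[:2])),         # weekly (ISO year, week)
    (None, lambda t: (t.year, t.month)),                                # monthly
]


def snapshots_to_keep(snapshots, now):
    """Return the set of snapshot times to retain; anything else may be pruned."""
    keep = set()
    for window, bucket in TIERS:
        newest = {}
        for t in snapshots:
            if window is not None and now - t > window:
                continue  # snapshot is older than this tier's window
            key = bucket(t)
            # Retain only the most recent snapshot in each bucket.
            if key not in newest or t > newest[key]:
                newest[key] = t
        keep.update(newest.values())
    return keep
```

A snapshot survives if any tier keeps it, so the most recent hours stay fully covered while older periods thin out to one restore point per day, week, and month.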
There are three main types of backup: snapshot, incremental, and continuous. A snapshot is a complete copy of a disk or database at a point in time, an incremental backup contains only the changes since the previous backup, and a continuous backup records every change as it is made. Our goal is to allow restoring to the nearest hour within the last 24 hours, with daily, weekly, and monthly restore points progressively further back in time.
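To make the difference between a snapshot and an incremental backup concrete, here is a toy Python model in which a disk is a dictionary mapping block numbers to contents. It is an illustrative sketch only: the helper names are hypothetical, each incremental is assumed to be computed against the state at the previous backup, and deleted blocks are recorded as `None`.

```python
def full_snapshot(disk):
    """A full snapshot: a complete copy of every block."""
    return dict(disk)


def incremental(prev, curr):
    """Only the blocks that changed since the previous backup.

    Blocks deleted since then are recorded as None so a restore can remove them.
    """
    delta = {k: v for k, v in curr.items() if prev.get(k) != v}
    delta.update({k: None for k in prev if k not in curr})
    return delta


def restore(base, deltas):
    """Rebuild a point-in-time image: start from a full snapshot, replay deltas in order."""
    disk = dict(base)
    for delta in deltas:
        for k, v in delta.items():
            if v is None:
                disk.pop(k, None)
            else:
                disk[k] = v
    return disk
```

Incrementals are small, but restoring from them requires the base snapshot plus every delta in the chain; a continuous backup takes this to the limit by logging each individual write.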
Most standard cloud backup solutions, such as Azure Backup, handle disks and databases separately. To back up a database, they use a database-specific method, which requires a different implementation for each database engine (such as MySQL vs. PostgreSQL) and even for new versions of the same engine. This incurs a significant ongoing development effort. What’s more, Docker lets developers easily install and run a far larger variety of databases and versions, multiplying the number of combinations a backup system would need to support.
For disk backups, most cloud systems (such as Azure Backup) use a method that freezes the disk for as long as it takes to copy it. Because this causes application downtime every time a backup runs, such backups are usually run only daily, and that daily maintenance window cuts significantly into the overall service level.
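The cost of a daily freeze can be estimated with simple arithmetic. In the sketch below, the 15-minute freeze and the 99.9% monthly availability target are assumed example figures, not measurements of any particular provider, and a 30-day month is assumed; the function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def availability_ceiling(freeze_minutes_per_day):
    """Upper bound on availability if every daily backup freezes the app."""
    return 1 - freeze_minutes_per_day / MINUTES_PER_DAY


def monthly_downtime_minutes(freeze_minutes_per_day, days=30):
    """Total downtime caused by daily freezes over a month of `days` days."""
    return freeze_minutes_per_day * days


def downtime_budget_minutes(target, days=30):
    """Downtime allowed per month by an availability target such as 0.999."""
    return (1 - target) * days * MINUTES_PER_DAY


# Assumed example: a 15-minute freeze each day against a 99.9% target.
print(availability_ceiling(15))         # about 0.9896, i.e. roughly 98.96%
print(monthly_downtime_minutes(15))     # 450
print(downtime_budget_minutes(0.999))   # about 43.2
```

With those assumptions, a 15-minute daily freeze adds 450 minutes of downtime per 30-day month, more than ten times the roughly 43-minute budget of a 99.9% target.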
We have continually improved our backup system since MedStack’s inception, and with MedStack Control we have introduced a flexible system based on the ZFS filesystem, released by Sun Microsystems in 2005. ZFS uses a storage technique called copy-on-write that lets us snapshot an entire disk, or any part of it, instantly and with no downtime. We use this to take hourly and daily snapshot backups of the entire disk, including any databases stored on it.
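Copy-on-write is what makes instant snapshots possible: writes never overwrite a block in place, so a snapshot only has to preserve the block pointers that existed at that moment. The toy Python class below illustrates the idea. It is a hypothetical sketch, not how ZFS is implemented: ZFS keeps a tree of block pointers, so a snapshot preserves an old root rather than copying a flat table, and ZFS frees blocks that are no longer referenced, which this sketch omits.

```python
class CowVolume:
    """Toy copy-on-write volume (an illustration, not ZFS internals)."""

    def __init__(self, num_blocks):
        self.store = [b"\x00"]        # physical blocks; index 0 is a shared zero block
        self.live = [0] * num_blocks  # logical block number -> physical block index
        self.snapshots = {}           # snapshot name -> frozen pointer table

    def write(self, lba, data):
        # Never overwrite in place: append a new physical block and repoint.
        self.store.append(data)
        self.live[lba] = len(self.store) - 1

    def read(self, lba, snapshot=None):
        table = self.live if snapshot is None else self.snapshots[snapshot]
        return self.store[table[lba]]

    def snapshot(self, name):
        # Copies only the pointer table (metadata); no data blocks are copied,
        # and the live volume keeps accepting writes immediately afterwards.
        self.snapshots[name] = list(self.live)
```

Because old data blocks are never modified, a snapshot stays consistent no matter how many writes follow it, and taking one requires no pause in the application.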
Implementing this new backup model took significant development work, and with it in place we can provide a flexible backup system that backs up any database or file type without needing to know what kind of data is on the disk. With the new MedStack backup system, the only requirement we place on developers is that the infrastructure be fast enough to copy the data on the disk to backup storage within an hour. Backups start automatically whenever a node is created and support all databases and data types.
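Taking the stated requirement at face value, the minimum sustained throughput to backup storage is the amount of data on the disk divided by the one-hour window. The sketch below assumes decimal units (1 GB = 1000 MB); the function name is hypothetical.

```python
def min_throughput_mb_s(data_gb, window_seconds=3600):
    """Minimum sustained rate in MB/s needed to copy data_gb within the window."""
    return data_gb * 1000 / window_seconds


# Assumed example: 500 GB of data on the disk.
rate = min_throughput_mb_s(500)
print(f"{rate:.1f} MB/s, about {rate * 8 / 1000:.2f} Gbit/s")
```

For example, 500 GB on disk would need roughly 139 MB/s (about 1.1 Gbit/s) sustained for the copy to finish within an hour.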