Anyone who runs an enterprise WordPress application knows that copying production data comes with significant challenges. Even when that data runs to tens or hundreds of thousands of articles, contains sensitive “live” data, and is accompanied by tens of gigabytes of images, you often need a complete copy of it to test new functionality or to reproduce a persnickety bug.
Today we’re pleased to announce a speedy, streamlined, and structured Data Sync process for VIP clients. This is a step in our larger effort to make copying large amounts of production data entirely self-service, which we will also be rolling out soon. In the meantime, and even after self-service becomes available, we are happy to sync data on behalf of our clients.
Read on for details on how our new process works.
As light as a feather
Copying data must never affect the operation of the production site. It cannot place load on the database or impact performance in any way. To remove the impact on our production servers we hook into our backup mechanism, and use the hourly backup data we keep for all production sites.
Fast, complete, and working data
For the large datasets many of our clients have, copying everything over can take a long time, and subsequent operations on the data can take even longer. Data Sync replicates a client’s production data completely, and we wanted the operation to be as fast as possible.
To sync the data we use the reliable and well-tested functionality of our backup systems. Our backups are fast to restore and are internally consistent (for example, they contain no partly completed data operations), making them ideal for this purpose.
As well as restoring the data, we need to replace any URLs using the production domain with URLs for the new non-production environment. Traditionally this is done using WP-CLI, which provides a command line interface and tools for managing a WordPress install. While this works for the majority of WordPress sites out there, it is simply too slow for the massive datasets typically used by a high-scale WordPress.com VIP client. The slowdowns are caused by the interactions between PHP and the database layer: many hundreds, thousands, or tens of thousands of reads and writes will necessarily take some time!
To replace the URLs in the data at the speed VIP customers demand, our team wrote a Golang script, “go-search-replace”. In our tests, go-search-replace is at least forty times quicker than the equivalent search and replace using WP-CLI commands, reducing operations which took many hours to minutes at most. (We apologize if you were expecting to kick back with a long and refreshing beverage during the Data Sync.)
Massive media libraries
Of course, the database is just one part of the story. Many WordPress sites we host include tens, even hundreds, of gigabytes of data and hundreds of thousands of files on our VIP Go Files Service. Copying that much data would take many hours. Instead, our cloud platform provides a service we call UnionFS.
UnionFS works by making the files for the production site available to all non-production sites in read-only mode. Files shared by UnionFS in this way are served from the same infrastructure and have the same caching rules applied.
Tailored to your WordPress application
Production data often includes connections to APIs and services that should not be active in non-production environments, such as API keys for live payment gateways and connections to mailing lists. To ensure you have confidence in the data, and to be sure you get the same results every time, we provide a WordPress action hook so your code can swap API keys, clear production orders, and run any other custom operations that are specific to your WordPress application.
How do I try this?
As we finalize everything that will make this process fully self-service, we will continue to support VIP client Data Sync needs as they arise. If you want the data from your VIP Go WordPress site copied into a non-production environment, please contact our support team and we will be happy to help.
You can read more in our Data Sync documentation.