Databases – To Containerize or Not To Containerize?

The microservices architecture is an increasingly popular approach that can improve time to market by structuring an application as a collection of loosely coupled services, each implementing a specific business capability. Microservices and containers are not equivalent concepts (a microservices architecture can be achieved through different deployment mechanisms), but containers are the most natural fit because they are lightweight, isolated from one another, and quick to start, stop, and replace.

Coming from a data background, I was interested in how databases fit into a containerized microservices world. Following the microservices architecture, each microservice or business capability should keep its own data private, separate from that of any other microservice. That doesn’t mean you need a separate database for each microservice; you can achieve this separation with distinct, unrelated tables or with separate schemas in the same database. A database-per-service approach allows more flexibility: you can choose the right data store for the job, and changes made by one service are less likely to affect another, though it involves more overhead.
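To make the schema-level option concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no CREATE SCHEMA statement, so ATTACH DATABASE stands in for schemas here; the service, schema, and table names (orders, catalog, purchase, item) are assumptions made up for illustration.

```python
import sqlite3

# One shared database connection; each service gets its own schema.
# SQLite has no CREATE SCHEMA, so ATTACH is used here to emulate schemas.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS orders")
conn.execute("ATTACH DATABASE ':memory:' AS catalog")

# Each (hypothetical) service creates tables only inside its own schema.
conn.execute("CREATE TABLE orders.purchase (id INTEGER PRIMARY KEY, sku TEXT)")
conn.execute("CREATE TABLE catalog.item (sku TEXT PRIMARY KEY, name TEXT)")

conn.execute("INSERT INTO catalog.item VALUES ('A1', 'Widget')")
conn.execute("INSERT INTO orders.purchase (sku) VALUES ('A1')")


def tables_in(schema: str) -> list[str]:
    """Return the names of the tables that live in one schema."""
    rows = conn.execute(
        f"SELECT name FROM {schema}.sqlite_master "
        "WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


print(tables_in("orders"))
print(tables_in("catalog"))
```

In a server database such as MySQL or Db2 you would typically go one step further and give each service its own database user with privileges on its own schema only, so the separation is enforced rather than just a naming convention.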

However, whether you isolate each service’s data at the table, schema, or database level, a more basic question remains: should the database itself be containerized? The most significant challenge is that containers were originally designed to be stateless: data written inside a container’s own file system does not outlive the container, so it is lost when the container is removed or replaced. Some mechanism is required to maintain persistent storage for the database when a container is moved, shut down, or restarted.

In some cases, such as test or development environments, persistence may not be a big issue. But for production workloads where you cannot lose any data, you definitely need a way to persist the data across container restarts, and probably a backup and high-availability strategy in case the storage system on the underlying server fails.
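The difference between container-local and external storage can be illustrated with a toy simulation. This is not Docker: a temporary directory plays the role of the container's own writable file system, and a second directory plays the role of storage that lives outside the container. The function and file names are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def run_container(volume=None):
    """Simulate one container lifetime: write a row, then remove the container.

    Returns the number of rows the "database" holds just before removal.
    """
    # Stand-in for the container's own writable layer; it is deleted
    # as soon as this block exits, like a removed container.
    with tempfile.TemporaryDirectory() as container_fs:
        data_dir = Path(volume) if volume else Path(container_fs)
        db_file = data_dir / "db.log"
        with db_file.open("a") as f:
            f.write("row\n")
        return len(db_file.read_text().splitlines())


# Stand-in for external storage that outlives every container.
with tempfile.TemporaryDirectory() as external_volume:
    ephemeral = [run_container() for _ in range(3)]
    persistent = [run_container(external_volume) for _ in range(3)]

print(ephemeral)   # [1, 1, 1]: every new container starts from empty
print(persistent)  # [1, 2, 3]: rows accumulate across containers
```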

There are a few options for handling this challenge when building a microservices-based application with containers. The first approach is simply not to containerize the database. This works really well in a public cloud context, where you have SaaS cloud data services that you can easily spin up, that require no management of the hardware and little to no software administration, and that in many cases have high availability built in. The diagram below describes a storefront shopping application built on the IBM Cloud using Kubernetes and Docker and bound to instances of Elasticsearch, Cloudant, and MySQL cloud data services.
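When the database lives outside the container, the containerized service usually only needs to be told where it is. A common pattern is to inject the connection details as environment variables, which keeps the container itself stateless. The sketch below assumes hypothetical variable names (DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD) and a placeholder host; the actual names depend on how your platform binds the service.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DbSettings:
    host: str
    port: int
    name: str
    user: str
    password: str


def load_db_settings(env: Mapping[str, str] = os.environ) -> DbSettings:
    """Build connection settings from environment variables.

    The variable names are hypothetical; a missing required variable
    raises KeyError so a misconfigured container fails fast at startup.
    """
    return DbSettings(
        host=env["DB_HOST"],                    # endpoint of the external data service
        port=int(env.get("DB_PORT", "3306")),   # 3306 is the default MySQL port
        name=env["DB_NAME"],
        user=env["DB_USER"],
        password=env["DB_PASSWORD"],
    )


# Example values only; in a cluster these would be injected by the platform.
example_env = {
    "DB_HOST": "db.example.com",
    "DB_NAME": "orders",
    "DB_USER": "orders_svc",
    "DB_PASSWORD": "change-me",
}
settings = load_db_settings(example_env)
print(settings.host, settings.port)
```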

 

If you decide to containerize the database after all, you need a mechanism to persist the data store’s data outside of the container, either on the container host’s file system or on an external file system or storage area. The Docker documentation highlights volumes as the “preferred mechanism for persisting data generated by and used by Docker containers”. A quick summary of volumes:

  • A volume can be created by Docker at container creation time or afterward, or an existing volume can be mounted into a container
  • A volume is stored under the /var/lib/docker/volumes/ directory on the host machine, an area managed by Docker; the volume’s directory there is what gets mounted into the container
  • Volumes can be shared between multiple containers, but Docker does not coordinate concurrent writes: containers that write to the same volume at the same time must handle locking themselves to avoid corrupting data
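As a rough sketch of what this looks like in practice, the helper functions below build the Docker CLI commands for creating a named volume and starting a MySQL container with that volume mounted on its data directory. The container name, volume name, password, and image tag are assumptions for illustration; /var/lib/mysql is the data directory used by the official MySQL image. The commands are only printed here, not executed.

```python
import shlex


def volume_create_cmd(volume: str) -> list[str]:
    """Command that creates a named, Docker-managed volume."""
    return ["docker", "volume", "create", volume]


def run_mysql_cmd(name: str, volume: str, root_password: str,
                  image: str = "mysql:8.0") -> list[str]:
    """Command that starts MySQL with its data directory on the volume."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        # Passing a password on the command line is for illustration only.
        "--env", f"MYSQL_ROOT_PASSWORD={root_password}",
        # Mount the named volume where the image keeps its data files.
        "--mount", f"type=volume,source={volume},target=/var/lib/mysql",
        image,
    ]


# Hypothetical names; pass the lists to subprocess.run(...) to execute them.
print(shlex.join(volume_create_cmd("orders-data")))
print(shlex.join(run_mysql_cmd("orders-db", "orders-data", "change-me")))
```

Because the data files live in the volume rather than in the container, removing the container and starting a new one with the same mount should bring the existing databases back.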

An example of a containerized database leveraging volumes is Db2 Warehouse, which mounts a host file system on /mnt/clusterfs in one or more Db2 Warehouse containers. Db2 Warehouse can be deployed in an MPP (distributed) flavour, which requires a POSIX-compliant cluster file system. The diagram below shows an MPP deployment that uses the IBM Spectrum Scale (GPFS) file system.

[Diagram: Db2 Warehouse MPP deployment on IBM Spectrum Scale (GPFS)]

Another way to use an external file system is through volume drivers (plugins). These plugins abstract the external storage system away from the application logic and can provide additional functionality, such as high availability, backups, and encryption for your data. For example, Flocker is a plugin that enables volumes to follow your containers when they move between hosts in your cluster, and the VMware vSphere Storage Plugin lets you run containers backed by storage in a vSphere environment.
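With a plugin installed, the driver is selected when the volume is created. The sketch below builds the corresponding `docker volume create` command using its `--driver` and `--opt` flags. The driver name and the option shown are placeholders, not the settings of any particular plugin; each plugin documents its own driver name and supported options.

```python
import shlex


def volume_create_with_driver_cmd(volume, driver, options=None):
    """Command that creates a volume backed by a specific volume driver."""
    cmd = ["docker", "volume", "create", "--driver", driver]
    # Driver-specific settings are passed as repeated --opt key=value pairs.
    for key, value in (options or {}).items():
        cmd += ["--opt", f"{key}={value}"]
    cmd.append(volume)
    return cmd


# "example-storage-driver" and "size" are hypothetical placeholders.
cmd = volume_create_with_driver_cmd(
    "orders-data", "example-storage-driver", {"size": "20GB"}
)
print(shlex.join(cmd))
```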

Whichever approach you take, the bottom line is this: when deciding how to handle data storage in a container world, don’t forget the usual considerations for database deployment, such as performance, availability, and security.
