I wrote this when I was the Digital Library Applications Lead at the University of Notre Dame. It was archived from http://www3.nd.edu/~dbrubak1/ which no longer exists.

Infrastructure As Code

Our goal is to have consistency across development, staging, and production environments. The first step in achieving that goal is to systematize our environment provisioning and management processes.

Consolidated vs. Distributed Services

Packaging all the required software together reduces resource overhead and lowers startup time by removing the complexity of a distributed cluster of services. This makes it ideal for development and staging environments. An environment that contains all the supporting software needed to run an instance of a Hydra application is considered a “FatVM”.

However, it is hard to scale and tune the entire system when all the software is intertwined. The most flexible way to address our rapidly changing needs is to extract and cluster services of similar types. As we identify bottlenecks in our infrastructure, we can spin up nodes of the needed type. Each node will focus on one type of task: ingest of material, dissemination of a content type, serving a single Rails application, etc. A tuned, single-purpose environment like this is a “ThinVM”.
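The difference between the two shapes can be sketched in a few lines of Ruby. This is only an illustration: the role names and the `fat_vm` and `thin_vms` helpers are hypothetical, chosen to echo the services mentioned in this document, and do not come from any existing cookbook.

```ruby
# Hypothetical role names, one per kind of service.
SERVICE_ROLES = %w[fedora solr redis postgres rails_app workers nginx].freeze

# A FatVM: a single node that carries every role.
def fat_vm(name)
  { name => SERVICE_ROLES.dup }
end

# ThinVMs: one single-purpose node per role, repeated `count` times
# so that a bottlenecked role can be scaled on its own.
def thin_vms(counts)
  counts.each_with_object({}) do |(role, count), nodes|
    raise ArgumentError, "unknown role: #{role}" unless SERVICE_ROLES.include?(role)
    (1..count).each { |i| nodes["#{role}-#{i}"] = [role] }
  end
end
```

Calling `thin_vms({ "solr" => 2, "workers" => 1 })` describes three nodes, each with a single role, while `fat_vm("dev")` describes one node with all of them.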

The Vision

A centrally managed Chef instance will coordinate the provisioning and updating of thin and fat VMs across environments. Pre-provisioned Vagrant base boxes will be built automatically as updates to the Chef recipes are made. Staging, pre-production, and production VMs can then be spun up quickly for any given role.

Proposed Tools

With a centralized provisioning system we can programmatically define software stacks needed to build specialized environments. We should be able to use the same provisioning manifests (Chef calls them cookbooks) to provision VMs in a VM farm as well as build base boxes for Vagrant for use in development. The full toolchain will look something like this:

Some of these tools may not be absolutely necessary (such as Librarian). There may be other approaches worth investigating as well [1].

Application-Centric Configuration

By defining the Cheffile, Procfile, and Vagrantfile at the root of each application, we can treat application dependencies, runtime process needs, and development environment configuration as part of the application source code. Together they provide an executable runbook that can be interpreted by Librarian, Foreman, and Vagrant respectively.
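Because the runbook is just three files at a known location, its presence is easy to check mechanically. The following is a minimal sketch of such a check; the `RUNBOOK_FILES` constant and the `missing_runbook_files` helper are hypothetical names introduced for illustration.

```ruby
# Each runbook file and the tool that interprets it.
RUNBOOK_FILES = {
  "Cheffile"    => "Librarian",
  "Procfile"    => "Foreman",
  "Vagrantfile" => "Vagrant"
}.freeze

# Returns the names of runbook files that are absent from app_root.
def missing_runbook_files(app_root)
  RUNBOOK_FILES.keys.reject { |file| File.exist?(File.join(app_root, file)) }
end
```

A check like this could run as part of a test suite, so that an application missing one of its runbook files is caught early.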

It is possible to create both “ThinVM” and “FatVM” environments with Vagrant for local testing. The Vagrantfile can provision multiple VMs to approximate the diverse infrastructure we are building for production services. For example, to test the workflow project, it can provision a Redis VM, a Fedora VM, a Solr VM, a VM for the application itself, and a VM for workers. That sounds like a lot of overhead, but it should remove the need to install and configure all of those software stacks directly on developer machines.
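A multi-VM Vagrantfile for the workflow example might look roughly like the sketch below. It assumes Vagrant's version "2" configuration syntax and its `chef_solo` provisioner. The box name `hydra-base`, the recipe names, and the `define_nodes` helper are hypothetical placeholders, not existing artifacts.

```ruby
# Vagrantfile (sketch). Node names follow the workflow example;
# the recipe lists are hypothetical.
WORKFLOW_NODES = {
  "redis"   => ["redis"],
  "fedora"  => ["java", "fedora"],
  "solr"    => ["java", "solr"],
  "app"     => ["ruby", "workflow_app"],
  "workers" => ["ruby", "workflow_workers"]
}.freeze

# Defines one VM per entry, each provisioned by Chef Solo with its recipes.
def define_nodes(config, nodes, box: "hydra-base")
  nodes.each do |name, recipes|
    config.vm.define name do |node|
      node.vm.box = box          # hypothetical pre-built base box
      node.vm.hostname = name
      node.vm.provision "chef_solo" do |chef|
        chef.cookbooks_path = "cookbooks"
        recipes.each { |recipe| chef.add_recipe recipe }
      end
    end
  end
end

# Only takes effect when Vagrant loads this file; plain `ruby` skips it.
Vagrant.configure("2") { |config| define_nodes(config, WORKFLOW_NODES) } if defined?(Vagrant)
```

Keeping the node list as plain data means a “FatVM” variant is a one-entry hash holding every recipe, so the same helper can describe both shapes.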

Documentation for using all of these tools together is sparse, but there are some examples that cover parts of the process.

The Hydra Stack

The full Hydra application stack is rather complicated. An administrative application that can ingest content into Fedora, search holdings via Solr, and present information from Fedora to a user needs two complete application stacks as well as several auxiliary pieces of software.

  • Java and a Java web server: Jetty, Tomcat, JBoss, etc.
  • Ruby and a Ruby web server: Thin, Puma, Unicorn, etc.
    • Rails application for the user interface
    • Heracles workflow manager (also a Rails application) to govern the work queue
    • Heracles worker processes (Ruby daemons) for ingesting specific content types into Fedora
  • Postgres for non-archival data used by both Rails applications
  • Redis for the Heracles queue and for in-memory caching
  • nginx as a reverse proxy for all web-facing applications
    • Fedora wrapper that lets a service external to Fedora provide authentication and authorization for access
    • Reverse proxy for Rails applications
    • Directly serves static assets used by Rails applications

Starting the Process

The first deliverable is a VeeWee-built Vagrant base box that is further provisioned by Chef recipes executed by Chef Solo.

The second deliverable is a KVM VM for a staging environment where the previously written Chef recipes can be successfully executed.

Required Reading

  1. Heroku’s buildpacks sound interesting but may be limited to their proprietary service architecture.