Why Use Docker

Docker is a platform for developing, collaborating on, and distributing software. It allows you to run your software in a controlled, isolated operating system environment called a container. Containers are like lightweight virtual machines that include only the OS software needed to run your program(s), not the whole OS. This builds on makefile methods in that it allows your analysis to be run in a lightweight, contained environment with minimal dependency issues.

Getting Started with Docker (Tutorial)

The Docker website has some really helpful documentation and tutorials for helping you get started. Here we are going to go through an abridged version of that introductory information, while learning more about Docker in the process.

Installing Docker

Installing Docker is easy. Go here to install Docker for Mac and here to install Docker for Windows. Once we have Docker installed, we are going to want to check that it is working. That means it's time to open up that terminal!

Just as git commands are run by typing git followed by the desired git command (e.g. git commit), Docker commands are run by typing docker followed by the desired command. We can easily try this out and confirm that the installation worked. For starters, we can check our Docker version.

docker version
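
If the command above reports an error, it can help to first confirm that the docker client is on your PATH at all. The following is a minimal sketch of such a check; the variable name docker_status is only an illustration and is not part of Docker.

```shell
# Report whether the docker client can be found on the PATH.
# docker_status is a hypothetical variable used only for this check.
if command -v docker >/dev/null 2>&1; then
    docker_status="installed"
    # Prints the client version only; this does not contact the Docker daemon.
    docker --version
else
    docker_status="missing"
    echo "docker was not found on the PATH"
fi
echo "docker client: $docker_status"
```

Note that docker --version only looks at the client, while docker version also tries to reach the Docker daemon, so the first can succeed even when Docker itself is not running.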

The Basic Docker Architecture

At its most basic, Docker works with two elements: images and containers. The formal definitions (from the glossary) are as follows.

Container

A container is a runtime instance of a docker image. A Docker container consists of: a Docker image, an execution environment, and a standard set of instructions.

Image

Docker images are the basis of containers. An Image is an ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime.

Another way to define these units is that "an image is a filesystem and parameters to use at runtime. It doesn’t have state and never changes. A container is a running instance of an image". These concepts can be confusing at first, so let's see how they work in an example.

Running The Whalesay Image

The Whalesay image is the standard tutorial image used by Docker. We will also use this as a learning example of how Docker works. To get started we can check out the Whalesay image on Dockerhub, which is where we are going to download the image from. The Whalesay Dockerhub site can be found here.

If we run a Whalesay container, Docker will automatically load in the appropriate image (more on these dependencies later). We can easily try it out ourselves.

docker run docker/whalesay cowsay boo

This command results in the following output (how cute!):

  _____
 < boo >
  -----
     \
      \
       \
                     ##        .
               ## ## ##       ==
            ## ## ## ##      ===
        /""""""""""""""""___/ ===
   ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
        \______ o          __/
         \    \        __/
           \____\______/

The idea here is that you can control what the whale says.

docker run docker/whalesay cowsay hello world
 _____________ 
< hello world >
 ------------- 
    \
     \
      \     
                    ##        .            
              ## ## ##       ==            
           ## ## ## ##      ===            
       /""""""""""""""""___/ ===        
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~   
       \______ o          __/            
        \    \        __/             
          \____\______/   

This is similar to the cowsay program available on Ubuntu. The idea here is that you are running this program as if it were a command line tool, much like you would run cowsay on Ubuntu as cowsay boo. The difference is that instead of running the cowsay application on your own operating system, you are running it within a controlled Ubuntu environment.
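
Each docker run above created a separate container from the same image, which is one way to make the image/container distinction concrete. The sketch below assumes the two whalesay runs have already finished; the function name is made up for this tutorial, while docker images and docker ps -a are standard commands.

```shell
# Hypothetical helper: show the stored images next to the containers made from them.
show_images_and_containers() {
    echo "== images =="
    docker images      # read-only templates stored locally
    echo "== containers =="
    docker ps -a       # -a also lists containers that have already exited
}
```

Calling show_images_and_containers at this point should list docker/whalesay once under the images and one exited container for each time you ran it.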

Under The Hood

So what is really going on here? How was this simple application built? Well, Docker images are built using Dockerfiles, which are like makefiles for building Docker images. These tell Docker what base OS to use, and what commands to run in that OS, such as packages to install or files to download. Let's take a look at the Dockerfile used for Whalesay.

FROM ubuntu:14.04

# install cowsay, and move the "default.cow" out of the way so we can overwrite it with "docker.cow"
RUN apt-get update && apt-get install -y cowsay --no-install-recommends && rm -rf /var/lib/apt/lists/* \
    && mv /usr/share/cowsay/cows/default.cow /usr/share/cowsay/cows/orig-default.cow

# "cowsay" installs to /usr/games
ENV PATH $PATH:/usr/games

COPY docker.cow /usr/share/cowsay/cows/
RUN ln -sv /usr/share/cowsay/cows/docker.cow /usr/share/cowsay/cows/default.cow

CMD ["cowsay"]

We can see that it looks a bit like a shell script or a makefile. The file starts by using FROM to name the base image that we want to build from. In this case, we want to use the Ubuntu base image (Ubuntu 14.04 to be exact).

Once we have our base image set, which in this case is the basics of the Ubuntu OS, we can customize the environment by running commands to install packages just like we would in a normal Ubuntu environment. This is done using the RUN command. This will create a new layer of information over the existing image. In the case of our example image, we are using apt-get to update our existing packages, installing cowsay, and changing the main package filename so that it can be overwritten for whalesay.
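
To see the layers that these instructions leave behind, you can ask Docker for an image's history. This is a small sketch; show_layers is a hypothetical wrapper name, and the image has to be present locally (for example after the docker run earlier).

```shell
# Hypothetical helper: list the layers of an image, most recent first.
# Each row shows the instruction that created the layer and the layer's size.
show_layers() {
    docker history "$1"
}
```

For example, show_layers docker/whalesay should include rows for the RUN, ENV, COPY, and CMD instructions from the Dockerfile above.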

The ENV command sets environment variables within the image. Here it appends /usr/games to the PATH so that cowsay can be found.

The COPY command here (which could be replaced in this instance with the ADD command) is transferring the new docker.cow file into the image from the host that built the image. This means that you will need the docker.cow file in your working directory if you want to rebuild this image using the Dockerfile (the Dockerfile is only used when building the image). You can find an example of what this working directory would look like here. This is then followed by another RUN command to make a symbolic link for docker.cow so that it runs by default.

Finally we see the CMD instruction. This provides the default command to run when a container is started from the image. Only one CMD takes effect per Dockerfile (if several are listed, the last one wins). In this case the CMD is cowsay, so a container will run cowsay by default. This is a little confusing at first, because you might ask why we still need to type cowsay when we run the container. The answer is that, in our examples, we are not actually using the CMD. When we type docker run docker/whalesay cowsay hello world, we are creating a container from the image and running cowsay hello world as if we had typed it into the Ubuntu terminal; anything placed after the image name replaces the CMD. The CMD will only be run if nothing is included after the image name (there are more sophisticated uses but we will not go into that now). You can see this by typing other basic commands after the image name. For example, if you want to see the contents of the working directory within the image, you can list them out.

docker run docker/whalesay ls
ChangeLog
INSTALL
LICENSE
MANIFEST
README
Wrap.pm.diff
cows
cowsay
cowsay.1
install.pl
install.sh
pgp_public_key.txt
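
To see the default command in action, the sketch below defines two small helpers. Both function names are hypothetical and exist only for this tutorial; the docker subcommands and flags they call (inspect --format, run --rm -i) are standard. The first prints the CMD recorded in an image that is already present locally. The second sends text on standard input to a container started with nothing after the image name, so the CMD (cowsay) is what runs. It relies on cowsay reading its message from standard input when it is given no arguments.

```shell
# Hypothetical helper: print the default command (CMD) recorded in an image.
# The image must already be present locally, e.g. after an earlier docker run.
show_default_cmd() {
    docker inspect --format '{{json .Config.Cmd}}' "$1"
}

# Hypothetical helper: pipe a message to the whalesay image's default command.
# -i keeps standard input open so the piped text reaches the container;
# --rm removes the container once it exits.
say_via_default_cmd() {
    printf '%s\n' "$1" | docker run --rm -i docker/whalesay
}
```

You could try these as show_default_cmd docker/whalesay and say_via_default_cmd "hello from stdin".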

Making A New Dockerfile

It is one thing to see a Dockerfile, but it really starts to make sense when you dive in and try building one yourself. One easy example to work through is adding functionality to the Whalesay image. We can start by moving to a new directory and opening a new Dockerfile to play with.

mkdir ./dockertut
cd ./dockertut/
touch Dockerfile

Once you make the empty file, go ahead and open it with your favorite text editor. I like to use Sublime Text, and it is especially nice because you can easily add Dockerfile syntax highlighting.

As in our example, we want to begin by specifying the base image that we are going to build from. In this case, we are going to add to the Whalesay image, so we will specify FROM docker/whalesay:latest. In this tutorial, we want the whale to think of its own words using the fortune program, so we add RUN apt-get -y update && apt-get install -y fortunes. Finally, we want this to run as the container default, so we will add the command CMD /usr/games/fortune -a | cowsay. The final Dockerfile should look like the following.

FROM docker/whalesay:latest
RUN apt-get -y update && apt-get install -y fortunes
CMD /usr/games/fortune -a | cowsay
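
If you prefer to stay in the terminal rather than open an editor, the same three lines can be written with a here-document. This is only a sketch of one alternative; it assumes you run it from the directory that contains (or should contain) dockertut, not from inside dockertut itself.

```shell
# Create the tutorial directory (no error if it already exists).
mkdir -p dockertut

# Write the three-line Dockerfile. The quoted 'EOF' stops the shell from
# expanding anything inside the here-document, so the text is stored as typed.
cat > dockertut/Dockerfile <<'EOF'
FROM docker/whalesay:latest
RUN apt-get -y update && apt-get install -y fortunes
CMD /usr/games/fortune -a | cowsay
EOF

# Print the file back to confirm its contents.
cat dockertut/Dockerfile
```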

At this point we want to save the Dockerfile and build the image. We will build the image using the build command and use -t to give it a name (tag) that makes running it easier. We also need to tell Docker to use our current directory (the trailing .) as the place to find the Dockerfile and any files it copies. It will take a little time to build.

docker build -t docker-whale .

Once we have the Docker image built, we have that image easily accessible in our local Docker environment. We can list out all of our images as follows.

docker images

Finally, we can run our image to see the result. Notice here that we do not have to specify any additional run conditions like we did above. This is because we have written the CMD such that it will automatically run the desired command by default in the new container.

docker run docker-whale

The output will be different every time, because fortune picks a random statement on each run.

 _______________________________________ 
/ I'm using my X-RAY VISION to obtain a \
| rare glimpse of the INNER WORKINGS of |
\ this POTATO!!                         /
 --------------------------------------- 
    \
     \
      \     
                    ##        .            
              ## ## ##       ==            
           ## ## ## ##      ===            
       /""""""""""""""""___/ ===        
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~   
       \______ o          __/            
        \    \        __/             
          \____\______/ 

 _____________________________________ 
/ Lavish spending can be disastrous.  \
\ Don't buy any lavishes for a while. /
 ------------------------------------- 
    \
     \
      \     
                    ##        .            
              ## ## ##       ==            
           ## ## ## ##      ===            
       /""""""""""""""""___/ ===        
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~   
       \______ o          __/            
        \    \        __/             
          \____\______/  
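
The default command is only a default: anything placed after the image name still replaces it, just as with the original Whalesay image. A minimal sketch, assuming the docker-whale image built above exists locally; whale_say is a hypothetical helper name.

```shell
# Hypothetical helper: run the docker-whale image with an explicit message.
# The words after the image name replace the image's CMD for this one container,
# so fortune is not run. --rm removes the container once it exits.
whale_say() {
    docker run --rm docker-whale cowsay "$@"
}
```

For example, whale_say no fortune today should print the whale with that text instead of a random fortune.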

Interacting With Your Docker Images

Sometimes it can be helpful to interact with your docker image, similar to what you might expect from running a virtual machine. This is possible because Docker allows you to create "interactive containers" from your images. All you need to do is run an image (like the ubuntu image) using the -i flag to keep standard input open and the -t flag to allocate a terminal (pseudo-TTY) for the session. You also need to specify bash as the command at the end of the call. As a reminder, you can see the images you have ready using docker images. Go ahead and try it out. Type exit when you are done.

docker run -it ubuntu bash
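
Not every check needs a full interactive session. When you only want the result of one command, you can pass it after the image name and let Docker clean up afterwards. A small sketch; run_in_ubuntu is a hypothetical helper name, and --rm is a standard docker run flag.

```shell
# Hypothetical helper: run a single command in a fresh ubuntu container.
# --rm removes the container when the command exits, so finished containers
# do not pile up in the output of docker ps -a.
run_in_ubuntu() {
    docker run --rm ubuntu "$@"
}
```

For example, run_in_ubuntu cat /etc/os-release prints the OS details of the container rather than those of your own machine.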

Using Docker for Research & Analysis

So up to this point we have shown how we can use Docker images to run instances of programs as containers. There are Docker images for pretty much anything these days, with one relevant example being BioContainers. This provides containers that allow you to run different versions of programs like Blast or Bowtie2 without having to install the program, and without having to worry about other dependencies (it sucks when one installation quickly becomes ten). Another example is the R Docker image.

Another example of how this could be used is to completely "contain" all of your analyses, including your makefile work. You could create an image with all of the programs you want to use, load the makefile (and other required files that you might need to start with, all in your GitHub repo of course) into the image with ADD or COPY, and then have the CMD run the makefile. This takes the idea of makefiles and reproducible research further in that your analysis is an OS-agnostic, self-contained workflow. This is pretty powerful.
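
As a rough sketch of what such a setup could look like, the commands below write a Dockerfile for a makefile-driven analysis. Everything specific here is an assumption made for illustration: the my-analysis directory name, the ubuntu:22.04 base image, the /analysis path inside the image, and the idea that make with no arguments runs your whole analysis.

```shell
# Hypothetical project directory; in practice this would be your analysis repo,
# which already contains a Makefile and the scripts it calls.
mkdir -p my-analysis

cat > my-analysis/Dockerfile <<'EOF'
# Base OS for the analysis (an assumption; pick whatever your tools need)
FROM ubuntu:22.04

# Install make; add your own analysis programs to this list
RUN apt-get update && apt-get install -y --no-install-recommends make \
    && rm -rf /var/lib/apt/lists/*

# Copy the repo contents (Makefile, scripts, inputs) into the image
COPY . /analysis
WORKDIR /analysis

# Run the Makefile's default target when a container starts
CMD ["make"]
EOF
```

With a Makefile in place, docker build -t my-analysis my-analysis followed by docker run --rm my-analysis would then run the analysis inside the container.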

Finally, Docker could be used as a backend for developing analysis applications. An example of this was my early steps for the metagenomic analysis workflow, which can be found here. This is very much a work in progress, but you can check it out and get an idea for how it could be applied.

Dockerhub

We are going to end with a summary of Dockerhub, which is a website that we only alluded to earlier. Dockerhub is like GitHub for Docker. It is a place to store your images, and to download other images. In fact, this is where we were getting all of our base images above. Dockerhub is built into the standard Docker program. Like GitHub, you can push and pull images easily.
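
Pushing and pulling can be sketched as follows. The two function names are hypothetical, and the username you pass in is a placeholder for your own Dockerhub account; docker tag, docker push, and docker pull are the standard commands underneath. Pushing assumes you have already signed in with docker login.

```shell
# Hypothetical helper: publish a local image under a Dockerhub account.
# $1 = local image name (e.g. docker-whale), $2 = Dockerhub username (placeholder)
publish_image() {
    # Give the image a name that includes the account, then upload it.
    docker tag "$1" "$2/$1" && docker push "$2/$1"
}

# Hypothetical helper: download an image without running it.
fetch_image() {
    docker pull "$1"
}
```

For example, publish_image docker-whale your-username would upload the image built earlier, and fetch_image docker/whalesay downloads the tutorial image.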

As you can imagine, GitHub and Dockerhub work well together. Dockerhub can manage the images and GitHub can manage the underlying code. It turns out that there are already easy yet powerful tools for linking Dockerhub to GitHub. While working on the Docker images in GitHub, you can set it up so that the Docker images will be automatically built from the repo. This is a powerful tool for managing both code and the image results of that code.