Docker Registry v2: Adventures in Ambiguity

All I need is a private Docker registry that I can host myself.

If you're anything like me, you've been excitedly awaiting the release of the v2.0 Docker Registry. Version 1 was not very good. The company behind Docker is in no hurry to bite the hand that feeds them, and so development of the registry has been spotty at best. Among other things, the documentation is not great and the registry has no built-in authentication protocol. I understand that it's much better for business to get people frustrated with setting up their own private registry and then point them at your hosted services, where it's very easy to write a check to have someone else take care of this mess for you. But I am not in the habit of writing checks, and my check would probably bounce anyways.

The documentation available for deploying a v2 registry is specific to one situation. It is a set of instructions for using Compose (yet another Docker technology with a seemingly nebulous purpose at this time) to get both a v1 and a v2 registry working behind an Nginx proxy. But I am already using the Nginx Docker reverse proxy by Jason Wilder, so I don't need to bring in an external Nginx server. Nor do I need to answer requests for a v1 registry, as I am not using any Docker clients earlier than version 1.6.0.

All I need is a private Docker registry that I can host myself.

So... What we need to do is rip out all the extra stuff so that we're left with what we need. We don't need a v1 registry, so ignore all that. We're using jwilder/nginx-proxy to proxy our inbound requests, so ignore the instructions about pulling in the Nginx server. The average docker user right now doesn't really know what Compose is for or what it does -- though it will reduce complexity for most of us some day in the future -- so just ignore all the cruft about Compose. We're left with something close to what we're looking for.

First, clone the Distribution repository from GitHub and change into that directory:

git clone https://github.com/docker/distribution && cd distribution

We'll build our registry server from this repository:

docker build -t=registryv2 .

Now our image will build and should be listed in the output of docker images. We're almost there, but first we need to configure the registry. The future may include something I've talked about before called the docker vault, which is a cryptographically-secure, in-container, ephemeral storage mechanism which holds our sensitive configuration data. But for the moment, we don't have access to the vault because it doesn't exist. Here, we're going to have to rely on storing configuration data in a file, and then mount a host directory as a volume to expose our config file to the container.

The v2 registry currently reads configuration data from cmd/registry/config.yml, so we need to map a directory on the host to this directory in the container. If you're trying to configure a v2 registry, just totally forget about everything related to configuring a v1 registry. The new configuration options are in the documentation. I'll include my own sanitized debug mode configuration file so you've got a sanity check reference:

version: 0.1
log:
    level: debug
    fields:
        service: registry
        environment: staging
storage:
    filesystem:
        rootdirectory: /registry
    cache:
        layerinfo: inmemory
http:
    addr: :5000
    secret: somerandomstring
    debug:
        addr: localhost:5001

This is all pretty self-explanatory if you've played with running any version of a Docker registry. I'm running this on tcp/5000 during development for testing purposes. The server also listens on tcp/5001 (local loopback only) in case we need to connect and get some verbose debug information.
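As a rough sketch of getting that file in place, the function below writes the debug configuration into a host directory that can later be mounted into the container. The function name write_debug_config is made up for this example, and the secret is the same placeholder as above, so swap in your own value.

```shell
#!/bin/bash
# Sketch only: write the debug configuration into a host directory.
# write_debug_config is a hypothetical helper name, and the secret below
# is a placeholder that should be replaced.
write_debug_config() {
  local target_dir="$1"
  mkdir -p "$target_dir"
  # Quoted heredoc delimiter: the YAML is written exactly as shown.
  cat > "$target_dir/config.yml" <<'EOF'
version: 0.1
log:
    level: debug
    fields:
        service: registry
        environment: staging
storage:
    filesystem:
        rootdirectory: /registry
    cache:
        layerinfo: inmemory
http:
    addr: :5000
    secret: somerandomstring
    debug:
        addr: localhost:5001
EOF
}

# Usage (the directory is the one I mount later; pick your own):
#   write_debug_config /var/docker/registryv2/config
```

Keeping this in a function means the same script can write the file to a scratch directory first, so you can eyeball it before it goes anywhere near the real mount point.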

Now for production mode configuration, I use a different setup, which is more like something you'd expect to see out there in the real world:

version: 0.1
log:
    level: info
    fields:
        service: registry
        environment: staging
storage:
    s3:
        accesskey: AKIA0Z6307DRPWJ5VH03F
        secretkey: OgP2Yhk1ZjFFf+aYokvnqI3qTlenCxSW2nbb9zpB
        region: us-west-1
        bucket: example.com-docker-registry-v2
        encrypt: false
        secure: true
        v4auth: true
        chunksize: 5242880
        rootdirectory: /registryv2
    cache:
        layerinfo: redis
http:
    addr: :443
    secret: ZpAedwVDFHK7mkNFFKSP8OQY
    debug:
        addr: localhost:5001
redis:
    addr: redis:6379
    db: 0

Here, we're setting our registry to use an AWS S3 backend (that configuration data is of course dummy data, but feel free to try it, leet). We're also using a Redis container for caching, which speeds things up considerably. Again, the jwilder/nginx-proxy container auto-detects which port is exposed on a container, and I want this registry to listen on HTTPS (tcp/443), so I've changed its listen port appropriately.

So, because our registry is going to use Redis for caching, we need to spin up a Redis instance and link it to our registry container. Real quick, let's pull that Redis image:

docker pull sameersbn/redis

Now run it:

docker run -d --name="registryv2-redis" --restart="always" sameersbn/redis

And when I run my registry container, it looks something like this:

#!/bin/bash
docker run -d \
  --name="registryv2" \
  --restart="always" \
  --link registryv2-redis:redis \
  -v /var/docker/registryv2/config:/go/src/github.com/docker/distribution/cmd/registry \
  registryv2
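If you want a small safety net around that command, here is a minimal sketch of a wrapper that refuses to start the registry unless the host directory actually contains a config.yml. The function name launch_registry and the DOCKER override variable are my own inventions for this example, not anything Docker provides.

```shell
#!/bin/bash
# Sketch only: launch_registry and the DOCKER variable are hypothetical.
# Setting DOCKER=echo prints the command instead of running it.
launch_registry() {
  local config_dir="$1"
  # Mounting a directory without config.yml would leave the registry
  # with nothing to read, so bail out early.
  if [ ! -f "$config_dir/config.yml" ]; then
    echo "missing $config_dir/config.yml" >&2
    return 1
  fi
  "${DOCKER:-docker}" run -d \
    --name="registryv2" \
    --restart="always" \
    --link registryv2-redis:redis \
    -v "$config_dir:/go/src/github.com/docker/distribution/cmd/registry" \
    registryv2
}

# Usage:
#   DOCKER=echo launch_registry /var/docker/registryv2/config   # preview
#   launch_registry /var/docker/registryv2/config               # for real
```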

Finally, it's not clear in the documentation, but to create a repository on your new v2 registry, you've got to tag your images correctly. Let's suppose for a second you have just built an image called mycontainer (Docker requires repository names to be lowercase) and you'd like to create a repository for it on your new v2 registry:

docker tag mycontainer:latest registry.example.com:443/mycontainer:latest

This command tags your image not only with the latest tag, but also specifies exactly which registry you want to use for your new repository. Now you can push this image to your new v2 registry. Enjoy!
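To make the shape of that reference explicit, here is a small sketch that assembles it from its three parts: registry host and port, repository name, and tag. The helper name registry_ref is hypothetical. It lowercases the repository name because Docker rejects uppercase characters there.

```shell
#!/bin/bash
# Sketch only: registry_ref is a hypothetical helper, not part of Docker.
# It prints <registry>/<name>:<tag>, lowercasing the repository name.
registry_ref() {
  local registry="$1" name="$2" tag="${3:-latest}"
  name=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
  printf '%s/%s:%s\n' "$registry" "$name" "$tag"
}

# Usage (needs a running Docker daemon and a reachable registry):
#   ref=$(registry_ref registry.example.com:443 mycontainer latest)
#   docker tag mycontainer:latest "$ref"
#   docker push "$ref"
```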

Sensitive Configuration Data: fs.readFile() or process.env

The thoughtful engineering behind the Twelve-Factor App design process has pushed environment variables as the place for sensitive configuration data, to the point that it's somewhat of a standard these days. I have spent a couple of years now using environment variables to store sensitive configuration data in order to compare it to using time-tested configuration files. I have finally decided that configuration files are the way forward.

This gist summarizes fairly well the pros and cons of using environment variables. If you aren't familiar with the argument, or haven't given it much thought, I recommend you take a minute to read it over.

When I first heard of using environment variables for sensitive data, I was intrigued, but also initially skeptical. I've been doing this stuff for years and years. If it ain't broke, don't fix it, but... I'm always in the market for better solutions, so I didn't want to reject it outright. I wanted to give it time, really roll the concepts around in my head (and my infrastructure) before making the call.

A lot of the opposition to config files comes from arguing that a deployment should scrub the sensitive data out of the environment once it's finished using it. However, in my experience environment variables are unintentionally "leakier" than files on the filesystem. Even if you scrub the environment, this data could live on in process lists and temporary files and other little nooks and crannies which are hard (or impossible) to get to, especially for the average person deploying things. With files, there's no question where your sensitive data lives, and this makes it easier to handle and to quarantine, even if only conceptually.
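One small way to see the difference, sketched in shell: an exported variable is inherited by every child process, while a value read from a file into a plain shell variable stays put. The names DEMO_SECRET, child_sees_secret, and read_secret_file are made up for this illustration, and the permission check is just one reasonable policy, not a standard.

```shell
#!/bin/bash
# Sketch only: all names here are hypothetical.

# Prints "visible" if a child process can see DEMO_SECRET, else "hidden".
child_sees_secret() {
  sh -c 'if [ -n "${DEMO_SECRET:-}" ]; then echo visible; else echo hidden; fi'
}

# Reads a secret from a file, refusing files with any group/other permissions.
read_secret_file() {
  local file="$1" perms
  # Characters 5-10 of the ls mode string are the group and other bits.
  perms=$(ls -ld "$file" | cut -c5-10)
  if [ "$perms" != "------" ]; then
    echo "refusing $file: group/other permissions are set" >&2
    return 1
  fi
  cat "$file"
}

secret_file=$(mktemp)
chmod 600 "$secret_file"
printf 'not-a-real-secret' > "$secret_file"

# Exported: every child process inherits the value.
export DEMO_SECRET="not-a-real-secret"
child_sees_secret

# Read from the file into an unexported variable: children do not see it.
unset DEMO_SECRET
DEMO_SECRET=$(read_secret_file "$secret_file")
child_sees_secret

rm -f "$secret_file"
```

The first call prints visible and the second prints hidden. The file version also gives you exactly one place to lock down and, eventually, to delete.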

My specific situation is with loading docker containers with the sensitive data, and there's been some intriguing discussion among core devs about a kind of "docker vault" in #docker-dev on IRC. The mechanic they want to invent is a cryptographically-secure, ephemeral storage area where sensitive configuration data can be injected into a container, used, then destroyed. This will solve a lot of the issues traditionally associated with this specific problem. However, until that time arrives, I'm sticking with configuration files.

An Excellent Talk

Hello there! This talk was posted on HN the other day, and I think it's excellent enough to post here.

Skill Progression

There's a fascinating post I read recently that talks about skill progression, and how it relates to your perception of your own skill level, your frustration, your perception of the skill itself, and your actual abilities in a skill. I could talk on and on about how this relates to this skill, or that, and how to maximize the upward trend. But I won't do nearly as good a job explaining as this image from the original post:

Isn't that an amazing insight? You can abstract this graph to represent virtually any technical skill you can think of. Replace the word "painting" with "programming" or "playing chess". Almost any technical skill fits this progression model.

Title II

I am still in a state of shock over Tom Wheeler's comments over at Wired. It looks like he's going to recommend essentially what we all thought was impossible: Title II regulations for broadband providers, including mobile broadband providers!

At first blush, the situation seems dire. Tom Wheeler was a top cable industry lobbyist before he took the position of chairman of the FCC. It doesn't take a lot of explaining for anyone to see that this is a pretty clear conflict of interest. It's as insane as Dick Cheney having to quit his job as CEO of Halliburton in order to become the Vice President of the United States. It shouldn't surprise anyone that the next eight years were full of absurd government spending on no-bid contracts given to -- naturally -- Halliburton. So it seemed the future of the internet would be like a Comedy Central show whose humor is based solely on schadenfreude.

However... Something happened. I'm intensely curious about the reason Chairman Wheeler decided to bring Title II to broadband. Because whatever it is, it's miraculous. He's almost certainly slammed the revolving door shut on himself. He's single-handedly responsible for recommending Title II for broadband providers, and for him to include mobile providers (i.e. cellphone network providers) as covered under Title II regulations is a huge blow to the zero-ethics, sociopath entities like Time Warner Cable, Comcast, and Verizon, who fight so hard to screw consumers for whatever profit they can manage.

It almost seems too good to be true. We aren't given too many things in our lifetime that are "too good to be true". I'm still cautious and thinking about what this could mean. I want to take it at face value and rejoice in what seems to be profound evidence that people in our government still care about protecting its citizen consumers. We'll just have to wait and see. We've still got to get this recommendation past the rest of the commission (which seems likely, despite there almost certainly being dissent among some members). The vote is February 26th. There are also some very smart net neutrality folks looking into the future at the possibility that Congressional Republicans will try to revoke the Telecommunications Act, thereby skirting the FCC as an entity with the power to create and enforce these kinds of regulations.

For the moment, however, the future looks bright.