In my previous post, I glossed over one of the most important and useful parts of Immutable Infrastructure: Testability. There are many kinds of tests we can write for our infrastructure, but they should all be focused on the machine/service and maybe it’s nearest dependencies, not the entire system.
While this post focuses on testing a full machine (both locally in a VM, and remotely as an Amazon EC2 instance), it is also possible to do most of the same kind of tests against a Docker container. In fact, one of the tools used in this post supports building Docker containers as an output in parallel to the AMIs, so this can also assist in providing a migration path to/from Docker.
As an example, I will show how I built and tested a LogStash machine, including how to verify that the script to create the production machine is valid, that the machine itself has been provisioned correctly, and that the services inside work as expected.
I have published all the source code to GitHub. The examples in this post are all taken from the repository but might have a few bits removed just for readability. Check the full source out if you are interested!
Repository Structure and Tools
When it comes to building anything that you will have lots of, consistency is key to making it manageable. To that end, I have a small selection of tools that I use, and a repository structure I try and stick to. They are the following:
Vagrant - This is a tool for building and managing virtual machines. It can be backed by many different providers such as Docker, HyperV and VirtualBox. We’ll use this to build a local Linux machine to develop and test LogStash in. I use the HyperV provisioner, as that is what Docker For Windows also uses, and HyperV disables other virtualisation tools.
Packer - This tool provides a way to build machine images. Where Vagrant builds running machines, Packer builds the base images for you to boot, and can build multiple different ones (in parallel) from one configuration. We’ll use this to create our AMIs (Amazon Machine Images.)
Jest - This is a testing framework written in (and for) NodeJS applications. Whatever testing tool works best for your environment is what you should be using, but I use Jest as it introduces minimal dependencies, is cross-platform, and has some useful libraries for doing things like diffing json.
The repository structure is pretty simple:
- scripts/
- src/
- test/
- build.sh
- logstash.json
- package.json
- vagrantfile
The src
directory is where our application code will live. If the application is compiled, the output goes to the build
directory (which is not tracked in source-control.) The test
directory will contain all of our tests, and the scripts
directory will contain everything needed for provisioning our machines.
We’ll describe what the use of each of these files is as we go through the next section.
Local Development
To create our virtual machine locally, we will use Vagrant. To tell Vagrant how to build our machine, we need to create a vagrantfile
in our repository, which will contain the machine details and provisioning steps.
The machine itself has a name, CPU count, and memory specified. There is also a setting for Hyper-V which allows us to use a differencing disk, which reduces the startup time for the VM, and how much disk space it uses on the host machine.
For provisioning, we specify to run the relevant two files from the scripts
directory.
Vagrant.configure("2") do |config|
config.vm.box = "bento/ubuntu-16.04"
config.vm.provider "hyperv" do |hv|
hv.vmname = "LogStash"
hv.cpus = 1
hv.memory = 2048
hv.linked_clone = true
end
config.vm.provision "shell", path: "./scripts/provision.sh"
config.vm.provision "shell", path: "./scripts/vagrant.sh"
end
To keep things as similar as possible between our development machine and our output AMI, I keep as much of the setup script in one file: scripts/provision.sh
. In the case of our LogStash setup, this means installing Java, LogStash, some LogStash plugins, and enabling the service on reboots:
#! /bin/bash
# add elastic's package repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
sudo apt-get update
# install openjdk and set environment variable
sudo apt-get install openjdk-8-jre -y
JAVA=$(readlink -f $(which java) | sed "s:bin/java::")
echo "JAVA_HOME=$JAVA" | sudo tee --append /etc/environment
#install logstash and plugins
sudo apt-get install logstash -y
/usr/share/logstash/bin/logstash-plugin install logstash-filter-uuid
/usr/share/logstash/bin/logstash-plugin install logstash-filter-prune
sudo systemctl enable logstash.service
Vagrant will automatically mount it’s working directory into the VM under the path /vagrant
. This means we can add a second provisioning script (scripts/vagrant.sh
) to link the /vagrant/src
directory to the LogStash configuration directory (/etc/logstash/conf.d
), meaning we can edit the files on the host machine, and then restart LogStash to pick up the changes.
#! /bin/bash
sudo rm -rf /etc/logstash/conf.d
sudo ln -s /vagrant/src /etc/logstash/conf.d
sudo systemctl start logstash.service
Now that we have a vagrantfile
, we can start the virtual machine with a single command. Note, Hyper-V requires administrator privileges, so you need to run this command in an admin terminal:
vagrant up
After a while, your new LogStash machine will be up and running. If you want to log into the machine and check files an processes etc., you can run the following command:
vagrant ssh
An argument can also be provided to the ssh
command to be executed inside the VM, which is how I usually trigger LogStash restarts (as it doesn’t seem to detect when I save the config files in the src
directory):
vagrant ssh -c 'sudo systemctl restart logstash'
Deployment
To create the deployable machine image, I use Packer. The process is very similar to how Vagrant is used: select a base AMI, create a new EC2 machine, provision it, and save the result as a new AMI.
Packer is configured with a single json file, in this case, named logstash.json
. The file is split into four parts: variables
, builders
, provisioners
, and outputs
. I won’t include the outputs
section as it’s not needed when building AMIs.
Variables
The variables
property is for all configuration that you can pass to Packer. Their values can come from Environment Variables, CLI parameters, Consul, Vault, and others. In the LogStash example, there are three variables:
{
"variables": {
"aws_access_key": "",
"aws_secret_key": "",
"ami_users": "{{env `AMI_ACCOUNTS`}}"
}
}
The aws_access_key
and aws_secret_key
are known names - unless we specify some value, they will automatically be read from your AWS config (in ~/.aws/
), or if running on EC2, from the EC2 machine profile.
The ami_users
is a custom variable which will read the AMI_ACCOUNTS
environment variable by default. This particular one is used so that I can grant access to the resulting AMI to multiple AWS accounts, which is useful if you’re running in an Organisation with multiple Accounts. For example, if the AMI is built in a common
account, and will be deployed into dev
, qa
and prod
accounts, then you would populate the AMI_ACCOUNTS
as a CSV of account IDs.
Builders
Packer can build many different kinds of machine image, but for this, we only need one: amazon-ebs
.
{
"builders": [
{
"type": "amazon-ebs",
"access_key": "{{user `aws_access_key`}}",
"secret_key": "{{user `aws_secret_key`}}",
"region": "eu-west-1",
"instance_type": "t2.micro",
"source_ami_filter": {
"filters": {
"virtualization-type": "hvm",
"name": "ubuntu/images/*ubuntu-xenial-16.04-amd64-server-*",
"root-device-type": "ebs"
},
"owners": ["099720109477"],
"most_recent": true
},
"ssh_username": "ubuntu",
"ami_name": "logstash {{timestamp}}",
"ami_users": "{{user `ami_users`}}"
},
]
}
The two most interesting properties of this are source_ami_filter
and ami_users
. The source_ami_filter
works in a very similar manner to the AWS CLI’s describe-images
--filters
parameter, albeit in a more readable format. In this case, I am specifying that I want an ubuntu-xenial
base, and I want it to be an official Canonical image, so specify their Account ID as the owner
. I also specify the most_recent
property, as this filter will return all versions of this AMI which Canonical publish.
The ami_users
is what lets me grant access to the AMI from other accounts (rather than just making it public). The property’s value should be an array, but Packer is smart enough to expand the CSV in the user variable into an array for us.
Provisioners
The provisioners
array items are executed in the order they are specified. To set up the machine, I use the shell
provisioner to create a temporary directory, then the file
provisioner to upload the files in the src
directory to that temporary directory. Finally a second shell
provisioner uploads and runs the scripts/provision.sh
and scripts/aws.sh
files.
{
"provisioners": [
{
"type": "shell",
"inline": "mkdir -p /tmp/src"
},
{
"type": "file",
"source": "./src/",
"destination": "/tmp/src"
},
{
"type": "shell",
"scripts": ["./scripts/provision.sh", "./scripts/aws.sh"]
}
]
}
The aws.sh
file is very small and does roughly the same thing as the vagrant.sh
script, but rather than symlinking the /vagrant
directory, it moves the uploaded src
directory into the right location for LogStash:
#! /bin/sh
sudo rm /etc/logstash/conf.d/*
sudo cp -r /tmp/src/* /etc/logstash/conf.d
Note that this doesn’t start the LogStash service - this gets done by the UserData when we launch a new instance, as often we need to pass in additional configuration parameters, and don’t want the service running until that has been done.
Running
To create the AMI, we need to invoke packer. If I am running packer on a remote machine via SSH, I run it inside tmux
, so that disconnects don’t fail the process:
packer build -var "ami_users=111,222,333" logstash.json
After a while, Packer will finish, leaving you with an output which will include the new AMI ID:
==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:
eu-west-1: ami-123123123
We’ll get back to this output later when we create a build script that will also run our tests. Before we get to that, however, let’s look at how we can write tests which target both the local Vagrant machine and the AMI too.
Testing
To test the machines, I am using Jest. There isn’t anything particularly interesting going on in the package.json
, other than a few babel packages being installed so that I can use ES6 syntax:
{
"scripts": {
"watch": "jest --watch",
"test": "jest "
},
"devDependencies": {
"babel-core": "^6.26.3",
"babel-jest": "^23.6.0",
"babel-preset-env": "^1.7.0",
"jest": "^23.6.0",
"regenerator-runtime": "^0.13.1"
}
}
Packer Configuration Testing
There are a number of tests we can do to make sure our Packer configuration is valid before running it. This includes things like checking the base AMI is from a whitelisted source (such as our accounts, Amazon and Canonical). The test has to handle the possibility of multiple builders, and that some builders might not have a source_ami_filter
. It also handles if no owner has been specified at all, which we also consider a “bad thing”:
const ourAccounts = [ "111111", "222222", "333333", "444444" ];
const otherOwners = [ "amazon", "099720109477" /*canonical*/ ];
describe("ami builder", () => {
it("should be based on a whitelisted owner", () => {
const allOwners = ourAccounts.concat(otherOwners);
const invalidOwners = owners => owners.filter(owner => !allOwners.includes(owner));
const amisWithInvalidOwners = packer.builders
.filter(builder => builder.source_ami_filter)
.map(builder => ({
name: builderName(builder),
invalidOwners: invalidOwners(builder.source_ami_filter.owners || [ "NO OWNER SPECIFIED" ])
}))
.filter(builders => builders.invalidOwners.length > 0);
expect(amisWithInvalidOwners).toEqual([]);
});
});
I also test that certain variables (ami_users
) have been defined, and have been used in the right place:
describe("variables", () => {
it("should have a variable for who can use the ami", () => {
expect(packer.variables).toHaveProperty("ami_users");
});
it("should read ami_users from AMI_ACCOUNTS", () => {
expect(packer.variables.ami_users).toMatch(
/{{\s*env\s*`AMI_ACCOUNTS`\s*}}/
);
});
});
describe("ami builder", () => {
it("should set the ami_user", () => {
const invalidUsers = packer.builders
.map(builder => ({
name: builderName(builder),
users: builder.ami_users || "NO USERS SPECIFIED"
}))
.filter(ami => !ami.users.match(/{{\s*user\s*`ami_users`\s*}}/));
expect(invalidUsers).toEqual([]);
});
})
Other tests you might want to add are that the base AMI is under a certain age, or that your AMI has certain tags included, or that it is named in a specific manner.
Machine Testing
Machine testing is for checking that our provisioning worked successfully. This is very useful, as subtle bugs can creep in when you don’t verify what happens.
For example, a machine I built copied configuration directory to a target location but was missing the -r
flag, so when I later added a subdirectory, the machine failed as the referenced files didn’t exist.
So that the tests work with both the Vagrant and Packer built versions, we take in their address and key paths from the environment:
import { spawnSync } from "child_process";
import { createConnection } from "net";
// figure out where to look these up
const host = process.env.LOGSTASH_ADDRESS; // e.g. "172.27.48.28";
const keyPath = process.env.LOGSTASH_KEYPATH; // ".vagrant/machines/default/hyperv/private_key";
We also define two helper methods: one to check if a TCP port is open, and one which uses SSH to execute a command and read the response in the machine:
const execute = command => {
const args = [`vagrant@${host}`, `-i`, keyPath, command];
const ssh = spawnSync("ssh", args, { encoding: "utf8" });
const lines = ssh.stdout.split("\n");
if (lines[lines.length - 1] === "") {
return lines.slice(0, lines.length - 1);
}
return lines;
};
const testPort = port => new Promise((resolve, reject) => {
const client = createConnection({ host: host, port: port });
client.on("error", err => reject(err));
client.on("connect", () => {
client.end();
resolve();
});
});
We can then add some tests which check the files were written to the right place, that port 5044
is open, and port 9600
is closed:
describe("the machine", () => {
it("should have the correct configuration", () => {
const files = execute("find /etc/logstash/conf.d/* -type f");
expect(files).toEqual([
"/etc/logstash/conf.d/beats.conf",
"/etc/logstash/conf.d/patterns/custom.txt"
]);
});
it("should be listening on 5044 for beats", () => testPort(5044));
it("should not be listening on 9600", () => expect(testPort(9600)).rejects.toThrow("ECONNREFUSED"));
});
Of course, as we can execute any command inside the machine, we can check pretty much anything:
tail
the LogStash log and see if it’s got the right contents- check if the service is started
- check the service is enabled on boot
- check the environment variables been written to the right files
Application Testing
There are two styles of Application Testing: white-box and black-box. White-box will be tests run on the application inside the machine, using minimal external dependencies (preferably none at all), and Black-box will be run on the application from outside the machine, either using direct dependencies, or fakes.
It’s worth noting that both white-box and black-box tests are slow, mostly down to how slow LogStash is at starting up, although only giving it 1 CPU and 2Gb of RAM probably doesn’t help.
Whitebox Testing LogStash
To white-box test LogStash, I use a technique partially based on the Agolo LogStash Test Runner. The process for the tests is to run LogStash interactively (rather than as a service), send it a single event, record the output events, and compare them to an expected output.
The test cases are kept in separate folders, with two files. First is the input file, imaginatively called input.log
, which will contain one json encoded event per line. The format needs to match what the result of FileBeat sending an event to LogStash would be. In this case, it means a few extra fields, and a message
property containing a string of json. Formatted for readability, the object looks like this:
{
"@timestamp": "2018-12-27T14:08:24.753Z",
"beat": { "hostname": "Spectre", "name": "Spectre", "version": "5.3.0" },
"fields": { "environment": "local", "log_type": "application" },
"input_type": "log",
"message": "{\"Timestamp\": \"2018-12-18T17:06:27.7112297+02:00\",\"Level\": \"Information\",\"MessageTemplate\": \"This is the {count} message\",\"Properties\": {\"count\": 4,\"SourceContext\": \"LogLines.GetOpenPurchasesHandler\",\"ApplicationName\": \"FileBeatTest\",\"CorrelationId\": \"8f341e8e-6b9c-4ebf-816d-d89c014bad90\",\"TimedOperationElapsedInMs\": 1000}}",
"offset": 318,
"source": "D:\\tmp\\logs\\single.log",
"type": "applicationlog"
}
I also define an output.log
, which contains the expected result(s), again one json encoded event per line. The example pipeline in the repository will emit two events for a given input, so this file contains two lines of json (again, newlines added for readability here):
{
"source": "D:\\tmp\\logs\\single.log",
"@version": "1",
"fields": { "log_type": "application", "environment": "local" },
"@timestamp": "2018-12-18T15:06:27.711Z",
"offset": 318,
"ApplicationName": "FileBeatTest",
"host": "ubuntu-16",
"type": "applicationlog",
"CorrelationId": "8f341e8e-6b9c-4ebf-816d-d89c014bad90",
"MessageTemplate": "This is the {count} message",
"Level": "Information",
"Context": "LogLines.GetOpenPurchasesHandler",
"TimeElapsed": 1000,
"Properties": { "count": 4 }
}
{
"duration": 1000000,
"timestamp": 1545145586711000,
"id": "<generated>",
"traceid": "8f341e8e6b9c4ebf816dd89c014bad90",
"name": "LogLines.GetOpenPurchasesHandler",
"localEndpoint": { "serviceName": "FileBeatTest" }
}
To enable sending the lines directly to LogStash (rather than needing to use FileBeat), we define an input.conf
file, which configures LogStash to read json from stdin:
input {
stdin { codec => "json_lines" }
}
And an ouput.conf
file which configures LogStash to write the output as json lines a known file path:
output {
file {
path => "/tmp/test/output.log"
codec => "json_lines"
}
}
The tests need to be run inside the machine itself, so I created a script in the ./scripts
directory which will do all the work, and can be run by the execute
method in a Jest test. The script stops the LogStash service, copies the current configuration from the ./src
directory and the replacement input.conf
and output.conf
files to a temporary location, and then runs LogStash once per test case, copying the result file to the test case’s directory.
#! /bin/bash
sudo systemctl stop logstash
temp_path="/tmp/test"
test_source="/vagrant/test/acceptance"
sudo rm -rf "$temp_path/*"
sudo mkdir -p $temp_path
sudo cp -r /vagrant/src/* $temp_path
sudo cp $test_source/*.conf $temp_path
find $test_source/* -type d | while read test_path; do
echo "Running $(basename $test_path) tests..."
sudo /usr/share/logstash/bin/logstash \
"--path.settings" "/etc/logstash" \
"--path.config" "$temp_path" \
< "$test_path/input.log"
sudo touch "$temp_path/output.log" # create it if it doesn't exist (dropped logs etc.)
sudo rm -f "$test_path/result.log"
sudo mv "$temp_path/output.log" "$test_path/result.log"
echo "$(basename $test_path) tests done"
done
sudo systemctl start logstash
To execute this, we use the beforeAll
function to run it once - we also pass in Number.MAX_SAFE_INTEGER
as by default beforeAll
will time out after 5 seconds, and the test.sh
is slow as hell (as LogStash takes ages to start up).
Once the test.sh
script has finished running, we load each test’s output.log
and result.log
files, parse each line as json, compare the objects, and print out the delta if the objects are not considered equal:
const source = "./test/acceptance";
const isDirectory = p => fs.lstatSync(p).isDirectory();
const cases = fs
.readdirSync(source)
.map(name => path.join(source, name))
.filter(isDirectory);
describe("logstash", () => {
beforeAll(
() => execute("/vagrant/scripts/test.sh"),
Number.MAX_SAFE_INTEGER);
test.each(cases)("%s", directoryPath => {
const expected = readFile(path.join(directoryPath, "output.log"));
const actual = readFile(path.join(directoryPath, "result.log"));
const diffpatch = new DiffPatcher({
propertyFilter: (name, context) => {
if (name !== "id") {
return true;
}
return context.left.id !== "<generated>";
}
});
const delta = diffpatch.diff(expected, actual);
const output = formatters.console.format(delta);
if (output.length) {
console.log(output);
}
expect(output.length).toBe(0);
});
});
Blackbox Testing LogStash
As the machine has ports open for FileBeat and will send it’s output to ElasticSearch, we can set up a fake HTTP server, send some log events via FileBeat to the VM and check we receive the right HTTP calls to our fake server.
While looking on how to do this, I came across the lumberjack-protocol package on NPM, but unfortunately, it only supports lumberjack v1, and FileBeat and LogStash are now using v2, so you would have to use a local copy of filebeat to do the sending.
Due to the complexity of implementing this, and the diminished return on investment (the other tests should be sufficient), I have skipped creating the Blackbox tests for the time being.
AMI Testing
The final phase! Now that we are reasonably sure everything works locally, we need to build our AMI and test that everything works there too, as it would be a shame to update an Auto Scale Group with the new image which doesn’t work!
All that needs to happen to run the tests against an EC2 instance is to set the three environment variables we used with Vagrant, to values for communicating with the EC2 instance. To do this, we’ll need the EC2 IP Address, the username for SSH, and the private key for SSH authentication.
The first thing our build script needs to do is create the AMI. This is done in the same way as mentioned earlier, but with the slight difference of also piping the output to tee
:
packer_log=$(packer build logstash.json | tee /dev/tty)
ami_id=$(echo "$packer_log" | tail -n 1 | sed 's/.*\(ami.*\)/\1/')
By using tee
, we can pipe the build log from Packer to both the real terminal (/dev/tty
), and to a variable called packer_log
. The script then takes the last line and uses some regex to grab the AMI ID.
Next up, the script uses the AWS CLI to launch an EC2 instance based on the AMI, and store it’s IP Address and Instance ID:
json=$(aws ec2 run-instances \
--image-id "$ami_id" \
--instance-type t2.small \
--key-name "$keypair_name" \
--region eu-west-1 \
--subnet-id "$subnet_id" \
--security-group-ids "$security_group_id" \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=logstash-verification}]' \
--user-data "$userdata")
instance_id=$(echo "$json" | jq -r .Instances[0].InstanceId)
private_ip=$(echo "$json" | jq -r .Instances[0].PrivateIpAddress)
The IP Address is then used to set up the environment variables which the node test scripts use to locate the machine:
LOGSTASH_ADDRESS="$private_ip"
LOGSTASH_SSH="ubuntu"
LOGSTASH_KEYPATH="~/.ssh/id_rsa" build ou
npm run test
Finally, the script uses the Instance ID to terminate the instance:
aws ec2 terminate-instances \
--instance-ids "$instance_id"
Wrapping Up
Hopefully, this (rather long) post is a useful introduction (!) to how I tackle testing Immutable Infrastructure. All of these techniques for testing the machine and application can be used for testing things like Docker containers too (and handily, Packer can be used to create Docker containers also).
As mentioned earlier The Repository is available here.