S3 Multi-File upload with Terraform

23 Apr 2017

Hosting a static website with S3 is really easy, especially from Terraform.

First off, we want a publicly readable S3 bucket policy, but we want to apply it to only one specific bucket. To achieve that, we can use Terraform’s template_file data source to merge in a value:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::${bucket_name}/*"
      ]
    }
  ]
}

As you can see, the interpolation syntax is pretty much the same as how you use variables in Terraform itself. Next we define a template_file to do the transformation. As the bucket name is going to be used many times, we also extract it into a variable block:

variable "bucket" {
  default = "examplebucket"
}

data "template_file" "s3_public_policy" {
  template = "${file("policies/s3-public.json")}"
  vars {
    bucket_name = "${var.bucket}"
  }
}

Next we want to create the S3 bucket and set it to be a static website, which we can do using the website sub-block. For added usefulness, we will also define an output to show the website URL on the command line:

resource "aws_s3_bucket" "static_site" {
  bucket = "${var.bucket}"
  acl = "public-read"
  policy = "${data.template_file.s3_public_policy.rendered}"

  website {
    index_document = "index.html"
  }
}

output "url" {
  value = "${aws_s3_bucket.static_site.bucket}.s3-website-${var.region}.amazonaws.com"
}
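
The output references var.region, which isn’t declared above; I’m assuming a region variable exists alongside the bucket one, something like this:

variable "region" {
  # hypothetical default - use whichever region you deploy to
  default = "eu-west-1"
}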

Single File Upload

If you just want one file in the website (say the index.html file), then you can add the following block. Just make sure the key property matches the index_document name in the aws_s3_bucket block.

resource "aws_s3_bucket_object" "index" {
  bucket = "${aws_s3_bucket.static_site.bucket}"
  key = "index.html"
  source = "src/index.html"
  content_type = "text/html"
  etag = "${md5(file("src/index.html"))}"
}

Multi File Upload

Most websites need more than one file to be useful, and while we could write out an aws_s3_bucket_object block for every file, that seems like a lot of effort. Other options include manually uploading the files to S3, or using the AWS CLI to do it. While both methods work, they’re error-prone - you need to specify the content_type for each file for them to load properly, and you can’t change this property once a file is uploaded.
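
For illustration, the CLI route might look something like this, assuming a local src/ directory and the examplebucket name used in this post (aws s3 sync guesses most content types, while aws s3 cp lets you set one explicitly):

# upload everything, letting the CLI guess each file's content type
aws s3 sync src/ s3://examplebucket/

# or upload a single file with an explicit content type
aws s3 cp src/index.html s3://examplebucket/index.html --content-type text/html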

To get around this, I add one more variable to my main terraform file, and generate a second file containing all the aws_s3_bucket_object blocks I need.

The added variable is a lookup for mime types:

variable "mime_types" {
  default = {
    htm = "text/html"
    html = "text/html"
    css = "text/css"
    js = "application/javascript"
    map = "application/javascript"
    json = "application/json"
  }
}

I then create a shell script which will write a new file containing an aws_s3_bucket_object block for each file in the src directory:

#! /bin/sh

# where the website files live, and the terraform file to generate
SRC="src/"
TF_FILE="files.tf"
COUNT=0

# start with an empty output file
: > "$TF_FILE"

# append an aws_s3_bucket_object block for every file under $SRC
find "$SRC" -iname '*.*' | while read -r path; do

    cat >> "$TF_FILE" << EOM

resource "aws_s3_bucket_object" "file_$COUNT" {
  bucket = "\${aws_s3_bucket.static_site.bucket}"
  key = "${path#$SRC}"
  source = "$path"
  content_type = "\${lookup(var.mime_types, "${path##*.}")}"
  etag = "\${md5(file("$path"))}"
}
EOM

    COUNT=$((COUNT + 1))

done

Now when I want to publish a static site, I just have to make sure I run ./files.sh once before my terraform plan and terraform apply calls.
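
In other words, the publish sequence is just:

./files.sh
terraform plan
terraform apply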

Caveats

This technique has one major drawback: it doesn’t work well with updating an existing S3 bucket. It won’t remove files which are no longer in the terraform files, and can’t detect file moves.

However, if you’re happy with a call to terraform destroy before applying, this will work fine. I use it for a number of test sites which I don’t tend to leave online very long, and for scripted aws infrastructure that I give out to other people so they can run their own copy.

code, aws, terraform, s3

---

Don't write Frameworks, write Libraries

16 Apr 2017

Programmers have a fascination with writing frameworks for some reason. There are many problems with writing frameworks:

Opinions

Frameworks are opinionated, and follow their author’s opinions on how things should be done, such as application structure, configuration, and methodology. The problem is that not everyone will agree with the author or their framework’s opinions. Even if people really like part of how the framework works, they might not like another part, or might not be able to rewrite their application to take advantage of the framework.

Configurability

The level of configuration available in a framework is almost never right. Not only are there either too few or too many configuration options, but how the configuration is done can cause issues too. Some developers love conventions, others prefer explicit configuration.

Development

Frameworks run the risk of not solving the right problem, or of missing the problem entirely because of how long the framework took to implement. This is compounded by when the decision to build a framework is made, which is often way before the general case is even recognised. Writing a framework before writing your project is almost certain to end up with a framework which either isn’t suitable for the project, or isn’t suitable for any other project.

What about a library or two?

If you want a higher chance at success, reduce your scope and write a library.

A library is usually a small unit of functionality which does one thing and does it well (sound like microservices or Bounded Contexts much?). This gives it a higher chance of success, as the library’s opinions affect smaller portions of people’s applications. It won’t dictate their entire app structure, and they can opt in to the libraries they like, rather than taking on all the baggage which comes with a framework.

But I really want to write a framework

Resist, if you can! Perhaps a framework will evolve from your software, perhaps not. What I have found to be a better path is to create libraries which work on their own, but also work well with each other. This can make things more difficult, but it also gives you the ability to release libraries as they are completed, rather than waiting for an entire framework to be “done”.

Some examples

These are some libraries I have written which solve small problems in an isolated manner:

  • Stronk - A library to populate strongly typed configuration objects.
  • FileSystem - Provides a FileSystem abstraction, with decorators and an InMemory FileSystem implementation.
  • Finite - a Finite State Machine library.
  • Conifer - Strongly typed, convention-based routing for WebAPI, also with route lookup abilities.

So why not write some libraries?

architecture, code

---

Using Terraform to set up AWS API Gateway and Lambda

17 Mar 2017

I have been writing simple webhook-type applications using Claudiajs, which behind the scenes uses AWS’s Lambda and API Gateway to make things happen, but I really wanted to understand exactly what it was doing for me, and how I could achieve the same results using Terraform.

The Lambda Function

I started off with a simple NodeJS function, in a file called index.js

exports.handler = function(event, context, callback) {
  callback(null, {
    statusCode: '200',
    body: JSON.stringify({ 'message': 'hello world' }),
    headers: {
      'Content-Type': 'application/json',
    },
  });
};

The first thing to note about this function is the 2nd argument passed to callback: it maps to the whole response object, not just the body. If you just run callback(null, { message: 'hello world' }) and call it from API Gateway, you will get the following error in your CloudWatch logs, and not a lot of help on Google:

Execution failed due to configuration error: “Malformed Lambda proxy response”

Terraform

We want to upload a zip file containing all our lambda’s code, which in this case is just the index.js file. While this could be done by generating the zip file with a gulp script or manually, we can just get terraform to do this for us, by using the archive_file data source:

data "archive_file" "lambda" {
  type = "zip"
  source_file = "index.js"
  output_path = "lambda.zip"
}

resource "aws_lambda_function" "example_test_function" {
  filename = "${data.archive_file.lambda.output_path}"
  function_name = "example_test_function"
  role = "${aws_iam_role.example_api_role.arn}"
  handler = "index.handler"
  runtime = "nodejs4.3"
  source_code_hash = "${base64sha256(file("${data.archive_file.lambda.output_path}"))}"
  publish = true
}

By using the source_code_hash property, Terraform can detect when the zip file has changed, and thus know whether to re-upload the function when you call terraform apply.

We also need an IAM role for the function to run under. While the policy could be written inline, I have found it more expressive to keep the role policy in a separate file:

resource "aws_iam_role" "example_api_role" {
  name = "example_api_role"
  assume_role_policy = "${file("policies/lambda-role.json")}"
}

And policies/lambda-role.json contains the assume role policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com",
          "apigateway.amazonaws.com"
        ]
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}

That’s the lambda done - you can log in to the AWS Console, set up a test event and execute it if you want :)

Creating the Api Gateway

We are going to create a simple API, with one endpoint (or resource, in AWS terminology).

First we need to define an api root:

resource "aws_api_gateway_rest_api" "example_api" {
  name = "ExampleAPI"
  description = "Example Rest Api"
}

And then a resource to represent the /messages endpoint, and a method to handle POST:

resource "aws_api_gateway_resource" "example_api_resource" {
  rest_api_id = "${aws_api_gateway_rest_api.example_api.id}"
  parent_id = "${aws_api_gateway_rest_api.example_api.root_resource_id}"
  path_part = "messages"
}

resource "aws_api_gateway_method" "example_api_method" {
  rest_api_id = "${aws_api_gateway_rest_api.example_api.id}"
  resource_id = "${aws_api_gateway_resource.example_api_resource.id}"
  http_method = "POST"
  authorization = "NONE"
}

An aws_api_gateway_resource can also be attached to another aws_api_gateway_resource rather than to the API root, allowing for multi-level routes. You do this by pointing the parent_id property at another aws_api_gateway_resource’s id.
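
As a sketch, a hypothetical child resource nested under /messages just points its parent_id at the messages resource instead of the API root:

resource "aws_api_gateway_resource" "example_child_resource" {
  rest_api_id = "${aws_api_gateway_rest_api.example_api.id}"
  parent_id = "${aws_api_gateway_resource.example_api_resource.id}"
  path_part = "archive" # would be served at /messages/archive
}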

Now we need to add an integration between the API and the lambda:

resource "aws_api_gateway_integration" "example_api_method-integration" {
  rest_api_id = "${aws_api_gateway_rest_api.example_api.id}"
  resource_id = "${aws_api_gateway_resource.example_api_resource.id}"
  http_method = "${aws_api_gateway_method.example_api_method.http_method}"
  type = "AWS_PROXY"
  uri = "arn:aws:apigateway:${var.region}:lambda:path/2015-03-31/functions/arn:aws:lambda:${var.region}:${var.account_id}:function:${aws_lambda_function.example_test_function.function_name}/invocations"
  integration_http_method = "POST"
}
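
The uri above uses var.region and var.account_id; I’m assuming both are declared elsewhere in the configuration, for example:

variable "region" {
  # hypothetical default - match the region you deploy to
  default = "eu-west-1"
}

variable "account_id" {} # supplied via terraform.tfvars or -var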

Finally, a couple of deployment stages, and an output variable for each to let you know the API’s URLs:

resource "aws_api_gateway_deployment" "example_deployment_dev" {
  depends_on = [
    "aws_api_gateway_method.example_api_method",
    "aws_api_gateway_integration.example_api_method-integration"
  ]
  rest_api_id = "${aws_api_gateway_rest_api.example_api.id}"
  stage_name = "dev"
}

resource "aws_api_gateway_deployment" "example_deployment_prod" {
  depends_on = [
    "aws_api_gateway_method.example_api_method",
    "aws_api_gateway_integration.example_api_method-integration"
  ]
  rest_api_id = "${aws_api_gateway_rest_api.example_api.id}"
  stage_name = "api"
}

output "dev_url" {
  value = "https://${aws_api_gateway_deployment.example_deployment_dev.rest_api_id}.execute-api.${var.region}.amazonaws.com/${aws_api_gateway_deployment.example_deployment_dev.stage_name}"
}

output "prod_url" {
  value = "https://${aws_api_gateway_deployment.example_deployment_prod.rest_api_id}.execute-api.${var.region}.amazonaws.com/${aws_api_gateway_deployment.example_deployment_prod.stage_name}"
}

The two output variables will cause Terraform to output the paths when you call terraform apply, or afterwards when you call terraform output dev_url. Great for scripts which need to know the URLs!
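
For example, a deployment script could capture one of the values like this:

DEV_URL=$(terraform output dev_url)
echo "The dev stage is at $DEV_URL"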

Run it!

You can now call your URL and see a friendly hello world message:

curl -X POST -H "Content-Type: application/json" "YOUR_DEV_OR_PROD_URL"

Switching to C#

Switching to a C#/dotnetcore lambda is very straightforward from here. We just need to change the aws_lambda_function’s runtime and handler properties, and change the archive_file to use source_dir rather than source_file:

data "archive_file" "lambda" {
  type = "zip"
  source_dir = "./src/published"
  output_path = "lambda.zip"
}

resource "aws_lambda_function" "example_test_function" {
  filename = "${data.archive_file.lambda.output_path}"
  function_name = "example_test_function"
  role = "${aws_iam_role.example_api_role.arn}"
  handler = "ExampleLambdaApi::ExampleLambdaApi.Handler::Handle"
  runtime = "dotnetcore1.0"
  source_code_hash = "${base64sha256(file("${data.archive_file.lambda.output_path}"))}"
  publish = true
}

Note the handler property is in the form AssemblyName::FullyQualifiedTypeName::MethodName.

For our C# project, we need the following two NuGet packages:

  • Amazon.Lambda.APIGatewayEvents
  • Amazon.Lambda.Serialization.Json

And the only file in our project looks like so:

using Amazon.Lambda.APIGatewayEvents;
using Amazon.Lambda.Core;
using Amazon.Lambda.Serialization.Json;

namespace ExampleLambdaApi
{
  public class Handler
  {
    [LambdaSerializer(typeof(JsonSerializer))]
    public APIGatewayProxyResponse Handle(APIGatewayProxyRequest apigProxyEvent)
    {
      return new APIGatewayProxyResponse
      {
        Body = apigProxyEvent.Body,
        StatusCode = 200,
      };
    }
  }
}

One thing worth noting is that the first time a C# function is called it takes a long time - in the region of 5-6 seconds. Subsequent invocations are in the 200ms region.

All the code for this demo can be found on my GitHub, in the terraform-demos repository.

code, net, nodejs, aws, terraform, lambda, apigateway, rest

---

Unit Tests & Scratchpads

21 Jan 2017

Often when developing something, I need to check how a function or library works. For example, I always have to look up the answer to this question:

Does Directory.GetFiles(".\\temp\\") return a list of filenames, a list of relative filepaths, or a list of rooted filepaths?

It returns relative filepaths by the way:

Directory.GetFiles(".\\temp\\");
[ ".\temp\NuCrunch.Tests.csproj", ".\temp\packages.config", ".\temp\Scratchpad.cs" ]

Now that there is a C# Interactive window in Visual Studio, you can use that to test the output. Sometimes, however, the C# Interactive window is not suitable:

  • What you want to test needs a little more setup than a couple of lines
  • You wish to use the debugger to check on intermediate state
  • You are not in Visual Studio (I am in Rider 99% of the time)

When this happens, I turn to the unit test file which I add to all unit test projects: the Scratchpad.cs.

The complete listing of the file is this:

using Xunit;
using Xunit.Abstractions;

namespace NuCrunch.Tests
{
	public class Scratchpad
	{
		private readonly ITestOutputHelper _output;

		public Scratchpad(ITestOutputHelper output)
		{
			_output = output;
		}

		[Fact]
		public void When_testing_something()
		{

		}
	}
}

It gets committed to the git repository with no content in the When_testing_something method, and is never committed again afterwards. The _output field is added to allow writing to the console/test window easily too.

Now whenever I wish to experiment with something, I can pop open the Scratchpad, write some test code, then execute and debug it to my heart’s content.
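
For example, the Directory.GetFiles question from earlier can be answered with a quick throwaway test (assuming a using System.IO; at the top of the file):

		[Fact]
		public void When_testing_something()
		{
			// throwaway check - deleted or promoted to a real test afterwards
			var files = Directory.GetFiles(".\\temp\\");
			_output.WriteLine(string.Join("\n", files));
		}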

After I am done with the test code, one of two things happens: it gets deleted, or it gets moved into a proper unit test.

code, net, testing, xunit

---

Update all Docker images

16 Jan 2017

My work’s wifi is much faster than my 4G connection, so periodically I want to update all my docker images on my personal laptop while at work.

As I want to just set it going and then forget about it, I use the following one-liner to do a docker pull against each image on my local machine:

docker images | grep -v REPOSITORY | awk '{print $1}' | xargs -L1 docker pull

If you only want to fetch the versions you have the tags for:

docker images | grep -v REPOSITORY | awk '{ if ($2 != "<none>") { print $1":"$2 } else { print $1 } }' | xargs -L1 docker pull

Now if only I could get git bash to do TTY properly so I get the pretty download indicators too :(

docker, bash, git

---