Microservices or Components

28 Oct 2018

One of the reasons people list for using MicroServices is that it helps enforce separation of concerns. This is usually achieved by adding a network boundary between the services. While this is useful, it’s not without costs; namely that you’ve added a set of new failure modes: the network. We can achieve the same separation of concerns within the same codebase if we put our minds to it. In fact, this is what Simon Brown calls a Modular Monolith, and DHH calls the Majestic Monolith.

We recently needed to expand an existing service to have some new functionality. The current process looks something like this, where the user has done something which will eventually return them a URL which can be clicked to get to a web page to see the results.

api call does some work, returns a result_url which points to a web interface

The new process is an additional authentication challenge which the user will need to complete before they can get to the final results page. The new process looks like this:

api call does work, makes a request to challenge API, passing the result_url as an argument.  The challenge-response returns a challenge_url, which is returned to the user instead of the return_url

Design Decisions

Currently, the challenge functionality will only be used by this one service, but there is a high probability that we will need it for other services in the future too. At this point we have a decision to make: do we keep this functionality in-process, or make a separate microservice for it?

Time To Live

The first trade-off is time: it is slightly quicker to build it in-process, but if we do want to use this from somewhere else later, we’ll need to extract it, which is more work. The key here is “if” - we don’t know for sure that other services will need this exact functionality.

If we keep the new API and UI within the existing API and UI projects, we can also reuse some code: the data store, data access tooling, permissions, and styles are already there. All of our infrastructure, such as logging and monitoring, is already in place too, which will save us some more time.

API Risk

We want to avoid deploying a service which then needs to undergo a lot of rework in the future if the second and third users of it have slightly different requirements. If we build it as a separate service now, will we be sure we are making something which is generic and reusable by other services? Typically you only get the answer to this question after the second or third usage, so it seems unlikely that we would get our API design perfect on the first attempt.

Technical Risks

If we are to go the separate service route, we are introducing new failure modes to the existing API. What if the challenge API is down? What if the request times out? Are we using HTTP or a Message Broker to communicate with it?

If we keep the service in-process to start with, we can eliminate all of these concerns. Luckily, we tend to have very thin controllers and make use of MediatR, so the actual implementation of how any remote call is made can be hidden in the message handler to a certain extent.
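
As a rough sketch (the names are illustrative rather than our actual code), the controller just dispatches a MediatR request, and only the handler knows whether the challenge is created in-process or by calling out to a separate Challenge API:

public class ResultsController : Controller
{
    private readonly IMediator _mediator;

    public ResultsController(IMediator mediator) => _mediator = mediator;

    [HttpPost]
    public async Task<IActionResult> Start(CreateChallengeRequest request)
    {
        // the controller has no idea how (or where) the challenge gets created
        var response = await _mediator.Send(request);

        return Ok(new { url = response.ChallengeUrl });
    }
}

public class CreateChallengeRequest : IRequest<CreateChallengeResponse>
{
    public string ResultsUrl { get; set; }
}

public class CreateChallengeResponse
{
    public string ChallengeUrl { get; set; }
}

public class CreateChallengeHandler : IRequestHandler<CreateChallengeRequest, CreateChallengeResponse>
{
    public Task<CreateChallengeResponse> Handle(CreateChallengeRequest request, CancellationToken cancellationToken)
    {
        // in-process today: create the challenge locally. if this ever becomes a
        // separate service, only this handler changes to make an HTTP call;
        // the controller stays exactly the same.
        var challengeId = Guid.NewGuid();

        return Task.FromResult(new CreateChallengeResponse
        {
            ChallengeUrl = $"https://example.com/challenge/{challengeId}"
        });
    }
}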

Technical Decisions

As alluded to in the Time To Live point, we can reuse the existing data store and data access code, but this is a tradeoff in itself: what if the current storage tech is not quite ideal for the new requirements?

If the current service makes use of a complex Entity Framework model, but the new service is so simple that Dapper makes more sense, do we introduce the new dependency or not? What if we wanted to migrate away from one datastore to another (e.g. removing all MongoDB usage in favour of Postgres), but this is already using Mongo? We’d be increasing our dependency on a datastore we are explicitly trying to migrate away from.

All this assumes we want to write the service in the same programming language as the existing service! In our case we do, but it’s worth considering if you have multiple languages in use already.

Finally, on the data store front: if we decide to extract this as a separate service later, we will have to take into account data migrations, and how we can handle them with little, if any, downtime.

The Decision

After weighing up all these points (and a few others), we decided to keep the new functionality inside the existing services. The Challenge API will live in its own area in the current API, and likewise, the Challenge UI will live in its own area in the existing UI.

How do we go about keeping it all separated though?

  • Communication: we discuss all changes we want to make anyway, so these discussions are the first line of defence against the code becoming tightly coupled.
  • Pull Requests: someone will notice if you are doing something which reduces the separation, and a discussion about how to avoid it will happen.
  • Naming Conventions: the Challenge API shares no property names with the existing API. For example, the current API passes in a results_url and results_id, but the Challenge API stores and refers to these as the redirect_url and external_id (see the sketch after this list).
  • Readme: all of this will go into the repository’s readme file, along with any other notes which developers will find useful. The sequence diagrams we drew (with much more detail) will also go in here.
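
For example, the contrast between the two vocabularies looks something like this (hypothetical DTOs, sketched only to show the naming):

// the existing API's request/response vocabulary
public class ProcessResult
{
    [JsonProperty("results_url")]
    public string ResultsUrl { get; set; }

    [JsonProperty("results_id")]
    public string ResultsId { get; set; }
}

// the Challenge API deliberately uses its own terms, so nothing leaks between the two
public class Challenge
{
    [JsonProperty("redirect_url")]
    public string RedirectUrl { get; set; }

    [JsonProperty("external_id")]
    public string ExternalId { get; set; }
}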

Technical Debt?

The final question on this decision is “Isn’t this technical debt we are introducing?”. The answer, I feel, is “no”; it feels much closer to applying the YAGNI Principle (You Ain’t Gonna Need It). While there is work in the backlog which could use a Challenge API at the moment, that doesn’t necessarily mean it will still be there next week; it might be pushed further back or changed later.

In the end, the meeting where we came up with this and drew things on the whiteboard together was productive, and likely much shorter than it took me to write all this down. We were able to resist the “cool hip microservice” trend and come up with a design which is pretty contained and composable with other systems in the future.

If after all this discussion we decided to go the MicroService route, I would still be happy with the decision, as we would have all this material to look back on and justify our choice, rather than waving our hands about and shouting “but microservices” loudly.

How do you go about designing systems? Microservice all the things? Monolith all the things? Or something in between which makes the most sense for the situation at hand?

architecture, microservices, design

---

SketchNotes: Finding Your Service Boundaries

10 Sep 2018

At NDC Oslo this year, I attended Adam Ralph’s talk on Finding Your Service Boundaries. I enjoyed it a lot, and once the video came out, I rewatched it and decided to have a go at doing a “sketchnotes” version, which I shared on Twitter, and people liked it!

I’ve never done one before, but it was pretty fun. I made it in OneNote, zoomed out a lot, and took a screenshot.

Click for huuuuge!

sketchnotes - finding your service boundaries

sketchnotes, conference

---

Semantic Configuration Validation: Earlier

08 Sep 2018

After my previous post on Validating Your Configuration, one of my colleagues made an interesting point, paraphrasing:

I want to know if the configuration is valid earlier than that. At build time preferably. I don’t want my service to not start if part of it is invalid.

There are two points here, namely when to validate, and what to do with the results of validation.

Handling Validation Results

If your configuration is invalid, you’d think the service should fail to start, as it might be configured in a dangerous manner. While this makes sense for some services, others might need to work differently.

Say you have an API which supports both writing and reading of a certain type of resource. The read will return you a resource of some form, and the write side will trigger processing of a resource (and return you a 202 Accepted, obviously).

What happens if your configuration just affects the write side of the API? Should you prevent people from reading too? Probably not, but again it depends on your domain as to what makes sense.

Validating at Build Time

This is the far more interesting point (to me). How can we modify our build to validate that the environment’s configuration is valid? We have the code to do the validation: we have automated tests, and we have a configuration validator class (in this example, implemented using FluentValidation).

Depending on where your master configuration is stored, the next step can get much harder.

Local Configuration

If your configuration is in the current repository (as it should be) then it will be no problem to read.

public class ConfigurationTests
{
    public static IEnumerable<object[]> AvailableEnvironments => Enum
        .GetValues(typeof(Environments))
        .Cast<Environments>()
        .Select(e => new object[] { e });

    [Theory]
    [MemberData(nameof(AvailableEnvironments))]
    public void Environment_specific_configuration_is_valid(Environments environment)
    {
        var config = new ConfigurationBuilder()
            .AddJsonFile("config.json")
            .AddJsonFile($"config.{environment}.json", optional: true)
            .Build()
            .Get<AppConfiguration>();

        var validator = new AppConfigurationValidator();
        validator.ValidateAndThrow(config);
    }
}

Given the following two configuration files, we can make it pass and fail:

config.json:

{
  "Callback": "https://localhost",
  "Timeout": "00:00:30",
  "MaxRetries": 100
}

config.local.json:

{
  "MaxRetries": 0
}

Remote Configuration

But what if your configuration is not in the local repository, or at least, not completely there? For example, we have a lot of configuration in Octopus Deploy, and would like to validate that at build time too.

Luckily, Octopus has a REST API (and an accompanying client library) which you can use to query the values. All we need to do is replace the AddJsonFile calls with an AddInMemoryCollection() and populate a dictionary from somewhere:

[Theory]
[MemberData(nameof(AvailableEnvironments))]
public async Task Octopus_environment_configuration_is_valid(Environments environment)
{
    var variables = await FetchVariablesFromOctopus(
        "MyDeploymentProjectName",
        environment);

    var config = new ConfigurationBuilder()
        .AddInMemoryCollection(variables)
        .Build()
        .Get<AppConfiguration>();

    var validator = new AppConfigurationValidator();
    validator.ValidateAndThrow(config);
}

Reading the variables from Octopus’ API requires a bit of work as you don’t appear to be able to ask for all variables which would apply if you deployed to a specific environment, which forces you into building the logic yourself. However, if you are just using Environment scoping, it shouldn’t be too hard.
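
For what it’s worth, a very rough sketch of FetchVariablesFromOctopus might look like the following. The hostname is made up, and the endpoint paths, JSON shapes and scope matching are assumptions about the Octopus REST API (real scopes use environment IDs rather than names), so treat it as a starting point rather than working code:

private static async Task<Dictionary<string, string>> FetchVariablesFromOctopus(
    string projectName,
    Environments environment)
{
    var http = new HttpClient { BaseAddress = new Uri("https://octopus.internal.example.com/") };
    http.DefaultRequestHeaders.Add("X-Octopus-ApiKey", Environment.GetEnvironmentVariable("OCTOPUS_API_KEY"));

    // find the project, then fetch its variable set
    var project = JObject.Parse(await http.GetStringAsync($"api/projects/{projectName}"));
    var variableSet = JObject.Parse(await http.GetStringAsync($"api/variables/{project["VariableSetId"]}"));

    // keep variables which are unscoped, or scoped to the target environment
    return variableSet["Variables"]
        .Where(v => v["Scope"]?["Environment"] == null
                 || v["Scope"]["Environment"].Values<string>().Contains(environment.ToString()))
        .ToDictionary(v => (string)v["Name"], v => (string)v["Value"]);
}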

Time Delays

Verifying the configuration at build time when your state is fetched from a remote store is not going to solve all your problems, as this little diagram illustrates:

test pass, a user changes value, deployment happens, startup fails

You need to validate in both places: early on in your process, and on startup. How you handle the configuration being invalid doesn’t have to be the same in both places:

  • In the build/test phase, fail the build
  • On startup, raise an alarm, but start if reasonable

Again, how you handle the configuration errors when your application is starting is down to your domain, and what your application does.

configuration, c#, strongtyping, stronk, validation

---

Feature Toggles with Consul

06 Sep 2018

Feature Toggles are a great way of helping to deliver working software, although there are a few things which could go wrong. See my talk Feature Toggles: The Good, The Bad and The Ugly for some interesting stories and insights on it!

I was talking with a colleague the other day about how you could go about implementing Feature Toggles in a centralised manner in an existing system, preferably with as little overhead as possible. The most obvious answer is to use a SaaS solution such as LaunchDarkly, but what if you either don’t want to or can’t use a SaaS solution?

What if we already are using Consul for things such as service discovery, could we use the key-value store as a basic Feature Toggle service? It has a few advantages:

  • Consul is already in place, so there is no extra infrastructure required and no additional costs
  • Low stopping cost - if we decide we don’t want to use Consul, or not to use Toggles at all, we can simply stop
  • Low learning curve - we know how to use Consul already
  • Security - we can make use of Consul’s ACL to allow services to only read, and operators to write Feature Toggles.

There are also some downsides to consider too:

  • We’d effectively be reinventing the wheel
  • There won’t be any “value protection” on the settings (nothing stopping us putting an int into a field which will be parsed as a guid for example)
  • No statistics - we won’t be able to tell whether a value is still being used
  • No fine-grained control - unless we build some extra hierarchies, everyone gets the same value for a given key

So what would our system look like?

write to consul kv store, results distributed to other consul instances

It’s pretty straightforward. We already have a Consul Cluster, and then there are several machines with Consul clients running on them, as well as a Container Host with Consul too.

Any configuration written to a Consul node is replicated to all other nodes, so our user can write values to any node to get it to the rest of the cluster.

As mentioned earlier, we can use the ACL system to lock things down. Our services will have a read-only role, and our updating user will have a writeable role.
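
On the service side, reading a toggle is only a few lines. Here is a minimal sketch using the Consul NuGet package; the toggles/ key prefix, the default value handling and the token environment variable are all assumptions:

public class ConsulFeatureToggles
{
    private readonly ConsulClient _client;

    public ConsulFeatureToggles()
    {
        // talks to the local Consul agent; the token only has read access
        // to the toggles/ prefix, enforced by Consul's ACL system
        _client = new ConsulClient(config =>
        {
            config.Token = Environment.GetEnvironmentVariable("CONSUL_READ_TOKEN");
        });
    }

    public async Task<bool> IsEnabled(string toggleName, bool defaultValue = false)
    {
        var result = await _client.KV.Get($"toggles/{toggleName}");

        if (result.Response?.Value == null)
            return defaultValue;

        var raw = Encoding.UTF8.GetString(result.Response.Value);

        return bool.TryParse(raw, out var enabled) ? enabled : defaultValue;
    }
}

Checking a toggle is then just a case of calling await toggles.IsEnabled("fast-rendering").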

What Next?

Assuming this system covers enough of what we want to do, the next steps might be to make some incremental improvements in functionality, although again I would suggest looking into not reinventing the wheel…

Statistics

While we can’t use Consul to collect statistics on what keys are being read, we could provide this functionality by making a small client library which would log the queries and send them somewhere for aggregation.

Most microservice environments have centralised logging or monitoring (and if they don’t, they really should), so we can use this to record toggle usage.
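
As a sketch of that idea, a thin decorator around the toggle reader from the earlier example could record every lookup. The Serilog-style ILogger here is an assumption; use whatever your centralised logging already provides:

public class InstrumentedFeatureToggles
{
    private readonly ConsulFeatureToggles _inner;
    private readonly ILogger _logger;

    public InstrumentedFeatureToggles(ConsulFeatureToggles inner, ILogger logger)
    {
        _inner = inner;
        _logger = logger;
    }

    public async Task<bool> IsEnabled(string toggleName, bool defaultValue = false)
    {
        var enabled = await _inner.IsEnabled(toggleName, defaultValue);

        // ends up in the central log store, so usage per toggle can be aggregated later
        _logger.Information("Feature toggle {ToggleName} read as {Enabled}", toggleName, enabled);

        return enabled;
    }
}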

This information would be useful to have in the same place you set the feature toggles, which brings us nicely onto the next enhancement.

User Interface

A simple static website could be used to read all the Toggles and their relevant states and statistics and provide a way of setting them. The UI could further be expanded to give some type safety, such as extra data indicating what type a given key’s value should be.

Fine-Grained Values

Currently, everyone has the same value for a given key, but the system could be expanded to be more fine-grained. Rather than storing a feature toggle in the current form:

/kv/toggles/fast-rendering => true

We could add another level which would indicate a grouping:

/kv/toggles/fast-rendering/early-access => true
/kv/toggles/fast-rendering/others => false

At this point though, you are starting to add a lot of complexity. Think about whether you are solving the right problem! Choose where you are spending your Innovation Tokens.

Wrapping Up

Should you do this? Maybe. Probably not. I don’t know your system and what infrastructure you have available, so I don’t want to give any blanket recommendations.

I will, however, suggest that if you are starting out with Feature Toggles, go for something simple first. My current team’s first use of a Feature Toggle was just a setting in the web.config, and we just changed the value of it when we wanted the new functionality to come on.

See what works for you, and if you start needing something more complicated than just simple key-value toggles, have a look into an existing system.

microservices, consul, featuretoggles, architecture

---

Validate Your Configuration

26 Aug 2018

As I have written many times before, your application’s configuration should be strongly typed, and validated at startup to make sure it loads correctly.

This means not only that the source values (typically all represented as strings) can be converted to the target types (int, Uri, TimeSpan etc) but that the values are semantically valid too.

For example, if you have a web.config file with the following AppSetting, and a configuration class to go with it:

<configuration>
  <appSettings>
    <add key="Timeout" value="20" />
  </appSettings>
</configuration>

public class Configuration
{
    public TimeSpan Timeout { get; private set; }
}

We can now load the configuration using Stronk (or Microsoft.Extensions.Configuration if you’re on dotnet core), and inspect the contents of the Timeout property:

var config = new StronkConfig().Build<Configuration>();

Console.WriteLine(config.Timeout); // 20 days, 0 hours, 0 minutes, 0 seconds

Oops. A timeout of 20 days is probably a little on the high side! The reason this happened is that to parse the string value we use TimeSpan.Parse(value), which will interpret it as days if no other units are specified.

How to validate?

There are several ways we could go about fixing this. We could change to use TimeSpan.ParseExact, but then we would need to provide the format string from somewhere, or force people to use Stronk’s own choice of format strings.

Instead, we can just write some validation logic ourselves. If it is a simple configuration, then writing a few statements inline is probably fine:

var config = new StronkConfig()
    .Validate.Using<Configuration>(c =>
    {
        // throw when the timeout is outside the valid range
        if (c.Timeout <= TimeSpan.Zero || c.Timeout >= TimeSpan.FromMinutes(1))
            throw new ArgumentOutOfRangeException(nameof(c.Timeout), "Must be greater than 0, and less than 1 minute");
    })
    .Build<Configuration>();

But we can make it much clearer by using a validation library such as FluentValidation:

var config = new StronkConfig()
    .Validate.Using<Configuration>(c => new ConfigurationValidator().ValidateAndThrow(c))
    .Build<Configuration>();

public class ConfigurationValidator : AbstractValidator<Configuration>
{
    private static readonly HashSet<string> ValidHosts = new HashSet<string>(
        new[] { "localhost", "internal" },
        StringComparer.OrdinalIgnoreCase);

    public ConfigurationValidator()
    {
        RuleFor(x => x.Timeout)
            .GreaterThan(TimeSpan.Zero)
            .LessThan(TimeSpan.FromMinutes(2));

        RuleFor(x => x.Callback)
            .Must(url => url.Scheme == Uri.UriSchemeHttps)
            .Must(url => ValidHosts.Contains(url.Host));
    }
}

Here, not only are we checking the Timeout is in a valid range, but that our Callback is HTTPS and that it is going to a domain on an Allow-List.

What should I validate?

Everything? If you have properties controlling the number of threads an application uses, then checking that it’s a positive number, and less than x * Environment.ProcessorCount (for some value of x), is probably a good idea.

If you are specifying callback URLs in the config file, checking they are in the right domain/scheme would be a good idea (e.g. must be https, must be in a domain allow-list).
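
As a small sketch, assuming a hypothetical ThreadCount property and reusing the FluentValidation approach from above, such a rule might look like this:

public class ThreadingConfigurationValidator : AbstractValidator<Configuration>
{
    public ThreadingConfigurationValidator()
    {
        // ThreadCount is hypothetical: positive, and capped at 4x the core count
        // (pick a multiplier which suits your workload)
        RuleFor(x => x.ThreadCount)
            .GreaterThan(0)
            .LessThanOrEqualTo(4 * Environment.ProcessorCount);
    }
}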

How do you check your configuration isn’t going to bite you when an assumption turns out to be wrong?

configuration, c#, strongtyping, stronk, validation

---