observability

Telemetry and Cancelled Contexts

I use opentelemetry extensively to trace my applications, and one thing I keep running into is when writing a long running process, I want to handle OS signals and still send the telemetry on shutdown. Typically, my application startup looks something like this: func main() { ctx, cancel := context.WithCancel(context.Background()) handleSignals(cancel) tracerProvider := configureTelemetry(ctx) defer tracerProvider.Shutdown(ctx) tr = traceProvider.Tracer("cli") if err := runMain(ctx, os.Args[:]); err != nil { fmt.Fprintf(os.Stderr, err.Error()) tracerProvider....

Print debugging: a tool among other tools

This is my thoughts after reading Don’t Look Down on Print Debugging. TLDR Print debugging is a tool, and it has its uses - however, if there are better tools available, maybe use those instead. For me, a better tool is OpenTelemetry tracing; it gives me high granularity, parent-child relationships between operations, timings, and is filterable and searchable. I can also use it to debug remote issues, as long as the user can send me a file....

Tracing: structured logging, but better in every way

It is no secret that I am not a fan of logs; I’ve baited (rapala in work lingo. Rapala is a Finnish brand of fishing lure, and used to mean baiting in this context) discussion in our work chat with things like: If you’re writing log statements, you’re doing it wrong. This is a pretty incendiary statement, and while there has been some good discussion after, I figured it was time to write down why I think logs are bad, why tracing should be used instead, and how we get from one to the other....

Adding Observability to Vault

One of the things I like to do when setting up a Vault cluster is to visualise all the operations Vault is performing, which helps see usage patterns changing, whether there are lots of failed requests coming in, and what endpoints are receiving the most traffic. While Vault has a lot of data available in Prometheus telemetry, the kind of information I am after is best taken from the Audit backend....

Getting NodeJS OpenTelemetry data into NewRelic

I had the need to get some OpenTelemetry data out of a NodeJS application, and into NewRelic’s distributed tracing service, but found that there is no way to do it directly, and in this use case, adding a separate collector is more hassle than it’s worth. Luckily, there is an NodeJS OpenTelemetry library which can report to Zipkin, and NewRelic can also ingest Zipkin format data. To use it was relatively straight forward:...

Observability with Infrastructure as Code

This article was originally published on the Pulumi blog. When using the Pulumi Automation API to create applications which can provision infrastructure, it is very handy to be able to use observability techniques to ensure the application functions correctly and to help see where performance bottlenecks are. One of the applications I work on creates a VPC and Bastion host and then stores the credentials into a Vault instance. The problem is that the “create infrastructure” part is an opaque blob, in that I can see it takes 129 seconds to create, but I can’t see what it’s doing, or why it takes this amount of time....

Observability Without Honeycomb

Before I start on this, I want to make it clear that if you can buy Honeycomb, you should. Outlined below is how I started to add observability to an existing codebase which already had the ELK stack available, and was unable to use Honeycomb. My hope, in this case, is that I can demonstrate how much value observability gives, and also show how much more value you would get with an excellent tool, such as Honeycomb....