opentelemetry

Telemetry and Cancelled Contexts

I use opentelemetry extensively to trace my applications, and one thing I keep running into is when writing a long running process, I want to handle OS signals and still send the telemetry on shutdown. Typically, my application startup looks something like this: func main() { ctx, cancel := context.WithCancel(context.Background()) handleSignals(cancel) tracerProvider := configureTelemetry(ctx) defer tracerProvider.Shutdown(ctx) tr = traceProvider.Tracer("cli") if err := runMain(ctx, os.Args[:]); err != nil { fmt.Fprintf(os.Stderr, err.Error()) tracerProvider....

Print debugging: a tool among other tools

This is my thoughts after reading Don’t Look Down on Print Debugging. TLDR Print debugging is a tool, and it has its uses - however, if there are better tools available, maybe use those instead. For me, a better tool is OpenTelemetry tracing; it gives me high granularity, parent-child relationships between operations, timings, and is filterable and searchable. I can also use it to debug remote issues, as long as the user can send me a file....

Multiple errors in an OTEL Span

A question at work came up recently about how to handle multiple errors in a single span. My reaction is that having multiple errors in a span is a design smell, but I had no particular data or spec to back this up, just a gut feeling. I’ve since thought about this more and think that in general, you should only have one error per span, but there are circumstances where multiple indeed makes sense....

Tracing: structured logging, but better in every way

It is no secret that I am not a fan of logs; I’ve baited (rapala in work lingo. Rapala is a Finnish brand of fishing lure, and used to mean baiting in this context) discussion in our work chat with things like: If you’re writing log statements, you’re doing it wrong. This is a pretty incendiary statement, and while there has been some good discussion after, I figured it was time to write down why I think logs are bad, why tracing should be used instead, and how we get from one to the other....

Observability Driven CI

Tracking where the time goes in your CI pipeline is an important step towards being able to make it go even faster. Up until somewhat recently, the only way of tracking how long tasks took in CI was either hoping people had wrapped all their commands in time ..., or by reading a timestamped build log and calculating the difference between numbers. Which isn’t great or fun, if we’re being honest....

Adding Observability to Vault

One of the things I like to do when setting up a Vault cluster is to visualise all the operations Vault is performing, which helps see usage patterns changing, whether there are lots of failed requests coming in, and what endpoints are receiving the most traffic. While Vault has a lot of data available in Prometheus telemetry, the kind of information I am after is best taken from the Audit backend....

Getting NodeJS OpenTelemetry data into NewRelic

I had the need to get some OpenTelemetry data out of a NodeJS application, and into NewRelic’s distributed tracing service, but found that there is no way to do it directly, and in this use case, adding a separate collector is more hassle than it’s worth. Luckily, there is an NodeJS OpenTelemetry library which can report to Zipkin, and NewRelic can also ingest Zipkin format data. To use it was relatively straight forward:...