I use opentelemetry extensively to trace my applications, and one thing I keep running into is when writing a long running process, I want to handle OS signals and still send the telemetry on shutdown.
Typically, my application startup looks something like this:
func main() {
ctx, cancel := context.WithCancel(context.Background())
handleSignals(cancel)
tracerProvider := configureTelemetry(ctx)
defer tracerProvider.Shutdown(ctx)
tr = traceProvider.Tracer("cli")
if err := runMain(ctx, os.Args[:]); err != nil {
fmt.Fprintf(os.Stderr, err.Error())
os.Exit(1)
}
}
func runMain(ctx context.Context, args []string) error {
ctx, span := tr.Start(ctx, "main")
defer span.End()
// some kind of loop
for _, message := range someProcess(ctx) {
select {
case <-ctx.Done:
return ctx.Err()
default:
doMessageThings(ctx, message)
}
}
return nil
}
The handleSignals
method listens to things like sigint
, and calls the cancel()
function, and the app stops processing messages and exits gracefully.
When the application exited due to errors, I would see the whole trace from the application, but if the application stopped due to sigint
or similar, the main
span would never come through.
After a bit of reading, I realised the bug is here:
ctx, cancel := context.WithCancel(context.Background())
handleSignals(cancel)
tracerProvider := configureTelemetry(ctx)
- defer tracerProvider.Shutdown(ctx)
+ defer tracerProvider.Shutdown(context.Background())
The problem is that when the context has been cancelled, the tracerProvider skips doing any work, so never sends through the last spans!
This issue would have been more noticable if:
- I checked the
err
value fromShutdown()
, which is easily missed in adefer
call - Setting
OTEL_LOG_LEVEL
todebug
printed something useful!
Hopefully by writing this down I will remember or at least find the answer next time I manage to do this again!