So far in this Prometheus blog series, we have looked into Prometheus metrics and labels (see Part 1 & 2), as well as how Prometheus integrates in a distributed architecture (see Part 3). In this 4th part, it is time to look at code to create custom instrumentation. Luckily, client libraries make this pretty easy, which is one of the reasons behind Prometheus’ wide adoption.

This post will go through examples in Go and Java. Prometheus supports a number of other languages as well, through official libraries and community-maintained ones.

Instrumentation in Go

Go is one of the officially supported languages for Prometheus instrumentation. Not hugely surprising, since Prometheus is written in Go!

The Prometheus Go client provides:

  • Built-in Go metrics (memory usage, goroutines, GC, …)
  • The ability to create custom metrics
  • An HTTP handler for the /metrics endpoint

Add built-in metrics

Enabling the built-in metrics is as simple as importing the library and exposing the metrics handler.

package main

import (
    "github.com/prometheus/client_golang/prometheus/promhttp"
    "net/http"
)

func main() {
    http.Handle("/metrics", promhttp.Handler())
    panic(http.ListenAndServe(":8080", nil))
}

This sample application will expose standard Go metrics, which should be accessible under http://localhost:8080/metrics. It should output something like:

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 510728
# [...]

Creating custom metrics

Instrumenting an application goes beyond low-level runtime metrics. Custom metrics can expose information about the internal state of an application, giving more insight for monitoring and debugging.

Simple Gauge

Custom metrics are created via the github.com/prometheus/client_golang/prometheus package. For example, a Gauge for the number of jobs currently in a queue:

import "github.com/prometheus/client_golang/prometheus"

var jobsInQueue = prometheus.NewGauge(
    prometheus.GaugeOpts{
        Name: "jobs_in_queue",
        Help: "Current number of jobs in the queue",
    },
)

To include the new metric in the list exposed in the HTTP handler, it must be registered in the default registry (it is easy to forget this step and spend a while figuring out why a metric is missing from the endpoint):

func init() {
    prometheus.MustRegister(jobsInQueue)
}

Finally, the metric can be used inside the code:

func enqueueJob(job Job) {
    queue.Add(job)
    jobsInQueue.Inc()
}

func runNextJob() {
    job := queue.Dequeue()
    jobsInQueue.Dec()

    job.Run()
}

Creating a Counter is very similar to a Gauge, using prometheus.NewCounter(). The main difference is that a Counter can only go up: it exposes Inc() and Add(), but no Dec() or Set().

Adding labels

As it is, jobs_in_queue is a simple metric with no labels. It would be interesting to track the queue size for each type of job. In this case, the metric creation can use NewGaugeVec() instead of NewGauge() and specify the list of labels for the metric.

The metric creation code becomes:

var jobsInQueue = prometheus.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "jobs_in_queue",
        Help: "Current number of jobs in the queue",
    },
    []string{"job_type"},
)

The instrumentation must now specify the values for the labels (in this case job_type) before incrementing or decrementing the Gauge:

func enqueueJob(job Job) {
    queue.Add(job)
    jobsInQueue.WithLabelValues(job.Type()).Inc()
}

func runNextJob() {
    job := queue.Dequeue()
    jobsInQueue.WithLabelValues(job.Type()).Dec()

    job.Run()
}

If specifying multiple labels, the label values must be passed in the same order as the labels were defined when creating the metric.

Defining Histogram buckets

Measuring the duration of the things happening inside applications is usually a good way to gain insight into performance and detect anomalies. As explained in Part 2, durations should be analysed through their distribution (and not averages), which means the ideal metric type in such a situation is a Histogram (or potentially a Summary).

Histograms observe all values and count them in bounded buckets. These buckets must be defined for the Histogram at creation time and should be tailored to what’s being measured (otherwise, the metric will be of very little use). If the typical profile for the jobs run by this application is in the order of 1~5 seconds, good buckets could be: <1s, <2s, <5s, <10s, <20s, <1m.

Creating a Histogram is very similar to a Gauge, with the addition of the buckets:

var jobsDurationHistogram = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "jobs_duration_seconds",
        Help:    "Jobs duration distribution",
        Buckets: []float64{1, 2, 5, 10, 20, 60},
    },
    []string{"job_type"},
)

Instead of defining the buckets manually, it is also possible to create buckets in a linear sequence (e.g. 10, 15, 20, 25) or exponential sequence (e.g. 1, 10, 100, 1000):

// 4 buckets, starting from 10 and adding 5 between each
prometheus.LinearBuckets(10, 5, 4)

// 4 buckets, starting from 1 and multiplying by 10 between each
prometheus.ExponentialBuckets(1, 10, 4)

The Histogram is then used inside the code to observe the duration of each job:

start := time.Now()
job.Run()
duration := time.Since(start)
jobsDurationHistogram.WithLabelValues(job.Type()).Observe(duration.Seconds())

Creating a Summary is somewhat similar to a Histogram; the difference is that prometheus.NewSummary() must specify which quantiles to calculate (instead of specifying buckets).

You can find more examples provided with the Go client library.

Instrumentation in Java

The client library for Java is also officially supported and, as a result, instrumenting Java code is very similar to Go.

Following the same example as above, here are the equivalent code snippets in Java.

Add built-in metrics

import io.prometheus.client.exporter.MetricsServlet;
import io.prometheus.client.hotspot.DefaultExports;

// ...

public void initContext(ServletContextHandler context) {
   // Enable default JVM metrics
   DefaultExports.initialize();

   // Add metrics servlet to the context
   context.addServlet(new ServletHolder(new MetricsServlet()), "/metrics");

   // ...
}

If you are using a framework like Spring Boot, a lot of this might be taken care of through configuration and annotations. There is a good example in a blog from Njål Nordmark: Using Prometheus with Spring Boot. Another alternative for instrumenting a Spring Boot app is to use a vendor-neutral library that supports Prometheus, such as Micrometer.

Creating a simple Gauge

Creation and registration:

private static final Gauge JOBS_IN_QUEUE = Gauge.build()
    .name("jobs_in_queue")
    .help("Current number of jobs in the queue")
    .register();

Using the metric:

public void enqueueJob(Job job) {
    queue.add(job);
    JOBS_IN_QUEUE.inc();
}

public void runNextJob() {
    Job job = queue.dequeue();
    JOBS_IN_QUEUE.dec();

    job.run();
}

Adding labels

Defining the metric with the added job_type label:

private static final Gauge JOBS_IN_QUEUE = Gauge.build()
    .name("jobs_in_queue")
    .help("Current number of jobs in the queue")
    .labelNames("job_type")
    .register();

Adding the label value when using the Gauge:

public void enqueueJob(Job job) {
    queue.add(job);
    JOBS_IN_QUEUE.labels(job.getType()).inc();
}

public void runNextJob() {
    Job job = queue.dequeue();
    JOBS_IN_QUEUE.labels(job.getType()).dec();

    job.run();
}

Defining Histogram buckets

Creating a Histogram with manually defined buckets:

private static final Histogram JOB_DURATION_HISTOGRAM = Histogram.build()
    .name("jobs_duration_seconds")
    .help("Jobs duration distribution")
    .labelNames("job_type")
    .buckets(1, 2, 5, 10, 20, 60)
    .register();

With linear buckets:

private static final Histogram JOB_DURATION_HISTOGRAM = Histogram.build()
    .name("jobs_duration_seconds")
    .help("Jobs duration distribution")
    .labelNames("job_type")
    .linearBuckets(10, 5, 4)
    .register();

With exponential buckets:

private static final Histogram JOB_DURATION_HISTOGRAM = Histogram.build()
    .name("jobs_duration_seconds")
    .help("Jobs duration distribution")
    .labelNames("job_type")
    .exponentialBuckets(1, 10, 4)
    .register();

Observing job durations:

Histogram.Timer jobTimer = JOB_DURATION_HISTOGRAM.labels(job.getType()).startTimer();
job.run();
jobTimer.observeDuration();

What’s next?

With the large number of client libraries available for Prometheus instrumentation, there is very little barrier to adding custom metrics to any application you write yourself.

Custom metrics can be infinitely more valuable than system metrics, as they allow you to create powerful indicators of how well a service is fulfilling its responsibilities (or not). These top-level metrics are typically more interesting to alert on; in the next post, we will look more into what is worth alerting on, and how to define Prometheus rules in order to trigger such alerts.

If there is any specific subject you would like me to cover in this series, feel free to reach out to me on Twitter at @PierreVincent