Running Jaeger with Cassandra

This post details a single caveat I ran into while developing SMOK: running Jaeger with Cassandra as its backend (i.e. the place where Jaeger stores its data).

What is what?

Jaeger is a tool that provides traces. If you have a microservice-based app, it may take a couple of calls between microservices to achieve an objective. Tracing is a way to see through these calls and, for example, note how long a task took to complete and how that time breaks down.

Here’s a sample screenshot from our Jaeger instance at SMOK:

Here we see what the archives service had to do in order to process archives on a single device. First it had to determine whether the device was online (device.GetDeviceExtended) and ask the device to refresh the relevant sensors (device.Execute). Since archives caches sensors, it did not need to ask for them.

Traces consist of spans (marked by a continuous bold line in the screenshot above), which themselves can contain other spans. The trace context is transferred as call metadata (e.g. gRPC metadata or HTTP headers). A span may carry events (a timestamp plus a key mapped to a value) and tags (a key mapped to a value). Events are better suited for logging things that happen at a particular point in time, while tags are better suited for things like call arguments.
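To make tags and events concrete, here is a simplified sketch of the user-defined types that back the tags and logs columns of the traces table shown later in this post. The field names are approximate; consult the schema that ships with your Jaeger version.

```sql
-- A tag: a key mapped to one typed value (sketch, simplified):
CREATE TYPE jaeger.keyvalue (
    key          text,
    value_type   text,
    value_string text,
    value_long   bigint
);

-- An event ("log" in Jaeger's schema): a timestamp plus key/value fields:
CREATE TYPE jaeger.log (
    ts     bigint,
    fields list<frozen<keyvalue>>
);
```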

Not only does it provide a breakdown of what your services were doing; you can now also plug in Cassandra to report directly to Jaeger, thanks to Bhavin Gandhi and me.

Jaeger was contributed to the open source community by Uber. Its primary competition is the ELK stack, Grafana Tempo and Zipkin (contributed to FOSS by Twitter). It is an implementation of the OpenTracing standard.

Since OpenTelemetry is still in its infancy, I won't elaborate on it here. It's a nice project to revisit in about 3 years (note to self: set a reminder to check it when I get there), although its tracing support is reported to be production-ready.

Apache Cassandra is a masterless wide-column NoSQL database with tunable consistency/availability. Its main feature is storing data in so-called SSTables (sorted string tables), which are immutable but can be replaced with new SSTables. The process of combining multiple SSTables into a bigger one is called a compaction.

Cassandra was contributed to open source by Facebook, where it was used to power the inbox search feature.

Expiring data and compactions

When you set data to expire in Cassandra (using a TTL), each cell bears a marker telling Cassandra that the data is to be removed after a certain time.
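To illustrate, a per-row TTL can be set at insertion time, and the remaining lifetime of a cell can be inspected with the TTL() function. This is a sketch with made-up values against the jaeger.traces table defined later in this post:

```sql
-- Set an explicit two-week TTL on this row (overrides the table's
-- default_time_to_live, if one is set):
INSERT INTO jaeger.traces (trace_id, span_id, span_hash, operation_name)
VALUES (0x0a0b, 1, 1, 'device.Execute')
USING TTL 1209600;

-- Inspect how many seconds this cell has left before it expires:
SELECT TTL(operation_name) FROM jaeger.traces WHERE trace_id = 0x0a0b;
```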

But when do physical deletes happen? Since SSTables are immutable, only during compaction. Both LeveledCompactionStrategy and SizeTieredCompactionStrategy could allow our SSTables to reach a critical size, potentially overwhelming our Cassandra node. Note also that insertion time is intimately tied to expiration time: data expires a fixed interval after it is written (make sure that all of your Jaeger tables in Cassandra have a default_ttl set). This makes the data a natural fit for TimeWindowCompactionStrategy (or its deprecated predecessor, DateTieredCompactionStrategy), which groups data written in the same time window into the same SSTable, so that a whole SSTable can simply be dropped once everything in it has expired.

Since we at SMOK run a two-week expiration, and after reading up in DataStax's manual, we decided on a window size of 17 hours: two weeks divided by 17 hours gives roughly 20 windows, which should keep the number of live windows between 20 and 30.

So here's our CQL code to define the jaeger.traces table:

CREATE TABLE jaeger.traces (
    trace_id blob,
    span_id bigint,
    span_hash bigint,
    duration bigint,
    flags int,
    logs list<frozen<log>>,
    operation_name text,
    parent_id bigint,
    process frozen<process>,
    refs list<frozen<span_ref>>,
    start_time bigint,
    tags list<frozen<keyvalue>>,
    PRIMARY KEY (trace_id, span_id, span_hash)
) WITH CLUSTERING ORDER BY (span_id ASC, span_hash ASC)
    AND default_time_to_live = 1209600
    AND compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_size': '17',
        'compaction_window_unit': 'HOURS'
    };

We discovered the hard way that compaction matters for frequently expiring data, when our Cassandra nodes woke up one day with Jaeger's keyspace taking up 300 GB.
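If your Jaeger keyspace is already deployed, the compaction strategy can be switched in place with an ALTER TABLE. This is a sketch for the jaeger.traces table; Cassandra will gradually recompact existing SSTables under the new strategy:

```sql
ALTER TABLE jaeger.traces WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_size': '17',
    'compaction_window_unit': 'HOURS'
};
```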

The preceding considerations apply to every piece of data that is set to expire upon insertion, so take care, and happy hacking!

Another solution would be to schedule frequent manual compactions via the nodetool utility (e.g. nodetool compact jaeger traces). Remember that compaction is configured not per keyspace but per table!

tl;dr – when expiring data via TTLs, make sure your compaction strategy can efficiently dispose of the data in immutable SSTables.
