When Time-Series Isn't the Hard Part

Why TimescaleDB beats InfluxDB for financial workloads — and what the choice is really about

The framing question

The question "which time-series database should I use" carries a hidden premise: that the time-series part is the dominant constraint of the workload. In observability and monitoring, that premise often holds — billions of low-value metric points, tagged by host and service, queried by simple aggregations over time windows. In financial workloads, it usually does not.

The dominant constraint of a quantitative system is relational. Tick data is joined to instruments. Instruments are joined to corporate actions. Positions are joined to trades, trades to signals, signals to backtest configurations. Almost every interesting question crosses two or more of these tables. Any storage system that makes joins a second-class citizen forces all of this into application code, and forfeits the correctness guarantees that come with relational engines.

That reframes the decision. The right question is not "which TSDB is better" but "do I want a relational engine that handles time-series well, or a time-series engine that handles relations awkwardly?" The first is TimescaleDB. The second is InfluxDB before version 3, and arguably still after.

Two architectures, in one paragraph each

TimescaleDB is a PostgreSQL extension. It introduces hypertables — logical tables that are partitioned by time (and optionally by a space dimension) into many physical chunks. Inserts route to the current chunk; queries plan over a chunk-aware view of the data. Everything else — query planner, locks, MVCC, replication, extension ecosystem — is PostgreSQL. From the application's perspective, hypertables are tables. From the operator's perspective, they have ordinary Postgres operational characteristics.
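The routing idea can be illustrated with a small sketch. This is not TimescaleDB's implementation; it only shows how a timestamp maps deterministically to a fixed-width time partition. The seven-day interval, the epoch alignment, and the names `chunk_bounds` and `route_rows` are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed chunk width and alignment, chosen only for this illustration.
CHUNK_INTERVAL = timedelta(days=7)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_bounds(ts, interval=CHUNK_INTERVAL):
    """Return the [start, end) range of the chunk that would hold ts."""
    index = (ts - EPOCH) // interval  # whole intervals elapsed since the epoch
    start = EPOCH + index * interval
    return start, start + interval


def route_rows(rows, interval=CHUNK_INTERVAL):
    """Group (ts, symbol, price) rows by the start time of their chunk."""
    chunks = {}
    for row in rows:
        start, _ = chunk_bounds(row[0], interval)
        chunks.setdefault(start, []).append(row)
    return chunks
```

A query restricted to a time range only needs to visit the chunks whose bounds overlap that range, which is the property the chunk-aware planner exploits.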

InfluxDB, in its 1.x and 2.x lineage, is a purpose-built time-series engine with its own storage format (TSM), its own query languages (InfluxQL, then Flux), and a native data model centred on measurements, tags, and fields. Series are uniquely identified by the combination of measurement and tag values; cardinality of that key space governs performance. Version 3.0 reset the architecture toward Apache DataFusion, Apache Parquet, and Apache Arrow — making the engine far more SQL-friendly and cardinality-tolerant — but the trajectory is recent and the ecosystem is still catching up.

SQL is a feature, not an inconvenience

The most under-appreciated property of TimescaleDB is that its query language is PostgreSQL.

This has nothing to do with familiarity. It has to do with composition. A trading system needs to express questions like:

These are joins, window functions, aggregates and CTEs braided together. They are not exotic — they are exactly what relational SQL was built to express. Any storage system that demands these be expressed in a custom DSL or in application code is choosing to throw away half a century of database engineering.
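As a hedged illustration of that composition, the sketch below marks each open position to its latest tick with one statement that combines a CTE, a window function, and two joins. It uses Python's built-in `sqlite3` module as a stand-in so that it runs anywhere (window functions need SQLite 3.25 or later); the table and column names are hypothetical, and a real deployment would run equivalent SQL against PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instruments (id INTEGER PRIMARY KEY, symbol TEXT NOT NULL, multiplier REAL NOT NULL);
CREATE TABLE ticks (instrument_id INTEGER NOT NULL REFERENCES instruments(id),
                    ts TEXT NOT NULL, price REAL NOT NULL);
CREATE TABLE positions (instrument_id INTEGER NOT NULL REFERENCES instruments(id),
                        quantity REAL NOT NULL);

INSERT INTO instruments VALUES (1, 'BTC-PERP', 1.0), (2, 'ETH-PERP', 1.0);
INSERT INTO ticks VALUES
    (1, '2024-01-01T00:00:00Z', 100.0),
    (1, '2024-01-01T00:01:00Z', 101.0),
    (2, '2024-01-01T00:00:00Z', 50.0),
    (2, '2024-01-01T00:01:00Z', 49.5);
INSERT INTO positions VALUES (1, 2.0), (2, -4.0);
""")

# CTE + window function picks the latest tick per instrument;
# the joins attach reference data and position state.
MARK_TO_MARKET = """
WITH ranked AS (
    SELECT instrument_id, price,
           ROW_NUMBER() OVER (PARTITION BY instrument_id ORDER BY ts DESC) AS rn
    FROM ticks
)
SELECT i.symbol, p.quantity * r.price * i.multiplier AS market_value
FROM positions AS p
JOIN instruments AS i ON i.id = p.instrument_id
JOIN ranked AS r ON r.instrument_id = p.instrument_id AND r.rn = 1
ORDER BY i.symbol
"""

rows = conn.execute(MARK_TO_MARKET).fetchall()
```

The point is not the specific query but that the whole question is one declarative statement, planned by one engine.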

InfluxDB 1.x's InfluxQL was a SQL lookalike that did not support joins. InfluxDB 2.x's Flux was a functional pipeline language that did, but with semantics unfamiliar to anyone outside the InfluxData ecosystem. InfluxDB 3.0 brings real SQL, a tacit acknowledgement that the previous two attempts were a mistake. By the time the language story stabilises, anyone who built on the earlier versions will have paid for two migrations.

Key point: A time-series database without first-class joins is fine for metrics. It is not fine for trading data, where almost every interesting question is relational by nature.

The cardinality problem that wasn't TimescaleDB's

InfluxDB's classic operational pain point was cardinality explosion. Each unique combination of tag values created a new in-memory series; high-cardinality tag spaces — every order ID, every transient instrument that listed and delisted, every fine-grained label — eventually overwhelmed the engine. The recommended remediation was schema discipline: keep cardinality low, denormalise tags, accept information loss.
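A rough way to see the effect is to count distinct series keys for a batch of points. The sketch below models a series key as the measurement plus its sorted tag pairs, which follows the description above but is not InfluxDB's internal representation; `series_key` and `series_cardinality` are hypothetical helper names.

```python
def series_key(measurement, tags):
    """Model a series key as the measurement plus its sorted tag pairs."""
    return (measurement, tuple(sorted(tags.items())))


def series_cardinality(points):
    """Count distinct series among (measurement, tags) points."""
    return len({series_key(measurement, tags) for measurement, tags in points})


exchanges = ["binance", "okx"]
symbols = ["BTC-PERP", "ETH-PERP", "SOL-PERP"]

# Low-cardinality schema: only exchange and symbol are tags.
low = [
    ("trades", {"exchange": exchanges[i % 2], "symbol": symbols[i % 3]})
    for i in range(1000)
]

# High-cardinality schema: the order ID is also a tag, so every point is its own series.
high = [
    ("trades", {"exchange": exchanges[i % 2], "symbol": symbols[i % 3], "order_id": str(i)})
    for i in range(1000)
]
```

With these inputs the first schema yields six series and the second yields one thousand, one per order, which is the growth pattern that schema discipline was meant to prevent.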

TimescaleDB has no cardinality concept. Rows are rows. A column with billions of distinct values is just a column. An index on it consumes disk and adds write overhead that grows with the index, but there is no special collapse point.

For a market that constantly spawns new symbols — perpetuals listing daily, tokens arriving, alt-tokens delisting — this matters. The schema does not need to anticipate the universe. The relational model does not punish you for a wide identifier space.

Version 3.0 of InfluxDB largely solves cardinality through its new storage architecture. But the institutional memory of "watch out for cardinality" persists in the surrounding tooling and in operator habits, and the recent reset means production scars from earlier versions are still being healed.

Compression, retention, and the read-only window

TimescaleDB converts older chunks to a compressed columnar layout. Each column is encoded separately with delta encoding, run-length encoding, dictionary encoding, or Gorilla compression, depending on the column type. Storage savings on financial OHLCV data routinely sit between 90% and 95%. Decompression is transparent at query time.
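To make the intuition concrete, here is a minimal sketch of two of the named techniques, delta encoding followed by run-length encoding, applied to a regularly spaced timestamp column. It is a toy illustration of why such columns compress well, not the encoder TimescaleDB actually ships; all function names are hypothetical.

```python
def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


def decode(runs):
    """Invert run_length_encode followed by delta_encode."""
    out, total = [], 0
    for value, count in runs:
        for _ in range(count):
            total += value  # the first "delta" is the base value itself
            out.append(total)
    return out


# One-minute bar timestamps (Unix seconds): 1000 values with a constant step.
timestamps = [1_700_000_000 + 60 * i for i in range(1000)]
encoded = run_length_encode(delta_encode(timestamps))
```

A thousand timestamps reduce to two pairs here because the step never changes; real price and volume columns are less regular and compress less dramatically.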

The catch is that compressed chunks are effectively read-mostly. Updates and deletes against compressed data are possible in recent versions but slower than against uncompressed chunks. The standard discipline is to compress chunks once they pass the active-edit window (for OHLCV data, that might be one to seven days) and treat older history as append-only. This matches how financial data is actually used: corrections happen close to ingestion; deep history is queried, not edited.

InfluxDB compresses by default through the TSM engine. No active/compressed distinction is exposed; the engine handles it transparently. This is operationally simpler. Whether it achieves similar effective compression depends on the cardinality profile and the data's compressibility; for clean OHLCV, both engines reach broadly similar ratios.

Both systems support time-based retention policies that drop or archive chunks past a threshold. The mechanics differ; the user-visible feature is similar.
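The lifecycle described in this section (an editable window, then compression, then retention) amounts to classifying each chunk by age. The sketch below shows that decision in isolation; the seven-day and one-year thresholds and the name `classify_chunk` are assumptions for illustration, and in a real deployment the database's own policy jobs would make this decision.

```python
from datetime import datetime, timedelta, timezone


def classify_chunk(chunk_end, now,
                   compress_after=timedelta(days=7),
                   drop_after=timedelta(days=365)):
    """Return the lifecycle stage of a chunk whose time range ends at chunk_end."""
    age = now - chunk_end
    if age >= drop_after:
        return "drop"        # past retention: drop or archive
    if age >= compress_after:
        return "compress"    # past the active-edit window: treat as append-only
    return "active"          # still inside the window where corrections arrive


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
stages = [
    classify_chunk(now - timedelta(days=1), now),
    classify_chunk(now - timedelta(days=30), now),
    classify_chunk(now - timedelta(days=400), now),
]
```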

Continuous aggregates: downsampling done properly

Quantitative workflows almost always need pre-aggregated data. Backtests over five years of one-minute bars across four hundred symbols will not finish in interactive time without some pre-computation. The standard pattern is to maintain rolled-up resolutions — five-minute bars, one-hour bars, one-day bars — refreshed incrementally as new ticks arrive.

TimescaleDB's continuous aggregates are materialised views with a refresh policy. They sit on top of a hypertable, define their own time bucket, and recompute the windows that have changed since the last refresh. They behave like ordinary tables in queries: the planner sees them, optimises against them, joins through them. They support compression and retention independently of the source hypertable, which means a fine-grained source can be kept for thirty days and an hourly aggregate for a decade.
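The "recompute only what changed" behaviour can be sketched in a few lines. The class below tracks which time buckets received new rows and rebuilds only those OHLC bars on refresh, including when a late tick lands in an old bucket. It is a conceptual model under assumed names (`RollingBars`, five-minute buckets, integer timestamps), not how TimescaleDB tracks invalidations internally.

```python
class RollingBars:
    """Toy incremental rollup: raw ticks in, per-bucket OHLC bars out."""

    def __init__(self, bucket_seconds=300):
        self.bucket_seconds = bucket_seconds
        self.source = []    # raw (ts, price) rows
        self.dirty = set()  # bucket starts touched since the last refresh
        self.bars = {}      # bucket start -> (open, high, low, close)

    def insert(self, ts, price):
        self.source.append((ts, price))
        self.dirty.add(ts - ts % self.bucket_seconds)

    def refresh(self):
        """Recompute only the dirty buckets; return how many were rebuilt."""
        for start in self.dirty:
            end = start + self.bucket_seconds
            rows = sorted(r for r in self.source if start <= r[0] < end)
            prices = [price for _, price in rows]
            self.bars[start] = (prices[0], max(prices), min(prices), prices[-1])
        rebuilt = len(self.dirty)
        self.dirty.clear()
        return rebuilt


bars = RollingBars()
for ts, price in [(0, 100.0), (60, 102.0), (299, 101.0), (300, 103.0)]:
    bars.insert(ts, price)
first_refresh = bars.refresh()   # both buckets are new

bars.insert(120, 99.0)           # a late tick lands in the first bucket
second_refresh = bars.refresh()  # only that bucket is rebuilt
```

The toy version rescans the source list on each refresh; the useful property it demonstrates is that untouched buckets are never recomputed.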

InfluxDB's analogous feature is tasks in 2.x or continuous queries in 1.x. Both work; the operator interface, the way they handle late-arriving data, and the way they integrate with downstream queries differ in details that matter at scale.

The practical question is not "does feature X exist" — both ecosystems have it — but how cleanly the abstraction composes with the rest of the workload. TimescaleDB's choice to make the aggregate a queryable table that the planner can reason about pays dividends every time a query needs to span source and aggregate, or join the aggregate to reference data.

Joins, foreign keys, the relational web

The point that finally settles the choice for most quant teams is the surrounding data, not the time-series itself.

A working trading system has, at minimum:

In TimescaleDB, all of this is in the same database, with foreign keys, transactions, and the same SQL dialect across the board. A query that joins position state to instrument metadata to a tick window is a single SELECT statement, planned and optimised by one engine, with one consistent set of failure modes.
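A small sketch of what those guarantees buy: a trade insert and a position update happen in one transaction, and a trade against an unknown instrument is rejected by the foreign key rather than by application code. SQLite (via Python's `sqlite3`, with upsert support from SQLite 3.24) stands in for PostgreSQL here so the example is runnable; the schema and the names `open_db` and `record_trade` are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE instruments (
    id     INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL UNIQUE
);
CREATE TABLE trades (
    id            INTEGER PRIMARY KEY,
    instrument_id INTEGER NOT NULL REFERENCES instruments(id),
    quantity      REAL NOT NULL,
    price         REAL NOT NULL
);
CREATE TABLE positions (
    instrument_id INTEGER PRIMARY KEY REFERENCES instruments(id),
    quantity      REAL NOT NULL
);
"""


def open_db():
    """Create an in-memory database with one known instrument."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO instruments (id, symbol) VALUES (1, 'BTC-PERP')")
    conn.commit()
    return conn


def record_trade(conn, instrument_id, quantity, price):
    """Insert a trade and adjust the position atomically; False if rejected."""
    try:
        with conn:  # commits on success, rolls back if any statement fails
            conn.execute(
                "INSERT INTO trades (instrument_id, quantity, price) VALUES (?, ?, ?)",
                (instrument_id, quantity, price),
            )
            conn.execute(
                "INSERT INTO positions (instrument_id, quantity) VALUES (?, ?) "
                "ON CONFLICT(instrument_id) DO UPDATE "
                "SET quantity = positions.quantity + excluded.quantity",
                (instrument_id, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling `record_trade(conn, 999, 1.0, 1.0)` on a fresh database returns `False` and leaves both tables unchanged, because no instrument 999 exists. Once trades and instruments live in different engines, that check has to be reimplemented and kept consistent by hand.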

In an InfluxDB-centric architecture, the typical pattern is to keep tick data in InfluxDB and the rest in PostgreSQL or another relational store. Cross-system queries become application-level joins. Correctness guarantees evaporate at the boundary — there are no foreign keys across two engines, no transactions, no consistent snapshot. Observability and debugging become harder because every interesting query splits across two systems with different mental models. The split is doable. It is also a permanent tax.

Key point: The decision is rarely about the time-series performance characteristics. It is about whether the rest of the data — instruments, positions, trades, configurations — sits in the same engine or in a different one.

Operational characteristics

TimescaleDB inherits PostgreSQL's operational story, for better and for worse.

The good: a deep tooling ecosystem (psql, pg_dump, logical and physical replication, pg_stat_*, query plan visualisation, EXPLAIN ANALYZE), broad operator familiarity, well-understood failure modes, a mature monitoring story. Backup and point-in-time recovery are first-class. Connection security, role-based access control, and audit logging come from the underlying engine rather than being bolted on.

The bad: PostgreSQL operational realities apply unchanged. Autovacuum needs tuning at scale. WAL management matters. Replica lag has to be monitored. Connection pooling is mandatory — Postgres allocates one process per connection, which becomes a bottleneck at any non-trivial concurrency. PgBouncer in transaction mode is the standard answer, but it imposes constraints (no session-level features) that surprise teams that haven't run Postgres at scale before.

InfluxDB has a more compact operational surface. Fewer knobs, less to tune, less to misconfigure. For a small team running a handful of metric workloads, this is genuinely simpler. For a team that already runs PostgreSQL elsewhere — for the application database, for analytics, for any other persistent state — it adds a new operational vocabulary without removing the existing one.

For a quantitative system, the operational footprint is rarely the binding constraint. The binding constraint is data correctness across joined tables, which Postgres is built for.

Where InfluxDB still wins

Honestly enumerating the cases where the choice should go the other way:

For financial trading systems, none of these usually apply. The use case is rarely pure time-series, almost always relational, and the data is operationally close enough that object-store cold storage is not the binding constraint.

Pitfalls in practice

Warning

A choice made on synthetic benchmarks rarely survives contact with a real production schema. The honest evaluation is to run the actual workload — including its joins, its retention policies, and its peak insert rates — against both engines for a week.

When to use what

Workload shape TimescaleDB InfluxDB (≤ 2.x) InfluxDB (3.x)
Pure metrics / observability overkill
Edge / IoT collection overkill
Tick data with relational metadata partial
Backtesting across instruments partial
Trading state (orders, positions, trades)
Cold-data object-store archival partial
Mixed application + time-series workload partial

The pattern is clear. The further the workload sits from "pure time-series", the stronger the case for the relational option.

Summary

| Property | TimescaleDB | InfluxDB (≤ 2.x) | InfluxDB (3.x) |
|---|---|---|---|
| Storage model | Postgres heap + columnar compression | TSM custom format | Apache Parquet on object store |
| Query language | PostgreSQL SQL | InfluxQL / Flux | SQL via DataFusion |
| Joins | First-class | Limited or absent | First-class |
| Cardinality handling | Unbounded | Constrained by series-key model | Improved |
| Continuous aggregates | Native, queryable as tables | Tasks / continuous queries | Improved |
| Foreign keys, transactions | Yes | No | No |
| Ecosystem | Postgres ecosystem | Influx ecosystem | Apache Arrow ecosystem |
| Operational model | Postgres-class | Compact, custom | Newer, evolving |
| Best fit | Relational time-series with joins | Pure metrics, observability | Cold object-store time-series |

Choose the database whose data model matches the questions the workload will ask. For a quantitative system, those questions are relational, and the database that answers them best is the one whose foundations are relational.
