When Time-Series Isn't the Hard Part

The framing question

The question "which time-series database should I use" carries a hidden premise: that the time-series part is the dominant constraint of the workload. In observability and monitoring, that premise often holds — billions of low-value metric points, tagged by host and service, queried by simple aggregations over time windows. In financial workloads, it usually does not.

The dominant constraint of a quantitative system is relational. Tick data is joined to instruments. Instruments are joined to corporate actions. Positions are joined to trades, trades to signals, signals to backtest configurations. Almost every interesting question crosses two or more of these tables. Any storage system that makes joins a second-class citizen forces all of this into application code, and forfeits the correctness guarantees that come with relational engines.

That reframes the decision. The right question is not "which TSDB is better" but "do I want a relational engine that handles time-series well, or a time-series engine that handles relations awkwardly?" The first is TimescaleDB. The second is InfluxDB before version 3, and arguably still after.

Two architectures, in one paragraph each

TimescaleDB is a PostgreSQL extension. It introduces hypertables — logical tables that are partitioned by time (and optionally by a space dimension) into many physical chunks. Inserts route to the current chunk; queries plan over a chunk-aware view of the data. Everything else — query planner, locks, MVCC, replication, extension ecosystem — is PostgreSQL. From the application's perspective, hypertables are tables. From the operator's perspective, they have ordinary Postgres operational characteristics.
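The routing idea can be sketched in a few lines. This is a minimal illustration, not TimescaleDB's implementation: the epoch-aligned fixed-width ranges and the function names `chunk_bounds` and `route` are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_US = timedelta(microseconds=1)


def chunk_bounds(ts, interval):
    """Return the [start, end) range of the epoch-aligned chunk containing ts."""
    # Work in integer microseconds to avoid floating-point rounding.
    interval_us = interval // ONE_US
    offset_us = (ts - EPOCH) // ONE_US
    start = EPOCH + timedelta(microseconds=(offset_us // interval_us) * interval_us)
    return start, start + interval


def route(rows, interval):
    """Group (timestamp, payload) rows by the chunk each one falls into."""
    chunks = {}
    for ts, payload in rows:
        start, _ = chunk_bounds(ts, interval)
        chunks.setdefault(start, []).append((ts, payload))
    return chunks
```

A query restricted to a time range only needs to visit the chunks whose bounds overlap that range, which is the property the chunk-aware planner exploits.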

InfluxDB, in its 1.x and 2.x lineage, is a purpose-built time-series engine with its own storage format (TSM), its own query languages (InfluxQL, then Flux), and a native data model centred on measurements, tags, and fields. Series are uniquely identified by the combination of measurement and tag values; cardinality of that key space governs performance. Version 3.0 reset the architecture toward Apache DataFusion, Apache Parquet, and Apache Arrow — making the engine far more SQL-friendly and cardinality-tolerant — but the trajectory is recent and the ecosystem is still catching up.

SQL is a feature, not an inconvenience

The most under-appreciated property of TimescaleDB is that its query language is PostgreSQL.

This has nothing to do with familiarity. It has to do with composition. A trading system needs to express questions like:

"For every position closed last week, what was the one-minute return profile of the underlying in the thirty minutes before exit?"
"Across all USDT pairs, return the cointegrated pairs whose half-life is between six and forty-eight hours, ordered by the average daily volume of the cheaper leg."
"For each strategy version deployed in the last quarter, what was the realised Sharpe and the worst three drawdowns, broken down by market regime?"

These are joins, window functions, aggregates and CTEs braided together. They are not exotic — they are exactly what relational SQL was built to express. Any storage system that demands these be expressed in a custom DSL or in application code is choosing to throw away half a century of database engineering.
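As a rough illustration of the first question, the sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere. The schema, table names, and integer minute timestamps are all hypothetical; a real deployment would use timestamptz columns and a hypertable for the bars.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here; every table and column name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instruments (id INTEGER PRIMARY KEY, symbol TEXT NOT NULL);
CREATE TABLE positions (
    id INTEGER PRIMARY KEY,
    instrument_id INTEGER NOT NULL REFERENCES instruments(id),
    closed_at INTEGER NOT NULL          -- minutes since an arbitrary origin
);
CREATE TABLE bars (
    instrument_id INTEGER NOT NULL REFERENCES instruments(id),
    ts INTEGER NOT NULL,                -- bar time, same unit as closed_at
    close REAL NOT NULL,
    PRIMARY KEY (instrument_id, ts)
);
""")
conn.execute("INSERT INTO instruments VALUES (1, 'BTCUSDT')")
conn.execute("INSERT INTO positions VALUES (10, 1, 100)")
# One bar per minute from t=60 to t=100, close rising by 1.0 each minute.
conn.executemany(
    "INSERT INTO bars VALUES (1, ?, ?)",
    [(t, 100.0 + (t - 60)) for t in range(60, 101)],
)

QUERY = """
WITH returns AS (                       -- CTE: one-minute simple returns per instrument
    SELECT instrument_id, ts,
           close / LAG(close) OVER (PARTITION BY instrument_id ORDER BY ts) - 1.0 AS ret
    FROM bars
)
SELECT p.id, i.symbol, r.ts - p.closed_at AS minutes_to_exit, r.ret
FROM positions p
JOIN instruments i ON i.id = p.instrument_id
JOIN returns r ON r.instrument_id = p.instrument_id
              AND r.ts > p.closed_at - 30
              AND r.ts <= p.closed_at
ORDER BY p.id, r.ts
"""
rows = conn.execute(QUERY).fetchall()
```

The query combines a CTE, a window function, and two joins in one statement. The same shape carries over to PostgreSQL with only the timestamp arithmetic changed.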

InfluxDB 1.x's InfluxQL was a SQL lookalike that did not support joins. InfluxDB 2.x's Flux was a functional pipeline language that did, but with semantics unfamiliar to anyone outside the InfluxData ecosystem. InfluxDB 3.0 brings real SQL — a tacit acknowledgement that the previous two attempts were a mistake. By the time the language story stabilises, the cost of two migrations has been paid by anyone who built on the earlier versions.

Key point: A time-series database without first-class joins is fine for metrics. It is not fine for trading data, where almost every interesting question is relational by nature.

The cardinality problem that wasn't TimescaleDB's

InfluxDB's classic operational pain point was cardinality explosion. Each unique combination of measurement and tag values created a new in-memory series; high-cardinality tag spaces (every order ID, every transient instrument that listed and delisted, every fine-grained label) eventually overwhelmed the engine. The recommended remediation was schema discipline: keep cardinality low, move high-cardinality values out of tags and into unindexed fields, accept information loss.
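A small sketch makes the arithmetic concrete. It assumes the series-key definition given earlier (measurement plus the full tag set); the function names and sample tags are hypothetical.

```python
from math import prod


def series_key(measurement, tags):
    """A series is identified by its measurement and its full tag set."""
    return (measurement, tuple(sorted(tags.items())))


def observed_cardinality(points):
    """Count distinct series among (measurement, tags) points."""
    return len({series_key(m, t) for m, t in points})


def worst_case_cardinality(tag_value_counts):
    """Upper bound: the product of distinct values per tag key."""
    return prod(tag_value_counts.values())


# Two symbols on one exchange: two series, however many points arrive.
low = [("trades", {"exchange": "X", "symbol": s}) for s in ("BTCUSDT", "BTCUSDT", "ETHUSDT")]

# Adding a unique order ID as a tag turns every point into its own series.
high = [
    ("trades", {"exchange": "X", "symbol": "BTCUSDT", "order_id": str(n)})
    for n in range(1000)
]
```

Here `observed_cardinality(low)` is 2 while `observed_cardinality(high)` is 1000, and the worst-case bound grows multiplicatively with each new tag key.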

TimescaleDB has no cardinality concept. Rows are rows. A column with billions of distinct values is just a column. Indexes on it consume disk and slow inserts as they grow, but there is no special collapse point.

For a market that constantly spawns new symbols — perpetuals listing daily, tokens arriving, alt-tokens delisting — this matters. The schema does not need to anticipate the universe. The relational model does not punish you for a wide identifier space.

Version 3.0 of InfluxDB largely solves cardinality through its new storage architecture. But the institutional memory of "watch out for cardinality" persists in the surrounding tooling and in operator habits, and the recent reset means production scars from earlier versions are still being healed.

Compression, retention, and the read-only window

TimescaleDB compresses old chunks columnar-style. The compression encodes each column separately with delta encoding, run-length encoding, dictionary encoding, or Gorilla compression depending on the column type. On financial OHLCV data, storage reductions of 90% to 95% are routine. Decompression is transparent at query time.
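To see why regular time-series columns compress so well, consider a toy version of two of these encodings. This is a sketch of the general idea only, not TimescaleDB's on-disk format; the function names are illustrative.

```python
def delta_of_delta(values):
    """Encode integers as: first value, first delta, then second-order differences."""
    if len(values) < 2:
        return list(values)
    deltas = [b - a for a, b in zip(values, values[1:])]
    second = [b - a for a, b in zip(deltas, deltas[1:])]
    return [values[0], deltas[0]] + second


def decode_delta_of_delta(encoded):
    """Invert delta_of_delta by re-accumulating the differences."""
    if len(encoded) < 2:
        return list(encoded)
    values = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for d in encoded[2:]:
        delta += d
        values.append(values[-1] + delta)
    return values


def run_length(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


# One-minute bar timestamps: perfectly regular, so every second-order difference is 0.
timestamps = [1_700_000_000 + 60 * i for i in range(1000)]
encoded = delta_of_delta(timestamps)
runs = run_length(encoded[2:])
```

A thousand regular timestamps reduce to a start value, one delta, and a single run of zeros. Irregular gaps cost a few small integers each, which is still far cheaper than storing full timestamps.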

The catch is that compressed chunks are effectively read-mostly. Updates and deletes against compressed data are possible since recent versions but slower than against uncompressed chunks. The standard discipline is to compress chunks once they pass the active-edit window — for OHLCV data, that might be one to seven days — and treat older history as append-only. This matches how financial data is actually used: corrections happen close to ingestion; deep history is queried, not edited.

InfluxDB compresses by default through the TSM engine. No active/compressed distinction is exposed; the engine handles it transparently. This is operationally simpler. Whether it achieves similar effective compression depends on the cardinality profile and the data's compressibility; for clean OHLCV, both engines reach broadly similar ratios.

Both systems support time-based retention policies that drop or archive chunks past a threshold. The mechanics differ; the user-visible feature is similar.

Continuous aggregates: downsampling done properly

Quantitative workflows almost always need pre-aggregated data. Backtests over five years of one-minute bars across four hundred symbols will not finish in interactive time without some pre-computation. The standard pattern is to maintain rolled-up resolutions — five-minute bars, one-hour bars, one-day bars — refreshed incrementally as new ticks arrive.

TimescaleDB's continuous aggregates are materialised views with a refresh policy. They sit on top of a hypertable, define their own time bucket, and recompute the windows that have changed since the last refresh. They behave like ordinary tables in queries: the planner sees them, optimises against them, joins through them. They support compression and retention independently of the source hypertable, which means a fine-grained source can be kept for thirty days and an hourly aggregate for a decade.
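The "recompute only what changed" behaviour can be sketched as a toy in-memory rollup. The dirty-bucket bookkeeping below is an assumption made for illustration and says nothing about how TimescaleDB tracks invalidations internally; the class and method names are hypothetical.

```python
from collections import defaultdict


class RollingBars:
    """Toy OHLC rollup that recomputes only buckets touched since the last refresh."""

    def __init__(self, bucket_seconds):
        self.bucket_seconds = bucket_seconds
        self.source = defaultdict(dict)   # bucket start -> {tick time: price}
        self.dirty = set()                # buckets changed since last refresh
        self.materialised = {}            # bucket start -> (open, high, low, close)

    def _bucket(self, ts):
        return ts - ts % self.bucket_seconds

    def insert(self, ts, price):
        bucket = self._bucket(ts)
        self.source[bucket][ts] = price
        self.dirty.add(bucket)

    def refresh(self):
        """Recompute dirty buckets only and return the list of buckets refreshed."""
        refreshed = sorted(self.dirty)
        for bucket in refreshed:
            prices = [p for _, p in sorted(self.source[bucket].items())]
            self.materialised[bucket] = (prices[0], max(prices), min(prices), prices[-1])
        self.dirty.clear()
        return refreshed


bars = RollingBars(bucket_seconds=300)
for ts, price in [(0, 10.0), (60, 12.0), (299, 11.0), (300, 20.0)]:
    bars.insert(ts, price)
first_refresh = bars.refresh()

# A late-arriving tick only invalidates the bucket it belongs to.
bars.insert(120, 9.0)
second_refresh = bars.refresh()
```

The second refresh touches one bucket rather than the whole history, which is what keeps the refresh cost proportional to the volume of new or corrected data.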

InfluxDB's analogous feature is tasks in 2.x or continuous queries in 1.x. Both work; the operator interface, the way they handle late-arriving data, and the way they integrate with downstream queries differ in details that matter at scale.

The practical question is not "does feature X exist" — both ecosystems have it — but how cleanly the abstraction composes with the rest of the workload. TimescaleDB's choice to make the aggregate a queryable table that the planner can reason about pays dividends every time a query needs to span source and aggregate, or join the aggregate to reference data.

Joins, foreign keys, the relational web

The point that finally settles the choice for most quant teams is the surrounding data, not the time-series itself.

A working trading system has, at minimum:

Instrument metadata — symbol, exchange, tick size, contract size, trading hours, status
Calendar information — holidays, settlement days, regulatory windows
Reference data — corporate actions, dividends, splits, listings and delistings
Position and order state
Trade history
Strategy parameters and their version history
Backtest configurations and outputs
Audit logs of every change to the above

In TimescaleDB, all of this is in the same database, with foreign keys, transactions, and the same SQL dialect across the board. A query that joins position state to instrument metadata to a tick window is a single SELECT statement, planned and optimised by one engine, with one consistent set of failure modes.

In an InfluxDB-centric architecture, the typical pattern is to keep tick data in InfluxDB and the rest in PostgreSQL or another relational store. Cross-system queries become application-level joins. Correctness guarantees evaporate at the boundary — there are no foreign keys across two engines, no transactions, no consistent snapshot. Observability and debugging become harder because every interesting query splits across two systems with different mental models. The split is doable. It is also a permanent tax.
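To make the tax concrete, here is a minimal sketch of the as-of join the application ends up owning once ticks and positions live in different stores. The function name and data shapes are hypothetical, and the two inputs are assumed to have been fetched separately, so nothing guarantees they reflect the same moment in time.

```python
from bisect import bisect_right


def asof_join(positions, ticks_by_symbol):
    """Attach to each (symbol, ts) position event the latest tick at or before ts.

    ticks_by_symbol maps symbol -> list of (ts, price) tuples sorted by ts.
    """
    joined = []
    for symbol, ts in positions:
        ticks = ticks_by_symbol.get(symbol, [])
        # (ts, inf) sorts after every tick stamped ts, so idx is one past the last match.
        idx = bisect_right(ticks, (ts, float("inf")))
        price = ticks[idx - 1][1] if idx else None
        joined.append((symbol, ts, price))
    return joined


ticks = {"BTCUSDT": [(10, 100.0), (20, 101.0), (30, 102.0)]}
events = [("BTCUSDT", 25), ("BTCUSDT", 5), ("ETHUSDT", 25), ("BTCUSDT", 30)]
result = asof_join(events, ticks)
```

Note what the code has to decide for itself: a position on a symbol with no ticks silently yields `None`, where a foreign key inside a single engine would have rejected the row at write time.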

Key point: The decision is rarely about the time-series performance characteristics. It is about whether the rest of the data — instruments, positions, trades, configurations — sits in the same engine or in a different one.

Operational characteristics

TimescaleDB inherits PostgreSQL's operational story, for better and for worse.

The good: a deep tooling ecosystem (psql, pg_dump, logical and physical replication, pg_stat_*, query plan visualisation, EXPLAIN ANALYZE), broad operator familiarity, well-understood failure modes, a mature monitoring story. Backup and point-in-time recovery are first-class. Connection security, role-based access control, and audit logging come from the underlying engine, not bolted on.

The bad: PostgreSQL operational realities apply unchanged. Autovacuum needs tuning at scale. WAL management matters. Replica lag has to be monitored. Connection pooling is mandatory — Postgres allocates one process per connection, which becomes a bottleneck at any non-trivial concurrency. PgBouncer in transaction mode is the standard answer, but it imposes constraints (no session-level features) that surprise teams that haven't run Postgres at scale before.

InfluxDB has a more compact operational surface. Fewer knobs, less to tune, less to misconfigure. For a small team running a handful of metric workloads, this is genuinely simpler. For a team that already runs PostgreSQL elsewhere — for the application database, for analytics, for any other persistent state — it adds a new operational vocabulary without removing the existing one.

For a quantitative system, the operational footprint is rarely the binding constraint. The binding constraint is data correctness across joined tables, which Postgres is built for.

Where InfluxDB still wins

An honest enumeration of the cases where the choice should go the other way:

Metrics and observability. This is the home turf. Telegraf integration, the InfluxDB-to-Grafana path, and pre-built collectors are unmatched in this space. If the workload is "monitor a fleet of services and dashboard their metrics", reach for Influx.
Edge and IoT collection. Lightweight footprint and HTTP-first ingestion make Influx attractive on resource-constrained devices. The relational machinery of Postgres is overkill here.
Pure time-series workloads with no relational structure. If the only question ever asked is "give me the average value of metric X over time window Y, grouped by tag Z", InfluxDB is purpose-built for it. There is no relational web to be lost.
InfluxDB 3.0's object-store architecture for cold data. For very cold data on S3-compatible storage, the new engine is genuinely interesting. TimescaleDB does not match this natively. For workloads that can keep ten years of microsecond-resolution data in object storage and rarely query the deep tail, this is a real advantage.

For financial trading systems, none of these usually apply. The use case is rarely pure time-series, almost always relational, and the data is operationally close enough that object-store cold storage is not the binding constraint.

Pitfalls in practice

Hypertable chunk sizing. The default chunk interval (seven days) is rarely right for financial data. Tick data wants much smaller chunks (hours to a day); daily OHLCV wants larger ones (months). Wrong sizing hurts both insert performance and query planner heuristics.
Index proliferation. Every index adds write amplification and storage, and on a multi-billion-row table that cost is large. Be deliberate about which columns get indexes, and use partial indexes where the predicate is known.
Compression timing. Compressing a chunk before it stops receiving updates causes decompression cycles. Tie compression policy to the actual edit window of the data.
Continuous aggregate lag. Refresh policy granularity is a tradeoff between freshness and load. A one-minute refresh on a five-second source can starve the database; a one-hour refresh leaves dashboards stale.
Connection pooling. PgBouncer in transaction mode is mandatory at any non-trivial concurrency. This is a common surprise for teams arriving from a connection-light engine.
The autovacuum cliff. Postgres autovacuum tuning is one of the most common production failure modes. Default settings work fine until they don't, and the symptoms (bloat, slow queries, transaction-ID wraparound warnings) appear at the worst possible moment.
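For the chunk sizing pitfall, a back-of-envelope estimate is often enough to get into the right range. The sketch below assumes the commonly cited rule of thumb that one chunk, including its indexes, should fit in roughly a quarter of main memory. The function name, the 25% default, and the per-row byte figure are assumptions to be replaced with measured values.

```python
from datetime import timedelta


def suggest_chunk_interval(rows_per_second, bytes_per_row, memory_bytes, memory_fraction=0.25):
    """Estimate the chunk interval whose data and indexes fit the memory budget.

    bytes_per_row should include index overhead; measure it rather than guess.
    """
    budget_bytes = memory_bytes * memory_fraction
    seconds = budget_bytes / (rows_per_second * bytes_per_row)
    return timedelta(seconds=seconds)


GIB = 1024 ** 3

# Tick data: 50,000 rows/s at an assumed 100 bytes per row on a 64 GiB host.
tick_interval = suggest_chunk_interval(50_000, 100, 64 * GIB)

# Daily OHLCV: 400 symbols, one row per symbol per day, same host.
daily_interval = suggest_chunk_interval(400 / 86_400, 100, 64 * GIB)
```

Under these assumed numbers the tick table wants chunks of about an hour, while the daily table is nowhere near memory-bound and can use chunks of months or more, consistent with the guidance above.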

Warning

A choice made on synthetic benchmarks rarely survives contact with a real production schema. The honest evaluation is to run the actual workload — including its joins, its retention policies, and its peak insert rates — against both engines for a week.

When to use what

Workload shape	TimescaleDB	InfluxDB (≤ 2.x)	InfluxDB (3.x)
Pure metrics / observability	overkill	✓	✓
Edge / IoT collection	overkill	✓	✓
Tick data with relational metadata	✓	—	partial
Backtesting across instruments	✓	—	partial
Trading state (orders, positions, trades)	✓	—	—
Cold-data object-store archival	partial	—	✓
Mixed application + time-series workload	✓	—	partial

The pattern is clear. The further the workload sits from "pure time-series", the stronger the case for the relational option.

Summary

Property	TimescaleDB	InfluxDB (≤ 2.x)	InfluxDB (3.x)
Storage model	Postgres heap + columnar compression	TSM custom format	Apache Parquet on object store
Query language	PostgreSQL SQL	InfluxQL / Flux	SQL via DataFusion
Joins	First-class	Limited or absent	First-class
Cardinality handling	Unbounded	Constrained by series-key model	Improved
Continuous aggregates	Native, queryable as tables	Tasks / continuous queries	Improved
Foreign keys, transactions	Yes	No	No
Ecosystem	Postgres ecosystem	Influx ecosystem	Apache Arrow ecosystem
Operational model	Postgres-class	Compact, custom	Newer, evolving
Best fit	Relational time-series with joins	Pure metrics, observability	Cold object-store time-series

Choose the database whose data model matches the questions the workload will ask. For a quantitative system, those questions are relational, and the database that answers them best is the one whose foundations are relational.

References

Timescale documentation. Hypertables and chunks. docs.timescale.com.
Timescale documentation. Continuous aggregates. docs.timescale.com.
Timescale Engineering. Time-series compression algorithms. timescale.com/blog.
InfluxData. InfluxDB 3.0 architecture overview. influxdata.com.
Stonebraker, M., & Çetintemel, U. (2005). "One Size Fits All": An Idea Whose Time Has Come and Gone. ICDE 2005.
Pavlo, A., & Aslett, M. (2016). What's Really New with NewSQL? SIGMOD Record, 45(2), 45–55.
Pelkonen, T., et al. (2015). Gorilla: A Fast, Scalable, In-Memory Time Series Database. VLDB 2015. (For the compression algorithm both engines borrow from.)