Time series database
A time series database (TSDB) is a software system that is optimized for handling time series data, arrays of numbers indexed by time (a datetime or a datetime range). In some fields these time series are called profiles, curves, or traces. A time series of stock prices might be called a price curve. A time series of energy consumption might be called a load profile. A log of temperature values over time might be called a temperature trace.
Despite the disparate names, many of the same mathematical operations, queries, or database transactions are useful for analysing all of them. The implementation of a database that can correctly, reliably, and efficiently implement these operations must be specialized for time-series data.
TSDBs are databases that are optimized for time series data. Software with complex logic or business rules and high transaction volume for time series data may not be practical with traditional relational database management systems. Flat file databases are not a viable option either, if the data and transaction volume reaches a maximum threshold determined by the capacity of individual servers (processing power and storage capacity). Queries for historical data, replete with time ranges and roll ups and arbitrary time zone conversions are difficult in a relational database. Compositions of those rules are even more difficult. This is a problem compounded by the free nature of relational systems themselves. Many relational systems are often not modelled correctly with respect to time series data. TSDBs on the other hand impose a model and this allows them to provide more features for doing so.
Ideally, these repositories are often natively implemented using specialized database algorithms. However, it is possible to store time series as binary large objects (BLOBs) in a relational database or by using a VLDB approach coupled with a pure star schema. Efficiency is often improved if time is treated as a discrete quantity rather than as a continuous mathematical dimension. Database joins across multiple time series data sets is only practical when the time tag associated with each data entry spans the same set of discrete times for all data sets across which the join is performed.
The TSDB allows users to create, enumerate, update and destroy various time series and organize them in some fashion. These series may be organized hierarchically and optionally have companion metadata available with them. The server often supports a number of basic calculations that work on a series as a whole, such as multiplying, adding, or otherwise combining various time series into a new time series. They can also filter on arbitrary patterns defined by the day of the week, low value filters, high value filters, or even have the values of one series filter another. Some TSDBs also build in additional statistical functions that are targeted to time series data.
For example, consider the following hypothetical "time series" or "profile" expression:
SELECT nymex/gold_price * nymex/gold_volume
To analyze this, the TSDB would join the two series nymex/gold_price and nymex/gold_volume based on the overlapping areas of time for each, multiply the values where they intersect, and then output a single composite time series.
More complex expressions are allowed. TSDBs often allow users to manage a repository of filters or masks that specify in some way a pattern based on the day of a week and a set of holidays. In this way, one can readily assemble time series data. Assuming such a filter exists, one might hypothetically write
SELECT onpeak( cellphoneusage )
which would extract out the time series of cellphoneusage that only intersects that of 'onpeak'. Some systems might generalize the filter to be a time series itself.
This syntactical simplicity drives the appeal of the TSDB. For example, a simple utility bill might be implemented using a query such as:
SELECT MAX( onpeak( powerusagekw ) ) * demand_charge; SELECT SUM( onpeak( powerusagekwh ) ) * energy_charge;
TSDBs also generally have conversions to and from specific time zones implemented at the server level.
Supported Time Series Transformations
|union||Merge multiple time-series into a single multivariate time-series. Retain timestamps with incomplete values.|
|intersect||Merge multiple time-series into a single multivariate time-series. Discard timestamps with incomplete values.|
|left_join||Merge two time-series into a single time-series containing two variables. Merged series has the same timestamps as the first (left) time-series.|
|right_join||Merge two time-series into a single time-series containing two variables. Merged series has the same timestamps as the second (right) time-series.|
|except||Remove one time series from another. Remove timestamps from first series that exist in second series.|
|regularize||Modify timestamps and values so that frequency (interval between consecutive samples) is constant.|
|filter||Retain timestamps that match specified condition, such as calendar, day-of-weeks/months, value filter.|
|interpolate||Add missing timestamps and values based on interpolation function.|
|lag||Modify timestamps by shifting them k steps right (k > 0) or left (k < 0) and drop k values without timestamps.|
|smooth||Replace values with statistical functions applied to a sliding window (count or duration based).|
|difference||Replace each value with difference/ratio between current value and value some steps back/forward.|
|aggregate||Convert a time series to a specified periodicity lower than the given series has.|
|round||Truncate (round) time to the nearest second, minute, hour.|
|range||Select sub-series based on start and end-time.|
|math||Apply mathematical function to each value, e.g. log(v), or square root(v)|
|endpoints||Extract index values of a time series corresponding to the last observations in each period.|
|split||Split a given time series along time periods and create a list of shorter time series.|
Supporting Time Series Data in a Relational Database
A workable implementation of a time series database can be deployed in a conventional SQL-based relational database provided that the database software supports both binary large objects (BLOBs) and user-defined functions. SQL statements that operate on one or more time series quantities on the same row of a table or join can easily be written, as the user-defined time series functions operate comfortably inside of a SELECT statement. However, time series functionality such as a SUM function operating in the context of a GROUP BY clause cannot be easily achieved.
Example TSDB Systems
The following list of open source and commercial systems are believed to provide specialised support for time series data. Some NoSQL systems may also claim support for this type of data, but if they are not explicitly designed with that use case in mind then they will not be listed here.
- KairosDB - built on top of Cassandra - fork of OpenTSDB
- Newts - built on Cassandra
- OpenTSDB - built on top of HBase
- TSDB (Time Series Database)
- eXtremeDB Financial Edition
- Axibase Time Series Database (ATSD) - uses HBase for raw storage
- Oracle Database
- SAP HANA
- TempoIQ (formerly TempoDB)
- Treasure Data
- Informix TimeSeries
- Cityzen Data
- Canary Labs