Time series database

From Wikipedia, the free encyclopedia

A time series database (TSDB) is a software system optimized for handling time series data: arrays of numbers indexed by time (a datetime or a datetime range). In some fields these time series are called profiles, curves, or traces.[1]

Ideally, repositories of time series are implemented natively using specialized database algorithms.[2] However, it is also possible to store time series as binary large objects (BLOBs) in a relational database, or to use a very large database (VLDB) approach coupled with a pure star schema.[citation needed] Efficiency is often improved if time is treated as a discrete quantity rather than as a continuous mathematical dimension.[2]
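As an illustration of treating time as a discrete quantity, the following minimal Python sketch maps continuous timestamps onto fixed-width slots and averages the samples that fall into each slot. The function names `to_slot` and `bucket`, the 60-second step, and the sample data are assumptions made for this example and are not taken from any particular TSDB.

```python
def to_slot(epoch_seconds, step=60):
    """Map a continuous timestamp to the integer index of its time slot."""
    return int(epoch_seconds // step)


def bucket(samples, step=60):
    """Average all (timestamp, value) samples that fall into the same slot."""
    sums = {}
    for ts, value in samples:
        slot = to_slot(ts, step)
        total, count = sums.get(slot, (0.0, 0))
        sums[slot] = (total + value, count + 1)
    return {slot: total / count for slot, (total, count) in sums.items()}


# Hypothetical samples: (seconds since some epoch, measured value).
samples = [(0.5, 1.0), (59.9, 3.0), (60.0, 10.0), (125.2, 4.0)]
per_minute = bucket(samples)
```

Once timestamps are integers on a regular grid, a series can be stored as a plain array of values, which is one reason a discrete representation may be cheaper to store and index.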

Overview

A time series database allows users to create, enumerate, update, and destroy time series, and to organize them. The server often supports a number of basic calculations that work on a series as a whole, such as multiplying, adding, or otherwise combining various time series into a new time series.[citation needed] It can also filter on arbitrary patterns such as time ranges, low-value filters, and high-value filters, or even use the values of one series to filter another.[citation needed] Some TSDBs also build in additional statistical functions targeted at time series data.[citation needed]
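The value filters described above can be sketched in Python by modelling a series as a dictionary from timestamp to value. This representation, the function names `filter_values` and `filter_by`, and the sample data are assumptions made for illustration only.

```python
def filter_values(series, low=None, high=None):
    """Keep only points whose value lies within the optional [low, high] bounds."""
    return {
        t: v
        for t, v in series.items()
        if (low is None or v >= low) and (high is None or v <= high)
    }


def filter_by(series, other, predicate):
    """Keep points of `series` at timestamps where `other` satisfies `predicate`."""
    return {t: v for t, v in series.items() if t in other and predicate(other[t])}


# Hypothetical series keyed by an integer hour index.
temperature = {1: 18.5, 2: 22.0, 3: 30.5, 4: 25.0}
humidity = {1: 40, 2: 65, 3: 70, 4: 30}

mild = filter_values(temperature, low=20.0, high=26.0)
humid_temps = filter_by(temperature, humidity, lambda h: h >= 60)
```

Here `mild` combines a low-value and a high-value filter, while `humid_temps` uses the values of one series (humidity) to filter another (temperature).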

For example, for the following expression:

select gold_price * gold_volume

the TSDB would join the two series 'gold_price' and 'gold_volume' based on the overlapping areas of time for each, multiply the values where they intersect, and then output a single composite time series.
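The join-and-multiply behaviour can be sketched in a few lines of Python. The dictionary representation, the function name `multiply_series`, and the sample prices and volumes are assumptions for this example; a real TSDB would typically also handle time ranges and interpolation.

```python
def multiply_series(a, b):
    """Multiply two series point by point, keeping only shared timestamps."""
    common = sorted(a.keys() & b.keys())
    return {t: a[t] * b[t] for t in common}


# Hypothetical series keyed by an integer day index; they overlap on days 2 and 3.
gold_price = {1: 1200.0, 2: 1210.0, 3: 1205.0}
gold_volume = {2: 10.0, 3: 20.0, 4: 5.0}

traded_value = multiply_series(gold_price, gold_volume)
```

Days 1 and 4 appear in only one of the inputs, so they are absent from the composite series.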

TSDBs often allow users to manage a repository of filters or masks, each of which specifies a pattern of some kind. Such filters make it easy to assemble time series data. Assuming a filter named 'onpeak' exists, one might hypothetically write

select onpeak( cellphoneusage )

which would extract the portion of the cellphoneusage time series that intersects the 'onpeak' filter.
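A minimal Python sketch of such a mask follows. The definition of on-peak used here (weekdays, from 08:00 up to but not including 20:00) is an assumption chosen for illustration, as are the sample data; actual on-peak periods depend on the operator.

```python
from datetime import datetime


def onpeak(series, start_hour=8, end_hour=20):
    """Keep only weekday samples whose hour lies in [start_hour, end_hour)."""
    return {
        t: v
        for t, v in series.items()
        if t.weekday() < 5 and start_hour <= t.hour < end_hour
    }


# Hypothetical hourly usage; 2018-10-01 is a Monday, 2018-10-06 a Saturday.
cellphoneusage = {
    datetime(2018, 10, 1, 7): 3,    # weekday, before the on-peak window
    datetime(2018, 10, 1, 9): 12,   # weekday, inside the window
    datetime(2018, 10, 1, 21): 5,   # weekday, after the window
    datetime(2018, 10, 6, 10): 8,   # weekend, excluded by the mask
}

peak_usage = onpeak(cellphoneusage)
```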

This syntactic simplicity drives the appeal of the TSDB. For example, the charges on a simple utility bill might be computed using queries such as:

select max( onpeak( powerusagekw ) ) * demand_charge;

select sum( onpeak( powerusagekwh ) ) * energy_charge;
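The two billing queries can be mirrored in Python as follows. This is a sketch under stated assumptions: the on-peak window (08:00 to 20:00), the rates, and the hourly sample data are invented for the example, and the energy series is taken to equal the power series because each sample covers one hour.

```python
from datetime import datetime


def onpeak(series, start_hour=8, end_hour=20):
    """Keep only samples whose hour lies in [start_hour, end_hour)."""
    return {t: v for t, v in series.items() if start_hour <= t.hour < end_hour}


# Hypothetical rates: currency units per kW of peak demand and per kWh of energy.
demand_charge = 10.0
energy_charge = 0.25

# Hypothetical hourly readings for a single day.
powerusagekw = {
    datetime(2018, 10, 1, 7): 2.0,
    datetime(2018, 10, 1, 9): 5.0,
    datetime(2018, 10, 1, 14): 4.0,
    datetime(2018, 10, 1, 22): 6.0,
}
# With one sample per hour, 1 kW sustained for 1 h corresponds to 1 kWh.
powerusagekwh = dict(powerusagekw)

# select max( onpeak( powerusagekw ) ) * demand_charge;
demand_cost = max(onpeak(powerusagekw).values()) * demand_charge

# select sum( onpeak( powerusagekwh ) ) * energy_charge;
energy_cost = sum(onpeak(powerusagekwh).values()) * energy_charge
```

The 22:00 reading is the largest value in the series, but it falls outside the on-peak window and therefore does not affect either charge.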

Supporting time series data in a relational database

A workable implementation of a time series database can be deployed in a conventional SQL-based relational database, provided that the database software supports both binary large objects (BLOBs) and user-defined functions. SQL statements that operate on one or more time series quantities in the same row of a table or join are easy to write, because the user-defined time series functions work naturally inside a SELECT statement. However, time series functionality such as a SUM function operating in the context of a GROUP BY clause cannot be easily achieved.[citation needed]
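The BLOB-plus-user-defined-function approach can be sketched with Python's built-in `sqlite3` module. The table layout, the function names `ts_mul` and `ts_sum`, and the encoding of a series as a packed array of doubles are assumptions made for this example. For brevity the sketch stores values only, so it assumes the two series in a row are already aligned sample by sample.

```python
import sqlite3
from array import array


def pack(values):
    """Encode a sequence of floats as a BLOB of 8-byte doubles."""
    return array("d", values).tobytes()


def unpack(blob):
    """Decode a BLOB produced by pack() back into an array of floats."""
    values = array("d")
    values.frombytes(blob)
    return values


def ts_mul(blob_a, blob_b):
    """User-defined function: element-wise product of two aligned series."""
    return pack(x * y for x, y in zip(unpack(blob_a), unpack(blob_b)))


def ts_sum(blob):
    """User-defined function: sum of all values in a series."""
    return sum(unpack(blob))


conn = sqlite3.connect(":memory:")
conn.create_function("ts_mul", 2, ts_mul)
conn.create_function("ts_sum", 1, ts_sum)

conn.execute("CREATE TABLE gold (day TEXT, price BLOB, volume BLOB)")
conn.execute(
    "INSERT INTO gold VALUES (?, ?, ?)",
    ("2018-10-01", pack([1200.0, 1210.0]), pack([10.0, 20.0])),
)

# Both series sit on the same row, so the functions compose inside a SELECT.
total = conn.execute("SELECT ts_sum(ts_mul(price, volume)) FROM gold").fetchone()[0]
conn.close()
```

Each function call receives whole series from a single row, which matches the row-wise usage described above; combining series across many rows would need additional machinery.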

List of time series databases

The following database systems have functionality optimized for handling time series data.

Name | License | Language | References
SamayDB | Proprietary | C / C++ | [3]
Atlas | Apache License 2.0[4] | Java | [5]
Cube | Apache License 2.0[6] | JavaScript | [5]
DalmatinerDB | MIT[7] | Erlang | [5]
Druid | Apache License 2.0 | Java | [5]
eXtremeDB | Commercial | SQL, Python, C / C++, Java, and C# | [5]
InfluxDB | MIT[8]; Chronograf AGPLv3; Clustering Commercial[9] | Go | [5][10]
Informix TimeSeries | Commercial | C / C++ | [5][11]
IRONdb | Commercial | C / C++ | [5][12]
KairosDB | Apache License 2.0[13] | Java | [5]
Kx kdb+ | Commercial | Q | [5]
OpenTSDB | GPLv3+[14] | Java | [5]
Prometheus | Apache License 2.0 | Go | [5]
Riak-TS | Apache License 2.0 | Erlang | [5]
RRDtool | GPLv2 | C | [5]
TimescaleDB | Apache License 2.0[15] | C | [5][10][16][17]
Whisper (Graphite) | Apache License 2.0 | Python | [18]

References

  1. Villar-Rodriguez, Esther; Del Ser, Javier; Oregi, Izaskun; Bilbao, Miren Nekane; Gil-Lopez, Sergio (2017). "Detection of non-technical losses in smart meter data based on load curve profiling and time series analysis". Energy. 137: 118–128. doi:10.1016/j.energy.2017.07.008.
  2. Pelkonen, Tuomas; Franklin, Scott; Teller, Justin; Cavallaro, Paul; Huang, Qi; Meza, Justin; Veeraraghavan, Kaushik (2015). "Gorilla". Proceedings of the VLDB Endowment. 8 (12): 1816–1827. doi:10.14778/2824032.2824078.
  3. "Bloomberg SamayDB".
  4. "atlas license". GitHub. Retrieved 2018-10-03.
  5. Stephens, Rachel (2018-04-03). "State of the Time Series Database Market". Retrieved 2018-10-03.
  6. "cube license". GitHub. Retrieved 2018-10-03.
  7. "dalmatinerdb license". GitHub. Retrieved 2018-10-03.
  8. "influxdb license". GitHub. Retrieved 2016-08-14.
  9. "influxdb clustering". influxdata.com. Retrieved 2016-03-10.
  10. Anadiotis, George (2018-09-28). "Processing time series data: What are the options?". zdnet.com. Retrieved 2016-03-10.
  11. Dantale, Viabhav (2012-09-21). Solving Business Problems with Informix TimeSeries (PDF). IBM Redbooks. ISBN 9780738437231.
  12. Schlossnagle, Theo (2018-01-08). "Monitoring in a DevOps World". Retrieved 2018-10-03.
  13. "kairosdb license". GitHub. Retrieved 2018-10-03.
  14. "opentsdb license". GitHub. Retrieved 2018-10-03.
  15. "timescaledb license". GitHub. Retrieved 2018-10-03.
  16. Slabber, Martin; Joubert, Francois; Ockards, Muhammed Toufeeq (2018). "Scalable Time Series Documents Store". Proceedings of the 16th Int. Conf. on Accelerator and Large Experimental Control Systems. ICALEPCS2017. doi:10.18429/JACoW-ICALEPCS2017-TUBPA06.
  17. Skoviera, Martin (2017-09-18). "Cyclops 3.0 release with rule engine". Retrieved 2018-10-11.
  18. Joshi, Nishes (2012-05-23). Interoperability in monitoring and reporting systems (Thesis). hdl:10852/9085.