Very large database

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Djm-leighpark (talk | contribs) at 09:24, 13 October 2018 (→‎References: Ref prep in progress but sudden interruption .. TBC). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A very large database, (originally written very large data base) or VLDB,[1] is a database that contains a very large amount of data, so much that it can require specialized architectural, management, processing and maintenance methodologies .[2][3][4][5]

Definition

The vague adjectives of very and large allow for a broad and subjective interpretation, but attempts at defining a metric and threshold have been made. Early metrics were the size of the database in a canonical form via database normalization or the time for a full database operation like a backup. Technology improvements has continually changed what is considered very large.[6][7]

One definition has suggested that a database has become a VLDB when it is "too large to be maintained within the window of opportunity… the time when the database is quiet".[8]

Sizes of a VLDB Database

There is no absolute amount of data that can be cited. For example, one cannot say that any database with more than 1 TB of data is considered a VLDB. This absolute amount of data has varied over time as computer processing, storage and backup methods have become better able to handle larger amounts of data.[citation needed] That said, VLDB issues may start to appear when 1TB is approached,[8][9] and are more than likely to have appeared as 30TB or so is exceeded.[10]

VLDB Challenges

Key areas where a VLDB may present challenges include configuration, storage, performance, maintenance, administration, availability and server resources.[11]: 11 

Configuration

Careful configuration of databases that lie in the VLDB is necessary to alleviate or reduce issues raise by VLDB databases.[11]: 36–53 [12]

Administration

The complexities of managing a VLDB can increase exponentially for the database administrator as database size increases.[13]

Availability and maintenance

When dealing with VLDB operations relating to maintenance and recovery such as database reorganizations and file copies which were quite practical on a non-VLDB become take very significant amounts of time and resource for a VLDB database.[14]. In particular it typically infeasible to meet a typical recovery time objective (RTO), the maximum expected time a database is expected to be unavailable due to interruption, by methods which involve copying files from disk or other storage archives.[13] To overcome these issues techniques such as clustering, cloned/replicated/standby databases, file-snapshots, storage snapshots or a backup manager may help achieve the RTO and availability, although individual methods may have limitations, caveats, license, and infrastructure requirements while some may risk data loss and not meet the recovery point objective (RPO).[citation needed][13][15][16] For many systems only geographically remote solutions may be acceptable.[17]

Backup and Recovery

Best practice is for backup and recovery to be architectured in terms of the overall availability and business continuity solution.[citation needed][18]

Performance

Given the same infrastructure there may typically be a decrease in performance, that is increase in response time as database size increases. Some accesses will simply have more data to process (scan) which will take proportionally longer; while the indexes used to access data may grow slightly in height requiring perhaps an extra storage access to reach the data.[citation needed][19] Other effects can be caching becoming less efficient and while some indexes such the B+ automatically sustain well with growth others may need to be rebuilt.[citation needed]

Should an increase in database size cause the number of accessors of the database to increase then more server and network resources may be consumed, and the risk of contention will increase.[citation needed] Some solutions to regaining performance include partitioning, clustering, possibly with sharding, or use of a database machine.[citation needed][20]

Partitioning

Partitioning may be able to performance of bulk operations on a VLDB including backup and recovery.[21], bulk movements due to information lifecycle management (ILM)[citation needed], as well as allowing optimization of some query processing[citation needed].

Storage

In order to satisfy performance of a VLDB the database storage need to have low access latency and contention, high bandwidth, and high availability.[citation needed]

Server resources

The increasing size of a VLDB may but pressure on server and network resources and a bottleneck may appear that may require infrastructure investment to resolve.[citation needed]

Relationship to big data

VLDB is not the same as big data, however the storage aspect of big data may involve a VLDB database.[2] That said some of the storage solutions supporting big data were designed from the start to support large volumes of data, so database administrators do not encounter VLDB issues that older versions of traditional RDBMS's might encounter.[citation needed]

References

  1. ^ "Oracle Database Online Documentation 11g Release 1 (11.1) / Database Administration Database Concepts". oracle. 18 Very Large Databases (VLDB). Retrieved 3 October 2018.
  2. ^ a b "Very Large Database (VLDB)". Technopedia. Archived from the original on 4 July 2018. Retrieved 3 October 2018. {{cite web}}: Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
  3. ^ Gaines, R. S. and R. Gammill. Very Large Data Bases: An Emerging Research Area, Informal working paper, RAND Corporation
  4. ^ Data Processing Magazine. North American Publishing Company. 1964. p. 18,58.
  5. ^ Widlake, Marin (18 September 2009). "What is a VLDB?". mwidlake. Archived from the original on 6 October 2018. Retrieved 7 October 2018. {{cite web}}: Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
  6. ^ Sidley, Edgar H. (1 April 1980). Encyclopedia of Computer Science and Technology: Volume 14 - Very Large Data Base Systems to Zero-Memory and Markov Information Source. CRC Press. pp. 1–18. ISBN 9780824722142.
  7. ^ Gerritsen, Rob; Morgan, Howard; Zisman, Michael (June 1977). "On some metrics for databases or what is a very large database?". ACM SIGMOD Record. 9 (1): 50–74. doi:10.1145/984382.984393. ISSN 0163-5808.
  8. ^ a b Rankins, Ray; Jensen, Paul; Bertucci, Paul (18 December 2002). "21". Microsoft SQL Server 2000 (2nd ed.). SAMS. ISBN 978-0672324673. Administering Very Large SQL Server Databases.
  9. ^ "Oracle Database Release 18 - VLDB and Partitioning Guide". Oracle. 1 Introduction to Very Large Databases. Archived from the original on 3 October 2018. Retrieved 3 October 2018. {{cite web}}: Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
  10. ^ "The Very Large Database Problem - How to Backup & Recover 30–100 TB Databases" (PDF). actifio. Archived from the original (PDF) on 19 February 2018. {{cite web}}: Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
  11. ^ a b Hussain, Syed Jaffer (2014). "Tuning & Applying Best Practices On Very Large Databases (VLDB)" (PDF). Sangam: AIOUG (PDF). {{cite web}}: Check |archive-url= value (help); Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
  12. ^ Chaves, Warner (7 January 2015). "Top 10 Must-Do Items for your SQL Server Very Large Database". SQLTURBO. Archived from the original on 13 December 2015. Retrieved 5 October 2018. {{cite web}}: |archive-date= / |archive-url= timestamp mismatch; 13 December 2017 suggested (help); Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
  13. ^ a b c Furman, Dimitri (22 January 2018). "SQL Server VLDB in Azure: DBA Tasks Made Simple". MSDN. Archived from the original on 6 October 2018. Retrieved 6 October 2018. {{cite web}}: Unknown parameter |dead-url= ignored (|url-status= suggested) (help); Unknown parameter |editors= ignored (|editor= suggested) (help)
  14. ^ "Specialized Requirements for Relational Data Warehouse Servers". Red Brick Systems, Inc. 21 June 1996. Archived from the original on 10 October 1997. {{cite web}}: Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
  15. ^ Chien, Tim. "Snapshots Are NOT Backups". Oracle technetwork. Archived from the original on 7 September 2018. Retrieved 10 October 2018. {{cite web}}: Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
  16. ^ "Using a split mirror as a backup image". IBM Knowledge Center. Archived from the original on 9 January 2018. Retrieved 10 October 2018. {{cite web}}: Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
  17. ^ "Chapter 1 High Availability and Scalability". dev.mysql. Archived from the original on 15 December 2018. Retrieved 12 October 2018. {{cite web}}: |archive-date= / |archive-url= timestamp mismatch; 15 December 2016 suggested (help); Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
  18. ^ Akhtar, Ali Navid; Buchholtz, Jeff; Ryan, Michael; Setty, Kumar (2012). "Database Backup and Recovery Best Practices". Archived from the original on 29 June 2018. Retrieved 12 October 2012. {{cite web}}: Cite has empty unknown parameter: |1= (help); Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
  19. ^ Tariq, Ovais (14 July 2011). "Understanding B+tree Indexes and how they Impact Performance". ovaistariq.net. Archived from the original on 7 February 2018. Retrieved 10 October 2018. {{cite web}}: Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
  20. ^ "Encyclopedia". Definition of: database machine. Archived from the original on 4 July 2016. Retrieved 10 October 2018. {{cite web}}: Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
  21. ^ Burleson, Donald (26 March 2015). "Oracle Backup VLDB tips". Burleson Consulting. Archived from the original on 30 June 2017. Retrieved 11 October 2016. {{cite web}}: Unknown parameter |dead-url= ignored (|url-status= suggested) (help)