Electronic discovery (also e-discovery or ediscovery) refers to discovery in legal proceedings such as litigation, government investigations, or Freedom of Information Act requests, where the information sought is in electronic format (often referred to as electronically stored information or ESI). Electronic discovery is subject to rules of civil procedure and agreed-upon processes, often involving review for privilege and relevance before data are turned over to the requesting party.
Electronic information is considered different from paper information because of its intangible form, volume, transience and persistence. Electronic information is usually accompanied by metadata that is not found in paper documents and that can play an important part as evidence (e.g. the date and time a document was written could be useful in a copyright case). The preservation of metadata from electronic documents creates special challenges to prevent spoliation. In the United States, at the federal level, electronic discovery is governed by common law, case law, specific statutes, but primarily by the Federal Rules of Civil Procedure (FRCP), including amendments effective December 1, 2006, and December 1, 2015. In addition, state law and regulatory agencies increasingly also address issues relating to electronic discovery. Other jurisdictions around the world also have rules relating to electronic discovery, including Part 31 of the Civil Procedure Rules in England and Wales.
Stages of process
The Electronic Discovery Reference Model (EDRM) is a widely used diagram that presents a conceptual view of the stages involved in the e-discovery process.
The identification phase is when potentially responsive documents are identified for further analysis and review. In the United States, in Zubulake v. UBS Warburg, Hon. Shira Scheindlin ruled that failure to issue a written legal hold notice whenever litigation is reasonably anticipated will be deemed grossly negligent. This holding brought additional focus to the concepts of legal holds, eDiscovery, and electronic preservation. Custodians who are in possession of potentially relevant information or documents are identified. To ensure a complete identification of data sources, data mapping techniques are often employed. Since the scope of data can be overwhelming or uncertain in this phase, attempts are made to reasonably reduce the overall scope, for example by limiting the identification of documents to a certain date range or set of custodians.
A duty to preserve begins upon the reasonable anticipation of litigation. During preservation, data identified as potentially relevant is placed in a legal hold. This ensures that data cannot be destroyed. Care is taken to ensure this process is defensible, while the end-goal is to reduce the possibility of data spoliation or destruction. Failure to preserve can lead to sanctions. Even if a court finds that the failure to preserve was only negligent, it can order the party that failed to preserve to pay fines if the lost data puts the defense "at an undue disadvantage in establishing their defense."
Once documents have been preserved, collection can begin. Collection is the transfer of data from a company to its legal counsel, who will determine the relevance and disposition of the data. Some companies that deal with frequent litigation have software in place to quickly place legal holds on certain custodians when an event (such as legal notice) is triggered and begin the collection process immediately. Other companies may need to call in a digital forensics expert to prevent the spoliation of data. The size and scale of this collection is determined by the identification phase.
During the processing phase, native files are prepared to be loaded into a document review platform. Often, this phase also involves the extraction of text and metadata from the native files. Various data culling techniques are employed during this phase, such as deduplication and de-NISTing. Sometimes native files will be converted to a petrified, paper-like format (such as PDF or TIFF) at this stage, to allow for easier redaction and Bates-labeling.
Modern processing tools can also employ advanced analytic tools to help document review attorneys more accurately identify potentially relevant documents.
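The deduplication and de-NISTing steps mentioned above can be illustrated with a minimal sketch. It assumes that both are done by comparing content hashes: exact duplicates share a hash, and de-NISTing drops files whose hash appears on a list of known system files (in practice such lists are commonly drawn from NIST's National Software Reference Library). The function names, the choice of SHA-256, and the sample data are hypothetical, not taken from any particular processing tool.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's content."""
    return hashlib.sha256(data).hexdigest()


def cull(files: dict[str, bytes], known_system_hashes: set[str]) -> dict[str, bytes]:
    """Drop known system files (de-NISTing) and exact duplicates (deduplication)."""
    seen: set[str] = set()
    kept: dict[str, bytes] = {}
    for name, content in files.items():
        digest = sha256_of(content)
        if digest in known_system_hashes:
            continue  # de-NIST: content matches the known-file list
        if digest in seen:
            continue  # deduplicate: identical content was already kept
        seen.add(digest)
        kept[name] = content
    return kept


# Hypothetical collection: one duplicate memo and one "system" file.
files = {
    "memo.docx": b"Q3 forecast memo",
    "memo_copy.docx": b"Q3 forecast memo",
    "notepad.exe": b"fake system binary",
    "email1.msg": b"Please review the contract",
}
known_system_hashes = {sha256_of(b"fake system binary")}

kept = cull(files, known_system_hashes)
print(sorted(kept))
```

Hash comparison only catches byte-identical copies; near-duplicates (for example, the same email with a different footer) would need other techniques.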
During the review phase, documents are reviewed for responsiveness to discovery requests and for privilege. Different document review platforms can assist in many tasks related to this process, including the rapid identification of potentially relevant documents, and the culling of documents according to various criteria (such as keyword, date range, etc.). Most review tools also make it easy for large groups of document review attorneys to work on cases, featuring collaborative tools and batches to speed up the review process and eliminate work duplication.
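Culling by keyword, date range, and custodian, as described above, can be sketched as a simple filter. The `Document` fields, the function name, and the sample records are assumptions made for illustration; real review platforms use indexed search rather than a linear scan.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    doc_id: str
    custodian: str
    sent: date
    text: str


def cull_documents(docs, keywords, start, end, custodians=None):
    """Keep documents inside the date range, from the chosen custodians,
    that contain at least one keyword (case-insensitive substring match)."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for d in docs:
        if not (start <= d.sent <= end):
            continue
        if custodians is not None and d.custodian not in custodians:
            continue
        body = d.text.lower()
        if any(k in body for k in lowered):
            hits.append(d)
    return hits


# Hypothetical review set.
docs = [
    Document("D1", "alice", date(2015, 3, 1), "Draft merger agreement attached"),
    Document("D2", "bob", date(2015, 3, 5), "Lunch on Friday?"),
    Document("D3", "alice", date(2012, 1, 9), "Old merger notes"),
    Document("D4", "carol", date(2015, 4, 2), "MERGER timeline update"),
]

hits = cull_documents(
    docs,
    keywords=["merger"],
    start=date(2015, 1, 1),
    end=date(2015, 12, 31),
    custodians={"alice", "carol"},
)
print([d.doc_id for d in hits])
```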
Documents are turned over to opposing counsel, based on agreed-upon specifications. Often this production is accompanied by a load file, which is used to load documents into a document review platform. Documents can be produced either as native files, or in a petrified format (such as PDF or TIFF), alongside metadata.
Types of electronically stored information
Any data that is stored in an electronic form may be subject to production under common eDiscovery rules. This type of data has historically included email and office documents, but can also include photos, video, databases, and other filetypes.
Also included in e-discovery is "raw data", which forensic investigators can review for hidden evidence. The original file format is known as the "native" format. Litigators may review material from e-discovery in one of several formats: printed paper, "native file", or a petrified, paper-like format, such as PDF files or TIFF images. Modern document review platforms accommodate the use of native files, and allow for them to be converted to TIFF and Bates-stamped for use in court.
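Bates stamping assigns each produced page a unique sequential identifier. A minimal sketch follows, assuming a label made of a party prefix plus a zero-padded page counter; the prefix, padding width, and file names are hypothetical, and every document is assumed to have at least one page.

```python
def bates_labels(prefix: str, start: int, page_count: int, width: int = 6) -> list[str]:
    """Return one sequential label per page, e.g. ABC000001, ABC000002, ..."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + page_count)]


def assign_ranges(prefix: str, page_counts: dict[str, int], start: int = 1, width: int = 6):
    """Map each document to its (first, last) label, numbering pages
    continuously across the whole production."""
    ranges = {}
    next_number = start
    for doc_id, pages in page_counts.items():
        labels = bates_labels(prefix, next_number, pages, width)
        ranges[doc_id] = (labels[0], labels[-1])
        next_number += pages
    return ranges


ranges = assign_ranges("ABC", {"contract.pdf": 10, "memo.docx": 2})
print(ranges)
```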
In 2006, the U.S. Supreme Court's amendments to the Federal Rules of Civil Procedure created a category for electronic records that, for the first time, explicitly named emails and instant message chats as likely records to be archived and produced when relevant.
One type of preservation problem arose during the Zubulake v. UBS Warburg LLC lawsuit. Throughout the case, the plaintiff claimed that the evidence needed to prove the case existed in emails stored on UBS' own computer systems. Because the emails requested were either never found or destroyed, the court found that it was more likely that they existed than not. The court found that while the corporation's counsel directed that all potential discovery evidence, including emails, be preserved, the staff that the directive applied to did not follow through. This resulted in significant sanctions against UBS.
Some archiving systems apply a unique code to each archived message or chat to establish authenticity. The systems prevent alterations to original messages, messages cannot be deleted, and the messages cannot be accessed by unauthorized persons.
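The text does not say how such a code is computed. One plausible approach, shown here purely as an assumption, is a cryptographic hash over the message's fields: any later change to the content yields a different code. The field names and sample message are hypothetical.

```python
import hashlib


def message_code(sender: str, recipient: str, sent_at: str, body: str) -> str:
    """Derive a SHA-256 code from the message fields (assumed scheme)."""
    canonical = "\n".join([sender, recipient, sent_at, body]).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def is_unaltered(stored_code: str, sender: str, recipient: str, sent_at: str, body: str) -> bool:
    """Recompute the code and compare it with the one stored at archive time."""
    return stored_code == message_code(sender, recipient, sent_at, body)


# Code recorded when the message was archived.
code = message_code("alice@example.com", "bob@example.com",
                    "2006-12-01T09:00:00Z", "Please keep all trade records.")

# Later checks: the original passes, an edited body does not.
original_ok = is_unaltered(code, "alice@example.com", "bob@example.com",
                           "2006-12-01T09:00:00Z", "Please keep all trade records.")
tampered_ok = is_unaltered(code, "alice@example.com", "bob@example.com",
                           "2006-12-01T09:00:00Z", "Please delete all trade records.")
print(original_ok, tampered_ok)
```

A hash on its own only helps if the stored code is itself protected from modification, which is why the systems described also restrict alteration, deletion, and access.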
The formalized changes to the Federal Rules of Civil Procedure in December 2006 and in 2007 effectively forced civil litigants into a compliance mode with respect to their proper retention and management of electronically stored information (ESI). Improper management of ESI can result in a finding of spoliation of evidence and the imposition of one or more sanctions, including adverse inference jury instructions, summary judgment, and monetary fines. In some cases, such as Qualcomm v. Broadcom, attorneys can be brought before the bar.
Databases and other structured data
Structured data typically resides in databases or datasets. It is organized in tables with columns and rows along with defined data types. The most common are Relational Database Management Systems (RDBMS) that are capable of handling large volumes of data such as Oracle, IBM DB2, Microsoft SQL Server, Sybase, and Teradata. The structured data domain also includes spreadsheets (not all spreadsheets contain structured data, but those that have data organized in database-like tables), desktop databases like FileMaker Pro and Microsoft Access, structured flat files, XML files, data marts, data warehouses, etc.
Voicemail is often discoverable under electronic discovery rules. Employers may have a duty to retain voicemail if there is an anticipation of litigation involving that employee. Data from voice assistants like Amazon Alexa and Siri have been used in criminal cases.
Although petrifying documents to static image formats (TIFF and JPEG) had been the standard document review method for almost two decades, native format review has increased in popularity since around 2004. Because it requires the review of documents in their original file formats, applications and toolkits capable of opening multiple file formats have also become popular. This is also true in the ECM (Enterprise Content Management) storage markets, which are converging quickly with ESI technologies.
Petrification involves the conversion of native files into an image format that does not require use of the native applications. This is useful in the redaction of privileged or sensitive information, since redaction tools for images are traditionally more mature, and easier to apply on uniform image types by non-technical people. Efforts to redact similarly petrified PDF files by incompetent personnel have resulted in the removal of redacted layers and exposure of redacted information, such as social security numbers and other private information.
Traditionally, electronic discovery vendors had been contracted to convert native files into TIFF images (for example, 10 images for a 10-page Microsoft Word document) with a load file for use in image-based discovery review database applications. Increasingly, database review applications have embedded native file viewers with TIFF capabilities. Supporting both native and image formats can either increase or decrease the total storage required, since there may be multiple formats and files associated with each individual native file. Deployment, storage, and best practices are becoming especially critical and necessary to maintain cost-effective strategies.
Structured data are most often produced in delimited text format. When the number of tables subject to discovery is large or relationships between the tables are of essence, the data are produced in native database format or as a database backup file.
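Producing a table as delimited text can be sketched as follows. The example uses an in-memory SQLite database as a stand-in for the source system; the table, its columns, and the pipe delimiter are hypothetical choices, and the actual delimiter and field list would be whatever the parties agree on.

```python
import csv
import io
import sqlite3


def export_table(conn: sqlite3.Connection, table: str, delimiter: str = "|") -> str:
    """Return the table as delimited text with a header row.
    The table name is interpolated directly, so it must come from a trusted source."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # column names
    writer.writerows(cursor)  # one line per row
    return buffer.getvalue()


# Hypothetical source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER, trader TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [(1, "alice", 100.5), (2, "bob", 250.0)],
)

exported = export_table(conn, "trades")
print(exported)
```

A flat export like this loses keys and relationships between tables, which is one reason a native database format or backup file is preferred when those relationships matter.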
A number of different people may be involved in an electronic discovery project: lawyers for both parties, forensic specialists, IT managers, and records managers, amongst others. Forensic examination often uses specialized terminology (for example "image" refers to the acquisition of digital media) which can lead to confusion.
While attorneys involved in case litigation try their best to understand the companies and organizations they represent, they may fail to understand the policies and practices that are in place in the company's IT department. As a result, some data may be destroyed after a legal hold has been issued by unknowing technicians performing their regular duties. To combat this trend, many companies are deploying software which properly preserves data across the network, preventing inadvertent data spoliation.
Given the complexities of modern litigation and the wide variety of information systems on the market, electronic discovery often requires IT professionals from both the attorney's office (or vendor) and the parties to the litigation to communicate directly to address technology incompatibilities and agree on production formats. Failure to get expert advice from knowledgeable personnel often leads to additional time and unforeseen costs in acquiring new technology or adapting existing technologies to accommodate the collected data.
Alternative collection methods
Currently the two main approaches for identifying responsive material on custodian machines are:
(1) where physical access to the organization's network is possible, agents are installed on each custodian machine and push large amounts of data across the network for indexing to one or more servers that have to be attached to the network; or
(2) where it is impossible or impractical to attend the physical location of the custodian system, storage devices are attached to custodian machines (or company servers) and each collection instance is then deployed manually.
In relation to the first approach there are several issues:
- In a typical collection process large volumes of data are transmitted across the network for indexing and this impacts normal business operations
- The indexing process is not 100% reliable in finding responsive material
- IT administrators are generally unhappy with the installation of agents on custodian machines
- The number of concurrent custodian machines that can be processed is severely limited due to the network bandwidth required
New technology is able to address problems created by the first approach by running an application entirely in memory on each custodian machine and only pushing responsive data across the network. This process has been patented and embodied in a tool that has been the subject of a conference paper.
In relation to the second approach, despite self-collection being a hot topic in eDiscovery, concerns are being addressed by limiting the involvement of the custodian to simply plugging in a device and running an application to create an encrypted container of responsive documents.
Technology-assisted review (TAR)—also known as computer-assisted review or predictive coding—involves the application of supervised machine learning or rule-based approaches to infer the relevance (or responsiveness, privilege, or other categories of interest) of ESI. Technology-assisted review has evolved rapidly since its inception circa 2005.
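The supervised-learning idea behind TAR can be illustrated with a toy example: a model is trained on documents that reviewers have already coded, then used to rank unreviewed documents so that likely responsive ones surface first. The sketch below uses a small multinomial naive Bayes classifier with add-one smoothing as a stand-in; it does not represent the method of any particular TAR product, and the seed documents are hypothetical. It assumes the coded set contains at least one responsive and one non-responsive document.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def train(labeled):
    """labeled: list of (text, is_responsive) pairs coded by reviewers."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labeled:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def score(model, text: str) -> float:
    """Log-odds that the text is responsive; higher means more likely responsive."""
    word_counts, doc_counts, vocab = model
    log_odds = math.log(doc_counts[True] / doc_counts[False])
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no signal
        p_resp = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_non = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        log_odds += math.log(p_resp / p_non)
    return log_odds


# Hypothetical reviewer-coded documents.
coded = [
    ("merger pricing agreement with competitor", True),
    ("confidential pricing discussion merger terms", True),
    ("lunch menu for friday", False),
    ("office parking schedule update", False),
]
model = train(coded)

unreviewed = {
    "d1": "draft merger agreement pricing",
    "d2": "friday lunch order",
    "d3": "parking lot schedule",
}
ranked = sorted(unreviewed, key=lambda doc_id: score(model, unreviewed[doc_id]), reverse=True)
print(ranked[0])
```

In a continuous active learning workflow, the top-ranked documents would be reviewed next, their codes added to the training set, and the model retrained, repeating until few responsive documents remain.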
Recently a U.S. court has declared that it is "black letter law that where the producing party wants to utilize TAR for document review, courts will permit it." In a subsequent matter, the same court stated,
To be clear, the Court believes that for most cases today, TAR is the best and most efficient search tool. That is particularly so, according to research studies (cited in Rio Tinto), where the TAR methodology uses continuous active learning ("CAL") which eliminates issues about the seed set and stabilizing the TAR tool. The Court would have liked the City to use TAR in this case. But the Court cannot, and will not, force the City to do so. There may come a time when TAR is so widely used that it might be unreasonable for a party to decline to use TAR. We are not there yet. Thus, despite what the Court might want a responding party to do, Sedona Principle 6 controls. Hyles' application to force the City to use TAR is DENIED.
Convergence with information governance
Anecdotal evidence for this emerging trend points to the business value of information governance (IG), defined by Gartner as "the specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival, and deletion of information. It includes the processes, roles, standards, and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals."
As compared to eDiscovery, information governance as a discipline is rather new. Yet there is traction for convergence. eDiscovery—as a multibillion-dollar industry—is rapidly evolving, ready to embrace optimized solutions that strengthen cybersecurity (for cloud computing). Since the early 2000s eDiscovery practitioners have developed skills and techniques that can be applied to information governance. Organizations can apply the lessons learned from eDiscovery to accelerate their path forward to a sophisticated information governance framework.
The Information Governance Reference Model (IGRM) illustrates the relationship between key stakeholders and the Information Lifecycle and highlights the transparency required to enable effective governance. Notably, the updated IGRM v3.0 emphasizes that Privacy & Security Officers are essential stakeholders. This topic is addressed in an article entitled "Better E-Discovery: Unified Governance and the IGRM", published by the American Bar Association.
- Data mining
- Data retention
- Discovery (law)
- Early case assessment
- Electronically stored information (Federal Rules of Civil Procedure)
- File hosting service
- Forensic search
- Information governance
- Legal governance, risk management, and compliance
- Machine learning
- Telecommunications data retention
- Various (2009). Eoghan Casey (ed.). Handbook of Digital Forensics and Investigation. Academic Press. p. 567. ISBN 0-12-374267-6. Retrieved 27 August 2010.
- "Federal Rules of Civil Procedure". LII / Legal Information Institute.
- "2015 Amendments" (PDF). Archived from the original (PDF) on 2017-06-12. Retrieved 2017-06-27.
- "Judge Scheindlin Brought Great Insight and Leadership". March 28, 2016.
- "Case Law AJ Holdings v. IP Holdings". January 13, 2015.
- Logikcull. "Legal Hold and Data Preservation | Ultimate Guide to eDiscovery | Logikcull". Logikcull. Retrieved 2018-06-08.
- Qualcomm v. Broadcom: Implications for Electronic Discovery. Retrieved 2014-10-19.
- Sullivan, Casey C. "How the IoT Is Solving Murders and Reshaping Discovery". Retrieved 2018-06-08.
- Kincaid, Jason (February 11, 2009). "The AP Reveals Details of Facebook/ConnectU Settlement With Greatest Hack Ever". TechCrunch.
- Schneier, Bruce (June 26, 2006). "Yet Another Redacting Failure". Schneier on Security. Schneier.com.
- "The Sedona Conference®". thesedonaconference.org.
- "Method and system for searching for, and collecting, electronically-stored information". Elliot Spencer, Samuel J. Baker, Erik Andersen, Perlustro LP. 2009-11-25.
- Richard, Adams; Graham, Mann; Valerie, Hobbs (2017). "ISEEK, a tool for high speed, concurrent, distributed forensic data acquisition". Research Online. doi:10.4225/75/5a838d3b1d27f.
- "Digital Forensics Services". www.ricoh-usa.com.
- Grossman, Maura R.; Cormack, Gordon V. (January 2013). "Grossman-Cormack glossary of technology-assisted review with foreword by John M. Facciola, U.S. Magistrate Judge" (PDF). Federal Courts Law Review. Stannardsville, Virginia: Federal Magistrate Judges Association. 7 (1): 6. Retrieved August 14, 2016.
- Gricks, Thomas C., III; Ambrogi, Robert J. (November 17, 2015). "A brief history of technology assisted review". Law Technology Today. Chicago, Illinois: American Bar Association. Retrieved August 14, 2016.
- Sedona Conference. TAR Case Law Primer Public Comment Version August 2016 Retrieved August 17, 2016
- Roitblat, Herbert; Kershaw, Anne. "Document categorization in legal electronic discovery: Computer classification vs. manual review" (PDF). Journal of the Association for Information Science and Technology. Hoboken, New Jersey: Wiley-Blackwell. 61 (1): 1–10. Retrieved August 14, 2016.
- Richmond Journal of Law & Technology Technology-assisted review in e-discovery can be more effective and more efficient than manual review Retrieved August 14, 2016
- S.D.N.Y. (2012). Moore v. Publicis Retrieved August 13, 2016.
- High Court, Ireland (2015). Irish Bank Resolution Corporation Limited v. Sean Quinn Retrieved August 13, 2016
- High Court of Justice Chancery Division, U.K. (2016). Pyrrho Investments Ltd v. MWB Property Ltd Retrieved August 13, 2016
- S.D.N.Y (2015). Rio Tinto v. Vale Retrieved August 14, 2016
- S.D.N.Y. (2016). Hyles v. New York City Retrieved August 14, 2016
- Practical Law Journal (2016). Continuous Active Learning for TAR Retrieved August 14, 2016
- Sedona Conference (2007). Best practices recommendations & principles for addressing electronic document production Archived 2016-07-06 at the Wayback Machine
- Article (2012). Ledergerber, Marcus (ed.). "Better E-Discovery: Unified Governance and the IGRM". American Bar Association. Archived from the original on 2016-10-11. Retrieved 2016-08-21.