File carving is the process of reassembling computer files from fragments in the absence of filesystem metadata.
Introduction and basic principles
Every file system contains metadata that describes its own layout. At a minimum, this metadata records the hierarchy of folders and files, with a name for each, and the physical address on the disk where each file's data is stored. As explained below, a file may be scattered in fragments at different physical addresses.
File carving is the process of trying to recover files without this metadata. This is done by analyzing the raw data and identifying what it is (text, executable, PNG, MP3, etc.). This can be done in different ways, but the simplest is to look for headers. For instance, every Java class file has as its first four bytes the hexadecimal value CA FE BA BE. Some file formats also define footers, which make the end of the file just as easy to identify.
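The header and footer search described above can be illustrated with a minimal sketch. It assumes the file is stored contiguously and uses the JPEG start and end markers (FF D8 FF and FF D9) as the signature pair; the function name `carve_by_signature` and the `max_size` limit are illustrative assumptions, not part of any particular tool.

```python
from typing import Iterator, Tuple

JPEG_HEADER = b"\xff\xd8\xff"  # JPEG start-of-image marker followed by a marker prefix
JPEG_FOOTER = b"\xff\xd9"      # JPEG end-of-image marker


def carve_by_signature(data: bytes, header: bytes, footer: bytes,
                       max_size: int = 10 * 1024 * 1024) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) offsets of candidate files found in a raw buffer.

    Assumption: each file is contiguous, so the first footer after a header
    is taken as the end of that file.
    """
    pos = 0
    while True:
        start = data.find(header, pos)
        if start == -1:
            return
        # Look for the footer only within max_size bytes of the header.
        end = data.find(footer, start + len(header), start + max_size)
        if end == -1:
            pos = start + 1  # no footer in range; keep scanning for headers
            continue
        end += len(footer)
        yield start, end
        pos = end


# Synthetic "disk image" holding two tiny fake JPEG-like objects.
raw = (b"\x00" * 16 + JPEG_HEADER + b"image-one" + JPEG_FOOTER
       + b"\x11" * 8 + JPEG_HEADER + b"image-two" + JPEG_FOOTER + b"\x00" * 4)
candidates = list(carve_by_signature(raw, JPEG_HEADER, JPEG_FOOTER))
```

A real carver would also validate each candidate, since a two-byte footer such as FF D9 can occur by chance inside unrelated data.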
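Most file systems, such as FAT and UNIX Fast File System, work with the concept of clusters of an equal and fixed size. For example, a FAT32 file system might be broken into clusters of 4 KB each. Any file no larger than 4 KB fits into a single cluster, and there is never more than one file in each cluster. Files that take up more than 4 KB are allocated across many clusters. Sometimes these clusters are all contiguous, while other times they are scattered across two or more so-called fragments, with each fragment containing a number of contiguous clusters storing one part of the file's data. Larger files are more likely to be fragmented.
Thus, finding the header of a file means that the first fragment of the file is found, but the other fragments might be scattered anywhere else on the partition, making file carving much more challenging. By studying how file systems actually fragment files and applying statistics, it is possible to make qualified guesses as to which fragments might fit together. These fragments are then assembled in various possible permutations, and each permutation is tested to see whether the fragments fit together. For some file types it is easy for the software to test the fit, while for others the software might accidentally fit the pieces together incorrectly.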
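To make the cluster allocation described above concrete, the following sketch computes how many clusters a file occupies and how much unused space is left in its last cluster. The helper names are hypothetical, and `cluster_offsets` assumes for simplicity that the data area begins at offset 0 of the image, which is generally not true of a real FAT volume.

```python
def clusters_needed(file_size: int, cluster_size: int = 4096) -> int:
    """Number of whole clusters a file of file_size bytes occupies."""
    return (file_size + cluster_size - 1) // cluster_size  # ceiling division


def slack_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Unused bytes left over in the file's last cluster."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


def cluster_offsets(image_size: int, cluster_size: int = 4096) -> range:
    """Byte offsets at which a cluster, and therefore a file, can begin."""
    return range(0, image_size, cluster_size)


small = clusters_needed(3000)          # fits into a single 4 KB cluster
large = clusters_needed(10_000)        # spans three 4 KB clusters
wasted = slack_bytes(10_000)           # 3 * 4096 - 10000 bytes of slack
starts = list(cluster_offsets(16_384))
```

Because a file always starts on a cluster boundary, a carver can restrict its header search to these offsets instead of testing every byte position.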
Simson Garfinkel reported fragmentation statistics collected from over 350 disks containing FAT, NTFS and UFS file systems. He showed that while fragmentation in a typical disk is low, the fragmentation rate of forensically important files such as email, JPEG and Word documents is relatively high. The fragmentation rate, defined as the fraction of files split into two or more fragments, was 16% for JPEG files, 17% for Word documents, 22% for AVI files and 58% for PST files (Microsoft Outlook). Pal, Shanmugasundaram, and Memon presented an efficient algorithm based on a greedy heuristic and alpha-beta pruning for reassembling fragmented images. Pal, Sencar, and Memon introduced sequential hypothesis testing as an effective mechanism for detecting fragmentation points. Richard and Roussev presented Scalpel, an open-source file-carving tool.
File carving is a highly complex task, with a potentially huge number of permutations to try. To make this task tractable, carving software typically makes extensive use of models and heuristics. This is necessary not only to keep execution time manageable, but also to improve the accuracy of the results. State-of-the-art file carving algorithms use statistical techniques such as sequential hypothesis testing to determine fragmentation points.
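As a rough illustration of sequential hypothesis testing, the sketch below applies Wald's sequential probability ratio test to a list of per-block matching scores. It is not the published algorithm: the scores, the Gaussian models for "block belongs to the file" and "block is foreign", and all parameter values are assumptions chosen only to show how evidence is accumulated block by block until a decision threshold is crossed.

```python
import math
from typing import Optional, Sequence, Tuple


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def sprt_fragmentation_point(scores: Sequence[float],
                             belongs: Tuple[float, float] = (0.9, 0.1),
                             foreign: Tuple[float, float] = (0.3, 0.1),
                             alpha: float = 0.01,
                             beta: float = 0.01) -> Optional[int]:
    """Return the index of the first block judged not to belong, or None.

    scores[i] is a hypothetical measure of how well block i matches the data
    before it. H0: the block belongs to the file; H1: it is foreign.
    belongs/foreign are assumed (mean, std) score models under H0 and H1.
    """
    upper = math.log((1 - beta) / alpha)   # crossing this accepts H1
    lower = math.log(beta / (1 - alpha))   # crossing this accepts H0
    llr = 0.0        # accumulated log-likelihood ratio log(P(x|H1) / P(x|H0))
    run_start = 0    # first block of the current undecided run
    for i, score in enumerate(scores):
        llr += gaussian_logpdf(score, *foreign) - gaussian_logpdf(score, *belongs)
        if llr >= upper:
            return run_start               # fragmentation detected
        if llr <= lower:
            llr = 0.0                      # blocks accepted; restart the test
            run_start = i + 1
    return None


clear_break = sprt_fragmentation_point([0.9, 0.88, 0.92, 0.91, 0.3, 0.28, 0.35])
ambiguous = sprt_fragmentation_point([0.9, 0.6, 0.3])
no_break = sprt_fragmentation_point([0.9, 0.91, 0.89])
```

In the second call the score 0.6 is equally likely under both models, so the test defers its decision until the next block supplies enough evidence, and the undecided block is reported as the start of the foreign run.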
File carving can be used to recover data from a hard disk where the metadata is missing or damaged.
When a file is deleted, only the entry in the file system metadata is removed, while the actual data remains on the disk. Even after a format or a repartitioning, most of the raw data may be untouched and can be recovered using file carving.
Bifragment gap carving
Garfinkel introduced the use of fast object validation for reassembling files that have been split into two pieces. This technique is referred to as Bifragment Gap Carving (BGC). A set of starting fragments and a set of finishing fragments are identified. The fragments are reassembled if together they form a valid object.
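The idea can be sketched as a search over every possible gap between the cluster holding a header and the cluster holding a footer. The code below is a simplified illustration, not Garfinkel's implementation: the toy file format (an `HDR!` prefix, a payload, an `END!` marker and a CRC-32 of the payload), the 8-byte cluster size and all function names are assumptions made so that the example has a cheap validator to call.

```python
import zlib
from typing import Callable, List, Optional, Tuple


def bifragment_gap_carve(clusters: List[bytes], first: int, last: int,
                         is_valid: Callable[[bytes], bool]
                         ) -> Optional[Tuple[int, int, bytes]]:
    """Find a gap clusters[gap_start:gap_end] whose removal yields a valid object.

    first is the index of the header cluster, last the index of the footer cluster.
    """
    contiguous = b"".join(clusters[first:last + 1])
    if is_valid(contiguous):
        return last + 1, last + 1, contiguous   # empty gap: file was not fragmented
    for gap_len in range(1, last - first):
        for gap_start in range(first + 1, last - gap_len + 1):
            gap_end = gap_start + gap_len
            candidate = b"".join(clusters[first:gap_start] + clusters[gap_end:last + 1])
            if is_valid(candidate):
                return gap_start, gap_end, candidate
    return None


def is_valid_toy_file(blob: bytes) -> bool:
    """Validator for the assumed toy format: HDR! + payload + END! + CRC-32."""
    if len(blob) < 12 or not blob.startswith(b"HDR!") or blob[-8:-4] != b"END!":
        return False
    return zlib.crc32(blob[4:-8]) == int.from_bytes(blob[-4:], "big")


# Build a 32-byte toy file and split it into four 8-byte clusters.
payload = bytes(range(20))
original = b"HDR!" + payload + b"END!" + zlib.crc32(payload).to_bytes(4, "big")
parts = [original[i:i + 8] for i in range(0, len(original), 8)]

# Disk layout: junk, two file clusters, two foreign clusters, two file clusters, junk.
disk = [b"\xaa" * 8, parts[0], parts[1], b"\x55" * 8, b"\x33" * 8,
        parts[2], parts[3], b"\xaa" * 8]
result = bifragment_gap_carve(disk, first=1, last=6, is_valid=is_valid_toy_file)
```

The number of candidates grows quadratically with the distance between header and footer, which is why the approach depends on the validator being fast.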
Pal developed a carving scheme that is not limited to bifragmented files. The technique, known as SmartCarving, makes use of heuristics regarding the fragmentation behavior of known filesystems. The algorithm has three phases: preprocessing, collation, and reassembly. In the preprocessing phase, blocks are decompressed and/or decrypted if necessary. In the collation phase, blocks are sorted according to their file type. In the reassembly phase, the blocks are placed in sequence to reproduce the deleted files. The SmartCarving algorithm is the basis for the Adroit Photo Forensics and Adroit Photo Recovery applications from Digital Assembly.
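The collation phase can be pictured as grouping blocks by a guessed type before any reassembly is attempted. The sketch below is only an illustration of that grouping step and does not reproduce the SmartCarving classifier: the three categories, the printable-character ratio, and the entropy threshold of 7.0 bits per byte are all assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

PRINTABLE = set(range(32, 127)) | {9, 10, 13}  # ASCII text plus tab, LF, CR


def shannon_entropy(block: bytes) -> float:
    """Entropy of the byte distribution in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    n = len(block)
    return -sum(c / n * math.log2(c / n) for c in Counter(block).values())


def classify_block(block: bytes) -> str:
    """Crude, assumed classifier: text, high-entropy, or other."""
    if block and sum(b in PRINTABLE for b in block) / len(block) > 0.95:
        return "text"
    if shannon_entropy(block) > 7.0:
        return "high-entropy"   # typical of compressed or encrypted data
    return "other"


def collate(blocks: List[bytes]) -> Dict[str, List[int]]:
    """Group block indices by their guessed type."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, block in enumerate(blocks):
        groups[classify_block(block)].append(index)
    return dict(groups)


text_block = (b"The quick brown fox. " * 25)[:512]
dense_block = bytes(range(256)) * 2      # every byte value equally frequent
empty_block = b"\x00" * 512
groups = collate([text_block, dense_block, empty_block, text_block])
```

Reassembly then only has to consider blocks within the same group, which reduces the number of orderings that need to be tested.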
Carving memory dumps
Snapshots of a computer's volatile memory can be carved. Memory-dump carving is routinely used in digital forensics, allowing investigators to access ephemeral evidence. Ephemeral evidence includes recently accessed images and Web pages, documents, chats and communications conducted via social networks. If an encrypted volume (TrueCrypt, BitLocker, PGP Disk) was in use, binary keys to encrypted containers can be extracted and used to instantly mount such volumes. The content of volatile memory is itself fragmented. Belkasoft developed a proprietary carving algorithm, BelkaCarving, to enable carving of fragmented memory sets.
Software
- Adroit, a commercial file carver that uses fragment reassembly carving.
- Belkasoft Evidence Center, a commercial computer forensics product implementing file system and smart memory dump carving.
- Defraser, an open-source video file carver with fragment repair capabilities.
- Foremost, an open-source file carver.
- PhotoRec, a popular open-source file carver.
- MediaCarve, a free file carver targeted to media files.
- Recover My Files, proprietary evaluation software for Microsoft Windows.
- Scalpel, an open-source file carver.
References
- S. Garfinkel, "Carving Contiguous and Fragmented Files with Fast Object Validation", in Proceedings of the 2007 Digital Forensics Research Workshop, DFRWS, Pittsburgh, PA, August 2007
- A. Pal and N. Memon, "Automated reassembly of file fragmented images using greedy algorithms", in IEEE Transactions on Image Processing, February 2006, pp. 385-393
- A. Pal, T. Sencar and N. Memon, "Detecting File Fragmentation Point Using Sequential Hypothesis Testing", Digital Investigations, Fall 2008
- G. Richard and V. Roussev, "Scalpel: a frugal, high performance file carver", in Proceedings of the 2005 Digital Forensics Research Workshop, DFRWS, August 2005