List of duplicate file finders

From Wikipedia, the free encyclopedia

This is a list of software tools to find and clean duplicate files in a directory.

Open Source

Language | *nix | Windows | OS X | CLI | GUI | Software
Python | Yes | Yes | Yes | Yes | No | ActiveState Recipe - a minimal Python command-line tool that only detects duplicates
Python | Yes | Yes | Yes | ? | ? | dedupe_copy - filters duplicates while copying and allows automatic reordering
C | Yes | Cygwin | Homebrew or MacPorts | Yes | No | duff - a Unix command-line utility for quickly finding duplicates in a given set of files
C++ | No | Yes | No | ? | Yes | Duff - a GUI duplicate file finder and processor for Windows
C | Yes | Yes | ? | Yes | ? | dupedit - compares many files at once without checksumming; avoids comparing a file against itself when multiple paths point to the same file
Python | Yes | Yes | Yes | ? | Yes | dupeguru - runs on various platforms; special versions for music and pictures are available
C++ | Yes | Yes | No | Yes | Yes | Duplicate Files Finder - a GUI application for Windows and Linux
Perl | ? | ? | ? | ? | ? | dupious - a Perl-based duplicate finder for small to large systems or multi-server setups; formerly finddup.pl
C | Yes | Cygwin | ? | ? | ? | dupmerge - POSIX-compliant C; runs on various platforms (Win32/64 with Cygwin, *nix, Linux, etc.)
Perl | Yes | ? | Yes | ? | ? | dupseek - written in Perl, with an algorithm optimized to reduce reads
Python | Yes | Yes | Yes | Yes | No | fastdupes - a fast, small Python command-line tool to find duplicates
Perl | ? | Yes | ? | ? | ? | fdf - Perl/C based; runs on most platforms (Win32, *nix and probably others); uses MD5, SHA-1 and other checksum algorithms
Perl | Yes | Yes | Yes | ? | ? | fdupe - a small Perl script that does its job quickly and efficiently[1]
C | Yes | No | Homebrew | Yes | No | fdupes - a command-line tool written in C; compares MD5 hashes, then byte by byte; can also compare hard links
C | Yes | Yes | Untested | Yes | No | fdupes-jody - an enhanced fork of fdupes with much higher performance, also ported to Windows
Java | Yes | Yes | Yes | Yes | No | findrepe - a free Java-based command-line tool designed for efficient duplicate searching; can also search inside ZIP and JAR archives (GNU/Linux, Mac OS X, *nix, Windows)
C | Yes | Cygwin | ? | ? | ? | freedup - POSIX-compliant C; runs across platforms (Windows with Cygwin, Linux, AIX, etc.)
Perl | Yes | ? | ? | ? | ? | freedups - a Perl script that hard-links duplicates to save space and caches file checksums
Python | Yes | No | No | Yes | Yes | fslint - has both a command-line interface and a GUI
Python | Yes | ? | ? | Yes | No | hardlinkpy - a tool that hard-links identical files together to save space; a complete rewrite of, and improvement over, the original hardlink.c (written by Jakub Jelinek <jakub@redhat.com>), running orders of magnitude faster than hardlink.c due to a more efficient algorithm
Python | Yes | Yes | Yes | Yes | No | liten - a pure-Python deduplication command-line tool and library using MD5 checksums and a novel byte-comparison algorithm (Linux, Mac OS X, *nix, Windows)
Python | Yes | No | Yes | Yes | No | liten2 - a rewrite of the original Liten; still a command-line tool, but with a faster interactive mode using SHA-1 checksums (Linux, Mac OS X, *nix)
C# | No GUI | Yes | No GUI | Yes | Yes | ndupfinder - uses MD5 hashing to find duplicates efficiently; no binaries were available at the time of writing, so users must compile it themselves; a WPF GUI is available for Windows
Batch | No | Yes | No | Yes | No | phdeldup - a very simple, easily modifiable .BAT script that deletes duplicate files matching a specified mask, using only native Windows shell commands (comp, dir, del)
C++ | Yes | Cygwin | Yes | Yes | ? | rdfind - one of the few tools that rank duplicates by the order of the input parameters (the directories to scan), so that files in "original/well-known" sources are not deleted when multiple directories are given; uses MD5 or SHA-1
Python | Yes | Partial | Yes | Yes | No | remdups - a small Python command-line tool that writes an intermediate hash-list file and uses it to produce an option-driven shell script for removing files
C, Perl, SH | Yes | Cygwin | No | ? | ? | repeats - C and SH, from littleutils; compares file sizes, then partial-read hashes, then full-read hashes, then (optionally) byte for byte; highly efficient (Linux, *nix, Cygwin)
Bash | Yes | N/A | N/A | ? | ? | rmdupe - a shell script that uses Linux tools to detect and remove duplicates
C | Yes | No | Experimental | Yes | No | rmlint - a command-line tool with options to also find other lint and duplicate directories; can use incremental byte-by-byte comparison or various hashing algorithms; heavily optimized, including use of the FIEMAP ioctl on Linux[2]
C | Yes | Yes | Yes | ? | ? | ssdeep - identifies almost identical files using Context Triggered Piecewise Hashing
Java | Yes | Yes | Yes | ? | Yes | DFS - searches by content, size or name
C++ | Yes | ? | ? | Yes | No | ua - a Unix/Linux command-line tool designed to work with find (and similar tools)
Python | Yes | Yes | Yes | Yes | Yes | pddf - a tool to find duplicate files, with fast and full scan options
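
The repeats entry above describes a staged strategy that several other tools (fdupes, rdfind) follow in part: compare file sizes first, then hashes of a partial read, then hashes of the whole file, optionally finishing with a byte-for-byte check. Each stage only reads the files that survived the previous one, so most files are never read in full. The following is a minimal Python sketch of that general technique, not the code of any listed tool. The 4 KiB prefix size, the use of SHA-256 instead of MD5 or SHA-1, and the names find_duplicates, file_hash and regroup are illustrative assumptions. Like dupedit, it skips a second path to the same file (same device and inode number), so a hard link or a repeated directory argument is not reported as a duplicate of itself.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 64 * 1024   # read size while hashing (arbitrary choice for this sketch)
PREFIX = 4 * 1024   # bytes read by the cheap partial-hash stage (assumed value)


def file_hash(path, limit=None):
    """SHA-256 of the first `limit` bytes of `path` (whole file if limit is None)."""
    h = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = CHUNK if remaining is None else min(CHUNK, remaining)
            block = f.read(size)
            if not block:  # end of file
                break
            h.update(block)
            if remaining is not None:
                remaining -= len(block)
    return h.hexdigest()


def regroup(groups, key):
    """Split every candidate group by `key`, keeping only sub-groups of 2 or more."""
    out = []
    for paths in groups:
        buckets = defaultdict(list)
        for p in paths:
            buckets[key(p)].append(p)
        out.extend(b for b in buckets.values() if len(b) > 1)
    return out


def find_duplicates(roots):
    """Return a list of groups; each group lists paths with identical contents."""
    by_size = defaultdict(list)
    seen = set()  # (st_dev, st_ino) pairs already collected
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue  # only regular files are compared
                st = os.stat(path)
                # A second path to an already-seen inode is the same file,
                # not a duplicate of it, so it is skipped.
                if (st.st_dev, st.st_ino) in seen:
                    continue
                seen.add((st.st_dev, st.st_ino))
                by_size[st.st_size].append(path)

    # Stage 1: only files of equal size can have identical contents.
    groups = [paths for paths in by_size.values() if len(paths) > 1]
    # Stage 2: hash a short prefix to discard most non-matches cheaply.
    groups = regroup(groups, lambda p: file_hash(p, PREFIX))
    # Stage 3: hash the full contents of the remaining candidates.
    return regroup(groups, file_hash)
```

For example, find_duplicates(["/home/user/photos", "/mnt/backup"]) would return groups of paths whose contents match. A final byte-for-byte pass over each group (for instance with Python's filecmp.cmp(a, b, shallow=False)), as fdupes and repeats offer, would additionally guard against hash collisions.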

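freedups and hardlinkpy reclaim space by replacing duplicates with hard links, and rdfind decides which copy counts as the "original" from the order in which directories were given. The sketch below combines those two ideas in Python. It assumes the group of paths is already known to have identical contents, that all paths live on a single POSIX filesystem (hard links cannot cross devices), and that the ".hardlink-tmp" suffix is a hypothetical name not already in use. The names rank_by_root and hardlink_group are illustrative and are not taken from any of the listed tools.

```python
import os


def rank_by_root(paths, roots):
    """Order paths so that files under earlier-listed roots come first."""
    norm_roots = [os.path.abspath(r) for r in roots]

    def rank(path):
        p = os.path.abspath(path)
        for i, r in enumerate(norm_roots):
            if os.path.commonpath([p, r]) == r:
                return (i, p)  # first matching root wins; path breaks ties
        return (len(norm_roots), p)  # not under any root: ranked last

    return sorted(paths, key=rank)


def hardlink_group(paths, roots):
    """Keep the highest-ranked file and replace the others with hard links to it.

    Returns the path that was kept. Assumes the contents are already known
    to be identical and that every path is on the same filesystem.
    """
    keep, *others = rank_by_root(paths, roots)
    for dup in others:
        if os.path.samefile(keep, dup):
            continue  # already a link to the kept file; nothing to do
        tmp = dup + ".hardlink-tmp"  # hypothetical temporary name
        # Create the link under a temporary name, then atomically rename it
        # over the duplicate so `dup` never disappears in between.
        os.link(keep, tmp)
        os.replace(tmp, dup)
    return keep
```

Linking under a temporary name and then calling os.replace means the duplicate's path always refers to either the old copy or the new link. Hard-linked files share one set of contents, so editing any of them afterwards changes all of them.
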
Commercial Or With More Restrictive License

See also

References

  1. User "Dr. Liviu Daia" (16:03 GMT-8, 12 Dec 2009). "Re: Comparing large amounts of files".
  2. kernel.org documentation of the FIEMAP ioctl.

External links

External Comparisons