List of duplicate file finders

This is a list of software tools to find and clean duplicate files in a directory.

Open Source

Language | *nix | Windows | OS X | CLI | GUI | Software
Python | Yes | Yes | Yes | Yes | No | ActiveState Recipe - a minimal Python command-line tool that only detects duplicates
Python | Yes | Yes | Yes | ? | ? | dedupe_copy - filters duplicates while copying and allows automatic reordering
C | Yes | Cygwin | MacPorts | Yes | No | duff - a Unix command-line utility for quickly finding duplicates in a given set of files
C++ | No | Yes | No | ? | Yes | Duff - a GUI duplicate file finder and processor for Windows
C | Yes | Yes | ? | Yes | ? | dupedit - compares many files at once without checksumming. Avoids comparing files against themselves when multiple paths point to the same file.
Python | Yes | Yes | Yes | ? | Yes | dupeguru - runs on various platforms. Special versions for music or pictures are available.
C++ | Yes | Yes | No | Yes | Yes | Duplicate Files Finder - GUI application for Windows and Linux. Project site.
Perl | ? | ? | ? | ? | ? | dupious - Perl-based duplication finder for small to large systems, or multiserver setups. Formerly finddup.pl.
C | Yes | Cygwin | ? | ? | ? | dupmerge - POSIX C compliant and runs on various platforms (Win32/64 with Cygwin, *nix, Linux, etc.)
Perl | Yes | ? | Yes | ? | ? | dupseek - Perl script with an algorithm optimized to reduce reads
Python | Yes | Yes | Yes | Yes | No | fastdupes - a fast and small Python command-line tool to find duplicates
Perl | ? | Yes | ? | ? | ? | fdf - Perl/C based and runs across most platforms (Win32, *nix and probably others). Uses MD5, SHA1 and other checksum algorithms.
Perl | Yes | Yes | Yes | ? | ? | fdupe - a small script written in Perl that does its job fast and efficiently.[1]
C | Yes | ? | ? | Yes | No | fdupes - command-line tool written in C. Compares by MD5, then byte-by-byte. Can also compare hardlinks.
Java | Yes | Yes | Yes | Yes | No | findrepe - free Java-based command-line tool designed for an efficient search of duplicate files; it can search within zips and jars. (GNU/Linux, Mac OS X, *nix, Windows)
C | Yes | Cygwin | ? | ? | ? | freedup - POSIX C compliant and runs across platforms (Windows with Cygwin, Linux, AIX, etc.)
Perl | Yes | ? | ? | ? | ? | freedups - Perl script that hardlinks duplicates to save space and caches file checksums.
Python | Yes | No | No | Yes | Yes | fslint - has both a command-line interface and a GUI.
Python | Yes | ? | ? | Yes | No | hardlinkpy - a tool to hardlink together identical files in order to save space. It is a complete rewrite and improvement over the original hardlink.c code (which was written by Jakub Jelinek <jakub@redhat.com>). Performance is orders of magnitude faster than hardlink.c due to a more efficient algorithm.
Python | Yes | Yes | Yes | Yes | No | liten - pure Python deduplication command-line tool and library, using MD5 checksums and a novel byte comparison algorithm. (Linux, Mac OS X, *nix, Windows)
Python | Yes | No | Yes | Yes | No | liten2 - a rewrite of the original liten, still a command-line tool but with a faster interactive mode using SHA-1 checksums. (Linux, Mac OS X, *nix)
C++ | Yes | Cygwin | Yes | ? | ? | rdfind - one of the few that rank duplicates based on the order of input parameters (directories to scan), so that files in "original/well known" sources are not deleted when multiple directories are given. Uses MD5 or SHA1.
Python | Yes | Partial | Yes | Yes | No | remdups - small Python command-line tool that uses an intermediate hash list file to produce an option-driven shell script for removing files.
C, Perl, SH | Yes | Cygwin | No | ? | ? | repeats - C and SH, from littleutils. Compares file sizes, then partial-read hashes, then full-read hashes, then (optionally) byte-for-byte. Highly efficient. (Linux, *nix, Cygwin)
Bash | Yes | N/A | N/A | ? | ? | rmdupe - a shell script that uses Linux tools to detect and remove duplicates.
C | Yes | No | Experimental | Yes | No | rmlint - fast finder with a command-line interface and many options to find other lint too (uses MD5). Claims to be better than rdfind and fdupes.
C | Yes | Yes | Yes | ? | ? | ssdeep - identifies almost identical files using Context Triggered Piecewise Hashing
Java | Yes | Yes | Yes | ? | Yes | DFS - searches by content / size / name
C++ | Yes | ? | ? | Yes | No | ua - Unix/Linux command-line tool, designed to work with find (and the like).
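
Several entries above describe a staged comparison: file sizes first, then a hash of a partial read, then a full-file hash, then an optional byte-for-byte check. The following is a minimal Python sketch of that general idea, not the implementation of any tool in the list. The function names, the 4096-byte prefix size, and the choice of SHA-1 are assumptions made for illustration.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

PARTIAL_BYTES = 4096  # assumed prefix size for the cheap first hash


def _hash_file(path, limit=None):
    """Return the SHA-1 hex digest of a file, or of its first `limit` bytes."""
    digest = hashlib.sha1()
    remaining = limit
    with open(path, "rb") as handle:
        while True:
            size = 65536 if remaining is None else min(65536, remaining)
            if size == 0:
                break
            chunk = handle.read(size)
            if not chunk:
                break
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()


def _split(groups, key):
    """Split each group by `key`, keeping only subgroups with 2+ files."""
    result = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        result.extend(b for b in buckets.values() if len(b) > 1)
    return result


def _split_by_content(group):
    """Partition a group into sets of byte-for-byte identical files."""
    clusters = []
    for path in group:
        for cluster in clusters:
            if filecmp.cmp(cluster[0], path, shallow=False):
                cluster.append(path)
                break
        else:
            clusters.append([path])
    return [c for c in clusters if len(c) > 1]


def find_duplicates(root, verify=True):
    """Return sorted lists of paths under `root` with identical content."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(full) and not os.path.islink(full):
                paths.append(full)

    # Each stage is more expensive than the last and only sees survivors.
    groups = _split([paths], os.path.getsize)
    groups = _split(groups, lambda p: _hash_file(p, PARTIAL_BYTES))
    groups = _split(groups, _hash_file)
    if verify:
        groups = [c for g in groups for c in _split_by_content(g)]
    return sorted(sorted(g) for g in groups)
```

Calling `find_duplicates(".")` returns one list per set of identical files. The sketch only reports duplicates; it does not delete or hardlink anything, and it does not detect multiple paths that already point to the same file (hard links), which some of the tools above handle explicitly.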

Commercial Or With More Restrictive License

See also

References

  1. User "Dr. Liviu Daia" (16:03 GMT-8, 12 Dec 2009). "Re: Comparing large amounts of files".

External Links

External Comparisons