RosettaCodeData/Task/Find-duplicate-files/00-TASK.txt


In a large directory structure it is easy to inadvertently leave unnecessary copies of files around, which can use considerable disk space and create confusion.
;Task:
Create a program which, given a minimum size and a folder/directory, will find all files of at least ''size'' bytes with duplicate contents under the directory and output or show the sets of duplicate files in order of decreasing size.
The program may be command-line or graphical, and duplicate content may be determined by direct comparison or by calculating a hash of the data.
Specify which filesystems or operating systems your program works with if it has any filesystem- or OS-specific requirements.
Identify hard links (filenames referencing the same content) in the output if applicable for the filesystem.
For extra points, detect when whole directory sub-trees are identical, or optionally remove or link identical files.
<br><br>
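One possible approach to the task is sketched below in Python. It is only an illustration under stated assumptions, not a reference solution: the names `file_digest` and `find_duplicates` are hypothetical, SHA-256 is an arbitrary choice of hash, and hard links are recognised by comparing `(st_dev, st_ino)`, which assumes a filesystem that reports meaningful inode numbers (typical of POSIX systems). Symbolic links and unreadable files are skipped. Files are grouped by size first so that only files sharing a size are ever hashed.

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, min_size):
    """Return [(size, groups), ...] sorted by decreasing size.

    `groups` is a list of lists of paths. Each inner list holds the names
    of one inode (hard links to each other); the inner lists of one entry
    all have identical contents.
    """
    # Step 1: bucket regular files by size, then by (device, inode),
    # so that hard links to the same data end up in one list.
    by_size = defaultdict(lambda: defaultdict(list))
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue
            if not stat.S_ISREG(st.st_mode) or st.st_size < min_size:
                continue
            by_size[st.st_size][(st.st_dev, st.st_ino)].append(path)

    results = []
    for size, inodes in by_size.items():
        if len(inodes) == 1:
            # Only one inode of this size: no hashing needed, but report
            # it if it is reachable under several names (hard links).
            (paths,) = inodes.values()
            if len(paths) > 1:
                results.append((size, [sorted(paths)]))
            continue
        # Step 2: hash one name per inode and group inodes by digest.
        by_hash = defaultdict(list)
        for paths in inodes.values():
            try:
                by_hash[file_digest(paths[0])].append(sorted(paths))
            except OSError:
                continue
        for groups in by_hash.values():
            if len(groups) > 1 or len(groups[0]) > 1:
                results.append((size, sorted(groups)))

    # Largest files first; ties broken by path for stable output.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results
```

Calling `find_duplicates(".", 1024)` would return every set of files of at least 1024 bytes with matching contents under the current directory. An inner list with more than one path marks hard links, which occupy disk space only once. Matching digests are treated as equal contents here; a stricter variant could confirm each match with a byte-by-byte comparison.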