Deduplicate Files From Several Drives
Hello forum members,
I’m reaching out for your assistance in organizing my 25 years of data.
My storage setup includes a dozen hard drives, two additional backup drives, and a Synology DS723+ NAS.
The backup drives were created to combine the contents of the twelve older drives, while the NAS was built to consolidate and back up those backup drives.
There’s considerable overlap between the backup drives, but they’re not identical, and the NAS holds most, though not all, of the files from the backup drives.
I kept each of the twelve older drives after my data outgrew it.
My NAS now holds nearly 8TB, so it's time to clean up the disks. Doing so will help me identify overlapping data, remove duplicates, and free up space for new data that hasn't been backed up yet.
What do you suggest for deduplicating files on the NAS?
Also, what’s your advice for comparing the backup drives and the dozen drives to the deduplicated NAS?
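One way to approach the comparison is to match files by their contents (a hash) rather than by name or date. The sketch below assumes Python 3.9+ and that the NAS share and each old drive are mounted as local folders. It builds the set of content hashes on the NAS once, then lists files on a drive whose contents exist nowhere on the NAS. `hashes_under` and `missing_from` are hypothetical helper names, not part of any existing tool.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in 1 MiB chunks so large videos never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_under(root: Path) -> set[str]:
    """Content hashes of every regular file below `root`; file names are ignored."""
    return {file_sha256(p) for p in root.rglob("*") if p.is_file()}


def missing_from(drive_root: Path, reference_hashes: set[str]) -> list[Path]:
    """Files under `drive_root` whose contents appear nowhere in `reference_hashes`."""
    return sorted(
        p
        for p in drive_root.rglob("*")
        if p.is_file() and file_sha256(p) not in reference_hashes
    )
```

For example, `nas_hashes = hashes_under(Path("/volume1/archive"))` followed by `missing_from(Path("/mnt/old_drive_03"), nas_hashes)` (both paths hypothetical) would list what still needs copying before that drive is retired. Because the NAS is hashed only once, checking all twelve drives against it does not require re-reading the NAS each time.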
I want to ensure I retain every file I’ve saved and eliminate duplicate files that consume storage.
Additionally, since I’m using RAID 1, I’d like a reliable backup method in case both drives fail, are compromised, or are destroyed.
Thank you in advance for your guidance.
Suppose you have 13 copies of a particular picture of your grandfather's '57 Chevy. Do they all have the same name, or do some have different names? Counting duplicates, what is the total size of all the files across all drives, in GB or TB? What is your best estimate of the total size if you counted only one copy of each file? Are the files mainly text documents, images, or operating system and software files?
Hi there, Lafong. Thank you for helping me find a solid path forward.
I suspect some files might have been renamed, and it seems like the total number of files could be around 200.
The combined size of all files, including duplicates, is estimated to be between 12 and 14TB.
Most of the files are photos, followed by audio, then software, plus a few hundred videos.
A challenge I hadn’t considered before is the audio files.
I own several versions of songs—different studio releases or live recordings.
If I fully automated this process based on file names, I'd end up removing files with matching names even though they aren't true duplicates.
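Comparing contents instead of names avoids that problem in both directions: a renamed copy of a photo is still caught, while a studio take and a live take that are both saved as `song.mp3` are kept, because their bytes differ. Below is a minimal sketch assuming Python 3.9+ and a locally mounted folder; `find_duplicates` and `reclaimable_bytes` are hypothetical helper names. It first groups files by size and only hashes files whose size is shared with at least one other file, so most unique files are never read in full.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Groups of two or more files under `root` with byte-identical contents."""
    # Pass 1: bucket by size; a file with a unique size cannot have a duplicate.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)
    # Pass 2: hash only the files that share a size with another file.
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) > 1:
            for path in same_size:
                by_hash[file_sha256(path)].append(path)
    return sorted(sorted(group) for group in by_hash.values() if len(group) > 1)


def reclaimable_bytes(groups: list[list[Path]]) -> int:
    """Space freed by keeping one copy from each group and deleting the rest."""
    return sum(group[0].stat().st_size * (len(group) - 1) for group in groups)
```

Exact hashing only catches byte-identical files. The same song re-encoded at a different bitrate, or a photo whose embedded metadata was edited, produces a different hash and is not flagged. Reviewing each group before deleting anything is safer than deleting automatically.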
Appreciate your guidance.