Opticallimits

Full Version: backup with too many duplicates, how to solve ?
I am rearranging my photos and I am facing the issue of duplicate files. They occupy space and make searching for a photo more complicated. Any suggestions for software that can help locate duplicates?

Guest

Well, my solution is to write a little script that takes the SHA-256 hash of each file and then deletes files with a matching hash.
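
A minimal sketch of what such a script could look like in Python, using only the standard library. The function names (`sha256_of`, `find_duplicates`, `remove_duplicates`) are made up for this illustration, and two details are assumptions rather than part of the suggestion above: files are grouped by size first so that only potential duplicates get hashed, and deletion is off by default (`dry_run=True`) so the list can be checked before anything is removed.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents hash identically."""
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in same_size:
            by_hash[sha256_of(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]


def remove_duplicates(root: Path, dry_run: bool = True) -> list[Path]:
    """Keep the first file of each group; report (or delete) the rest."""
    extras = []
    for group in find_duplicates(root):
        for path in group[1:]:
            extras.append(path)
            if not dry_run:
                path.unlink()
    return extras
```

Calling `remove_duplicates(Path("some/photo/folder"))` only lists the surplus copies; passing `dry_run=False` actually deletes them. Which copy of each group survives is simply the first one in sorted path order, so it is worth reviewing the dry-run output first.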

I reckon you could try one of the apps that a quick Google search returns. For example, “Duplicate photo fixer” seems to be one of them. There are of course others; some apparently even try to look into the image itself. Keep in mind I have no experience with any of these.


I can very much relate to the multiplicity of files, as it used to be a growing issue in my case. Years ago, the solution for me came with Lightroom, with its non-destructive approach to post-processing.
Yup, that's exactly what media libraries are designed to avoid (if used properly).

The issue often reminds me of the early MP3 days: there was the "Winamp" crowd, meaning those who organized their files in folders and played them with Winamp, and those who took on the iTunes challenge, letting go of the folder idea and relying on a library to handle the files.

(Not meaning to promote iTunes in general with this post... in fact it has been a terrible piece of software for quite a while, almost like Outlook ;) Just mentioning it as an example of the library idea, which means relying on (and trusting) an application to handle everything at the file level, while at the user interface level lots of new possibilities open up, like searching for images by file date, aperture, keywords, smart albums, etc...)
And there was also the battle between library users (be it iPhoto, Aperture, Photos.app) and the ones who thought their system of manually organizing files in folders was better and not bound to an app. Using Capture One, but keeping the files out of a library, was a "pro" thing to do.


My old claim, "To me, learning Aperture was the best way to deal with pictures", has become only half true. A well-organized library, or six of them, is not worth much if it can't grow anymore because the programmers decided it was best to stop development and leave the users like penguins on a drifting iceberg.


I am sure that in Aperture (or iTunes for the music) I will not find a single duplicate in 50,000+ pictures, but with the folder organization I can't say the same.
Winamp does have a library; doesn't mean I don't put everything in accurately attributed folders. :)

The dangers of using proprietary software are known well enough for me to have never invested in any solution - I just prefer to store them pics in a well-maintained directory system as well.

F:\PhotoArchiveMaster\#year\#month\#day\##shoot-description\(optionally)subshoot (for example, singling out a person whose portrait I shot while working on an event).
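
As a rough illustration of that layout, here is a small Python sketch that builds such a folder path from a date and a description. The function name `shoot_folder` is hypothetical, and the zero-padded numeric month and day are an assumption, since the scheme above does not say how the date parts are written.

```python
from datetime import date
from pathlib import PureWindowsPath
from typing import Optional

# Root of the archive as given in the scheme above.
ARCHIVE_ROOT = PureWindowsPath(r"F:\PhotoArchiveMaster")


def shoot_folder(day: date, description: str,
                 subshoot: Optional[str] = None) -> PureWindowsPath:
    """Build year\\month\\day\\description[\\subshoot] under the archive root."""
    folder = (ARCHIVE_ROOT / f"{day.year:04d}" / f"{day.month:02d}"
              / f"{day.day:02d}" / description)
    if subshoot:
        # Optional extra level, e.g. one person's portrait within an event.
        folder = folder / subshoot
    return folder
```

`PureWindowsPath` only computes the path and never touches the disk, so the sketch can be tried on any operating system.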

 

I store processed files (cropped / resized for publication) elsewhere entirely, in the working folders for the journalistic stuff (which includes texts, audio recordings and photos in any combination) following a similar structure. Helps me avoid mixing up the archive and the stuff in the works.

All quite nice and sorted, Rover, but the disadvantage of your system is the inflexibility and also the redundancy of the full-size or low-res copies, so a lot of disk space is needed. That might work with 8-16 MP, but 45 MP 16-bit TIFFs eat a hard drive for breakfast.

 

To me, there's only one "sacred" original RAW - all other interpretations, crops and b/w versions are basically just duplicated and varied settings. I'm used to cross-referencing and I'm also used to keeping a good DAM in order. Downside: if the DAM is as poorly programmed as the one in Capture One, your way appears to be the better one.

From the file size perspective, storage devices are becoming larger and cheaper. One terabyte was a huge amount of data when I got my 300D in 2004; now that's nothing, so wasting space with big files is never a problem in my book.

Hard drives will fail - it's not a question of if, just when. One backup only? To me that's not enough. So each new TB I buy is actually 3 new HDs. True, they are cheap. But have you ever copied 3 TB to another drive? And afterwards noticed that a couple of files appear to be corrupted? That can happen pretty easily during long copy sessions, because I reckon you don't run a UPS (uninterruptible power supply)? If there's a power loss during a copy session, the original files can get damaged, too.

 

My principle is trying to reduce the risk, whenever possible.
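
One way to catch files that were damaged during a long copy session is to compare checksums between the source and the backup afterwards. The Python sketch below is only an illustration under stated assumptions: the names `file_digest` and `verify_copy` are invented here, SHA-256 is an arbitrary choice of checksum, and the backup is assumed to mirror the source folder structure one to one.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_root: Path, backup_root: Path) -> list[Path]:
    """Return relative paths that are missing or different in the backup."""
    problems = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(source_root)
        dst = backup_root / relative
        # A missing file or a differing checksum both count as a problem.
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(relative)
    return problems
```

An empty list means every source file has a matching copy. The check reads both drives in full, so for several terabytes it takes about as long as the copy itself, and it cannot tell which side of a mismatch is the damaged one.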

Quote: All quite nice and sorted, Rover, but the disadvantage of your system is the inflexibility and also the redundancy of the full-size or low-res copies, so a lot of disk space is needed. That might work with 8-16 MP, but 45 MP 16-bit TIFFs eat a hard drive for breakfast.

To me, there's only one "sacred" original RAW - all other interpretations, crops and b/w versions are basically just duplicated and varied settings. I'm used to cross-referencing and I'm also used to keeping a good DAM in order. Downside: if the DAM is as poorly programmed as the one in Capture One, your way appears to be the better one.
Inflexible how? I'm not bitching, I'm genuinely interested in how to make the system better. :) I know that there's only so much description one can cram into a folder name - after all, the full path is limited to about 260 characters on Windows by default - but once I tried making file_id.diz text files for every folder/shoot and it did not work for very long - I just got bored quickly. Right now I can find the required photos with reasonable accuracy - and I'm at times amazed that photos which seemed utterly irrelevant and unnecessary may end up being used years upon years later. That doesn't mean I don't want to improve the whole thing. :)

 

Since I'm not shooting in RAW, I only have to store the source JPEGs and the edited versions. The latter are usually not a size issue because for where I'm working now 2000*1333 is usually fine (and each file is therefore sub-1MB). Even for the newspaper work, unless I was aiming to use the shots for an exhibition later - a very rare occurrence - I was slightly compressing the end results after levels / cropping / tilt adjustments / dust removal. So those are not the chief offenders. :)

 

Regarding backups, I'm running two external HDDs - one for everything just as it comes in, another only for the well-sorted / culled / described data. The contents of the latter are mirrored (mostly) in the cloud, although there I'm already close to running out of space. :)
