Backup with too many duplicates, how to solve?


21 replies to this topic

#1 toni-a

toni-a

    Advanced Member

  • Members
  • 1,317 posts
  • LocationLebanon

Posted 04 November 2017 - 01:00 AM

I am rearranging my photos and I am facing the issue of duplicate files. They occupy space and make searching for a photo more complicated. Any suggestions for software that can help locate duplicates?

#2 you2

you2

    Advanced Member

  • Members
  • 1,014 posts

Posted 04 November 2017 - 11:51 AM

Well, my solution is to write a little script that takes the SHA-256 of each file and then deletes files with matching hashes.
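A minimal sketch of what such a script could look like, assuming Python and its standard `hashlib` module. The function names `sha256_of` and `find_duplicates` are made up for illustration, and the sketch only reports duplicate groups instead of deleting anything, so the result can be reviewed first.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group all files below root by content hash.

    Returns a list of groups; each group holds two or more paths
    whose contents are byte-for-byte identical.
    """
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_duplicates("some/photo/folder")` (a hypothetical folder) returns the groups. Keeping the first path of each group and removing the rest with `Path.unlink()` would be the deletion step. Note that this only catches exact copies: a resized or re-saved JPEG has a different hash.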


  • stoppingdown likes this

#3 MatjazO

MatjazO

    Advanced Member

  • Members
  • 35 posts
  • LocationSlovenia

Posted 04 November 2017 - 03:04 PM

I reckon you could try one of the apps that a quick Google search returns. For example, “Duplicate photo fixer” seems to be one of them. There are of course others, and some apparently even try to look into the image itself. Keep in mind I have no experience with any of these.

I can very much relate to the multiplicity of files, as it used to be a growing issue in my case. Years ago, the solution for me came with Lightroom, with its non-destructive approach to post-processing.

#4 mst

mst

    Advanced Member

  • Moderators
  • 2,095 posts
  • LocationWesterwald, Germany

Posted 05 November 2017 - 11:11 AM

Yup, that's exactly what media libraries are designed to avoid (if used properly).

The issue often reminds me of the early MP3 days: there was the "Winamp" crowd, i.e. those who organized their files in folders and played them with Winamp, and those who took on the iTunes challenge, let go of the folder idea and relied on a library to handle the files.

(Not meaning to promote iTunes in general with this post... in fact it has been a terrible piece of software for quite a while, almost like Outlook ;) Just mentioning it as an example of the library idea, which means relying on (and trusting) an application to handle everything on the file level, while on the user interface level lots of new possibilities open up, like searching for images by file date, aperture, keywords, smart albums, etc...)
Editor
photozone.de

#5 JoJu

JoJu

    Advanced Member

  • Members
  • 2,877 posts
  • LocationSwitzerland

Posted 05 November 2017 - 11:38 AM

And there was also the battle between library users (be it iPhoto, Aperture, Photos.app) and the ones who thought their system of manually organizing files in folders was better and not bound to an app. Using Capture One, but keeping the files out of a library, was a "pro" thing to do.

My statement "To me, learning Aperture was the best way to deal with pictures" has become only half true. A well-organized library, or 6 of them, is not worth much if it can't grow anymore because the programmers decided it's best to stop the development and leave the users like penguins on a drifting iceberg.

I am sure that in Aperture (or iTunes for the music) I will not find a single duplicate in +50.000 pictures, but with the folder organization I can't say the same.

#6 Rover

Rover

    Advanced Member

  • Members
  • 1,606 posts
  • LocationRussia

Posted 07 November 2017 - 09:29 AM

Winamp does have a library; doesn't mean I don't put everything in accurately attributed folders. :)

The dangers of using proprietary software are known well enough for me to have never invested into any solution - I just prefer to store them pics in a well maintained directory system as well.

F:\PhotoArchiveMaster\#year\#month\#day\##shoot-description\(optionally)subshoot (for example, singling out a person whose portrait I shot while working on an event).
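A small sketch of how a path following this scheme could be generated, assuming Python. The helper name `archive_path`, the zero-padded numeric year/month/day folders and the sample shoot names are assumptions for illustration; the post does not specify the exact folder naming.

```python
from datetime import date
from pathlib import PureWindowsPath


def archive_path(root, shot_on, description, subshoot=None):
    """Build root/year/month/day/description[/subshoot] as a Windows path."""
    path = (
        PureWindowsPath(root)
        / f"{shot_on.year:04d}"
        / f"{shot_on.month:02d}"
        / f"{shot_on.day:02d}"
        / description
    )
    if subshoot:
        # Optional level, e.g. one person singled out during an event.
        path = path / subshoot
    return path


example = archive_path(
    r"F:\PhotoArchiveMaster", date(2017, 11, 7), "city-marathon", "portrait-anna"
)
```

`PureWindowsPath` only manipulates the path text, so the sketch runs on any operating system and never touches the disk.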

 

I store processed files (cropped / resized for publication) elsewhere entirely, in the working folders for the journalistic stuff (which includes texts, audio recordings and photos in any combination) following a similar structure. Helps me avoid mixing up the archive and the stuff in the works.



#7 JoJu

JoJu

    Advanced Member

  • Members
  • 2,877 posts
  • LocationSwitzerland

Posted 07 November 2017 - 09:39 AM

All quite nice and sorted, Rover, but the disadvantage of your system is the inflexibility and also the redundancy of the full-size or low-res copies, so a lot of disk space is needed. That might work with 8-16 MP, but 45 MP 16-bit TIFFs eat a hard drive for breakfast.

To me, there's only one "sacred" original RAW - all other interpretations, crops, b/w versions are basically just duplicated and varied settings. I'm used to cross-referencing and I'm also used to keeping a good DAM in order. Downside: if the DAM is as poorly programmed as the one in Capture One, your way appears to be the better one.



#8 toni-a

toni-a

    Advanced Member

  • Members
  • 1,317 posts
  • LocationLebanon

Posted 07 November 2017 - 10:18 AM

From the file size perspective, storage devices are becoming larger and cheaper. One terabyte was a huge amount of data when I got my 300D in 2004; now that's nothing, so wasting space with big files is never a problem in my book.



#9 JoJu

JoJu

    Advanced Member

  • Members
  • 2,877 posts
  • LocationSwitzerland

Posted 07 November 2017 - 11:17 AM

Hard drives will fail - it's not a question of if, just when. One backup only? To me that's not enough. So each new TB I buy is actually 3 new HDs. True, they are cheap. But have you ever copied 3 TB to another drive? And afterwards noticed that a couple of files appear to be defective? That can happen pretty easily during long copy sessions, because I reckon you don't run a UPS (uninterruptible power supply)? If there's a power loss during a copy session, the original files can get damaged, too.

 

My principle is trying to reduce the risk, whenever possible.



#10 Rover

Rover

    Advanced Member

  • Members
  • 1,606 posts
  • LocationRussia

Posted 07 November 2017 - 11:26 AM

All quite nice and sorted, Rover, but the disadvantage of your system is the inflexibility and also the redundancy of the full-size or low-res copies, so a lot of disk space is needed. That might work with 8-16 MP, but 45 MP 16-bit TIFFs eat a hard drive for breakfast.

To me, there's only one "sacred" original RAW - all other interpretations, crops, b/w versions are basically just duplicated and varied settings. I'm used to cross-referencing and I'm also used to keeping a good DAM in order. Downside: if the DAM is as poorly programmed as the one in Capture One, your way appears to be the better one.

Inflexible how? I'm not bitching, I'm genuinely interested in how to make the system better. :) I know that there's only so much description one can cram into a folder name - after all, the full path needs to be at most 255 characters long - but once I tried making file_id.diz text files for every folder/shoot and it did not work for very long - I just got bored quickly. Right now I can find the required photos - and I'm at times amazed that photos which seemed utterly irrelevant and unnecessary may end up being used years upon years later - with reasonable accuracy. That doesn't mean I don't want to improve the whole thing. :)

 

Since I'm not shooting in RAW, I only have to store the source JPEGs and the edited versions. The latter are usually not a size issue because for where I'm working now 2000*1333 is usually fine (and each file is therefore sub-1MB). Even for the newspaper work, unless I was aiming to use the shots for an exhibition later - a very rare occurrence - I was slightly compressing the end results after levels / cropping / tilt adjustments / dust removal. So those are not the chief offenders. :)

 

Regarding backups, I'm running two external HDDs - one for everything just as it is appearing, another only for the well-sorted / culled / described data. The contents of the latter are mirrored (mostly) in the cloud, although there I'm already close to running out of space. :)



#11 JoJu

JoJu

    Advanced Member

  • Members
  • 2,877 posts
  • LocationSwitzerland

Posted 07 November 2017 - 12:05 PM

I know that you're not bitching  :) Just because there's so much of a difference and also learning curve, if one goes from file system/Windows to a database, which is mostly creating links.

 

But of course, with JPGs it's a different story in terms of diskspace.

 

I just like to describe the way Aperture, LR or C1 sort of "work", some better, some worse. In Aperture, I don't need to think about folder/file names, that's completely Aperture's job (it also detects duplicates if I try to import files twice - I can, but I get a warning message).

 

At the import, I can choose to create a new folder (within the database, not directly connected to the internal structure of the library), new project, new album. Project could be "Kiev 2017-10", within the project albums like "parks", "nightlife", "architecture", "churches". That's a normal album and you see, some pictures would fit in more than 1 category. Or get more than one keyword. Like a church in a park at night-time with an interesting architecture.

 

The links to such a picture would point to the same source, but the picture would appear in all three albums.

 

If I worked only with keywords (I don't, I suck at discipline), I could create smart albums like "Keyword contains 'church' AND 'architecture' AND 'nightlife'". These smart albums are updated automatically the moment the chosen criteria apply to a new picture.

 

Smart albums find pictures like

"Camera Name = D810 AND serial number contains 2365 AND focal length between 10 and 15 mm AND aperture between f/1.4 and f/5.6 AND shooting distance between 0.5 and 2 m AND face belongs to "Rover" or "JoJu"". Aperture uses all EXIF info, including Nikon's special EXIF fields like metering mode or picture control setting. And all that without any effort on my part - just because the files contain the necessary data or the faces are tagged by face detection.
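To make the idea concrete: a smart album is essentially a saved query that is re-evaluated against the metadata of every picture. The following is a rough Python sketch of that concept only, not of how Aperture implements it. The `Photo` record, the helper names and the three sample entries are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Photo:
    name: str
    camera: str
    focal_length_mm: float
    aperture: float
    keywords: frozenset = field(default_factory=frozenset)


def smart_album(photos, *rules):
    """Return the photos that satisfy every rule (rules are AND-ed)."""
    return [photo for photo in photos if all(rule(photo) for rule in rules)]


def has_keywords(*wanted):
    """Rule: the photo carries all of the wanted keywords."""
    return lambda photo: set(wanted) <= photo.keywords


def camera_is(name):
    """Rule: the photo was taken with the named camera."""
    return lambda photo: photo.camera == name


def between(attribute, low, high):
    """Rule: a numeric attribute lies within [low, high]."""
    return lambda photo: low <= getattr(photo, attribute) <= high


# Invented sample library; each photo is stored once, albums only select.
library = [
    Photo("kiev_001", "D810", 14, 2.8, frozenset({"church", "architecture", "nightlife"})),
    Photo("kiev_002", "D810", 85, 1.8, frozenset({"parks"})),
    Photo("kiev_003", "D750", 12, 4.0, frozenset({"church", "architecture"})),
]

night_churches = smart_album(library, has_keywords("church", "architecture", "nightlife"))
wide_d810 = smart_album(
    library,
    camera_is("D810"),
    between("focal_length_mm", 10, 15),
    between("aperture", 1.4, 5.6),
)
```

Because the album is just a list of rules, running `smart_album` again after new pictures are added yields the updated album without moving or copying any file.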

 

That's smart albums - there are also smart slideshows, lighttables or books which I don't use often, contrary to the albums.

 

For the files in Aperture I can look up how many pictures I made with a certain lens. This year or last decade or the last 90 days. Or in a certain geotagged location. So one part of the library is a tightly fixed structure administered by Aperture, and I only touch it if something is broken (but Aperture also has a three-step troubleshooting routine which usually speeds up the 1.4 TB library).

 

Try that with a file-based system: you need some kind of metadata catalogue, and if you move one folder accidentally, you need to fix the structure. And to find all animal portraits you did between 2013 and 2015 - well, I guess you need to scroll through the folders?

 

At the beginning I had trouble understanding the concept. That was 12 years ago. Today I miss this well-thought-out concept every day I start up Capture One.



#12 mst

mst

    Advanced Member

  • Moderators
  • 2,095 posts
  • LocationWesterwald, Germany

Posted 07 November 2017 - 12:33 PM

Winamp does have a library


Well, so does C1 now, but "back then" neither did ;) Sorry, I haven't used Winamp for more than a decade, I just used it as an example of your preferred style of workflow, which is folder-based. Been there, too, used folders and Winamp exclusively, until I tried the iTunes way, eventually.

The advantages are maybe more obvious for music: easily find all music by Sting in your library, even if it's a single track on a compilation, or find all metal music from the 90s.

For images, JoJu summed it up very nicely. I use smart folders for example to sort the MTF shots by aperture. It's an empty smart album with subfolders by aperture setting. Just throw the whole set of test chart images at it and find it sorted as needed by aperture immediately.
Or search for images shot with a given lens... maybe not a common task, but a feature I use quite often when collecting sample images for reviews, especially for lenses I have used for longer than just a few days.
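The aperture sorting described above boils down to grouping by one metadata field. A minimal Python sketch, assuming the aperture of each shot has already been read from EXIF. The function name `group_by_aperture` and the file names are hypothetical.

```python
from collections import defaultdict


def group_by_aperture(shots):
    """Group (filename, aperture) pairs into one list of files per aperture."""
    groups = defaultdict(list)
    for filename, aperture in shots:
        groups[aperture].append(filename)
    # Sort by f-number so the "subfolders" appear in aperture order.
    return dict(sorted(groups.items()))


chart_shots = [
    ("chart_01.nef", 2.8),
    ("chart_02.nef", 4.0),
    ("chart_03.nef", 2.8),
]
by_aperture = group_by_aperture(chart_shots)
```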
  • JoJu likes this

#13 mst

mst

    Advanced Member

  • Moderators
  • 2,095 posts
  • LocationWesterwald, Germany

Posted 07 November 2017 - 12:35 PM

From the file size perspective, storage devices are becoming larger and cheaper. One terabyte was a huge amount of data when I got my 300D in 2004; now that's nothing, so wasting space with big files is never a problem in my book.


It is over here. PZ data lives exclusively on an 8 TB drive. There's some space left on it currently, but it's filling up and will definitely fill a lot faster in the upcoming months after the D850 arrives here. And that's just storage... not talking about backup of that amount of data...

#14 chrismiller

chrismiller

    Advanced Member

  • Members
  • 227 posts
  • LocationGlasgow, UK

Posted 07 November 2017 - 12:51 PM

JoJu, if you are worried about photos being corrupted during copying, you could use something like Git to manage the files. It stores hashes of the repository contents, so it will spot issues.
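The same hash idea also works without Git. The sketch below is not what Git does internally; it is a hand-rolled check, assuming Python, that compares the SHA-256 of every file in a source folder with its counterpart in the copy. The names `sha256_of` and `verify_copy` are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_root, target_root):
    """Return the relative paths that are missing or differ in the copy."""
    source_root, target_root = Path(source_root), Path(target_root)
    problems = []
    for source in sorted(source_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_root)
        target = target_root / relative
        if not target.is_file() or sha256_of(source) != sha256_of(target):
            problems.append(relative)
    return problems
```

An empty result means every source file exists in the copy with identical contents. Files that exist only in the target are not reported by this sketch.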



#15 toni-a

toni-a

    Advanced Member

  • Members
  • 1,317 posts
  • LocationLebanon

Posted 07 November 2017 - 12:57 PM

JoJu, if you are worried about photos being corrupted during copying, you could use something like Git to manage the files. It stores hashes of the repository contents, so it will spot issues.

What is Git exactly ?

#16 chrismiller

chrismiller

    Advanced Member

  • Members
  • 227 posts
  • LocationGlasgow, UK

Posted 07 November 2017 - 01:03 PM

https://en.wikipedia.org/wiki/Git

 

A version control system, originally written for the Linux kernel by Linus Torvalds. It's really designed for tracking source code for software, but you can stick any binary file you like in it and it will track it. If you change the file you can check in a new version of that file and Git will keep track of both versions. If you're working on text files there are other things you can do, such as performing a diff to see the changes or merging changes from other branches.



#17 JoJu

JoJu

    Advanced Member

  • Members
  • 2,877 posts
  • LocationSwitzerland

Posted 07 November 2017 - 01:03 PM

JoJu, if you are worried about photos being corrupted during copying, you could use something like Git to manage the files. It stores hashes of the repository contents, so it will spot issues.

 

No, photos are comparatively small. It was some movies I ripped from my own DVDs and copied to another HD - using a double bay which was supposed to do 1:1 copies from source to target. It was an experiment; I already suspected that the process would not work so well, as Mac OS is pretty picky in terms of disk structures.

 

But then: Using a separate app to be sure you have a copy? Is that what we want, we need? Not for me - I expect an OS to be able to manage these processes. Mac OS is, the firmware of that bay is not - lesson learnt. I wanted to save some bandwidth and processor time during one day of copying.



#18 Rover

Rover

    Advanced Member

  • Members
  • 1,606 posts
  • LocationRussia

Posted 07 November 2017 - 06:10 PM

I can see where you're coming from.  B) But so far, the system I'm using hasn't bothered me much. If I need to find something, I'm usually doing a basic filesystem search - it's not that I need to find something by its EXIF entries (like focal length and whatnot) very often. If I have it described, I will eventually find it.

 

Another thing is, as the database size grows, there may be hiccups. That's why the data in my mail.ru cloud has to be updated manually - at some point the tray utility has stopped working and I never found out why; presumably due to the sheer amount of entries (200K+ files, not only photos, by then).

 

BTW Markus, this is the 18th year since I began using Winamp. :) The library appeared sometime in the mid-2000s but after a short spell I gave up using it. I have a pretty good idea what I have on hand, and at any given time I'm usually listening to just a few artists until I get bored and switch to the next bunch. Rinse, repeat, rotate. :) Besides, only two artists have remained in my toplist (to be spun regularly) for the last 15/20 years (Vangelis and Iron Maiden), the rest come and (sometimes) go. I have no appetite for Blackmore's Night now, for example, but give me (pretty much) anything with harsh vocals.  :lol:



#19 JoJu

JoJu

    Advanced Member

  • Members
  • 2,877 posts
  • LocationSwitzerland

Posted 07 November 2017 - 06:53 PM

With growing folder quantities, your system also slows down  ;)

 

I know of people who made the transition from Aperture to Capture One and have an annual growth rate of around 1.5 TB - they organize their libraries by year and then are done. Although I suspect the presenter of Capture One himself nearly panicked, because they might try it with around 100 MB or so and are happy if it doesn't crash on a daily basis...  <_<

iTunes grew up to 500 GB and the response time after a search is a fraction of a second. All I want to say: it's terribly comfortable to work with a cool database system. And you're terribly f***ed up if the programmers decide to take a sabbatical...



#20 mst

mst

    Advanced Member

  • Moderators
  • 2,095 posts
  • LocationWesterwald, Germany

Posted 07 November 2017 - 07:04 PM

We started using Winamp at roughly the same time, then... I think 1.91 was the first version I downloaded :) I even bought a license for it when it still was shareware, and I had some really nice custom skins for it, too :)

I had similar listening habits, just like you. In fact, when I considered switching to the Mac in 2006 (when they switched to Intel CPUs), I spent quite some time looking for a Mac application that offered the same features as Winamp: browse my music folders and just play or queue whatever I double-click on. It took me a while to accept that there is nothing similar on the Mac and that I might eventually have to take a closer look at iTunes (Winamp was running in VMware at that time...). Once I did... well, see JoJu's post above, "12 years ago" ;)

Not arguing in favor of any approach: both have their advantages. I like that I can search across my whole library when I'm looking for something and get instant results while I'm still typing. However, there is music that I used to listen to quite often "back then", but because it didn't have proper tags, it got lost somewhere in the back of the library database and only gets rediscovered when I use search phrases that happen to be part of the file names.

I even subscribed to streaming services by now. They have disadvantages and I feel that listening habits can easily be "mainstreamed" by their suggestions and moderated playlists, however I also discovered lots of music on Spotify that I really enjoy and that would have never gotten my attention otherwise.







© by photozone.de