Professional Photo Management with the Power of Open Source

digiKam Fuzzy Search Tools Under Construction

by digiKam

Over the past month, Marcel and I have been working on a new set of search tools for digiKam for KDE4, called Fuzzy Search.

During my digiKam presentation at LGM2008, I introduced the idea of searching for duplicate items across the whole photo collection. The concept is not limited to finding photos similar to a given copy, though. It also lets users draw a sketch of a photo they remember, and then shows the photos whose shapes and colors are similar to the sketch.

In fact, this is not a new concept. An old Linux program named ImgSeek already provides this feature. In my opinion, it's time to leave ImgSeek's old interface behind and make the technology more accessible to end users by implementing it in digiKam.

The way ImgSeek lets you search photos looks great to me. It is very fast and gives relevant results on my huge photo collection (although it took me over an hour to learn how its old GUI worked). This is why we decided to study the Fast Multiresolution Image Querying paper, which describes the technique, in order to implement these features in digiKam. The core of the method is an image fingerprint computed with Haar wavelets.
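To make the idea of a Haar wavelet fingerprint concrete, here is a minimal sketch, assuming a single grayscale channel and a square image whose side is a power of two. It follows the standard decomposition described in the paper (transform every row, then every column), keeps the average separately, and keeps only the m largest-magnitude remaining coefficients, quantized to their sign. The paper and ImgSeek work on three color channels of a resized image; the function names and the default m=40 here are illustrative assumptions, not the code that was ported to digiKam.

```python
import math


def haar_1d(values):
    """Full 1D Haar decomposition, normalized as in the paper.
    The length of `values` must be a power of two; result[0] is the mean."""
    a = [v / math.sqrt(len(values)) for v in values]
    h = len(a)
    while h > 1:
        h //= 2
        nxt = a[:]  # entries from index 2*h onward are already final
        for i in range(h):
            nxt[i] = (a[2 * i] + a[2 * i + 1]) / math.sqrt(2)      # average part
            nxt[h + i] = (a[2 * i] - a[2 * i + 1]) / math.sqrt(2)  # detail part
        a = nxt
    return a


def haar_2d(image):
    """Standard 2D decomposition: transform every row, then every column."""
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def signature(image, m=40):
    """Return (average, {(row, col): +1 or -1}), keeping only the m
    largest-magnitude non-average coefficients, quantized to their sign."""
    coeffs = haar_2d(image)
    n = len(coeffs)
    ranked = sorted(
        ((abs(coeffs[i][j]), i, j) for i in range(n) for j in range(n) if (i, j) != (0, 0)),
        reverse=True,
    )
    kept = {(i, j): (1 if coeffs[i][j] > 0 else -1) for mag, i, j in ranked[:m] if mag > 0}
    return coeffs[0][0], kept
```

For example, a 4x4 image whose left half is 0 and right half is 10 yields an average of about 5.0 and a single kept coefficient, {(0, 1): -1}. Throwing away the magnitudes is what makes the fingerprint small and cheap to compare.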

Over the past month, we ported ImgSeek's fingerprint generator C++ code to digiKam. For performance reasons, this is the only part of ImgSeek written in pure C++; the rest of the program is written in Python. We reviewed the algorithm, rewrote the code using the Qt API, and added a new interface so that digiKam can store fingerprints in its database and retrieve them. Because ImgSeek does not use an SQL database as digiKam does, but a custom serialized binary file to hold fingerprint data, we adapted the algorithm to work with an SQL database.
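One way to move a fingerprint from a custom binary file into an SQL database is to serialize it into a BLOB column keyed by image id. The sketch below does this with Python's built-in sqlite3 and struct modules, assuming a fingerprint shaped as (average, {(row, col): sign}). The table name `fingerprints`, its columns, and the byte layout are hypothetical and are not digiKam's actual schema.

```python
import sqlite3
import struct

HEADER = "<dI"  # average (float64), number of kept coefficients (uint32)
ENTRY = "<HHb"  # row (uint16), column (uint16), sign (int8)


def pack_signature(avg, coeffs):
    """Serialize (average, {(row, col): sign}) into a compact byte string."""
    parts = [struct.pack(HEADER, avg, len(coeffs))]
    for (i, j), sign in sorted(coeffs.items()):
        parts.append(struct.pack(ENTRY, i, j, sign))
    return b"".join(parts)


def unpack_signature(blob):
    """Inverse of pack_signature."""
    avg, count = struct.unpack_from(HEADER, blob, 0)
    offset = struct.calcsize(HEADER)
    coeffs = {}
    for _ in range(count):
        i, j, sign = struct.unpack_from(ENTRY, blob, offset)
        coeffs[(i, j)] = sign
        offset += struct.calcsize(ENTRY)
    return avg, coeffs


def open_db(path=":memory:"):
    """Open (or create) a database with a hypothetical fingerprint table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS fingerprints ("
        "image_id INTEGER PRIMARY KEY, signature BLOB NOT NULL)"
    )
    return db


def store(db, image_id, avg, coeffs):
    """Insert or overwrite the fingerprint of one image."""
    db.execute(
        "INSERT OR REPLACE INTO fingerprints (image_id, signature) VALUES (?, ?)",
        (image_id, pack_signature(avg, coeffs)),
    )
    db.commit()


def load(db, image_id):
    """Return the stored fingerprint, or None if the image has none."""
    row = db.execute(
        "SELECT signature FROM fingerprints WHERE image_id = ?", (image_id,)
    ).fetchone()
    return None if row is None else unpack_signature(row[0])
```

Keeping each fingerprint in one row next to the image id lets it be updated or deleted together with the rest of the photo's metadata, instead of rewriting a single monolithic file.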

As you can see in the two videos below, Fuzzy Search is implemented in the unstable KDE4 code:

The first video shows the Sketch Search tool in action. The editor is very simple to use: you select the brush color and size, then draw your template in the sketch area. The second video shows a tool dedicated to finding similar photos from a reference photo. The whole collection is scanned for duplicates, and the results are returned in real time through digiKam's search KIO-slave. Both tools return results from the database very quickly (my collection contains around 10,000 photos and runs on a dual-core CPU with 10 GB of RAM).
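Both searches come down to comparing a query fingerprint (from a sketch or a reference photo) with every stored one. In the paper, the score starts from the weighted difference between the averages and subtracts a weight for each kept query coefficient that appears with the same sign in the target, so a lower score means a better match; the weight depends on the coefficient's resolution bin, min(max(i, j), 5). Below is a minimal single-channel sketch of that score. The WEIGHTS values are illustrative placeholders, not the tuned values from the paper (which uses separate tables for painted and scanned queries). The paper also speeds up the search with per-coefficient lists of images, whereas `rank` here simply scores every signature.

```python
# Illustrative weights per bin; NOT the tuned values from the paper or ImgSeek.
WEIGHTS = [4.0, 0.8, 0.5, 0.4, 0.4, 0.3]


def bin_of(i, j):
    """Group coefficients by resolution level; levels past 5 share one bin."""
    return min(max(i, j), 5)


def score(query, target, weights=WEIGHTS):
    """Lower is better. query/target are (average, {(row, col): sign})."""
    q_avg, q_coeffs = query
    t_avg, t_coeffs = target
    s = weights[0] * abs(q_avg - t_avg)
    for pos, sign in q_coeffs.items():
        if t_coeffs.get(pos) == sign:  # same coefficient kept with the same sign
            s -= weights[bin_of(*pos)]
    return s


def rank(query, collection, limit=10):
    """Return up to `limit` ids from {id: signature}, best match first."""
    return sorted(collection, key=lambda image_id: score(query, collection[image_id]))[:limit]
```

Only the query's kept coefficients are visited, so the cost per comparison depends on the signature size rather than on the image resolution.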

Another tool, for finding all duplicate photos in your collections, is still under development and not yet usable. It will work like the old "find duplicates" kipi-plugin and give a list of all candidates. Its advantage over the current solution is that it reuses digiKam's icon view and sidebar instead of a separate dialog, so it integrates better with the rest of digiKam and is easier to use.
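A "find duplicates" pass has to compare photos pairwise and then merge the matching pairs into groups of candidates. A minimal sketch of that grouping step, using a union-find structure, follows. `is_duplicate` is a hypothetical callback (for instance, a fingerprint score below some threshold). The all-pairs loop is quadratic: around 10,000 photos means roughly 50 million pairs, so a real tool would likely need to prune candidates first.

```python
from itertools import combinations


def find_duplicate_groups(ids, is_duplicate):
    """Cluster ids: two ids share a group if a chain of matching pairs links
    them. Groups with a single member are dropped."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if is_duplicate(a, b):
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]
```

Because union-find merges transitively, a photo similar to two others pulls all three into one candidate group, which is the kind of list a user would then review in the icon view.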

In the future, we will add a new setting to adjust the relevance level required for matches from the database, and we will display the relevance values in the icon view. Undo/redo support for the sketch editor will be added too.

TinEye integration

Do you think you could implement a plugin in the future to search for uses of your images on the web? I am thinking of something along the lines of what TinEye offers.


TinEye sounds like a great tool, but is it free to use? Is there an API for developers?


I was already excited about this feature, but I had expected it to be fairly undeveloped at first. However, ImgSeek appears to be pretty powerful; I'll be very pleased if all of that functionality finds its way into digiKam.

Good work.

Always nice to have videos to watch too.

PS: your captchas are too damn hard.

I second that. I wanted to comment earlier but didn't manage to get past the captcha, and then I lost interest.

Captcha settings

Captcha settings fixed...


Captcha problems?

The captcha protection is pretty hard but not too hard. I thought it could be set a little easier, but I see it's fine now :-)