digiKam Fuzzy Search Tools Under Construction
A month ago, Marcel and I started working on a new set of search tools in digiKam for KDE4, called Fuzzy Search.
During my digiKam presentation at LGM2008, I introduced the idea of searching for duplicate items across the whole photo collection. But the concept is not limited to finding similar photos from a copy: it even lets the user draw a sketch of a photo from memory and then shows the photos whose shapes and colors are similar to the sketch.
This is not a new concept, in fact. An old Linux program named ImgSeek already provides this feature. In my opinion, it's time to update the old ImgSeek interface and make it more suitable for end users by implementing the technology in digiKam.
This way of searching photos looks great to me. It's very fast and gives pertinent results with my huge collection of photos under ImgSeek (although it took me over an hour to learn how the old GUI worked). This is why we decided to study the Fast Multiresolution Image Querying paper, which describes the technique we need to implement these features in digiKam. The core of the method is based on image fingerprints computed with Haar wavelet theory.
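To give a rough idea of what such a fingerprint looks like, here is a minimal Python sketch. It is not the ImgSeek or digiKam code: the function names, the `keep` parameter, and the use of a single grayscale channel on a small square image are all assumptions made for illustration. It applies a standard 2D Haar decomposition, then keeps only the overall average and the positions and signs of the largest coefficients.

```python
import math

def haar_1d(values):
    """Full 1D Haar decomposition of a list whose length is a power of two."""
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        # Pairwise averages and differences, scaled to keep the transform orthonormal.
        averages = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = averages + details
        n = half
    return out

def haar_2d(image):
    """Standard 2D decomposition: transform every row, then every column."""
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def fingerprint(image, keep=4):
    """Hypothetical fingerprint: mean intensity plus the `keep` strongest
    coefficients, each reduced to (row, column, sign)."""
    coeffs = haar_2d(image)
    size = len(coeffs)
    average = coeffs[0][0] / size  # coefficient (0, 0) equals sum / size
    candidates = [
        (abs(coeffs[r][c]), r, c)
        for r in range(size)
        for c in range(size)
        if (r, c) != (0, 0) and abs(coeffs[r][c]) > 1e-12
    ]
    candidates.sort(reverse=True)
    strongest = {
        (r, c, 1 if coeffs[r][c] > 0 else -1) for _, r, c in candidates[:keep]
    }
    return {"average": average, "coeffs": strongest}

# A 4x4 image that is dark on the left half and bright on the right half.
half_and_half = [[0, 0, 1, 1] for _ in range(4)]
print(fingerprint(half_and_half))
```

Two images can then be compared by counting the (row, column, sign) entries their fingerprints share, which is far cheaper than comparing pixels. A real implementation would work on color channels and weight the coefficients, which this sketch leaves out.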
Over one month, we ported the fingerprint generator C++ code from ImgSeek to digiKam. For performance reasons, only this part of ImgSeek is written in pure C++; the rest of the program is written in Python. We reviewed the algorithm, rewrote the code using the Qt API, and added a new interface to it, so digiKam is able to record and retrieve fingerprints from its database. Because ImgSeek does not use an SQL database as digiKam does, but a customized, serialized binary file to host the fingerprint data, we changed the algorithm to be compatible with an SQL database.
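As an illustration of what keeping fingerprints in an SQL database rather than in a binary file can look like, here is a small Python sketch using SQLite. The table name, columns, and JSON serialization are hypothetical and chosen only for this example; they are not digiKam's actual schema.

```python
import json
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a hypothetical fingerprint table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fingerprints ("
        " image_id INTEGER PRIMARY KEY,"
        " average REAL NOT NULL,"
        " coeffs TEXT NOT NULL)"
    )
    return conn

def store_fingerprint(conn, image_id, average, coeffs):
    """Record one fingerprint; coeffs is a set of (row, column, sign) tuples."""
    payload = json.dumps(sorted(coeffs))
    conn.execute(
        "INSERT OR REPLACE INTO fingerprints (image_id, average, coeffs)"
        " VALUES (?, ?, ?)",
        (image_id, average, payload),
    )
    conn.commit()

def load_fingerprint(conn, image_id):
    """Return (average, coeffs) for an image, or None if it has no fingerprint."""
    row = conn.execute(
        "SELECT average, coeffs FROM fingerprints WHERE image_id = ?",
        (image_id,),
    ).fetchone()
    if row is None:
        return None
    average, payload = row
    return average, {tuple(item) for item in json.loads(payload)}

conn = open_store()
store_fingerprint(conn, 1, 0.5, {(0, 1, -1)})
print(load_fingerprint(conn, 1))
```

Keying the table on the image identifier means a fingerprint can be regenerated and replaced whenever a photo changes, without rewriting a whole file.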
As you can see in the two videos below, Fuzzy Search is implemented in the unstable code for KDE4:
The first video shows the Sketch Search tool in action. The editor itself is very simple to use: you only select the color and size of the brush, then draw your template in the sketch area. The second video shows a tool dedicated to finding similar photos from a reference photo. The whole collection is scanned to find duplicates. The results are delivered in real time through digiKam's search KIO-slave. Both tools return results from the database very quickly (my collection includes around 10,000 photos and runs on a dual-core CPU with 10 GB of RAM).
Another tool, meant to find all duplicate photos in your collections, is under development but not yet usable. It will work like the old kipi-plugin "find duplicates" and give a list of all candidates. Its advantage over the current solution is that it reuses the digiKam icon view and sidebar instead of a separate dialog, so integration and usability with the rest of digiKam are better.
In the future, we will introduce a new configuration option to set the relevance level of matches from the database, and to display these values in the icon view. An undo/redo operation for the sketch editor would be great too.
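To make the idea of a relevance level concrete, here is a minimal Python sketch of how a threshold could filter and rank matches. The scoring formula, the function names, and the `min_relevance` parameter are assumptions for illustration only, not the planned digiKam setting. Fingerprints are represented as plain sets of (row, column, sign) tuples.

```python
def similarity(fp_a, fp_b):
    """Fraction of coefficients shared by two fingerprints (0.0 to 1.0)."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / max(len(fp_a), len(fp_b))

def find_matches(reference, candidates, min_relevance=0.5):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(similarity(reference, fp), name) for name, fp in candidates.items()]
    kept = [(score, name) for score, name in scored if score >= min_relevance]
    kept.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in kept]

reference = {(0, 1, 1), (1, 0, -1), (2, 2, 1), (3, 1, -1)}
candidates = {
    "copy.jpg": set(reference),
    "similar.jpg": {(0, 1, 1), (1, 0, -1), (5, 5, 1), (6, 6, 1)},
    "other.jpg": {(0, 1, 1), (7, 7, 1), (8, 8, 1), (9, 9, 1)},
}
print(find_matches(reference, candidates, min_relevance=0.5))
```

Raising `min_relevance` keeps only near-identical photos, while lowering it lets looser matches through; the score itself is the kind of value that could be displayed next to each thumbnail.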