While Kevin was working on an initial Matlab implementation, I began an OpenCV implementation to see which would perform better. OpenCV tends to run faster than even our optimized Matlab code, so our final implementation will most likely be written in C++ using OpenCV and a few supporting libraries.
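OpenCV isn't without its annoying quirks, though. Often, when we create a new image that should be all black, many of its pixels are filled with random values, resulting in a noisy image. We currently correct this by multiplying the image by a zero-filled matrix, but the extra step adds time. The noise is most likely uninitialized memory, since OpenCV does not clear a newly allocated image buffer; creating the image already zeroed (for example with cv::Mat::zeros, or cvSetZero on an IplImage) should make the multiplication unnecessary.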
Right now the saliency maps we generate are essentially the same as those used by Itti, Koch, and Niebur. We deviate slightly in how we handle color. Instead of extracting intensity, red-green (RG), and blue-yellow (BY) values from an RGB image, we use the L*a*b* color model. This model stores the image in three channels: L* (lightness), a* (red versus green), and b* (blue versus yellow). The end result is the same: we compute an intensity map, a red-versus-green map, and a blue-versus-yellow map of the image.
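For reference, here is a minimal per-pixel sketch of the standard sRGB to CIE L*a*b* conversion (D65 white point), written against the C++ standard library only. In the OpenCV implementation, cv::cvtColor with COLOR_BGR2Lab does this for a whole image; the helper names below (Lab, rgbToLab) are hypothetical, and the input is assumed to be sRGB with components scaled to [0, 1].

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: one pixel in CIE L*a*b*.
struct Lab { double L, a, b; };

// Undo the sRGB gamma curve to get linear-light intensity.
double srgbToLinear(double c) {
    return c <= 0.04045 ? c / 12.92 : std::pow((c + 0.055) / 1.055, 2.4);
}

// CIELAB companding function: cube root, with a linear segment near zero.
double labF(double t) {
    const double delta = 6.0 / 29.0;
    return t > delta * delta * delta ? std::cbrt(t)
                                     : t / (3.0 * delta * delta) + 4.0 / 29.0;
}

// Convert one sRGB pixel (components in [0, 1]) to L*a*b*, D65 white point.
Lab rgbToLab(double r, double g, double b) {
    const double R = srgbToLinear(r), G = srgbToLinear(g), B = srgbToLinear(b);
    // Linear sRGB -> CIE XYZ.
    const double X = 0.4124 * R + 0.3576 * G + 0.1805 * B;
    const double Y = 0.2126 * R + 0.7152 * G + 0.0722 * B;
    const double Z = 0.0193 * R + 0.1192 * G + 0.9505 * B;
    // Normalize by the D65 reference white, then compand.
    const double fx = labF(X / 0.95047), fy = labF(Y / 1.0), fz = labF(Z / 1.08883);
    return {116.0 * fy - 16.0,   // L*: lightness, 0 (black) to 100 (white)
            500.0 * (fx - fy),   // a*: positive = red, negative = green
            200.0 * (fy - fz)};  // b*: positive = yellow, negative = blue
}
```

Pure red comes out with a large positive a*, pure green with a negative a*, yellow with a positive b*, and blue with a negative b*, which is exactly the opponent-color split the RG and BY maps need.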
We also apply a bank of eight Gabor filters to a grayscale version of the image. The filters vary in orientation and scale, and each one responds strongly wherever the image contains edges or texture at the filter's orientation.
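A Gabor kernel is a Gaussian envelope multiplied by a cosine wave running along the rotated x axis. The sketch below builds one with the same formula that OpenCV's cv::getGaborKernel uses (with phase 0). The split of the eight filters into 4 orientations (0, 45, 90, 135 degrees) times 2 scales, and every numeric parameter (sigma, wavelength = 2 * sigma, aspect ratio 0.5, kernel size 6 * sigma + 1), are assumptions for illustration; the names Kernel, gaborKernel, and gaborBank are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Square kernel of odd size, stored row-major.
struct Kernel {
    int size;
    std::vector<double> w;
    // x, y are measured from the kernel center.
    double at(int x, int y) const {
        const int r = size / 2;
        return w[(y + r) * size + (x + r)];
    }
};

// g(x, y) = exp(-(x'^2 + gamma^2 y'^2) / (2 sigma^2)) * cos(2 pi x' / lambda),
// where (x', y') is (x, y) rotated by theta.
Kernel gaborKernel(int size, double sigma, double theta, double lambda, double gamma) {
    Kernel k{size, std::vector<double>(size * size)};
    const int r = size / 2;
    const double pi = std::acos(-1.0);
    for (int y = -r; y <= r; ++y)
        for (int x = -r; x <= r; ++x) {
            const double xr = x * std::cos(theta) + y * std::sin(theta);
            const double yr = -x * std::sin(theta) + y * std::cos(theta);
            k.w[(y + r) * size + (x + r)] =
                std::exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma)) *
                std::cos(2 * pi * xr / lambda);
        }
    return k;
}

// Hypothetical bank: 2 scales x 4 orientations = 8 filters (assumed layout).
std::vector<Kernel> gaborBank() {
    std::vector<Kernel> bank;
    const double pi = std::acos(-1.0);
    const double sigmas[] = {2.0, 4.0};  // assumed scales
    for (double sigma : sigmas)
        for (int i = 0; i < 4; ++i) {
            const int size = 2 * static_cast<int>(std::ceil(3 * sigma)) + 1;
            bank.push_back(gaborKernel(size, sigma, i * pi / 4, 2 * sigma, 0.5));
        }
    return bank;
}
```

Correlating a 0-degree kernel with vertical stripes of the same wavelength gives a large response, while the 90-degree kernel gives almost none. This orientation selectivity is what the orientation map is built from.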
This is the general algorithm we are following right now:
- Split the image into L*, a*, b*, and grayscale channels.
- Compute center-surround differences on the L*, a*, and b* images. This means we feed each image through a Gaussian pyramid and compute differences between fine (center) and coarse (surround) scales, normalizing at each step.
- For each Gabor filter in the bank, compute the center-surround differences and normalize as for the color maps. The final Gabor map is the mean of all the maps in the bank.
- Combine the four maps (L*, a*, b*, and orientation) into a final map, weighting luminosity, color, and orientation equally.
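The center-surround step can be sketched roughly as follows. This is a simplified stand-in, not our actual code: a hypothetical Image struct replaces cv::Mat, the blur is the 5-tap binomial kernel [1 4 6 4 1]/16 as a Gaussian approximation (similar in spirit to cv::pyrDown), and "normalize" is plain min-max rescaling to [0, 1], which is simpler than the normalization operator Itti et al. describe. The center levels and level offsets are parameters; Itti et al. used centers c in {2, 3, 4} with surrounds at c + 3 and c + 4.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal single-channel float image (row-major), a stand-in for cv::Mat.
// std::vector value-initializes, so a new Image starts all zero (black).
struct Image {
    int w = 0, h = 0;
    std::vector<float> px;
    Image() = default;
    Image(int w_, int h_) : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, 0.0f) {}
    float& at(int x, int y) { return px[static_cast<std::size_t>(y) * w + x]; }
    float at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

int clampIndex(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

// Separable [1 4 6 4 1] / 16 blur with clamped borders.
Image blur(const Image& in) {
    static const float k[5] = {1 / 16.0f, 4 / 16.0f, 6 / 16.0f, 4 / 16.0f, 1 / 16.0f};
    Image tmp(in.w, in.h), out(in.w, in.h);
    for (int y = 0; y < in.h; ++y)
        for (int x = 0; x < in.w; ++x) {
            float s = 0.0f;
            for (int i = -2; i <= 2; ++i) s += k[i + 2] * in.at(clampIndex(x + i, 0, in.w - 1), y);
            tmp.at(x, y) = s;
        }
    for (int y = 0; y < in.h; ++y)
        for (int x = 0; x < in.w; ++x) {
            float s = 0.0f;
            for (int i = -2; i <= 2; ++i) s += k[i + 2] * tmp.at(x, clampIndex(y + i, 0, in.h - 1));
            out.at(x, y) = s;
        }
    return out;
}

// Level l + 1 is level l blurred and then decimated by 2.
std::vector<Image> gaussianPyramid(const Image& img, int levels) {
    std::vector<Image> pyr{img};
    for (int l = 1; l < levels; ++l) {
        Image b = blur(pyr.back());
        Image d(std::max(1, b.w / 2), std::max(1, b.h / 2));
        for (int y = 0; y < d.h; ++y)
            for (int x = 0; x < d.w; ++x) d.at(x, y) = b.at(2 * x, 2 * y);
        pyr.push_back(d);
    }
    return pyr;
}

// Nearest-neighbour resize, used to bring a coarse level up to a finer size.
Image resizeNearest(const Image& in, int w, int h) {
    Image out(w, h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) out.at(x, y) = in.at(x * in.w / w, y * in.h / h);
    return out;
}

// Assumed normalization: linear rescale to [0, 1] (all zeros if the map is flat).
void normalizeRange(Image& m) {
    const auto mm = std::minmax_element(m.px.begin(), m.px.end());
    const float lo = *mm.first, range = *mm.second - lo;
    for (float& v : m.px) v = range > 0.0f ? (v - lo) / range : 0.0f;
}

// |center - surround| for every (c, c + delta) pair; each difference is
// normalized, resized to the input size, summed, and the sum normalized.
Image centerSurround(const Image& img, const std::vector<int>& centers,
                     const std::vector<int>& deltas) {
    int levels = 0;
    for (int c : centers)
        for (int d : deltas) levels = std::max(levels, c + d + 1);
    const std::vector<Image> pyr = gaussianPyramid(img, levels);
    Image acc(img.w, img.h);
    for (int c : centers)
        for (int d : deltas) {
            const Image& center = pyr[c];
            const Image surround = resizeNearest(pyr[c + d], center.w, center.h);
            Image diff(center.w, center.h);
            for (std::size_t i = 0; i < diff.px.size(); ++i)
                diff.px[i] = std::fabs(center.px[i] - surround.px[i]);
            normalizeRange(diff);
            const Image up = resizeNearest(diff, img.w, img.h);
            for (std::size_t i = 0; i < acc.px.size(); ++i) acc.px[i] += up.px[i];
        }
    normalizeRange(acc);
    return acc;
}
```

On a dark image with one small bright square, the resulting map is high over the square and near zero in the corners, while a perfectly flat image produces an all-zero map. The same function would be run on the L*, a*, b* channels and on each Gabor-filtered image.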
We are still experimenting with different weights for the final map. Right now each color map receives half the weight of the luminosity map, so the a* and b* maps together carry the same weight as luminosity.
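The current weighting can be written as a small helper. This is a hypothetical sketch: the maps are assumed to be flat float buffers of equal size, already normalized to [0, 1], and dividing by the sum of the weights (so the final map also stays in [0, 1]) is an assumption rather than something fixed in our pipeline.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted combination of the four feature maps. Defaults: L* = 1,
// a* = b* = 0.5 each, orientation = 1, so luminosity, color, and
// orientation contribute equally overall.
std::vector<float> combineMaps(const std::vector<float>& lum, const std::vector<float>& rg,
                               const std::vector<float>& by, const std::vector<float>& orient,
                               float wLum = 1.0f, float wColor = 0.5f, float wOrient = 1.0f) {
    const float total = wLum + 2.0f * wColor + wOrient;  // keeps the output in [0, 1]
    std::vector<float> out(lum.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = (wLum * lum[i] + wColor * (rg[i] + by[i]) + wOrient * orient[i]) / total;
    return out;
}
```

With these defaults a pixel that is maximal only in the luminosity map scores 1/3, and so does a pixel that is maximal in both color maps, which is the "equal weighting" of luminosity and color described above. Experimenting with other weights only means changing the three default arguments.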
I've also been experimenting with thresholding throughout the process, and have found that, in the L*a*b* stage, zeroing all pixels below one tenth of the map's range (max - min) results in a cleaner final map.
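A minimal sketch of that threshold, assuming each map is a flat float buffer. It follows the literal rule (compare against 0.1 * (max - min)); for maps that have been rescaled so their minimum is 0, this is the same as comparing against min + 0.1 * (max - min). The function name is hypothetical; in OpenCV the same effect could be had with cv::minMaxLoc followed by cv::threshold using THRESH_TOZERO.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Zero every pixel whose value is below `fraction` of the map's range.
void thresholdByRange(std::vector<float>& map, float fraction = 0.1f) {
    if (map.empty()) return;
    const auto mm = std::minmax_element(map.begin(), map.end());
    const float cutoff = fraction * (*mm.second - *mm.first);
    for (float& v : map)
        if (v < cutoff) v = 0.0f;
}
```

For example, in the map {0.05, 0.2, 1.0, 0.5} the range is 0.95, the cutoff is 0.095, and only the first pixel is cleared.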
And now some pretty pictures:
One issue you may notice in these images is that our maps seem to drift toward the bottom right. Dirk Walther describes this issue, which results from decimating an image after a Gaussian kernel has been applied. It can be mostly corrected by convolving the image again with a simple kernel such as [1 1]/2. This image shows the effect of the second convolution (from Walther):
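Why a two-tap kernel can compensate: averaging each pixel with one neighbor moves image content by half a pixel. The 1D sketch below pairs each sample with the next one, out[i] = (in[i] + in[i+1]) / 2, which moves content toward lower indices (toward the top-left when applied along rows and then along columns). That alignment, the clamped last sample, and the helper names are assumptions for illustration; the centroid function just measures where a feature sits before and after.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Convolve with [1 1]/2, aligned so each output averages a sample with its
// successor; the last sample is clamped. Shifts content by -0.5 px.
std::vector<double> halfPixelShiftLeft(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const std::size_t j = (i + 1 < in.size()) ? i + 1 : i;
        out[i] = 0.5 * (in[i] + in[j]);
    }
    return out;
}

// Intensity-weighted centroid of a 1D signal.
double centroid(const std::vector<double>& v) {
    double sum = 0.0, weighted = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v[i];
        weighted += static_cast<double>(i) * v[i];
    }
    return weighted / sum;
}
```

A single bright sample at index 5 has its centroid at 5; after the [1 1]/2 pass it is split evenly between indices 4 and 5, so the centroid moves to 4.5. Aligning the kernel the other way would push content further toward the bottom right instead.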
Papers:
Walther, Interactions of Visual Attention and Object Recognition: Computational Modeling, Algorithms, and Psychophysics, 2006