Wednesday, February 25, 2009

Slow and Steady

We've been working away on implementing the bilateral symmetry filter from Kovesi. He has Matlab code available to perform the operation, which we have been translating into OpenCV/C++ code. This process takes a while, since even the simplest lines of Matlab can take ten times as much code in C++. The translation is approximately halfway done at this point.

In addition to this code translation, we have been doing more research on symmetry and have found several other papers that deal with the subject in relation to visual attention. Two of the papers we are reading explicitly mention Itti and Koch, and contrast their own symmetry-based methods with the Itti et al. model.

I fixed some bugs in our normalization code that were caused by performing operations with integers instead of floats, so our results look like normal saliency maps now. The DoG filter code we implemented based on Walther's Saliency Toolbox still doesn't give us the kind of results we would like - often the images will come out black after being fed through the filter, or will have only one very intense region of marginal importance. So there are still some bugs to work out with that filter, but otherwise things are moving smoothly and our results look good.
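
Concretely, a DoG filter of this sort boils down to subtracting a wide (inhibitory) Gaussian blur from a narrow (excitatory) one. Here is a minimal sketch in OpenCV/C++, assuming the C++ API - the sigmas and weights are placeholder guesses, not the actual Saliency Toolbox constants:

    #include <opencv2/opencv.hpp>

    // Difference-of-Gaussians: narrow excitatory blur minus wide
    // inhibitory blur. All constants here are illustrative placeholders.
    cv::Mat dogFilter(const cv::Mat& map32f) {
        cv::Mat narrow, wide, out;
        cv::GaussianBlur(map32f, narrow, cv::Size(0, 0), 2.0);   // excitation
        cv::GaussianBlur(map32f, wide,   cv::Size(0, 0), 25.0);  // inhibition
        out = 0.5f * narrow - 1.5f * wide;
        cv::threshold(out, out, 0, 0, cv::THRESH_TOZERO);  // clamp negatives
        return out;
    }

If the whole map comes out black, one plausible culprit is the inhibitory term dominating the excitatory one at every pixel before the clamp.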

Our plans for wrapping up the project include fully implementing this bilateral symmetry filter and fixing any normalization quirks in the code. We then aim to perform a small experiment asking participants to rate our saliency maps against other methods.

And now, pretty pictures (Original, LAB, RGB):

[Images: original images with their corresponding LAB-based and RGB-based saliency maps]

Papers:

Biologically Inspired Saliency Map Model for Bottom-up Visual Attention: http://www.springerlink.com/content/7wxq0npr7b09hlj3/fulltext.pdf

Paying Attention to Symmetry: http://www.ai.rug.nl/~gert/download/kootstra08bmvc.pdf

Friday, February 20, 2009

'Tis the Time for an Update

We have been working on the various things we mentioned last time. Some progress has been made with the normalization technique described in the previous post, along with the floating point implementation. Normalization no longer just blurs the image, but it does not yet work consistently (the image is either normalized properly or blacked out entirely). This seems to be due to the parameters of our DoG filter - we have yet to find any that work for all images.

Also, as mentioned last time, we are working on a symmetry map. It has been implemented in MATLAB, though it works very slowly. One example:

[Image: example symmetry map]
Our symmetry map is based on the work by Peter Kovesi, Symmetry and Asymmetry From Local Phase (ps).
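
For a flavor of the idea, here is a toy sketch in OpenCV/C++ using Gabor quadrature pairs in place of Kovesi's log-Gabor filters (our port follows his MATLAB code much more closely; every parameter below is an illustrative guess). At a point of symmetry the even (cosine) filter responds strongly while the odd (sine) filter does not, so the positive part of |even| - |odd|, normalized by local energy, serves as the measure:

    #include <opencv2/opencv.hpp>

    // Toy phase-symmetry measure over four orientations. Even/odd Gabor
    // pairs stand in for Kovesi's log-Gabor quadrature filters.
    cv::Mat phaseSymmetry(const cv::Mat& gray32f) {
        cv::Mat sym    = cv::Mat::zeros(gray32f.size(), CV_32F);
        cv::Mat energy = cv::Mat::zeros(gray32f.size(), CV_32F);
        for (int i = 0; i < 4; ++i) {
            double theta = i * CV_PI / 4.0;
            cv::Mat even = cv::getGaborKernel(cv::Size(31, 31), 5.6, theta,
                                              10.0, 1.0, 0.0, CV_32F);
            cv::Mat odd  = cv::getGaborKernel(cv::Size(31, 31), 5.6, theta,
                                              10.0, 1.0, CV_PI / 2.0, CV_32F);
            cv::Mat re, ro, diff;
            cv::filter2D(gray32f, re, CV_32F, even);
            cv::filter2D(gray32f, ro, CV_32F, odd);
            diff = cv::abs(re) - cv::abs(ro);
            cv::threshold(diff, diff, 0, 0, cv::THRESH_TOZERO); // positive part
            sym += diff;
            energy += cv::abs(re) + cv::abs(ro);  // local amplitude
        }
        cv::divide(sym, energy + cv::Scalar(1e-6), sym);  // energy-normalize
        cv::normalize(sym, sym, 0.0, 1.0, cv::NORM_MINMAX);
        return sym;
    }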

Tuesday, February 10, 2009

Most Recent Update Ever Up Until Another Update

As mentioned in the previous post, images can seemingly shift towards the bottom right corner when using the standard Gaussian kernel in OpenCV's pyramid function. To get around this, we wrote our own code to build the pyramids, and now the issue is far less noticeable. This is a side effect of using pyramids - they are much faster than applying a wide bank of filters, but come with this small downside.

We are experimenting with different ways to weight the various maps we combine into our final saliency map. We tried one method which weighted them according to their mean intensity values, but found the results to be very inconsistent.
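
For the record, the weighting amounted to something like the following sketch (combineByMean is our name for it; maps holds the individual CV_32F feature maps):

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Combine feature maps, weighting each by its own mean intensity.
    // This is the scheme that proved inconsistent in practice.
    cv::Mat combineByMean(const std::vector<cv::Mat>& maps) {
        cv::Mat combined = cv::Mat::zeros(maps[0].size(), CV_32F);
        for (const cv::Mat& m : maps)
            combined += static_cast<float>(cv::mean(m)[0]) * m;
        return combined;
    }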

We're also producing both an RGB-based and a Lab-based set of maps now so we can visually compare them. In most cases Lab seems to outperform the RGB-based maps - sometimes to a great degree:

[Images: original image, final Lab map, final RGB map]

I've also been playing around with how we represent images. Currently we use 8 bits per channel, so pixel values are integers from 0 to 255. I tried an implementation where all image data was stored as floating point values from 0 to 1, and the results were slightly different. This is something we need to investigate more as we continue.
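
In OpenCV the switch is essentially a one-liner per image (a sketch, assuming the C++ API):

    #include <opencv2/opencv.hpp>

    int main() {
        cv::Mat img = cv::imread("input.png");     // 8-bit, values 0-255
        cv::Mat imgF;
        img.convertTo(imgF, CV_32F, 1.0 / 255.0);  // float, values 0-1
        return 0;
    }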

The normalization technique we are currently using involves multiplying the image by (max - mean(local_max))^2. Because this factor has no fixed range, our final maps have large regions of similar intensities instead of a more gradual build-up. A newer technique proposed by Itti et al. involves iteratively convolving the image with a Difference of Gaussians filter to perform normalization. We have experimented with this but are having issues getting it to work as expected - our images come out very blurred.
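
The (max - mean(local_max))^2 step looks roughly like this in OpenCV/C++ (a sketch: finding local maxima by comparing each pixel against a 3x3 dilation is one common trick, not necessarily what our code does, and a faithful version would exclude the global maximum itself from the mean):

    #include <opencv2/opencv.hpp>

    // Itti-style normalization: scale the map by (M - mbar)^2, where M is
    // the global maximum and mbar is the mean of the local maxima.
    cv::Mat normalizeMap(const cv::Mat& map32f) {
        double mn, M;
        cv::minMaxLoc(map32f, &mn, &M);

        // Local maxima: pixels equal to the max of their 3x3 neighborhood.
        cv::Mat dil;
        cv::dilate(map32f, dil, cv::Mat());   // default 3x3 max filter
        cv::Mat peaks = (map32f >= dil);      // mask of local maxima

        double mbar = cv::mean(map32f, peaks)[0];
        return map32f * ((M - mbar) * (M - mbar));
    }

Because the factor is unbounded, maps normalized this way still need a final rescale (e.g. cv::normalize with NORM_MINMAX) before display.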

Our immediate plans are to implement a low-level symmetry filter and use it in our final saliency map. In addition, we are going to refine our current methods as described above and see whether any improvement results.

Wednesday, February 4, 2009

Moving to OpenCV

While Kevin was working on an initial Matlab implementation, I began working on an OpenCV implementation to see which would perform better. OpenCV tends to run faster than even optimized Matlab code, so we are most likely going to stick with a final implementation written in C++ using OpenCV + other stuff.

OpenCV isn't without its annoying quirks, though. Often, when creating a supposedly all-black new image, many pixels will be filled with random values, resulting in a noisy image. We correct this by multiplying the image by a zero-filled matrix, but the extra step adds time, and we haven't figured out a way around it yet.
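
The noise is most likely uninitialized memory - cvCreateImage in the C API allocates a buffer without clearing it. Explicitly zeroing should be cheaper than the multiply; a sketch using the 1.x-era C API, where the quirk lives:

    #include <cv.h>  // OpenCV 1.x-era headers

    int main() {
        // cvCreateImage allocates but does not initialize pixel data.
        IplImage* img = cvCreateImage(cvSize(640, 480), IPL_DEPTH_8U, 1);
        cvZero(img);  // explicitly clear to black
        cvReleaseImage(&img);
        return 0;
    }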

Right now the saliency maps we are generating are essentially the same as those used by Itti, Koch, and Niebur. We deviate slightly in our color implementation. Instead of using an RGB image to extract intensity, RG, and BY values, we use the L*a*b* color model. This model stores the image in three channels: intensity (lightness), RG (red versus green), and BY (blue versus yellow). The end result is the same - we compute an intensity, red versus green, and blue versus yellow map of the image.
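
In OpenCV this is just a color conversion followed by a channel split (a sketch, assuming the C++ API):

    #include <opencv2/opencv.hpp>
    #include <vector>

    int main() {
        cv::Mat bgr = cv::imread("input.png"), bgrF, lab;
        bgr.convertTo(bgrF, CV_32F, 1.0 / 255.0);   // float input for Lab
        cv::cvtColor(bgrF, lab, cv::COLOR_BGR2Lab);
        std::vector<cv::Mat> ch;
        cv::split(lab, ch);  // ch[0]=L* (lightness), ch[1]=a* (RG), ch[2]=b* (BY)
        return 0;
    }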

We also use a bank of eight Gabor filters on a grayscale version of our image. These filters are of varying rotations and scales, and each responds strongly wherever its orientation matches that found in the image.

This is the general algorithm we are following right now (a rough code sketch follows the list):

  • Split the image into L*, a*, and b* channels, as well as a grayscale channel
  • Compute center-surround differences on the L*, a*, and b* images. This basically means that we feed the image through a Gaussian pyramid and compute differences between scales, normalizing at each step
  • For each Gabor filter in the bank, compute the center-surround differences and normalize as in the color maps. The final Gabor map is the mean of every image in the bank
  • Take the four maps and compute a final map, weighting luminosity, color, and orientation equally
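
Here is the promised sketch of the center-surround portion in OpenCV/C++ (pyramid depth, scale pairs, and weights are illustrative choices; the per-step normalization and the Gabor channel are omitted for brevity):

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Across-scale differences: |center - surround|, with the surround
    // level upsampled back to the center level's resolution.
    cv::Mat centerSurround(const cv::Mat& channel32f) {
        std::vector<cv::Mat> pyr(1, channel32f);
        for (int i = 1; i < 5; ++i) {            // 5-level Gaussian pyramid
            cv::Mat down;
            cv::pyrDown(pyr.back(), down);
            pyr.push_back(down);
        }
        cv::Mat acc = cv::Mat::zeros(pyr[1].size(), CV_32F);
        for (size_t c = 1; c + 2 < pyr.size(); ++c) {
            cv::Mat surround, diff;
            cv::resize(pyr[c + 2], surround, pyr[c].size());
            cv::absdiff(pyr[c], surround, diff); // center minus surround
            cv::resize(diff, diff, acc.size());
            acc += diff;   // a real version normalizes each map first
        }
        return acc;
    }

    int main() {
        cv::Mat bgr = cv::imread("input.png"), bgrF, lab;
        bgr.convertTo(bgrF, CV_32F, 1.0 / 255.0);
        cv::cvtColor(bgrF, lab, cv::COLOR_BGR2Lab);
        std::vector<cv::Mat> ch;
        cv::split(lab, ch);

        // Luminosity weighted twice as heavily as each color map,
        // matching the weighting described below.
        cv::Mat sal = 0.5f * centerSurround(ch[0])
                    + 0.25f * centerSurround(ch[1])
                    + 0.25f * centerSurround(ch[2]);
        cv::normalize(sal, sal, 0, 255, cv::NORM_MINMAX);
        sal.convertTo(sal, CV_8U);
        cv::imwrite("saliency.png", sal);
        return 0;
    }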

We are still experimenting with different weights for the final map. Right now each color map receives half the weight of the luminosity map.

I've also been experimenting with thresholding throughout the process, and have found that removing all pixels less than 1/10th of (max - min) in the L*a*b* process results in a cleaner end map.
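
In code, that cleanup step might look like the following (a sketch; cleanMap is our name for it):

    #include <opencv2/opencv.hpp>

    // Zero out pixels below one tenth of the map's dynamic range.
    void cleanMap(cv::Mat& map32f) {
        double mn, mx;
        cv::minMaxLoc(map32f, &mn, &mx);
        double cut = 0.1 * (mx - mn);
        cv::threshold(map32f, map32f, cut, 0, cv::THRESH_TOZERO);
    }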

And now some pretty pictures:

[Images: example saliency map results]

One issue you may notice from these images is that our maps seem to drift towards the bottom right of the image. This is an issue brought up by Dirk Walther that results from decimating an image after a Gaussian kernel has been applied. It can be mostly corrected by convolving the image again with a simple kernel such as [1 1]/2. This image shows the effect of the second convolution (from Walther):

[Image: effect of the second convolution, from Walther]
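
In code, the correction is a single extra convolution per pyramid level (a sketch, assuming the C++ API; the kernel is the [1 1]/2 from Walther, applied in both directions):

    #include <opencv2/opencv.hpp>

    // Undo the half-pixel drift introduced by decimation: follow each
    // pyrDown with a separable [1 1]/2 convolution.
    cv::Mat pyrDownCorrected(const cv::Mat& level) {
        cv::Mat down, out;
        cv::pyrDown(level, down);
        cv::Mat k = (cv::Mat_<float>(1, 2) << 0.5f, 0.5f);
        cv::sepFilter2D(down, out, -1, k, k.t());  // horizontal then vertical
        return out;
    }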

Papers:

Walther, Interactions of Visual Attention and Object Recognition: Computational Modeling, Algorithms, and Psychophysics, 2006

Orientation and the Gabor Filter

To examine orientation as a saliency channel, we adopted the model proposed by Itti et al., wherein a Gabor filter is applied to the image. The Gabor filter uses a 2D Gabor function (a sinusoid multiplied by a Gaussian envelope) as its impulse response. This can roughly be visualized as:
[Images: 1D and 2D visualizations of the Gabor function]
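
For reference, the standard 2D Gabor function (the textbook parameterization, not necessarily the exact form in the code linked below) is:

    g(x, y) = exp(-(x'^2 + γ^2 y'^2) / (2σ^2)) · cos(2π x'/λ + ψ)

where x' = x cos θ + y sin θ and y' = -x sin θ + y cos θ; λ is the wavelength, θ the orientation, σ the standard deviation of the Gaussian, γ the ellipticity, and ψ the phase offset.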

Note that the size and ellipticity of the Gabor function, and the standard deviation of the Gaussian, are variable.

The filter then applies this function to the image, emphasizing edges aligned with the orientation of the Gaussian envelope.

Below are several examples of applying different orientations of the Gabor function to an image.

[Images: Gabor responses at orientations pi/4, pi/2, 3pi/4, and pi]

Note: a wavelength of 10 and a circular envelope (ellipticity = 1) are used, with a standard deviation of ~5.6.

Each orientation map is created in the usual way, by taking differences of several scales of the filtered image.
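
As a sketch of the filtering step: modern OpenCV ships a Gabor kernel generator, cv::getGaborKernel (the CGabor code linked below is closer to what we actually use, and the 31x31 kernel size is a guess):

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Filter a grayscale image with Gabor kernels at the four orientations
    // shown above; varying the scale as well yields the full bank of eight.
    std::vector<cv::Mat> gaborResponses(const cv::Mat& gray32f) {
        std::vector<cv::Mat> out;
        for (int i = 1; i <= 4; ++i) {
            double theta = i * CV_PI / 4.0;  // pi/4, pi/2, 3pi/4, pi
            cv::Mat kernel = cv::getGaborKernel(cv::Size(31, 31), 5.6, theta,
                                                10.0, 1.0, 0.0, CV_32F);
            cv::Mat resp;
            cv::filter2D(gray32f, resp, CV_32F, kernel);
            out.push_back(cv::abs(resp));    // orientation response magnitude
        }
        return out;
    }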

J. Movellan, Tutorial on Gabor Filters
OpenCV implementation: http://www.personal.reading.ac.uk/~sir02mz/CGabor/example.html