# Scale Invariant Feature Transform

SIFT is a local descriptor which is invariant to image translation, scaling and rotation, and partially invariant to occlusion/noise and change in illumination and viewpoint [32]. Lots of algorithms have been proposed recently that successfully utilize SIFT for image classification and object recognition [2, 10, 62, 64]. There are mainly four stages of computation involved in the generation of the SIFT feature described in the following sections.

## Scale-Space Extrema Detection

The first stage of computation is to identify the locations of candidate interest points which are invariant to orientation and scale change. A continuous function of scale, also named as scale space, is used to search stable features through all possible scales. It is verified that the Gaussian function is the only possible scale-space kernel [31], therefore the scale space of an image *I* can be generated by

1 -(x^{2}+y^{2})

where *G(x, y, <**7**) =* 5772 exp *2**7*^{2} is a Gaussian function and * is the convolution operation.

The locations of stable keypoints in scale space are detected by using the extrema of the difference-of-Gaussian (DoG) function convolved with the original image. It can be formulated by

An efficient way to generate *D(x, y,a)* is illustrated in Fig.2.13 [32]. Each octave of scale space is divided into *t* intervals (t should be an integer number) and *r = **2** ^{l/t }*is a multiplicative constant. The Gaussian image in next octave is the one in the

**Fig. 2.13 **Construction of *D(x, y, a).* The scale space images are produced by convolving initial image incrementally with Gaussians. The DoG images are produced by subtracting adjacent Gaussian images previous octave downsampled by a factor of 2. It is verified that we have to produce *t +* 3 Gaussian images for each octave to insure that the extrema detection can cover the complete octaves. The DoG images are generated by subtracting adjacent blurred images.

To find the local extrema of the DoG images, each sample has to be compared to its 26 neighbors in 3 x 3 regions of the DoG images at the current scale (eight neighbors), above scale (nine neighbors) and below scale (nine neighbors). Samples which are the maxima or minima among all of their neighbors are identified as the keypoint candidates.