# HEp-2 Cell Image Representation in the Adaptive CoDT Feature Space

In the previous section, we model the CoDT feature space of HEp-2 cell images as a GMM, and learn the adaptive parameters *λ* = {*w*_{t}, *μ*_{t}, *Σ*_{t}, *t* = 1, 2, ..., *T*} of the GMM. A set of samples *X* = {*x*_{n}, *n* = 1, ..., *N*} can be described by the following gradient vector, a.k.a. *score function*:

$$ G_\lambda^X = \nabla_\lambda \log p(X \mid \lambda). $$

The gradient describes how the parameters *λ* should be adjusted to best fit the input *X*. To measure the similarity between two HEp-2 cell images *X* and *Y*, a *Fisher Kernel* (FK) [15] is calculated as

$$ K(X, Y) = \left( G_\lambda^X \right)^{T} F_\lambda^{-1} G_\lambda^Y, $$

where *F*_{λ} is the *Fisher Information Matrix* (FIM), formulated as

$$ F_\lambda = E_{X} \left[ G_\lambda^X \left( G_\lambda^X \right)^{T} \right]. $$

The superscript *T* denotes the transpose of $G_\lambda^X$. Fisher information is a measure of the amount of information that *X* carries about the parameters *λ*.

As *F*_{λ} is symmetric and positive semi-definite, *F*_{λ}^{-1} can be decomposed as $F_\lambda^{-1} = L_\lambda^{T} L_\lambda$, and the FK can be rewritten as

$$ K(X, Y) = \left( \mathcal{G}_\lambda^X \right)^{T} \mathcal{G}_\lambda^Y, $$

where

$$ \mathcal{G}_\lambda^X = L_\lambda G_\lambda^X. $$

The normalized gradients with respect to the weights *w*_{t}, the means *μ*_{t} and the covariances *Σ*_{t} correspond to the 0th-order, 1st-order and 2nd-order statistics, respectively.

Let *γ*_{n}(*t*) denote the occupancy probability of the CoDT feature *x*_{n} for the *t*-th Gaussian:

$$ \gamma_n(t) = \frac{w_t\, p_t(x_n \mid \lambda)}{\sum_{k=1}^{T} w_k\, p_k(x_n \mid \lambda)}, $$

where *p*_{t} denotes the *t*-th Gaussian component. It can also be regarded as the soft assignment of *x*_{n} to the *t*-th Gaussian.
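As an illustration, this soft assignment can be sketched in NumPy for a diagonal-covariance GMM (function and variable names are ours, not from the text):

```python
import numpy as np

def occupancy(X, w, mu, sigma2):
    """Soft-assign each feature x_n to the T Gaussians.

    X:      (N, D) CoDT features
    w:      (T,)   mixture weights
    mu:     (T, D) means
    sigma2: (T, D) diagonal covariances
    Returns gamma: (N, T) occupancy probabilities gamma_n(t).
    """
    # log N(x_n | mu_t, diag(sigma2_t)) for every (n, t) pair
    diff = X[:, None, :] - mu[None, :, :]                                  # (N, T, D)
    log_gauss = -0.5 * np.sum(diff**2 / sigma2 + np.log(2 * np.pi * sigma2), axis=2)
    log_post = np.log(w) + log_gauss                                       # unnormalized
    # normalize in the log domain for numerical stability
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    return gamma / gamma.sum(axis=1, keepdims=True)
```

Each row of `gamma` sums to one, distributing the feature softly over all *T* Gaussians rather than hard-assigning it to the nearest visual word as BoW does.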

To avoid explicitly enforcing the constraints in (7.6), we use a parameter *α*_{t} to re-parameterize the weight parameter *w*_{t} following the soft-max formalism, which is defined as:

$$ w_t = \frac{\exp(\alpha_t)}{\sum_{k=1}^{T} \exp(\alpha_k)}. $$
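A minimal sketch of this re-parameterization (the max-shift is a standard numerical-stability detail, not part of the text):

```python
import numpy as np

def weights_from_alpha(alpha):
    """Soft-max re-parameterization: the resulting weights are positive and
    sum to one for any unconstrained alpha, so no explicit constraint on
    the weights needs to be enforced during optimization."""
    e = np.exp(alpha - alpha.max())  # shift for numerical stability
    return e / e.sum()
```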

The gradients of a single CoDT feature *x*_{n} w.r.t. the parameters *α*_{t}, *μ*_{t} and *σ*_{t} of the GMM can be formulated as

$$ \frac{\partial \log p(x_n \mid \lambda)}{\partial \alpha_t} = \gamma_n(t) - w_t, $$

$$ \frac{\partial \log p(x_n \mid \lambda)}{\partial \mu_t^d} = \gamma_n(t)\, \frac{x_n^d - \mu_t^d}{\left( \sigma_t^d \right)^2}, $$

$$ \frac{\partial \log p(x_n \mid \lambda)}{\partial \sigma_t^d} = \gamma_n(t) \left[ \frac{\left( x_n^d - \mu_t^d \right)^2}{\left( \sigma_t^d \right)^3} - \frac{1}{\sigma_t^d} \right], $$

where the superscript *d* denotes the *d*-th dimension of the input vector.

Then, the normalized gradients are computed by multiplying by the square-root inverse of the diagonal FIM. Let $f_{\alpha_t}$, $f_{\mu_t^d}$ and $f_{\sigma_t^d}$ be the entries on the diagonal of *F*_{λ} corresponding to $\partial \log p(x_n \mid \lambda) / \partial \alpha_t$, $\partial \log p(x_n \mid \lambda) / \partial \mu_t^d$ and $\partial \log p(x_n \mid \lambda) / \partial \sigma_t^d$ respectively; they are calculated approximately as $f_{\alpha_t} = w_t$, $f_{\mu_t^d} = w_t / (\sigma_t^d)^2$ and $f_{\sigma_t^d} = 2 w_t / (\sigma_t^d)^2$. Therefore, the corresponding normalized gradients are as follows:

$$ \mathcal{G}_{\alpha_t}(X) = \frac{1}{\sqrt{w_t}} \sum_{n=1}^{N} \left( \gamma_n(t) - w_t \right), $$

$$ \mathcal{G}_{\mu_t^d}(X) = \frac{1}{\sqrt{w_t}} \sum_{n=1}^{N} \gamma_n(t)\, \frac{x_n^d - \mu_t^d}{\sigma_t^d}, $$

$$ \mathcal{G}_{\sigma_t^d}(X) = \frac{1}{\sqrt{2 w_t}} \sum_{n=1}^{N} \gamma_n(t) \left[ \frac{\left( x_n^d - \mu_t^d \right)^2}{\left( \sigma_t^d \right)^2} - 1 \right]. $$
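Assuming diagonal covariances and pre-computed occupancy probabilities *γ*_{n}(*t*), the mean and covariance gradients and their concatenation might be sketched as follows (names are illustrative; only the 1st- and 2nd-order parts are kept):

```python
import numpy as np

def fisher_vector(X, w, mu, sigma, gamma):
    """Normalized gradients w.r.t. means and standard deviations.

    X:     (N, D) CoDT features
    w:     (T,)   mixture weights
    mu:    (T, D) means
    sigma: (T, D) standard deviations (diagonal covariance model)
    gamma: (N, T) occupancy probabilities gamma_n(t)
    Returns a vector of length 2*D*T.
    """
    N, _ = X.shape
    diff = (X[:, None, :] - mu[None, :, :]) / sigma            # (N, T, D)
    # 1st-order statistics, scaled by the sqrt-inverse FIM entry w_t
    G_mu = np.einsum('nt,ntd->td', gamma, diff) / np.sqrt(w)[:, None]
    # 2nd-order statistics, FIM entry 2*w_t
    G_sig = np.einsum('nt,ntd->td', gamma, diff**2 - 1) / np.sqrt(2 * w)[:, None]
    fv = np.concatenate([G_mu.ravel(), G_sig.ravel()])
    return fv / N   # average over the N CoDT features of the image
```

The returned vector has the 2*DT* dimensionality stated below; the division by *N* implements the sample-size normalization described later in this section.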

The Fisher representation is the concatenation of all the gradients over the *d* = 1, 2, ..., *D* dimensions of the CoDT feature and over the *T* Gaussians. In our case, we only consider the gradients with respect to the means and covariances, i.e., $\mathcal{G}_{\mu_t^d}(X)$ and $\mathcal{G}_{\sigma_t^d}(X)$, since the gradient with respect to the weights has been verified to bring little additional information [13]. Therefore, the dimension of the resulting representation is 2*DT*. The CoDT features are embedded in a higher-dimensional feature space which is more suitable for linear classification.

To avoid dependence on the sample size, we normalize the final image representation by the number of CoDT features extracted from the HEp-2 cell image, *N*, i.e., $\mathcal{G}(X) = \frac{1}{N}\, \mathcal{G}_\lambda(X)$. After that, two additional normalization steps [23] are conducted in order to improve the results: power normalization and ℓ2-normalization.

Power normalization is performed in each dimension as:

$$ f(z) = \operatorname{sign}(z)\, |z|^{1/\tau}. $$

In this study, we choose the power coefficient *τ* = 2. The motivation of power normalization is to “unsparsify” the Fisher representation, which becomes sparser as the number of Gaussian components of the GMM increases.

ℓ2-normalization is defined as:

$$ \mathcal{G}(X) \leftarrow \frac{\mathcal{G}(X)}{\left\| \mathcal{G}(X) \right\|_2}. $$
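The two steps can be sketched together (assuming the square-root reading of the power coefficient, i.e., exponent 1/*τ* with *τ* = 2):

```python
import numpy as np

def normalize_fv(fv, tau=2.0):
    """Power normalization followed by l2-normalization of a Fisher vector."""
    # sign-preserving power law "unsparsifies" the representation
    fv = np.sign(fv) * np.abs(fv) ** (1.0 / tau)
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```

After this, the dot product between two normalized representations directly realizes the Fisher kernel, so a simple linear classifier can be used.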

Our proposed AdaCoDT method has several advantages over the BoW framework [13, 23]. Firstly, it is a generalization of the BoW framework: the resulting representation is not limited to the occurrences of each visual word, but additionally includes information about the distribution of the CoDT features, overcoming the information loss caused by the quantization procedure of the BoW framework. Secondly, it defines a kernel from a generative model of the data. Thirdly, it can be generated from a much smaller codebook, which reduces the computational cost compared with the BoW framework. Lastly, for the same vocabulary size, its dimensionality is much higher than that of the BoW representation, which ensures excellent performance with a simple linear classifier.