# Neural Network for Regression

Using DNNs for predicting the signal on an additional shell is motivated by the universal approximation theorem (UAT), which states that a feedforward neural network with one hidden layer and a finite number of neurons can approximate any continuous function with arbitrary accuracy [4, 10]. However, the UAT concerns the *representation* of functions with neural networks rather than their learnability in practice. To ensure that the network parameters (i.e. weights) can be learned from training data, a sufficiently large number of training samples is required. For this reason, a large dataset consisting of 100 subjects from the HCP database is employed in the learning process.

The proposed DNN is designed to predict spherical harmonic coefficients representing the HARDI signal on one shell from the HARDI signals on one or more other shells of the same voxel.

## Spherical Harmonics

Spherical harmonics (SH) form an orthonormal basis for spherical functions and allow dMRI signals to be represented compactly. In this work, we utilize the modified SH basis defined in [5], which restricts the SH basis to be real and symmetric. In matrix form, the dMRI signal $S$ can be written as a linear combination of the modified SH basis $B$ and an SH coefficient vector $C$, i.e. $S = BC$. The SH coefficients $C$ are calculated for every shell using a least-squares fit with regularization,

$$C = \left(B^T B + \lambda L\right)^{-1} B^T S,$$

where $L$ is the regularization matrix and $\lambda = 0.006$, as explained in [6].

**Table 1** Topology of the neural network

| # | Type            | Parameters                  |
|---|-----------------|-----------------------------|
| 1 | Input           | #neurons = #gradients       |
| 2 | Fully-connected | 100 neurons                 |
| 3 | ReLU            | -                           |
| 4 | Fully-connected | 10 neurons                  |
| 5 | ReLU            | -                           |
| 6 | Fully-connected | 200 neurons                 |
| 7 | Output          | #neurons = #SH coefficients |
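The regularized least-squares fit can be sketched in a few lines of NumPy; the function name and the arguments are illustrative, not taken from the paper's code, and the basis matrix $B$ and regularization matrix $L$ are assumed to be precomputed:

```python
import numpy as np

def fit_sh_coefficients(S, B, L, lam=0.006):
    """Regularized least-squares fit of SH coefficients (sketch).

    S   : (n_gradients,) dMRI signal of one shell
    B   : (n_gradients, n_coeffs) modified SH basis matrix
    L   : (n_coeffs, n_coeffs) regularization matrix
    lam : regularization weight (0.006 in the text)
    """
    # Solve (B^T B + lam * L) C = B^T S for the coefficient vector C
    return np.linalg.solve(B.T @ B + lam * L, B.T @ S)
```

For a noise-free signal that lies exactly in the span of the basis, the unregularized fit (`lam=0`) recovers the generating coefficients, while a small `lam` trades a little bias for stability on noisy data.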

## Deep Neural Network

The DNN that predicts SH coefficients consists of an input layer fed with dMRI signals, three hidden layers, and an output layer comprising one neuron for every SH coefficient. In contrast to the original formulation of the UAT, we incorporate several hidden layers instead of a single one, as more recent research in deep learning suggests that deep networks represent functions more efficiently than shallow networks [5]. The activation functions between hidden layers are Rectified Linear Units (ReLUs) with $f(x) = \max(0, x)$. With the SH coefficients $c_i$ of the corresponding shell as targets and the predicted SH coefficients $\hat{c}_i$ as the DNN output, we choose the loss function to be the mean squared error

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left(c_i - \hat{c}_i\right)^2,$$

where $N$ is the number of SH coefficients. The loss is minimized with the Adagrad optimizer [7], a variant of stochastic gradient descent with per-parameter adaptive learning rates. An overview of the network's topology is provided in Table 1.
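The topology of Table 1, the MSE loss, and an Adagrad update step can be sketched together in plain NumPy. This is a minimal single-sample illustration under assumed hyperparameters (He-style initialization, learning rate 0.01); the class and method names are hypothetical, not from the paper's implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class SHRegressor:
    """Sketch of the Table 1 network: input (#gradients) -> 100 -> 10 -> 200
    -> output (#SH coefficients), ReLU between hidden layers, linear output."""

    def __init__(self, n_gradients, n_coeffs, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [n_gradients, 100, 10, 200, n_coeffs]
        self.W = [rng.standard_normal((a, b)) * np.sqrt(2.0 / a)
                  for a, b in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(n) for n in sizes[1:]]
        # Adagrad accumulators of squared gradients, one per parameter tensor
        self.gW = [np.zeros_like(W) for W in self.W]
        self.gb = [np.zeros_like(b) for b in self.b]

    def forward(self, x):
        acts = [x]
        for i in range(len(self.W)):
            z = acts[-1] @ self.W[i] + self.b[i]
            acts.append(relu(z) if i < len(self.W) - 1 else z)  # linear output
        return acts

    def train_step(self, x, c, lr=0.01, eps=1e-8):
        acts = self.forward(x)
        pred = acts[-1]
        loss = np.mean((pred - c) ** 2)
        delta = 2.0 * (pred - c) / c.size               # d(MSE)/d(output)
        for i in reversed(range(len(self.W))):
            gW, gb = np.outer(acts[i], delta), delta
            if i > 0:                                   # backprop through ReLU
                delta = (delta @ self.W[i].T) * (acts[i] > 0)
            # Adagrad: accumulate squared gradients, scale the step per weight
            self.gW[i] += gW ** 2
            self.gb[i] += gb ** 2
            self.W[i] -= lr * gW / (np.sqrt(self.gW[i]) + eps)
            self.b[i] -= lr * gb / (np.sqrt(self.gb[i]) + eps)
        return loss
```

Repeated calls to `train_step` on a voxel's signal and its target SH coefficients drive the MSE down, mirroring one optimizer iteration of the training loop described above.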