Paper: [Link]

Code: N/A

**Features:**

- Use CP decomposition with NLS (non-linear least squares) method
- minimizes the L2-norm of the approximation residual (for a user-defined fixed R) using Gauss-Newton optimization.

- Decompose the 4D kernel tensor
- The convolution kernel itself constitutes a 4D tensor with dimensions corresponding to the two spatial dimensions, the input image maps, and the output image maps.

**Findings:**

- CP-decomposition approximates the convolution with a 4D kernel tensor by the sequence of four convolutions with small 2D kernel tensors. This decomposition is used to replace the original convolutional layer with a sequence of four convolutional layers with small kernels.
- fine-tune the
**entire**network on training data using back-propagation.- This discriminative fine-tuning works well, even when CP-decomposition has large approximation error.

**Other Knowledge:**

- On the theoretical side, these results confirm the intuition that modern CNNs are over-parameterized, i.e. that the sheer number of parameters in the modern CNNs are not needed to store the information about the classification task but, rather, serve to facilitate convergence to good local minima of the loss function

**Useful reference:**

- Suggested a scheme based on the CP-decomposition of parts of the kernel tensor obtained by biclustering (alongside with a different decompositions for the first convolutional layer and the fully-connected layers). CP-decompositions of the kernel tensor parts have been computed with the greedy approach. Only fine-tunes the layers above the approximated one.
- Denton, Emily, Zaremba, Wojciech, Bruna, Joan, LeCun, Yann, and Fergus, Rob. Exploiting linear structure within convolutional networks for efficient evaluation. arXiv preprint arXiv:1404.0736, 2014.

- Effectively approximate the 4D kernel tensor as a composition (product) of two 3D tensors, perform “local” fine-tuning that minimizes the deviation between the full and the approximated convolutions outputs on the training data.
- Jaderberg, Max, Vedaldi, Andrea, and Zisserman, Andrew. Speeding up convolutional neural networks with low rank expansions. In Proceedings of the British Machine Vision Conference (BMVC), 2014a.

- there is no finite algorithm for determining canonical rank of a tensor. [Paper]

**Dataset:**

- ImageNet