“Tensorizing Neural Networks” — Alexander Novikov et al. 2015

Paper: [Link]

TensorNet Code: [Link]

Features:

  • Use tensorization to reshape the vectors and matrices of a network into higher-dimensional tensors
  • Compress the fully-connected layers, which hold most of the network's parameters

Findings:

  • Convert the dense weight matrices of the fully-connected layers to the Tensor Train (TT) format; both the forward pass through the resulting TT-layer and the learning steps are performed directly in the TT-format. (A minimal sketch of the conversion follows this list.)
  • Potentially addresses the two issues that rule out very wide, shallow networks: high computational/memory demands and the risk of over-fitting (see "Other Knowledge" below).
  • Result: for the Very Deep VGG networks [21], the dense weight matrix of a single fully-connected layer is compressed by a factor of up to 200,000, leading to a compression factor of up to 7 for the whole network.
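
A minimal NumPy sketch (my own, not the authors' TensorNet code) of what converting a dense weight matrix to TT-format looks like. The matrix size (1024×1024), the mode sizes (ten modes of size 4) and the maximal TT-rank of 8 are illustrative choices; the paper actually uses a TT-matrix format whose cores index row and column modes jointly, whereas this sketch applies plain TT-SVD to the tensorized matrix just to convey the compression idea.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a d-dimensional tensor into TT cores via sequential truncated SVDs."""
    dims = tensor.shape
    cores, r_prev = [], 1
    unfolding = tensor.reshape(r_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(unfolding, full_matrices=False)
        r_k = min(max_rank, len(s))                           # truncate the TT-rank
        cores.append(u[:, :r_k].reshape(r_prev, dims[k], r_k))
        unfolding = (s[:r_k, None] * vt[:r_k]).reshape(r_k * dims[k + 1], -1)
        r_prev = r_k
    cores.append(unfolding.reshape(r_prev, dims[-1], 1))
    return cores

# Tensorize a dense fully-connected weight matrix and compress it.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))       # dense FC weight matrix
A = W.reshape((4,) * 10)                    # 4**10 == 1024 * 1024 elements
cores = tt_svd(A, max_rank=8)
print(W.size, sum(c.size for c in cores))   # 1,048,576 dense vs ~1,800 TT parameters
# A random matrix has no low-TT-rank structure, so this truncation is lossy;
# in the paper the TT cores of a TT-layer are learned by backpropagation instead.
```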

Other Knowledge:

  • The recent advances of deep neural networks have become possible thanks to algorithmic progress, large amounts of available data, and modern hardware.
  • State-of-the-art neural networks have reached hardware limits, both in terms of computational power and memory; a large number of works try to reduce the hardware requirements (e.g. memory demands) and the running times.
  • One of the most straightforward approaches is a low-rank representation of the weight matrices. Recent studies show that the weight matrix of a fully-connected layer is highly redundant, and by restricting its matrix rank it is possible to greatly reduce the number of parameters without a significant drop in predictive accuracy (see the first sketch after this list).
  • Matrix and tensor decompositions were recently used to speed up the inference time of CNNs:
    • Matrix: E. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus, “Exploiting linear structure within convolutional networks for efficient evaluation,” in Advances in Neural Information Processing Systems 27 (NIPS), 2014, pp. 1269–1277.
    • Tensor: V. Lebedev, Y. Ganin, M. Rakhuba, I. Oseledets, and V. Lempitsky, “Speeding-up convolutional neural networks using fine-tuned CP-decomposition,” in International Conference on Learning Representations (ICLR), 2014.
  • The TT-format is immune to the curse of dimensionality and its algorithms are robust.
  • For an arbitrary tensor A, a TT-representation exists but is not unique.
  • An attractive property of the TT-decomposition is that several types of operations can be performed efficiently on tensors given in the TT-format (see the second sketch after this list):
    • basic linear algebra operations, such as the addition of a constant, the multiplication by a constant, the summation, and the entrywise product of tensors (the results are again tensors in the TT-format, generally with increased ranks);
    • computation of global characteristics of a tensor, such as the sum of all elements and the Frobenius norm.
  • Traditionally, very wide, shallow networks are not considered because of their high computational and memory demands and the risk of over-fitting.
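
A small sketch of the low-rank baseline from the bullet above: replace a dense layer's weight matrix W by its truncated SVD factors, cutting the parameter count from m·n to r·(m+n). The sizes (1024×1024) and rank (64) are illustrative choices of mine, not numbers from the paper.

```python
import numpy as np

m, n, r = 1024, 1024, 64                 # illustrative layer size and rank
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))          # dense fully-connected weight matrix

U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_r = U[:, :r] * s[:r]                   # m x r, singular values folded in
V_r = Vt[:r]                             # r x n

x = rng.standard_normal(n)
y = U_r @ (V_r @ x)                      # low-rank layer: two thin matrix-vector products
print(W.size, U_r.size + V_r.size)       # 1,048,576 vs 131,072 parameters
# How well y approximates W @ x depends on how redundant (low-rank) W really is.
```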
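
And a sketch of the "global characteristics" point: the sum of all elements and the Frobenius norm can be computed directly from the TT cores, without ever forming the full tensor. The small core shapes below are arbitrary values I picked for the check; the cores follow the same (r_{k-1}, n_k, r_k) layout as in the tt_svd sketch above.

```python
import numpy as np

def tt_sum(cores):
    """Sum of all tensor elements, accumulated core by core."""
    v = np.ones((1, 1))
    for core in cores:                    # core shape: (r_prev, n_k, r_next)
        v = v @ core.sum(axis=1)          # contract away the mode index
    return v.item()

def tt_frobenius_norm(cores):
    """Frobenius norm computed from the cores alone."""
    v = np.ones((1, 1))
    for core in cores:
        m = np.einsum('anb,cnd->acbd', core, core)    # pair each core with itself
        v = v @ m.reshape(v.shape[1], -1)
    return np.sqrt(v.item())

# Quick check against the explicitly assembled dense tensor.
rng = np.random.default_rng(1)
cores = [rng.standard_normal(s) for s in [(1, 3, 2), (2, 4, 2), (2, 5, 1)]]
full = np.einsum('aib,bjc,ckd->ijk', *cores)
print(np.isclose(tt_sum(cores), full.sum()),
      np.isclose(tt_frobenius_norm(cores), np.linalg.norm(full)))
```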

Useful reference:

  • Tensor Train paper: I. V. Oseledets, “Tensor-Train decomposition,” SIAM J. Scientific Computing, vol. 33, no. 5, pp. 2295–2317, 2011.
    • Application: A. Novikov, A. Rodomanov, A. Osokin, and D. Vetrov, “Putting MRFs on a Tensor Train,” in International Conference on Machine Learning (ICML), 2014, pp. 811–819.
  • Hierarchical Tucker paper: W. Hackbusch and S. Kuhn, “A new scheme for the tensor representation,” J. Fourier Anal. Appl., vol. 15, pp. 706–722, 2009.

Dataset:

  • MNIST: small, handwritten-digit recognition
    • Y. LeCun, C. Cortes, and C. J. C. Burges, “The MNIST database of handwritten digits,” 1998.
  • CIFAR-10: small, 50,000 training and 10,000 test 32×32 3-channel images, assigned to 10 different classes
    • A. Krizhevsky, “Learning multiple layers of features from tiny images,” Master’s thesis, Computer Science Department, University of Toronto, 2009.
  • ImageNet: large, the 1000-class ILSVRC-2012 dataset, with 1.2 million training images and 50,000 validation images.
    • A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25 (NIPS), 2012, pp. 1097–1105.