“Tensor Contraction Layers for Parsimonious Deep Nets” — Jean Kossaifi et al. 2017

Name: TCL

Paper: [Link]

Code: N/A


  • First work on applying tensor decomposition as a general layer or replace fully-connected layers.
  • Use TTM operations
  • Tensor modes meaning: height, width, channel in order.
  • In TCL layers, height size (128-512 in the paper) is much larger than width size (always 3 in paper).
  • Good references


  • TCLs reduce the dimensionality of the activation tensors (only for two spacial modes of images, leaving the modes corresponding to the channel and the batch size untouched) and thus the number of model parameters, at the same time, preserve high accuracy.
  • Optimize fully-connected layers using tensor factorizations, using two approaches:
    • TCL as an additional layer: reducing the dimensionality of the activation tensor before feeding it to the subsequent two (or more) fully-connected layers and softmax output of the network. This approach preserves or even increase the accuracy.
    • TCL as replacement of a fully-connected layer (partial or full replacement of fully-connected layers): this approach affect the accuracy a bit, but significantly reducing the number of parameters
    • Take the input to the fully-connected layers as an activation tensor X of size (D1, …, DN), we seek a low dimensional core tensor G of sealers size (R1, … RN).
  • Both number of parameters and time complexity of a TCL is smaller then a fully-connect layer. (Detailed comparison of the complexity and number of parameters is in the paper.)
  • To avoid vanishing or exploding gradients, and to make the TCL more robust to changes in the initialization of the factors, we added a batch normalization layer [8] before and after the TCL.

  • Future work
    • we plan to extend our work to more net- work architectures, especially in settings where raw data or learned representations exhibit natural multi-modal structure that we might capture via high-order tensors.

    • We also endeavor to advance our experimental study of TCLS for large-scale, high-resolutions vision datasets.

    • Plan to integrate new extended BLAS primitives which can avoid transpositions needed to compute the tensor contractions.
    • we will look into methods to induce and exploit sparsity in the TCL, to understand the parameter reductions this method can yield over existing state-of-the-art pruning methods.

    • we are working on an extension to the TCL: a tensor regression layer to replace both the fully-connected and final output layers, potentially yielding in- creased accuracy with even greater parameter reductions.

Other Knowledge:

  • Fully-connected layers hold over 80% of the parameters.

Useful reference:


  • AlexNet
  • VGG


  • CIFAR100
  • ImageNet

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s