Why use tensors:
 Natural representation for multi-way data
 Better compression when a tensor is constructed from a vector or matrix, plus efficient operations on compressed tensor formats (e.g. canonical, Tucker, TT formats)
Problems:
 No library supports fast tensor operations and tensor decompositions [source]
 Curse of dimensionality: a dense tensor with d modes of size n takes n^d entries (Need to be more clear)
 Space
 Running time
Tensor Decompositions:
 CP Decomposition:
 The CP decomposition of a tensor T is unique (up to scaling and permutation of the rank-one terms) if no pair of factor vectors is collinear.
 In contrast, matrix decompositions (e.g. SVD-style low-rank factorizations) are not unique: rotating the factors gives another valid factorization.
 Algorithms: CP-ALS, CP-APR
 Tucker Decomposition:
 Tensor power method:
 Tensor Train: [Paper]
 Hierarchical Tucker: [Paper]
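As a concrete illustration of CP-ALS, here is a minimal numpy sketch for a 3-way tensor (the function name, the pinv-based least-squares update, and all parameters are my illustrative choices, not taken from any particular library):

```python
import numpy as np

def cp_als(T, rank, n_iter=500, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor T via alternating
    least squares: fix two factor matrices, solve for the third."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Mode-n unfoldings (rows indexed by mode n, remaining modes flattened).
    T0 = T.reshape(I, J * K)
    T1 = np.moveaxis(T, 1, 0).reshape(J, I * K)
    T2 = np.moveaxis(T, 2, 0).reshape(K, I * J)
    # Column-wise Khatri-Rao product, matching the unfolding order above.
    kr = lambda X, Y: np.einsum('ir,jr->ijr', X, Y).reshape(-1, X.shape[1])
    for _ in range(n_iter):
        # Each update is a linear least-squares solve for one factor.
        A = T0 @ np.linalg.pinv(kr(B, C).T)
        B = T1 @ np.linalg.pinv(kr(A, C).T)
        C = T2 @ np.linalg.pinv(kr(A, B).T)
    return A, B, C
```

The tensor is reconstructed as `np.einsum('ir,jr,kr->ijk', A, B, C)`; production implementations solve the normal equations instead of calling `pinv` on the large Khatri-Rao matrix.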
Tensor Decomposition Applications:
 Healthcare:
 Phenotyping: use unsupervised tensor decomposition in place of traditionally supervised methods. (Check more on Jimeng’s class)
 Why use tensor decomposition instead? (Reasons in “Marble: High-throughput Phenotyping from Electronic Health Records via Sparse Nonnegative Tensor Factorization” — Joyce Ho et al. 2014)
 Deep Learning:
 An area where tensors are used is the class of recursive tensor networks. These are compositional models which require multilinear operations. Having efficient tensor contractions which do not require data movement can greatly improve the efficiency of training these models. [source]
 Use tensor decomposition to compress the kernel tensor of convolutional layers (“Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition” — Vadim Lebedev et al. 2015) or the weight matrices of fully-connected layers (“Tensorizing Neural Networks” — Alexander Novikov et al. 2015)
 Machine Learning:
 Design learning algorithms for estimating the parameters of latent variable models such as Hidden Markov Models, Gaussian mixtures, Latent Dirichlet Allocation, community models, probabilistic Context-Free Grammars, and two-layer neural networks. [source]

Tensor methods are very competitive for unsupervised learning of large-scale probabilistic latent variable models, as opposed to traditional methods such as expectation maximization (EM) or Markov chain Monte Carlo (MCMC). The main gain is computational: (i) tensor methods are embarrassingly parallel and scale to large problems, and (ii) they can build on efficient linear-algebra libraries while being much more powerful and informative than matrix methods. On the other hand, tensor methods are not sample-efficient: they require more samples than EM to reach the same accuracy (assuming computation is not an issue). Improving the statistical efficiency of spectral methods is an ongoing research topic. [source]
 Data compression
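The latent-variable estimation mentioned above typically reduces to decomposing a symmetric moment tensor. A minimal numpy sketch of the tensor power method with deflation (an illustrative toy version that assumes the input is exactly orthogonally decomposable; names and parameters are mine):

```python
import numpy as np

def tensor_power_method(T, n_components, n_iter=200, seed=0):
    """Extract the components of a symmetric, orthogonally decomposable
    3-way tensor T = sum_i lam_i * v_i (x) v_i (x) v_i by power
    iteration followed by deflation."""
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    lams, vecs = [], []
    T = T.copy()
    for _ in range(n_components):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        for _ in range(n_iter):
            # Power update: u <- T(I, u, u), then renormalize.
            u = np.einsum('abc,b,c->a', T, u, u)
            u /= np.linalg.norm(u)
        lam = np.einsum('abc,a,b,c->', T, u, u, u)
        lams.append(lam)
        vecs.append(u)
        # Deflation: subtract the recovered rank-one component.
        T = T - lam * np.einsum('a,b,c->abc', u, u, u)
    return np.array(lams), np.array(vecs)
```

On moment tensors estimated from data, robust variants rerun the inner iteration from several random starts and keep the best; this sketch keeps only the single-start core idea.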
Build tensors:
 Build tensors from properties of an algorithm, then apply tensor decomposition
 Build tensors from the nature of an application, then apply tensor approximation
 Build tensors from vectors or matrices, then apply tensor approximation for data compression
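The last bullet can be sketched directly: reshape a vector into a tensor and compress it with sequential truncated SVDs in the TT format (an illustrative toy implementation; function names and the truncation rule are my choices):

```python
import numpy as np

def tt_svd(x, shape, eps=1e-10):
    """Reshape vector x into a tensor of the given shape and compress it
    into tensor-train (TT) cores by sequential truncated SVDs."""
    T = np.asarray(x).reshape(shape)
    d = len(shape)
    cores, r = [], 1
    M = T.reshape(r * shape[0], -1)
    for k in range(d - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        # Keep singular values above a relative threshold.
        r_new = max(1, int(np.sum(s > eps * s[0])))
        cores.append(U[:, :r_new].reshape(r, shape[k], r_new))
        M = (s[:r_new, None] * Vt[:r_new]).reshape(r_new * shape[k + 1], -1)
        r = r_new
    cores.append(M.reshape(r, shape[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract the TT cores back into the full tensor."""
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([out.ndim - 1], [0]))
    return out.squeeze(axis=(0, -1))
```

When the reshaped tensor has low TT ranks, storing the cores takes far fewer numbers than the original vector, which is the compression argument above.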