Summary of Tensor Decomposition from a High Performance Computing View (Updating)

Survey:

Tensor Formats:

Sparse Tensor Parallelization:

Current Development:

  • Fundamental tensor operations (see the sketch after this list)
    • TTM
    • MTTKRP
    • Tensor contraction
  • Tensor Decompositions
    • CP decomposition
    • Tucker decomposition
    • Tensor power method (Anima’s)
    • Tensor Train decomposition
    • Hierarchical Tucker decomposition
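
A minimal dense NumPy sketch of two of these kernels, TTM (tensor-times-matrix) and MTTKRP (matricized tensor times Khatri-Rao product), for a 3-way tensor of shape (I, J, K); this is an illustration only, not how the HPC libraries listed under Software implement them (those target sparse and/or distributed tensors):

    import numpy as np

    def ttm_mode0(X, U):
        # Mode-1 TTM: Y[a, j, k] = sum_i U[a, i] * X[i, j, k]
        return np.einsum('ai,ijk->ajk', U, X)

    def mttkrp_mode0(X, B, C):
        # Mode-1 MTTKRP: M[i, r] = sum_{j, k} X[i, j, k] * B[j, r] * C[k, r],
        # i.e. M = X_(1) (C khatri-rao B); this is the core kernel of CP-ALS.
        return np.einsum('ijk,jr,kr->ir', X, B, C)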

Applications:

  • Healthcare
  • Deep Learning
  • Traditional Machine Learning
  • Social Networks

Software:

  • Matlab
    • Tensor Toolbox
    • N-way Toolbox
  • C++
    • CTF (Cyclops Tensor Framework)
    • SPLATT
    • ParTI

Review of Numerical Methods

Basics:

  • Vectors (see the first sketch after this list):
    • plus, minus, scalar multiplication
    • magnitude and direction
    • inner product and angle
      • Cauchy-Schwarz inequality
    • parallel and orthogonal
    • projection
    • cross product
      • orthogonal to both input vectors
      • right-hand rule
      • its magnitude equals the area of the parallelogram spanned by the two vectors, i.e., twice the area of the triangle they form
  • Linear equations (see the second sketch after this list)
    • Lines in two-dimensions
      • Parameterization: define a line: basepoint, direction vector
      • Direction vector for Ax+By=k is [B, -A]. A normal vector is [A, B].
      • Two lines are parallel if their normal vectors are parallel vectors.
      • If two lines are not parallel, then they have a unique intersection.
      • If two lines are parallel, they either do not intersect at all or are the same line with infinitely many intersections.
      • Two parallel lines are equal <=> the vector connecting a point on one line to a point on the other is orthogonal to the lines’ normal vectors.
      • For two non-parallel lines Ax+By=k1 and Cx+Dy=k2, at most one of A and C is zero, and AD-BC!=0.
        • The intersection is x=(Dk1-Bk2)/(AD-BC); y=(-Ck1+Ak2)/(AD-BC).
      • Using normal vectors generalizes better to higher dimensions.
    • Planes in three dimensions
      • Ax+By+Cz=k,
      • Normal vector: [A,B,C]
      • Two planes are equal <=> the vector connecting a point on one plane to a point on the other is orthogonal to the planes’ normal vectors.
      • Given Ax+By+Cz=k1, Dx+Ey+Fz=k2, possible solutions sets are:
        • a line with direction vector [A,B,C] x [D,E,F], if the planes are not parallel;
        • no solutions, if planes are parallel but not equal;
        • a plane, if the planes are the same.
      • With three or more planes, the intersection can reduce to a single point.
      • We need at least two lines in two variables to obtain a unique intersection, and at least three planes in three variables.
      • Rules for manipulating equations
        • Should preserve the solution, should be reversible
        • swap order of equations
        • multiply an equation by a nonzero number
        • Add a multiple of an equation to another
      • A system is inconsistent <=> we find 0=k for k nonzero during Gaussian elimination.
      • Counting the number of equations or looking for 0=0 is (usually) not enough to determine whether there are infinitely many solutions.
      • A consistent system has a unique solution <=> each variable is a pivot variable.
      • #Free variables = dimension of solution set
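
A small NumPy sketch of the vector operations above; u and v are arbitrary example values:

    import numpy as np

    u = np.array([3.0, 4.0, 0.0])
    v = np.array([1.0, 0.0, 2.0])

    magnitude = np.linalg.norm(u)                     # |u| = 5
    cos_angle = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))  # Cauchy-Schwarz keeps cos_angle in [-1, 1]
    proj = (u @ v) / (v @ v) * v                      # projection of u onto v
    w = np.cross(u, v)                                # orthogonal to both u and v (right-hand rule)
    area = np.linalg.norm(w)                          # area of the parallelogram spanned by u and v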
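
And a second sketch of the two-line intersection formula together with a rank-based classification of a linear system (coefficients are arbitrary example values; np.linalg stands in for hand-done Gaussian elimination):

    import numpy as np

    # Intersection of Ax + By = k1 and Cx + Dy = k2, valid when AD - BC != 0
    A, B, k1 = 1.0, 2.0, 3.0
    C, D, k2 = 4.0, -1.0, 2.0
    det = A * D - B * C
    x = (D * k1 - B * k2) / det
    y = (-C * k1 + A * k2) / det

    # Classify a general system M z = b by comparing ranks
    M = np.array([[A, B], [C, D]])
    b = np.array([k1, k2])
    r = np.linalg.matrix_rank(M)
    r_aug = np.linalg.matrix_rank(np.column_stack([M, b]))
    if r_aug > r:
        status = 'inconsistent: some row reduces to 0 = k with k nonzero'
    elif r == M.shape[1]:
        status = 'unique solution: every variable is a pivot variable'
    else:
        status = f'{M.shape[1] - r} free variable(s) = dimension of the solution set'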

Tensor Summary

Why use tensors:

  • natural data representation
  • better compression when constructing a tensor from a vector or matrix, plus efficient operations on the compressed tensor formats (e.g., canonical, Tucker, TT formats)
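
As a rough storage count: a dense 1000 × 1000 × 1000 tensor holds 10^9 entries, while a rank-10 CP representation stores only three 1000 × 10 factor matrices, i.e. 3 · 1000 · 10 = 30,000 entries.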

Problems:

  • Lack of libraries that support fast tensor operations and tensor decompositions [source]
  • Curse of dimensionality (needs to be made more precise)
    • Space
    • Running time

Tensor Decompositions:

  • CP Decomposition:
    • The decomposition of tensor T is unique (up to scaling and permutation) if none of the vector pairs are collinear.
    • Matrix decomposition (e.g. SVD) is not unique.
    • Algorithms: CP-ALS, CP-APR (a minimal CP-ALS sketch follows this list)
  • Tucker Decomposition:
  • Tensor power method:
  • Tensor Train: [Paper]
  • Hierarchical Tucker: [Paper]
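
A minimal dense CP-ALS sketch in NumPy, for illustration only; it assumes a 3-way array and omits normalization and convergence checks (production implementations such as the MATLAB Tensor Toolbox or SPLATT add these and use sparse, parallel MTTKRP kernels):

    import numpy as np

    def cp_als(X, rank, n_iter=50, seed=0):
        # Fit X[i, j, k] ≈ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares.
        rng = np.random.default_rng(seed)
        I, J, K = X.shape
        A = rng.standard_normal((I, rank))
        B = rng.standard_normal((J, rank))
        C = rng.standard_normal((K, rank))
        for _ in range(n_iter):
            # Each factor update is an MTTKRP followed by a small R x R pseudo-inverse.
            A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((C.T @ C) * (B.T @ B))
            B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((C.T @ C) * (A.T @ A))
            C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((B.T @ B) * (A.T @ A))
        return A, B, C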

Tensor Decomposition Applications:

  • Healthcare:
  • Deep Learning:
  • Machine Learning:
    • Design learning algorithms for estimating parameters of latent variable models such as Hidden Markov Models, mixtures of Gaussians, Latent Dirichlet Allocation, community models, probabilistic context-free grammars, and two-layer neural networks. [source]
    • Tensor methods are very competitive for unsupervised learning of large-scale probabilistic latent variable models, as opposed to traditional methods such as expectation maximization (EM) or Markov chain Monte Carlo (MCMC). The main gain is in terms of computation: (i) tensor methods are embarrassingly parallel and scalable to large problems, (ii) they can build on efficient linear algebraic libraries, but are much more powerful and informative compared to matrix methods. On the other hand, tensor methods are not sample efficient, meaning they require more samples than EM to reach the same level of accuracy (assuming computation is not an issue). Improving statistical efficiency of spectral methods is an ongoing research topic. [source]

  • Data compression

Build tensors:

  • Build tensors from algorithmic properties, then do tensor decomposition
  • Build tensors from the nature of the application, then do tensor approximation
  • Build tensors from vectors or matrices, then do tensor approximation for data compression

Tensor Researchers

Machine Learning:

  • Animashree Anandkumar, UC Irvine
  • Jimeng Sun, GaTech
  • Joyce Ho, Emory
  • Nicholas D. Sidiropoulos, UMN
  • Christos Faloutsos, CMU
  • Lieven De Lathauwer, KU Leuven
  • Evangelos E. Papalexakis, UC Riverside

Numerical:

  • Tamara Kolda, Sandia NL

HPC:

  • George Karypis, UMN
  • Bora Uçar, Inria and LIP