Why use tensors:
- natural representation for multi-way data
- better compression when a vector or matrix is reshaped into a tensor, and efficient operations on the compressed tensor formats (e.g., canonical/CP, Tucker, and tensor-train (TT) formats)
- challenge: lack of libraries supporting fast tensor operations and tensor decompositions [source]
- curse of dimensionality: the number of entries grows exponentially with the tensor order, so compressed formats are needed to keep storage and computation tractable
- CP Decomposition:
- The CP decomposition of a tensor T is unique (up to scaling and permutation) under mild conditions, e.g., when no pair of component vectors is collinear.
- Matrix decompositions (e.g., via SVD) are not unique: any invertible transformation of the factors yields another valid factorization.
- Algorithms: CP-ALS (alternating least squares) and CP-APR (alternating Poisson regression); see the CP-ALS sketch below.
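A minimal CP-ALS sketch for a 3rd-order tensor, NumPy only; `khatri_rao`, `unfold`, and `cp_als` are illustrative names rather than any library's API:

```python
# Minimal CP-ALS sketch for a 3rd-order tensor (NumPy only).
import numpy as np

def khatri_rao(A, B):
    # Column-wise Khatri-Rao product: (I*J) x R
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def unfold(T, mode):
    # Mode-n unfolding: mode-n fibers become rows
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def cp_als(T, rank, n_iter=100, seed=0):
    # Assumes T is 3-way; factors initialized randomly
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((s, rank)) for s in T.shape)
    for _ in range(n_iter):
        # Update each factor in turn, holding the other two fixed
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C
```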
- Tucker Decomposition: a small core tensor multiplied by a factor matrix along each mode (algorithms: HOSVD, HOOI); see the HOSVD sketch after this list.
- tensor power method: a generalization of matrix power iteration that recovers one component at a time from an orthogonally decomposable tensor (sketch further below, after the latent variable model notes)
- Tensor Train: a chain of 3-way cores [Paper]
- Hierarchical Tucker: [Paper]
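For reference, a hedged sketch of truncated HOSVD, one standard way to compute a Tucker decomposition (NumPy only; function names are illustrative):

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: mode-n fibers become rows
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    # Truncated HOSVD: keep the leading left singular vectors
    # of each mode-n unfolding as that mode's factor matrix.
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    # Core tensor: contract each factor's transpose along its mode
    core = T
    for mode, U in enumerate(factors):
        core = np.moveaxis(np.tensordot(U.T, core, axes=(1, mode)), 0, mode)
    return core, factors
```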
Tensor Decomposition Applications:
- Deep Learning: e.g., compressing convolutional and fully-connected layers with CP/Tucker/TT decompositions
- Machine Learning:
- design learning algorithms for estimating the parameters of latent variable models such as hidden Markov models, mixtures of Gaussians, latent Dirichlet allocation, community models, probabilistic context-free grammars, and two-layer neural networks. [source]
Tensor methods are very competitive for unsupervised learning of large-scale probabilistic latent variable models, as opposed to traditional methods such as expectation maximization (EM) or Markov chain Monte Carlo (MCMC). The main gain is in terms of computation: (i) tensor methods are embarrassingly parallel and scalable to large problems, (ii) they can build on efficient linear algebraic libraries, but are much more powerful and informative compared to matrix methods. On the other hand, tensor methods are not sample efficient, meaning they require more samples than EM to reach the same level of accuracy (assuming computation is not an issue). Improving statistical efficiency of spectral methods is an ongoing research topic. [source]
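A minimal sketch of the tensor power iteration underlying many of these spectral methods, assuming a symmetric, orthogonally decomposable 3rd-order moment tensor; deflation to recover the remaining components is omitted:

```python
# Tensor power iteration sketch: recover one component of a symmetric,
# orthogonally decomposable 3rd-order tensor T = sum_r w_r u_r (x) u_r (x) u_r.
import numpy as np

def tensor_apply(T, v):
    # Contract T along modes 2 and 3: returns the vector T(I, v, v)
    return np.einsum('ijk,j,k->i', T, v, v)

def power_iteration(T, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = tensor_apply(T, v)
        v /= np.linalg.norm(v)
    eigval = tensor_apply(T, v) @ v  # lambda = T(v, v, v)
    return eigval, v
```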
- Data compression
- build tensors from properties of an algorithm, then apply tensor decomposition
- build tensors from the structure of the application, then apply tensor approximation
- reshape vectors or matrices into tensors, then apply tensor approximation for compression (see the TT-SVD sketch below)
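A small sketch of the last idea using TT-SVD: reshape a long vector into a higher-order tensor and compress it with sequential truncated SVDs (`tt_svd` and its signature are illustrative, not a library API):

```python
import numpy as np

def tt_svd(x, shape, max_rank):
    # Reshape a long vector into a tensor and compress with TT-SVD:
    # peel off one mode at a time with a truncated SVD.
    T = x.reshape(shape)
    cores, r = [], 1
    for k in range(len(shape) - 1):
        M = T.reshape(r * shape[k], -1)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        rk = min(max_rank, len(s))
        cores.append(U[:, :rk].reshape(r, shape[k], rk))
        T = s[:rk, None] * Vt[:rk]   # carry the remainder forward
        r = rk
    cores.append(T.reshape(r, shape[-1], 1))
    return cores

# Example: compress a length-1024 vector as a 4x4x4x4x4 tensor in TT format
cores = tt_svd(np.arange(1024.0), (4, 4, 4, 4, 4), max_rank=8)
```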
Introduction to Tensor Methods: Link
- sparse tensor
- sparse factor matrices
- Poisson regression, based on CP-APR, for count data (see the multiplicative-update sketch below)
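A hedged sketch of the count-data idea: multiplicative updates minimizing KL (Poisson) divergence for a nonnegative CP model. This is the flavor of update behind CP-APR, not Chi & Kolda's exact algorithm:

```python
# Multiplicative KL-divergence updates for a nonnegative 3-way CP model.
import numpy as np

def poisson_mult_updates(X, rank, n_iter=200, seed=0, eps=1e-10):
    rng = np.random.default_rng(seed)
    factors = [rng.random((s, rank)) + eps for s in X.shape]
    for _ in range(n_iter):
        for n in range(3):
            A = factors[n]
            others = [factors[m] for m in range(3) if m != n]
            # Khatri-Rao product of the other two factors
            P = np.einsum('ir,jr->ijr', *others).reshape(-1, rank)
            Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
            M = A @ P.T + eps                 # current model estimate
            # Multiplicative update for the KL (Poisson) objective
            factors[n] = A * ((Xn / M) @ P) / (P.sum(axis=0) + eps)
    return factors
```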
Phenotyping application background:
- A major limitation of existing phenotyping efforts is the need for human annotation of case and control samples, which requires substantial time, effort, and expert knowledge.
- phenotyping can be viewed as a form of dimensionality reduction, where each phenotype forms a latent space
- citation: G. Hripcsak and D. J. Albers. Next-generation phenotyping of electronic health records. JAMIA, 20(1):117–121, Dec. 2012.
Tensor factorization vs matrix factorization:
- Matrix factorization, a common dimensionality reduction approach, is insufficient because it cannot concisely capture structured interactions between EHR sources, such as multiple medications prescribed to treat a single disease
- Constraints on the factor matrices to minimize the number of non-zero elements
- augmentation of the tensor approximation (e.g., with an additional bias term, as in Marble below)
- Marble decomposes an observed tensor into two terms: a bias (or offset) tensor and an interaction (or signal) tensor (the latter similar to the CP-APR factorized tensor).
- The bias tensor captures baseline characteristics common to the overall population and also provides computational stability. The interaction term is composed of concise, intuitive, and interpretable phenotypes in the data (see the toy sketch after these bullets).
- Marble achieves at least a 42.8% reduction in the number of non-zero elements compared to CP-APR without sacrificing the quality of the tensor decomposition.
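A toy sketch of the Marble-style split described above, composing a model tensor as a rank-1 bias term plus a rank-R CP interaction term; this illustrates the structure only, not Marble's fitting procedure:

```python
# Toy composition of a Marble-style model tensor (3rd order):
# rank-1 bias term + rank-R CP interaction (signal) term.
import numpy as np

def compose_model(bias_vecs, factors, weights):
    # bias_vecs: one nonnegative vector per mode (population baseline)
    # factors: one factor matrix per mode, each of shape (dim, R)
    # weights: length-R weights for the interaction components
    bias = np.einsum('i,j,k->ijk', *bias_vecs)
    signal = np.einsum('r,ir,jr,kr->ijk', weights, *factors)
    return bias + signal
```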
- concept discovery: U. Kang, E. Papalexakis, A. Harpale, and C. Faloutsos. Gigatensor: Scaling tensor analysis up by 100 times-algorithms and discoveries. In KDD 2012, pages 316–324, 2012.
- network analysis of fMRI data: I. Davidson, S. Gilpin, O. Carmichael, and P. Walker. Network discovery via constrained tensor analysis of fMRI data. In KDD 2013, Aug. 2013.
- community discovery: Y.-R. Lin, J. Sun, H. Sundaram, A. Kelliher, P. Castro, and R. Konuru. Community discovery via metagraph factorization. ACM Transactions on Knowledge Discovery from Data, 5(3), Aug. 2011.
Sparsity constrained factor matrices for sparse tensor decomposition:
- Traditional sparsity-inducing penalties such as ℓ1 regularization (often combined with ℓ2) are formulated for the standard least-squares objective and do not carry over directly to other losses, e.g., KL divergence for count data (see the sketch below).
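A minimal sketch of that standard ℓ1 machinery: one proximal-gradient (ISTA) step that soft-thresholds a factor matrix under a least-squares loss (names are illustrative):

```python
# One l1-regularized least-squares step on a factor matrix A,
# for the objective ||X - A @ P.T||_F^2 / 2 + lam * ||A||_1.
import numpy as np

def ista_step(A, X, P, lam, step):
    grad = (A @ P.T - X) @ P      # gradient of the smooth least-squares term
    Z = A - step * grad           # gradient descent step
    # Soft-thresholding (proximal operator of the l1 penalty)
    return np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)
```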
- D. Wang and S. Kong. Feature selection from high-order tensorial data via sparse decomposition. Pattern Recognition Letters, 33(13):1695–1702, 2012.
- Non-parametric Bayesian approaches to sparse Tucker decomposition
- Z. Xu, F. Yan, and Y. Qi. Infinite Tucker decomposition: Nonparametric Bayesian models for multiway data analysis. In ICML 2012, pages 1023–1030, 2012.
- A multi-layer NTF has been proposed to achieve sparse representations under various cost functions, including KL divergence, using a nonlinearly transformed gradient descent approach
- sHOPCA (sparse higher-order PCA) has been proposed, based on the HOOI algorithm
- A. Cichocki, R. Zdunek, S. Choi, R. Plemmons, and S.-I. Amari. Novel multi-layer non-negative tensor factorization with sparsity constraints. In ICANNGA 2007, pages 271–280. Springer, 2007.
- CP-APR: E. C. Chi and T. G. Kolda. On tensors, sparsity, and nonnegative factorizations. SIAM Journal on Matrix Analysis and Applications, 33(4):1272–1299, 2012.
- Survey: A. Cichocki, R. Zdunek, A. H. Phan, and S.-I. Amari. Nonnegative matrix and tensor factorizations: Applications to exploratory multi-way data analysis and blind source separation. Wiley, 2009.
- CMS data: the Centers for Medicare and Medicaid Services (CMS) provides the CMS Linkable 2008–2010 Medicare Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF), a publicly available dataset.
- 10,000 patients, 129 diagnoses, and 115 procedures (i.e., a 10,000 × 129 × 115 patient-by-diagnosis-by-procedure count tensor)