Paper: [Link]

Code: Will be in SPLATT

**Features:**

- sparse tensors
- TTMc in CSF format on multi-core CPU architecture (no Tucker decomposition result shown)
- Get 20.7x speedup while mostly using less memory (up to 28.5x)
- Fig. 7: good analysis for time and memory!

**Findings:**

- Similar with SPLATT, decrease the time complexity along with tree levels.
- The intermediate memory is limited to a single row of Y(n).
- Parallelized by distributing the I1 (first mode) trees to threads.
- Present synchronization using mutexes for write conflicts when updating any modes other than the first one.
- Using multiple CSF representations
- Heuristically sorting mode order, assume K CSF representations can be stored,
- For X1, sort the modes by their lengths, with the shortest mode placed at the top level.
- The remaining K−1 representations are selected in a greedy fashion: at step k, use (4) to examine the costs associated with TTMc for each mode when provided with X 1, . . . , X k−1. The mode with the highest cost is placed at the top level of Xk, and the remaining modes are sorted by increasing length.
- At the end of this procedure, each mode has the choice of K representations to use for TTMc computation. We assign each mode to the representation with the lowest cost, and use that representation for TTMc.
- Importantly, if ties are broken in a consistent manner, then it happens in practice that several modes can be assigned to the same X k , meaning that fewer than K representations need be kept in memory for computation.

**Useful References:**

- Existing strategies for performing TTMc either rely on memoizing intermediate results to save computation [Efficient and scalable computations with sparse tensors], High-performance parallel algorithms for the tucker decomposition of higher order sparse tensors or operating in a memory-efficient manner at the expense of additional floating-point operations (FLOPs) .
- HOSVD is popular for decomposing dense tensors and efficient parallel algorithms have been developed [Efficient and scalable computations with sparse tensors], [On optimizing distributed tucker decomposition for dense tensors.]. However, the computation becomes progressively more dense during HOSVD and it is not of- ten applied to sparse computations. Thus, HOOI is the most popular algorithm for sparse tensors and is the focus of this work.
- Dimension tree are flexible data structures which partition the modes of a tensor in a hierarchical fashion. Proposed by [Hierarchical singular value decomposition of tensors]

**Dataset:**

- Nell-2, Enron [FROSTT]
- Netflix (not public), Alzheimer (From Muthu Baskaran)
- Synthetic: Possion3D, Possion4D

**Platform & Software:**

- Intel Haswell E5-2680v3
- OpenMP
- Intel compiler with Intel MKL

**Comparison Software:**

- HyperTensor: HT-FLAT, HT-BTREE