Paper: [Link]

Code: [Link]

**Features:**

- For sparse and semi-sparse tensors and matrices
- semi-sparse: each mode (dimension) is independently designated as either sparse or dense.

- Supports CSR, DCSR, and BCSR (for matrices) and CSF (for tensors)
- The storage order of the dimensions must be specified
- Can support MTTKRP with sparse matrices; this can be used later.
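
The formats above fall out of per-dimension designations: CSR is (dense, sparse) over (rows, cols), DCSR is (sparse, sparse), and CSC is CSR with the dimension order reversed. A minimal sketch (not taco's actual API) of building the CSR-style `pos`/`crd`/`vals` arrays used in the paper:

```python
# Sketch, not taco's API: CSR as a (dense, sparse) level hierarchy.
# pos[i]..pos[i+1] delimits row i's entries in crd (column ids) and vals.

def build_csr(dense_matrix):
    """Build CSR arrays (pos/crd/vals in the paper's terminology)."""
    pos, crd, vals = [0], [], []
    for row in dense_matrix:
        for j, v in enumerate(row):
            if v != 0:
                crd.append(j)
                vals.append(v)
        pos.append(len(crd))  # dense row dimension: one segment per row
    return pos, crd, vals

A = [[1, 0, 2],
     [0, 0, 0],
     [0, 3, 0]]
pos, crd, vals = build_csr(A)
# pos = [0, 2, 2, 3], crd = [0, 2, 1], vals = [1, 2, 3]
```

DCSR would additionally compress the row dimension, storing only the rows that contain nonzeros (row 1 above would disappear from the index).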

- No auto-tuning for now; the format decision is left to the user.
- Evaluate SpMV, compound linear algebra, high-order tensor algebra (e.g. MTTKRP, TTM, TTV, INNERPROD, ELEMENTWISE-PLUS)
- Doesn’t support parallel execution if the output is sparse.
- The “assemble” method assembles the sparse index structure of the output tensor and preallocates memory for it.
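
The assemble/compute split can be sketched for sparse vector addition (hypothetical helper names, not taco's API): a first pass builds the output's index structure and preallocates its values, and a second pass fills in the values without reallocating.

```python
# Sketch of the assemble/compute split for a sparse output (a = b + c on
# sparse vectors). Helper names are illustrative, not taco's API.

def assemble_sum(b_idx, c_idx):
    """Assemble: union the sorted coordinate lists and preallocate values."""
    out_idx = sorted(set(b_idx) | set(c_idx))
    out_vals = [0.0] * len(out_idx)   # preallocated value array
    return out_idx, out_vals

def compute_sum(b_idx, b_vals, c_idx, c_vals, out_idx, out_vals):
    """Compute: fill the preallocated values; no index growth happens here."""
    b = dict(zip(b_idx, b_vals))
    c = dict(zip(c_idx, c_vals))
    for k, i in enumerate(out_idx):
        out_vals[k] = b.get(i, 0.0) + c.get(i, 0.0)

out_idx, out_vals = assemble_sum([0, 3], [3, 5])
compute_sum([0, 3], [1.0, 2.0], [3, 5], [4.0, 8.0], out_idx, out_vals)
# out_idx = [0, 3, 5], out_vals = [1.0, 6.0, 8.0]
```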

**Findings:**

The paper's four contributions (quoted):

- “**tensor storage formats** that separately designate each dimension as dense or sparse and specify their storage order, which can describe several widely used formats but generalizes to many more (Section 3)”
- “**iteration graphs** that describe how to co-iterate through the multi-level index data structures of the sparse operands in a compound tensor expression (Section 4)”
    - Each node corresponds to a loop in a loop nest.
    - A tensor path is a tuple of index variables.
    - Cycles are not supported, because back edges require traversal in the wrong direction of a format. The current workaround is a function that changes tensor formats, which can be used to break cycles and transpose tensors.
- “**merge lattices** that describe how to merge the index data structures of sparse tensors that are used in the same tensor expression (Section 5)”
- “**a code generation algorithm** that uses the above concepts to generate efficient code that computes a tensor expression with dense, sparse, and mixed operands (Section 6)”
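
The co-iteration that iteration graphs and merge lattices organize boils down to a two-way merge over sorted sparse coordinate arrays. A sketch of the merged loop for `a(i) = b(i) + c(i)` (illustrative Python, not taco's generated C):

```python
# Two-way co-iteration over sorted sparse coordinate arrays, as in the
# merged loops the paper generates for a(i) = b(i) + c(i). The three
# branches correspond to the points of the merge lattice: both operands
# nonzero at i, only b nonzero, only c nonzero.

def sparse_add(b_idx, b_vals, c_idx, c_vals):
    out_idx, out_vals = [], []
    pb = pc = 0
    while pb < len(b_idx) and pc < len(c_idx):
        i = min(b_idx[pb], c_idx[pc])
        if b_idx[pb] == i and c_idx[pc] == i:   # both have a nonzero at i
            out_idx.append(i); out_vals.append(b_vals[pb] + c_vals[pc])
            pb += 1; pc += 1
        elif b_idx[pb] == i:                    # only b has a nonzero at i
            out_idx.append(i); out_vals.append(b_vals[pb]); pb += 1
        else:                                   # only c has a nonzero at i
            out_idx.append(i); out_vals.append(c_vals[pc]); pc += 1
    # Once one operand is exhausted, the lattice collapses to a single point:
    out_idx += b_idx[pb:]; out_vals += b_vals[pb:]
    out_idx += c_idx[pc:]; out_vals += c_vals[pc:]
    return out_idx, out_vals

idx, vals = sparse_add([0, 2, 5], [1.0, 2.0, 3.0], [2, 4], [10.0, 20.0])
# idx = [0, 2, 4, 5], vals = [1.0, 12.0, 20.0, 3.0]
```

For multiplication instead of addition, the lattice has a single point (both operands must be nonzero), so only the first branch survives.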

**Motivation:**

- “First, sparse input tensors are compressed into indexed data structures that the kernels must operate on. Second, only non-zero output values should be produced, and the calculations differ between computed components depending on the input tensors that contribute non-zero values. Finally, sparse tensor data structures typically do not support O(1) random access, so kernels must carefully orchestrate co-iteration over multiple tensors.”
- “Sparse data structures can only be accessed efficiently in one direction.” (Then why does SPLATT's CSF code achieve similar performance for each mode?)
- “The number of formats increases exponentially as dimensionality increases by d!2^d.”
- “A sparse storage dimension does not allow efficient random access to indices and values but is optimized for iteration in a specific order.”
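
The d!2^d count follows from choosing a dense/sparse designation for each of the d dimensions (2^d choices) times an ordering of the dimensions (d! choices). A quick check:

```python
import math

def num_formats(d):
    """Number of formats for an order-d tensor under the paper's scheme:
    d! dimension orderings times 2^d dense/sparse designations."""
    return math.factorial(d) * 2 ** d

# num_formats(2) = 8  (dense row/column-major, CSR, CSC, DCSR, DCSC, ...)
# num_formats(3) = 48
```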

- “Libraries provide a limited set of hand-optimized operations and programmers compute compound operations through a sequence of supported operations using temporary tensors. This reduces locality and efficiency, and for some compound operations the temporaries are much larger than the kernel’s inputs and output. On the other hand, it is infeasible to write optimized code for every compound operation by hand because of the combinatorial explosion of all possible compound operations, tensor orders, and tensor formats.”
- “Our technique generates code entirely from tensor index notation and simple format descriptors that fully specify the compressed data structures. Thus, we do not perform pointer alias or dependency analysis at compile time, nor at runtime as in inspector-executor systems.”

**Useful References:**

**Dataset:**

- SuiteSparse Matrix Collection
- FROSTT

**Platform & Software:**

- C++ library

**Comparison Software:**

- A set of sparse matrix libraries (e.g. OSKI, pOSKI, MKL, Eigen, uBLAS, Gmm++)
- Tensor libraries: SPLATT, Tensor Toolbox

**Questions:**

- Why “384 possible kernels for all the combinations of formats and implementations” (p. 3)?
- Why is taco's serial CSR SpMV code faster than other libraries', given that taco doesn't apply any particular optimization?
- Why is taco's serial dense-blocked SpMV code in Fig. 14 slower than OSKI, while its parallel code is faster?
