S-HOT@WSDM'17

S-HOT: Scalable High-Order Tucker Decomposition

Jinoh Oh¹, Kijung Shin², Evangelos E. Papalexakis³, Christos Faloutsos², Hwanjo Yu^*1

¹Pohang University of Science and Technology (POSTECH), Pohang, South Korea
²School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
³Dept. of Computer Science and Engineering, University of California River Side, CA, USA
{kurin, hwanjoyu}@postech.ac.kr¹, {kijungs, christos}@cs.cmu.edu², epapalex@cs.ucr.edu³

Abstract

Multi-aspect data appear frequently in many web-related applications. For example, product reviews are quadruplets of (user, product, keyword, timestamp). How can we analyze such web-scale multi-aspect data? Can we analyze them on an off-the-shelf workstation with limited amount of memory? Tucker decomposition has been widely used for discovering patterns in relationships among entities in multi-aspect data, which are naturally expressed as high-order tensors. However, existing algorithms for Tucker decomposition have limited scalability, and especially, fail to decompose high-order tensors since they explicitly materialize intermediate data, whose size rapidly grows as the order increases (≥ 4). We call this problem M-Bottleneck (“Materialization Bottle-neck”). To avoid M-Bottleneck, we propose S-HOT, a scalable high-order tucker decomposition method that employs the on-the-fly-computation to minimize the materialized intermediate data. Moreover, S-HOT is designed for handling disk-resident tensors, too large to fit in memory, without loading them all at once. We provide theoretical analysis on the amount of memory space and the number of scans of data required by S-HOT. In our experiments, S-HOT showed better scalability not only with the order but also with the dimensionality and the rank than baseline methods. In particular, S-HOT decomposed tensors 1000× larger in terms dimensionality than baseline methods. S-HOT also successfully analyzed a real-world tensors that are both large-scale and high-order on an off-the-shelf workstation with limited amount of memory, while baseline methods failed. For reproducibility, we make source code of S-HOT publicly available [link].

Overview

In this paper, we pointed out that baselines are not able to decompose a tensor with high order or high dimensionality, due to intermediate explosion problem. To resolve this problem, we proposed S-HOT and S-HOT_scan. The key idea of S-HOT for reducing the size of intermediate data is to fully utilize on-the-fly computation. As presented in Figure 1, S-HOT and S-HOT_scan perform Tucker decomposition only with the small fraction of reduced tensor, while baselines require to materialize (or store) all the entries of the reduced tensor, which is usually dense, in main memory.

Experiments

Competitors

Model	Description
BaselinNaive	A naive method computing the intermediate tensor in a straight-forward way. We used the implementation built in Tensor toolbox 2.6
BaselineOpt [Kolda et al.]	The state-of-the-art Memory Efficient Tucker decomposition which computes the intermediate tensor fiber by fiber. We used the implementation built in Tensor toolbox 2.6
S-HOT	Our proposed model. [Download]
S-HOT_scan	A variant of S-HOT. S-HOT_scan is designed to optimize I/O cost for scanning disk-resident tensor. [Download]

Results

For more detailed explanation, please refer to our paper

S-HOT shows better scalability for various factors: dimensionality, order of tensor, and rank

Figure 2. S-HOT decomposes 103× higher dimension tensor

Figure 3. S-HOT shows better order-scalability.

Figure 4. (c) S-HOT shows better rank-scalability.

These plots verify the excellence of S-HOT in terms of various factors: dimensionality, order of tensor, and rank. Specifically, in Figure 2, both of S-HOT and S-HOT_scan decompose a tensor with dimensionality higher than 10⁴ while baselines fail to decompose thosehigh tensor. Similarly, in Figure 3, both of proposed methods decpompose a tensor with high order (≥ 5) while baselines fail.

Download

Citing S-HOT

We encourage you to cite our paper if you have used our implementation in your work. You can use the following BibTex citation:

@inproceedings{Oh:2017:SHOT,
  author = {Jinoh Oh and Kijung Shin and Evangelos E. Papalexakis and Christos Faloutsos and Hwanjo Yu},
  title = {S-HOT: Scalable High-Order Tucker Decomposition},
  booktitle = {Proceedings of the 10th ACM International Conference on Web Search and Data Mining}, 
  series = {WSDM '17},
  year = {2017},
  location = {Cambridge, UK},
  doi = {10.1145/3018661.3018721},
  publisher = {ACM},
  address = {New York, NY, USA}
}

Our code is accesible at here [Github].