Position: Ph.D. Candidate

Current Institution: MIT

Abstract:
Blind Regression: Nonparametric Regression for Latent Variable Models via Collaborative Filtering

We introduce the framework of blind regression, motivated by matrix completion for recommendation systems: given n users, m movies, and a subset of user-movie ratings, the goal is to predict the unobserved user-movie ratings, i.e., to complete the partially observed matrix. Matrix completion has been well analyzed under the low-rank assumption, under which matrix factorization based approaches have been proven to be statistically efficient, requiring only $rn \log n$ samples, where $r$ is the rank of the matrix. Unfortunately, the low-rank assumption may not hold in practice, as a simple nonlinear transformation of a low-rank matrix can easily produce an effectively high-rank matrix, despite few free model parameters.
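To make the high-rank point concrete (a numerical illustration of our own, not from the abstract): applying an entrywise nonlinearity such as exp to a rank-2 matrix, which has only 4n free parameters, typically yields a numerically full-rank matrix. A minimal Python check, where the size n = 100 and the choice of exp are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    # Rank-2 matrix built from two latent factors per row and per column.
    U = rng.standard_normal((n, 2))
    V = rng.standard_normal((n, 2))
    M = U @ V.T

    print(np.linalg.matrix_rank(M))          # 2
    print(np.linalg.matrix_rank(np.exp(M)))  # close to n: entrywise exp inflates the rank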

Following the framework of nonparametric statistics, we posit that user u and movie i have features x_1(u) and x_2(i) respectively, and their corresponding rating y(u,i) is a noisy measurement of f(x_1(u), x_2(i)) for some unknown function f. Whereas the matrix factorization literature assumes a particular function f(a,b) = a^T b, we relax this condition to allow all Lipschitz functions. In contrast with classical regression, the features x = (x_1(u), x_2(i)) are not observed, making it challenging to apply standard regression methods to predict the unobserved ratings.
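Written out, the model above is (using $\eta_{u,i}$ for the noise term, a notation we introduce here; its variance $\gamma^2$ appears in the error bound below):

    y(u,i) = f(x_1(u), x_2(i)) + \eta_{u,i}, \qquad \mathbb{E}[\eta_{u,i}] = 0, \quad \mathrm{Var}(\eta_{u,i}) = \gamma^2,

with $f$ an unknown Lipschitz function and the features $x_1(u)$, $x_2(i)$ unobserved.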

Inspired by the classical Taylor expansion for differentiable functions, we provide a prediction algorithm that is consistent for all Lipschitz functions. In fact, the analysis through our framework naturally leads to a variant of collaborative filtering, shedding light on the widespread success of collaborative filtering in practice. Assuming each entry is sampled independently with probability at least $\max(m^{-1/2+\delta}, n^{-1+\delta})$ for some $\delta > 0$, we prove that the expected fraction of our estimates with error greater than $\epsilon$ is less than $\gamma^2 / \epsilon^2$ plus a polynomially decaying term, where $\gamma^2$ is the variance of the additive entry-wise noise term. Experiments with the MovieLens and Netflix datasets suggest that our algorithm provides principled improvements over basic collaborative filtering and is competitive with matrix factorization methods.
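To sketch how such a collaborative filtering variant can look in code, here is a minimal user-user version in Python (a simplification of our own for illustration: the neighbor threshold beta and the minimum overlap are hypothetical parameters, and the full algorithm also uses item-item comparisons and a more careful weighting):

    import numpy as np

    def blind_regression_predict(R, u, i, beta=0.5, min_overlap=2):
        # Estimate R[u, i] from an n x m ratings array with np.nan for
        # missing entries. For each row v that has rated item i, compare
        # rows u and v on their commonly rated items: the empirical mean
        # difference acts as a first-order (Taylor-like) correction, and
        # the empirical variance of the differences proxies the distance
        # between latent features. Rows with variance below beta are averaged.
        obs_u = ~np.isnan(R[u])
        estimates = []
        for v in range(R.shape[0]):
            if v == u or np.isnan(R[v, i]):
                continue
            overlap = obs_u & ~np.isnan(R[v])
            if overlap.sum() < min_overlap:
                continue
            diffs = R[u, overlap] - R[v, overlap]
            if diffs.var() <= beta:                       # keep only nearby rows
                estimates.append(R[v, i] + diffs.mean())  # neighbor rating plus offset
        return np.mean(estimates) if estimates else np.nan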

We show that the algorithm and analysis naturally extend to higher order tensor completion under similar model assumptions. We reduce tensor completion to matrix completion by flattening the dimensions and verifying that the required model assumptions still hold. We show that our simple and principled approach is competitive with state-of-the-art tensor completion algorithms when applied to image inpainting data. Our estimator is simple to implement, and its analysis sidesteps the complications of non-unique tensor decompositions. The ability to seamlessly extend beyond matrices to higher order tensors suggests the general applicability and value of the blind regression framework.
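The flattening step itself is mechanical; a minimal sketch for a third-order tensor (which pair of modes to merge is a modeling choice left unspecified here):

    import numpy as np

    def flatten_tensor(T):
        # Merge the last two modes of an n1 x n2 x n3 tensor (np.nan marks
        # missing entries), so tensor completion reduces to matrix completion
        # on the resulting n1 x (n2 * n3) matrix.
        n1, n2, n3 = T.shape
        return T.reshape(n1, n2 * n3)

    def unflatten_matrix(M, shape):
        # Invert the flattening after the matrix has been completed.
        return M.reshape(shape)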

Bio:

I am a PhD candidate in the Laboratory for Information and Decision Systems (LIDS) at the Massachusetts Institute of Technology, advised by Professors Asuman Ozdaglar and Devavrat Shah in the Department of Electrical Engineering and Computer Science. Before coming to MIT, I received my B.S. in Computer Science from the California Institute of Technology in 2011, and I received an M.S. in Electrical Engineering and Computer Science from MIT in May 2013. I received the MIT Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowship in 2011 and the National Science Foundation Graduate Research Fellowship in 2013.

I have worked on sparse matrix methods, specifically how to exploit sparsity or graph properties to approximate a single component of the solution to a linear system, or the leading eigenvector of a stochastic matrix. In the context of Markov chains, we proposed and analyzed a truncated Monte Carlo method in which the algorithm samples node-centric random walks. The algorithm trades off estimation accuracy against time complexity: it thresholds nodes with low stationary probability and obtains more accurate estimates for nodes with high stationary probability, while maintaining a bounded number of random walk steps. In the context of solving linear systems, we showed that an asynchronous implementation of coordinate descent converges to an estimate of a single component of the solution, with a convergence rate that depends on the sparsity of the matrix in addition to the condition number.
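As a rough illustration of the truncated Monte Carlo idea (a sketch under simplifying assumptions of our own: the parameters num_walks and max_steps are hypothetical, and the actual algorithm adapts them per node):

    import numpy as np

    def estimate_stationary(P, i, num_walks=200, max_steps=1000, seed=0):
        # Estimate pi(i) via 1 / E[return time to i], sampling random walks
        # from node i under the transition matrix P. Each walk is truncated
        # at max_steps, which caps the work per node and effectively
        # thresholds nodes whose stationary probability is small.
        rng = np.random.default_rng(seed)
        n = P.shape[0]
        total_steps = 0
        for _ in range(num_walks):
            node, steps = i, 0
            while steps < max_steps:
                node = rng.choice(n, p=P[node])
                steps += 1
                if node == i:
                    break
            total_steps += steps  # truncated return time
        return num_walks / total_steps  # inverse of the mean return time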

Recently, I have been interested in building a mathematical framework for social-data-driven decisions. This includes both preference learning, which may involve regression techniques and latent variable modeling, and learning structural relationships in the data, which involves spectral graph theoretic methods. We developed a nonparametric regression framework to design algorithms for estimating latent variable models in the context of matrix completion. Our method and analysis provide theoretical justification for the widely used heuristic of collaborative filtering and lead to natural extensions to higher order tensor completion. We are continuing to explore how to reduce sample complexity to handle the unique sparsity challenges that arise in online matching markets. I also hope to explore connections between recommendation systems and the social effects that result from human user responses to the system. By addressing both the inference and the human decision aspects of this problem, I hope to build a more encompassing and systematic approach to designing recommendation systems.