Recommender Systems

Digital streaming services like Netflix and Amazon use data about the content that a customer has viewed in the past, as well as data from other customers, to suggest other content for the customer. As a concrete example, some years back, Netflix had customers rate each movie that they had seen with a score from 1–5. This resulted in a very big n × p matrix for which the ( i, j ) element is the rating given by the i th customer to the

12.3 Missing Values and Matrix Completion 519

Figure 12.6

FIGURE 12.6. As described in the text, in each of 100 trials, we left out 20 elements of the USArrests dataset. In each trial, we applied Algorithm 12.1 with M = 1 to impute the missing elements and compute the principal components. Left: For each of the 50 states, the imputed first principal component scores (averaged over 100 trials, and displayed with a standard deviation bar) are plotted against the first principal component scores computed using all the data. Right: The imputed principal component loadings (averaged over 100 trials, and displayed with a standard deviation bar) are plotted against the true principal component loadings.

j th movie. One specific early example of this matrix had n = 480 , 189 customers and p = 17 , 770 movies. However, on average each customer had seen around 200 movies, so 99% of the matrix had missing elements. Table 12.2 illustrates the setup.

In order to suggest a movie that a particular customer might like, Netflix needed a way to impute the missing values of this data matrix. The key idea is as follows: the set of movies that the i th customer has seen will overlap with those that other customers have seen. Furthermore, some of those other customers will have similar movie preferences to the i th customer. Thus, it should be possible to use similar customers’ ratings of movies that the i th customer has not seen to predict whether the i th customer will like those movies.

More concretely, by applying Algorithm 12.1, we can predict the i th cusˆ tomer’s rating for the j th movie using xij =[�] [M] m =1 [a][ˆ] [im][ˆ] [b][jm][.][Furthermore,] we can interpret the M components in terms of “cliques” and “genres”:

  • ˆ

  • aim represents the strength with which the i th user belongs to the m th clique, where a clique is a group of customers that enjoys movies of the m th genre;

  • [ˆ] bjm represents the strength with which the j th movie belongs to the m th genre .

Examples of genres include Romance, Western, and Action.

Principal component models similar to Algorithm 12.1 are at the heart of many recommender systems. Although the data matrices involved are

520 12. Unsupervised Learning

Jerry Maguire
Oceans
Road to Perdition
A Fortunate Man
Catch Me If You Can
Driving Miss Daisy
The Two Popes
The Laundromat
Code 8
The Social Network
· · ·
Customer 1




4





· · ·
Customer 2


3



3


3
· · ·
Customer 3

2

4




2

· · ·
Customer 4
3









· · ·
Customer 5
5
1


4





· · ·
Customer 6





2
4



· · ·
Customer 7


5




3


· · ·
Customer 8










· · ·
Customer 9
3



5


1


· · ·











Jerry Maguire
Oceans
Road to Perdition
A Fortunate Man
Catch Me If You Can
Driving Miss Daisy
The Two Popes
The Laundromat
Code 8
The Social Network
· · ·
Customer 1




4





· · ·
Customer 2


3



3


3
· · ·
Customer 3

2

4




2

· · ·
Customer 4
3









· · ·
Customer 5
5
1


4





· · ·
Customer 6





2
4



· · ·
Customer 7


5




3


· · ·
Customer 8










· · ·
Customer 9
3



5


1


· · ·











Customer 1
Customer 2
Customer 3
Customer 4
Customer 5
Customer 6
Customer 7
Customer 8
Customer 9




4





· · ·


3



3


3
· · ·

2

4




2

· · ·
3









· · ·
5
1


4





· · ·





2
4



· · ·


5




3


· · ·










· · ·
3



5


1


· · ·










TABLE 12.2. Excerpt of the Netflix movie rating data. The movies are rated from 1 (worst) to 5 (best). The symbol • represents a missing value: a movie that was not rated by the corresponding customer.

typically massive, algorithms have been developed that can exploit the high level of missingness in order to perform efficient computations.

서브목차