For many years, the main goal of Netflix’s personalized recommendation system has been to get the right titles in front of each of our members at the right time. With a catalog spanning thousands of titles and a diverse member base of over a hundred million accounts, recommending the titles that are just right for each member is crucial. But recommendation does not end there.
Why should you care about any particular title we recommend? What can we say about a new and unfamiliar title that will pique your interest? How do we convince you that a title is worth watching? Answering these questions is critical in helping our members discover great content, especially for unfamiliar titles. One avenue to address this challenge is to consider the artwork or imagery we use to portray the titles.
If the artwork representing a title captures something compelling to you, then it acts as a gateway into that title and gives you some visual “evidence” for why the title might be good for you. The artwork may highlight an actor that you recognize, capture an exciting moment like a car chase, or contain a dramatic scene that conveys the essence of a movie or TV show. If we put the right image on your homepage (and, as the saying goes, an image is worth a thousand words), then maybe, just maybe, you will give it a try.
In previous work, we discussed an effort to find the single perfect artwork for each title across all our members. Using multi-armed bandit algorithms, we hunted for the best artwork for a title, say Stranger Things, that would earn the most plays from the largest fraction of our members. However, given the enormous diversity in taste and preferences, wouldn’t it be better if we could find the best artwork for each of our members, highlighting the aspects of a title that are specifically relevant to them?
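The single-artwork search described above can be sketched as an epsilon-greedy multi-armed bandit: mostly show the image with the best observed play rate, but occasionally explore a random one. Everything below (image names, play probabilities, parameters) is hypothetical for illustration; it is not Netflix’s actual implementation.

```python
import random

def epsilon_greedy_bandit(arms, reward_fn, steps=10000, epsilon=0.1, seed=42):
    """Find the artwork ('arm') with the highest observed play rate,
    exploring a random arm with probability epsilon."""
    rng = random.Random(seed)
    impressions = {a: 0 for a in arms}  # how often each image was shown
    plays = {a: 0 for a in arms}        # how often it led to a play

    def play_rate(a):
        # Unseen arms get +inf so every arm is tried at least once.
        return plays[a] / impressions[a] if impressions[a] else float("inf")

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.choice(arms)          # explore
        else:
            arm = max(arms, key=play_rate)  # exploit the current best
        impressions[arm] += 1
        plays[arm] += reward_fn(arm, rng)
    return max(arms, key=play_rate)

# Hypothetical true play probabilities per candidate image.
TRUE_RATES = {"cast_image": 0.10, "scene_image": 0.15, "logo_image": 0.05}

def simulated_play(arm, rng):
    return 1 if rng.random() < TRUE_RATES[arm] else 0

best = epsilon_greedy_bandit(list(TRUE_RATES), simulated_play)
```

With enough impressions, the bandit converges on the image with the highest underlying play rate for the population as a whole, which is exactly the limitation the rest of this post addresses: one winner for everyone.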
Let us consider trying to personalize the image we use to depict the movie Good Will Hunting. Here we might personalize this decision based on how much a member prefers different genres and themes. Someone who has watched many romantic movies may be interested in Good Will Hunting if we show the artwork containing Matt Damon and Minnie Driver, whereas a member who has watched many comedies might be drawn to the movie if we use the artwork containing Robin Williams, a well-known comedian.
As a second scenario, consider how a member’s preferences for different cast members can influence artwork personalization. A member who watches many movies featuring Uma Thurman will likely respond positively to artwork for Pulp Fiction that features her.
Contextual bandits approach
Most Netflix recommendation engines are powered by machine learning algorithms. Traditionally, we collect a batch of data on how our members use the service, then train a new machine learning model on that batch. Finally, we A/B test the new model against the current production system.
An A/B test on a random subset of members lets us see whether the new algorithm performs better than our current production system. Members in group A experience the current production system, while members in group B experience the new algorithm. If members in group B engage more with Netflix, then we roll the new algorithm out to the entire member population.
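One common way to decide whether group B actually beats group A is a two-proportion z-test on the groups’ play (or engagement) rates. A minimal sketch with made-up counts; the specific test and thresholds used in any real experimentation platform may differ:

```python
from math import sqrt

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Z-statistic comparing play rates between A/B groups using a
    pooled standard error. Positive z means group B has the higher rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical engagement counts: group B (new algorithm) vs. group A (control).
z = two_proportion_ztest(success_a=4_800, n_a=50_000,
                         success_b=5_150, n_b=50_000)
significant = z > 1.96  # roughly a 5% significance threshold
```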
In this online learning setting, we train a contextual bandit model that chooses artwork based on each member’s context. We usually have a few dozen candidate artwork images per title. To learn the selection model, we can simplify the problem by choosing the image for each title and member independently.
Once the model is trained as above, we use it to rank the images for each context. The model predicts the probability of play for a given image in a given member context. We sort the candidate set of images by these probabilities and pick the one with the highest probability. That is the image we present to that particular member.
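The ranking step described above can be sketched with a toy logistic model: score every candidate image for the member’s context and take the argmax. The feature names, weights, and image identifiers here are invented purely for illustration.

```python
import math

def predict_play_probability(weights, features):
    """Logistic model: probability of play given context-image features."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def select_image(weights, member_context, candidates, featurize):
    """Score each candidate image for this member and pick the argmax."""
    scored = [(predict_play_probability(weights, featurize(member_context, img)), img)
              for img in candidates]
    scored.sort(reverse=True)  # highest predicted play probability first
    return scored[0][1]

# Hypothetical toy features: (bias, member-likes-comedy, image-shows-comedian).
weights = [-1.0, 0.5, 2.0]

def featurize(ctx, img):
    return [1.0, ctx["likes_comedy"], 1.0 if img == "robin_williams_art" else 0.0]

choice = select_image(weights, {"likes_comedy": 1.0},
                      ["matt_and_minnie_art", "robin_williams_art"], featurize)
```

In this toy setup, a comedy-leaning member gets the comedian-focused artwork because the interaction feature pushes its predicted play probability above the alternative’s.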
After experimenting with many different models offline and finding ones that showed a substantial improvement in replay, we ultimately ran an A/B test to compare the most promising personalized contextual bandits against the unpersonalized bandits. As we suspected, the personalization worked and generated a significant lift in our core metrics. We also saw a reasonable correlation between the models’ offline replay metrics and their online performance. The online results also produced some interesting insights. For example, the improvement from personalization was larger in cases where the member had no prior interaction with the title.
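“Replay” here refers to a standard offline evaluation technique for bandit policies: on logs collected under a randomized policy, keep only the impressions where the candidate policy would have shown the same image that was actually shown, then measure the play rate on that matched subset. A toy sketch with fabricated logs, assuming a uniformly random logging policy:

```python
def replay_play_rate(logged_events, policy):
    """Offline replay: evaluate an image-selection policy on logs from a
    randomized policy by keeping only events where the policy's choice
    matches the logged image, then measuring the play rate there."""
    matched, plays = 0, 0
    for event in logged_events:
        chosen = policy(event["context"], event["candidates"])
        if chosen == event["shown_image"]:
            matched += 1
            plays += event["played"]  # 1 if the member played the title
    return plays / matched if matched else 0.0

# Hypothetical logged impressions from a random exploration policy.
logs = [
    {"context": "c1", "candidates": ["a", "b"], "shown_image": "a", "played": 1},
    {"context": "c2", "candidates": ["a", "b"], "shown_image": "b", "played": 0},
    {"context": "c3", "candidates": ["a", "b"], "shown_image": "a", "played": 1},
]
always_a = lambda ctx, candidates: "a"
rate = replay_play_rate(logs, always_a)  # matches the first and third events
```

Because only matched impressions count, replay needs a lot of logged exploration data, but it gives an estimate of online performance without running a live test.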