DL Explained – How Netflix uses AI to Predict Your Next Series Binge – 2020

Wait, how did Netflix know I wanted to watch that? Spooky… right? Well, not exactly. Using machine learning, collaborative filtering, NLP and more, Netflix follows a five-step process not only to enhance UX, but also to create a tailored, personalised platform that maximises engagement, retention and enjoyment.

  • The five current key development stages that involve ML at Netflix: Ranking & Layout > Similarity & Promotion > Evidence & Search > Model Improvement > Explore / Exploit Learning
  • Key Challenges – Understanding user intent, personal preference at any given time, correlation between users across different accounts
  • Key application/techniques mentioned: ML, RL, Collaborative filtering, NLP
  • Included > Intro to ML, 5 key areas currently focussed on, Q&A with AI Researcher, Videos of Netflix presentations.

In the last decade, learning algorithms and models at Netflix have evolved to include multiple layers, multiple stages and nonlinearities. Netflix now uses machine learning and its deep variants to rank large catalogues of content by determining the relevance of each title to each user, creating a personalized content strategy. Not only is the content customized, it is also ranked from most to least likely to be watched. The process also includes selecting the image that best depicts each title for your preferences. It is explained below, from the very first uses of ML to its use for ranking and layout and the improvement of model accuracy at Netflix.

Machine Learning

In machine learning, there are two key steps to focus on:

  1. Collect massive data sets
  2. Try billions of hypotheses to find the one(s) with support

For example, we can record the weather over many days as a binary variable (0, 1), where 0 means yes, it will rain and 1 means no, it won’t rain. Similarly, we can view cloudy or sunny skies as another binary variable.

While we can learn the probability distribution from observations when they contain just a few variables, when there are thousands of variables, it is much harder to describe the probability of them jointly. We keep things tractable by limiting interactions between variables through a network. This gives better computational and statistical efficiency.
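
As a rough illustration of the point above, the sketch below estimates a two-variable weather distribution by counting, then compares how many parameters an unrestricted joint distribution needs against a network in which each variable depends on only a few others. The observations, the function names and the use of True/False instead of a 0/1 coding are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical observations, one (cloudy, rain) pair per day.
days = [
    (True, True), (True, False), (True, True), (False, False),
    (False, False), (True, True), (False, False), (False, True),
]

def estimate_joint(observations):
    """Estimate P(cloudy, rain) by counting how often each pair occurs."""
    counts = Counter(observations)
    total = len(observations)
    return {pair: n / total for pair, n in counts.items()}

def rain_given_cloudy(joint):
    """P(rain | cloudy) = P(cloudy, rain) / P(cloudy)."""
    p_cloudy = sum(p for (cloudy, _), p in joint.items() if cloudy)
    return joint.get((True, True), 0.0) / p_cloudy

def full_joint_params(num_vars):
    """Free parameters of an unrestricted joint over binary variables."""
    return 2 ** num_vars - 1

def network_params(num_vars, max_parents):
    """Upper bound when each binary variable depends on at most max_parents others."""
    return num_vars * 2 ** max_parents

joint = estimate_joint(days)
print(rain_given_cloudy(joint))
print(full_joint_params(20), network_params(20, 3))
```

With 20 binary variables the unrestricted joint already needs over a million parameters, while the restricted network needs at most 160, which is the efficiency gain the paragraph refers to.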

How this applies to storytelling

Storytelling goes back thousands of years, to the beginning of humanity. When a story is told digitally, the storyteller can’t see whether the listener is engaged (e.g. laughing, frowning, etc.). Netflix acts as the storyteller to its customers (the listeners). At Netflix, user engagement is judged by observing ‘interaction’ data: studying the triggers that lead an individual to fast-forward, exit a title or browse the interface (search vs. scroll), along with many accompanying variables. All this data is fed into an ML model to create a somewhat clearer picture of each user.

Machine Learning at Netflix

There are 5 key areas Netflix focusses on:

Step One: Ranking & Layout

The entire catalogue of movies and shows at Netflix is ranked and ordered for each user in a personalized manner (you can blame your flatmate for messing up your algorithms). Through prolonged use, Netflix can work out what a customer’s favourite shows are based on their activity. If Customer X has watched a few comedies (understandable in times like these), it can be presumed that they have an interest in comedy films/shows. Comedies would therefore hold a higher ranking score than, for example, thrillers. Sounds simple, right? Keep reading…

On a basic level, the recommender system learns from your account which type of series or movie you’re likely to be interested in based on your previous history, and suggests the most relevant titles.
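
A minimal sketch of that basic idea, assuming a toy scoring rule in which a title’s score is simply the share of the viewer’s history spent on its genre. The titles, genres and function names are hypothetical, and this is far simpler than anything Netflix describes using.

```python
from collections import Counter

def genre_affinity(watch_history):
    """Share of watched titles that fall in each genre."""
    counts = Counter(genre for _, genre in watch_history)
    total = sum(counts.values())
    return {genre: n / total for genre, n in counts.items()}

def rank_catalogue(catalogue, watch_history):
    """Order unseen titles from most to least likely to match the viewer's tastes."""
    affinity = genre_affinity(watch_history)
    watched = {title for title, _ in watch_history}
    unseen = [(title, genre) for title, genre in catalogue if title not in watched]
    return sorted(unseen, key=lambda item: affinity.get(item[1], 0.0), reverse=True)

# Hypothetical viewer who has mostly watched comedies.
history = [("Comedy A", "comedy"), ("Comedy B", "comedy"), ("Thriller A", "thriller")]
catalogue = [
    ("Thriller B", "thriller"),
    ("Comedy C", "comedy"),
    ("Documentary A", "documentary"),
    ("Comedy A", "comedy"),
]

for title, genre in rank_catalogue(catalogue, history):
    print(title, genre)
```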

Figure: A simplified layout of complicated algorithms

Step Two: Similarity & Promotion

Once they have found your favourites, the data is then used to find similar content across the platform to suggest. Similarities in plot line, actors/actresses and age restriction are all taken into account.
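
One simple way to picture this kind of content similarity is Jaccard overlap between sets of attributes. The sketch below assumes each title is described by a handful of hypothetical tags (plot keywords, cast, age rating); Jaccard similarity is a stand-in chosen for illustration, not a method the article attributes to Netflix.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two attribute sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical titles and attribute tags.
titles = {
    "Title A": {"heist", "ensemble cast", "actor:X", "rating:15"},
    "Title B": {"heist", "actor:X", "rating:15", "sequel"},
    "Title C": {"romance", "actor:Y", "rating:12"},
}

def most_similar(target, titles):
    """Other titles sorted by attribute overlap with the target, highest first."""
    scores = {
        name: jaccard(titles[target], attrs)
        for name, attrs in titles.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(most_similar("Title A", titles))
```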

Step Three: Evidence & Search

Through testing, correlations can then be drawn between people’s interests, watch history, etc. The results of these tests give evidence as to what is working and what isn’t. Improving search, and acquiring new movies that encourage people to sign up, are also machine learning problems. One of the methods used is ‘collaborative filtering’, an ML technique in which you group similar users together and then extrapolate from their consumption patterns to recommend relevant, highly personalized movies and TV shows to members with similar taste. Finding similar users is a hard problem, and one that many Netflix data scientists spend their days trying to work out.
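
To make collaborative filtering concrete, here is a small user-based sketch: it measures how alike two members are with cosine similarity over their ratings, then predicts a score for unseen titles as a similarity-weighted average. The ratings, user names and the choice of cosine similarity are assumptions for the example; production systems are considerably more involved.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "user_a": {"Title 1": 5, "Title 2": 4, "Title 3": 1},
    "user_b": {"Title 1": 4, "Title 2": 5, "Title 4": 5},
    "user_c": {"Title 3": 5, "Title 5": 4},
}

def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, top_n=3):
    """Predict ratings for unseen titles from similar users, best first."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for title, rating in other_ratings.items():
            if title in ratings[user]:
                continue  # only recommend titles the user has not rated
            scores[title] = scores.get(title, 0.0) + sim * rating
            weights[title] = weights.get(title, 0.0) + sim
    predicted = {t: scores[t] / weights[t] for t in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

print(recommend("user_a", ratings))
```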

Step Four: Improve Models

The first stage of model improvement is a data collection period lasting several months, which builds up a large amount of good-quality data. Then, A/B testing is carried out to determine whether the new model is better than the current one: half the users get the new model, half get the old model, and the results are analysed to decide which model is rolled out.
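
The comparison at the end of an A/B test can be sketched with a two-proportion z-test. The example assumes the outcome is a simple yes/no per member (for instance, whether they played a recommended title), and the counts are invented; the article does not say which metric or statistical test Netflix uses.

```python
from math import erf, sqrt

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two success rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)

# Hypothetical results: old model (A) vs new model (B), 10,000 members each.
z, p_value = two_proportion_z_test(500, 10_000, 580, 10_000)
print(round(z, 2), round(p_value, 4))
```

A small p-value suggests the difference between the two groups is unlikely to be chance alone, which is the kind of evidence used to decide which model is rolled out.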

There are many problems with batch learning methods like this. In the months it takes to gather enough data to judge the new model, customers in the test may be left with what they deem to be a worse experience.

Step Five: Explore / Exploit Learning

For explore / exploit learning, Netflix samples a large number of hypotheses and suppresses the ones that aren’t doing as well as the others.

  1. Start with a uniform population of hypotheses
  2. Choose a random hypothesis h
  3. Act according to h and observe outcome
  4. Re-weight hypotheses
  5. Go to 2
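
The loop above can be sketched in code. The article does not name a specific algorithm, so this example swaps in EXP3, a standard bandit method that follows the same shape: start from uniform weights, sample an option, act, observe, re-weight, repeat. Here each hypothesis is "option i is the best one to show", and the reward rates, function names and parameter values are all hypothetical.

```python
import math
import random

def exp3(reward_fn, num_arms, rounds, gamma=0.1, seed=0):
    """Run EXP3 and return the final normalised weight of each option."""
    rng = random.Random(seed)
    weights = [1.0] * num_arms  # step 1: uniform population
    for _ in range(rounds):
        total = sum(weights)
        # Mix the weights with a little uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / num_arms for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]  # step 2: sample
        reward = reward_fn(arm, rng)                          # step 3: act, observe
        estimate = reward / probs[arm]  # importance-weighted reward estimate
        weights[arm] *= math.exp(gamma * estimate / num_arms)  # step 4: re-weight
        top = max(weights)
        weights = [w / top for w in weights]  # rescale to avoid overflow
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical chance that a member engages with each option.
TRUE_RATES = [0.1, 0.2, 0.8]

def simulated_reward(arm, rng):
    """Return 1.0 if the simulated member engages, else 0.0."""
    return 1.0 if rng.random() < TRUE_RATES[arm] else 0.0

final_weights = exp3(simulated_reward, num_arms=3, rounds=2000)
print([round(w, 3) for w in final_weights])
```

Options that keep performing poorly end up with very small weights, which is the suppression described above, while the small uniform term keeps every option being tried occasionally.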

Netflix then uses explore/exploit learning to find which images best represent each title, modifying the imagery shown for a movie to suit each customer. To do this successfully, Netflix runs tests to see which images work better for each movie and how other factors, such as a customer’s genre preference, affect their choices.
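
As a sketch of how image choice could depend on a factor such as genre preference, the example below keeps separate play/skip counts for every (context, image) pair and picks an image by Thompson sampling from a Beta posterior. The class, the context labels and the image names are hypothetical, and Thompson sampling is an assumption made for illustration rather than Netflix’s stated method.

```python
import random
from collections import defaultdict

class ArtworkSelector:
    """Choose artwork per viewer context using Beta-Bernoulli Thompson sampling."""

    def __init__(self, images, seed=0):
        self.images = list(images)
        self.rng = random.Random(seed)
        # (context, image) -> [plays + 1, skips + 1], i.e. a Beta(1, 1) prior.
        self.counts = defaultdict(lambda: [1, 1])

    def choose(self, context):
        """Sample a plausible play rate for each image and show the best draw."""
        samples = {
            image: self.rng.betavariate(*self.counts[(context, image)])
            for image in self.images
        }
        return max(samples, key=samples.get)

    def record(self, context, image, played):
        """Update the counts after seeing whether the member played the title."""
        self.counts[(context, image)][0 if played else 1] += 1

    def posterior_mean(self, context, image):
        """Current estimate of the play rate for this image in this context."""
        plays, skips = self.counts[(context, image)]
        return plays / (plays + skips)

selector = ArtworkSelector(["image_a", "image_b"])
for _ in range(500):
    selector.record("comedy fan", "image_a", played=True)
    selector.record("comedy fan", "image_b", played=False)

print(selector.choose("comedy fan"))
print(selector.posterior_mean("thriller fan", "image_a"))
```

Because counts are kept per context, an image that works well for comedy fans tells the selector nothing about thriller fans, who still start from the uninformed prior.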

The Answers to your questions

Answers – Anoop Deoras, ML Research Scientist, Netflix

What are the main challenges in personalising users’ home screen?

We aim to maximize the joy of our Netflix members while they are engaged with our service. We believe that by providing relevant and personalized recommendations to the user, we can work towards maximizing this joy. A recommender system which is unaware of the user’s state (the user’s context, the user’s intent, etc.) can do personalization only to a certain degree. Thus, the main challenge in personalization is to get the user’s state correct and modelled well.

How are you using a combination of NLP, DL and recommender systems to provide the optimum user experience?

NLP and RecSys are applications of machine learning and, more often than not, the underlying machine learning methodologies turn out to be similar. When I was at Microsoft working on virtual personal assistants, our main priority was to get the user intent correct and be good at proactive recommendations (showing you traffic conditions before your flight, re-arranging the tickers in your stock app based on which stocks you like to see most when you wake up, etc.). At Netflix too, we strive to get our members’ intent correct so that the time it takes to play something the member truly likes is minimized. Intent detection then becomes the common machine learning problem, applicable to both NLP and RecSys. There are several other commonalities between NLP and RecSys, and we tend to borrow ideas from one field and apply them to the other. Matrix Factorization, which is a very standard ML model for RecSys, was recently used to learn word embeddings in an NLP application. Deep Learning (DL), on the other hand, is not an application of ML, unlike NLP and RecSys; it is an ML tool in itself. You can apply DL to do intent detection, for instance.

How are Netflix using AI for a positive impact?

Recommendation systems are a means to an end. We try to build models for recommendations so that we can maximize our members’ enjoyment of the selected item while minimizing the time it takes to find it. Enjoyment integrated over time (i.e. the goodness of the item and the length of view) and interaction cost integrated over time (i.e. the time it takes the member to find something to play) are some of the factors we consider while building our ML/AI models for a positive impact on our 100M+ members.

What developments of AI are you most excited for, and which industries do you think will be most impacted?

I am most excited about Reinforcement Learning (RL), a subfield of AI. Applications of RL abound, but I feel the biggest impact of RL will be not just in robotics but also in human-computer interaction (dialogue modeling). Human-computer interaction will be everywhere, from talking to phones to get suggestions on restaurants or local interests, to talking to machines to get assistance with a medical question. If you have not already, I would recommend the ‘AlphaGo’ movie, which is available to stream on Netflix worldwide, in which RL experts from Google DeepMind train an ML model using RL to beat the world champion, Lee Sedol, at the highly complex game of Go.

AI and machine learning raise many ethical concerns, such as bias, security and privacy, amongst others. What are your opinions on this, and what can be done to avoid biased machines?

AI and ML are a function of the data that we feed into training them. A lot of the time, the data used to train ML models has inherent biases, and those biases are reflected in the model predictions and AI behaviour. It is therefore very important for an ML practitioner to be cognizant of the fact that such biases exist in the data and that it is necessary to address them, through explicit de-biasing, data stratification, model adaptations or a combination of these. At Netflix, bias manifests in the form of ‘feedback loops’, i.e. impression biases inflating the popularity of certain items (movies / TV shows). Production biases in ML models can cause these feedback loops to be reinforced by the recommendation system. We do research and development in the causal recommendation space specifically to get our recommender models out of this feedback loop caused by heavy production bias.

Check out my BlockDelta profile for additional articles.

