Wednesday, January 8, 2014

Netflix and Its People Problem

Felix Salmon recently wrote an excellent piece detailing some current problems with the Netflix model. Studios have all the bargaining power: anytime they see big profit numbers generated by streaming providers, they can simply demand higher prices for their content. This is why you see Netflix and HBO rushing to create their own content in an attempt to escape this intensifying bidding war.

As a result of these bidding wars, Netflix has begun to lose out on content quality. To compensate, Netflix’s recommendation algorithms have had to get more sophisticated, trying to determine preference patterns across a landscape devoid of quality. Without high-quality content, Netflix now has to grope around a dark room of content, using touch and feel in lieu of the more accurate vision, leading to bumps, bruises and constant recommendations of Iron Man 2.

This approach runs into two seemingly related problems. First, as my previous post alluded to, individuals do not have innate preferences for many goods and experiences. Let’s say someone described everything there was to know about ice cream, from the sweet sensation of the cream as it melts on your tongue to the molecular structure of cream, sugar and ice particles. Would you then be able to predict whether you would like it or dislike it? Well, I like cold things, like snow, but ice cream isn’t really snow. Now, I like milk a lot, but what about that sugar and those flavorings? Would your enjoyment of the individual parts of ice cream guarantee that you were going to love ice cream? This is what Netflix’s new recommendation system is betting on: that it can tease apart the different aspects of movies and triangulate stable preferences. Unfortunately, very subtle things can change how we experience events.

A now-famous study by Dan Ariely highlights the malleability of experience. He opened a class with a brief reading of a Walt Whitman poem, and told students he would be doing a few short poetry readings one evening. The class was then split into two groups. The first group was told the cost of the show would be $10, and asked if they would accept that and what they would be willing to pay to see the show. The second group was told that they would be paid $10 to see the show, and then asked the same set of questions. The first group said they would be willing to pay $1 to $5, whereas the second group said they would be willing to go if paid $1 to $4. Note that the group that had been asked if they would pay $10 could have asked to be paid to go to the show, but they did not. Preferences for goods, especially experiential goods, are highly context dependent.

I’m not denying that certain dispositional tastes exist. I habitually watch sci-fi movies. I have seen more outer space prison break movies than movies made before 1960. But that is a whole different ballgame from presuming a taste is something as specific as "Foreign Satanic Stories from the 1980s." I like many romantic comedies from the late 80s and early 90s, but to say this links Say Anything and Pretty Woman seems a bit of a stretch.

This strategy runs into a second problem: overfitting. When sampling from complex, feedback-driven systems, the model used must be very robust to future deviations. Gerd Gigerenzer gave a great talk on the robustness of simple models. The two graphs shown below illustrate the problem.

The first graph shows two different models fitted to yearly temperature data. One model is a 12th degree polynomial, whereas the other is a 3rd degree polynomial. As can be seen, the 12th degree polynomial has lower error: it fits the temperature data better. However, as seen from the second graph, the predictive ability of the 12th degree polynomial is much worse. The more one attempts to boil complex systems down to single, large algorithms, the more problems one runs into out of sample.
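The same effect can be reproduced with a small simulation. This is only a rough sketch under my own assumptions, not the data or method from Gigerenzer's talk: the seasonal curve in `true_temperature`, the noise level, and the sample sizes are all made up for illustration. It fits a 3rd and a 12th degree polynomial to noisy temperature readings, then scores each on the readings it was fitted to and on fresh readings from the same curve, averaged over many simulated years.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_temperature(t):
    """Hypothetical smooth seasonal curve; t is the fraction of the year in [0, 1]."""
    return 10.0 + 8.0 * np.sin(np.pi * t)


def mean_squared_error(model, t, y):
    """Average squared gap between the model's predictions and the readings."""
    return float(np.mean((model(t) - y) ** 2))


def compare_degrees(degrees=(3, 12), n_train=30, n_test=200,
                    noise_sd=2.0, n_trials=200, seed=0):
    """Average in-sample and out-of-sample error per degree over simulated years."""
    rng = np.random.default_rng(seed)
    errors = {d: {"in_sample": [], "out_of_sample": []} for d in degrees}
    for _ in range(n_trials):
        # Noisy readings used for fitting (all values here are assumptions).
        t_train = rng.uniform(0.0, 1.0, n_train)
        y_train = true_temperature(t_train) + rng.normal(0.0, noise_sd, n_train)
        # Fresh readings from the same curve that the models never saw.
        t_test = rng.uniform(0.0, 1.0, n_test)
        y_test = true_temperature(t_test) + rng.normal(0.0, noise_sd, n_test)
        for d in degrees:
            model = Polynomial.fit(t_train, y_train, d, domain=[0.0, 1.0])
            errors[d]["in_sample"].append(
                mean_squared_error(model, t_train, y_train))
            errors[d]["out_of_sample"].append(
                mean_squared_error(model, t_test, y_test))
    return {d: {k: float(np.mean(v)) for k, v in e.items()}
            for d, e in errors.items()}


if __name__ == "__main__":
    for degree, err in compare_degrees().items():
        print(f"degree {degree:2d}: in-sample MSE {err['in_sample']:.2f}, "
              f"out-of-sample MSE {err['out_of_sample']:.2f}")
```

The 12th degree fit should come out ahead on the readings it was fitted to and behind on the fresh ones, because its extra flexibility is spent chasing noise rather than the underlying curve.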

Now it seems I might be painting myself into a corner. I have argued that simple models are better at predicting complex systems, while simultaneously claiming that simple stimulus-response systems such as Netflix's are insufficient in the face of this complexity. What I am trying to show is that there is no one algorithm to rule them all. How can simple models be used in the human realm? By utilizing our already existing pattern recognition devices: other people’s brains. By supplementing these algorithms with expert human judgment, we see increased success. Salmon wrote another article on exactly this subject:
Nate Silver himself has written thoughtfully about examples of this in his book, The Signal and the Noise. He cites baseball, which in the post-Moneyball era adopted a “fusion approach” that leans on both statistics and scouting. Silver credits it with delivering the Boston Red Sox’s first World Series title in 86 years. Or consider weather forecasting: The National Weather Service employs meteorologists who, understanding the dynamics of weather systems, can improve forecasts by as much as 25 percent compared with computers alone. A similar synthesis holds in economic forecasting: Adding human judgment to statistical methods makes results roughly 15 percent more accurate. And it’s even true in chess: While the best computers can now easily beat the best humans, they can in turn be beaten by humans aided by computers.

He gives short shrift to where I believe this might be most ground-breaking: the market for experiential goods. There is already a model for supplementing algorithmic results with human interaction, and that is “A BetterQueue.” It’s a very simple website that links Rotten Tomatoes and Netflix. The key here is that a person can pre-select the categories and rating criteria that titles must meet to pass through their filter. We have not only human reviewers as a filtering scheme, but also the person who will be experiencing the good itself. I think this is why online dating sites are so successful as well. For all the hype about the algorithms used to generate potential matches, these systems have an ultimate backstop in that people will generally have to talk to one another before any real commitment is made. So while an algorithm gets people to the door, it is a person who is tasked with figuring out whether this is the right one for them. These are the types of systems I believe will be key to the future development of these markets. Recommendation schemes that ignore how human judgment and relationships help create a good fit for individuals will continue to be sub-optimal.
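For the curious, the kind of filter described above can be sketched in a few lines. This is purely my own illustration and not how A BetterQueue is actually built: the `Title` fields, the `human_filter` function, and the catalog entries are all hypothetical, and the scores are made up rather than taken from Rotten Tomatoes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Title:
    name: str
    genre: str
    critic_score: int   # percent of positive reviews, 0-100
    review_count: int   # how many critics weighed in


def human_filter(catalog, genres, min_score, min_reviews):
    """Keep titles in the viewer's chosen genres that enough critics rated highly."""
    picks = [t for t in catalog
             if t.genre in genres
             and t.critic_score >= min_score
             and t.review_count >= min_reviews]
    # Best-reviewed titles first.
    return sorted(picks, key=lambda t: t.critic_score, reverse=True)


# Made-up catalog for illustration only; these are not real titles or scores.
catalog = [
    Title("Space Prison Break", "sci-fi", 88, 120),
    Title("Generic Sequel 2", "action", 45, 200),
    Title("Quiet Robot", "sci-fi", 93, 12),
    Title("Late-80s Romance", "romantic comedy", 79, 64),
]

picks = human_filter(catalog, genres={"sci-fi", "romantic comedy"},
                     min_score=75, min_reviews=50)
print([t.name for t in picks])
```

There is no learning step here at all. The viewer sets the criteria, the critics supply the ratings, and the code does nothing more than apply the two together.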