October 11, 2016
[liveblog] PAPIs: Cynthia Rudin on Regulating Greed
I’m at the PAPIs (Predictive Applications and APIs) [twitter: papisdotio] conference at the NERD Center in Cambridge.
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellchecker. Mangling other people’s ideas and words. You are warned, people.
The first speaker is Cynthia Rudin, director of the Prediction Analysis Lab at MIT. Her topic is “Regulating Greed over Time: An Important Lesson for Practical Recommender Systems.” It’s about her lab’s entry in a data mining competition. (The entry did not win.) The competition was to design a better algorithm for Yahoo’s article recommendations. To create an unbiased data set, Yahoo showed people random articles for two weeks. Your algorithm had to choose which article from the pool to show a user. To evaluate a recommender system, they’d check whether your algorithm recommended the same article that was actually shown to the user; if it did and the user clicked, that counted toward your score. [I don’t think I got this right.] You sent your algorithm to Yahoo, and they evaluated its clickthrough rate; you never got access to Yahoo’s data.
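That evaluation scheme sounds like the “replay” method for offline bandit evaluation. Here’s a minimal sketch of it, assuming a log of (shown article, clicked) events collected while articles were displayed at random; the names and toy data are mine, not Yahoo’s:

```python
import random

def replay_evaluate(policy, log):
    """Offline 'replay' evaluation against randomly-logged data.

    Only events where the policy's pick matches the randomly shown
    article are counted, which is what keeps the estimate unbiased.
    """
    matches, clicks = 0, 0
    for pool, shown_article, clicked in log:
        if policy(pool) == shown_article:
            matches += 1
            clicks += clicked
    return clicks / matches if matches else 0.0  # estimated clickthrough rate

# Toy log: (article pool, randomly shown article, was it clicked?)
log = [(["a1", "a2", "a3"], "a1", 1),
       (["a1", "a2", "a3"], "a2", 0),
       (["a1", "a2", "a3"], "a1", 1)]
print(replay_evaluate(lambda pool: random.choice(pool), log))
```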
This is, she says, a form of the multi-arm bandit problem: one arm is better (more likely to lead to a payout), but you don’t know which one. So you spend your time figuring out which arm is the best, and then you only pull that one. Yahoo and Microsoft are among the companies using multi-arm bandit systems for recommendations. “They’re a great alternative to massive A-B testing.” [No, I don’t understand this. Not Cynthia’s fault!]
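In case it helps, here’s a minimal epsilon-greedy sketch of the explore/exploit tradeoff she’s describing; the payout probabilities are made up:

```python
import random

def epsilon_greedy(n_arms, pulls, reward, epsilon=0.1):
    """Minimal epsilon-greedy bandit: usually pull the best-looking arm
    (exploit), but with probability epsilon try a random one (explore)."""
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for _ in range(pulls):
        if random.random() < epsilon or 0 in counts:
            arm = random.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: totals[a] / counts[a])  # exploit
        counts[arm] += 1
        totals[arm] += reward(arm)
    return counts

# Arm 1 pays out more often, so it should end up pulled far more.
print(epsilon_greedy(2, 1000, lambda a: 1 if random.random() < (0.3, 0.6)[a] else 0))
```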
Because the team didn’t have access to Yahoo’s data, they couldn’t tune their algorithms to it. Nevertheless, they achieved a 9% clickthrough rate … and still lost (albeit by a tiny margin). Cynthia explains how they increased the efficiency of their algorithms, but it’s math, so here I can only play the sound of a muted trumpet. It involves “decay exploration on the old articles” and a “peak grabber”: if an article gets more than 9 clicks out of the last 100 times it’s shown, they keep displaying it. If you have a good article, grab it. The dynamic version of the peak grabber keeps showing a peak article as long as its clickthrough rate is 14% above the global clickthrough rate.
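Here is how I’d sketch the peak grabber, with my own guesses at the details (in particular, I’m reading “14% above the global clickthrough rate” as a relative margin, which may be wrong):

```python
from collections import deque

class PeakGrabber:
    """Sketch of the 'peak grabber' rule described above. The static rule
    keeps showing an article that got more than 9 clicks in its last 100
    displays; the dynamic version keeps a peak article while its recent
    clickthrough rate beats the global rate by a margin (14% in the talk)."""

    def __init__(self, window=100, margin=0.14):
        self.recent = {}  # article -> clicks (0/1) from its most recent displays
        self.window = window
        self.margin = margin

    def record(self, article, clicked):
        self.recent.setdefault(article, deque(maxlen=self.window)).append(clicked)

    def is_peak_static(self, article):
        clicks = self.recent.get(article, ())
        return len(clicks) == self.window and sum(clicks) > 9

    def is_peak_dynamic(self, article, global_ctr):
        clicks = self.recent.get(article, ())
        if len(clicks) < self.window:
            return False
        return sum(clicks) / len(clicks) >= global_ctr * (1 + self.margin)
```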
“We were adjusting the exploration-exploitation tradeoff based on trends.” Is this a phenomenon worth exploring? The phenomenon: you shouldn’t always explore. There are times when you should just stop and exploit the flowers.
Some data supports this. E.g., in England, on Boxing Day you should be done exploring and just put your best prices on things — not too high, not too low. When the clicks on your site are low, you should be exploring. When high, maybe not. “Life has patterns.” The multi-arm bandit techniques don’t know about these patterns.
Her group came up with a formal way of putting this. At each time t there is a known reward multiplier, G(t). G is like the number of people in the store. When G is high, you want to exploit, not explore. In the lower zones you want to balance exploration and exploitation.
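As a sketch, here is one way a known G(t) could regulate greed; the linear exploration schedule is my own illustration, not her algorithm:

```python
import random

def regulated_epsilon(g_t, g_max, eps_max=0.2):
    """Shrink the exploration rate as the reward multiplier G(t) rises:
    when the store is full, exploit; when it's quiet, explore more.
    (The linear schedule is illustrative only.)"""
    return eps_max * (1 - g_t / g_max)

def choose(estimates, g_t, g_max):
    if random.random() < regulated_epsilon(g_t, g_max):
        return random.randrange(len(estimates))  # explore
    return max(range(len(estimates)), key=estimates.__getitem__)  # exploit

# With the "store" nearly full (G high), this almost always exploits arm 1.
print(choose([0.3, 0.6], g_t=95, g_max=100))
```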
So they proved two theorems, each leading to an algorithm. [She shows the algorithms. I can’t type math notation that fast.]