[liveblog][PAIR] Maya Gupta on controlling machine learning
At the PAIR symposium. Maya Gupta runs Glass Box at Google, which looks at black-box issues in machine learning. She is talking about how we can control machine learning to do what we want.
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
The core of machine learning is its role models, i.e., its training data. That’s the best way to control machine learning. She’s going to address this by looking at the goals of controlling machine learning.
A simple example: monotonicity. Let’s say we’re trying to recommend nearby coffee shops, so we use data about the happiness of customers and their distance from the shop. We can fit the data to a linear model. Or we can fit it to a curve, which works better for nearby shops but goes wrong for distant shops. That’s fine for Tokyo but terrible for Montana, because it’ll be sending people many miles away. A monotonicity constraint says we don’t want to do that: predicted happiness shouldn’t go up as distance increases. This controls ML to make it more useful. Conclusion: the best ML has the right examples and the right kinds of flexibility. [Hard to blog this without her graphics. Sorry.] See “Deep Lattice Networks for Learning Partial Monotonic Models,” NIPS 2017; it will soon be on the TensorFlow site.
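[To make the monotonicity idea concrete, here’s a minimal sketch of my own, not from her slides. It uses scikit-learn’s isotonic regression rather than the deep lattice networks she describes: predicted happiness is constrained to be non-increasing in distance, so the model can never rank a far-away shop above a nearby one just because the far-away training data is noisy. The data is made up.]

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Toy data: customer happiness vs. distance to the coffee shop (km).
# The far-away points are sparse and noisy, which is what trips up an
# unconstrained curve fit.
rng = np.random.default_rng(0)
distance = np.sort(rng.uniform(0, 30, 200))
happiness = 1.0 / (1.0 + 0.3 * distance) + rng.normal(0, 0.1, 200)

# Monotonicity constraint: predicted happiness may never increase with distance.
monotone_model = IsotonicRegression(increasing=False, out_of_bounds="clip")
monotone_model.fit(distance, happiness)

# Predictions are guaranteed non-increasing, so a 25 km shop can't score
# higher than a 2 km shop no matter how noisy the training data is.
print(monotone_model.predict([2.0, 10.0, 25.0]))
```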
“The best way to do things for practitioners is to work next to them.”
A fairness goal: e.g., we want to make sure that accuracy in India is the same as accuracy in the US. So we add a constraint saying what accuracy levels we want; the math lets us do that.
Another fairness goal: the rate of positive classifications should be the same in India as in the US, e.g., the rate of students being accepted to a college. In one example, there is an accuracy trade-off in order to get fairness. Her attitude: just tell us what you want and we’ll do it.
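[Again, a sketch of my own for illustration, not her actual method: train a plain logistic regression, but penalize any gap in the positive-classification rate between two groups. The groups, penalty weight, and data are all invented. Turning the penalty up pulls the two rates together and typically costs a bit of accuracy, which is exactly the trade-off she mentions.]

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: features X, labels y, and a group indicator (0 or 1) per example.
n, d = 500, 3
X = rng.normal(size=(n, d))
group = rng.integers(0, 2, size=n)
y = (X[:, 0] + 0.75 * group + rng.normal(0, 0.5, n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(lam, steps=2000, lr=0.1):
    """Logistic regression plus lam * (positive-rate gap between groups)^2."""
    w = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n                       # ordinary logistic loss
        gap = p[group == 0].mean() - p[group == 1].mean()
        dgap = (X[group == 0] * (p * (1 - p))[group == 0, None]).mean(axis=0) \
             - (X[group == 1] * (p * (1 - p))[group == 1, None]).mean(axis=0)
        grad += 2 * lam * gap * dgap                   # fairness penalty
        w -= lr * grad
    return w

for lam in (0.0, 5.0):
    w = train(lam)
    pred = sigmoid(X @ w) > 0.5
    r0, r1 = (pred[group == g].mean() for g in (0, 1))
    acc = (pred == y).mean()
    print(f"lam={lam}: positive rate group0={r0:.3f}, group1={r1:.3f}, acc={acc:.3f}")
```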
Fairness isn’t always relative. E.g., minimize classification errors differently for different regions. You can’t always get what you want, but sometimes you can, or you can get close. [paraphrase!] See fatml.org
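[My illustration, not from the talk: one direct way to say “errors in one region matter more than errors in another” is a per-example weight on the loss; scikit-learn’s sample_weight parameter does exactly that. The regions and data are invented.]

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Toy data with a region flag; everything here is synthetic.
X = rng.normal(size=(400, 4))
region = rng.integers(0, 2, size=400)          # 0 = region A, 1 = region B
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Weight errors in region B three times as heavily as errors in region A.
sample_weight = np.where(region == 1, 3.0, 1.0)

clf = LogisticRegression().fit(X, y, sample_weight=sample_weight)
for r in (0, 1):
    print(f"region {r}: accuracy {clf.score(X[region == r], y[region == r]):.3f}")
```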
It can be hard to state what we want, but we can look at examples. E.g., someone hand-labels 100 examples. That’s not enough as training data, but we can train the system so that it classifies those 100 at something like 95% accuracy.
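[A crude approximation I’m adding for concreteness, not the constrained-optimization machinery she’s referring to: keep upweighting the 100 hand-labeled “golden” examples until the model classifies at least 95% of them correctly. All names and data are made up.]

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# A large, noisy training set plus 100 carefully hand-labeled "golden" examples.
X_big = rng.normal(size=(5000, 5))
y_big = (X_big[:, 0] > 0).astype(int)
y_big[rng.random(5000) < 0.2] ^= 1             # 20% label noise
X_gold = rng.normal(size=(100, 5))
y_gold = (X_gold[:, 0] > 0).astype(int)        # trusted labels

X = np.vstack([X_big, X_gold])
y = np.concatenate([y_big, y_gold])

# Upweight the golden set until the model gets >= 95% of it right.
gold_weight = 1.0
while True:
    w = np.concatenate([np.ones(len(y_big)), np.full(len(y_gold), gold_weight)])
    clf = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
    gold_acc = clf.score(X_gold, y_gold)
    if gold_acc >= 0.95 or gold_weight > 1e4:
        break
    gold_weight *= 2

print(f"golden-set accuracy {gold_acc:.2f} with weight {gold_weight}")
```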
Sometimes you want to improve an existing ML system, but you don’t want the retrained model to behave too differently, because you like the old results. So you can add a constraint such as: keep the differences from the original classifications to less than 2%.
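[One way to sketch that kind of low-churn constraint, again my own illustration rather than her method: retrain with an extra penalty for predictions that drift away from the old model’s predictions. The old model’s weights and the data are synthetic.]

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data plus a frozen "old" model whose behavior we want to preserve.
n, d = 1000, 4
X = rng.normal(size=(n, d))
y = (X @ np.array([1.0, -0.5, 0.3, 0.0]) + rng.normal(0, 0.5, n) > 0).astype(float)
w_old = np.array([0.8, -0.4, 0.0, 0.0])        # the deployed model's weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p_old = sigmoid(X @ w_old)

def train(lam, steps=3000, lr=0.1):
    """Logistic loss plus lam * penalty for drifting from the old predictions."""
    w = w_old.copy()
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n                              # logistic loss
        grad += lam * X.T @ ((p - p_old) * p * (1 - p)) / n   # drift penalty
        w -= lr * grad
    return w

for lam in (0.0, 20.0):
    w = train(lam)
    churn = np.mean((sigmoid(X @ w) > 0.5) != (p_old > 0.5))
    print(f"lam={lam}: fraction of classifications that changed = {churn:.3f}")
```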
You can put all of the above together. See “Satisfying Real-World Goals with Dataset Constraints,” NIPS 2016. Look for tools coming to TensorFlow.
Some caveats about this approach.
First, to get results that are the same for men and women, the data needs to come with gender labels. But sometimes there are privacy issues with that. “Can we make these fairness goals work without labels?” Research so far says the answer is messy. E.g., if we make ML fairer for gender (because you have gender labels), it may also make it fairer for race.
Second, this approach relies on categories, but individuals don’t always fit into categories. Usually, though, if you get things right on the categories, it works out well in the blended examples.
Maya is an optimist about ML. “But we need more work on the steering wheel.” We’re not always sure where we want to go with this technology. And we need more human-usable controls.