Mobile is Eating the World, Adaptive Attention via a Visual Sentinel, and Google’s Great AI Awakening
Yitaek Hwang
The following presentation, “Mobile is Eating the World,” is from Benedict Evans of Andreessen Horowitz, via his website.
Much has been written and said about how autonomous cars will transform the future, but changes in eCommerce and retail seem more immediate and certain. In the 1990s, data shaped better supply chains and logistics; the 2000s were marked by data-driven advertising. Now, with machine learning, curation can be scaled. And with frictionless computing, like Amazon’s Dash button, many intermediaries will be cut out and new purchasing journeys will shape new kinds of purchasing decisions.
The intersection of computer vision and natural language processing has given us the technology to automatically tag and generate captions for images. This technology can not only help visually impaired users (see OrCam or Aipoly), but also help sort through unstructured images (Google and Facebook photo search). One of the papers from NIPS 2016 that garnered much attention is from Salesforce Research: Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. Here we see an effort to take advantage of our understanding of language for better captioning, rather than simply bolting an RNN onto a CNN to merge the two networks.
*** For other highlights, check out Andreas Stuhlmüller’s “50 things I learned at NIPS 2016”
The authors show that the model successfully “learns” when to attend to the image and when to fall back on the visual sentinel. Tested on the COCO image captioning dataset and Flickr30K, the model shows significant improvements over existing models. The Salesforce Research team’s success is encouraging, as it suggests more interdisciplinary research will follow both in academia and in industry.
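To make the mechanism a bit more concrete, below is a minimal sketch in PyTorch (not the authors’ released code) of the gating idea: at each decoding step the model computes a sentinel gate β that decides how much of the context should come from attended image regions versus a “visual sentinel” distilled from the language LSTM’s memory. Module names, dimensions, and the simplified structure here are illustrative assumptions, not the paper’s exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveAttention(nn.Module):
    """Sketch of the adaptive attention gate: blend spatial image features
    with a 'visual sentinel' so the decoder can choose not to look at the
    image at all (e.g. for words like 'the' or 'of')."""

    def __init__(self, dim, attn_dim):
        super().__init__()
        self.W_v = nn.Linear(dim, attn_dim)   # project image region features
        self.W_g = nn.Linear(dim, attn_dim)   # project decoder hidden state
        self.W_s = nn.Linear(dim, attn_dim)   # project the visual sentinel
        self.w_h = nn.Linear(attn_dim, 1)     # scalar attention scores

    def forward(self, V, h, s):
        # V: (batch, k, dim) spatial features, h: (batch, dim) decoder state,
        # s: (batch, dim) visual sentinel
        z = self.w_h(torch.tanh(self.W_v(V) + self.W_g(h).unsqueeze(1)))  # (b, k, 1)
        z_s = self.w_h(torch.tanh(self.W_s(s) + self.W_g(h)))             # (b, 1)
        # Softmax over the k regions plus the sentinel as a (k+1)-th option.
        alpha_hat = F.softmax(torch.cat([z.squeeze(-1), z_s], dim=1), dim=1)
        alpha, beta = alpha_hat[:, :-1], alpha_hat[:, -1:]                # beta: sentinel gate
        c = torch.bmm(alpha.unsqueeze(1), V).squeeze(1)                   # spatial context
        c_hat = beta * s + (1.0 - beta) * c                               # adaptive context
        return c_hat, alpha, beta


def visual_sentinel(x, h_prev, mem, W_x, W_h):
    # s_t = sigmoid(W_x x_t + W_h h_{t-1}) * tanh(m_t): an extra gate on the
    # LSTM memory cell, representing what the language model already "knows".
    return torch.sigmoid(W_x(x) + W_h(h_prev)) * torch.tanh(mem)


# Toy usage with random tensors (2 captions, 7x7=49 image regions, dim 512).
attn = AdaptiveAttention(dim=512, attn_dim=256)
V, h, s = torch.randn(2, 49, 512), torch.randn(2, 512), torch.randn(2, 512)
c_hat, alpha, beta = attn(V, h, s)
print(beta)  # near 1: rely on the sentinel; near 0: attend to the image
```

The appeal of this design is that β gives an interpretable, per-word signal for when the caption generator is actually looking at the image, which is roughly the behavior the authors visualize in the paper.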
The most important thing happening in Silicon Valley right now is not disruption. Rather, it’s institution-building — and the consolidation of power — on a scale and at a pace that are both probably unprecedented in human history. Google Brain has interns; it has residents; it has “ninja” classes to train people in other departments.
But even enormous institutions like Google will be subject to this wave of automation; once machines can learn from human speech, even the comfortable job of the programmer is threatened…The kinds of jobs taken by automatons will no longer be just repetitive tasks that were once — unfairly, it ought to be emphasized — associated with the supposed lower intelligence of the uneducated classes. We’re not only talking about three and a half million truck drivers who may soon lack careers. We’re talking about inventory managers, economists, financial advisers, real estate agents. What Brain did over nine months is just one example of how quickly a small group at a large company can automate a task nobody ever would have associated with machines.
“The Great A.I. Awakening” by Gideon Lewis-Kraus in The New York Times has been featured by all sorts of A.I. and data science blogs this week. The article creates a fascinating narrative about how Google’s CEO, Sundar Pichai, turned Google from a mobile-first into an A.I.-first company through the development of Google Translate. It is a long read, but Lewis-Kraus does an excellent job laying out the history of neural networks from Google Brain’s perspective. If you are interested in how Jeff Dean, Andrew Ng, Geoffrey Hinton, and Quoc Le came to apply deep learning to translation, give it a full read.
But the most interesting part of the article doesn’t come until the epilogue, “Machines Without Ghosts.” Here Lewis-Kraus introduces the Berkeley philosopher John Searle’s view that “there is something special about human ‘insight,’ [that] you can draw a clear line that separates the human from the automated.” The famous linguist and cognitive scientist Noam Chomsky has likewise belittled the advances in A.I., saying that “the whole enterprise [is a] mere statistical prediction, a glorified weather forecast, [unable to] reveal [something] profound about the underlying nature of language.”
Then Lewis-Kraus asks an acute question for us all. Even if a machine can’t tell us whether a “pronoun took the dative or the accusative case,” it is already detecting tumors in medical scans better than human radiologists, scanning through legal documents faster than credentialed lawyers, and automating almost every task you can imagine. Skeptics might point out that machines trained to do complex pattern matching cannot reason or perform logical analysis the way humans do. But do humans really reason that way? Radiologists don’t tell you what caused the cancer either; they tell you it’s there based on patterns they have seen in the past. How is that fundamentally different from what machines are doing now?
The social implication of all this is that all jobs, not just the blue-collar jobs associated with repetitive tasks, could eventually be replaced. No, this isn’t a Luddite’s alarmist view; it’s what Google taught us is possible. Last week’s discussion on Amazon Go by Hannah White sparked a good conversation. Whether you like it or not, it’s a conversation we will be having for years to come.