Facebook's long-term roadmap is focused on building technologies in three areas: connectivity, artificial intelligence and virtual reality. Context, knowledge about the world, reasoning and predicting will add human-like intelligence to computers, enabling whole new ways for people to connect, Facebook says.
Facebook is conducting research to help drive advancements in AI disciplines like computer vision, language understanding and machine learning. This research is then used to build infrastructure that anyone at Facebook can use to build new products and services. Facebook is also applying AI to help solve longer-term challenges as the company pushes forward in the fields of connectivity and VR. And to accelerate the impact of AI, Facebook is tackling the furthest frontiers of research, such as teaching computers to learn like humans do - by observing the world.
As the field of AI advances, Facebook is turning the latest research breakthroughs into tools, platforms and infrastructure that make it possible for anyone at Facebook to use AI in the things they build. For example, FBLearner Flow, the backbone of AI-based product development at Facebook, makes AI available to everyone at the company for a wide variety of purposes. With the help of the platform, Facebook is now running twice as many AI experiments per month as it did six months ago.
In addition, the AutoML infrastructure allows Facebook engineers to optimize new AI models using existing AI. And Lumos, a new self-serve platform, allows Facebook teams to harness the power of computer vision for their products and services without the need for prior expertise.
AI is already improving Facebook products and services. According to Mike Schroepfer, Chief Technology Officer at Facebook, AI assists in automatically translating posts for friends who speak different languages, and in ranking News Feed to show people more relevant stories.
AI can also enable new tools for creativity and connection. Facebook has started working on style transfer, a technology that can learn the artistic style of a painting and then apply that style to every frame of a video. "This is a technically difficult trick to pull off, normally requiring that the video content be sent to data centers for the pixels to be analyzed and processed by AI running on big-compute servers. But the time required for data transfer and processing made for a slower experience. Not ideal for letting people share fun content in the moment," Schroepfer said.
Three months ago, Facebook set out to ship AI-based style transfer running live, in real time, on mobile devices. The result is Caffe2Go, a new deep learning platform that can capture, analyze and process pixels in real time on a mobile device.
"We found that by condensing the size of the AI model used to process images and videos by 100x, we’re able to run deep neural networks with high efficiency on both iOS and Android. This is all happening in the palm of your hand, so you can apply styles to videos as you’re taking them," Schroepfer added.
Having an industrial-strength deep learning platform on mobile enables other possibilities too. Facebook can create gesture-based controls, where the computer can see where you’re pointing and activate different styles or commands. Facebook can also recognize facial expressions and perform related actions, like putting a "yay" filter over your selfie when you smile.
In VR, image and video processing software powered by computer vision is helping to support hardware advances. Earlier this year Facebook announced a new stabilization technology for 360 videos, powered by computer vision. And computer vision software is enabling inside-out tracking to help usher in a whole new category of VR beyond PC and mobile, as Facebook announced at Oculus Connect 3 last month. This will help make it possible to build high-quality, standalone VR headsets that aren’t tethered to a PC.
Work on speech recognition is also helping Facebook create more-realistic avatars and new UI tools for VR. You can see an example from Facebook's social VR demo at Oculus Connect 3, when the avatars moved their lips in sync with the speaking voices. This helps to create a feeling of presence with other people in VR. To do this, Facebook built a custom library that maps speech signals into visemes (visual lip movement).
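The idea of mapping speech to visemes can be pictured with a small lookup table. The Python sketch below is only an illustration: the phoneme labels, the viseme class names and the `phonemes_to_visemes` function are assumptions made for this example, and it starts from already-timed phonemes rather than from raw speech signals. It is not Facebook's custom library.

```python
# Hypothetical phoneme-to-viseme table; the labels are illustrative only.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "aa": "open_wide", "ae": "open_wide",
    "uw": "rounded", "ow": "rounded",
    "iy": "spread", "eh": "spread",
}


def phonemes_to_visemes(timed_phonemes, default="neutral"):
    """Map (phoneme, start_sec, end_sec) tuples to viseme segments.

    Back-to-back segments that share a viseme are merged, so the avatar's
    mouth holds one shape instead of re-triggering it.
    """
    segments = []
    for phoneme, start, end in timed_phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, default)
        if segments and segments[-1][0] == viseme and segments[-1][2] == start:
            previous = segments[-1]
            segments[-1] = (viseme, previous[1], end)  # extend the segment
        else:
            segments.append((viseme, start, end))
    return segments
```

An animation layer could then play each returned segment as a mouth shape held between its start and end time.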
Speech recognition can also make it easier to interact with your environment in VR through hands-free voice commands. Facebook's Applied Machine Learning team is working with teams across Facebook to explore more applications for social VR and the Oculus platform.
AI technologies are also contributing to Facebook's connectivity projects, including aerial systems like Aquila and terrestrial systems like Terragraph. With computer vision tools Facebook can perform better analyses of potential deployment plans as the company explores different modes of connectivity technology. This has already helped Facebook map the world’s population density in much more accurate detail than ever before, giving a clearer picture of where specific connectivity technologies would be most effective.
And now Facebook is applying computer vision to 3D city analysis to help plan deployments of millimeter wave technologies like Terragraph in dense urban areas. As wireless networks become denser with increasing bandwidth demand, this automated solution lets Facebook process more radio installation sites at a finer granularity.
The system first detects possible installation sites for network equipment by separating poles from other aspects of the urban environment (trees, ground, and wires) using 3D city data. Then the AI algorithm performs line-of-sight analysis to identify radio propagation paths connecting nearby sites with clear line-of-sight. Finally, an optimization framework will use the data to automatically plan a network with optimal site and path selection to serve the bandwidth demand.
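The last two steps, line-of-sight analysis and network planning, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Facebook's system: sites are 2D points, obstacles are circles in plan view, and the planner is a stand-in that uses Kruskal's minimum spanning tree rather than an optimization against bandwidth demand. All function names here are hypothetical.

```python
import math
from itertools import combinations


def segment_hits_circle(a, b, center, radius):
    """Return True if the 2D segment a-b passes within `radius` of `center`."""
    ax, ay = a
    bx, by = b
    cx, cy = center
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(cx - ax, cy - ay) < radius
    # Project the center onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((cx - ax) * dx + (cy - ay) * dy) / length_sq))
    return math.hypot(cx - (ax + t * dx), cy - (ay + t * dy)) < radius


def line_of_sight_links(sites, obstacles, max_range):
    """List (distance, i, j) for site pairs in range with no obstacle between them."""
    links = []
    for i, j in combinations(range(len(sites)), 2):
        dist = math.dist(sites[i], sites[j])
        if dist > max_range:
            continue
        if any(segment_hits_circle(sites[i], sites[j], c, r) for c, r in obstacles):
            continue
        links.append((dist, i, j))
    return links


def plan_network(num_sites, links):
    """Stand-in planner: Kruskal's algorithm keeps the shortest links
    that join sites without forming a cycle."""
    parent = list(range(num_sites))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    chosen = []
    for _dist, i, j in sorted(links):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            chosen.append((i, j))
    return chosen
```

A real planner would also weigh pole heights, radio capacity and demand per site. The spanning tree only shows how clear-path links feed a selection step.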
To continue accelerating the impact of AI, Facebook is also investing in long-term research.
Computers are quickly getting better at understanding a visual scene and identifying the objects within each frame. Even in the last few years, computer systems have advanced from basic image segmentation (drawing a box around the area where an object is located) to an ability to segment these objects more precisely and label them with information. Now, Facebook can even apply this to video to calculate human poses in real time.
With the ability to label objects, computers can generate captions about what’s happening in a photo. This is what helps Facebook describe photos to visually impaired people on Facebook today. But at the same time, the technology is still very early and it’s not perfect yet, according to Schroepfer.
And while computers can label objects more or less accurately, they still can’t take it one step further to understand the context surrounding the objects they see. Computers don’t have contextual understanding of the world. Some of Facebook's research has focused on giving computers this contextual understanding. To do this, Schroepfer said, researchers need to give computers a model by which they can understand the world. Computers should also be able to remember multiple facts at once.
Facebook's team has trained computers with structured data and Memory Networks to enable simple reasoning. A few months ago, Facebook published research in which it trained computers to perform 19 out of 20 reasoning tasks correctly. And just last week, Facebook submitted a paper for academic review that presents a new type of system, the Recurrent Entity Network, that can solve all 20 tasks.
The problem is, most data is not neatly structured in the real world. So to reason more like humans, computers must be able to pick relevant facts from an unstructured source, like a Wikipedia article, and apply those facts to answering a question.
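Picking relevant facts and applying them to a question can be pictured as a key-value memory lookup: the question is matched against keys, and the matching values are read out. The Python sketch below is a toy under stated assumptions. It scores keys by plain word overlap instead of learned embeddings, and the function names and the three-entry memory in the usage note are made up for illustration. It is not Facebook's research code.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated words."""
    return Counter(text.lower().split())


def attention_over_keys(question, keys):
    """Softmax over word-overlap scores between the question and each key."""
    q = bag_of_words(question)
    scores = [sum(q[w] * k[w] for w in q) for k in map(bag_of_words, keys)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(question, memory):
    """Return the value whose key best matches the question, plus all weights.

    `memory` is a list of (key_text, value_text) pairs.
    """
    weights = attention_over_keys(question, [key for key, _ in memory])
    best = max(range(len(memory)), key=weights.__getitem__)
    return memory[best][1], weights
```

For example, with the memory `[("blade runner directed by", "Ridley Scott"), ("blade runner release year", "1982"), ("titanic directed by", "James Cameron")]`, the question `"who directed blade runner"` shares the most words with the first key, so that entry's value is returned.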
"It’s early, but we’re working on this with research on key value memory networks," Schroepfer said.
Despite this progress, there’s a lot more work required to make computer systems truly intelligent. Prediction is one important component of intelligence that humans learn naturally but computers can’t yet do.
Facebook is coming up with methods that allow computers to learn by observing the world. Schroepfer said that this area of research is very early, and there’s a long way to go, but computers could eventually learn the ability to predict the future by observing, modeling and reasoning.