Three Guys in a Garage Are Turning Your Eyes Into Powerful Remote Controls
Forget voice controls as the future of TV. Instead, it will be all about your eyes. The technology already exists, and it was built by three young men in a garage. In their spare time.
PredictGaze is software that turns the camera on ordinary computing devices into a sophisticated eye tracker. Reading a book on your iPad and want to turn the page? Just look down at the bottom right-hand corner. Watching a smart TV and want to see what’s on the other channel? Direct your gaze at a corner of the screen. Reading a long article on your iPhone? The story will scroll down as your eye moves down the page. You can even play "Pong" using only your eyes to zip your paddle back and forth.
None of this requires any special equipment, and it works in changing light conditions, from a distance of 12 feet, and on shaky handheld devices that are constantly on the move.
But that’s not the end of it. PredictGaze also has facial recognition technology that beats anything on the market. PredictGaze will recognize your face even if you are wearing sunglasses in a room that is completely dark. And it will do so on any device that has a VGA camera.
You can see all this demonstrated in the videos below. The videos look very amateur, and that’s because they are. Not only is the PredictGaze team bootstrapping the startup, but it is building it on nights and weekends, when the guys aren’t at their tech-company day jobs.
PredictGaze's Santa Clara headquarters
Aakash Jain, Abhilekh Agarwal, and Saurav Kumar met a couple of years ago at Cornell University. Kumar and Agarwal were studying towards a master’s degree in computer science, and Jain was doing a master’s in electrical and computer engineering. They started off as competitors in the Yahoo Hack University competition in 2010 but then joined forces for other hackathons. They loved to brainstorm ideas and realized they shared an entrepreneurial spirit. It’s just as well, because now they all live together in a house in Santa Clara, where they’ve converted the small garage space into a makeshift office.
The three men, who are all in their mid-20s, have to maintain full-time jobs because they’re all from India and in the US on non-immigrant work visas (the infamous H-1B). That means they work most of their waking hours, sleeping only between 5am and 9am each day, according to Kumar. Once or twice a month, they will sleep for a 24-hour stretch to replenish themselves.
Since setting up shop in the Valley, the guys have added Ketan Banjara, an ex-Yahoo executive, in a business development role. Thanks to him, they’ve been talking to some major electronics, hardware, and software companies, and will soon make an announcement about a big partnership. Aside from a couple of fleeting mentions on blogs, they are totally unknown.
PredictGaze's entire staff
Kumar, the CEO and ideas guy, says he knows PredictGaze might be criticized for spreading its technology too thin across a broad range of products. Many startups focus on only one problem to solve. But PredictGaze is not a collection of products – it’s actually a robust, unified architecture. Because of its weighted algorithms and machine learning, it has immense range. “It’s not us,” says Kumar, with a sheepish smile over Skype. “Our architecture is independent. You give it something, it learns about it.”
Even the face-recognition technology has a flexible range of uses. One company is interested in using it to gauge consumer interest in retail stores. Using normal cameras and PredictGaze, retailers will be able to tell where shoppers are looking on shelves, whether or not they’re smiling while doing so, and to which demographic they belong. And yes, that does sound very “Brave New World” scary. One way the startup hopes to alleviate concerns about privacy, however, is by processing all the images locally and in real time. In other words, the only information that gets sent to a server is the analytics, not the images. As Kumar suggests: “That puts us in a very great place.”
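To make the "analytics, not images" idea concrete, here is a minimal sketch of what such a local summary step might look like. It is not PredictGaze's code; the `FrameObservation` fields and the `summarize` function are hypothetical names chosen for illustration, and the sketch assumes some on-device model has already turned each camera frame into a few labels.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameObservation:
    """Labels a hypothetical on-device model extracts from one frame."""
    shelf_zone: str  # where the shopper is looking
    smiling: bool
    age_band: str    # coarse demographic bucket


def summarize(observations):
    """Reduce per-frame labels to aggregate counts; no pixel data is kept."""
    observations = list(observations)
    count = len(observations)
    smile_rate = sum(o.smiling for o in observations) / count if count else 0.0
    return {
        "frames": count,
        "gaze_by_zone": dict(Counter(o.shelf_zone for o in observations)),
        "smile_rate": smile_rate,
        "by_age_band": dict(Counter(o.age_band for o in observations)),
    }
```

Only the dictionary returned by `summarize` would need to leave the device under this assumption; the frames themselves are never part of it.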
The technology also recognizes gestures, so if you want to mute your music while taking a call, you can look at your device’s camera and raise your finger to your lips in a “shush” gesture. If you’re watching a movie on TV and you walk out of the room, the film will pause until you re-enter.
PredictGaze is not alone in facial recognition, eye-tracking, or gesture controls. Tobii has had eye-tracking technology for years, although it relies on special hardware, as demonstrated by its recently announced concept tablet. Google and Facebook have facial recognition technologies, but they're ineffective in bad light conditions. The Y Combinator-backed Flutter, meanwhile, has software that lets you start and stop music that’s playing on iTunes or Spotify by using hand movements. And then, of course, there’s Leap, which brings amazing “Minority Report”-like powers to gesture controls. That also requires its own hardware, though, and is so far focused on desktop computers.
Kumar says PredictGaze applies a hierarchical approach to its architecture, which consists of dozens of weighted algorithms that work independently. When it comes to facial recognition, that allows the technology to identify faces in various states and conditions. Thanks to machine learning, PredictGaze can thus quickly determine which algorithm to place more trust in for each variable, whether it be low light, motion, or what someone’s eyes look like. Even if someone turns up in front of the camera unexpectedly wearing an eye patch, the system can determine who that person is by relying on symmetry to predict what the other side of the face would otherwise look like. And calibration doesn’t matter – you don’t have to hold your head in a particular position for the machine to know who you are.
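PredictGaze has not published its architecture, so the following is only a rough sketch of the general idea Kumar describes: several independent algorithms each score the candidates, and the system learns how much to trust each one under a given condition. The class name, the condition labels, and the multiplicative-weights update rule are all assumptions made for illustration, not the startup's actual method.

```python
from collections import defaultdict


class ConditionWeightedEnsemble:
    """Keeps one trust weight per (condition, algorithm) pair."""

    def __init__(self, algorithm_names, learning_rate=0.5):
        self.algorithm_names = list(algorithm_names)
        self.learning_rate = learning_rate
        # Every algorithm starts equally trusted under every condition.
        self.weights = defaultdict(
            lambda: {name: 1.0 for name in self.algorithm_names}
        )

    def predict(self, condition, scores):
        """scores maps algorithm -> {identity: confidence in [0, 1]}."""
        weights = self.weights[condition]
        total = sum(weights.values())
        combined = defaultdict(float)
        for name, per_identity in scores.items():
            for identity, confidence in per_identity.items():
                combined[identity] += weights[name] / total * confidence
        return max(combined, key=combined.get)

    def update(self, condition, scores, true_identity):
        """Shrink trust in algorithms that scored the true identity low."""
        weights = self.weights[condition]
        for name, per_identity in scores.items():
            loss = 1.0 - per_identity.get(true_identity, 0.0)
            weights[name] *= 1.0 - self.learning_rate * loss


# Hypothetical scores from two made-up algorithms in a dark room.
ensemble = ConditionWeightedEnsemble(["texture", "symmetry"])
dark_scores = {
    "texture": {"alice": 0.2, "bob": 0.8},
    "symmetry": {"alice": 0.7, "bob": 0.3},
}
for _ in range(5):
    ensemble.update("low_light", dark_scores, true_identity="alice")
print(ensemble.predict("low_light", dark_scores))
```

Because the weights are stored per condition, feedback gathered in low light changes which algorithm is trusted in low light without affecting how the same algorithms are weighted in, say, daylight.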
These are the very earliest of days for PredictGaze, but you can expect big things to emerge from that garage in Santa Clara. First, though, the startup has some other problems to solve. The team is seeking funding to take the company to the next level – and to solve that visa issue.
Demo videos
Gaze-tracking on TV
Gaze-tracking within browser
Face recognition in the dark
Pausing the TV by walking out of the room
Muting music with a "shush" gesture
[Lead image from Shutterstock.com]