Say what? Duolingo points to data's important role in online education

By Hamish McKenzie, written on May 30, 2013

From The News Desk

When it comes to online education, much of the focus so far has been on online video lectures, peddled by the likes of Udacity, Coursera, and edX, and given the off-putting acronym “MOOC.” But one of the most important innovations from the movement could ultimately be the application of a data-driven approach in the lesson-planning process. And when it comes to that strategy, there’s no more interesting player to watch than Duolingo, an online language teacher that learns from you as you learn from it.

Now up to 3 million users and growing at a rate of 15,000 people a day – 75 percent of whom come from outside the US – Pittsburgh-based Duolingo has just taken an important step in expanding its scope even further: launching on Android.

Until yesterday, Duolingo was available only through a Web browser and as an iPhone app. Even on those platforms, Duolingo, which is totally free to use, has made significant inroads, finding its way into the top 10 education apps in many countries’ App Stores. But when the company looks back on its early years, it may well find that the move to Android was the decisive one. Android, of course, is now the dominant mobile platform worldwide – at least in terms of number of users. Equally important, it also comes on the most affordable devices, meaning Duolingo now has an opportunity to reach lower-income people who wouldn’t ordinarily be able to afford language lessons.

[Photo: Luis von Ahn]

Duolingo was brought into the world by Luis von Ahn, a Carnegie Mellon computer science professor who was born in Guatemala. He was also the creator of CAPTCHA and its younger brother reCAPTCHA, which takes advantage of the user-dialogue system that verifies to Web services that you’re human and uses the power of crowdsourcing to make sense of disparate chunks of information, such as words from a book, or numbers on letterboxes. Google bought reCAPTCHA in 2009.

Duolingo relies on the same system to make its money. By drawing on the vast wealth of translation information, content owners can use the service to get large quantities of information translated into other languages quickly and cheaply. Duolingo is on the verge of confirming a deal with a large news agency to take care of the translations of its articles from English to Spanish, but von Ahn can’t yet disclose the name of the company. Duolingo has so far raised $18.3 million of funding over two rounds.

Data analysis is central to Duolingo’s success. Using the app, you work your way through a series of lessons in one of the six languages available. Each lesson consists of a series of questions, requiring you to type out translations, respond to voice prompts, identify which pictures relate to specific words or sentences, and select answers from a multiple-choice list. The iPhone app also has voice recognition, allowing you to test the way you speak the language. That functionality is not yet available on Android but is coming soon.

At each step along the way, Duolingo pays attention to which questions you struggle with, which ones you fly through, and what sorts of mistakes you make. It then aggregates that data with the vast swathes of other data it processes and learns from the patterns it sees. That information informs which questions it delivers to you, and at what times. In other words, it is constantly, dynamically tailoring your lessons so that you are being challenged in the most relevant ways. At a broader level, Duolingo also tests new features with sample groups of its users to see if they improve learning scores before rolling them out to a wider user set.
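To make the idea of tailoring concrete, here is a minimal sketch of one way a system could favor the questions a learner gets wrong. It is an illustration only, not Duolingo's actual method, which the company has not detailed here. The class name `AdaptivePicker`, its methods, and the smoothing rule are all hypothetical.

```python
import random
from collections import defaultdict


class AdaptivePicker:
    """Hypothetical sketch: pick the next question, favoring ones the learner misses."""

    def __init__(self, question_ids, seed=None):
        self.question_ids = list(question_ids)
        self.attempts = defaultdict(int)  # times each question was answered
        self.errors = defaultdict(int)    # times each question was answered wrongly
        self.rng = random.Random(seed)

    def record(self, question_id, correct):
        """Store the outcome of one answer."""
        self.attempts[question_id] += 1
        if not correct:
            self.errors[question_id] += 1

    def error_rate(self, question_id):
        """Smoothed error rate; a question never seen before scores 0.5 (an assumption)."""
        return (self.errors[question_id] + 1) / (self.attempts[question_id] + 2)

    def next_question(self):
        """Sample a question with probability proportional to its smoothed error rate."""
        weights = [self.error_rate(q) for q in self.question_ids]
        return self.rng.choices(self.question_ids, weights=weights, k=1)[0]


picker = AdaptivePicker(["it", "you", "she"], seed=0)
for _ in range(8):
    picker.record("it", correct=False)   # the learner keeps missing "it"
    picker.record("you", correct=True)   # and keeps getting "you" right
print(picker.next_question())
```

In this toy version, a word the learner keeps missing is drawn far more often than one already mastered, while unseen words still get a fair share of attention. A real system would also draw on the aggregated data from other learners that the article describes.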

Since it launched its private beta in late 2011, Duolingo has found out a range of fascinating things about how we learn languages. For instance, von Ahn says, Italian women learn English better than do Italian men. Perhaps counterintuitively, men have proven better than women at learning food-related language, including all the cooking stuff, throughout the world. Women, meanwhile, are better at the sports material.

Duolingo has also found that when it comes to teaching English to Spanish students, it’s better to teach the word “it” later in the course than other personal pronouns, such as “you,” “I,” “he,” and “she.” That’s because in Spanish, there is no “it” – only “he” and “she.” By delaying the introduction of “it,” the Spanish students stayed with the course for longer and learned better.

The efficacy of Duolingo’s data-driven approach has been backed up by a study that the company commissioned. Conducted by professors at City University of New York and the University of South Carolina, the study found that 34 hours on Duolingo was equivalent to a first-year college semester, which takes on the order of 130+ hours. The same study found that Rosetta Stone users took between 55 and 60 hours to learn a similar amount.

“At the end of the day, you can either teach things because you have some sort of philosophy of how to teach things, or you can do things scientifically,” von Ahn says of what makes Duolingo’s approach different. He compares the change to how Google Ads made advertising efficacy measurable. What you care about is the outcomes, he says.

Von Ahn can’t apply a data-driven approach to his teaching at Carnegie Mellon because he doesn’t have enough data. Most of his classes have about 30 students, whereas you need at least 30,000 users before you start noticing any patterns. But his work in the classroom has informed his work on Duolingo. For example, the company decided to do quite a bit of hand-holding with its lessons. From his experience, von Ahn has found that most people are not motivated enough to take a self-directed approach to learning.

Ultimately, von Ahn expects data to take on an even more important role in the evolution of online education. “Online education may not take over offline, but it will certainly take over a fraction of it,” he says, adding that the movement is just getting started. “The first step was take what we know in the offline world and put them online, but clearly that cannot be the best possible way to do things online.”