Pando

Apple’s Next Act

By Adam L. Penenberg, written on August 29, 2012

From The News Desk

Hollywood blockbusters are, often predictably, divided into three acts. So are Harlequin romances. "E! True Hollywood Stories" are structured in three acts (with commercials ingeniously inserted just before the big reveals). To Americans and other Western World denizens, stories that unfold in three parts, with a clear beginning, middle, and end, are as integral to our cultural narrative as nouns, verbs, and adjectives are to grammar.

A priori we expect a story to be conveyed in three acts in the same way a joke needs a punch line — otherwise it isn’t a joke, at least in the classic sense. Part of the reason may be biological. We humans can instantly count four objects in front of us but anything more requires us to think — and that takes effort since it involves our prefrontal cortex. But with concepts, colors, and fonts, three seems the magic number. Our brains are inherently lazy lumps of gray matter architected to conserve energy; we dislike effortful things like, well, thinking. (Insert joke here about political conventions.)

This notion of threes offers the path of least resistance, and it carries over to wildly disparate arenas. It is perhaps why writers list examples in threes — two feels like too few, four seems too many — and why the guy behind the counter at your local Indian takeout asks if you want “mild,” “medium,” or “spicy.” It's why we learn our A, B, Cs, not our A, B, C, Ds, and why we count to three before "Go!", not "one, two, three, four, go!" It may also explain why authors and journalists superimpose this three-act storytelling construct over all manner of non-linear story phenomena, from people and the lives they lead to dramatic events, business, sports, politics, and much more. And it can also be used to relate the story of Apple.

Let’s pretend we’re shooting a movie about Steve Jobs’ life and wish to express it as a three-act story arc (with apologies to Aaron Sorkin, who’s penning the screenplay for the Steve Jobs biopic based on Walter Isaacson’s book).

Act I

The “Set Up”

In a suburban garage Steve Jobs and Steve Wozniak start Apple Computer, which Jobs forges into the most innovative consumer tech company in history, although it rankles him that he’s cast as the curmudgeonly underdog to Bill Gates and Microsoft, the ever-dour dominatrix of the desktop.

Act II

“Conflict”

Jobs, as enfant terrible, is tossed out by John Sculley and the Apple Board. Licking his wounds, he rebuilds his career and reputation with NeXT and Pixar and, of course, grows as a person, completing his character arc.

Act III

“Resolution”

With Apple reeling, Jobs, infused with greater wisdom, gallops in on his white stallion, bringing with him the elixir the company so badly needs in the form of a new operating system from NeXT, and not only saves the company from certain destruction but transforms it into a global juggernaut. Eventually Apple far outpaces its bitter foe, Microsoft, before Jobs’ tragic and untimely death. You can imagine the teary deathbed farewell between a now avuncular Bill Gates and Steve Jobs, as they make amends.

You can also view Apple’s corporate history through this three-act prism: its rise, fall from grace, and rebirth. While you’re at it, you could layer in shorter subthemes molded into three-act structures such as Microsoft aping the look and feel of Apple’s graphical interface, Apple suing, and Microsoft prevailing in court; or the wider arc of the two rivals pursuing divergent business strategies: 1.) Microsoft licenses its operating system while Apple chooses to control the hardware, 2.) Microsoft seemingly wins while Apple flounders, 3.) Microsoft stumbles as Apple, with its closed ecosystem, takes the lead on new technologies like tablets and mobile devices, leaving the former beast of Redmond in its wake.

All of this, however, doesn’t take into account Apple’s future, its fourth act, if you will. Obviously it has one, and I think its fortunes are tied to another three-act techno-drama that has been unfolding over the past quarter century. It involves the very essence of how we interact with computers, phones, music players and all other electronic devices large and small, and the stakes are enormous.

Thus far there have been two overarching innovations in personal computing: the graphical user interface (GUI), in which a mouse helps users navigate icons on a screen, and the gestural interface (the multi-touch screen a la iPhone and iPad). Now there’s a third, the conversational interface, which is what Siri is all about. Apple didn’t invent the GUI (you can thank Xerox), nor was it the first to conjure the multi-touch screen (others, like Jeff Han, creator of CNN’s “magic wall,” were developing it at roughly the same time). It didn’t create speech recognition either. But Apple has been the greatest innovator of the first two, and with Siri it may end up doing it a third time, although some, like Apple co-founder Steve Wozniak, argue that Android’s speech recognition is, thus far, superior.

This conversational interface presents tremendous opportunities. The app free-for-all we witnessed after the introduction of the iPhone, iPad, and Android-powered smartphones is coming again to a device near you. Whole industries may stack atop this speechifying platform as they have atop the multi-touch screen and the graphical user interface. Nine automakers have already struck deals with Apple to incorporate Siri into their cars and trucks. Expect that one day soon talking computers will become as ubiquitous as touchscreens.

This doesn’t mean the conversational interface will replace touchscreens, any more than touchscreens have supplanted the mouse and icons on your desktop and laptop. Computers haven’t displaced TV, TV hasn’t superseded movies, movies didn’t wipe out radio, and none of them has snuffed out reading. Instead they build on one another, each gravitating toward what it does best. If they didn't, they would disappear.

Now, we know Siri has serious problems with oral comprehension. I’d love for someone to create a Twitter sitcom premised on Siri and iPhone Autocorrect having a series of misunderstandings. Not only doesn’t Siri understand Samuel L. Jackson when he says “hotzpacho,” it (she?) doesn’t get “gazpacho” either. But let’s cut Siri some slack. It’s in beta, and when have you ever known Apple to release a product before it was ready (well, besides MobileMe)? After all, Apple is no lean startup impatient to get some minimum viable product out there.

Siri is a work in progress because it has to be. It depends on engagement from its users so it can transmit data back to the Apple home office, where artificial intelligence engines figure out what went right and wrong and continually tune Siri to make it better, smarter, savvier. The more people rely on Siri, the more Siri learns. It is amassing a huge database of information about its users — their likes and dislikes, desires, thoughts, vocabulary, speech patterns, diphthongs and dialects. Siri serves you, but you teach Siri, and in the process Siri gets better at serving you. If Siri were a movie it might be titled “Educating Siri.”
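To make that feedback loop concrete, here is a minimal sketch of the general idea in Python: log what users actually meant when the system misheard them, and favor the most common correction the next time the same garbled phrase comes in. The names and data are hypothetical; this illustrates the principle, not Apple's actual pipeline.

```python
# Toy sketch of a recognition feedback loop: record what users meant when
# the system misheard them, then prefer the most common correction the next
# time the same misheard phrase appears. Illustrative only; not how Siri
# actually works under the hood.

from collections import Counter, defaultdict

corrections = defaultdict(Counter)  # misheard phrase -> Counter of corrections


def record_correction(heard: str, meant: str) -> None:
    """Log one user correction for a misheard phrase."""
    corrections[heard.lower()][meant.lower()] += 1


def best_guess(heard: str) -> str:
    """Return the most frequent correction seen so far, or the phrase as heard."""
    counts = corrections.get(heard.lower())
    return counts.most_common(1)[0][0] if counts else heard


record_correction("hotzpacho", "gazpacho")
record_correction("hotzpacho", "gazpacho")
print(best_guess("hotzpacho"))  # -> "gazpacho"
```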

Speech recognition can do more than just tell your car where you want to go, what music you want to hear, or let you engage in light banter with your smartphone when you’re bored, like John Malkovich. It could serve as a bridge between written and spoken communication.

Five years ago I wrote a story about Podzinger (later renamed EveryZing, now RAMP), a startup spun out of a military contractor. It had developed a speech recognition program that could spider the web, seek out audio files, and transcribe all the spoken words in them. While it wasn’t perfect — transcripts were about 70 percent accurate — it was still mighty impressive. You could click on a word in a transcript and it would take you to that very spot in the video. Imagine every "60 Minutes" segment ever aired transcribed and sold as part of an archive, where you could find the most relevant portion of any video instantly. Alas, Podzinger fizzled as a business, although I expect that one day this idea will take root.
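The trick behind that click-to-seek feature is simple to sketch: store each recognized word with the time at which it was spoken, and a text search becomes a seek command for the player. The snippet below is a toy illustration with made-up names and data, not Podzinger's actual system.

```python
# Toy illustration of a time-aligned transcript: each recognized word is
# stored with the offset (in seconds) at which it was spoken, so a text
# search can jump a video player straight to the right moment.
# Hypothetical data and names, not Podzinger's/RAMP's actual format.

from dataclasses import dataclass


@dataclass
class Word:
    text: str    # recognized word
    start: float  # seconds from the start of the recording


transcript = [
    Word("tonight", 0.0), Word("on", 0.4), Word("sixty", 0.6),
    Word("minutes", 1.1), Word("we", 1.7), Word("investigate", 1.9),
]


def seek_offsets(words, query):
    """Return every timestamp at which the query word appears."""
    q = query.lower()
    return [w.start for w in words if w.text.lower() == q]


# Clicking "investigate" in the transcript would seek the player to 1.9s.
print(seek_offsets(transcript, "investigate"))  # -> [1.9]
```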

Video is all well and good, but I prefer reading, and if I click on a link from Twitter and it leads to a video, I usually click away — not just because it takes time to load. It's the very nature of video that's the issue. Who has time to sit through an online video, especially if there’s a pre-roll ad? (For that, I turn the sound down and check email or Twitter until it’s over.) With reading we can do things we can’t do by listening. Like skim, for one.

Which takes us back to the advent of reading, which, as Nicholas Carr and others have reported, was first conceived as an adjunct to listening and speaking. Words were rendered as one continuous stream, transcribed from what someone said. Imagine if this essay were expressed as one long sentence: you’d probably give up after the first paragraph, which, without punctuation, wouldn’t even exist. It took the advent of spaces between words in the 9th century to make it easier to discern meaning, and this led to silent reading, which really took hold after Gutenberg invented the printing press. Ultimately reading became fundamental to human understanding, a way to express and digest copious amounts of information far beyond what we can absorb simply by listening.

Just as reading added depth to spoken language, the conversational interface, important innovation though it is, will complement touchscreens, keyboards, and typing; it won't replace them. Yet, to be sure, this three-act, Apple-fueled amalgam of interfaces will not only be a prime player in the next installment of the company's history, it will likely transform how we interact with computers, with profound implications for how we relate to one another.

What might be the next human-computer interface after Siri & co.? For this, you have to jettison the three-act structure and await a fourth.

First there was touch once removed (the GUI), then direct touch (the gestural interface), followed by speech recognition (Siri). Next could come thought recognition. Already researchers have created a means for people to control electric wheelchairs just by thinking — a boon for someone who’s paralyzed. Down the road, maybe users will strap on goggles to enhance their view of the world while they slalom around obstacles. Perhaps they'll even control cars with their brains. (Take that, Google!)

At some point expect Apple to take the lead on this, too. Also don’t be surprised if there’s an app for it.

[Illustrations by Hallie Bateman]