I recently updated my Mac’s operating system. The new OS, named Sierra, has a few new features I was excited to try, but the biggest one was the ability to use Siri to search my files and launch applications. Sierra was bringing me one step closer to the human-computer interaction fantasy that was set up for me at an early age, when I watched Picard, La Forge, and Data solve a complicated problem with the ship’s computer. In those scenes they’d ask fairly complicated questions, follow up with questions whose pronouns and prepositions referenced the first one, and finish their 24th-century Googling session with some plain-language query like “anything else?” Judging by the demo I had seen on the Apple website, it seemed like I could have just that conversation. I clicked the waveform icon, saw the window pop up indicating that my very own ship’s computer was listening and… nothing.
The problem wasn’t with Siri; it was with me. I had frozen. It was as if a rainbow spinning beach ball was stuck in my mouth. I was unable to complete a simple sentence. I closed the window and tried again:
Show me files that I created on… Damnit
Sorry I did not get that.
Show me files from… That I made on Friday.
Here are some of the files you created on Friday.
In all honesty, I should have seen this coming. I frequently use Siri to set reminders or to put things in my calendar, but I always use my digital assistant in secret: in the moment between getting in the car and starting the engine, alone at my desk, or (sorry) while I am using the bathroom. It works almost every time, but when something goes wrong, it is my commands, not Siri’s execution, that are left wanting. I pause because I forget the name of the place I need directions to, or I stumble when it comes to saying exactly what reminder I want to set. There are several Siri-dictated reminders sitting in my phone right now that don’t want me to forget to “bring it back with you before you go” or “to write email in the morning.” I clam up when I know my devices are listening.
It gets worse when other humans are listening to my awkward commands. The thought of talking to an algorithm in the presence of fellow humans is about as enticing to me as reciting a poem I wrote in high school or explaining a joke that just fell flat. Here I was, thinking it was the technology that had to catch up to my cyborg dreams, but now it seems that the flesh is the half not willing.
As it turns out, I am not alone in my stage fright. Last June the marketing research firm Creative Strategies released a short report (though without the raw data or a comprehensive methods section) noting that 98% of iPhone owners use Siri but only 3% ever talk to it in public. Most Siri usage seems to happen in the car, which the authors surmise is related to hands-free laws, not “a free choice by consumers to embrace this technology.”
The authors of the report are surprised by, and seem to have no explanation for, their two big findings: (1) the speaking-to-phones-in-the-car effect is more pronounced among iPhone users than Android users, even though Apple Maps is terrible and Google Maps is the gold standard; and (2) Americans are “uncomfortable” using virtual assistants in public, even though “consumers are accustomed to talking loudly on phones in public.”
None of this seems particularly surprising given my own experiences. Cars definitely require more hands-free usage, but they are also where I (and most Americans) spend the most time alone. Privacy seems like an equally large, if not larger, precipitating factor, which would mean that maps are not the only thing being used in the car. Additionally, most of Americans’ time in the car is spent commuting to work along a route they already know, so maps are unnecessary. It is far more likely that we’re asking our phones to play that new album or to call Mom to see how she’s doing.
Equating human-to-human conversation over the phone with giving orders to a virtual assistant is a digital dualist mistake. Americans are certainly good at yelling at each other in public, but that skill may not transfer to digital assistants. Interacting solely with a piece of software is something altogether different, although still social, because algorithms are made by people and our interactions are situated within and among other humans.
Moreover, engineers assume a one-on-one relationship with devices, with little regard for how a device is used in a group or how others see us use our gadgets. We can see this simply by looking at how these services are demonstrated at launch and subsequently marketed. Commands are clearly stated sentences from a single person into one device. Even devices like Amazon’s Alexa, which are meant to serve the whole home, cannot intercede in a conversation between two or more people. It is always one-on-one.
Many tech critics, unfortunately, tend to reinforce this assumption in their writing by describing psychological effects and rarely offering sociological observations. Analysis focuses on the extension of individuals’ cognitive abilities or laments eyes glued to screens. Rarely are we treated to a discussion of devices as social actors in a relationship with multiple humans. Part of this is strictly economics: if a device is meant to be shared, you cannot sell one to every single person. More insidiously, though, the asocial approach to technology reflects a shallow understanding of humans’ communicative practices.
How we are seen talking by third parties, especially when the conversational partner is unknown, is very important. It is the stuff of reputations and flash judgements. One of the myriad scenarios that run through my head when I imagine using Siri in public is that someone might think I am talking to a human the way I talk to Siri, which is to say, talking to them like an asshole. I do not tell Siri please and thank you, nor do I use deferential phrases like “could you” or “would you mind.” I talk to Siri the way I talk to a cable company’s phone tree.
I have not done an exhaustive study of this subfield of HCI, nor am I a practitioner myself, but a quick look at some of the emerging textbooks and research on what is being called “conversational interfaces” is immediately telling. Michael McTear’s modestly titled The Dawn of the Conversational Interface [PDF] opens with an introduction describing the 2013 movie Her. He does not use this reference as a cautionary tale, but as a simple demonstration of what conversational interfaces may soon become. Her is aspirational in a way that makes you hope McTear stopped watching the movie before the third act. Unmentioned is the romantic relationship the male protagonist has with a feminine AI who is, one must assume, both his secretary and his lover. (More on those considerations here.)
McTear’s writing is one example of a fairly common relationship between fictional depictions of technology and very real attempts at making that technology come to life. Engineers and scientists regularly appeal to the fears and hopes depicted in film as a way of building a mythos around their research programs. Cybersecurity research promises to prevent the devastation depicted in action movies, public funding for road infrastructure will deliver us into a Jetsons-like techno-utopia (see previous link), and Siri will eventually fall in love with you.
If there is any prescription to be had here, it is the work of Philip Agre, Phoebe Sengers, and others who advocate for integrating critical theory into computer science and similar fields. Agre’s argument that computer scientists would do better work if they were critical of the basic assumptions of their field is immediately relevant here. Is boss/assistant really the best relationship we could have with our devices? Is this nothing more than a softening of the master/slave terminology [PDF] that still lingers in mechanical engineering and computer science? Are we still beholden to the idea of robota, the Czech word for forced labor that, through Karel Capek’s play R.U.R., gave us the English word robot?
Perhaps, deep down, we are reluctant to bark orders at our phones because we sense the echoes of arbitrary power in the construction of our machine-readable verbal commands, and the embarrassment we feel is a sort of discomfort with being a master, not just with looking or sounding awkward. If I am honest, that makes the commands I give in private seem even worse. At least open and notorious commands are exactly what they appear to be. Acting the master in private reflects a desire for veiled power, which, to my mind, seems more sinister.
If Microsoft’s ill-fated Tay was a bellwether of the racist invective endemic to the internet, then the cheery submissiveness of our digital assistants is a sign of something even darker. Certainly, what we say and do to software is (for now) nowhere near as important as what we say and do to our fellow humans, but we should think deeply about what we are indulging in when we talk to computers, and about whether these practices and relationships are a net positive for a society that could use fewer power differentials. Just because we have talking computers doesn’t mean we’re any closer to the utopian visions we see on TV.
David is on Twitter: @Da_banks