Video: Dr. Kun Xu Discusses Social Cues in Human-Machine Interactions
Dr. Kun Xu, University of Florida College of Journalism and Communications assistant professor in emerging media in the Department of Media Production, Management and Technology, was interviewed on June 2, 2021 about his research on social cues in human-machine interactions and the convergence of various human-machine communication technologies.
Below the video is an edited transcript of the full interview.
How did you get interested in studying human-machine communication?
Kun Xu: I grew up in Shanghai back in the 1980s, and my family was fortunate to have the first personal computer in my home. And at that time, both of my parents needed to work during the summer break and the winter break. That left me an opportunity to just get along with the personal computer at home. And at that time, there were already a lot of computer games available on these personal computers. For example, I still remember I played chess and I also played Go, which is a very typical Asian game, a very old game. And there were also some video games, very interesting video games at that time.
At that time, there was still very limited access to the internet, so you played against a pre-programmed computer agent or computer program. I guess that’s the starting point of me feeling that somehow computers can be a companion or can be a thing that can be perceived as a human being. I think the first time in academia that I felt really attracted to the area of human-machine communication was when I went to the National Communication Association conference in 2013 and I saw a poster session where some scholars at that time were looking at a telepresence robot, which is a robot that uses an iPad-like screen to interact with its users.
I guess that’s the starting point for my academic life in the area of human-machine communication. And later I benefited a lot from my advisor, who also works in this area. And I learned a lot from the so-called computers are social actors paradigm, which is a theoretical framework describing how people actually use interpersonal social scripts to interact with computers or other technologies.
Tell us a little bit about your research, what you’re trying to understand and what impact you’re hoping to have with the research.
Kun Xu: People from my generation, we all know Clippy, which is the old Microsoft assistant in a Word document. We found that it’s pretty annoying most of the time. But the idea is that Clippy is supposed to help us with our writing and our work on the computer. And it gives you a lot of text-based social cues. It also has this eye gaze too, which makes it pretty cute. Now we have Siri and Alexa and they all use voice-based communication skills. So the idea is that we actually, as human beings, interact with these technologies through social cues. But the question is, how social should these cues be? Sometimes humans don’t always welcome a lot of social cues.
There’s this notion called the uncanny valley effect, which suggests that as robots become really, really human-like, we as human beings will be scared and we’ll be afraid of interacting with these machines because we feel that our identity is threatened and is blurred by designing so many social cues into the machines. There should be an ideal point where there’s a combination of social cues that people like, but that simultaneously doesn’t make people feel afraid of machines being human-like.
So how many social cues are enough? How social do these cues need to be while we interact with these machines? Can we find the best combination of these social cues? And if we want to find the best combination of social cues, can we know the quality of each single social cue and then how can we best design them into these technologies? This is the starting point of me looking at the importance of social cues (such as gestures, eye gaze and facial expressions) and of designing them into technologies such as social robots or computer agents or voice assistants.
Is there a theme or pattern that you’ve discovered that will advance you to the next stage of your research?
Kun Xu: Yeah, it’s pretty complicated. But as I mentioned, there’s a framework called the computers are social actors paradigm. Basically the paradigm suggests that we as human beings treat computers as if they were real human beings. And even though the paradigm was originally developed to explain human-computer interaction, it’s still valid in terms of human-robot interaction or human-voice assistant interaction, or human-agent interaction. We still naturally and mindlessly respond to these technologies as if they were social, as if they were intelligent human beings. So there are some complicated findings about how many cues are enough. For example, there are gender differences. We found that females are more conservative about robots demonstrating gestures and males are more comfortable with robots demonstrating gestures compared to just random movements.
We did a meta-analysis to compare all these different single social cues. And we found that movement probably has the largest effect compared to other social cues like appearances, voices, sounds, eye gaze, things like that. And a lot of the time people think voice, facial expressions and appearances are most important, but our meta-analysis shows that actually voice may not be as important as expected or as imagined, probably because there are already so many technologies on the market that use voice-based communication cues, and people are already somehow used to them and no longer feel that there’s something special about them.
But movement is something rare, because Alexa doesn’t move. Siri doesn’t move. But imagine if you have an intelligent machine in your home that can move. That may be something that you find a little bit creepy or that makes you feel a little bit scared.
The reason that people interact with social robots or other machines as if they were social is that our brain is old, so it’s more of an evolutionary perspective. When our old brain is involved with new media, we still use the old way to interact with these new machines. It’s not because the media is new. It’s because our way of interacting is still based on our interpersonal communication. We interact with a lot of things in a human way, rather than in a more evolved or a more advanced way of interacting with machines.
What other areas of human-machine communication are you pursuing?
Kun Xu: Human-machine communication is dynamic; it’s a mutual shaping process. The designers play a role and the users also play a role. A lot of the time you will see that when a technology is launched in a market, designers have a vision for the technology. And then users demonstrate a better way to use the technology than the designers’ initial vision for it. So I’m looking at this idea of social constructivism of social robots and trying to see how users may play a role in determining what role social robots play in a society.
I’m also looking at emerging technologies such as virtual reality and augmented reality. With social robots, what I care about is how people feel when they are interacting with an intelligent social being. When I look at virtual reality and augmented reality, I care more about how people feel when they are transported to a different place or space and how people make meaning from that space.
All of these technologies will become more and more convergent. Today, virtual reality, augmented reality and social robots are more separate from each other. In the next five years, they will become more convergent. For example, it’s not hard to imagine that you can be in a virtual environment and interact with a computer agent or interact with a robotic agent just like you interact with other people in your real life. And it’s also not hard to imagine that in the future, when you use telepresence robots, they can also give you a lot of spatial cues like augmented reality cues or virtual reality cues. And this convergence will make our experience with these emerging technologies more complicated.
And it will also require researchers to borrow ideas from VR, AR and social robots and to more closely work with each other and to tackle these questions regarding these convergent technologies. One example I can give is the telepresence robot, which is a robot that usually has an iPad-like screen attached and is on wheels and users can use it to interact with other people. Imagine that there’s a Zoom on wheels and it can move and you can control the movement. So for the people who see the telepresence robots, they can actually not only see my face, but they can also see my movement because my movement is embodied by the robot, is embodied by the machines.
And one challenging part in the past was that telepresence robots usually didn’t give a lot of spatial cues, which made it hard for users to navigate these telepresence robots. And now telepresence robots are designed to give users a lot of spatial cues for them to better navigate the route along which the robots can move. And this is a typical combination of not only the social cues delivered to the communication partners, but also the spatial cues that can be used to better use the technology. So in the next five years, the convergence of all these different types of technologies will be something that researchers will need to keep an eye on and pay attention to, and they will also need to borrow concepts from different areas to study this area.
How would you like people to use this research to advance technology? What kind of impact are you hoping to have?
Kun Xu: From a designer’s perspective, we want to find the optimal point where we know what may be the best combination of social cues in a certain context, because we don’t want people to feel scared when they use technology. Our goal is of course to make human-machine communication more pleasant, more natural and more intuitive. But we also care about the ethical issues. Because we know that these social cues have effects, how can we prevent people from manipulating these social cues to exert influence on people’s behavior? You can imagine that in some contexts, people may be more susceptible to some of the messages based on certain social cues.
What research are you working on now?
Kun Xu: I’m still working on some research on social robots. In a lot of previous research, scholars tried to see how robots can actually imitate humans or model humans. Robots have been designed to move like humans, smile like humans, talk like humans. I would like to explore a way that humans can actually model robots. For example, I was thinking about a scenario where robots can be a role model in doing recycling tasks. When humans watch robots perform this recycling behavior, can humans model the robots? Will people learn from those social robots? If that’s the case, then that will make a lot of things easier.