In-Depth Activity
In this activity, think about the case studies and reflect on the evaluation methods used.
For the two case studies discussed in this chapter, think about the role that evaluation played in the design of the system and note the artifacts that were evaluated: when during the design were they evaluated, which methods were used, and what was learned from the evaluations? Note any issues of particular interest. You may find that constructing a table with columns like those shown here is a helpful approach.
- Name of the study or artifact evaluated
- When during the design did the evaluation occur?
- How controlled was the study, and what role did users have?
- Which methods were used?
- What kind of data was collected, and how was it analyzed?
- What was learned from the study?
- Notable issues
What were the main constraints that influenced the evaluations?
How did the different methods build on and complement one another to give a broader picture of the evaluations?
Which parts of the evaluations were directed at usability goals and which at user experience goals?
Case Studies
14.4.1 Case Study 1: An Experiment Investigating a Computer Game
For games to be successful, they must engage and challenge users. Criteria for evaluating these aspects of the user experience are therefore needed. In this case study, physiological responses were used to evaluate users’ experiences when playing against a friend and when playing alone against the computer (Mandryk and Inkpen, 2004). Regan Mandryk and Kori Inkpen conjectured that physiological indicators could be an effective way of measuring a player’s experience. Specifically, they designed an experiment to evaluate the participants’ engagement while playing an online ice-hockey game.
Ten participants, all experienced game players, took part in the experiment. During the experiment, sensors were placed on the participants to collect physiological data, including measurements of the moisture produced by the sweat glands of their hands and feet and changes in heart and breathing rates. In addition, the researchers videoed the participants and asked them to complete user satisfaction questionnaires at the end of the experiment. To reduce the effects of learning, half of the participants played first against a friend and then against the computer, and the other half played against the computer first. Figure 14.2 shows the setup for recording data while the participants were playing the game.
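This kind of counterbalanced ordering can be set up in advance. The sketch below shows one way participants might be assigned to the two orders; the participant labels and the use of a random shuffle are illustrative assumptions, not details reported in the study.

```python
import random

# Hypothetical participant labels; the study had ten experienced players.
participants = [f"P{i}" for i in range(1, 11)]
random.shuffle(participants)  # randomize who ends up in which order group

# Counterbalance: half play against a friend first, half against the computer
# first, so that practice (learning) effects are spread across both conditions.
half = len(participants) // 2
order = {}
for p in participants[:half]:
    order[p] = ["friend", "computer"]
for p in participants[half:]:
    order[p] = ["computer", "friend"]

for p in sorted(order):
    print(p, "plays:", " then ".join(order[p]))
```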
Figure 14.2 The display shows the physiological data (top right), two participants, and a screen of the game they played.
Source: Mandryk and Inkpen (2004). Physiological Indicators for the Evaluation of Co-located Collaborative Play, CSCW’2004, pp. 102–111. Reproduced with permission of ACM Publications
Mean ratings on the 1–5 scale of the user satisfaction questionnaire indicated that playing against a friend was the favored experience (Table 14.1). Data recorded from the physiological responses was compared for the two conditions and in general revealed higher levels of excitement when participants played against a friend than when they played against the computer. The physiological recordings were also compared across participants and, in general, indicated the same trend. Figure 14.3 shows a comparison for two participants.
Table 14.1 Mean subjective ratings given by the 10 players on a user satisfaction questionnaire using a five-point scale, in which 1 is lowest and 5 is highest
                Playing Against Computer        Playing Against Friend
                Mean      St. Dev.              Mean      St. Dev.
Boring          2.3       0.949                 1.7       0.949
Challenging     3.6       1.08                  3.9       0.994
Easy            2.7       0.823                 2.5       0.850
Engaging        3.8       0.422                 4.3       0.675
Exciting        3.5       0.527                 4.1       0.568
Frustrating     2.8       1.14                  2.5       0.850
Fun             3.9       0.738                 4.6       0.699
Figure 14.3 (a) A participant’s skin response when scoring a goal against a friend versus against the computer, and (b) another participant’s response when engaging in a hockey fight against a friend versus against the computer
Source: Mandryk and Inkpen (2004). Physiological Indicators for the Evaluation of Co-located Collaborative Play, CSCW’2004, pp. 102–111. Reproduced with permission of ACM Publications
A higher mean in Table 14.1 indicates that participants identified more strongly with that experience state. The standard deviation indicates the spread of the results around the mean: low values indicate little variation in participants’ responses, and high values indicate more variation.
Because of individual differences in physiological data, it was not possible to compare the means of the two sets of data collected, the subjective questionnaire ratings and the physiological measures, directly. However, by normalizing the data, it was possible to correlate the results across individuals. This indicated that the physiological data gathering and analysis methods were effective for evaluating levels of challenge and engagement. Although not perfect, these two kinds of measures offer a way of going beyond traditional usability testing in an experimental setting to get a deeper understanding of user experience goals.
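A minimal sketch of this kind of normalization is shown below: each participant's skin-conductance values are converted to z-scores within that participant, so that differing physiological baselines do not dominate, and the normalized values are then correlated with the questionnaire ratings. All of the data values, the variable names, and the choice of z-scores are illustrative assumptions; they are not taken from the study.

```python
import statistics

# Made-up example data: one skin-conductance (GSR) summary value per participant
# per condition, plus that participant's "exciting" rating for the same condition.
gsr = {
    "P1": {"friend": 6.2, "computer": 5.1},
    "P2": {"friend": 11.8, "computer": 10.9},
    "P3": {"friend": 3.4, "computer": 2.9},
}
rating = {
    "P1": {"friend": 4, "computer": 3},
    "P2": {"friend": 5, "computer": 4},
    "P3": {"friend": 4, "computer": 3},
}

def normalize_within_participant(values):
    """Convert one participant's readings to z-scores so that differences in
    physiological baseline across individuals do not dominate the comparison."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd if sd else 0.0 for v in values]

# Pair each normalized GSR value with the matching questionnaire rating.
normalized_gsr, ratings = [], []
for participant, conditions in gsr.items():
    keys = list(conditions)
    z_values = normalize_within_participant([conditions[k] for k in keys])
    for key, z in zip(keys, z_values):
        normalized_gsr.append(z)
        ratings.append(rating[participant][key])

# Pearson correlation between the normalized physiological measure and the
# subjective ratings (statistics.correlation requires Python 3.10+).
r = statistics.correlation(normalized_gsr, ratings)
print(f"correlation between normalized GSR and excitement ratings: r = {r:.2f}")
```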
14.4.2 Case Study 2: Gathering Ethnographic Data at the Royal Highland Show
Field observations, including in-the-wild and ethnographic studies, provide data about how users interact with technology in their natural environments. Such studies often provide insights not available in lab settings. However, it can be difficult to collect participants’ thoughts, feelings, and opinions as they move about in their everyday lives. Usually, this involves observing participants and asking them to reflect after an event, for example through interviews and diaries. In this case study, a novel evaluation approach, a live chatbot, was used to address this gap by collecting data about people’s experiences, impressions, and feelings as they visited and moved around the Royal Highland Show (RHS) (Tallyn et al., 2018). The RHS is a large agricultural show that runs every June in Scotland. The chatbot, known as Ethnobot, was designed as an app that runs on a smartphone. In particular, Ethnobot was programmed to ask participants pre-established questions as they wandered around the show and to prompt them to expand on their answers and take photos. It also directed them to particular parts of the show that the researchers thought would interest them, a strategy that also allowed the researchers to collect data from all of the participants in the same place. In-person interviews conducted by human researchers supplemented the data collected online by Ethnobot.
The overall purpose of the study was to find out about participants’ experiences of, and feelings about, the show and of using Ethnobot. The researchers also wanted to compare the data collected by the Ethnobot with the interview data collected by the human researchers.
The study consisted of four data collection sessions using the Ethnobot over two days and involved 13 participants, who ranged in age and came from diverse backgrounds. One session occurred in the early afternoon and the other in the late afternoon on each day of the study. Each session lasted several hours. To participate in the study, each participant was given a smartphone and shown how to use the Ethnobot app (Figure 14.4), which they could experience on their own or in groups as they wished.
Figure 14.4 The Ethnobot used at the Royal Highland Show in Scotland. Notice that the Ethnobot directed participant Billy to a particular place (that is, Aberdeenshire Village). Next, Ethnobot asks “… What’s going on?” and the screen shows five of the experience buttons from which Billy needs to select a response
Source: Tallyn et al. (2018). Reproduced with permission of ACM Publications
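The style of exchange shown in Figure 14.4, a pre-established question, a fixed set of experience buttons, and a follow-up prompt for more detail, can be sketched as a simple prompt loop. Everything in the sketch below (the question wording, the button labels, and the function and parameter names) is a hypothetical illustration, not the Ethnobot implementation.

```python
# Illustrative sketch of an Ethnobot-style turn: a pre-established question,
# a fixed set of "experience buttons," and a prompt to add more information.
EXPERIENCE_BUTTONS = [
    "I enjoyed something",
    "I learned something",
    "I tried something",
    "I saw something",
    "I disliked something",
]

def ask_experience(choose, elaborate):
    """Run one question-and-answer turn and return a chatlog entry.

    `choose` and `elaborate` stand in for the app's UI callbacks: the first
    returns the button the participant tapped, the second returns any free-text
    comment (or None) after the bot prompts for more detail.
    """
    entry = {"question": "What's going on?"}
    entry["response"] = choose(entry["question"], EXPERIENCE_BUTTONS)
    entry["comment"] = elaborate("Can you tell me a bit more about that?")
    return entry

# Example usage with canned answers standing in for a real participant.
log_entry = ask_experience(
    choose=lambda question, buttons: buttons[1],        # taps "I learned something"
    elaborate=lambda prompt: "Watched the sheep-shearing demonstration.",
)
print(log_entry)
```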
Two main types of data were collected:
1. The participants’ online responses to a short list of pre-established questions, which they answered by selecting from prewritten comments (for example, “I enjoyed something” or “I learned something”) presented by Ethnobot in the form of buttons called experience buttons, along with the participants’ additional open-ended online comments and photos offered in response to Ethnobot’s prompts for more information. The participants could contribute this data at any time during the session.
2. The participants’ responses to the researchers’ in-person interview questions. These questions focused on experiences that were not recorded by Ethnobot, and on the participants’ reactions to using Ethnobot.
A large amount of data was collected and had to be analyzed. The pre-established comments collected in the Ethnobot chatlogs were analyzed quantitatively by counting the responses. The in-person interviews were audio-recorded and transcribed, and then coded by two researchers who cross-checked each other’s analysis for consistency. The open-ended online comments were analyzed in a similar way to the in-person interview data.
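The quantitative part of that analysis, tallying how often each pre-established comment was selected, is straightforward to express in code. In the sketch below, the chatlog entries are invented for illustration; only the counting step reflects the analysis described above.

```python
from collections import Counter

# Invented chatlog entries: (participant, selected experience button).
chatlog = [
    ("P1", "I learned something"),
    ("P1", "I tried something"),
    ("P2", "I learned something"),
    ("P3", "I enjoyed something"),
    ("P3", "I learned something"),
]

# Tally how often each pre-established response was selected, the kind of
# count summarized in Figure 14.5.
counts = Counter(response for _, response in chatlog)
for response, n in counts.most_common():
    print(f"{response}: {n}")
```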
Overall, the analyses revealed that participants spent an average of 120 minutes with Ethnobot in each session, recorded an average of 71 responses, and submitted an average of 12 photos. In general, participants responded well to prompting by Ethnobot and were eager to add more information. For example, P9 said, “I really enjoyed going around and taking pictures and [to the question] ‘have you got something to add’ [said] yeah! I have, I always say ‘yes’… .” A total of 435 pre-established responses were collected, including 70 that were about what the participants did or experienced (see Figure 14.5). The most frequent response was “I learned something,” followed by “I tried something” and “I enjoyed something.” Some participants also supplied photos to illustrate their experiences.
Figure 14.5 The number of prewritten experience responses submitted by participants to the pre-established questions that Ethnobot asked them about their experiences
Source: Tallyn et al. (2018). Reproduced with permission of ACM Publications
When the researchers asked the participants about their reactions to selecting prewritten comments, eight participants remarked that the comments were rather restrictive and that they would like more flexibility in answering the questions. For example, P12 said, “maybe there should have been more options, in terms of your reactions to the different parts of the show.” However, in general, participants enjoyed their experience of the RHS and of using Ethnobot.
When the researchers compared the data collected by Ethnobot with that from the interviews collected by the human researchers, they found that the participants provided more detail about their experiences and feelings in response to the in-person interview questions than to those presented by Ethnobot. Based on the findings of this study, the researchers concluded that while there are some challenges to using a bot to collect in-the-wild evaluation data, there are also advantages, particularly when researchers cannot be present or when the study involves collecting data from participants on the move or in places that are hard for researchers to access. Collecting data with a bot and supplementing it with data collected by human researchers appears to offer a good solution in circumstances such as these.