Pros and Cons of Co-participation in Usability Studies
By Chauncey Wilson, author, and Judy Blostein, editor
Co-participation (also known as co-discovery) is a usability evaluation technique where two participants are paired in a usability test and work collaboratively on tasks. They are often asked to think aloud while working together. I've used co-participation for both hardware and software tests and have also used the more traditional one-person-at-a-time technique.
I've been discussing co-participation with colleagues who have used both this method and the more traditional think-aloud approach with a single participant. This article summarizes that feedback and presents the pros and cons of using co-participation. Each contributor's name appears in parentheses at the end of his or her comments. All contributors gave permission to use their comments.
Summary of Pros
- Works well early in design
- Promotes a natural interaction style
- Produces more comments than think-aloud sessions
- Faster testing
- Easier for the experimenter
- Good method for applications where people work together
- More fun for both

Summary of Cons
- Different learning, verbal, cultural or hierarchical styles affect feedback
- Careful candidate screening required
- Apprehension affects feedback
- Discomfort if co-participation goals or rules unclear
- Harder to conduct
- Must decide on the appropriate test method for the task
- More participants needed
- Data analysis harder
Works well early in design
Respondents report that co-participation works well early in the testing or prototyping stage, for validating the overall design, and for comparing design alternatives. The main advantage of the technique is that it surfaces usability issues early in the design phase.
We have been using the co-discovery technique for many years now. We have had a lot of success with this technique in capturing user problems in software designs using paper prototypes and actual "runnable" software. We tend to use this technique to gather usability issues/concerns earlier in the design stage or for comparing various design alternatives. Generally, we have two users working on the software together with the moderator and sometimes a developer or two in the same room. When we are interested in more objective measures, we tend to use single-user, think-aloud protocols, with little moderator intervention. Overall, we have found the co-discovery technique to be quite effective. (Wayne Ho)
If I'm doing early testing/prototype testing, where what I'm really looking for is a diversity of responses to the interface, I find co-discovery testing works really well. (Mary Beth Rettger)
Co-participation (I call it the "paired user paradigm") provides the most bang for the buck in validating the overall design sanity, for example, the navigational model, window purpose, the success of the primary metaphor or affordances. (Danny Wildman)
Promotes a natural interaction style
Co-participation promotes a natural conversational style between the participants and the tester, as well as between the paired participants. Participants have less "test anxiety" and are willing to help each other even when they are strangers.
Some users feel more comfortable working with another individual on a problem/task. (Wayne Ho)
The pro has been that they do help each other out and that eliminates some of the frustration for them. Even when they don't know each other well, they seem to adjust to each other very quickly. (Susie Robson)
By contrast, I have found that paired users converse, argue, propose, guess, and joke (natural human dialog) without consciously reporting. In other words, the fact that we're listening and watching is tangential to their focus - they're engaged with the software and with each other, not with the "testing" going on. So the conversation is as natural as can be; it happens without extra cognitive effort. (Danny Wildman)
Produces more comments than think-aloud sessions
Co-participation encourages participants to make a lot of comments. As people work together to problem solve, they need to come to a consensus, or to explain to each other what they are doing and thinking.
Users might make more comments because they are more involved in problem solving. Nielsen cites a study by Hackman and Biers (1992) that notes that co-participation yields more comments than thinking aloud. (Chauncey Wilson)
Users tend to speak more as they work together to solve a problem or accomplish a task. I have found that the two users working together come to a consensus while solving problems. From my experience, one co-discovery session provides more information than one think-aloud session, but I would be hesitant to say it is equivalent to two think-aloud sessions. (Wayne Ho)
Maybe so, but they also make more comments because they have to explain to each other what they're doing and thinking. These conversations often reveal good as well as bad. Users may spontaneously mention that they like/dislike a menu's contents or a command name or a feature. They may offer guesses (to each other) at what a tool bar icon might be. They offer tentative strategies and hypotheses of how to perform a next step. I don't think you get these types of comments from lone users in front of a microphone. (Danny Wildman)
Faster testing
Testing is faster with co-participation, perhaps because people working together are more successful and take less time than people working alone. The method also yields data quickly.
Testing is faster since you only have to run half as many sessions. (This implies that you are still testing two participants rather than one group.) Nielsen (Usability Engineering, p.198) notes that "[co-discovery] requires the use of twice as many test users as a single-user thinking aloud". (Chauncey Wilson)
To look at it from another angle, training course participants are often asked to work in pairs during the hands-on lab exercises. Occasionally this is to economize on the hardware logistics, but it also saves time. When faced with unfamiliar problems and software, people working in pairs are consistently more successful and take less time than people working alone. This implies that if you are after the initial-use experience, then co-participation probably gives an overly rosy result. On the other hand, this may cancel out, so that a co-participation session is closer to the real-life experience of an interface after the initial meeting. (Colin Moden)
The very first usability test I ever did was during a training class and involved about a dozen students working in pairs. The company had never heard of usability testing (this was several years ago) but the training instructor was sympathetic to the cause and allowed me to steal an hour of his class time to watch a bunch of people struggle with an installation process. From that one hour, I got enough data to convince management that there was a real opportunity for improvement. It wasn't fancy, but it worked really well to collect several data points very quickly about a limited number of things, such as success rate, average time to completion, percentage who used the manuals, and so on. (Carolyn Snyder)
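Carolyn Snyder's quick-and-dirty metrics above (success rate, average completion time, manual use) amount to simple arithmetic over session records. A minimal Python sketch of that tally; the field names are illustrative assumptions, not from the article:

```python
from statistics import mean

def summarize(sessions):
    """Aggregate per-pair session records into quick metrics:
    success rate, average minutes for successful sessions, and
    the share of pairs that consulted the manual."""
    n = len(sessions)
    return {
        "success_rate": sum(s["succeeded"] for s in sessions) / n,
        "avg_minutes": mean(s["minutes"] for s in sessions if s["succeeded"]),
        "used_manual_pct": sum(s["used_manual"] for s in sessions) / n,
    }

# Four hypothetical pair sessions from an installation test:
sessions = [
    {"succeeded": True,  "minutes": 18, "used_manual": True},
    {"succeeded": True,  "minutes": 25, "used_manual": False},
    {"succeeded": False, "minutes": 60, "used_manual": True},
    {"succeeded": True,  "minutes": 22, "used_manual": True},
]
print(summarize(sessions))  # success_rate 0.75, used_manual_pct 0.75
```

Even this crude summary, gathered in an hour, can be enough to make the case to management.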
Easier for the experimenter
Experimenters stay more remote using co-participation since they may use fewer prods to get participants to think aloud. This background stance helps reduce experimenter bias that could influence test results.
The need for prodding often keeps the experimenter close to the user, which opens a Pandora's box for experimenter bias entering the results. Probing questions, body language, and overt intervention may lead talk-aloud users to solutions they might have missed otherwise. Co-participation allows the facilitator to remain remote. (Danny Wildman)
If a person doesn't show, you can still run the lone participant and get some data. (Chauncey Wilson)
Good method for applications where people work together
Co-participation mirrors a real work environment, especially for applications or situations that call for cooperation, consensus, or simply people helping each other.
Useful for applications where people actually do work together on a system, for example, in a corporate video conference facility where a group usually has to figure out the control panel each time they use the video conferencing facilities. (Chauncey Wilson)
If the work situation calls for cooperation, this (co-participation) is a good way to test the system. (Tom Callaghan)
Users tend to come to a consensus (and we encourage them to) while problem solving. (Wayne Ho)
An under-rated advantage, since people at work often do ask peers to help with a new product. (Danny Wildman)
More fun for both
Co-participation is more fun for everyone involved!
The sessions tend to be more interesting for observers. This can help promote usability/user-centered design when you have developers/managers watching sessions. (Wayne Ho)
Our participants have always had a great time and like to come back for more. (Danny Wildman)
Different learning, verbal, cultural or hierarchical styles affect feedback
Differences in learning, verbal, or cultural styles can hamper feedback from the participant, as can the hierarchical relationship of a pair. The experimenter needs to set clear rules to reduce such differences and make the participants comfortable during the test.
Large differences in verbal style (one loud, talkative person versus a shy, quiet person) may mask one participant's feedback. Participants may have different learning styles and this may impede the flow of feedback. (Chauncey Wilson)
Sometimes one user overpowers the other. One user may dominate the entire session. (The moderator must help prevent this.) Similarly, cross-cultural differences between users [affect feedback]: one user may not express a different opinion, for cultural reasons. (Wayne Ho)
Beware of manager-subordinate combinations. During one test, the subordinate was explaining how the interface didn't support the workflow typically used on the job, which happened to be contrary to company policy. The manager responded with, "But you're not supposed to..." It was awkward. (Carolyn Snyder)
Attention should be paid to the group dynamics even in a testing environment. A combination of personality differences AND hierarchical relationship has significant impact on the testing and design of jobs and work-related applications. (Evelyn So)
We'll ask the quiet one to be the initial reader of instructions and scenarios, somewhat leveling the playing field. Once they've accommodated to being the lead speaker, they usually continue to be verbal even when they later "drive" the system. (Danny Wildman)
Careful candidate screening
Carefully screen candidates by using clear selection criteria to establish reliable test results.
Differences in background may have a strong influence on the results. Technical background is something that can be screened for, but the screening might have to be a little more in depth than an "ordinary" screening. (Chauncey Wilson)
Another thing I've tried to increase the chances that two people participate naturally and comfortably together is to recruit them as a pair. After I find the first person, I ask that person to suggest someone else with the "right" characteristics that they'd like to work with. This sometimes works great, as they already know how to relate to each other and they bring their shared work experiences with them. It also occasionally backfires, as they may have too much outside of the test to talk about. (Betsy Comstock)
There are situations where it is useful to have an expert/novice dyad, and there are situations where a researcher may actually want experts to guide novice users. For example, we have had situations where information product developers were planning to create info products for a task. The research questions we were asking were what content needs/doesn't need covering, how can we "chunk" the tasks and subtasks, where do target users require lots of background information (as opposed to merely procedural information), what kinds of scenarios and/or metaphors may be used to explain concepts, and so on. In addressing these research questions, our lab has used co-discovery protocols with expert/novice dyads with very positive results. The major problem in ensuring reliability with this technique is in establishing extremely clear selection criteria for the "expert" subject and the "novice" subject. (Tharon Howard)
Apprehension affects feedback
Participants may or may not feel more anxious about looking bad in front of another person than they would in a single user test.
Evaluation apprehension may affect feedback. You now have an experimenter and a "peer". Maybe you didn't worry too much about the experimenter, but now you have to worry about looking bad in front of someone else. (Chauncey Wilson)
Just as you do not want to lead or instruct the user too much as a moderator (you want them to figure things out themselves), you would not want one user (louder, more experienced, etc.) to draw the other through an issue they think is a problem or have a question about. (Tom Callaghan)
Users may be intimidated due to the relatively large number of observers/people in the room. (Wayne Ho)
Ironically, we've noted the opposite. In the single-user setting, our users (primarily telephone company technicians, perhaps a special breed) were frequently uncomfortable about making a mistake and looking dumb. There may have been an irrational fear that their supervisors would find out. They also tended to be somewhat camera shy when alone. But in the paired situation, they forgot about the camera and the microphone and they dropped inhibitions. Perhaps "looking dumb" was less threatening since the responsibility for mistakes is now shared. Earlier reticence about indicting our design disappeared. Paired users in our trials have been outspoken, direct and open. (Danny Wildman)
Discomfort if co-participation goals or rules unclear
Participants work well together when they understand that co-participation is a goal of the test and they understand the rules for working together.
First meetings are often awkward, so the experimenter must be adept at making people feel comfortable with the lab setting, the rules of the test, the experimenter's involvement, and the other participant. The recruiting script needs to highlight that the usability test involves two people working together. (Chauncey Wilson)
You can provide rules for how the two participants work (alternate who works the mouse and keyboard as people work on the task), but with direct manipulation, it is sometimes hard for the mouse-less person to contain herself/himself (should there be a system with two mice and two keyboards analogous to the cars that are used for driver training that have multiple sets of controls?). (Chauncey Wilson)
To help two people with different styles work together, I've tried asking them to take turns, as others have mentioned. That often seems to become awkward and artificial. What seems to work better is to tell participants that they should both UNDERSTAND everything that's going on. This seems to get people to pause and include the other person when they see that they're making a choice or wondering about something. It also seems to empower the less dominant person to interrupt and say, "Wait a minute, what are you doing there?" (Betsy Comstock)
I used to have one person do the mouse and one person do the keyboard, until one disastrous day when the keyboard guy knew all the shortcuts, so both users were clicking and typing as fast as they could. Now I have one user read task one out loud, and the other user do the work, then switch. They still both have to pay attention, and it seems to work better. (Mary Beth Rettger)
We've found total strangers bonding during the test period. Never had a pair that didn't get along or that didn't overcome any awkwardness almost immediately. Again, ours might be a special audience, since they still share similar work cultures. (Danny Wildman)
Harder to conduct
Conducting a test with more than one person may or may not be harder than a single-user test.
One con we have discovered is that the two participants start talking to each other which means they don't talk as loud. Observers can't hear a word they are saying. We haven't invested in a microphone since that might be intimidating to hear their own voices booming throughout the room. We're not quite sure how to fix this. We usually have to keep telling them to talk louder and they do for under a minute but then go back to talking to each other very quietly. After all, they are sitting right next to each other. (Susie Robson)
A minor point, but with two people sitting in front of a PC, the orientation will be different than a single person and their view of the screen and distance from the screen may be different than is typical. (Chauncey Wilson)
Decide on the appropriate test method for the task
You need to have clear test goals to decide if the co-participation method is right for the task or situation being tested.
Is the task a collaborative one or not? If you are designing a Web application where 90% of your users will be working alone in their den, is it better to test a single individual rather than a collaborative group? While it is clear that a lab is not a den or home office, it is also clear that enforced collaboration is not the norm. (Chauncey Wilson)
If the work situation calls for cooperation, [co-participation] is a good way to test the system. If it is a solitary task, you might not be predicting what will happen in the "real world". (Tom Callaghan)
My basic rule of thumb is that two people will almost always be able to figure anything out: so, if I'm testing to see if it's possible for a user to negotiate a UI, then I don't think co-discovery is the best choice. (Mary Beth Rettger)
By contrast, it's not a very useful way to measure task completion times, keystroke-level productivity, or, as several people implied, initial learning without peer support. It provides rich qualitative, if not quantitative, data that can be applied directly to design. (Danny Wildman)
More participants needed
Researchers agree that they need more participants than in a single-person study, but how many depends on the task being tested.
If you consider a co-participant group equivalent to an individual in a single-person study, then you need twice as many participants (eight groups of two people, 16 participants in all, correspond to eight individuals). (Chauncey Wilson)
I don't think of a co-participant situation as "equivalent" to an individual. I think you DO need twice as many participants in a co-participant study. You just get richer results from each session. This raises another point, too. It doesn't have to be TWO. I have frequently run more than two people in the same session when the situation calls for it. For example, in my most recent Day One test at our company, we had two system administrators installing the product in the morning. In the afternoon, they and two end users all used it. Part of what we could then observe was the ways the system administrators chose to explain the product to their end users. Running four people at once was relevant because the product we were testing was a multipoint videoconferencing product. (Betsy Comstock)
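The recruiting arithmetic above is simple but worth making explicit. A back-of-the-envelope sketch; the function name and the 10% no-show buffer are assumptions for illustration, not figures from the article:

```python
import math

def recruits_needed(sessions, group_size=2, no_show_rate=0.1):
    """Each session consumes group_size participants; over-recruit
    slightly to cover no-shows (an assumed 10% buffer by default)."""
    base = sessions * group_size
    return base + math.ceil(base * no_show_rate)

# Eight pair sessions need 16 participants (18 with the buffer),
# versus 8 for eight single-user think-aloud sessions.
print(recruits_needed(8))                                # -> 18
print(recruits_needed(8, group_size=1, no_show_rate=0))  # -> 8
```

The point is that pairing doubles recruiting effort per session; the payoff has to come from richer comments per session, not fewer recruits.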
Data analysis harder
Perhaps the real strength of co-participation is not in quantitative data analysis, but in the qualitative comments you get from the participants. You may be surprised by some unintended results!
It is harder to do detailed error analysis with two people, though you get a larger list of general issues and problems. (Chauncey Wilson)
[Using pairs] very frequently stretches the test situation in ways you hadn't anticipated. I remember one manager-secretary pair I once observed installing a PC. The secretary didn't like the way the lab furniture was arranged and got her manager to re-arrange it before they proceeded. This was obviously the way these two related in their jobs. But it also pointed out (obviously) that room arrangement is really part of a hardware installation process, even though we hadn't really intended to test that aspect. (Betsy Comstock)
This "con" assumes some sort of statistical analysis, which can be done on any quantitative objective data you measure (Navigations in error? Help accesses? Failures to complete task? Task completion times?), presumably with some accounting for an advantage of two heads behind the wheel. Questionnaire or rating data can be collected from each member of a pair independent of each other (if participants don't discuss the answers) or can be negotiated agreements that consolidate their opinions into a single response. The latter method can be messy and can mask important disagreements, but it stimulates the pair to discuss design issues. The resulting conversation can be quite enriching to the eavesdropping facilitator. But the statistical issues are often moot - it's the users' dialog and comments that inform design, directly and without much need for interpretation. (Danny Wildman)
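Danny Wildman's suggestion to collect each member's rating independently, while watching for the disagreements a consolidated score would mask, can be sketched in a few lines. A hypothetical example; the function name, rating scale, and disagreement threshold are assumptions, not from the article:

```python
def pair_ratings_report(pairs, disagreement_threshold=2):
    """Average ratings across all individuals, but flag pairs whose
    members diverge by the threshold or more, so a single consolidated
    score doesn't hide an important disagreement."""
    ratings = [r for pair in pairs for r in pair]
    overall = sum(ratings) / len(ratings)
    flagged = [pair for pair in pairs
               if abs(pair[0] - pair[1]) >= disagreement_threshold]
    return overall, flagged

# Three pairs rating ease of use on a 1-5 scale:
overall, flagged = pair_ratings_report([(5, 4), (2, 5), (4, 4)])
print(overall)  # -> 4.0
print(flagged)  # -> [(2, 5)] : this pair disagreed sharply
```

A flagged pair is a prompt for the facilitator to revisit the recording, since the disagreement itself (as Wildman notes) is often more informative than the number.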
References
Dumas, J., and Redish, J. A Practical Guide to Usability Testing. Exeter, England: Intellect, 1993.
Nielsen, J. Usability Engineering. Boston, MA: Academic Press, 1993.
Hackman, G. S., and Biers, D. W. "Team Usability Testing: Are Two Heads Better than One?" Proceedings of the Human Factors Society 36th Annual Meeting, pp. 1205-1209. Santa Monica, CA: Human Factors and Ergonomics Society, 1992.