Quantifying the User Experience: Practical Statistics for User Research, Second Edition, provides practitioners and researchers with the information they need to confidently quantify, qualify, and justify their data.

KEY POINTS FROM THE CHAPTER
• The primary purpose of this book is to provide a statistical resource for those who measure the behavior and attitudes of people as they interact with interfaces.

Another part is due to our selecting the best procedures for practical user research, focusing on procedures that work well for the types of data and sample sizes you'll likely encounter. Many designers and researchers view usability and design as qualitative activities, which do not require attention to formulas and numbers. Some of you may have never taken a statistics course whereas others probably took several in graduate school.

(Jim) Lewis is a senior human factors engineer (at IBM since 1981) with a current focus on the design and evaluation of speech applications and is the author of Practical Speech User Interface Design.

A comprehensive discussion of standardized usability questionnaires (Chapter 8).

For fifteen years he's been conducting usability and statistical analysis for companies such as PayPal, Walmart, Autodesk and Kelley Blue Book or working for companies such as Oracle, Intuit and General Electric.

By comparing data we mean comparing data from two or more groups (e.g., task completion times for Products A and B; see Chapter 5) or comparing your data to a benchmark (e.g., is the completion rate for Product A significantly above 70%; see Chapter 4).

Suppose you're planning to run a formative usability study, one where you're going to watch people use the product you're developing and see what problems they encounter.

Part 2: Formative Studies.

Instead, this book is about working backwards from the most common questions and problems you'll encounter as you conduct, analyze, and report on user research projects.
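As a rough illustration of the benchmark comparison mentioned above (is the completion rate for Product A significantly above 70%?), the sketch below computes an exact binomial upper-tail probability using only the Python standard library. The function name and the counts (18 of 20 completions) are hypothetical, and this plain exact tail is only one way to run such a test; the procedure described in Chapter 4 may include refinements not shown here.

```python
from math import comb


def binomial_upper_tail(successes, trials, benchmark):
    """Probability of seeing at least `successes` completions in `trials`
    attempts if the true completion rate were exactly `benchmark`."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    if not 0 < benchmark < 1:
        raise ValueError("benchmark must be strictly between 0 and 1")
    # Sum the binomial probabilities for every outcome at least as
    # extreme as the one observed.
    return sum(
        comb(trials, k) * benchmark ** k * (1 - benchmark) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 18 of 20 participants completed the task.
p_value = binomial_upper_tail(successes=18, trials=20, benchmark=0.70)
print(f"One-sided p-value against a 70% benchmark: {p_value:.4f}")
```

A small value suggests that the observed completion rate would be unlikely if the true rate were only 70%.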
The next major decision is whether you're comparing data or just getting an estimate of precision. Note that methods discussed in Chapter 10 are outside the scope of this book, and receive just a brief description in their sections.

Finally, at "Task Time?," take the "Y" path, which leads you to "1-Sample t (Log)." As shown in Table 1.1, you'll find that method discussed in Chapter 4 in the "Comparing a Task Time to a Benchmark" section on p. 54.

In general, these activities fall into three areas:

A wrap-up chapter with pointers to more information on statistics for user research (Chapter 10).

References
Lewis, J.R., Sauro, J., 2012.

At the "Comparing Groups?" box, select "Y" because there will be two groups of data, one for each product.

It includes both standard statistical output (p-values and confidence intervals) and some more user-friendly output that, for example, reminds you how to interpret that ubiquitous p-value and that you can paste right into reports.

For many, statistics is a subject they know they should understand, but it often brings back bad memories of high school math, poor teachers, and an abstract and difficult topic.

To find the appropriate section in each chapter for the methods depicted in Figures 1.3 and 1.4, consult Table 1.3. For this type of problem discovery evaluation, you're not planning any type of comparison, so start with the decision map in Figure 1.4. You're not planning to estimate any parameters, such as task times or problem occurrence rates, so at "Estimating a Parameter?," take the "N" path.

FIGURE 1.1 Decision map for analysis of continuous data (e.g., task times or rating scales). Methods shown include paired t (ch 5), 1-sample t (log) (ch 4), 1-sample t (ch 4), t confidence interval (ch 3), t (log) confidence interval (ch 3), and confidence interval around median (ch 3).
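The "1-Sample t (Log)" method reached in the walkthrough above compares task times to a benchmark after a log transformation. The sketch below shows one plausible form of that calculation: it computes a one-sample t statistic on the natural logs of the times. The function name, the sample times, and the 120-second benchmark are all hypothetical. The Python standard library has no t distribution, so the p-value lookup (a t distribution with n - 1 degrees of freedom) is left to a table or a statistics package.

```python
from math import log, sqrt
from statistics import mean, stdev


def one_sample_t_log(times, benchmark):
    """Return (t, df) comparing the mean of log task times with log(benchmark).

    A negative t means the typical (geometric mean) time is below the benchmark.
    """
    if len(times) < 2:
        raise ValueError("need at least two task times")
    if benchmark <= 0 or min(times) <= 0:
        raise ValueError("times and benchmark must be positive")
    logs = [log(t) for t in times]
    n = len(logs)
    # Standard error of the mean log time, using the sample standard deviation.
    standard_error = stdev(logs) / sqrt(n)
    t_stat = (mean(logs) - log(benchmark)) / standard_error
    return t_stat, n - 1


# Hypothetical task times in seconds, compared with a 120-second benchmark.
times = [94, 62, 117, 81, 70, 103, 88, 59]
t_stat, df = one_sample_t_log(times, benchmark=120)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

Working on the log scale reduces the influence of the long right tail that task-time data usually have.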
We stop at the "Paired Means" procedure (Chapter 6).

Converting Continuous Ratings to Discrete (p. 52)
Comparing a Task Time to a Benchmark
The Logic of Hypothesis Testing
Introduction (p. 63)
Comparing Two Means (Rating Scales and Task Times) (p. 63)
Within-subjects Comparison (Paired t-test)

Table 1.3 Chapter Sections for Methods Depicted in Figures 1.3 and 1.4

Method                          Chapter: Section [Page]
2 Proportions                   6: Sample Size Estimation for Chi-Square Tests (Independent Proportions) [128]
2 Means                         6: Comparing Values - Example 6 [116]
Paired Proportions              6: Sample Size Estimation for McNemar Exact Tests (Matched Proportions) [131]
Paired Means                    6: Comparing Values - Example 5 [115]
Proportion to Criterion         6: Sample Size for Comparison with a Benchmark Proportion [125]
Mean to Criterion               6: Comparing Values - Example 4 [115]
Margin of Error Proportion      6: Sample Size Estimation for Binomial Confidence Intervals [121]
Margin of Error Mean            6: Estimating Values - Examples 1–3 [112]
Problem Discovery Sample Size   7: Using a Probabilistic Model of Problem Discovery to Estimate Sample Sizes for Formative User Research [143]
Table 1.2 Chapter Sections for Methods Depicted in Figure 1.2

Method                                                                   Chapter: Section [Page]
One-Sample z-Test                                                        4: Comparing a Completion Rate to a Benchmark (Large Sample Test) [49]
One-Sample Binomial                                                      4: Comparing a Completion Rate to a Benchmark (Small Sample Test) [45]
Adjusted Wald Confidence Interval                                        3: Adjusted-Wald Interval: Add Two Successes and Two Failures [22]
McNemar Exact Test                                                       5: McNemar Exact Test [84]
Adjusted Wald Confidence Interval for Difference in Matched Proportions  5: Confidence Interval around the Difference for Matched Pairs [89]
N − 1 Two-Proportion Test and Fisher Exact Test                          5: N − 1 Two-Proportion Test [79]; Fisher Exact Test [78]
Adjusted Wald Difference in Proportion                                   5: Confidence for the Difference between Proportions [81]
Chi-Square                                                               10: Getting More Information [269]

For example, let's say you want to know which statistical test to use if you are comparing completion rates on an older version of a product and a new version where a different set of people participated in each test.

Table 1.1 Chapter Sections for Methods Depicted in Figure 1.1

Method                             Chapter: Section [Page]
One-Sample t (Log)                 4: Comparing a Task Time to a Benchmark [54]
One-Sample t                       4: Comparing a Satisfaction Score to a Benchmark [50]
Confidence Interval around Median  3: Confidence Interval around a Median [33]
t (Log) Confidence Interval        3: Confidence Interval for Task-Time Data [29]
t Confidence Interval              3: Confidence Interval for Rating Scales and Other Continuous Data [26]
Paired t                           5: Within-Subjects Comparison (Paired t-Test) [63]
ANOVA or Multiple Paired t         5: Within-Subjects Comparison (Paired t-Test) [63]; 9: What If You Need to Run More Than One Test?
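For the completion-rate comparison described above (different participants on the older and newer versions), Table 1.2 points to the N − 1 two-proportion test. The excerpt does not give its formula, so the sketch below assumes the commonly cited form: a pooled two-proportion z statistic scaled by sqrt((N − 1)/N), where N is the combined sample size. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def n_minus_1_two_proportion_test(x1, n1, x2, n2):
    """Return (z, two_sided_p) for two independent completion rates.

    x1 of n1 users succeeded in group 1; x2 of n2 users succeeded in group 2.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("both groups need at least one participant")
    total = n1 + n2
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under the hypothesis of no difference.
    pooled = (x1 + x2) / total
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if standard_error == 0:
        raise ValueError("test is undefined when every outcome is identical")
    # Assumed N - 1 adjustment: shrink z slightly for small samples.
    z = (p1 - p2) * sqrt((total - 1) / total) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 11 of 12 completions on the new version, 5 of 12 on the old.
z, p_value = n_minus_1_two_proportion_test(11, 12, 5, 12)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

The table pairs this test with the Fisher exact test, which this sketch does not implement.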
The book presents a practical guide on how to use statistics to solve common quantitative problems that arise in user research.

At "Different Users in Each Group?," select the "Y" path.

This leads you to "Adjusted Wald Confidence Interval," which, according to Table 1.2, is discussed in Chapter 3 in the "Adjusted-Wald Interval: Add Two Successes and Two Failures" section on p. 22.

Now we're at the "3 or More Groups" box: we have only two groups of users (before and after), so we select "N."
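The section title "Add Two Successes and Two Failures" hints at how the adjusted-Wald interval works. The sketch below uses the general form of that adjustment, adding z²/2 successes and z² trials before applying the usual Wald formula, which comes to roughly two successes and two failures at 95% confidence. The function name and the example counts (7 of 10 completions) are hypothetical, and this is an illustration rather than the book's own worked procedure.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_wald_interval(successes, trials, confidence=0.95):
    """Return (lower, upper) adjusted-Wald bounds for a completion rate."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Adjust the counts: add z^2 / 2 successes and z^2 trials.
    n_adj = trials + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to the valid range for a proportion.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


# Hypothetical data: 7 of 10 participants completed the task.
low, high = adjusted_wald_interval(successes=7, trials=10)
print(f"95% adjusted-Wald interval: {low:.3f} to {high:.3f}")
```

The adjustment pulls the estimate toward 50%, which keeps the interval usable even when every participant succeeds or fails.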