It is important to demonstrate with metrics that one design is superior to the other:
- 30% more users completed tasks A, B and C using interface 1 than using interface 2.
- Users completed tasks 45 seconds faster using interface 1 than using interface 2.
- Errors were reduced by 25% using interface 1 compared with interface 2.
High-level procedure:
- Hypothesis: make an educated guess as to the results of your experiment.
- Control group: compare one group to another.
- Only the interfaces should vary. User types should be the same (not the same users, but the same profile of users), and the tasks to complete and the data in the system should also be consistent across the interfaces being tested.
- Use statistical comparisons (t-test, chi-squared, ANOVA) to demonstrate results; put some numbers to your claims.
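As a rough illustration of putting numbers to a claim such as "more users completed the task on interface 1", here is a minimal sketch of a two-proportion z-test using only the Python standard library. For a 2x2 table of completed/failed by interface, this is equivalent to a chi-squared test without continuity correction. The function name and the completion counts are hypothetical, made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Two-sided z-test for a difference between two completion rates."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    # Pooled rate under the null hypothesis that both interfaces perform equally.
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    if std_err == 0:
        raise ValueError("every user succeeded or every user failed; test is undefined")
    z = (p_1 - p_2) / std_err
    # Probability of a difference at least this large if the null hypothesis holds.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 45 of 50 users completed the task on interface 1,
# 30 of 50 users completed it on interface 2.
z, p_value = two_proportion_z_test(45, 50, 30, 50)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value (conventionally below 0.05) suggests the difference in completion rates is unlikely to be due to chance alone. Comparing task times rather than completion rates would call for a t-test instead.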
Benchmark summative tests are used to answer the question of whether or not our interface meets a performance requirement. For example: users can create an online profile in 30 seconds and start using the system, 95% of users tested succeed at accomplishing "task d", or users make errors less than 5% of the time.
The benchmark summative test is most appropriate when:
- There are hard task constraints (e.g. a task must be completed in x amount of time); a perfect example would be an automated kiosk at an airport.
- There are defined targets, e.g. operators must process a product in less than 30 seconds with a 1% margin of error.
- The domain is performance-critical, such as healthcare or the military.
High-level procedure:
- Test users performing tasks using the design.
- Capture the performance of the tasks being completed: accuracy, speed, success rate.
- Demonstrate that the captured metrics meet the defined criteria.
- Use statistical methods to calculate a confidence interval.
- Again, since you're using statistical analysis, you really need to test a large number of users to get a good level of confidence in your results.
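To make the confidence-interval step concrete, here is a minimal sketch that checks a measured success rate against the "95% of users succeed" benchmark mentioned earlier. It uses a Wilson score interval, which is one common choice for proportions and not necessarily the method the notes have in mind. The function name and the user counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a success rate."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


target = 0.95  # benchmark: 95% of users succeed at the task

# Hypothetical results: a small study and a large study.
for successes, n in [(20, 20), (198, 200)]:
    low, high = wilson_interval(successes, n)
    verdict = "meets" if low >= target else "does not demonstrate"
    print(f"{successes}/{n}: interval [{low:.3f}, {high:.3f}] {verdict} the target")
```

In this sketch even a perfect 20 out of 20 leaves the lower bound well under 95%, so the small study cannot support the claim, while 198 out of 200 can. This is the reason benchmark tests need many users.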
Takeaways
Summative tests are used to determine that a design is better or good enough. They summarize a characteristic of a design that you want to make a claim about, and that claim is supported by statistical data. Summative tests are rare in user research because of the background in statistics they require, as well as the larger number of users needed.
| Summative tests | Formative tests |
|---|---|
| Users perform tasks | Users perform tasks |
| Representative users | Representative users |
| Prove a point | Find a problem |
| Quantitative | Qualitative |
| Many users | Few users |
| Rare | Common |