Monday 16 August 2021

Summative tests

Comparative summative tests: used to determine whether a new design is better than a legacy one, to compare two new designs of our own, or to compare our design with a competitor's, to see which one is better suited.

It is important to demonstrate, using metrics, that one design is superior to the other:
  • 30% more users completed tasks A, B and C using interface 1 vs. interface 2.
  • Users completed tasks 45 seconds faster using interface 1 vs. interface 2.
  • Errors were reduced by 25% using interface 1 vs. interface 2.
Tests are conducted with the same types of users, but not the same users. Once they are complete, the differences in performance between the interfaces are compared. It goes without saying that the more times you run a test, and the greater the number of users, the more reliable your results are.

High-level procedure:
  1. Hypothesis: make an educated guess as to the results of your experiment.
  2. Control group: compare one group to another.
  3. Only the interfaces should vary. User types should be the same (not the same users, but the same profile of users), and the tasks to complete and the data in the system should also be consistent across the interfaces being tested.
  4. Use statistical comparisons (t-test, chi-squared, ANOVA) to demonstrate results; put some numbers to your claims.
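
As a rough illustration of step 4, here is a minimal sketch of two of the comparisons named above, written with only the Python standard library. The function names and all of the numbers are hypothetical, made up for this example: a chi-squared test on how many users completed a task with each interface, and Welch's t statistic (a t-test variant that does not assume equal variances) on task times. The t statistic still has to be looked up against a t-distribution, or handed to a statistics library, to get a p-value.

```python
import math
from statistics import mean, variance


def chi_squared_2x2(success_a, n_a, success_b, n_b):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table
    of completed / not completed counts for two interfaces.
    Assumes at least one success and one failure overall."""
    observed = [
        [success_a, n_a - success_a],
        [success_b, n_b - success_b],
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [success_a + success_b, total - success_a - success_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def welch_t(times_a, times_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va = variance(times_a) / len(times_a)  # squared standard error, group A
    vb = variance(times_b) / len(times_b)  # squared standard error, group B
    t = (mean(times_a) - mean(times_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(times_a) - 1) + vb ** 2 / (len(times_b) - 1)
    )
    return t, df


# Hypothetical results: 45 of 50 users finished with interface 1,
# 30 of 50 (different users, same profile) finished with interface 2.
chi2, p = chi_squared_2x2(45, 50, 30, 50)
print(f"completion: chi2 = {chi2:.2f}, p = {p:.4f}")

# Hypothetical task times in seconds for each group.
t, df = welch_t([100, 110, 120, 130, 140], [150, 160, 170, 180, 190])
print(f"task time: t = {t:.2f}, df = {df:.1f}")
```

With these made-up counts the chi-squared statistic works out to 12.0, which is well past the usual 5% threshold for one degree of freedom, so the difference in completion rate would be unlikely to be down to chance. Five users per group, as in the task-time example, is far too few for a real study; it is only there to keep the arithmetic easy to follow.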

Benchmark summative tests: used to answer the question of whether our interface meets a performance requirement, for example that users can create an online profile in 30 seconds and start using the system, that 95% of users tested succeed at accomplishing "task d", or that users make errors less than 5% of the time.

The benchmark summative test is most appropriate when:
  • there are "hard task constraints" (e.g. a task must be completed in x amount of time); a perfect example would be an automated kiosk at an airport.
  • there are defined targets, e.g. operators must process a product in less than 30 seconds, with a 1% margin of error.

Benchmark summative tests are often appropriate for performance-critical domains, such as healthcare or the military.

High-level procedure:
  1. Test users performing tasks using the design.
  2. Capture the performance of the tasks being completed: accuracy, speed, success rate.
  3. Demonstrate that the metrics captured meet the defined criteria.
  4. Use statistical methods to calculate a confidence interval.
  5. Again, since you're using statistical analysis, you really need to test a large number of users to get a good level of confidence in your results.
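
To make steps 4 and 5 concrete, here is a small sketch of a confidence interval for a success rate, using the Wilson score interval (one common choice for proportions; the notes above do not say which method to use). The function name, the 90% target and the user counts are all assumptions for illustration only.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a proportion (e.g. success rate)."""
    # Two-sided critical value of the standard normal, about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    )
    return centre - half_width, centre + half_width


# Hypothetical benchmark: at least 90% of users must complete the task.
target = 0.90

# The same observed success rate (95%) with 20 users and with 200 users.
for successes, n in [(19, 20), (190, 200)]:
    low, high = wilson_interval(successes, n)
    verdict = "meets" if low >= target else "cannot yet claim"
    print(f"{successes}/{n}: {low:.3f} to {high:.3f} -> {verdict} the target")
```

Both groups succeed 95% of the time, but with 20 users the lower end of the interval falls below the 90% target, while with 200 users the whole interval sits above it. That is the practical reason a benchmark test needs many users: the observed rate alone does not support the claim until the interval around it is narrow enough.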

Takeaways
Summative tests are used to determine that a design is better or good enough. They summarize a characteristic of a design that you want to make a claim about, and that claim is supported by statistical data. Summative tests are rare in user research because of the background in statistics that is required, as well as the increased number of users required.

Summative tests      | Formative tests
Users perform tasks  | Users perform tasks
Representative users | Representative users
Prove a point        | Find a problem
Quantitative         | Qualitative
Many users           | Few users
Rare                 | Common