Thursday 26 August 2021

What to look for in a user test

When conducting a user test it is important to know what to look for:
  • Were users able to complete the tasks asked of them? Why or why not?
  • Were users able to complete their tasks promptly and simply? Why or why not?
  • Did users make mistakes? Did they realize they made a mistake? Could they recover from it? Why did they make the mistake, and why did they realize it?
  • How did users feel about the experience, and why did they feel that way?
The key question is why. When we run a user test we want to know the root cause of a problem or a success: why something works or doesn't work, so that we avoid design pitfalls and move towards effective designs.

When conducting a user test we also want to capture information during the test

Qualitative Data
Critical incidents: things that happened during the test that may explain the results
Verbal account: Statements made by the user that indicate the thought process, attitude, and explanations of the user

Quantitative Data
Time to complete a task
Success rate of task

Critical incidents are the bread and butter of user testing. They include any action taken by the user that explains why they were or were not successful at a task, such as:
  • Clicking the wrong button
  • Ignoring the instruction shown on the screen
  • Providing the wrong information
  • Following the wrong path
  • misinterpreting a label
  • expressing confusion or frustration 
  • asking for help
  • staring at the screen for a long time
  • giving up
Verbal accounts are simply the things users say while performing the test. They provide insight into what's going on inside the user's head. Users may say things like:
  • I'm looking for
  • I was expecting to see
  • I wonder what this does
  • Well, that doesn't seem right
  • I think that was right
  • or they may ask a question
These types of qualitative data are essential for establishing actionable intelligence, that is, information we can leverage to fix a problem. For example, quantitative data could be something like 40% of users failed at "Task A", whereas qualitative data would be "2 out of 5 users couldn't figure out how to fill in their shipping info, resulting in a failure of Task A". An even better result is to link your qualitative data back to a heuristic guideline.
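
The contrast between the two kinds of data can be sketched in a few lines of Python. This is only an illustration, not a prescribed tool: the `TaskResult` record, its field names, and the sample sessions are all hypothetical, chosen to mirror the "2 out of 5 users" example.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class TaskResult:
    """One participant's attempt at one task (hypothetical record layout)."""
    participant: str
    succeeded: bool
    seconds: float
    incidents: list = field(default_factory=list)  # critical incidents observed


def summarize(results):
    """Return (failure rate, mean time, incident counts among failed attempts)."""
    failures = [r for r in results if not r.succeeded]
    failure_rate = len(failures) / len(results)
    mean_seconds = mean(r.seconds for r in results)
    causes = Counter(i for r in failures for i in r.incidents)
    return failure_rate, mean_seconds, causes


# Made-up sessions for "Task A"
sessions = [
    TaskResult("P1", True, 60),
    TaskResult("P2", False, 100, ["could not fill in shipping info"]),
    TaskResult("P3", True, 70),
    TaskResult("P4", False, 90, ["could not fill in shipping info"]),
    TaskResult("P5", True, 80),
]

failure_rate, mean_seconds, causes = summarize(sessions)
print(f"{failure_rate:.0%} of users failed Task A")  # the quantitative result
for cause, count in causes.most_common():            # the qualitative explanation
    print(f"{count} of {len(sessions)} users: {cause}")
```

The failure rate alone says that something is wrong; the incident counts say what to fix.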

Data about users
You should collect data not only about the test (how it went, what the users did or thought) but also about the users themselves, to provide better insight into who is being tested. Things like:
  • Technical competence: familiarity with computing or with the particular platform (iOS vs Android vs Windows, or mobile vs desktop)
  • Domain expertise: whether they are familiar with this particular domain. If it's a social media app, do they use social media? If it's a shopping site, do they shop online?
  • How frequently they partake in this behavior
  • general demographics
    • Age
    • gender
    • Education
    • Ethnicity 
The goal of collecting data about our users is to better understand the whys, for example why a certain group of users failed a specific task. Most often the deciding factors will be technical competence and domain expertise.

Saturday 21 August 2021

Formative Tests

Formative tests are performed to identify problems that need to be fixed, and they are far more common than summative tests. Where summative tests focus on quantitative data, such as 90% of users were able to complete "Task A" in under 30 seconds, formative tests focus on qualitative analysis of problems, such as:

  • Users struggled to accomplish "Task A" because the required button did not look like a button.
  • Users failed or took too long to complete "Task B" because the information required wasn't readily available and they had to go searching for it.
Formative tests are the most common type of test leveraged in UX research and design. They are performed during the design phase with the goal of identifying "bugs" to fix.

General procedure
  1. Have representative users perform tasks
  2. Observe them: what they do, what they say
  3. Capture where they struggle
  4. Identify parts of the design that cause problems

Summative tests      | Formative tests
Users perform tasks  | Users perform tasks
Representative users | Representative users
Prove a point        | Find a problem
Quantitative         | Qualitative
Many users           | Few users
Rare                 | Common

Monday 16 August 2021

Summative tests

Comparative summative tests determine whether a new design is better than a legacy one, or compare two new designs, either two of our own or ours against a competitor's, to see which one is better suited.

It is important to demonstrate, using metrics, that one design is superior to the other:
  • 30% more users completed tasks A, B and C using interface 1 vs. interface 2.
  • Users completed tasks 45 seconds faster using interface 1 vs. interface 2.
  • Errors were reduced by 25% using interface 1 vs. interface 2.
Tests are conducted with the same types of users, but not the same users. Once complete, the interfaces are compared on the differences in performance. The more times you run a test, and the more users you run it with, the more reliable your results are.

High level Procedure:
  1. Hypothesis: make an educated guess as to the results of your experiment
  2. Control group: compare one group to another
  3. Only the interfaces should vary. User types should be the same (not the same users, but the same profile of users), and the tasks to complete and the data in the system should also be consistent across the interfaces being tested.
  4. Use statistical comparisons (t-test, chi-squared, ANOVA) to demonstrate results; put some numbers to your claims.
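
As a rough sketch of the statistical comparison step, here is a two-sample t statistic computed with only the Python standard library. It uses Welch's variant (which does not assume equal variances); that choice, the function name `welch_t`, and the timing data are my own assumptions for illustration, not something these notes prescribe.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of each sample mean
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up task completion times in seconds, different users per interface
interface_1 = [41, 38, 45, 40, 36]
interface_2 = [52, 49, 55, 47, 57]

t, df = welch_t(interface_1, interface_2)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large absolute t relative to the t distribution with `df` degrees of freedom suggests the difference is not just noise. Turning it into a p-value needs a t table or a statistics library, for example SciPy's `scipy.stats.ttest_ind` with `equal_var=False`.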
Benchmark summative tests are used to answer the question of whether or not our interface meets a performance requirement. For example: users can create an online profile in 30 seconds and start using the system, 95% of users tested succeed at accomplishing "Task D", or users make errors less than 5% of the time.

The benchmark summative test is most appropriate when:
  • There are hard task constraints (e.g. the task must be completed in x amount of time); a perfect example would be an automated kiosk at an airport.
  • There are defined targets: operators must process a product in less than 30 seconds, with a 1% margin of error.

Benchmark summative tests are often appropriate for performance-critical domains, such as healthcare or the military.

High level Procedure:

  1. Test users performing tasks using the design
  2. Capture the performance of the tasks being completed: accuracy, speed, success rate
  3. Demonstrate that the metrics captured meet the defined criteria
  4. Use statistical methods to calculate a confidence interval
  5. Again, since you're using statistical analysis, you really need to test a large number of users to get a good level of confidence in your results
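
To make the confidence interval step concrete, here is a minimal sketch for a success-rate benchmark. It uses the Wilson score interval, which is one common choice for proportions and my own assumption here; the counts and the 95% target are made up.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a success proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Made-up benchmark: target is "95% of users succeed at the task"
low, high = wilson_interval(successes=97, n=100)
meets_target = low >= 0.95  # only claim success if the whole interval clears the bar
print(f"95% CI: {low:.3f} to {high:.3f}, meets target: {meets_target}")
```

Even with 97 of 100 users succeeding, the lower bound falls below 0.95, so the benchmark cannot be claimed with this sample. This is why summative tests need many users.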
Takeaways
Summative tests are used to determine that a design is better or good enough. They summarize a characteristic of a design that you want to make a claim about, and your claim is supported by statistical data. Summative tests are rare in user research because of the background in statistics that's required, as well as the increased number of users required.
Summative tests      | Formative tests
Users perform tasks  | Users perform tasks
Representative users | Representative users
Prove a point        | Find a problem
Quantitative         | Qualitative
Many users           | Few users
Rare                 | Common

Wednesday 11 August 2021

Heuristic Evaluation

Heuristic evaluation is the poor man's UX inspection method. It's cheaper and faster than usability testing because it doesn't require users. You go through the interface yourself and apply Nielsen's 10 heuristics:
  1. Visibility of system status
  2. Match between system & real world
  3. User Control & Freedom
  4. Consistency & Standards 
  5. Error Prevention
  6. Recognition instead of Recall
  7. Flexibility & efficiency of use
  8. Aesthetic & minimalist design
  9. Help users recognize, diagnose, and recover from errors
  10. Help & documentation
Heuristic Evaluation Technique 
  • Select system or set of screens to evaluate
  • Step through the user journey and apply heuristics to potential problems
    • Be sure to test error cases
    • Be sure to test the help system
  • Write down all violations, big or small
    • Which heuristic they violate
    • Assess the severity of each violation
      1. Cosmetic problem: no real user experience impact
      2. Minor usability issue: Fix if there's time
      3. Major usability issue: Important to fix
      4. Usability Catastrophe: must be fixed
  • Create prioritized list of violations
    • Highlight top 5 to 10 violations
    • Rank in descending order of severity 
    • Use heuristics to explain importance
Describing a heuristic Violation
Description: Drop-down list is not identifiable as such by inspection
Severity 2/4: Minor issue
Heuristic Violated: Recognition instead of Recall
Summary: The user should be able to identify that the "All Tasks" filter is a drop-down without having to remember to click on it.


Recommendation: Add a downward arrow to the right of the selected value indicating that it's a drop down menu.
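
The record-and-prioritize part of the technique can be sketched as a small data structure. The `Violation` class and `prioritize` helper are hypothetical names of my own; the first entry restates the drop-down example, and the other two are invented purely to show the ordering.

```python
from dataclasses import dataclass

# Severity scale from the technique: 1 (cosmetic) to 4 (catastrophe)
SEVERITY_LABELS = {
    1: "Cosmetic problem",
    2: "Minor usability issue",
    3: "Major usability issue",
    4: "Usability catastrophe",
}


@dataclass
class Violation:
    description: str
    heuristic: str
    severity: int  # 1..4, see SEVERITY_LABELS


def prioritize(violations, top_n=10):
    """Rank violations in descending order of severity and keep the top few."""
    return sorted(violations, key=lambda v: v.severity, reverse=True)[:top_n]


found = [
    Violation("Drop-down list is not identifiable as such",
              "Recognition instead of Recall", 2),
    # The next two entries are invented examples
    Violation("Button labels use inconsistent capitalization",
              "Consistency & Standards", 1),
    Violation("Delete-all action has no confirmation step",
              "Error Prevention", 4),
]

for v in prioritize(found):
    print(f"{v.severity}/4 {SEVERITY_LABELS[v.severity]}: {v.description} ({v.heuristic})")
```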

Having 3 to 5 experts evaluate a system individually and then pool their findings is one of the most effective ways to identify a large share of violations.

Heuristic Evaluation
  • Cheap
  • Fast
  • Doesn't require users
User Testing
  • More realistic
  • Finds more problems
  • Assesses other UX qualities beyond usability

It's normal to use multiple techniques in an iterative process to flush out all usability issues from your system

  • Heuristic evaluation is a quick and cheap way to identify significant flaws in a user interface
  • Leverages Nielsen's heuristics
  • Inspect each screen, potential errors, and help options
  • Capture and assess violations
  • Prioritize

Tuesday 10 August 2021

Jakob Nielsen's 10 Heuristics

  1. Visibility of system status: the system should always keep the user informed of what is going on, within a reasonable amount of time, using progress indicators or busy signals.
    • if an action takes less than 100 milliseconds it feels instantaneous
    • if an action takes about a second it's noticeable but acceptable
    • if an action takes less than 10 seconds it requires a notification that something is happening
    • if an action takes more than 10 seconds it should be done in the background, with indicators and estimates
  2. Match between system and the real world: the system should use real-world language that is familiar to the user rather than system jargon. Follow real-world paradigms, making information appear in a natural and logical order.
    • Takes advantage of users' schemas
    • Align real world actions with digital ones
    • processes should just make sense.
  3. User control and freedom: Users often choose system functions by mistake and will need a clearly marked "Emergency Exit" to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
    • Support 7 stages of action, perhaps user wants to try approach again but with deviation.
    • Users employ trial and error approach to figure out how to use new system.
  4. Consistency & Standards: users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions
    • helps users transfer schema knowledge from one part of the system to another.
    • a coherent conceptual model helps the user learn the system effectively.
    • use the same term in the same way throughout the system; don't use "search" in one place and "find" in another
    • by staying consistent you help users more rapidly learn new systems by following paradigms they're already familiar with.
  5. Error prevention: Even better than good error messages is a careful design which prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present the user with a confirmation option before they commit to the action.
    • in-process feedback: give the user feedback before they submit, e.g. flag an invalid email
    • provide constraints, keep user from making mistakes by asking for input very specifically.
    • confirm if the user is trying to accomplish something dangerous like delete all files
    • prevent users from actions that are likely to fail
  6. Recognition over recall: Minimize the user's memory load by making objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
    • try to use recognition over recall
    • is it reasonable for users to have to recall something
    • if recall fails are there cues to help 
  7. Flexibility and efficiency of use: Accelerators, unseen by the novice user, may often speed up the interaction for the expert user, such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.
    • recall for new or inexperienced users is very difficult whereas recall for veteran users is not a problem, and both types of user need to be catered to.
    • allow users to customize their experience but don't force them to
    • accelerators are things like keyboard shortcuts
    • allow users to create bookmarks and shortcuts
    • personalization tailors experiences based on past usage
  8. Aesthetic and minimalist design: Dialogs should not contain information that is not relevant or is rarely needed. Every extra unit of information in a dialog competes with the relevant units of information and diminishes their relative visibility.
    • Visual clutter makes it difficult to focus on important actions
    • good use of color, shape, animation and gestalt principles guides the eye
    • the more there is to look at the less the user will see.
  9. Help users recognize, diagnose and resolve errors: error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.
    • good error messages will explain what the user did wrong and how to fix it.
  10. Help and documentation: Even though it's better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out and not be too large.
    • sometimes the UI just isn't intuitive enough
    • best if help is not needed
    • if required make sure that
      • help is searchable 
      • task-focused
      • concrete 

These 10 rules of thumb are in place to help design a system that is a pleasure to use.
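
The response-time thresholds listed under "Visibility of system status" can be read as a simple decision rule. The sketch below is my own illustration: the function name `feedback_for` and the exact handling of the boundaries (for example, treating anything up to 1 second as "acceptable") are assumptions, since the notes only give rough ranges.

```python
def feedback_for(expected_seconds):
    """Suggest system-status feedback for an action of the given duration."""
    if expected_seconds < 0.1:
        return "none needed: feels instantaneous"
    if expected_seconds <= 1:
        return "none needed: noticeable but acceptable"
    if expected_seconds <= 10:
        return "show a busy or progress indicator"
    return "run in background with progress and a time estimate"


for seconds in (0.05, 0.8, 4, 30):
    print(f"{seconds:>5}s -> {feedback_for(seconds)}")
```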

Thursday 5 August 2021

Micro Usability testing

A usability test is a formalized test in which participants are drawn from the target audience. To take a pragmatic approach early on, we can leverage a "micro usability test", a scaled-back version of a user test:
  • Users can be "Close Enough" to the target audience (whoever is willing and able)
  • Fewer tasks: 2 to 3 tasks taking 15 to 20 minutes, instead of 5 to 10 tasks taking 1 to 1.5 hours
  • No screen recording of user actions
  • No video recording of user
  • No logging of user actions
  • No questionnaires regarding the user's persona
The goal is just to capture the 2 to 3 biggest takeaways from the micro usability test. There's no need to correlate the data or do an in-depth analysis of the whats, whys and hows.

When conducting a micro usability test you want to come up with 2 to 3 tasks:
  • each task should be presented to the participant separate from the other tasks
  • order the tasks from easiest to hardest
  • the tasks should be clear and concise
  • tasks should have a clear and defined solution
  • when completing the tasks the user should use the "Speak out loud" technique
  • when they think they're done, the user should notify the tester
Once all the tasks are complete, or the user gives up, it is time for the debrief. This is the tester's opportunity to engage with the participant and ask them things like:
"At this point you seemed surprised, frustrated, or confused; could you tell me why? What was wrong with the system?" It's your opportunity to really investigate and find out what the user was thinking or feeling.

After your ad-hoc investigation you should ask some predefined questions for general feedback:
  • Have you ever used a product like this? Why or why not?
  • Do you see yourself using something like this? Why or why not?
  • Some questions that are particular to what you're testing.
Once the test is run and the test and post-test data are collected, it's time to compile it all into a micro usability test report. The report should consist of 3 sections:

1) Key observations
A few paragraphs about key observations throughout the test
  • Describe the participants, write a persona (Who they are, what kind of experience do they have, with similar systems and with technology over all)
  • How the test went overall
  • Success rate of tasks
  • Partial or complete failures of tasks
2) Problems
Focus on the top 3 to 5 biggest problems observed and diagnose the cause of those problems, things to focus on:

  • What worked well? What didn’t? 
  • What were the most confusing or frustrating aspects of the interface? 
  • What errors or misunderstandings occurred? 
  • What did users think about the interface? 
  • What would they like to see improved?
3) Recommendations 
List the main issues that were brought to light, back them up with evidence from the test, and propose recommendations for how to rectify the problems.

Sunday 1 August 2021

User Testing

User testing, also known as "usability testing", is one of the main methods used in user research. At its core it's basically giving a user a task to accomplish within the system and observing the user try to accomplish that goal. By observing users work with a system you learn:

  • What works and what doesn't 
  • Why things work and why some don't
  • User needs you missed or misunderstood

the basic flow for running a "user test" is

  1. Find potential users
    • When picking users, make sure to select ones that are the target audience
    • Pick users that are not current users
  2. Give them tasks to complete within the system
    • Selecting tasks for users to try is much more difficult than it seems
    • Start with the most common tasks
    • Move on to less frequent tasks, working in descending order of frequency
    • Closed ended tasks
      • ones that have a clear and defined point of completion
      • have a verifiable outcome
      • follow a predictable path
    • Open ended tasks
      • Are more natural
      • difficult to assess success because of ambiguity
      • explore paths that may not have been identified
    • Use both open and closed ended tasks.
  3. Observe them complete their tasks
  4. Debrief them after they've successfully or unsuccessfully completed their tasks
  5. Document what you've learned

we do this as part of our assessment iteration so that we can redesign our system to work better.

When selecting our task sets some things to keep in mind are

  • order from easiest to hardest
  • focus on critical tasks, the things the system must accomplish
  • should include both "open" and "closed" ended tasks
  • avoid ordering effects: giving a user the answer to a subsequent task in the current one
  • don't lead the user: avoid language that will divulge how to accomplish the tasks
  • avoid ambiguous instructions: when defining your task be specific enough that the user will understand clearly what you want them to do and how, but without leading them.
  • tell the user to indicate when they feel they've completed the task
  • pilot the tasks
    • check the tasks yourself and have some colleagues try them out to ensure that they meet the above criteria
Think out loud 
Participants verbalize what they are thinking as they accomplish their tasks, for example:
  • looking for something  
  • reading text
  • hypothesizing how the system might work
  • interpreting system options
  • interpreting system feedback
  • explaining their decisions 
  • feelings: frustrated, happy, etc

It's not natural for users to "think out loud", so don't hesitate to remind them that you're interested in how they feel or what they're thinking. Use positive reinforcement to coax their thoughts and opinions out.

advantages of this approach are:

  • hear how the user thinks about the task
  • learn what the user actually sees and notices 
  • hear how the user interprets options and feedback

disadvantages of this approach are:

  • timing: since users are vocalizing what they're doing they won't zip through the system as quickly as they might otherwise
  • attention to detail: since users are vocalizing and paying more attention to what they're doing they may notice things that otherwise would be overlooked.
  • users will naturally ask questions, but as the observer you are not supposed to answer them
Post user test (debriefing)
once the user test is complete you can:
  • review the user's problems to try and get more information out of the user
  • ask the user if they find this product useful, if it's something they see themselves using
  • ask if it was usable, credible, and aesthetically pleasing
  • compare it to existing alternatives 
What have you learned?
After you've run your tests and completed your debriefing it's time to summarize your findings. What you should focus on are the critical incidents:
  • errors: where users didn't follow the expected path or didn't do what was expected
  • expressions of frustration: users got stuck or seemed confused as to how to proceed
  • breakdowns: where simple tasks took a long time, or users detoured from the defined journey but still got to where they were supposed to
  • pleasant surprises: things that the user enjoyed, things that were easier than expected
Assess whether the user failed or succeeded, and to what degree. Capture the user's demeanor: were they happy with the system or do they think it's a load of bollocks? Most importantly, capture as much objective and subjective data as soon as possible, ideally during the test, the debrief, and directly after. Write down all the critical incidents you can:
  • Mental model mismatches
  • Misinterpretations
  • Invalid assumptions made by the system
  • Missing user needs
  • Too little flexibility 
  • Too little guidance 
While summarizing your results you really want to capture overall reactions to specific aspects of the system and link those with the user's successes and failures.