{"id":7,"date":"2021-08-03T12:14:41","date_gmt":"2021-08-03T12:14:41","guid":{"rendered":"http:\/\/tailinhares.tech\/?p=7"},"modified":"2021-08-30T11:01:13","modified_gmt":"2021-08-30T11:01:13","slug":"deutsche-bahn-booking-experience","status":"publish","type":"post","link":"https:\/\/tailinhares.tech\/index.php\/2021\/08\/03\/deutsche-bahn-booking-experience\/","title":{"rendered":"Deutsche Bahn Booking Experience"},"content":{"rendered":"\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"has-drop-cap has-medium-font-size\">Establishing quality metrics across platforms is a huge challenge especially for companies that offer both online and offline experiences. In a team composed by six TU Berlin students, I designed and planned an experiment for measuring user satisfaction in the services offered by Deutsche Bahn (DB), the German railway company. The study, which compares the website and the ticket machine booking experiences, was an assignment for the seminar \u201cHuman Machine Systems\u201d at TU Berlin. Besides designing and deploying the human-subject experiment, another requirement was using at the same time methods that capture physiological, behavioral, and attitudinal data. 
As the goal of this project was to evaluate the quality of the service experience rather than to optimize the system, I started by defining a research question instead of an objective that maximizes a key performance indicator.<\/p>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-pullquote\"><blockquote><p><strong>Research Question<\/strong>: which DB booking experience generates higher user satisfaction?<\/p><\/blockquote><\/figure>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"has-medium-font-size\">The metric that I chose for assessing user satisfaction was the <strong>Customer Satisfaction Score<\/strong> (CSAT-score), which measures the percentage of users satisfied with the service in the episode of usage, i.e., right after experiencing it. Experiment participants could answer the question \u201cHow would you rate your overall satisfaction with the booking experience?\u201d with a score ranging from very unsatisfied (1) to very satisfied (5). 
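The CSAT computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code, and it assumes the common top-two-box convention of counting ratings of 4 or 5 as satisfied.

```python
def csat_score(ratings):
    # CSAT: percentage of respondents in the 'satisfied' range.
    # Top-two-box convention (4 or 5 on the 1-5 scale) is assumed here.
    satisfied = sum(1 for r in ratings if r >= 4)
    return 100 * satisfied / len(ratings)

# Hypothetical ratings from ten participants:
print(csat_score([5, 4, 4, 3, 5, 2, 4, 5, 4, 4]))  # 80.0
```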
When performing user testing that may include inexperienced users, the CSAT-score is preferable to the Net Promoter Score (NPS), which assesses user loyalty, that is, user satisfaction over a multi-episodic history of usage.<\/p>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-pullquote\"><blockquote><p><strong>Hypothesis<\/strong>: user satisfaction is higher with the website because it is more user-friendly, easier to use, and less stressful.<\/p><\/blockquote><\/figure>\n\n\n\n<div style=\"height:60px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"has-large-font-size wp-block-heading\">The UX Researcher toolbox<\/h2>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"has-medium-font-size\">By now I have a <strong>dependent variable<\/strong> called user satisfaction that is impacted by the <strong>independent variable<\/strong> booking experience, which has two levels: website and ticket machine. In this study I understand user satisfaction as a function of the system's user-friendliness and the effort necessary to accomplish a task. 
In order to quantify the influence of the different booking experiences on these variables, I decided to measure user-friendliness with the <strong>System Usability Scale<\/strong> (SUS) survey, besides considering the click\/time rate and a physiological stress metric as proxies for required effort (Table 1).<\/p>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"221\" src=\"http:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/table-user-satisfaction-1024x221.png\" alt=\"\" class=\"wp-image-11\" srcset=\"https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/table-user-satisfaction-1024x221.png 1024w, https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/table-user-satisfaction-300x65.png 300w, https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/table-user-satisfaction-768x166.png 768w, https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/table-user-satisfaction-1536x331.png 1536w, https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/table-user-satisfaction-2048x441.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Table 1: User satisfaction as a function of user friendliness and effort<\/figcaption><\/figure>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"has-medium-font-size\">The SUS survey is a 10-item questionnaire with statements that are answered on a 5-point Likert scale. Each question has a rephrased counterpart, which adds robustness to the measurement of attitudinal data. Moreover, SUS is widely used and validated, reliable for small samples, and enables comparisons between results. The intuition behind treating the click\/time rate as a metric of effort is that the speed of the user's interaction with the interface decreases when using the system is costly. 
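For reference, the conventional SUS scoring rule maps the ten Likert answers onto a 0-100 scale: odd (positively phrased) items contribute their score minus 1, even (negatively phrased) items contribute 5 minus their score, and the sum is multiplied by 2.5. A minimal sketch, not the study's analysis code:

```python
def sus_score(responses):
    # responses: ten Likert answers (1-5) in questionnaire order.
    # Odd-numbered items are positively phrased, even-numbered negatively.
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5  # rescale the 0-40 sum to 0-100

print(sus_score([3] * 10))  # all-neutral answers land at 50.0
```

With all-neutral answers the score lands at 50; the threshold of 68 discussed in the results section is the empirical average across many studies.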
On a complicated interface, users need more time to decide which button to click next.<\/p>\n\n\n\n<p class=\"has-medium-font-size\">Another measure of effort used in the experimental design was the <strong>Electrodermal Activity<\/strong> (EDA), measured with a <em>BITalino<\/em> biosignals platform. EDA captures changes in the skin's electrical conductance driven by the neural response to stress. To make it simpler: a person sweats when their stress level is high, which increases the skin conductance. An advantage of <em>BITalino<\/em> compared to other devices that measure stress, such as an <em>Apple Watch<\/em>, is its algorithmic transparency and the open access to the raw data points.<\/p>\n\n\n\n<p class=\"has-medium-font-size\">As our sample size in this study was constrained to ten people, we decided that a <strong>within-subjects experimental design<\/strong> would be the best choice. A between-subjects design would split our participants into two groups, decreasing the sample size and, consequently, the power of the study. With the within-subjects design we are more likely to obtain statistically significant results at the end of the study.<\/p>\n\n\n\n<p class=\"has-medium-font-size\">Although deploying this experiment in the field was initially a constraint, as we could not move a ticket machine to the lab, it also has the benefit of preserving the study's <strong>external validity<\/strong>, due to the similarity between our setting and a real-world situation. 
In lab experiments, on the other hand, the researcher can control undesirable influences, which keeps the experiment's internal validity high and facilitates reproducibility.<\/p>\n\n\n\n<div style=\"height:60px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"has-large-font-size wp-block-heading\">Preventing bias in the study design<\/h2>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"has-medium-font-size\">Offering the participants written instructions is the best way to ensure the experiment's objectivity and avoid the <strong>observer-expectancy effect<\/strong>, that is, the experimenter's expectations influencing the study's results. This could happen, for instance, if a researcher revealed more details about the experiment's intended outcomes to one of the participants.<\/p>\n\n\n\n<p class=\"has-medium-font-size\">In each of the treatments the participants must accomplish four tasks, which basically consist of buying tickets under different conditions, such as booking a group travel or buying an extra ticket for a child. A survey is administered after participants complete the set of tasks for each experience. Because humans more easily retain information presented at the beginning and at the end of an event, phenomena known respectively as the <strong>primacy and recency effects<\/strong>, I decided to randomize the order in which the tasks are presented in each treatment for all participants. Besides that, half of the participants started the experiment on the website, whereas the other half judged the ticket machine first, controlling for any effect caused by <strong>participants\u2019 fatigue<\/strong> on the ratings. 
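These two counterbalancing steps can be sketched as follows (the task names and the alternating assignment scheme are illustrative, not the exact procedure used in the study):

```python
import random

# Illustrative task list, loosely based on the tasks described above:
TASKS = ['single ticket', 'group travel', 'extra child ticket', 'cheapest fare']

def assign_conditions(participants, seed=42):
    # Alternate the starting treatment across participants and draw a
    # fresh random task order for each treatment of each participant.
    rng = random.Random(seed)
    plan = {}
    for i, participant in enumerate(participants):
        if i % 2 == 0:
            order = ('website', 'ticket machine')
        else:
            order = ('ticket machine', 'website')
        tasks = {t: rng.sample(TASKS, len(TASKS)) for t in order}
        plan[participant] = (order, tasks)
    return plan
```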
Both steps aim to minimize, as much as possible, the impact of human bias on the experiment\u2019s outcome.<\/p>\n\n\n\n<p class=\"has-medium-font-size\">Finally, it was necessary to isolate the influence of the ticket price on user satisfaction across treatments by assigning a different destination city to each of them. As the participants were requested to buy the cheapest tickets, they could get upset if they found a more expensive ticket while performing the tasks of the last treatment. Such comparisons between treatments could drastically decrease their satisfaction ratings, especially if they have the sensation of making a bad deal. This effect is well known in the field of behavioral economics as <strong>loss aversion<\/strong>: the human tendency to avoid losses because we are more sensitive to losing than to winning.<\/p>\n\n\n\n<div style=\"height:60px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"has-large-font-size wp-block-heading\">Experiment deployment and results<\/h2>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"has-medium-font-size\">The user tests were conducted in the field by two trained colleagues, who were also responsible for recruiting the participants. Due to time and resource constraints, our study has a very small sample size of ten people. Although my colleagues made an effort to invite a diverse user base, the study sample was still mainly male (60%), young (80% between 21-29 years old), and highly educated (70% are enrolled in universities). When asked how often they use the DB booking systems, the distribution of answers shows that the participants use the website more often than the ticket machine. 
As shown in the figure below, 30% of participants buy tickets on the website at most six times a year, whereas 40% use the ticket machine once a year to book their trips.<\/p>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"551\" height=\"264\" src=\"http:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/nutzungshaufigkeit.png\" alt=\"\" class=\"wp-image-216\" srcset=\"https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/nutzungshaufigkeit.png 551w, https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/nutzungshaufigkeit-300x144.png 300w\" sizes=\"auto, (max-width: 551px) 100vw, 551px\" \/><figcaption>Frequency of use in percentage from weekly (top) to never (bottom).<\/figcaption><\/figure>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"has-medium-font-size\">Ratings from the first and second presented treatments were not significantly different, supporting the success of our randomization scheme. The statistical test used in the analysis was the t-test for related samples from the scipy Python package, which estimates whether the difference between two measurements from the same set of study participants could have happened by chance. To reject the null hypothesis that both average expected values are identical, the test p-value should be smaller than 0.05. <\/p>\n\n\n\n<p class=\"has-medium-font-size\">Because our sample size was so small, the only statistically significant difference found between our metrics was the average number of clicks per task, which was higher for the ticket machine (mean of 30 clicks\/task) than for the website (mean of 24 clicks\/task). Although the difference in the CSAT scores of the two systems is not statistically significant, the percentage of participants satisfied with the website (90%) is higher than with the ticket machine (80%). 
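The related-samples t-test mentioned above is available as scipy.stats.ttest_rel. A sketch with made-up paired ratings, one pair per participant (these numbers are illustrative, not the study's data):

```python
from scipy import stats

# Hypothetical paired satisfaction ratings for the two treatments:
website = [5, 4, 4, 3, 5, 4, 4, 5, 4, 4]
machine = [4, 4, 3, 3, 5, 4, 3, 5, 4, 3]

# Paired t-test: each participant rated both systems.
t_stat, p_value = stats.ttest_rel(website, machine)
if p_value < 0.05:
    print('reject the null hypothesis of equal means')
else:
    print('no significant difference detected')
```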
Moreover, the usability of both systems was above average, given that their SUS scores were higher than 68 (for more on interpreting SUS scores see this <a href=\"https:\/\/measuringu.com\/sus\/\" target=\"_blank\" rel=\"noreferrer noopener\">article by Jeff Sauro<\/a>).<\/p>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-gallery columns-2 is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\"><ul class=\"blocks-gallery-grid\"><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"376\" height=\"264\" src=\"http:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/csat_scores.png\" alt=\"\" data-id=\"217\" data-full-url=\"http:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/csat_scores.png\" data-link=\"http:\/\/tailinhares.tech\/index.php\/2021\/08\/03\/deutsche-bahn-booking-experience\/csat_scores\/\" class=\"wp-image-217\" srcset=\"https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/csat_scores.png 376w, https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/csat_scores-300x211.png 300w, https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/csat_scores-340x240.png 340w\" sizes=\"auto, (max-width: 376px) 100vw, 376px\" \/><\/figure><\/li><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"386\" height=\"264\" src=\"http:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/sus_scores.png\" alt=\"\" data-id=\"218\" data-full-url=\"http:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/sus_scores.png\" data-link=\"http:\/\/tailinhares.tech\/index.php\/2021\/08\/03\/deutsche-bahn-booking-experience\/sus_scores\/\" class=\"wp-image-218\" srcset=\"https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/sus_scores.png 386w, https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/sus_scores-300x205.png 300w\" sizes=\"auto, (max-width: 386px) 100vw, 386px\" 
\/><\/figure><\/li><\/ul><figcaption class=\"blocks-gallery-caption\">Frequency of CSAT and SUS scores in the ticket machine (blue) and website (orange) treatments.<\/figcaption><\/figure>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"has-medium-font-size\">There is a statistically significant, strong positive correlation between the SUS and CSAT scores for both the website and the ticket machine, suggesting that good usability and user satisfaction go hand in hand. However, average task duration and interaction speed only impact the SUS and CSAT ratings when booking tickets on the website. For this treatment the Pearson correlation analysis shows a strong relationship between high interaction speed and high SUS and CSAT scores. Conversely, the longer it takes on average for the participants to complete a task on the website, the lower the SUS and CSAT ratings attributed to the system.<\/p>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-gallery columns-2 is-cropped wp-block-gallery-2 is-layout-flex wp-block-gallery-is-layout-flex\"><ul class=\"blocks-gallery-grid\"><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"386\" height=\"278\" src=\"http:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/speed_vs_csat.png\" alt=\"\" data-id=\"219\" data-full-url=\"http:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/speed_vs_csat.png\" data-link=\"http:\/\/tailinhares.tech\/index.php\/2021\/08\/03\/deutsche-bahn-booking-experience\/speed_vs_csat\/\" class=\"wp-image-219\" srcset=\"https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/speed_vs_csat.png 386w, https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/speed_vs_csat-300x216.png 300w\" sizes=\"auto, (max-width: 386px) 100vw, 386px\" \/><\/figure><\/li><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" 
width=\"382\" height=\"278\" src=\"http:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/speed_vs_sus.png\" alt=\"\" data-id=\"220\" data-full-url=\"http:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/speed_vs_sus.png\" data-link=\"http:\/\/tailinhares.tech\/index.php\/2021\/08\/03\/deutsche-bahn-booking-experience\/speed_vs_sus\/\" class=\"wp-image-220\" srcset=\"https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/speed_vs_sus.png 382w, https:\/\/tailinhares.tech\/wp-content\/uploads\/2021\/08\/speed_vs_sus-300x218.png 300w\" sizes=\"auto, (max-width: 382px) 100vw, 382px\" \/><\/figure><\/li><\/ul><figcaption class=\"blocks-gallery-caption\">Relationship between average interaction speed when deploying tasks and CSAT and SUS ratings.<\/figcaption><\/figure>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"has-medium-font-size\">In conclusion, our study shows <strong>no significant evidence to the hypothesis that \u201cthe user satisfaction is higher with the website because it is more user friendly, it is easier to use, and less stressful\u201d<\/strong>. Furthermore, both DB systems have above average usability and satisfy the users\u2019 needs. The only downside of the ticket machine was the higher average number of clicks, which is understandable due to the interaction interface\u2019s limitations. 
Surprisingly, the participants seem less tolerant of usage difficulties on the website than on the ticket machine.<\/p>\n\n\n\n<p>Note: the EDA data analysis will be done by another colleague and added to this conclusion later for the sake of completeness.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Assessing user satisfaction of online and offline services<\/p>\n","protected":false},"author":1,"featured_media":210,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"footnotes":""},"categories":[2],"tags":[],"class_list":["post-7","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-project"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/tailinhares.tech\/index.php\/wp-json\/wp\/v2\/posts\/7","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tailinhares.tech\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tailinhares.tech\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tailinhares.tech\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tailinhares.tech\/index.php\/wp-json\/wp\/v2\/comments?post=7"}],"version-history":[{"count":14,"href":"https:\/\/tailinhares.tech\/index.php\/wp-json\/wp\/v2\/posts\/7\/revisions"}],"predecessor-version":[{"id":240,"href":"https:\/\/tailinhares.tech\/index.php\/wp-json\/wp\/v2\/posts\/7\/revisions\/240"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/tailinhares.tech\/index.php\/wp-json\/wp\/v2\/media\/210"}],"wp:attachment":[{"href":"https:\/\/tailinhares.tech\/index.php\/wp-json\/wp\/v2\/media?parent=7"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tailinhares.tech\/index.php\/wp-json\/wp\/v2\/categories?post=7"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tailinhares.tech\/index.php\/wp-json\/wp\/v2\/tags
?post=7"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}