Step 1. Select, download, and read 2 of the articles below.
Step 2. Prepare a 1-page review for each article.
- No specific format.
- Should include a brief summary, discussion, insights, implications for you, etc.
- Prepare in a standard document format (Word, PDF).
International Journal of Medical Informatics 129 (2019) 366–373
Journal homepage: www.elsevier.com/locate/ijmedinf

Eye-tracking retrospective think-aloud as a novel approach for a usability evaluation

Hwayoung Cho a,*, Dakota Powell b, Adrienne Pichon b, Lisa M. Kuhns c,d, Robert Garofalo c,d, Rebecca Schnall b

a College of Nursing, University of Florida, Gainesville, FL, United States
b School of Nursing, Columbia University, New York, NY, United States
c Division of Adolescent Medicine, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, United States
d Department of Pediatrics, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States

Keywords: eye movement measurements; eye movements; eye-tracking; mobile applications; mobile health; information technology; health IT; usability evaluation

ABSTRACT

Objective: To report on the use of an eye-tracking retrospective think-aloud for usability evaluation and to describe its application in assessing the usability of a mobile health app.

Materials and Methods: We used an eye-tracking retrospective think-aloud to evaluate the usability of an HIV prevention mobile app among 20 young men (15–18 years) in New York City, NY; Birmingham, AL; and Chicago, IL. Task performance metrics (critical errors, a task completion rate per participant, and a task completion rate per task) were measured. Eye-tracking metrics, including fixation, saccades, time to first fixation, time spent, and revisits, were measured and compared between participants with and without a critical error.

Results: Using task performance analysis, we identified 19 critical errors on four activities, and of those, two activities had a task completion rate of less than 78%. To better understand these usability issues, we thoroughly analyzed participants' corresponding eye movements and verbal comments using an in-depth problem analysis.
In areas of interest created for the activity with critical usability problems, there were significant differences in time spent (p = 0.008), revisits (p = 0.004), and total number of fixations (p = 0.007) between participants with and without a critical error. The overall mean score of perceived usability, rated on the Health IT Usability Evaluation Scale, was 4.64 (SD = 0.33), reflecting strong usability of the app.

Discussion and Conclusion: An eye-tracking retrospective think-aloud enabled us to identify critical usability problems as well as gain an in-depth understanding of the usability issues related to interactions between end-users and the app. Findings from this study highlight the utility of an eye-tracking retrospective think-aloud in consumer health usability evaluation research.

1. Introduction

With the rapid expansion of mobile technology in healthcare [1], it is crucial to ensure that mobile health (mHealth) technologies are usable [2]. Usability is a measure of the quality of an end-user's experience when interacting with a technology [3]. Usability factors are closely linked to the success or failure of a technology, because usability is related to its quality in use [4]. 'Quality in use' is the capability of the software product to enable specified users to achieve specified goals with effectiveness, productivity, safety, and satisfaction in specified contexts of use [5,6]. To ensure quality in use, it is important to assess usability during system development, which helps ensure that the system meets the needs of end-users [2,7,8]. To successfully achieve the goals of the system, it is critical to choose the evaluation techniques that best meet the study aims during the system development process [9].
Usability evaluation methods are broadly classified into expert-based usability testing methods, such as a heuristic evaluation and a cognitive walkthrough, and end-user-based usability testing methods, such as a think-aloud protocol, field observation, interview, focus group, and questionnaire [9–11]. With a particular focus on usability testing with intended end-users in this paper, traditional usability testing most commonly uses a think-aloud protocol [10,12]. Think-aloud protocols are used to identify the cognitive behavior of performing tasks while using technology and to determine how that information is used to facilitate problem resolution [10,13]. Think-aloud protocols are generally categorized into concurrent and retrospective protocols. In a concurrent think-aloud protocol, users are asked to think and talk aloud while performing cognitive tasks; in a retrospective think-aloud protocol, users are asked to recall what they were thinking during a prior experience. Both concurrent and retrospective think-aloud protocols are popular approaches since they provide comprehensive insights into the problems that end-users encounter in their interaction with the system [14]. However, the think-aloud protocol has several limitations. The qualitative information provided by end-users is unstructured, and there are often gaps of silence where end-users are thinking but not verbalizing; as a result, data collection is limited at those times [15].

https://doi.org/10.1016/j.ijmedinf.2019.07.010
Received 27 February 2019; Received in revised form 9 July 2019; Accepted 11 July 2019
* Corresponding author at: University of Florida College of Nursing, 1225 Center Drive, PO Box 100197, Gainesville, FL 32610-0197, United States. E-mail address: [email protected] (H. Cho).
1386-5056/ © 2019 Elsevier B.V. All rights reserved.
Specific to adolescents, studies report that this age group is less likely to articulate their thought processes during a think-aloud protocol [16,17]. Findings from our past work suggest that a traditional think-aloud protocol for assessing the usability of technology with adolescents may not provide sufficient information to identify usability problems [15,18]. To address this gap, eye-tracking technology can be used to assess the usability of new technologies by illuminating decision-making through the examination of eye movement patterns [19–21]. Eye-tracking is the process of measuring the point of gaze and/or the motion of an eye relative to the head, and it has the potential to improve usability assessments by providing valuable ocular data. However, there is a paucity of research on how the eye-tracking method can reach its full potential as a single rigorous usability evaluation method in usability testing of mHealth technology [22]. Prior uses of eye-tracking have not standardized this data, making interpretation of eye-tracking data difficult [20–23]. The purpose of this paper is to report a novel methodological approach, an eye-tracking retrospective think-aloud, for usability evaluation, and to describe its application in assessing the usability of an mHealth app.

1.1. Study context

This study was conducted as part of a larger study to adapt a group-based, theory-driven, manualized HIV prevention curriculum for diverse sexual minority adolescents [24]. We adapted an evidence-based, group-level, face-to-face HIV prevention curriculum onto a mobile platform using an iterative design process [25–27]. The mobile app, Male Youth Pursuing Education, Empowerment & Prevention around Sexuality (MyPEEPS App), delivers HIV prevention information through 21 activities comprising didactic content, graphical reports, videos, and true/false and multiple-choice quizzes.
Upon completing each activity, users are rewarded with a stylized trophy, which is used to promote continued use of the app. A combination of usability evaluation techniques including usability experts as well as intended end-users is recommended [28,29]; therefore, we assessed the usability of the MyPEEPS App from both expert and end-user perspectives [30]. In this paper, we focus on the end-user usability testing utilizing an eye-tracking retrospective think-aloud.

2. Methods

We conducted an eye-tracking retrospective think-aloud to evaluate the usability of the MyPEEPS App. The Institutional Review Board of Columbia University Medical Center served as the central IRB (#AAAQ6500) for this study and approved all research activities.

2.1. Sample

Participants were recruited using flyers, postings on social media, and direct outreach at local community-based organizations in New York City, NY; Birmingham, AL; and Chicago, IL. Our sample comprised 20 young men, since 95% of usability issues are identified with 20 end-users [31]. Inclusion criteria were: 1) 13 to 18 years of age; 2) self-identified as male; 3) male sex assigned at birth; 4) able to understand and read English; 5) living within the metropolitan area of one of the three cities listed above; 6) ownership of a smartphone; 7) sexual interest in men; and 8) self-reported HIV-negative or unknown status. Participants who wore bifocal/progressive glasses or who had undergone eye surgery (e.g., corneal, cataract, intraocular implants) were excluded from participation, since these types of glasses or eye conditions affect the precision of gaze estimation while collecting participants' eye movements [32].

2.2. Procedures

We explained the purpose of the study and the study procedures to the participants, who were then asked to sign an informed consent (18 years old) or assent (13–17 years old) form. Participants were asked to sit down at a desk.
The eye tracker (a Tobii X2-30) was calibrated with a nine-point system in which the participant watched a circle move across the screen and pause at each of nine fixed points. With this moving calibration test, measurement accuracy was within 0.5 degrees, corresponding to an error of less than 0.5 cm between measured and intended gaze points [32]. The resolution of the computer monitor was set to 1920 × 1080 pixels.

First, participants were provided with use case scenarios for the MyPEEPS App and asked to complete the tasks using the app on an iOS simulator running on a Windows desktop computer. The first half of participants were given use case scenario version 1; the remaining half were given use case scenario version 2. Two versions of use case scenarios were used in order to capture representative tasks of the app (e.g., comics, animated videos, true/false questions, and multiple-choice quizzes). Activities that were necessary to navigate the app (e.g., log-in/out, set-up of profile) and activities that were difficult for the first ten participants to complete were included in use case scenario version 2. The tasks associated with each of the use case scenarios are presented in Table 1. iMotions software was used to record participants' eye movements and the computer screen while they performed each task [33]; this software allows researchers to present app screen recordings and synchronized eye-tracking data simultaneously.

Participants were allowed to ask questions before starting the app testing, but once testing began, we encouraged participants to complete all tasks by themselves. Participants were instructed not to turn to the researcher for assistance, because a shift in visual focus increases the risk of losing eye-tracking data [22]. If participants had trouble and were unable to proceed, they were instructed to say 'HELP'.
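The relationship between the reported angular accuracy (0.5 degrees) and the reported on-screen error (less than 0.5 cm) can be checked with simple trigonometry: the linear error is roughly viewing distance × tan(angular accuracy). A minimal sketch, assuming a viewing distance of about 57 cm (the paper does not report the actual distance, so that value is purely illustrative):

```python
import math

def gaze_error_cm(accuracy_deg: float, viewing_distance_cm: float) -> float:
    """On-screen error (cm) between measured and intended gaze points
    for a given angular accuracy, by small-angle trigonometry."""
    return viewing_distance_cm * math.tan(math.radians(accuracy_deg))

# At a hypothetical ~57 cm viewing distance, 0.5 degrees of angular
# accuracy corresponds to just under 0.5 cm of on-screen error,
# consistent with the <0.5 cm figure reported for the calibrated tracker.
error = gaze_error_cm(0.5, 57.0)
```

The same angular accuracy yields a proportionally larger linear error at greater viewing distances, which is one reason conditions that degrade angular precision (e.g., bifocal glasses) were exclusion criteria.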
Following use of the app, participants were asked to describe their experience dealing with errors and their perception of their overall performance. Then participants viewed the recordings of their use of the app, which depicted their eye movements overlaid on the app screen on a computer. Participants were asked to think aloud and verbalize their thoughts about the tasks they completed and the difficulties they encountered while using the app. Participants' verbal comments were audio-recorded. Following the testing of the app, participants were asked to rate the usability of the MyPEEPS App using the Health Information Technology Usability Evaluation Scale (Health-ITUES) [34]. Participants were compensated $40–50, depending on the geographic site, for their time.

2.3. Data collection

Eye-tracking data were collected using the Tobii X2-30 [35], which has a sampling rate of 30 Hz (i.e., 30 gaze points were collected per second for each eye), and saved into iMotions software [33]. Table 2 lists the task performance metrics collected to capture usability problems by examining how capable participants were at using the MyPEEPS App on given tasks (i.e., a task completion rate was calculated in two ways: by participant and by task) [4,36], and the eye-tracking metrics collected for an in-depth analysis of usability problems. All survey data were collected electronically using Qualtrics survey software [39]. Demographics and mobile technology use were assessed through questions designed by our research team on age, race, ethnicity, frequency of using mobile devices or a laptop/desktop to access the Internet, and duration of using mobile apps on a smartphone.
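The vendor software derives fixations and saccades from the raw 30 Hz gaze stream internally. One standard way such a derivation can work is a dispersion-threshold (I-DT style) algorithm, sketched below. This is an illustration only: the thresholds, names, and logic here are conventional assumptions, not the study's actual (proprietary) iMotions/Tobii processing pipeline.

```python
# Illustrative dispersion-threshold (I-DT style) fixation detection.
# Thresholds are conventional defaults, not values from the study:
# 30 px dispersion, 100 ms minimum duration at a 30 Hz sampling rate
# (100 ms ~ 3 samples), matching the 100-300 ms fixation range cited.

def detect_fixations(samples, max_dispersion=30.0, min_duration_ms=100.0,
                     sample_rate_hz=30.0):
    """samples: list of (x, y) gaze points in pixels, one per 1/30 s.
    Returns fixations as (start_index, end_index, centroid_x, centroid_y)."""
    def dispersion(win):
        xs = [p[0] for p in win]
        ys = [p[1] for p in win]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    min_samples = max(1, int(min_duration_ms / 1000.0 * sample_rate_hz))
    fixations = []
    i, n = 0, len(samples)
    while i <= n - min_samples:
        j = i + min_samples
        if dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window while the points stay tightly clustered.
            while j < n and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            xs = [p[0] for p in samples[i:j]]
            ys = [p[1] for p in samples[i:j]]
            fixations.append((i, j - 1, sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j
        else:
            i += 1
    return fixations
```

Gaps between consecutive fixations correspond to saccades; metrics such as time to first fixation, time spent, and revisits for an area of interest can then be computed from the fixation list.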
Data on perceived usability were collected using the Health-ITUES [34], a customizable questionnaire with a four-factor structure (system impact, perceived usefulness, perceived ease of use, and user control) that has been validated for use with mHealth technology [40]. The Health-ITUES consists of 20 items rated on a five-point Likert scale from strongly disagree (1) to strongly agree (5). A higher scale value indicates higher perceived usability of the technology. Table 3 lists the 20 items on the Health-ITUES and how they were customized for this study.

2.4. Data analysis

Data analysis was based on the iMotions video-recordings of user sessions synchronized with eye movements, and on transcriptions of participants' verbal comments from the audio-recordings collected during the think-aloud. Two research team members reviewed the transcripts to identify common usability concerns; a third reviewer was consulted in instances of discrepancy. STATA SE 14 was used for analysis of descriptive statistics [41].

Data analysis focused on: 1) task performance analysis of task performance metrics, and 2) problem analysis of eye-tracking metrics and participants' verbal comments. Since the average task completion rate in the literature (i.e., an analysis of nearly 1200 usability tasks) is 78% [42], any task with a task completion rate of less than 78% was identified as a problem. In the problem analysis, the eye-tracking metrics, including time to first fixation, time spent, revisits, and total number of fixations, were compared between participants with and without a critical error using a two-sample t-test. The level of significance was set at alpha = 0.05.

3. Results

3.1. Sample

The mean age of study participants was 17.4 years (SD = 0.88;

Table 1. Tasks included in use case scenarios.
Task (use case scenario version):
- Log-in to the MyPEEPS App (versions I, II)
- Collect the trophy from activity #1 Set Up MyPEEPS Profile (versions I, II). [Set-up] Introduction to the app explaining what the user is to expect. The user inputs name, telephone number, e-mail address, and how they prefer to get notifications.
- Collect the trophy from activity #2 BottomLine (versions I, II). [Select from options] Users are asked the farthest they will go with a one-time hookup in a number of sexual scenarios and given a selection of responses about what they will and won't do and how they will do it.
- Collect the trophy from activity #3 Underwear Personality Quiz (version I). [Sliders] Users complete a personality quiz and are introduced to the avatars that they will be seeing in the app. Avatars' personality traits and identities are shared with 'gossip'.
- Collect the trophy from activity #4 My Bulls-I (version I). [Text input] Users are asked to think about their important identity traits and create a list of their top five identity traits after seeing an example of the activity done by one of the app avatars, P.
- Collect the trophy from activity #5 P's On-Again Off-Again BottomLine (version I). [Video, select from options] Video of a text conversation between two avatars, P and Nico, about P's new relationship and P ignoring his BottomLine. Users are asked to complete questions about why P should be concerned about his BottomLine with a new partner. There are two videos with two sets of questions.
- Collect the trophy from activity #6 Sexy Settings (version I). [Select from options] Users are presented with a setting in which sex could be taking place, are given one potential threat to a BottomLine, and are asked to select another potential threat for the given setting.
- Collect the trophy from activity #7 Goin' Downhill Fast (version I). [Click through information, select from options] Users are presented with information about drugs and alcohol and how they can affect a BottomLine.
Resources for additional information about drugs/alcohol are provided. After reading through the information, users complete a set of questions about drugs/alcohol's potential impact on their BottomLine.
- Collect the trophy from activity #8 Step Up, Step Back (version I). [Select from options] Users are introduced to identity traits that may identify them as a VIP (privileged) or non-VIP (non-privileged) and are then asked a series of identity-related questions. An avatar representing the user moves back and forth in a line for a night club, relative to the avatars in the app, as questions are answered.
- Collect the trophy from activity #9 HIV True/False (version I). [True/False button answer] Users complete a series of true/false questions related to HIV, with information following a correct answer.
- Collect the trophy from activity #10 Checking In On Your BottomLine (version I). [Select from options] Users are given the opportunity to review and make changes to their BottomLine, taking into consideration any information that they may have learned from completing the activities prior to this check-in.
- Collect the trophy from activity #13 Well Hung (version I). [Drag and drop] Users are introduced to the association of HIV transmission risk with different sexual behaviors categorized into no, low, medium, and high risk. Users complete an activity dragging and dropping a given sexual activity onto the risk category associated with the sex act.
- Collect the trophy from activity #15 Checking In On Your BottomLine Again (version II). [Select from options] Users are again given the opportunity to review and make changes to their BottomLine, taking into consideration any information that they may have learned from completing the activities prior to this check-in.
- Collect the trophy from activity #17 4 Ways To Manage Stigma (version II). [Click through, select from options] Users are presented with four stigma management strategies, then a scene for each of the four app avatars, and are asked which strategy each character is using in the scene.
- Collect the trophy from activity #18 Rubber Mishap (version I). [Shaking select from options] Users are asked to complete a series of questions relating to condom usage as the screen shakes to mimic being under the influence of drugs/alcohol.
- Collect the trophy from activity #19 Get a Clue! (version II). [Shake device situation builder] Jumbled scenarios are created using either a shake of the phone or the press of a button. Users answer from given options how they would act in the scenario, keeping the BottomLine and communication strategies in mind.
- Collect the trophy from activity #20 Last Time Checking In On Your BottomLine (version II). [Select from options] Users are again given the opportunity to review and make changes to their BottomLine, taking into consideration any information that they may have learned from completing the activities prior to this check-in.
- Collect the trophy from activity #21 BottomLine Overview (version II). [View list of changes] Users are presented with a list of their BottomLine selections since the initial activity and subsequent check-ins.
- View settings (versions I, II)
- Log Out (versions I, II)

range = 15–18 years of age). 45% (N = 9) of participants self-identified as White, 20% (N = 4) as African American, and 10% (N = 2) as Asian; 45% (N = 9) of participants self-identified as Hispanic. 85% of participants (N = 17) reported using the Internet almost constantly every day. The majority of participants (85%) reported using mobile devices as opposed to a laptop/desktop (15%) to access the Internet.
The mean duration of participants' use of mobile apps on a smartphone was 9.40 h per day (SD = 5.52).

3.2. Eye-tracking retrospective think-aloud

The visit took between 2 and 2.5 h. Before watching the recordings displaying their eye movements, participants described their experience dealing with errors and their perception of their task performance. More than half of the participants who had difficulty completing tasks (e.g., participants who said 'HELP' during the app testing) stated, 'Everything was okay', 'It was pretty easy', or 'I didn't have any difficulties' until they viewed their eye movements on an app screen page where they encountered the difficulty.

3.2.1. Task performance analysis

3.2.1.1. Critical error. A total of 19 critical errors were identified across four activities: #2 BottomLine, #5 P's On-Again Off-Again BottomLine, #8 Step Up, Step Back, and #13 Well Hung. The number of critical errors for these activities is presented in Table 4.

3.2.1.2. Task completion rate per participant. The percentage of tasks completed without a critical error by a participant ranged from 79% to 100%. Six participants successfully completed all tasks without any critical error.

3.2.1.3. Task completion rate per task. The percentage of participants who completed each task without a critical error ranged from 45% to 100%. Two tasks had a task completion rate of less than 78% [42]: in our study, the tasks related to activity #2 BottomLine (70%) and activity #13 Well Hung (45%).

3.2.1.4. Summary of task performance analysis. Two activities with critical errors were identified through task performance analysis: #5 P's On-Again Off-Again BottomLine and #8 Step Up, Step Back. These two activities were reported by participants as a user error (e.g., they closed the app screen by mistake while they were

Table 2. Task performance and eye-tracking metrics.
Task performance metrics:
- Critical error: number of critical errors (in this study, if a participant said 'HELP' during the app testing, it was counted as a critical error).
- Task completion rate per participant: percentage of tasks that were completed without a critical error by a participant.
- Task completion rate per task: percentage of participants who completed a given task without a critical error.

Eye-tracking metrics:
- Fixation: moments when the eyes are relatively stationary, indicating moments when the brain is processing information received by the eyes. Fixations generally range from 100 to 300 milliseconds. Longer fixations on a specific area reflect a participant's difficulty with information processing [22,37].
- Saccades: rapid eye movements from one target to another between two consecutive fixations [38].
- Time to first fixation: amount of time it took a participant to look at a specific area from stimulus onset [37].
- Time spent: amount of time that a participant spent looking at a specific area.
- Revisit: number of times that a participant repeatedly viewed a specific area.

Table 3. Health-ITUES (customized for this study).

System Impact
1. MyPEEPS is a positive addition to my sexual health.
2. MyPEEPS helps me make safe decisions when it comes to sex and relationships.
3. MyPEEPS gives me the information and skills I need to avoid situations that make me uncomfortable and that put my sexual health at risk from HIV or other STIs.

Perceived Usefulness
4. Using MyPEEPS makes it easier to make safer decisions about my sexual health.
5. Using MyPEEPS allows me to make safer decisions about my sexual health more quickly.
6. Using MyPEEPS makes me more likely to make safer decisions about my sexual health.
7. MyPEEPS is useful for making safer decisions about my sexual health.
8. I think MyPEEPS presents a more open-minded process for learning about my sexual health.
9. I am satisfied with MyPEEPS for making safer decisions about my sexual health.
10. I make safer decisions about my sexual health in a timely manner because of MyPEEPS.
11. Using MyPEEPS lowers my risk of getting HIV.
12. I am able to find the information I need about sexual health and HIV whenever I use MyPEEPS.

Perceived Ease of Use
13. I am comfortable with my ability to use MyPEEPS.
14. Learning to operate MyPEEPS is easy for me.
15. I have the skills to use MyPEEPS.
16. I find MyPEEPS easy to use.
17. I remember how to log on to and use MyPEEPS.

User Control
18. MyPEEPS gives error messages that clearly tell me how to fix problems.
19. Whenever I make a mistake using MyPEEPS, I recover easily and quickly.
20. The information (such as on-line help, on-screen messages, and other documentation) provided with MyPEEPS is clear.

Table 4. Critical errors.
#2 BottomLine: 6
#5 P's On-Again Off-Again BottomLine: 1
#8 Step Up, Step Back: 1
#13 Well Hung: 11
Total number of critical errors: 19

reading contents), were reviewed and determined to be non-usability-related problems by two research team members, and were excluded from the problem analysis. There were also two activities with critical errors, identified via the task performance analysis as having a task completion rate of less than 78%: #2 BottomLine and #13 Well Hung. These two activities were thoroughly reviewed and analyzed using eye-tracking data and participants' verbal comments, and they were included in the problem analysis.

3.2.2. Problem analysis

3.2.2.1. Problem 1: #2 BottomLine; navigating the Map after completing the prior activity #1.

Task description: Within the MyPEEPS App, a total of 21 activities are displayed along a virtual 'Map'. One activity at a time is shown on the smartphone screen, in consecutive order on the Map. The user begins each activity by clicking the activity's number in a circle or its name in a box.
Upon completing each activity, the user is taken back to the Map, which shows the number and name of the activity just completed. To navigate to the next activity, the user needs to scroll or swipe to the left on the Map.

Problem description: Participants were confused about moving forward to the second activity, #2 BottomLine, on the Map after completing the very first activity, #1 Set Up MyPEEPS Profile, since they expected to view the next activity by default.

Quotations: "This is the part where I was confused. I didn't understand that I should move to the side. I didn't know. I kept clicking number one because I thought that was where I had to go and then it would just take me back … there should be instructions or something or like a hint … like arrows." [UMP07] "I am trying to figure out how to … I think that would be really helpful if it just went automatically over. I meant I want to see the next one automatically right after I completed the previous one. Otherwise, I cannot remember if I did or not." [UMP03]

Gaze plots: Based on participants' fixations and saccades, gaze plots depicting fixation sequences were generated in conjunction with Problem 1. The gaze plots were compared between participants with and without a critical error. The number of fixations on Problem 1 ranged from 19 (without a critical error) to more than 200 (with a critical error). A sample of gaze plots with/without a critical error is presented in Fig. 1(1) and (2).

3.2.2.2. Problem 2: #13 Well Hung; drag/drop response option.

Task description: The user is introduced to the association of HIV transmission risk with different sexual behaviors categorized into 'no risk', 'low', 'medium', and 'high risk'. The user completes an activity (i.e., quizzes) dragging and dropping a given sexual activity onto the four risk categories associated with the sex act.
For instance, the user drags a card labeled with the sexual activity down to the corresponding risk level, then selects the 'Next' button to continue in the activity. In order to see the 'Next' button, the user needs to scroll down.

Problem description: Participants were confused about the drag/drop response option on the quizzes. Several participants tried to figure it out either by dragging the sexual activity card down to the risk category or by dragging the risk category up to the sexual activity card, since there was no feedback on whether their selected response was correct or incorrect unless they clicked the 'Next' button.

Quotations: "Didn't I have to drag something? That's what was confusing. I felt it should have just been a click. Then, even I didn't know there was a next button at the bottom. I couldn't move forward." [UMP005] "I didn't know what to do. I figured it out but I didn't know if I had to click it or drag it. I don't know … I was expecting it to be like an empty line. I was expecting it just to be a line, empty, and then I would drag the answers into the clips. I was expecting the clothing line clips to be empty because you see how there are four and it says high, medium, low, or no risk … I was expecting it to be like a spectrum and I would drag the answers into the line depending where they fell." [UMP07]

Heat maps: Heat maps are static aggregations of gaze fixations revealing the distribution of visual attention; they represent, in different colors, where participants concentrated their gaze and how long they gazed at a given point [22,43]. Red areas on a heat map reflect a high number of gaze fixations, while yellow and green areas indicate fewer gaze fixations. The heat maps were compared among app pages with/without a critical error.
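The aggregation behind a heat map of this kind can be sketched as binning fixation points into a grid over the screen: high-count cells would render red, low-count cells green or yellow. This is a simplified illustration only; the actual iMotions rendering (smoothing, color scale) is proprietary, and the grid dimensions here are arbitrary assumptions. The screen size follows the paper's 1920 × 1080 setting.

```python
# Minimal sketch of heat-map aggregation: fixation centroids are binned
# into a coarse grid over the 1920x1080 screen. Cells with many
# fixations would be drawn "hot" (red); cells with few, green/yellow.
# Grid dimensions are arbitrary illustrative choices.

def heatmap_counts(fixations, screen_w=1920, screen_h=1080,
                   grid_w=48, grid_h=27):
    """fixations: list of (x, y) fixation centroids in pixels.
    Returns a grid_h x grid_w grid (list of lists) of fixation counts."""
    grid = [[0] * grid_w for _ in range(grid_h)]
    for x, y in fixations:
        col = min(int(x / screen_w * grid_w), grid_w - 1)
        row = min(int(y / screen_h * grid_h), grid_h - 1)
        grid[row][col] += 1
    return grid
```

Comparing two such grids, one from sessions with a critical error and one from sessions without, mirrors the page-level heat map comparison described above.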
For instance, a participant who successfully completed this task without difficulty would see the first given sexual activity card, drag the card down to the correct (risk level) answer 'lower', and then click the 'Next' button. While several participants had these difficulties on the first page of the quizzes, no one had difficulties on the remaining pages. For this reason, the heat maps for every page within the activity #13 Well Hung were compared with the first page. Fig. 2(1) depicts a heat map of the first page without a critical error, while Fig. 2(2) depicts that of the page with a critical error.

Areas of interest: Areas of interest refer to specific areas in the interface that are of interest to researchers. Given that a participant with a critical error on the first page of the quizzes did not experience any critical error on the remaining quiz pages, a total of eight areas of interest on the first page of the quizzes were created to compare time to first fixation, time spent, revisits, and total number of fixations between participants with and without a critical error. On one area of interest, 'lower' (i.e., the card corresponding to the correct answer for the first quiz), there were significant differences in time spent (p = 0.008), revisits (p = 0.004), and total number of fixations (p = 0.007) between participants with and without a critical error.

Fig. 1. (1) Gaze plot without a critical error. (2) Gaze plot with a critical error.

3.2.3. Perceived usability

Perceived usability was rated using the Health-ITUES (Table 5) [34]. The overall Health-ITUES score was the mean of all the items with each item weighted equally; in this study it was 4.64 (SD = 0.33), with a range of 3.80–5.00, reflecting strong usability of the app.

4.
4. Discussion

In this study we successfully used an eye-tracking retrospective think-aloud to conduct a comprehensive usability evaluation of an mHealth app with adolescents. We identified two critical usability problems through participants' eye movements and verbal comments. Eye-tracking data and qualitative data were integrated to provide a more holistic understanding of the usability issues with the mHealth app. In Table 6, our methodological approach is briefly compared with a traditional stand-alone usability testing method, the think-aloud-only protocol used in the literature [11,44]. Our findings demonstrate the usefulness of an eye-tracking retrospective think-aloud for usability evaluations. While a concurrent think-aloud is the predominant data collection method for traditional usability testing [45], it suffers from notable shortcomings, such as distracting end-users' attention or negatively affecting their natural task performance when incorporated with other techniques/technologies [22,46–49]. In contrast, an eye-tracking retrospective think-aloud allowed participants to avoid interference during the usability evaluation [22,48,49], which is an especially important strength of our methodological approach. Findings from this study showed that the use of an eye-tracking retrospective think-aloud for a usability evaluation allows participants to share their real-time experience using the app and stimulates verbal expression of that experience, which is more difficult to achieve using traditional stand-alone usability testing [14,15]. Our study participants had difficulty successfully completing some tasks but could only explain these issues at a descriptive level until they reviewed a recording of their task performance depicting their eye movements overlaid on the app screen on a computer.
For example, we asked participants to describe their experience dealing with errors and their perception of their task performance right after they had completed the tasks. Despite the difficulties our participants encountered, more than half of the participants briefly commented, 'Everything was okay', 'It was pretty easy', or 'I didn't have any difficulties'. On the other hand, while watching the screen-recording presenting their unusual eye movements, the participants expressed the difficulties they had during the app testing. This suggests that traditional stand-alone usability testing among youth may underestimate problems, reflecting social desirability among some adolescents. By showing the screen-recordings with the eye-tracking data, we were able to explore participants' reason(s) for their eye movements and their challenges using the app. A previous study on the usability of mHealth apps among adolescents reported difficulty in capturing the adolescents' verbalizations in a think-aloud protocol [15,18,50]. Other existing evidence also suggests that in think-aloud protocols, some adolescents did not clearly discuss their difficulties in finding a solution [46]. Therefore, our findings suggest that it can be beneficial to show eye-tracking data during a retrospective think-aloud to elicit rich comments, including usability-problem-related comments, from adolescents.

Fig. 2. (1) Heat map without a critical error. (2) Heat map with a critical error.

Table 5
Health-ITUES scores.*
Health-ITUES Construct | Mean (SD) | Median (range)
System Impact | 4.80 (0.38) | 5.00 (3.67–5.00)
Perceived Usefulness | 4.63 (0.40) | 4.78 (3.56–5.00)
Perceived Ease of Use | 4.76 (0.33) | 4.90 (4.00–5.00)
User Control | 4.28 (0.74) | 4.33 (2.00–5.00)
Overall Health-ITUES Score | 4.64 (0.33) | 4.75 (3.80–5.00)
* Rating score from 1 (worst) to 5 (best); 20 items.
The application of a comprehensive usability evaluation method, an eye-tracking retrospective think-aloud, enabled us to gain a better understanding of the usability issues of a mobile app. In our study, the eye-tracking data were illustrated using gaze plots and aggregated heat maps in addition to areas of interest. In gaze plots, which trace participants' eye movements by representing the sequence of fixations and saccades in the form of a scan path, the eye-tracking data were presented with circles and lines. By comparing the gaze plots among participants with/without a critical error, we identified a specific app page within each activity. Heat maps aggregating the fixations revealed, using colors, which parts were looked at most frequently. In the heat maps created for our study, if there were no critical usability problems, the area of each quiz page holding its correct answer was displayed in red, indicating participants' high visual attention. Also, the eight areas of interest (a sexual activity card, four risk categories, and the 'Prev', 'Next', and 'Back' buttons) created to compare eye-tracking metrics among participants with/without a critical error showed significant differences in time spent, revisits, and number of fixations. Findings from our study demonstrate that eye-tracking data indicating differences between participants with/without a critical error can help capture usability problems that end-users cannot recognize right away, monitor the specific areas where they encountered difficulties, and allow researchers to make further inferences about end-users' actual cognitive processes. Our work highlights that the use of eye-tracking data can provide researchers with a rich representation and an in-depth understanding of the end-users' experience participating in usability testing. The eye-tracking retrospective think-aloud approach was time-consuming.
For example, upon completing the tasks employing a use case scenario, participants were encouraged to think aloud retrospectively and asked to verbalize their thoughts about the tasks while watching a recording of their use of the app that depicted their eye movements overlaid on the app screen. The process took a significant amount of time (i.e., between 2 and 2.5 h). Moreover, our approach using eye-tracking technology required additional time and researchers' technical and extensive analysis skills. The eye tracker (i.e., Tobii X2-30) and software (i.e., iMotions) are very costly, which may limit others' ability to access this technology. Given the benefits of the eye-tracking retrospective think-aloud method, however, the use of eye-tracking technology (device and software) is highly recommended for a usability evaluation. The Health-ITUES was used as a measure of usability; it has been validated for use with mHealth technology [40]. Although several usability problems were identified in this study, the overall Health-ITUES mean score (i.e., the mean of all 20 items; a higher score indicates higher perceived usability of the technology) was as high as 4.64 (5 = best). Nearly 95% of teens in the US ages 13–17 own or have access to a smartphone [51]. Given that 85% of our participants reported using mobile devices constantly every day, and that the mean duration of their use of mobile apps per day was 9.40 h, the high usability score of the MyPEEPS App may be because our participants could easily resolve problems while using the app and/or quickly learn how to use the new app, as they are largely heavy smartphone users. In our study, participants who had difficulty on the first page of quizzes no longer had difficulty on the remaining pages within the same activity. End-users who perceive the mHealth app to be useful may be more likely to show an improvement in its impact on their everyday lives [52], which is another strength of our study.
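The overall score computation described above (an equally weighted mean of twenty 1–5 item ratings) amounts to a one-liner; the `health_itues_overall` helper and the example ratings below are hypothetical, not the published scoring script.

```python
def health_itues_overall(item_scores):
    """Overall Health-ITUES score: the unweighted mean of all item
    ratings, each on a 1 (worst) to 5 (best) scale."""
    if not all(1 <= s <= 5 for s in item_scores):
        raise ValueError("each item must be rated 1-5")
    return sum(item_scores) / len(item_scores)

# Hypothetical 20-item response: ten items rated 5 and ten rated 4.
score = health_itues_overall([5] * 10 + [4] * 10)  # 4.5
```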
4.1. Limitations

The generalizability of the results may be limited by the study sample, who live in the metropolitan areas of New York City, NY; Chicago, IL; and Birmingham, AL. Results may differ in other groups who live in rural areas. We employed an iOS simulator on the computer so that the mobile app could be used in the same manner as on a smartphone; therefore, there may be differences in end-users' experience when interacting with the app on a computer. Additionally, it was time-consuming to collect and analyze data through a retrospective think-aloud with an eye-tracking technique, as compared to a traditional stand-alone usability testing method.

5. Conclusions

In this paper, we presented a methodological approach of an eye-tracking retrospective think-aloud and its application in evaluating the usability of an HIV prevention mobile app intended for diverse sexual minority young men. Our approach enabled us to identify critical usability problems as well as gain an in-depth understanding of the usability issues related to interactions between end-users and the MyPEEPS App. Findings from this study highlight the utility of an eye-tracking retrospective think-aloud to enhance end-user usability testing of an mHealth app. Our methodological approach may encourage other researchers who design/develop mHealth apps for adolescents to conduct comprehensive usability evaluations in a collaborative manner, utilizing an eye-tracking technique with highly skilled usability experts, in future research.

Declaration of Competing Interest

The authors declare that they have no conflicts of interest in the research.

Table 6
Comparison with traditional stand-alone usability testing method.
Criterion | Think-aloud only (mostly concurrent) [11,44] | Eye-tracking retrospective think-aloud (retrospective think-aloud / eye-tracking)
Benefit | Direct insights into end-users' thoughts and strategies during the task performance | Deep insights into end-users' behavior related to the identified usability problems; objective eye movements of the identified usability problems; holistic understanding of the usability issues
Measurement | Time for task performance | Time for task performance; critical errors / eye fixations; saccades; time to first fixation; time spent; revisits
Needed users | 3+ | 20+
Required users' skills during testing (particularly for adolescents) | High (unnatural/distracting/strenuous to think and talk aloud at the same time) | Low / Low
Required equipment | Low (audio recorder) | Low (audio recorder) / High (eye-tracking device and software)
Required time for data collection | Medium | High / Low
Required time for data analysis | Medium | Medium / High
Required expertise | Medium | High / High

Acknowledgements

This research was supported by the National Institute of Minority and Health Disparities of the National Institutes of Health (NIH) under award number U01MD11279 (MPI: RS and RG). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

References

[1] U.S. Food and Drug Administration, Mobile Medical Applications, (2018) https://www.fda.gov/MedicalDevices/DigitalHealth/MobileMedicalApplications/default.htm.
[2] W. Brown 3rd, P.Y. Yen, M. Rojas, R. Schnall, Assessment of the Health IT Usability Evaluation Model (Health-ITUEM) for evaluating mobile health (mHealth) technology, J. Biomed. Inform. 46 (6) (2013) 1080–1087, https://doi.org/10.1016/j.jbi.2013.08.001.
[3] A. Abran, A. Khelifi, W. Suryn, A. Seffah, Usability meanings and interpretations in ISO standards, Softw. Qual. J.
11 (4) (2003) 325–338, https://doi.org/10.1023/a:1025869312943.
[4] J. Nielsen, Usability Engineering, Elsevier, 1994.
[5] ISO/IEC 9126, Software Engineering - Product Quality, (2001).
[6] ISO 9241-11, Ergonomic Requirements for Office Work With Visual Display Terminals (VDTs) - Part 11: Guidance on Usability, (1998).
[7] R. Louho, M. Kallioja, P. Oittinen, Factors affecting the use of hybrid media applications, Graphic Arts in Finland 35 (3) (2006) 11–21.
[8] P. Ziemba, J. Wątróbski, A. Karczmarczyk, J. Jankowski, W. Wolski, Integrated approach to e-commerce websites evaluation with the use of surveys and eye tracking based experiments, 2017 Federated Conference on Computer Science and Information Systems (FedCSIS) (2017) 1019–1030, https://doi.org/10.15439/2017F320.
[9] H. Cho, P.-Y. Yen, D. Dowding, J.A. Merrill, R. Schnall, A multi-level usability evaluation of mobile health applications: a case study, J. Biomed. Inform. 86 (2018) 79–89, https://doi.org/10.1016/j.jbi.2018.08.012.
[10] M.W.M. Jaspers, A comparison of usability methods for testing interactive health technologies: methodological aspects and empirical evidence, Int. J. Med. Inform. 78 (5) (2009) 340–353, https://doi.org/10.1016/j.ijmedinf.2008.10.002.
[11] A. Holzinger, Usability Engineering Methods for Software Developers, (2005).
[12] M.J. Van Den Haak, M.D.T. De Jong, P.J. Schellens, Retrospective vs. concurrent think-aloud protocols: testing the usability of an online library catalogue, Behav. Inf. Technol. 22 (5) (2003) 339–351, https://doi.org/10.1080/0044929031000.
[13] P.-Y. Yen, S. Bakken, A comparison of usability evaluation methods: heuristic evaluation versus end-user think-aloud protocol, an example from a web-based communication tool for nurse scheduling, AMIA Annual Symposium Proceedings (2009) 714.
[14] M.J. van den Haak, M.D.T. de Jong, Exploring two methods of usability testing: concurrent versus retrospective think-aloud protocols, IEEE International Professional Communication Conference (2003), https://doi.org/10.1109/IPCC.2003.1245501.
[15] L. Cooke, E. Cuddihy, Using eye tracking to address limitations in think-aloud protocol, IPCC 2005 Proceedings, International Professional Communication Conference (2005) 653–658, https://doi.org/10.1109/IPCC.2005.1494236.
[16] A. Donker, P. Markopoulos, A comparison of think-aloud, questionnaires and interviews for testing usability with children, (2002) 305–316.
[17] G.H. Seng, The effects of think-aloud in a collaborative environment to improve comprehension of L2 texts, The Reading Matrix 7 (2) (2007).
[18] B. Sheehan, Y. Lee, M. Rodriguez, V. Tiase, R. Schnall, A comparison of usability factors of four mobile devices for accessing healthcare information by adolescents, Appl. Clin. Inform. 3 (4) (2012) 356–366, https://doi.org/10.4338/aci-2012-06-ra-0021.
[19] L. Cooke, Is eye tracking the next step in usability testing?, 2006 IEEE International Professional Communication Conference (2006) 236–242, https://doi.org/10.1109/IPCC.2006.320355.
[20] R.J. Jacob, K.S. Karn, Eye tracking in human-computer interaction and usability research: ready to deliver the promises, in: The Mind's Eye, Elsevier, 2003, pp. 573–605.
[21] L. Lorigo, M. Haridasan, H. Brynjarsdóttir, L. Xia, Eye tracking and online search: lessons learned and challenges ahead, J. Am. Soc. Inf. Sci. Technol. 59 (7) (2008) 1041–1052, https://doi.org/10.1002/asi.20794.
[22] O. Asan, Y. Yang, Using eye trackers for usability evaluation of health information technology: a systematic literature review, JMIR Hum. Factors 2 (1) (2015) e5, https://doi.org/10.2196/humanfactors.4062.
[23] E.M. Kok, H. Jarodzka, Before your very eyes: the value and limitations of eye tracking in medical education, Med. Educ. 51 (1) (2017) 114–122, https://doi.org/10.1111/medu.13066.
[24] M.A. Hidalgo, L.M. Kuhns, A.L. Hotton, A.K. Johnson, B. Mustanski, R. Garofalo, The MyPEEPS randomized controlled trial: a pilot of preliminary efficacy, feasibility, and acceptability of a group-level, HIV risk reduction intervention for young men who have sex with men, Arch. Sex. Behav. 44 (2) (2015) 475–485, https://doi.org/10.1007/s10508-014-0347-6.
[25] R. Schnall, L. Kuhns, M. Hidalgo, et al., Development of MyPEEPS mobile: a behavioral health intervention for young men, Stud. Health Technol. Inform. 250 (2018) 31.
[26] R. Schnall, L. Kuhns, K. Bullock, et al., Participatory end-user feedback to update MyPEEPS: a theory-driven evidence based intervention for YMSM, American Public Health Association 2018 Annual Meeting & Expo (2018).
[27] R.K.L. Schnall, M. Hidalgo, D. Powell, J. Thai, C. Pearson, S. Hirshfield, J. Bruce, M. Ignacio, A. Radix, U. Belkind, R. Garofalo, Adaptation of a group-based, HIV risk reduction intervention to a mobile app for young sexual minority men, AIDS Educ. Prev. 30 (6) (2018).
[28] C. Rusu, S. Roncagliolo, V. Rusu, C. Collazos, A Methodology to Establish Usability Heuristics, (2011).
[29] A. Solano, C.A. Collazos, C. Rusu, H.M. Fardoun, Combinations of methods for collaborative evaluation of the usability of interactive software systems, Advances in Human-Computer Interaction 2016 (2016).
[30] H. Cho, D. Powell, A. Pichon, et al., A mobile health intervention for HIV prevention among racially and ethnically diverse young men: usability evaluation, JMIR Mhealth Uhealth 6 (9) (2018) e11450, https://doi.org/10.2196/11450.
[31] L. Faulkner, Beyond the five-user assumption: benefits of increased sample sizes in usability testing, Behav. Res. Methods Instrum. Comput.
35 (3) (2003) 379–383.
[32] Tobii Technology, Stockholm, Sweden, (2016) http://www.tobiipro.com/product-listing/tobii-pro-x2-60/.
[33] iMotions Biometric Research Platform 6.0, iMotions A/S, Copenhagen, Denmark [program], (2016).
[34] P.Y. Yen, D. Wantland, S. Bakken, Development of a customizable Health IT Usability Evaluation Scale, AMIA Annual Symposium Proceedings (2010) 917–921.
[35] iMotions, Tobii X2-30, https://imotions.com/tobii-x2-30/.
[36] A.L. Russ, J.J. Saleem, Ten factors to consider when developing usability scenarios and tasks for health information technology, J. Biomed. Inform. 78 (2018) 123–133, https://doi.org/10.1016/j.jbi.2018.01.001.
[37] A. Poole, L. Ball, Eye tracking in human-computer interaction and usability research: current status and future prospects, in: C. Ghaoui (Ed.), Encyclopedia of Human Computer Interaction, IGI Global, 2005.
[38] M.-L. Lai, M.-J. Tsai, F.-Y. Yang, et al., A review of using eye-tracking technology in exploring learning from 2000 to 2012, Educ. Res. Rev. 10 (2013) 90–115, https://doi.org/10.1016/j.edurev.2013.10.001.
[39] Qualtrics, Provo, Utah, USA [program], (2005).
[40] R. Schnall, H. Cho, J. Liu, Health Information Technology Usability Evaluation Scale (Health-ITUES) for usability assessment of mobile health technology: validation study, JMIR Mhealth Uhealth 6 (1) (2018) e4, https://doi.org/10.2196/mhealth.8851.
[41] StataCorp, Stata Statistical Software: Release 14, StataCorp LP, College Station, TX, 2015.
[42] J. Sauro, What is a good task-completion rate?, (2011) https://measuringu.com/task-completion/.
[43] O. Špakov, D.
Miniotas, Visualization of eye gaze data using heat maps, Electron. Electr. Eng. (2007) 55–58.
[44] A. Fernandez, E. Insfran, S. Abrahão, Usability evaluation methods for the web: a systematic mapping study, Inf. Softw. Technol. 53 (8) (2011) 789–817, https://doi.org/10.1016/j.infsof.2011.02.007.
[45] J. Nielsen, T. Clemmensen, C. Yssing, Getting access to what goes on in people's heads?: reflections on the think-aloud technique, Proceedings of the Second Nordic Conference on Human-Computer Interaction (2002) 101–110.
[46] J.L. Branch, Investigating the information-seeking processes of adolescents: the value of using think alouds and think afters, Libr. Inf. Sci. Res. 22 (4) (2000) 371–392.
[47] J. Preece, Y. Rogers, H. Sharp, Interaction Design: Beyond Human-Computer Interaction, John Wiley & Sons, 2015.
[48] J.B. Bavelas, L. Coates, T. Johnson, Listener responses as a collaborative process: the role of gaze, J. Commun. 52 (3) (2002) 566–580, https://doi.org/10.1111/j.1460-2466.2002.tb02562.x.
[49] S. Elling, L. Lentz, M. de Jong, Retrospective think-aloud method: using eye movements as an extra cue for participants' verbalizations, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2011) 1161–1170.
[50] K.S. Karn, S. Ellis, C. Juliano, The hunt for usability: tracking eye movements, CHI '99 Extended Abstracts on Human Factors in Computing Systems, ACM, Pittsburgh, Pennsylvania, 1999, 173.
[51] Pew Research Center, Teens, Social Media & Technology 2018, (2018).
[52] L. Casaló, C. Flavián, M. Guinalíu, The role of perceived usability, reputation, satisfaction and consumer familiarity on the website loyalty formation process, Comput. Human Behav. 24 (2) (2008) 325–345, https://doi.org/10.1016/j.chb.2007.01.017.
Journal of Computer Information Systems
ISSN: 0887-4417 (Print) 2380-2057 (Online)
Journal homepage: https://www.tandfonline.com/loi/ucis20

To cite this article: Kyeongjin Park, Meuel Jeong & Kyungdoh Kim (2020) Usability evaluation of menu interfaces for smartwatches, Journal of Computer Information Systems, 60:2, 156-165, DOI: 10.1080/08874417.2018.1425644
To link to this article: https://doi.org/10.1080/08874417.2018.1425644
Published online: 01 Mar 2018.

Usability evaluation of menu interfaces for smartwatches
Kyeongjin Park, Meuel Jeong, and Kyungdoh Kim
Department of Industrial Engineering, Hongik University, Seoul, Korea

ABSTRACT
Currently, there are various smartwatch products on the market, and the number of users is expected to continue to increase. The functions of smartwatches have also been diversified, and the amount of information displayed on their screens is also increasing. However, as there are many restrictions due to the rather small screen size, it is difficult to apply the methods used to provide information on existing smart devices. Therefore, this study investigated how menus should be provided on the touch screen of smartwatches and derived a more effective menu structure. For our purposes, we conducted two experiments. In the first experiment, we provided 40 items in grid view and list view layout styles and asked participants to search for a given item by scrolling or paging. Efficiency was higher for the list view layout, in which many items were displayed on the screen and task completion time was shortened.
However, overall satisfaction was higher for the grid view layout, in which fewer items were displayed on the screen. In the second experiment, we derived an efficient menu structure for displaying hierarchical items that could be grouped into upper and lower categories. Likewise, many items on a single screen were excellent in terms of task completion time and efficiency. Providing depth of menu by categorization showed satisfactory results in task completion time, efficiency, and overall satisfaction.

KEYWORDS: Smartwatch; small screen; usability; menu structure

Introduction

In the 1960s, the concept of wearable computers began to emerge. At first, only simple functions such as calculations were incorporated; however, this functionality has been evolving constantly. The number of products has also been increasing every year. The McKinsey Global Institute 1 estimates that wearable computers can generate more than $4 trillion in value when interoperating with IoT applications. Typical wearable devices are smartwatches, which have various advantages owing to their portability and positive market prospects. A variety of products are currently available, and the number of smartwatch users is expected to grow continuously. 2

Smartwatches are wearable smart phones in the form of wrist watches with small monitors, which can perform a variety of functions, including connecting to the Internet. 3 Similar to wrist watches, they provide user friendliness. 4 Smartwatches are often connected to peripherals to enable active interaction with devices. 5 They also provide information in the background and can provide additional information because they have sensors attached. 4,6 It is possible to continuously integrate computer processing through smartwatches and process the desired information without restriction of place and time. 7 Smartwatches also have the advantage of freeing users' hands.
8 These features make smartwatches suitable as wearable computing devices. 9 A study by the Consumer Electronics Association also found that wrists are the most suitable body part on which to wear wearable devices. 10

However, smartwatches have a limited interface size compared to other smart devices, such as smart phones. Small displays introduce many limitations when using smartwatches. As functions on mobile devices have become more diverse, user interfaces (UI) provided in mobile environments have been actively studied, because smaller screens are not suitable for incorporating the UI format of desktop machines. 11 It is not practical to simply apply the interface used on existing desktop machines to devices with smaller screens without considering the display differences. 12 There is a limit to applying the existing interface to the screen of a smartwatch, which is relatively smaller than that of a smart phone. Therefore, smartwatch manufacturers should design the interface by taking into consideration factors such as the icons and layout suitable for the small screen. 13 Currently, various functions have been introduced in smartwatches, but the information that can be displayed on the screen is limited. 14 Despite this situation, there is a dearth of research on the menu interface of smartwatches.

This study attempts to derive a method of providing a menu that enables efficient information search on the limited-size touch screen of smartwatches. Through a comparative study of list view and grid view layout styles, the number of items displayed on a single screen, and the number of items provided in one row, we aim to find a way to provide an effective menu on the small touch screen of smartwatches.

CONTACT Kyungdoh Kim [email protected] Hongik University, 94 Wausan-ro, Mapo-gu, Seoul 121-791, Korea. Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/ucis.
JOURNAL OF COMPUTER INFORMATION SYSTEMS 2020, VOL. 60, NO.
2, 156–165, https://doi.org/10.1080/08874417.2018.1425644
© 2018 International Association for Computer Information Systems

Related works

With the proliferation of smart phones and the development of the wireless Internet, people started to use various functions such as games, online shopping, etc. on their mobile phones. 15 Chae and Kim 16 derived three characteristics of the mobile Internet through comparison with the Internet, and one of them was a small screen. Due to the small screen size, there is a limit to the information that can be provided. Smartwatches are smart devices with a very small screen that can be worn on the wrist, and they face screen constraints in providing various functions. 14 In particular, the interface is awkward for finger manipulation because of the smaller screen size compared to traditional smart phones. 17 Therefore, it is essential to study how to provide menus on a small screen such as that of a smartwatch.

The effect of smartwatches' display format and character arrangement on readability and preference was studied. 18 Although the outline form of the display did not affect readability, text aligned with straight lines and curves was more readable than text with diagonal lines. Additionally, a combination of individual sentences was more readable than sentences presented as paragraphs, but reading time was slower. Text readability by display ratio and font size was also studied. 19 The results showed that readability is best with an aspect ratio of 16:9 and a 9 pt font in circle-shaped smartwatches. A virtual keyboard layout for smartwatches that allows efficient text entry, in situations where the need for manipulation on smartwatches is increasing, was also presented. 20 For the situation in which all the contents of the keyboard cannot fit on the screen, a combination of grouped alphabet keys and various control keys yields a virtual keyboard suitable for smartwatches.
The performance and satisfaction of elderly users according to the number of icons and the spacing of their arrangement were studied. 21 The results showed that fewer icons were better, and four-icon menus were most suitable on a single screen. Icon menus in single-line grid view, two-line grid view, space rotation, and honeycomb layouts were compared. 13 The task completion time of the menu structure provided in two lines was the fastest. It was found that the user's speed and readability are influenced by the arrangement of the icons and the spacing between them. Plaumann, Müller, and Rukzio 22 presented items in a list view interface on the clock screen and experimented on how users locate them. On a small screen such as a clock, many items resulted in a higher percentage of participants failing the task. However, that study was conducted only on the list view interface, and other factors that make up the interface were not considered.

As a result of the literature survey, smartwatches showed very different satisfaction and task completion results according to minute changes in font size, the way of providing information, and the aspect ratio, owing to the small screen. This suggests that the type of information, layout, and interaction means provided by existing smart devices should be changed to fit small-screen characteristics.

Menu type

There are two types of interface representation: list view and grid view. The list view is a simple representation of information in one dimension, and the grid view is a representation of information horizontally and vertically in a two-dimensional layout. 23 The list view is a series of items arranged in one direction, and the user searches for items one-dimensionally. The list view menu improves search efficiency because there is little variance in the line-of-sight movement during navigation.
By contrast, the grid view menu is provided as a two-dimensional array of width and height. The user can quickly access many items and can easily view the entire screen of the provided menu. 24 Depending on the menu type (list view or grid view), the user is affected in various ways. First, in the web environment, when the same amount of information is provided as a grid view, users rate the reliability of the received information higher than when it is provided as a list view. 25 In the study by Salomon, 26 conducted in mobile environments, users felt less cognitive load when presented with information as a list view with icons. List view and grid view menus on mobile screens provided during a simulation were also compared. 27 The participants in the experiment had shorter task completion times with list view menus than with grid view menus. On the iPhone, a grid view menu that shows all information at once yielded higher satisfaction and faster task completion times than a text-type list view menu. 28 Thus, it can be seen that the provided menu type affects the user's task completion time and subjective evaluation, and that the result depends on the usage environment (see Table 1).

Paging, depth, and scroll

The smaller screen has a limitation in displaying the information to be provided on a single screen. Therefore, information is provided by scrolling, or by a paging method in which the screen is divided into appropriate pages. Scrolling provides information that goes beyond the size of the screen, one row at a time, allowing the user to navigate continuously and incrementally. When much information is provided, paging and scrolling are useful. 11 If the user cannot find the desired menu or information on the provided screen, the user can navigate through page conversion. 29 When information was provided on a small screen, more screen transitions occurred than on a large screen.
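The trade-off between list and grid layouts on a small screen can be made concrete by counting how many screen-fulls a search may have to traverse. A rough sketch under assumed row and column counts; the `screens_needed` helper and the example figures are illustrative, not taken from the paper's experimental apparatus.

```python
import math

def screens_needed(n_items, rows_per_screen, cols=1):
    """Number of screen-fulls (pages, or full-screen scroll steps)
    needed to expose n_items in a layout with `cols` items per row.

    cols=1 models a list view; cols>1 models a grid view, which packs
    more items per screen at the cost of smaller touch targets.
    """
    per_screen = rows_per_screen * cols
    return math.ceil(n_items / per_screen)

# 40 items (as in the first experiment): a list view showing 4 rows per
# screen vs a hypothetical 3-column grid showing 3 rows per screen.
list_screens = screens_needed(40, rows_per_screen=4, cols=1)  # 10
grid_screens = screens_needed(40, rows_per_screen=3, cols=3)  # 5
```

The grid halves the worst-case number of screen transitions here, which is one way to read the efficiency results that follow.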
30 Four types of navigation techniques, including scrolling and paging, were surveyed. 30 As a result, the required time tended to be longer when information was provided via scrolling. Menus using scrolling generated more visual fatigue than menus using paging. 31 In previous studies, user performance and subjective measures were rated highly when information was presented by paging rather than by scrolling. However, there were some conflicting conclusions; for example, one study showed that menus using paging were less popular than menus using scrolling. 29 In the past, scrolling was a factor that hindered usability. Nowadays, however, as users gain more experience with the internet, they have become accustomed to scrolling, and it is used in various ways. 32 This means that a suitable method must be derived for users of very-small-screen devices, where the use of paging and scrolling is inevitable.

Table 1. Menu type studies.
Literature | Year | Findings | Devices
27 | 2009 | Participants completed tasks faster with list view menus than with grid view menus in a driving simulation. | Mobile phone
25 | 2010 | When the same amount of information was provided as a grid view, users rated the reliability of the received information higher than when it was provided as a list view. | PC
28 | 2013 | A grid view menu showed higher satisfaction and faster task completion time than a list view menu. | Mobile phone
26 | 2014 | Users felt less cognitive load when information was presented as a list view with icons. | Mobile phone

JOURNAL OF COMPUTER INFORMATION SYSTEMS 157

When items can be bundled into the same category, menus can be structured by defining information as superordinate and subordinate concepts: a menu indicating the category of information is presented, and the target is reached through several steps. The number of such steps is called depth.
When information is laid out widely on a single screen, the extent of that layout is called “breadth, ”and many studies on the proper breadth and depth have been conducted to provide the given information efficiently. Users prefer menus with wider breadth to menus with more depth. 33,34 Jacko and Salvendy 35 also argued that structures with much depth are complicated and inconvenient for users. If all contents cannot be provided on a single screen because of its size limitation, information is provided by scrolling. 30 However, a lot of scrolling is perceived by the user as a “deeper ”structure. 36 Dawkins 37 argues that scrolling should be avoided as much as possible. Accordingly, more depth is preferred over wide breadth when a lot of cursor movement is required, 38 in contrast to the earlier studies that favored wide breadth over depth. Table 2 shows a summary of paging, depth, and scroll studies.

Touch screen

Smartwatches use a touch screen, and the menu should be designed with the size limitation of these screens in mind. The touch interface is very intuitive because the device can be operated without a separate controller or input device, and input and output occur at the same place and time. 39 Additionally, unlike a display that simply presents information, a touch interface design must consider the size of the items to be touched, the spacing between them, and similar factors. 40 The optimal size of a touch key is 9.2 mm to 9.6 mm 41; however, that experiment was conducted on a large screen, and its conclusion is difficult to apply to smartwatches, whose screens are much smaller than those of smartphones. Schedlbauer 42 suggested that the spacing of touch keys does not affect usability, but Kim, Kim, Lim, and Jung 43 recommended maintaining a gap of at least 1 mm. In addition, Parhi, Karlson, and Benderson 41 suggested that the minimum touch key size perceivable by a user is 2 mm to 3 mm. Table 3 summarizes previous touch screen studies.
Therefore, this study investigates the effect of the menu structure used to provide information to smartwatch users and derives an efficient menu structure. The study is conducted through two experiments. Experiment 1 examines which menu structure works well on the small screen of smartwatches when items of the same type are given: a total of 40 items is provided in grid view and list view menu types, and a given item is searched for using scrolling or paging. Experiment 2 derives an efficient menu structure when hierarchical items are grouped into upper and lower categories: a total of 4,096 items is divided into upper and lower categories and provided step by step, and a top-down menu structure is formed in which an upper item category is selected first. The two experiments compare the menu type and the depth of the information structure under different conditions, and together they show how to provide an efficient menu on smartwatches.

Experiment 1

Experiment 1 evaluates the effect of the menu type (list view, grid view), the navigation method (paging, scrolling), and the number of items displayed on a single screen (4, 9) on task completion time and subjective measures on smartwatches. Experiment 1 provides 40 fruit items, and the participants perform a task to search a given menu for a specific fruit.

Methods

Experiment environment

Experiments were conducted on a 1.6 × 1.6 inch screen (about 40 mm × 40 mm). This is the screen size of Samsung's Galaxy Gear, Gear 2, and Gear 2 Neo, of smartwatches from Sony, of the Apple Watch, and of many other smartwatch products. For this experiment, we built our application using Android Studio. We rendered a smartwatch-sized screen on a Samsung Galaxy Note 5 and disabled everything outside that limited screen area. Considering how smartwatches are used, the screen was fixed to the wrist position with a manufactured band during the experiment.

Table 2. Paging, depth, and scroll studies.
Literature | Year | Findings | Devices
33 | 1981 | Users prefer menus with wider breadth to menus with more depth. | Mini PC
34 | 1986 | Users prefer menus with wider breadth to menus with more depth. | PC
31 | 1988 | Menus using scrolling generated more visual fatigue than menus using paging. | PC
30 | 1990 | When information was provided on a small screen, more screen transitions occurred than on a large screen. | Micro-computer
35 | 1995 | Much depth is complicated and inconvenient for users. | PC
32 | 1997 | As internet usage increases, users become accustomed to scrolling. | PC
11 | 1999 | When much information is provided, paging and scrolling are useful. | PC
38 | 2004 | More depth is preferred over wide breadth when a lot of cursor movement is required. | Handheld device
37 | 2007 | Scrolling should be avoided as much as possible. | Mobile phone
29 | 2009 | In an online survey, time tended to be longer when information was provided via scrolling. | PC

Table 3. Touch screen studies.
Literature | Year | Findings | Devices
41 | 2006 | The optimal size of the touch key is 9.2 mm to 9.6 mm. | Pocket PC
42 | 2007 | The spacing of touch keys does not affect usability. | PC
43 | 2012 | A touch key gap of at least 1 mm was recommended. | Mobile phone

158 K. PARK ET AL.

Subjects

This experiment was conducted with students at Hongik University. A total of 40 students were recruited through online and offline recruiting. All participants had no visual problems and had experience using touch screens; two of them had experience using smartwatches. The participants comprised 29 males and 11 females, with a mean age of 24.3 years (SD = 1.88). The experiments were carried out in an independent space, one person at a time. Each session took no longer than 30 minutes, and compensation of 5,000 won was given.

Independent variables

The menu type indicates the form of the menu to be presented.
In this experiment, two menu types are provided: list view and grid view. The corresponding items in both types are provided in the form of text (Figure 1). Because the representation of an item can influence the user's cognition while navigating the menu, the items are limited to text for an exact comparison of the two menu types. The number of paging refers to the number of screens into which the 40 items are divided, with four levels (one to four). Paging 1 provides all items in a single breadth; Paging 3, for example, means that the 40 items are divided into three screens. To move between screens, “Next ”and “Back ”touch buttons are shown at the bottom of the screen. With many items per breadth, one must scroll through the items that do not fit on a single screen; conversely, with few items per breadth, the items are divided into several pages, and one must move between screens with the Next button. The number of items means the number of items shown on a single screen (Figure 1). In the case of smartwatches, both the small screen size and the touch interaction must be considered, so the number of items that can be provided on a single screen is limited. In this study, we set the number of items displayed on a single screen at two levels: four and nine. The grid view is provided as 2 × 2 and 3 × 3 arrays, with all items of the same size at each level. In the list view, all rows have the same height; at level nine this height is very small, i.e., 3.3 mm, and beyond nine items it would fall below the minimum perceivable size of 3 mm. 41

Dependent variables

As an objective measure, we used the task completion time to find a given item: the time from the start of the search on the presented screen to the touch of the target item, measured in seconds.
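The paging and row-height arithmetic above can be sketched as follows. This is an illustrative model, not code from the study; the 30 mm usable-height figure is an assumption chosen so that nine list rows come out at the 3.3 mm reported in the text.

```python
from math import ceil

TOTAL_ITEMS = 40   # items provided in Experiment 1
USABLE_MM = 30     # assumed usable screen height (hypothetical; chosen so 9 rows ~ 3.3 mm)

def list_row_height_mm(items_per_screen):
    """Height of one list-view row when rows share the usable screen height."""
    return USABLE_MM / items_per_screen

def items_per_page(pages):
    """Items on each page when the 40 items are split across `pages` screens."""
    return ceil(TOTAL_ITEMS / pages)

def scroll_screens_per_page(pages, visible):
    """Screenfuls of scrolling needed to see every item on one page."""
    return ceil(items_per_page(pages) / visible)

# Paging 3 with four visible items: 14 items per page, so scrolling is still required.
print(items_per_page(3), scroll_screens_per_page(3, 4), round(list_row_height_mm(9), 1))
# → 14 4 3.3
```

With nine visible items the same Paging 3 page fits in two scroll screens, which illustrates the text's point that scrolling and paging mix at this level.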
In addition to the scroll operations and button clicks made while searching, we measured the frequency of errors, defined as unnecessary or wrong touches. As subjective measures, Nielsen and Landauer 44 used efficiency and overall satisfaction as usability evaluation measures of UI (user interface) design. Efficiency asks whether it was easy to navigate to the desired item; overall satisfaction asks whether the method of providing the menu is satisfactory. All of these measures were collected on a 7-point Likert scale.

Experiment design and experiment procedure

This experiment provides the same amount of information (40 items) in different menu types; the 40 items follow Plaumann et al. 22 Each participant experienced only one menu form, grid view or list view, and the remaining independent variables were given as within-subject factors in a nested factorial design. Each participant thus experienced the combinations of menu type, number of items, and four paging levels displayed on a single screen: a total of eight environments (1 × 2 × 4), presented in random order. In each of the eight environments, we measured the time to search for three items in different positions and used the average value for analysis; the search order of the items at the three locations was randomized. After the three measurements in one condition, a subjective evaluation of the menu-providing method was conducted through a questionnaire. To measure the number of error touches, all experimental sessions were recorded. Participants were given ample practice opportunities to become familiar with menu navigation and manipulation. The item to be searched for was displayed on a notebook screen in front of the participant, timing began when the start of the search was announced, and one measurement was completed when the participant touched the target item.
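The 1 × 2 × 4 design and trial averaging described above can be sketched as follows. This is our own illustrative reconstruction of the procedure, with hypothetical names, not the authors' experiment software.

```python
import itertools
import random

ITEM_LEVELS = [4, 9]          # items displayed on a single screen (within-subject)
PAGING_LEVELS = [1, 2, 3, 4]  # number of paging screens (within-subject)

def session_conditions(menu_type, seed=None):
    """The eight (1 x 2 x 4) environments one participant experiences, shuffled.
    `menu_type` ('list' or 'grid') is fixed per participant (between-subject)."""
    conds = list(itertools.product([menu_type], ITEM_LEVELS, PAGING_LEVELS))
    random.Random(seed).shuffle(conds)  # conditions were presented in random order
    return conds

def condition_time(trial_times):
    """Each condition's score is the mean of three searches at different positions."""
    return sum(trial_times) / len(trial_times)

assert len(session_conditions("list")) == 8
```

For example, `condition_time([10.2, 11.8, 12.0])` yields the single averaged value that enters the analysis for that condition.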
Figure 2 shows a participant performing the experiment using the smartwatch menu.

(a) The list view (b) The grid view
Figure 1. Menu type and the number of items in a screen.

Results

The collected data were from 40 people and were used for analysis without any missing values. A three-way ANOVA was conducted.

Objective measures

Among the main factors, the difference in task completion time by menu type was not statistically significant (F(1, 282) = 0.36, p = .549). If the amount of information displayed on the small screen of a smartwatch is at the same level, the menu type does not affect the user's task completion time. However, there was a difference due to the number of items displayed on the screen (F(3, 282) = 33.60, p < .001). Participants executed the task more quickly when relatively many items (M = 9.2, SD = 3.16) were shown on a single screen than when relatively few items were shown (M = 11.4, SD = 3.99). There was also a difference in task completion time due to the number of paging steps (F(3, 282) = 8.84, p < .001). Tukey's HSD test showed the slowest completion time in Paging 3 (Figure 3), the condition in which scrolling occurs at a certain point and movement between screens occurs at most twice. There was no interaction between the number of paging and the number of items displayed on a single screen (F(3, 282) = 0.86, p = .449). However, there was an interaction between the menu type and the number of items displayed on a single screen (F(3, 282) = 4.83, p = .029; Figure 4-A). On the one hand, the grid view type (M = 10.9, SD = 3.94) tended to be faster than the list view type (M = 11.9, SD = 4.78) when four items were shown on a single screen (F(1, 158) = 2.88, p = .092). On the other hand, there was no difference between the list view type (M = 8.9, SD = 3.29) and the grid view type (M = 9.5, SD = 2.45) when nine items were shown (F(1, 158) = 1.48, p = .225).
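The comparisons above come from three-way ANOVAs. As a minimal illustration of the underlying test, the sketch below computes a one-way F ratio for a single factor (e.g., menu type) from raw task-completion times. This is generic textbook ANOVA in plain Python, not the authors' analysis code.

```python
def one_way_f(groups):
    """One-way ANOVA F ratio: between-group variance over within-group variance.
    `groups` is a list of lists of task-completion times, one list per condition."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)

# Two clearly separated groups give a large F; near-identical groups give F near 0.
print(round(one_way_f([[1, 2, 3], [7, 8, 9]]), 1))  # → 54.0
```

The reported p-values then follow from comparing F against the F distribution with the stated degrees of freedom (e.g., F(1, 282)).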
Also, there was an interaction effect between menu type and number of paging (F(3, 282) = 2.71, p = .045; Figure 4-B). The grid view type (M = 11.0, SD = 3.34) tended to be faster than the list view type (M = 12.9, SD = 5.75) in the case of Paging 3 (F(1, 78) = 3.51, p = .065). Finally, there was a three-way interaction effect between menu type, number of paging, and number of items shown on a single screen for task completion time (F(2, 282) = 3.09, p = .027). The difference in task completion time between the grid view (M = 11.2, SD = 3.24) and the list view (M = 15.9, SD = 6.25) was noticeable in the Paging 3 environment where four items were displayed on a single screen (F(1, 36) = 8.29, p = .006). Additionally, the number of errors was measured. The total number of errors was 110, and most of them (89) were observed in the list view type with many items displayed on a single screen.

Figure 2. Snapshot of the experiment.
Figure 3. Results of the Tukey HSD test between paging numbers on completion time.
Figure 4. Interaction effect plot of task completion time (seconds) as a function of (a) the number of items and menu type and (b) the number of paging and menu type in Experiment 1.

Subjective ratings

Among the main factors, only the number of items displayed on a single screen affected efficiency (F(3, 282) = 0.35, p = .793). Efficiency was rated higher when relatively many items were displayed on a single screen (M = 4.9, SD = 1.58) than when relatively few items were displayed (M = 4.3, SD = 1.57). However, there was no difference in efficiency by menu type (F(1, 282) = 0.56, p = .454) or by number of paging (F(3, 282) = 0.35, p = .793). There was an interaction effect on efficiency between menu type and the number of items shown on a single screen (F(1, 282) = 6.80, p = .010; Figure 5).
When nine items were displayed on a single screen, the efficiency of the list view type was higher than that of the grid view type, a statistically significant difference shown in Figure 5 (F(1, 158) = 5.7, p = .018). However, there was no statistically significant difference in efficiency between the grid view and list view types when four items were shown on a single screen (F(1, 158) = 1.71, p = .193). There was no interaction effect between menu type and number of paging (F(1, 282) = 0.38, p = .766), and no interaction effect between the number of paging and the number of items displayed on a single screen (F(1, 282) = 0.06, p = .981). For efficiency, the three-way interaction effect of menu type, number of paging, and number of items displayed on a single screen was statistically significant (F(3, 282) = 2.93, p = .034). When relatively few items were shown, the efficiency of Paging 3 was rated most negatively in the list view type (M = 3.6, SD = 2.46), with a large difference from the grid view type (M = 4.7, SD = 2.77) (F(1, 38) = 4.22, p = .046). On the contrary, when relatively many items were displayed on a single screen, the efficiency rating of Paging 3 was significantly lower in the grid view type (M = 4.2, SD = 1.81) than in the list view type (M = 5.4, SD = 1.39) (F(1, 38) = 5.98, p = .019). Among the main factors affecting overall satisfaction, the difference due to the number of items displayed on a single screen was statistically significant (F(1, 282) = 8.52, p = .005). Overall satisfaction was higher when relatively few items (M = 4.5, SD = 1.70) were shown on a single screen than when many items (M = 3.9, SD = 1.69) were shown.
On the other hand, there was no difference in overall satisfaction by menu type (F(1, 282) = 1.33, p = .249) or by number of paging (F(1, 282) = 1.36, p = .255). There was an interaction in overall satisfaction between menu type and the number of items shown on a single screen (F(1, 282) = 23.88, p < .001; Figure 6). When relatively few items were shown on a single screen, overall satisfaction with the grid view was higher than with the list view (F(1, 158) = 6.80, p = .010). When relatively many items were displayed on a single screen, overall satisfaction with the list view was rated higher than with the grid view (F(1, 158) = 19.30, p < .001). However, there was no interaction between menu type and number of paging (F(1, 282) = 0.56, p = .641), and no interaction between the number of paging and the number of items displayed on a single screen (F(1, 282) = 0.30, p = .824). In addition, there was no three-way interaction effect among the three independent variables on overall satisfaction (F(1, 282) = 1.8, p = .913).

Summary and discussion

In the environment where relatively many items were displayed on a single screen, fast task completion times and high efficiency were observed; in this case the list view type was better, whereas in the environment where few items were displayed, the grid view type gave better results. In the study by Shneiderman and Plaisant, 24 the list view menu ensured that the user's gaze moved sequentially in the vertical direction as the search proceeded, so that the user's attention was not dispersed and the search was easy. The grid view menu, however, had a two-dimensional form that allowed the user to easily see all of the objects, requiring less motion and allowing for quick selection.
According to the eye-tracking analysis of Kammerer and Peter, 23 in the list view environment the user's gaze moves linearly from top to bottom, while in the grid view environment the movement is not linear and the eye follows row-column units. Therefore, when many items are displayed on the small screen of a smartwatch, performance is considered to be low in the grid view type because the gaze is highly dispersed in the vertical and horizontal directions, which confuses users. However, when fewer items are displayed on a single screen, gaze movement is relatively small, and the grid view type is then well suited to the task because the user can grasp the whole set of objects at a glance; this showed a faster task completion time and high efficiency.

Figure 5. Interaction effect plot of efficiency (points) as a function of the number of items and menu type in Experiment 1.
Figure 6. Interaction effect plot of satisfaction (points) as a function of the number of items and menu type in Experiment 1.

In this experiment, when three page movements were required and relatively few items were shown, efficiency was lower and task completion time was slower in the list view environment. According to Norman's study, 29 paging has the merit of making it easy to grasp the overall image of the menu on the next screen with one key operation; however, the newly changed screen can confuse users and reduce their sense of context. A moderate amount of scrolling, by contrast, has the advantage that a continuous sense of navigation is maintained, though heavy scrolling can disrupt the sense of location and cause confusion. If relatively few items are displayed on a single screen, scrolling is faster than when many items are displayed, because the time to search for items on a single screen is relatively shorter.
Especially in the list view environment, since the scroll direction and the line of sight coincide, the user completes tasks faster and scrolls quickly. This fast scrolling, however, may cause items to be missed, and missed items result in repeated searches that lengthen task completion. In addition, when the number of paging steps increases, the menu structure becomes more congested, and task completion again takes longer. In other words, in a list view environment with few displayed items, mixing scrolling and paging into a complicated menu structure resulted in longer task completion times and the lowest efficiency. The errors were mostly observed when nine items were provided on a single screen as a list view type. According to a survey by Dandekar, Raju, and Srinivasan, 45 the average fingertip size of users is 8–10 mm, and it is claimed that a touch key size of at least 8 mm × 8 mm would be satisfactory. However, if nine items on a screen are provided as a list view, each row is only 3.33 mm high, much smaller than 8 mm. As a result, the measured errors appear to have arisen from touching neighboring items during operation. Overall satisfaction was high when fewer items were displayed on a single screen; satisfaction is considered higher when the item size is relatively large, given the small screen of smartwatches.

Experiment 2

Experiment 2 derived an efficient menu form on a small screen when the provided menu items could be categorized into upper and lower categories. The task was therefore designed as finding a specific public institution in a specific city and country, using a total of 64 public institution classes, eight cities, and eight countries.
Methods

Experiment environment and subjects

Experiment 2 was conducted in an environment similar to that of Experiment 1. A total of 11 female and 25 male students attending Hongik University were recruited (M = 24.1, SD = 1.81). All of the participants, including two with experience using smartwatches, had experience using smart devices with touch screens. The experiment was conducted in an independent space, the session time did not exceed 30 minutes, and 5,000 won was given for participation.

Variables

The menu type and the number of items displayed on a single screen were used as independent variables, with the same levels as in Experiment 1. Additionally, two levels of depth, representing a hierarchical menu, were selected as an independent variable; the depth levels were set to fit the task situation of this experiment. Depth 2 has a 64 × 64 structure, and Depth 3 has an 8 × 8 × 64 structure. The dependent variables were the same task completion time, frequency of errors, efficiency, and overall satisfaction as in Experiment 1.

Experiment design and experiment procedure

In Experiment 2, each participant experienced only one form, grid view or list view, and the remaining independent variable levels were designed as within-subject factors in a nested factorial design. The participants experienced four (1 × 2 × 2) menu conditions. In each menu environment, participants performed tasks to find items at three different positions, provided randomly, and the average of the three search times was used for analysis. The task is given as follows: “You have to find a High court in Galindo in Peru.
”The sequence of the four environments was provided randomly, and when the time measurements of the three items in one condition were completed, a subjective evaluation was conducted through the questionnaire. We also collected the frequency of errors.

Results

The collected data, from 36 people, were used for analysis without any missing values. A three-way between-subject ANOVA was conducted.

Objective measures

Task completion time differed significantly with the number of items shown on a single screen (F(1, 268) = 17.72, p < .001) and with depth (F(1, 268) = 17.72, p < .001). When many items were displayed on a single screen (M = 19.8, SD = 6.67), task completion time was faster than when few items were displayed (M = 24.1, SD = 6.62). Also, when the menu was categorized, task completion time was faster in the deeper Depth 3 structure (M = 19.4, SD = 5.87) than in Depth 2 (M = 22.1, SD = 7.02). On the other hand, there was no difference in task completion time by menu type (F(1, 268) = 2.46, p = .119). There was an interaction between menu type and depth for task completion time (Figure 7). There was no difference between the list view and grid view types when depth was provided in three steps; however, when depth was provided in two steps, the task was performed faster in the list view than in the grid view, and the difference was statistically significant. The frequency of errors was measured 96 times throughout the experiment; 86 of these were observed in the list view type with relatively many items displayed on a single screen.

Subjective ratings

For efficiency, the difference due to depth was significant (F(1, 268) = 8.21, p = .005).
Efficiency was rated higher when three levels of depth were provided (M = 4.6, SD = 2.03) than when two levels were provided (M = 3.6, SD = 1.81). On the other hand, the differences by menu type (F(1, 268) = 1.09, p = .299) and by the number of items displayed on a single screen (F(1, 268) = 3.33, p = .070) were not statistically significant, although efficiency was rated somewhat higher when relatively many items were displayed (M = 4.4, SD = 1.96) than when relatively few items were provided on a single screen (M = 3.8, SD = 1.95). There was no two-way interaction between the independent variables on efficiency (F(1, 268) = 0.00, p = 1.000; F(1, 268) = 0.37, p = .544; F(1, 268) = 1.09, p = .299), and no three-way interaction effect (F(1, 268) = 0.483, p = .488). Among the main factors affecting overall satisfaction, the difference due to depth was significant (F(1, 268) = 10.81, p < .001). Participants were more satisfied with the menu structure that divided the depth into three levels (M = 4.6, SD = 1.90) than with the structure with two levels of depth (M = 3.5, SD = 1.96). On the other hand, the differences due to the number of items displayed on a single screen (F(1, 268) = 1.97, p = .168) and the menu type (F(1, 268) = 0.00, p = 1.000) were not statistically significant. Likewise, there was no two-way interaction effect between the independent variables on overall satisfaction (F(1, 268) = 3.30, p = .071; F(1, 268) = 0.91, p = .34; F(1, 268) = 0.91, p = .343), and no three-way interaction effect (F(1, 268) = 0.27, p = .604).

Summary and discussion

The differences in task completion time by depth and by the number of items displayed on the screen were statistically significant. Task completion time was faster in the 8 × 8 × 64 menu structure than in the 64 × 64 structure.
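A simple navigation-cost model illustrates why the 8 × 8 × 64 structure outperformed 64 × 64. Counting worst-case screenfuls viewed (one selection per level, possibly scrolling through every screen of a level before selecting) is our own back-of-the-envelope model, not the authors':

```python
from math import ceil, prod

def worst_case_screens(level_sizes, visible):
    """Worst-case screenfuls viewed: at each level the target may sit on the
    last screen, so every screen of that level may need to be scrolled past."""
    return sum(ceil(size / visible) for size in level_sizes)

depth2 = [64, 64]     # Depth 2: 64 x 64 structure
depth3 = [8, 8, 64]   # Depth 3: 8 x 8 x 64 structure
assert prod(depth2) == prod(depth3) == 4096  # both reach the same 4,096 leaf items

# With nine items visible per screen, the deeper structure needs fewer screens.
print(worst_case_screens(depth2, 9), worst_case_screens(depth3, 9))  # → 16 10
```

Under this model the advantage of depth grows as fewer items fit on a screen (with four visible items: 32 versus 20 screens), consistent with the scrolling explanation above.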
This is considered to be due to the increase in scrolling when 64 items are provided in one breadth on a small screen, and it is consistent with previous studies. 36,38 Although the screen sizes in those experiments differed, users searched faster in a structure divided by depth than in a wide breadth requiring many scrolls. Displaying relatively many items, even on small screens, also contributed to faster task completion. In the Depth 2 menu structure, the list view type was performed faster than the grid view type, presumably because the scroll direction coincides with the direction of search. Again, most of the touch errors were measured in the list view with nine items on a single screen, owing to the small touch size of the relatively smaller items. There was a statistically significant difference in efficiency and overall satisfaction due to depth level: in the Depth 3 menu structure, participants showed high efficiency and overall satisfaction. Providing a step-by-step structure, without additional scrolling, appears to increase participants' efficiency and overall satisfaction. In a small-screen environment, it can be concluded that when designing the menu structure, building depth in stages is more suitable than scrolling.

General discussion

The results of Experiment 1 show that displaying relatively more items on the screen shortened task completion time and increased perceived efficiency. On the other hand, when relatively few items were displayed on a single screen, satisfaction was high: the fewer the items on a screen, the larger each item, and overall satisfaction is believed to increase when the item size is relatively large on the small screen of a smartwatch. On the navigation side of the menu, task completion time was longest in Paging 3, where scrolling and paging were mixed.
When items at the same level were provided, the menu type alone did not produce differences in task completion time, efficiency, or overall satisfaction. For task completion time and subjective efficiency during search, it is desirable to design many items to be displayed on a single screen. In particular, when relatively many items were displayed on a single screen, the list view menu had higher efficiency and overall satisfaction than the grid view menu. On the other hand, when relatively few items were displayed on a single screen, overall satisfaction was high; in this case, the grid view menu had higher overall satisfaction than the list view menu. In Paging 3, the complex mixture of scrolling and paging, the list view menu with few items on a single screen showed slow task completion time and negative efficiency. This is because the confusion caused by the many levels of scrolling and the complex paging method is likely to have reduced the user's sense of context.

Figure 7. Interaction effect plot of completion time (seconds) as a function of depth and menu type in Experiment 2.

In the search experiment with a menu that could be categorized into upper- and lower-level concepts, displaying relatively many items on a single screen shortened task completion time and produced high efficiency. The categorized menu structure also shortened task completion time and was rated well on efficiency and overall satisfaction. This is consistent with research conducted on screens larger than smartwatches. 36,38 Based on the results of this study, we propose an efficient menu design as follows.

When providing items at the same level on a small screen:
1. In terms of task completion time and efficiency, design the menu to show many items on a single screen.
1-1. In particular, the list view type seems appropriate for improving efficiency.
2.
From the viewpoint of satisfaction, it should be designed to show few items on a single screen. 2–1. In this case, it is judged that the grid view type is appropriate. 3. Do not mix scrolling and paging methods. Providing items that can be categorized into top and bot- tom concepts, (1) In terms of task completion time and efficiency, it should be designed to show many items on a single screen. (2) Categorize to provide depth of menu. In both cases, when many items are displayed on a single screen, task completion time and efficiency are good. On the other hand, in terms of the overall satisfaction, it is good that few items are displayed on a single screen. Taking these results into consideration, makers of smartwatches will have to make decisions about the menu interface of smartwatches.Thesalesvolumeofsmartwatchesisonthe rise, and the market is expected to remain buoyant. Various functions are applied to smartwatches, but it is restricted to the small screen. There are limitations in applying the guidelines applicable to a larger screen to smartwatches of a smaller screen. This study experimented with an efficient menu for providing methods on a small screen. Based on the results, it was possible to derive an efficient way of providing menus in two universal situations in which menus are provided (the same level of menus, and the menu structure that can be categorized into higher-level concepts). This study has several limitations. Experiments were con- ducted in smart phones environment assuming a universal smartwatches screen size. We also experimented with the assumption of square smartwatches enclosure. At present, there are also circle-shaped smartwatches (for example, Samsung Gear S3), and it seems necessary to study possible differences resulting from the outer shape. In addition, smart- watches often operate on the move. This experiment was conducted assuming a static situation in an independent experimental space. 
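The design recommendations proposed above can be condensed into a small decision helper. This is an illustrative sketch only: the function name, parameters, and returned fields are our assumptions, not artifacts of the study.

```python
def recommend_menu(prioritize_speed, categorizable):
    """Map the study's menu recommendations to a configuration.

    Illustrative sketch: `prioritize_speed` selects between the
    efficiency-oriented and satisfaction-oriented recommendations;
    all names here are assumptions, not from the paper.
    """
    return {
        # Many items per screen (list view) favors speed and efficiency;
        # few, larger items (grid view) favors overall satisfaction.
        "view": "list" if prioritize_speed else "grid",
        "items_per_screen": "many" if prioritize_speed else "few",
        # Recommendation 3: never mix scrolling with paging.
        "navigation": "scrolling only",
        # Categorizable items should be split into menu levels (depth).
        "use_depth": categorizable,
    }

# Example: optimizing for task completion time with categorizable items.
config = recommend_menu(prioritize_speed=True, categorizable=True)
```

The helper simply encodes the two decision axes found in the experiments: the item-count/view-type tradeoff and the use of depth for categorizable content.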
Studies that consider movement are also likely to be needed in the future.

Funding

This research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT and Future Planning (Grant No. 2015R1C1A1A01053529).

ORCID

Kyungdoh Kim http://orcid.org/0000-0003-1062-6261

References

1. McKinsey Global Institute. Unlocking the potential of the Internet of Things. 2015 [accessed 2017 May 20]. http://www.mckinsey.com/insights/business_technology/the_internet_of_things_the_value_of_digitizing_the_physical_world
2. Business Insider Australia. Just 3.3 million fitness trackers were sold in the US in the past year. 2014 [accessed 2017 May 20]. http://www.businessinsider.com/33-million-fitness-trackers-were-sold-in-the-us-in-the-past-year-2014-5
3. Bieber G, Kirste T, Urban B. Ambient interaction by smart watches. In: Proceedings of the 5th International Conference on Pervasive Technologies Related to Assistive Environments. Crete: ACM; 2012. p. 39.
4. Johnson KM. Literature review: an investigation into the usefulness of the smart watch interface for university students and the types of data they would require. 2014. http://img1.wikia.nocookie.net/__cb20140801120101/mobile-computing-prediction/images/c/c7/Literature_Review_KMJ.pdf
5. Bieber G, Haescher M, Vahl M. Sensor requirements for activity recognition on smart watches. In: Proceedings of the 6th International Conference on Pervasive Technologies Related to Assistive Environments. Island of Rhodes: ACM; 2013. p. 67.
6. Rhodes BJ. The wearable remembrance agent: a system for augmented memory. In: First International Symposium on Wearable Computers. Massachusetts: IEEE; 1997. p. 123–28.
7. Starner T. Wearable computing: through the looking glass. In: Proceedings of the 2013 International Symposium on Wearable Computers. Zurich: ACM; 2013. p. 125–26.
8. Pascoe J, Thomson K. On the use of mobile tools in everyday life.
In: Proceedings of the 19th Australasian Conference on Computer-Human Interaction: Entertaining User Interfaces. Adelaide: ACM; 2007. p. 39–47.
9. Marks P. Samsung launch kickstarts the smartwatch boom. New Scientist. 2013;219(2934):22.
10. Consumer Electronics Association. One-third of US consumers plan to purchase fitness technologies. 2013 [accessed 2017 May 20]. https://www.healthpopuli.com/2013/01/06/ces-2013-health-survey/
11. Jones M, Marsden G, Mohd-Nasir N, Boone K, Buchanan G. Improving Web interaction on small displays. Comput Networks. 1999;31(11):1129–37. doi:10.1016/S1389-1286(99)00013-4.
12. Haseloff S. Designing adaptive mobile applications. In: Parallel and Distributed Processing, Ninth Euromicro Workshop. Mantova: IEEE; 2001. p. 131–38.

164 K. PARK ET AL.

13. Kim MJ, Yoon SK, Choi JH. A study of menu structure on smartwatch for effective navigation: focused on types of menu and color cues. Korea Digital Des Counc. 2015;15:395–406.
14. Narayanaswami C, Raghunath MT. Application design for a smart watch with a high resolution display. In: The Fourth International Symposium on Wearable Computers. Seoul: IEEE; 2000. p. 7–14.
15. Ling C, Hwang W, Salvendy G. A survey of what customers want in a cell phone design. Behav Inf Technol. 2007;26(2):149–63. doi:10.1080/01449290500128214.
16. Chae M, Kim J. What's so different about the mobile Internet? Commun ACM. 2003;46(12):240–47. doi:10.1145/953460.
17. Perrault S, Lecolinet E. Using low-power sensors to enhance interaction on wristwatches and bracelets. In: International Conference on Mobile Computing, Applications, and Services. Paris: Springer International Publishing; 2013. p. 261–64.
18. Aum HJ. The impact of smartwatch display shape and text layout on readability and preference. Seoul: Yonsei University, Graduate Program in Cognitive Science; 2015.
19. Park S, Park J, Choe J, Jung ES.
The effect of text information frame ratio and font size on the text readability of circle smartwatch. J Ergon Soc Korea. 2014;33(6):499–513. doi:10.5143/JESK.2014.33.6.499.
20. Komninos A, Dunlop M. Text input on a smart watch. IEEE Pervasive Computing. 2014;13(4):50–58. doi:10.1109/MPRV.2014.77.
21. Mo F, Yi S, Zhou J. Effect of icon amount and visual density on usability of smartwatches. In: International Conference on Human Aspects of IT for the Aged Population. Los Angeles: Springer International Publishing; 2014. p. 466–77.
22. Plaumann K, Müller M, Rukzio E. CircularSelection: optimizing list selection for smartwatches. In: Proceedings of the 2016 International Symposium on Wearable Computers. Heidelberg: ACM; 2016. p. 128–35.
23. Kammerer Y, Gerjets P. The role of thinking-aloud instructions and prior domain knowledge in information processing and source evaluation during Web search. In: CogSci. 2013. p. 716–21.
24. Shneiderman B, Plaisant C. Designing the user interface. Don Mills: Addison Wesley; 1998.
25. Kammerer Y, Gerjets P. How the interface design influences users' spontaneous trustworthiness evaluations of web search results: comparing a list and a grid interface. In: Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications. Austin: ACM; 2010. p. 299–306.
26. Salomon A. Menu design of mcommerce applications: a human factors approach to menu design for mobile commerce applications [dissertation]. California (US): San Jose State University; 2014.
27. Kujala T. Efficiency of visual time-sharing behavior: the effects of menu structure on POI search tasks while driving. In: Proceedings of the 1st International Conference on Automotive User Interfaces and Interactive Vehicular Applications. Essen: ACM; 2009. p. 63–70.
28. Kim K. Comparison between overview menu and text menu in smartphone. J Ergon Soc Korea. 2013;32(6):529–34. doi:10.5143/JESK.2013.32.6.529.
29. Norman KL, Friedman Z, Norman K, Stevenson R.
Navigational issues in the design of online self-administered questionnaires. Behaviour Inf Technol. 2001;20(1):37–45. doi:10.1080/01449290010021764.
30. Dillon A, Richardson J, McKnight C. The effects of display size and text splitting on reading lengthy text from screen. Behaviour Inf Technol. 1990;9(3):215–27. doi:10.1080/01449299008924238.
31. Hwang SL, Wang MY, Her CC. An experimental study on Chinese information displays on VDTs. Human Factors. 1988;30:461–71.
32. Nielsen J. Changes in web usability since 1994. 1997 [accessed 2017 May 20]. http://www.useit.com/alertbox/9712a.html
33. Miller DP. The depth/breadth tradeoff in hierarchical computer menus. In: Proceedings of the Human Factors Society Annual Meeting. Los Angeles: SAGE Publications; 1981. p. 296–300.
34. Schultz EE Jr, Curran PS. Menu structure and ordering of menu selection: independent or interactive effects? SIGCHI Bulletin. ACM; 1986. p. 69–71.
35. Jacko JA, Salvendy G, Koubek RJ. Modelling of menu design in computerized work. Interact Comput. 1995;7(3):304–30. doi:10.1016/0953-5438(95)93606-6.
36. Chae M, Kim J. Do size and structure matter to mobile users? An empirical study of the effects of screen size, information structure, and task complexity on user activities with standard web phones. Behaviour Inf Technol. 2004;23(3):165–81. doi:10.1080/01449290410001669923.
37. Dawkins AL, Antón AI, Lester J, Amant RS. Personalized hierarchical menu organization for mobile device users [dissertation]. North Carolina (US): North Carolina University; 2007.
38. Christie J, Klein RM, Watters CA. Comparison of simple hierarchy and grid metaphors for option layouts on small-size screens. Int J Human Comput Stud. 2004;60(5):564–84. doi:10.1016/j.ijhcs.2003.10.003.
39. Albinsson PA, Zhai S. High precision touch screen interaction. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Fort Lauderdale: ACM; 2003. p. 105–12.
40. Colle HA, Hiszem KJ.
Standing at a kiosk: effects of key size and spacing on touch screen numeric keypad performance and user preference. Ergonomics. 2004;47(13):1406–23. doi:10.1080/00140130410001724228.
41. Parhi P, Karlson AK, Bederson BB. Target size study for one-handed thumb use on small touchscreen devices. In: Proceedings of MobileHCI '06. Espoo: ACM; 2006. p. 203–10.
42. Schedlbauer M. Effects of key size and spacing on the completion time and accuracy of input tasks on soft keypads using trackball and touch input. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting. Los Angeles: SAGE Publications; 2007. p. 429–33.
43. Kim BR, Kim TI, Lim YJ, Jung ES. Usability evaluation of the size of small touch keys for the smart phone. J Korean Inst Ind Engineers. 2012;38(2):80–88. doi:10.7232/JKIIE.2012.38.2.080.
44. Nielsen J, Landauer TK. A mathematical model of the finding of usability problems. In: Proceedings of the INTERACT '93 and CHI '93 Conference on Human Factors in Computing Systems. Amsterdam: ACM; 1993. p. 206–13.
45. Dandekar K, Raju BI, Srinivasan MA. 3-D finite-element models of human and monkey fingertips to investigate the mechanics of tactile sense. J Biomech Eng. 2003;125(5):682–91. doi:10.1115/1.1613673.