The Theory of Mobile Usability: Comparison
Please note this is a comparison between Version 1 by Paweł Weichbroth and Version 2 by Jessie Wu.

The success of a new mobile application depends on a variety of factors, ranging from business understanding and customer value to perceived quality of use. In this sense, usability testing of mobile applications is relevant from the point of view of user satisfaction and acceptance.

  • mobile usability
  • testing
  • methodology
  • framework

1. Usability Conceptualization

In light of a recent study [1][27], the most widely accepted definition of usability is that provided in the ISO 9241-11 standard, which states that usability is “the extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” [2][28].
At this point, an obvious question arises: how should context be understood? Context can be understood as a carrier of information about the environment, place, time, and situation in which an entity currently exists [3][29]. Here, an entity is a user who deliberately interacts with a mobile application. With the ubiquity of mobile devices (GPS devices) [4][30] and Internet connectivity (public Wi-Fi hotspots, home Wi-Fi, LTE 4G, and 5G) [5][31], the ability to incorporate this type of information is common and, in many domains, its use has even become imperative [6][32]. In summary, context in mobile systems can be divided into three categories [7][33]:
  • external, independent, and valid for all interested users (e.g., current weather, dangerous events, time, etc.);
  • location, referring to information about the user’s point of interest (e.g., traffic jams, road conditions, parking space, restaurant reviews, etc.);
  • user-specific, related to the user’s attributes, beliefs, activities, and interests (e.g., gender, age, nationality, religion, etc.).
Incorporating such context in mobile applications significantly enhances the quality of service in terms of perceived usefulness by making our everyday environments increasingly intelligent [8][34].
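As an illustration only, the three context categories listed above could be held in a simple data structure. The class name, field names, and sample values below are hypothetical and are not taken from the cited sources.

```python
from dataclasses import dataclass, field


@dataclass
class MobileContext:
    """Hypothetical container for the three context categories."""

    # External: valid for all interested users (e.g., weather, time).
    external: dict = field(default_factory=dict)
    # Location: information about the user's point of interest.
    location: dict = field(default_factory=dict)
    # User-specific: attributes, beliefs, activities, and interests.
    user_specific: dict = field(default_factory=dict)


# Sample values are invented for illustration.
context = MobileContext(
    external={"weather": "rain", "time": "08:15"},
    location={"traffic": "heavy", "parking_spaces": 3},
    user_specific={"age": 34, "nationality": "PL"},
)
```

Keeping the categories in separate fields makes it explicit which information is shared by all users and which is tied to one user.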

2. Usability Attributes Conceptualization

By definition, an attribute is a quality or feature regarded as a characteristic or inherent part of something [9][35]. Similarly to the notion of usability, attributes do not exist as such. On the contrary, they emerge from the physical interaction between the user and the mobile application. Given the aforementioned usability definition, the question arises as to how to measure the extent of effectiveness, efficiency, and satisfaction. The answer is twofold: through user observation or by user survey.
Accordingly, an attribute can be classified as “observable” or as “perceived”, respectively. While it is possible to change the type from the former to the latter, the reverse operation is hardly achievable, or even impossible, due to human nature. For instance, very few users, if any, explicitly manifest satisfaction during or after using typical mobile applications. Nevertheless, there have been attempts to identify, measure, and evaluate numerous qualities with regard to both the user and the application, especially in domains such as games [10][36] or entertainment [11][12][37,38].
Let us now look at the three attributes referred to in the ISO 9241-11 standard. It should be noted that, while effectiveness and efficiency are directly observable qualities, satisfaction is a “hidden” quality. Moreover, both effectiveness and efficiency can also be measured through user survey. In short, Table 1 shows the two-category classification of the ISO 9241-11 usability attributes.
Such a distinction has implications for the conceptualization of the usability attributes. Firstly, in the case of the observed category, the object of measurement is a user, or, more precisely, the user’s level of task performance. With this assumption, Table 2 shows the definitions of the observed usability attributes.
Secondly, in the case of the perceived category, the object of measurement is a mobile application, in particular the user’s perceived level of workload and application performance, as well as the self-reported level of satisfaction. The definitions of the perceived usability attributes are provided in Table 3.
In summary, the observed attributes can be interpreted in terms of the performance-based characteristics of the user, whereas the perceived attributes can be interpreted in terms of the user’s perceptions of certain application characteristics, as well as their own feelings of comfort and task fulfilment.
It should also be noted that there are other commonly studied attributes that are considered latent variables. In this regard, the most frequent ones are [1][27] learnability, memorability, cognitive load, simplicity, and ease of use.

3. Usability Attributes Operationalization

By definition, operationalization is “the process by which a researcher defines how a concept is measured, observed, or manipulated within a particular study” [18][44]. More specifically, the researcher translates the conceptual variable of interest into a set of specific “measures” [19][45]. Note that, here, “measure” is a noun denoting a way of measuring, together with the units used to state a particular property (e.g., size, weight, and time), whereas “measures of quantitative assessment commonly used for assessing, comparing, and tracking performance or production” are termed metrics [20][46]. In other words, a metric is a quantifiable measure of the observed variable.
Another way to quantify variables is to use indicators. By definition, an indicator is “a quantitative or qualitative variable that provides reliable means to measure a particular phenomenon or attribute” [21][47]. Indicators are used to operationalize latent variables [22][48], in both reflective and formative measurement models [23][49]. For the sake of methodological clarity, of the two terms “metric” and “indicator”, only the former will be used for both observable and perceived attributes.
Drawing upon the classification of usability attributes, researchers can now turn to operationalizing them, which requires the specification of quantifiable metrics, along with corresponding measurement scales.

3.1. Observed Effectiveness

To quantify the observed effectiveness of a user in the context of the performed tasks, five metrics are provided in Table 4, with assigned units and quantities.

3.2. Observed Efficiency

By definition, efficiency is a quality that is measured by the amount of resources that are used by a mobile application to produce a given number of outputs. In terms of usability testing, the measured resource is the amount of time that a user needs to perform a particular task. Thus, the observed efficiency is measured by the completion time (EFFI1 metric) in units of time (commonly in seconds) with respect to each individual task or, much less often, to a set of related tasks.
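The completion-time metric described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical observation log of (participant, task, seconds) records; the sample values and the choice of the mean as a per-task summary are assumptions, not part of the source.

```python
from statistics import mean

# Hypothetical observation log: (participant, task, completion time in seconds).
observations = [
    ("P1", "T1", 42.0),
    ("P2", "T1", 38.0),
    ("P1", "T2", 75.0),
    ("P2", "T2", 65.0),
]


def mean_completion_time(records):
    """Return the mean completion time in seconds for each task."""
    times_by_task = {}
    for _participant, task, seconds in records:
        times_by_task.setdefault(task, []).append(seconds)
    return {task: mean(times) for task, times in times_by_task.items()}


# EFFI1 summarized per individual task (assumed aggregation: arithmetic mean).
effi1 = mean_completion_time(observations)
```

Grouping by task reflects the statement that completion time is usually reported per individual task rather than for a set of related tasks.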

3.3. Perceived Effectiveness

It should be noted that observed and perceived effectiveness are measured by the same metrics except for the first one (EFFE1) since its submission to the respondent would imply a self-assessment of the rate of task completion. The following 7-point Likert scale can be used: absolutely inappropriate (1), inappropriate (2), slightly inappropriate (3), neutral (4), slightly appropriate (5), appropriate (6), and absolutely appropriate (7).

3.4. Perceived Efficiency

If efficiency is treated as an unobservable construct, a 7-point Likert rating scale can also be used to measure and evaluate the mobile application from this perspective: extremely low (1), very low (2), low (3), moderate (4), high (5), very high (6), and extremely high (7). Table 5 shows the details of the perceived efficiency metrics. Note that, for all metrics except the last one, a reverse scale must be used to estimate the perceived efficiency in order to preserve the correct interpretation of the collected data.
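The reverse-scale rule can be illustrated with a short sketch. It assumes that the responses are listed in the same order as the metrics, so that only the final item keeps its original direction. The function names, the example responses, and the use of the mean as an aggregate are assumptions for illustration only.

```python
from statistics import mean

SCALE_POINTS = 7  # extremely low (1) ... extremely high (7)


def reverse_score(score, points=SCALE_POINTS):
    """Map a rating onto the reversed scale (1 <-> 7, 2 <-> 6, ...)."""
    if not 1 <= score <= points:
        raise ValueError(f"score must be between 1 and {points}")
    return points + 1 - score


def perceived_efficiency(responses):
    """Reverse every item except the last one, then average (assumed aggregate)."""
    *reversed_items, last_item = responses
    if not 1 <= last_item <= SCALE_POINTS:
        raise ValueError(f"score must be between 1 and {SCALE_POINTS}")
    adjusted = [reverse_score(score) for score in reversed_items] + [last_item]
    return mean(adjusted)


# Hypothetical respondent: three reverse-scored items followed by the last item.
score = perceived_efficiency([2, 1, 3, 6])
```

After reversal, a higher value consistently indicates higher perceived efficiency across all items.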

3.5. Perceived Satisfaction

In general, satisfaction is “a pleasant feeling you get when you get something you wanted or when you have done something you wanted to do” [25][51]. The perceived satisfaction construct (SATI) is composed of three metrics validated in other usability studies. Table 6 provides a detailed description.
The following 7-point Likert scale can be used, starting with strongly disagree (1), disagree (2), somewhat disagree (3), neither agree nor disagree (4), somewhat agree (5), agree (6), and strongly agree (7).
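As a minimal sketch, the agreement scale above can be encoded as a lookup table and the three satisfaction ratings combined into a single score. Averaging the three items is an assumption made for illustration; the source does not prescribe an aggregation rule, and the example ratings are invented.

```python
from statistics import mean

# The 7-point agreement scale given in the text.
AGREEMENT_SCALE = {
    1: "strongly disagree",
    2: "disagree",
    3: "somewhat disagree",
    4: "neither agree nor disagree",
    5: "somewhat agree",
    6: "agree",
    7: "strongly agree",
}


def satisfaction_score(ratings):
    """Average three SATI item ratings (assumed aggregation rule)."""
    if len(ratings) != 3:
        raise ValueError("the SATI construct is composed of three metrics")
    if any(rating not in AGREEMENT_SCALE for rating in ratings):
        raise ValueError("ratings must be integers from 1 to 7")
    return mean(ratings)


# Hypothetical respondent ratings for the three satisfaction items.
sati = satisfaction_score([5, 6, 7])
label = AGREEMENT_SCALE[round(sati)]
```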