1. Reference Framework for the Evaluation of DC
In 2013, the Institute for Prospective Technological Studies (IPTS) of the European Commission’s Joint Research Centre launched the Digital Competence Framework (DigComp), integrating existing conceptualisations of DC
[1]. This framework arranged the dimensions of DC into five competence areas: information and data literacy, communication and collaboration, digital content creation, safety, and problem solving. In total, 21 DCs are distributed across the five competence areas. In 2016, DigComp 2.0 was published
[2] and updated the terminology, concepts and descriptors of DCs. In 2017, DigComp 2.1 was released
[3] and introduced significant changes, such as expanding the initial three proficiency levels to eight and making use of Bloom’s taxonomy in the definition of the DC descriptors.
DigComp is a reference framework structured in five dimensions: (1) competence areas involving the different DCs, (2) descriptors for each DC, (3) proficiency levels at the DC level, (4) knowledge, skills and attitudes expected in each DC and (5) different purposes of applicability. The DigComp framework has been used as the reference framework due to its remarkable strengths: (1) it was designed after a deep analysis of the available DC frameworks, (2) it followed a meticulous process of consultation and development by experts in the area of DC and (3) as a result, it provides a comprehensive view based on DCs and competence areas. For similar reasons, the United Nations Educational, Scientific and Cultural Organization (UNESCO) also selected DigComp as the reference DC framework for the development of the Digital Literacy Global Framework (DLGF)
[4][5]. Moreover, in a recent report, the World Bank also identified the DigComp framework as one of the most comprehensive and widely used frameworks for general DC
[6].
DigComp describes DC regardless of the technologies and devices employed. Nevertheless, common software tools tend to provide similar functions even though their interface design may vary
[7]. Moreover, findings from recent studies have questioned whether DC is truly independent of the task context and the technology used, since in some specific fields handling concrete digital technologies may itself be a relevant DC
[4].
Based on the data collection approach, three major categories were identified among the custom implementations based on DigComp
[3]: (1) performance-based assessment, where examinees solve tasks they would typically face in a real-life context, using simulations or common applications such as office suites, (2) knowledge-based assessment, where the declarative and procedural knowledge of examinees is measured and (3) self-assessment, mainly based on Likert scales, where examinees self-evaluate their level of knowledge and skills. Other authors such as Sparks et al.
[8] illustrated different designs of assessment instruments according to their purposes: research, accreditation, institutional quality assurance, self-assessment to support professional development, etc.
Regarding the types of items selected in the design of evaluation instruments, test designers tend to use constrained response item formats. These are simple to implement and facilitate automatic scoring. However, they are not the most suitable for assessing higher-order skills. To assess higher-order skills at the intermediate and advanced levels of DigComp, more sophisticated formats are necessary, such as purpose-built games or interactive simulations, to ensure an effective evaluation of DC
[9]. Furthermore, despite the study carried out by Heer
[10] to select different item formats to meet the assessment purposes, empirical evidence on choosing the most suitable item types for given assessment objectives is scarce.
Finally, the multidimensionality of the DC construct has been identified in several studies. For example, DC has been theoretically structured in five competence areas in the DigComp framework
[2]. However, theoretical and empirical studies have reported contradictory results. For example, Reichert et al. analysed the most commonly used digital literacy frameworks and found that empirical evidence on the use of digital applications allows for distinguishing between a general digital literacy component and four application-specific components (the web-based information retrieval factor, the knowledge-based information retrieval factor, the word processing factor and the digital presentation factor)
Jin et al. found in their custom implementation based on DigComp that DC can be considered a general one-dimensional construct
[12]. In the systematic literature review conducted by Siddiq et al.
[13], most studies that checked dimensionality concluded that DC is a unidimensional construct, i.e., the construct has a single underlying dimension that can be measured with a single test score. Although further studies have pointed in the same direction, e.g.,
[14][15][16], the need for further research was also suggested. In addition to DC frameworks, various national and international assessment studies were conceived based on a multidimensional framework, e.g., the International Computer and Information Literacy Study (ICILS). However, the empirical results differed in the number of dimensions and in the categories identified within them.
2. Information and Data Literacy
According to DigComp, IDL is one of the competence areas and is composed of three DCs.
This area is also known as information literacy or digital information literacy, and it is constantly changing due to recurrent changes in how citizens access and manage information through different types of devices. Citizens, and especially young people, are replacing traditional media with social networks, which are currently among the most widely used media, but which at the same time constitute an ungoverned source of information that tends to create confusion, generate controversy and distrust
[17][18][19], enable users to be active content creators
[20] and influence young people in their choice of role models
[21]. Moreover, the ease and speed with which disinformation propagates through social networks have become one of the most dangerous threats
[22][23][24][25], in conjunction with the emergence of discourses based on emotional appeal that influence choices through mechanisms such as clickbait, algorithms based on artificial intelligence, filter bubbles, personalisation of information, etc.
[24][25][26]. The 2019 Eurobarometer already showed an increase in concern over issues such as the rapid growth of fake news (74%) and over social media (65%)
[27]. IDL has been identified as a key literacy for identifying fake news
[28]. In this context, it is necessary to examine and assess how citizens perceive and evaluate the media in terms of fake news.
There are many self-report instruments in which individuals must assess their own level, and most of them are composed of multiple-choice questions measuring lower-order cognitive skills, e.g.,
[13][29][30][31][32]. In addition, assessment cannot rely on simple self-assessment tests alone. They offer a solution that is easy to implement but tend to yield unrealistic results from examinees, caused by their overconfidence, especially among examinees with very low ability
[33]. The Dunning–Kruger effect has also been shown to occur in the IDL area
[34]. There are also some exceptions, e.g., using open-ended tasks with scoring rubrics
[35][36], but these alternatives would be very complicated to integrate into a certification context requiring secure settings.
From the point of view of operationalising the construct of IDL for assessment purposes, Sparks et al.
[8] indicated that test designers appear to take two possible approaches: (1) selecting a particular framework aligned with the construct defined in their implementation and then designing items according to the descriptors of the framework (this option is suitable for assessing a specific set of skills) or (2) operationalising the construct at a conceptual level, thereby developing authentic tasks that evaluate IDL in a broader way. This option is suitable for defining the construct more holistically and examining whether examinees can put their knowledge into action in a real context. Consequently, the intended learning objectives and the type of assessment foreseen should be clarified from the beginning. Moreover, beyond a specific construct definition, other issues should be considered in the development of the assessment, e.g., the contexts where information is going to be accessed, evaluated and used, or whether a specific technology is an assessment target in itself or constitutes a means to achieve an objective.
Regarding the implementation of the assessment tools, Sparks et al.
[8] categorised the different types of assessment as follows: (1) those consisting of constructed response questions focused on IDL, such as the International Computer and Information Literacy Study (ICILS), (2) those consisting of constructed response questions focused on technology literacy, such as the European Computer Driving Licence (ECDL) and (3) those consisting of performance-based tasks focused on IDL, such as the Interactive Skills Assessment Tool (iSkills, Mount Maunganui, New Zealand).
IDL assessment in higher education is a key issue too
[8][37], and interest in developing instruments to assess IDL has been growing in recent years. However, most of the tests are developed from two perspectives, librarian and academic, and are often domain specific
[38][39].
With regard to validating the quality of the assessment instruments, classical test theory was applied in most of the tests identified, and the most commonly performed analyses were content validity, discriminant validity and internal consistency reliability
[40]. Therefore, experts have argued the need for freely available, validated assessment instruments for measuring IDL that allow a more effective assessment and are independent of the domain and the context
[38].
3. Netiquette
According to DigComp
[41], netiquette is one of the six DCs in the communication and collaboration competence area and is defined as: “
To be aware of behavioural norms and know-how while using digital technologies and interacting in digital environments. To adapt communication strategies to the specific audience and to be aware of cultural and generational diversity in digital environments”.
In our present society, where ICTs are present in most areas and where social networks and the extensive use of mobile devices have radically modified the way people interact, netiquette is becoming a crucial DC
[42]. Thus, a new scenario emerges for understanding human relations, from how interpersonal skills are exercised online to how social behaviours are exhibited in groups and online communities
[43]. Cabezas-González et al. found that individuals who communicate online frequently and make use of social networks very frequently tend to show lower levels of DC, contrary to expectations
[44]. It is therefore of great importance to investigate how individuals are currently educated in communication and collaboration, and in netiquette in particular
[45]. Nevertheless, netiquette has barely been defined and still does not seem to have attracted the attention it requires
[46]. Only a few studies have analysed the guidelines related to the correct use of electronic mail, e.g.,
[47][48], or presented general guidelines for the Internet, e.g.,
[49]. No studies have attempted to define which DCs a citizen should have in order to communicate efficiently through everyday tools such as instant messaging applications, social networks or email.
Regarding the implementation of the assessment tools, the empirical articles identified included the development of tailored tests, e.g.,
[50][51], whose validity and reliability evidence is insufficient; the netiquette DC has not yet been assessed in depth, and most of these instruments include only a few general questions
[4][29]. From a broader point of view, experts have identified the lack of instruments for evaluating individuals’ DC in the communication and collaboration competence area
[13]. Only BAIT, closely related to this study, provides a test exclusively dedicated to the assessment of netiquette
[52].
4. Item Response Theory (IRT)
IRT, also referred to as item characteristic curve theory, attempts to give a probabilistic foundation to the problem of measuring unobservable traits and constructs, or latent traits
[53][54]. IRT is widely used to calibrate and evaluate items in assessment instruments and provides ways of assessing the properties of a measurement instrument in terms of reliability and validity
[14][53][55][56][57][58]. The main features and advantages that characterise IRT are: (1) the existence of latent traits that can explain an examinee’s behaviour in a test, (2) the relationship between performance and the set of traits assessed, (3) the specification of dimensionality, (4) the position of the item on the trait’s scale, (5) assessment instruments with properties that do not depend on the specific group of respondents or the specific set of items shown, as both items and examinees receive a score on the same scale at the same time, (6) in contrast to classical test theory (CTT), the basic units of analysis are the items and not the assessment instrument as a whole and (7) the reliability of an assessment instrument depends on the interaction between the examinee and the assessment instrument.
Considering the strengths of the theory, the Rasch measurement model has been used to investigate the reliability and validity of the tests developed in the study. The Rasch measurement model is the simplest model available within an IRT context and facilitates interpretation by assuming that an examinee’s response to an item depends only on the examinee’s proficiency and the item’s difficulty
[57]. Furthermore, in IRT, the internal validity of a test is evaluated in terms of the fit of the items to the model. Marginal maximum likelihood (MML) is the most commonly used estimation method in IRT and assumes that the individual parameters are random variables following a certain distribution
[58].
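For reference, the dichotomous Rasch model can be written as follows (the notation is assumed here for illustration, not taken from the cited works), giving the probability that examinee n answers item i correctly as a function of the examinee’s proficiency θ_n and the item’s difficulty b_i:

$$P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}$$

An examinee whose proficiency equals the item difficulty thus has a probability of 0.5 of answering correctly, and persons and items are placed on the same logit scale.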
The choice of the most suitable IRT model depends firstly on the characteristics of the items used, i.e., whether they are dichotomous or polytomous. For dichotomous items, such as the ones used in the tests, the most widely used models are the logistic models with one, two or three parameters. The parameters used to characterise the items include
[59]: difficulty (which situates the item on the ability scale and determines the probability of answering it correctly), discrimination (which represents the degree of variation in the success rate of individuals as a function of their ability) and a pseudo-guessing parameter (which represents the lower asymptote at which even less-capable individuals will score by guessing). IRT is based on the principle that it is possible to measure latent traits, i.e., traits that are not directly observable, and a set of items can target a specific trait (e.g., competences for evaluating information)
[54].
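As an illustration of how these parameters enter the model (again with assumed notation), the three-parameter logistic (3PL) model can be sketched as:

$$P(X_{ni} = 1 \mid \theta_n) = c_i + (1 - c_i)\,\frac{\exp\big(a_i(\theta_n - b_i)\big)}{1 + \exp\big(a_i(\theta_n - b_i)\big)}$$

where b_i is the difficulty, a_i the discrimination and c_i the pseudo-guessing parameter of item i. Fixing c_i = 0 yields the two-parameter model, and additionally fixing a_i to a common constant yields the one-parameter (Rasch) model.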
In addition, measures constructed using the Rasch measurement model are unidimensional and have expected structures of item calibrations that cover the range of difficulty within an assessment domain. The results are valid only to the extent that the dimensions are distinct and clear, i.e., no items assess different variables at the same time, so that the unidimensionality assumption is realistic. Hence, other models such as the multidimensional IRT (MIRT) models appeared, which consider a construct consisting of various factors. The multidimensional random coefficient multinomial logit (MRCML) model was presented as an alternative to confirmatory factor analysis (CFA)
[60]. CFA and multidimensional IRT are methods applied to validate a possible organisation of the information. The multidimensional Rasch model is the simplest of the MIRT models and assumes that all item loadings are fixed to unity, as in the Rasch model
[61].
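As a sketch with assumed notation, the between-item multidimensional Rasch model simply lets each item i load on a single dimension d(i) of a proficiency vector θ_n = (θ_{n1}, …, θ_{nD}):

$$P(X_{ni} = 1 \mid \boldsymbol{\theta}_n, b_i) = \frac{\exp(\theta_{n,d(i)} - b_i)}{1 + \exp(\theta_{n,d(i)} - b_i)}$$

Comparing such a model with its unidimensional counterpart (all items loading on a single θ_n) offers a direct way to test whether, for instance, the DigComp competence areas behave as separate dimensions.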
Finally, relatively small sample sizes can be sufficient for Rasch analysis, and about 200 examinees suffice for obtaining accurate parameter estimates
[62].
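As a rough, purely illustrative check of that order of magnitude (not taken from the cited studies; all variable names and values below are assumptions), the following Python sketch simulates Rasch responses for 200 examinees on 20 items and recovers the item difficulties with a simple joint maximum likelihood fit:

```python
# Illustrative sketch: simulate Rasch responses for ~200 examinees and
# recover item difficulties by joint maximum likelihood (assumed setup).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_persons, n_items = 200, 20
theta_true = rng.normal(0.0, 1.0, n_persons)   # simulated examinee proficiencies
b_true = np.linspace(-2.0, 2.0, n_items)       # simulated item difficulties

# Dichotomous responses generated under the Rasch model
p_correct = 1.0 / (1.0 + np.exp(-(theta_true[:, None] - b_true[None, :])))
responses = (rng.random((n_persons, n_items)) < p_correct).astype(float)

def neg_log_likelihood(params):
    """Joint negative log-likelihood over person and item parameters."""
    theta = params[:n_persons]
    b = params[n_persons:] - params[n_persons:].mean()  # centre difficulties for identification
    eta = theta[:, None] - b[None, :]
    return -(responses * eta - np.log1p(np.exp(eta))).sum()

start = np.zeros(n_persons + n_items)
bounds = [(-6.0, 6.0)] * (n_persons + n_items)  # guard against divergence for extreme scores
result = minimize(neg_log_likelihood, start, method="L-BFGS-B", bounds=bounds)

b_hat = result.x[n_persons:]
b_hat = b_hat - b_hat.mean()
print("Correlation between true and estimated difficulties:",
      round(float(np.corrcoef(b_true, b_hat)[0, 1]), 3))
```

Dedicated IRT software typically relies on marginal maximum likelihood, as noted above; a plain joint fit is used here only to keep the sketch self-contained, and with roughly 200 simulated examinees the estimated difficulties already track the generating values closely.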