Barriers and Support Factors of Open Data

Obstacles to Open Data can manifest at various levels. Fourteen potential supporting factors (n = 14) and thirteen barriers (n = 13) to the provision and anonymization of personal data were identified. These encompassed technical prerequisites as well as institutional, personnel, ethical, and legal considerations. These findings offer insights into existing obstacles and supportive structures within Open Data processes for effective implementation.

  • open data
  • support factors
  • barriers
  • open science

1. Introduction

Releasing anonymized personal data as Open Data sparks intense debate across academic disciplines, fueled by a burgeoning global interest in exploring its diverse applications [1][2]. Recent technological advancements have exponentially increased the possibilities for fully automated data collection and analysis, impacting all aspects of human life [3][4]. However, the comprehensive and cross-disciplinary use of Open Data has not yet been fully implemented [5].
Many opportunities, especially in managing diverse sources of personal health data, remain unexplored [2]. Organizations often struggle to prepare and integrate available datasets for comprehensive analysis, hindering the ability to derive full benefits from the data [6][7][8].
Hence, it is evident that Open Data offers significant societal potential across various dimensions, but organizations have been reluctant to make data available as Open Data. Moreover, there is an absence of a widespread, cross-sectoral data pool for Open Data.

2. Open Data in Authorities, Companies, Research Institutions, and the Healthcare Sector

2.1. Authorities

The disclosure of government data, known as Open Government Data (OGD), has become a significant global topic [9][10][11]. Transformative value is attributed to OGD [11]. Over the last decade, OGD has gained much attention in research and practice. Governments worldwide strive to build an OGD ecosystem, as many cultural and institutional benefits are expected from OGD [10]. Enhanced data utilization should be integrated into decision-making processes, especially at the local level [8]. OGD initiatives are emerging as part of a new public governance policy that increasingly includes citizens as co-producers of public policy through access to official information [12]. The release of data is expected to drive new service innovations, increase transparency in government agencies [13], and result in societal benefits [12]. In the course of these developments, the Organization for Economic Co-operation and Development (OECD) developed the OUR (open, useful, reusable) Data Index to determine and evaluate public data provision. Similarly, the World Wide Web Foundation created the Open Data Barometer, which provides an overview of government data publication [9].

2.2. Companies

Companies across various industries concur that big data plays a pivotal role in shaping the future, necessitating the development of relevant workforce capabilities and knowledge. For instance, big data technologies are seen as promising opportunities for companies in the health and pharmaceutical sectors seeking to secure or establish a competitive advantage [14]. Some studies have identified the impact of using Open Data on economic growth [10]. Therefore, continued data utilization and processing are regarded as crucial drivers for future industrial development and value creation [9]. Enabling these extensive data analyses is viewed as a shared responsibility among all industry stakeholders [2]. For this purpose, a data strategy for businesses is considered indispensable. It encompasses both an internal orientation, such as how data are collected and used to enhance administrative processes and services, and an external alignment, which involves data management for other stakeholders who use the data to create societal value.

2.3. Research Institutions

Open Science in the realm of research projects pertains to unrestricted access to scholarly publications and raw research data [15]. The past decade has witnessed significant growth in the Open Science movement, aiming to facilitate the unimpeded dissemination of scientific discourse and broad distribution of research findings [16]. Open Science is seen as instrumental in enhancing data accessibility [15]. In 2018, the International Committee of Medical Journal Editors (ICMJE) introduced data sharing from clinical studies published in journals; however, recent studies have revealed considerable disparities between intentions and actual data accessibility [15]. Nevertheless, disseminating and institutionalizing Open Science is considered a pivotal moment in shaping rules for data protection and sharing [17]. The European Commission released a strategy specifically for research data called the European Open Science Cloud (EOSC), aimed at facilitating data exchange and further data analysis for publicly funded research [16]. Leveraging the FAIR (findable, accessible, interoperable, reusable) data framework and EOSC initiatives, the entire research lifecycle is set to undergo fundamental changes to become more efficient, transparent, credible, and collaborative. Integrating data to obtain larger sample sizes has led to significant progress, particularly in rare diseases and genetic disorders. Similar strides are anticipated if health and environmental data are interconnected [16]. In contrast, collecting data anew for each new project and insufficiently incorporating previous studies into meta-analyses are considered wasteful of research resources [18].

2.4. Healthcare Sector

The global COVID-19 pandemic has underscored significant challenges in collecting, integrating, and sharing medical personal data worldwide [6][19]. Data analysis from various sources can provide vital information for pandemic management [6]. For instance, Horn and Kerasidou emphasize that data on individual behavior can offer crucial insights into virus spread [20]. Furthermore, Feeney et al. [21] stress the importance of collecting and managing personal health data in times of increased mobility and crises. Data flow is becoming increasingly important for ensuring optimal healthcare, especially for vulnerable groups, such as migrants, chronically ill individuals, and children. National borders must not constrain health data [21]. Consequently, there is a demand for cross-border data exchange in electronic health services at the European level [21][22]. The need for international collaboration has grown steadily, and the opportunities presented by artificial intelligence and big data in the medical sector should be fully harnessed [22]. The healthcare sector has long called for a greater technological orientation and the use of big data [20]. However, patients and healthcare organizations are frustrated by numerous barriers to accessing patient data [23]. Many health data are currently stored in data silos due to privacy concerns and are not yet accessible for shared data utilization [24]. Thoral et al. illustrate, using intensive care as an example, that much machine-readable data are generated daily in this discipline. However, these data have not been used further due to legal and ethical concerns [25]. Leveraging big data in healthcare promises more accurate prognosis, new diagnostic approaches, and improved and efficient treatment [20][22][26].
The rapid technological advancements driven by artificial intelligence and machine learning techniques have fundamentally expanded the ability to identify patterns and structures in data that can enhance health, diagnosis, and treatment [27]. Access to scientific health data is essential for further scientific progress and innovation [28]. Clinical, evidence-based decision making ideally requires a foundation in big data [2][28]. Simultaneously, the optimized use of personal patient data can fundamentally transform healthcare, individual understanding, and disease prevention [26]. Open data availability can provide new and deeper insights into prevention, diagnosis, and therapy, especially in the context of genomic data [29][30]. Its benefits are particularly pronounced in rare diseases. Big data applications enable deep and precise phenotyping of genetic and rare diseases, offering invaluable insights [31]. Furthermore, data sharing for comparing genetic and epidemiological risk factors is crucial [22]. Therefore, the collaborative use of personal health data for medical research and practice is considered fundamentally significant [32]. Aspects of general quality assurance in healthcare through shared data usage are critical [28]. However, using data from health-related activities has raised new ethical challenges related to data privacy, integrity, and appropriate use [27]. The ability to link individual data records is considered a central element for medical research while simultaneously being ethically sensitive due to the potential to gain deep insights into very intimate aspects [3]. This has revealed societal and individual contradictions and dilemmas [33].

3. Barriers and Support Factors of Open Data

3.1. Barriers

3.1.1. ELSI

Ethical and Social Implications

Several ethical obstacles to sharing and analyzing data are cited in the studies [16][28][31]. A significant ethical and societal dilemma is that the potentially great benefits of Open Data may not materialize due to data privacy concerns [27]. Normative standards for ethical scrutiny are currently lacking, which can lead to physical and psychological harm in the re-use of data [34]. Individuals' freedom from harm is perceived to be at risk when data are sold, which can lead to re-identification or extortion and, in turn, to financial, physical, psychological, and emotional harm [27]. Risks and benefits of data sharing must, therefore, be carefully weighed [10][19][34], and it is unclear whether specific datasets can be ethically released [10]. There is also concern about subsequent unethical and inappropriate projects in secondary use with a risk to privacy [27][33][35]. Ethical issues are mainly seen in further health data exchange [36]. Another challenge is the wide range of methodologies and practices within Open Data, each involving specific legal and ethical issues [29]. An essential ethical problem is that Open Data are irrevocable and cannot be retrieved [30].

Legal and Policy Implications

Political decisions can make data sharing and analysis difficult [16][28]. There is often a lack of political priority and action to drive an Open Data culture [10][11][37]. Many barriers are also cited on the legal side [13][16][19][21][27][38][39]. Many legal aspects and standards on Open Data are still unresolved [2][11][21] and vaguely regulated [11][19], especially in the area of data security and consent procedures [19][31], as well as in the further, transparent use of data [19][27][40].

3.1.2. Personnel

A lack of interest in Open Data structures is generally described as an obstacle [41]. Further, parts of the public are reluctant to accept increased data collection and fear increasing surveillance [8]. A lack of human resources for implementation is a significant difficulty [11][16][42]. Human resources are needed to use the technology and provide data [13][40][42]. In this regard, a significant hurdle is seen in potential users’ lack of skills and abilities [41]. There is a lack of time and resources to acquire the missing skills [43]. There is a lack of background knowledge about Open Data [7][24], and the goals of Open Data use are not understood [5]. A perspective on and awareness of a data-centric culture have yet to develop, and its benefits have yet to be recognized [5][10][42], even among experts [33] and executives [11]. There is a further lack of awareness about data privacy and sharing opportunities [19][44]. The potential uses of data are not recognized [13]. Similarly, there are insufficient skills in publishing and making data available [13]. Particularly in healthcare settings, this technical expertise is often not available [26][28]. Access to Open Data is also a significant challenge [10][43]. Often, this is too complicated for inexperienced users [43], and knowledge about data platforms’ purpose and use is sometimes unavailable [33]. In this context, potential users are described as heterogeneous regarding resources and knowledge [12].

3.1.3. Data Structure

Multiple data sources are considered a technical barrier in the data structure, leading to highly heterogeneous data structures [2][19][45]. Not all data are suitable for Open Data [10]. However, ensuring data interoperability for analysis and storage is deemed essential [2][13][16][24][28][46]. Various system architectures and data infrastructures are described as a root cause for the lack of interoperability [24][28][42]. Institutions with specific orientations, such as patient groups, can introduce diverse data formats [28]. The non-conformity of medical applications with web-based standards is also noted [33]. The involvement of numerous stakeholders in data collection and ownership increases the risk of data distortion [28][45][47]. Data collection often neglects data formats and validity [13][21]. Many datasets are fragmented and stored in various data silos, further limiting their subsequent processing and exchange [6][10][11][21][23][24][42][46]. The fragmentation is, in turn, attributed to differing semantics and data formats [6].

3.1.4. Technical

Data sharing and analysis often require the implementation of administrative and analytical systems, for which the necessary technical resources are frequently lacking [11][16][19][28]. In some cases, the essential technical infrastructure is absent [13][16] or relies on closed system architectures [1][13]. Additionally, the many different systems, infrastructures, data formats, and cybersecurity protocols are considered another technological barrier [24][42]. Furthermore, the required analytical techniques are currently insufficiently available [2][7][10][11][19]. Comprehensive solutions for all data types are still lacking [16][48]. Data management is also described as challenging [32][42]. Data storage is perceived as a hurdle [16][23][24], and there is a lack of established standards for data storage [16] and data processing [10][44], especially when integrating data into third-party applications [10]. Concerns about handling large data volumes and their secure storage are widespread [2][24][26]. Some data storage facilities still employ outdated technical security measures [44], and central storage can pose an elevated security risk [23]. A lack of central exchange portals [11] and uncertainty about the most suitable repositories [15][24] create obstacles. Limited resources for data provision also hinder progress [13].

3.1.5. Data Misuse

Concerns and fears about potential data misuse in the exchange and analysis of anonymized personal data have been described [1][15][27][40][49]. Simultaneously, the public has become more critical and sensitive to data misuse [8]. Even when personal data are collected and used with good intentions, there can be no guarantees for their future use [26]. Possible data breaches, unauthorized data access, malicious attacks, or illegal data sales raise further cybersecurity concerns [26][30][35][50]. Healthcare facilities, in particular, are increasingly identified as attractive targets for hackers [34]. Digital files can be easily shared, either unintentionally by responsible individuals or through illegal practices, as exemplified by cases like the Cambridge Analytica scandal or ransomware attacks such as WannaCry [26][49]. Additionally, there are concerns that shared information may be misinterpreted or that the accessing party may not meet sufficient confidentiality requirements or may lack a duty of confidentiality [26]. Sleigh also postulated a fear of a surveillance state arising from collecting publicly sourced personal data [4]. There is also a general mistrust of data use by public institutions [37].

3.1.6. Clarification and Consent

Express consent for the re-use of data for other purposes is considered a significant barrier [1][11][16][24][26][41]. Particularly, reusing research data is often not addressed in prior consent [10][16][18]. In research, data are typically linked to the research project, which can be problematic for secondary use [41][47]. For example, with real-time patient data from COVID-19 case records, obtaining consent and sharing data proved to be a significant challenge [6]. Differences in the interpretation of consents according to the GDPR and the Declaration of Helsinki are also cited as obstacles [16]. In medical research, prior informed consent may not always be feasible [16]. These missing terms of use and consent often lead to data isolation in data silos [24]. Dynamic consents as a potential solution pose their own challenges, including the potential under-representation of minorities through this process [16] and the need for ongoing contact with data contributors, which can be highly resource-intensive [1][24].

3.1.7. Stigmatization

When data are shared, there are significant concerns about potential stigmatization [10][26][30][34][51]. Stigmatization can lead to extensive and long-lasting problems, especially when data are interconnected [30]. Digital profiling is perceived as increasingly unpredictable [52], and there are fears of possible re-identification [1][26][33]. According to Thoral et al., the risk of re-identification can never be entirely eliminated, especially in cases involving criminal or terrorist motives [25]. Individuals could be embarrassed by the publication of their data [10]. Discrimination may only become apparent in the future [35]. There is also fear of financial and professional discrimination, especially when disclosing health data [26]. Concerns exist about the rationing and reduction of services and rising individual costs [26][53].

3.1.8. Institutional Barriers

Several administrative barriers and organizational challenges are described that make data sharing and analysis difficult [28]. In some cases, the processes for establishing new administrative systems are described as so slow that they cannot keep up with rapid technological developments [42]. Necessary data access agreements between data repositories and users are estimated to be administratively burdensome and lengthy [16]. Additional contractual mechanisms often need to be created to regulate access in compliance with data protection laws, which requires a lot of time and trust from the parties involved [32]. Further, resistance to the transformation of facilities and institutions [24] and a lack of institutional capacity [37] are also cited as barriers. This raises the question of whether organizations are interested in collecting and publishing data on an ongoing basis [13]. Releasing data may not be consistent with the organization’s goals [13].

3.1.9. Economic Barriers

The introduction and establishment of management and analysis systems for the use of Open Data require financial and economic resources [16][19][38], which are often lacking [10][11], especially in the health sector [38]. In addition, technical solutions evolve very diversely and rapidly, which requires recurring investments [16]. External collaborations are also described as expensive [11][18]. Investing and deploying resources further require a high intrinsic motivation in dealing with Open Data [47]. In collaboration with technology companies, new or increasing dependencies could also arise [20].

3.1.10. Data Privacy

Preserving data confidentiality and safeguarding the privacy and identity of individuals are widely recognized as significant challenges [38][41][44]. In the era of big data, the potential for privacy breaches has increased [50]. Furthermore, achieving comprehensive compliance with the GDPR is often perceived as demanding, particularly in the context of medical data exchange and reuse [16][24][38][44]. On the other hand, it has been observed that some stakeholders may disregard regulations, which can further undermine trust in data protection [20]. Adhering to stringent data privacy measures can also potentially reduce the overall value and utility of the data by limiting the analytical potential [16].

3.1.11. Commercial Interests

Commercial interests in Open Data are viewed critically [52]. Sharing data for commercial purposes is called a red line [20][26]. Furthermore, personalized advertising, disease-related health marketing, and unsolicited contact are viewed critically [1][26][54]. Insurance companies, in particular, would have a great interest in health data in this regard, with the risk of economic disadvantage for the individuals [27]. There are also fears of data being sold for commercial gain [35], especially against the backdrop of the general capitalist framework [4].

3.1.12. Trust

Lack of trust in the processes for sharing and using data is presented as a significant challenge. Here, trust towards stakeholders varies widely, with healthcare institutions and public authorities being trusted more than private actors. Great distrust regarding data reuse is evident, especially toward large tech corporations [26] and private commercial companies [26][54]. Further, there is mistrust toward the responsible and executing individuals [26][33], including health professionals [33][53]. Lack of trust has been cited as a critical problem in medical data sharing [44]. Further, there may be a loss of trust in the relationship between health professionals and patients when data are shared [26][53]. The difficult balance between risk management and confidentiality when disclosing, for example, data that are hazardous to health is emphasized here and can lead to data withholding [53].

3.1.13. Communication

Communication about the use and purpose or benefit of data sharing is seen as a critical challenge. Projects have failed in government and private collaborations due to a lack of communication about plans and intentions regarding data use [20]. The lack of transparency and communication can further contribute to mistrust of the government, health workers, patients, and the public [53]. In this regard, all stakeholders involved have requirements for transparency and communication when data are shared [34]. Government institutions, in particular, find it challenging to establish transparency concerning the further use of data and the expected benefits [42]. Lack of transparency in communication leads to different understandings of the benefits and purposes of data sharing [53] and creates problems in the establishment of data standards, such as the FAIRification of data [6].

3.2. Support Factors

3.2.1. Positive Outcome

Public Sector/Public Authorities

Public policy should be improved by using and exploiting OGD [8][10][11]. Open Data are considered an essential resource for public services to understand local needs better [5]. In addition, Open Data are increasingly creating opportunities for participation [11][39]. Transparency and accountability are increased through OGD, and political and social benefits are assumed [10][11][12][39]. Further, it is hoped that there will be improvement and support in social and political decision-making processes, as any problems can be better identified and problem-solving capacities are improved [8][10][11][12]. Access to external capacities and resources for problem solving emerges, which improves decision making [11]. The use of collective intelligence to solve public problems can occur [10][11]. Further, Open Data can enable more citizen participation, public engagement, and informed decision making [5][8][10][11][12]. Kawashita et al. cite increased social control through Open Data in this regard [11]. Open Data are described as a resource for community activism [5]. The collaboration of stakeholders can be strengthened, and political and social initiatives can be motivated. This can also improve overall trust in the government [10][12].

Entrepreneurship

A positive impact of Open Data on economic growth and the overall economy is described [5][10][12]. Open Data supports the transition to a knowledge-based economy, and all stakeholders gain knowledge about the digital transformation. Through Open Data, the competitiveness of all sectors can be increased, and better information for potential investments is provided [10]. Further, Open Data are considered a resource for innovation [1][5][10] and foster social and commercial value creation [11][39]. New processes, products, or services can be developed, or existing ones can be improved [10][11]. Innovations from the private sector should also support the mechanization of public authorities [42].

Research

Open Data can be beneficial for future research and accelerate innovation and discovery [4][15][30][35]. Open Data improve transparency and reproducibility and protect against manipulation, further solidifying the scientific peer review process [15][47]. This strengthens trust in science, which is considered essential for discourse [47]. There are financial savings in data access and labor costs [15], and disadvantages due to inaccessible research data can be avoided [47][55].

Healthcare Sector

The reuse of personal health data is expected to improve healthcare and quality of life for individuals and the community as a whole [2][4][10][20][35][56]. Also, the care needs of an aging population can be better understood through health data [20]. Further, Open Data supports more individualized precision medicine and quality of care [2][35]. Sharing health data also strengthens personal engagement with one’s own data [4].

3.2.2. ELSI

Ethical Implications

Generally, an ethical approval system based on established guidelines and regulations is recommended [27][33]. Mahomed and Labuschaigne also emphasize the necessary data competence that members of an ethics committee need [34]. Full ethical transparency is considered a sustainable and promising way to do this [27]. Another critical aspect is data management and use ethics [49]. It is essential to consider the tension between individual needs in use and the desire to maintain privacy and confidentiality [33]. Addressing the information that may promote potential stigma or otherwise be used against the individual’s best interest is enormously important [21]. Freedom from harm for the individual must be fully considered [27]. Individual protection from financial, physical, psychological, or emotional harm to a person should be considered [27]. Human dignity should be considered in all data processes, considering that people are behind the data, so respectful and responsible handling is required [27].

Social Implications

In addition, the consequences for society and the benefits for science must also be considered [27]. The degree of possible identifiability is of central importance for the public assessment of data sharing [27]. Thus, the benefit of the data, financed by public tax money, should be maximized for society [27]. An obligation to re-use the data is even implied here, since research data are usually very expensive to produce. In addition, there is a scientific necessity for the use of large and Open Data sets in order to be able to answer specific research questions [27].

Law and Policy

Policy frameworks are described as conducive to establishing and using Open Data [11][16][56][57]. In this context, a discernible political will for Open Data and the support and commitment of political leadership are essential [11][57]. Further, there are external constraints and constellations of pressure from international organizations and standards [11][58]. Professional procedures in data collection and sharing must be followed [27]. Mahomed and Labuschaigne, for example, suggest agreeing on additional data transfer agreements that support legal and ethical standards, especially in the case of international data sharing and different data protection bases [34]. Clearly formulated legal regulations are considered another prerequisite for establishing Open Data [10][20][37][44][56]. These legal frameworks help convince people that their interests are protected [20][49]. In doing so, it is essential to respect the rights of individuals, especially the rights to privacy, autonomy, freedom, human dignity, and justice [27]. The right to privacy includes the right not to be identified without consent. The right to autonomy includes the right to decide freely whether or not to disclose data and which data to disclose. In this context, the extent of a potential threat to rights must be determined in advance to adapt security measures to the needs of the person at risk [27]. Data producers want to retain full control over the data [49]. Achieving this requires comprehensive information at many levels about the purpose, method, outcome, and sharing of the data. Protection by external regulatory frameworks is considered particularly important, especially for vulnerable groups. Further, respecting the freedom of research is an essential perspective for the research community [27].

3.2.3. Technical Infrastructure

The technical systems and existing infrastructure must be secure and resilient [23][26][44] and are also required at the European level [16]. The security and design of the infrastructure also influence trust and the willingness to share data [1][26]. A data-protection-friendly infrastructure in which the data never leave the institution is advantageous [16][28]. To this end, internal data trustees and a separate data analysis depository are recommended [16]. In addition, the technical infrastructure should harness the benefits of artificial intelligence [44]. For example, distributed machine learning makes centralized storage superfluous, so that legally and ethically problematic agreements on data transfer become unnecessary. Such learning uses an iterative process in which information is exchanged between databases and not between individual data records. Furthermore, high usability and user-friendly software are recommended for handling data [13][28][43][56], especially when converting raw data according to the FAIR principles [28]. Prepared access is crucial, especially for technically inexperienced users [43]. The available equipment must be adequate and supportive [57]. The interfaces must enable and support interaction with and management of the data [49]. Appropriate platforms and tools for use are considered to be very helpful [5][11]. The data portals should facilitate fast and convenient searches and support data discovery, for example, through automatic visualizations [43]. Open data portals are also seen as beneficial [11].
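
The iterative exchange described in this section, in which only aggregated information leaves each database, can be illustrated with a minimal sketch. It is not taken from the cited studies. It assumes a simple linear model fitted by gradient averaging across two hypothetical institutions, and all names (`local_gradient`, `federated_fit`, `sites`) are illustrative.

```python
from typing import List, Tuple

Record = Tuple[float, float]  # (x, y); a record never leaves its institution


def local_gradient(records: List[Record], w: float, b: float) -> Tuple[float, float, int]:
    """Runs inside one institution and returns only summed gradients and a count."""
    gw = gb = 0.0
    for x, y in records:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    return gw, gb, len(records)


def federated_fit(sites: List[List[Record]], rounds: int = 2000, lr: float = 0.05) -> Tuple[float, float]:
    """Coordinator: combines the per-site aggregates in each round (assumed setup)."""
    w = b = 0.0
    for _ in range(rounds):
        updates = [local_gradient(site, w, b) for site in sites]
        n = sum(u[2] for u in updates)
        w -= lr * sum(u[0] for u in updates) / n
        b -= lr * sum(u[1] for u in updates) / n
    return w, b


# Two hypothetical institutions holding records that follow y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = federated_fit(sites)
print(round(w, 3), round(b, 3))
```

In this sketch the coordinator only ever sees gradient sums and record counts. Real deployments would typically add secure aggregation and further safeguards, which are omitted here.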

3.2.4. Data Access

Data sharing should be planned and regulated [5][16][18][26][59]. Planning and all sharing aspects are considered essential to avoid complications in consent, data collection, grant compliance, data format, and data sharing, and should be conducted as early as possible [42][59]. In data use, the highest level of transparency should always be ensured with clarity on the purpose and goal of data sharing [1][21]. The goal should be useful and functional data sharing [46]. Contracts can support a trusting data-sharing relationship in this regard [54]. Access controls to data are described as a significant and beneficial aspect [21][23][26][31][46]. Developing an access model/access procedure for the data is considered necessary [16][20][37]. Adherence to transparency in access is critical in this regard [21]. Thus, trust, acceptance, and willingness to share data depend significantly on access arrangements [26][35]. Data subjects should be able to view and review data sharing [26]. Data donors must be protected against unlawful access [21]. This requires options for simple and secure authentication [44] and the use of controllable access protocols [16].
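
As an illustration of purpose-bound access control with a log that data subjects can review, a minimal sketch might look as follows. The class and field names (`DatasetAccessPolicy`, `AccessLogEntry`, `allowed_purposes`) are hypothetical and are not drawn from the cited studies. Authentication is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AccessLogEntry:
    requester: str
    purpose: str
    granted: bool


@dataclass
class DatasetAccessPolicy:
    """Hypothetical policy: access is granted only for purposes agreed in advance."""
    allowed_purposes: Set[str]
    log: List[AccessLogEntry] = field(default_factory=list)

    def request(self, requester: str, purpose: str) -> bool:
        granted = purpose in self.allowed_purposes
        # Every request is recorded, whether it is granted or refused.
        self.log.append(AccessLogEntry(requester, purpose, granted))
        return granted

    def review(self) -> List[AccessLogEntry]:
        """Lets a data subject inspect who asked for access and why."""
        return list(self.log)


policy = DatasetAccessPolicy(allowed_purposes={"public-health research"})
first = policy.request("university-lab", "public-health research")
second = policy.request("ad-broker", "marketing")
for entry in policy.review():
    print(entry)
```

Logging refused requests as well as granted ones is a design choice made here to support the transparency requirement. A production system would additionally need to protect the log itself.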

3.2.5. Education and Training

Linguistic, technical, and legal competencies are needed for Open Data processes [36]. This involves understanding how information is stored, managed, and shared [49][51]. Maximum engagement with these processes should be ensured [1][24][35]. Individuals often have specific needs that must be met to enable use [43]. For this, the effective training and development of professionals is essential, and learning resources are needed to train individuals in specific skills [5][36][43][44][46]. Increased awareness of the use of Open Data is generally helpful [11]. Therefore, all stakeholders in Open Data should be intensively involved and sensitized [40][46]. General knowledge about current developments and possibilities, as well as about AI-supported data processing, must be conveyed [1][40]. Training that addresses technical background, personal fears, and possible cognitive overload related to the use of Open Data is beneficial [5].

3.2.7. Data Structure

The structure and characteristics of the data are significant factors that can favor further uses [56]. It is essential to ensure data quality [11][57]. Further, data quality standards according to the FAIR principles provide a conducive data structure for Open Data processes [6][16][28]. Datasets must meet high requirements in terms of security, integrity, authenticity, access controls, confidentiality, and availability [23]. It is considered crucial that only anonymized data are shared [1]. Datasets that follow the FAIR principles are considered a prerequisite for distributed analysis and machine learning. Therefore, all stakeholders who want to participate in reusing data must agree on a data model according to the FAIR principles, and institutions need different tools to FAIR-ify the data [28]. The structure and form of the data influence their FAIR-ification. In this regard, the FAIR principles improve structure and discoverability, especially for health data and where international exchange is intended [6]. FAIR-ification also offers the opportunity to establish data privacy and data security through anonymization or synthetic case representations and should, therefore, be implemented in a data management plan [6].
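
One simple way to check a dataset before release is a k-anonymity test. This criterion is not named in the sources summarized here and is used only as an illustrative stand-in for "anonymized"; the function names, column names, and example rows are hypothetical. The sketch counts how many rows share each combination of quasi-identifying attributes and reports whether every combination occurs at least k times.

```python
from collections import Counter
from typing import Dict, List, Sequence

Row = Dict[str, str]


def smallest_group(rows: List[Row], quasi_identifiers: Sequence[str]) -> int:
    """Size of the rarest combination of quasi-identifier values (0 for an empty dataset)."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values()) if counts else 0


def is_k_anonymous(rows: List[Row], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination is shared by at least k rows."""
    return smallest_group(rows, quasi_identifiers) >= k


# Hypothetical, already generalized records (age bands instead of exact ages).
rows = [
    {"age_band": "30-39", "region": "North", "diagnosis": "A"},
    {"age_band": "30-39", "region": "North", "diagnosis": "B"},
    {"age_band": "40-49", "region": "South", "diagnosis": "A"},
]
quasi = ["age_band", "region"]
print(smallest_group(rows, quasi), is_k_anonymous(rows, quasi, k=2))
```

Here the single record in the "40-49 / South" group would need further generalization or suppression before the dataset passes with k = 2. Passing such a check does not by itself guarantee that re-identification is impossible.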

3.2.8. Trust

The trust of users and data donors in the various data processes is considered an essential prerequisite for success and must be built and increased [24][46]. A trusting relationship is crucial as early as data collection [51]. The importance of users' trust in the data managers is likewise emphasized [49]. The data ecosystem and technical infrastructure must also be trusted [1][37]. This trust is achieved through strict security measures, transparency of measures, and strictly controlled access to data [21][26]. In addition, society trusts different public and economic sectors to different degrees, and the scientific sector, in particular, must be perceived as absolutely trustworthy [27]. Established international institutions enjoy high trust [18]. There is also trust in governmental health institutions for responsible use [20]. Trust in the government and public authorities is further considered crucial [37]. Particularly in the case of cooperation between government and private institutions, a focus on building trust is essential [20]. However, even between departments within an institution, trust between professionals is critical for data sharing [51].

3.2.9. Consent Procedures

Consent procedures must adapt to the new realities of personal data sharing and use [1][16][23][31]. An important step can be a data governance policy that clearly regulates how consent is provided [6]. Further, free choice and autonomy in data sharing are emphasized as necessary [26][35], especially for the acceptance of possible sharing [35]. In this context, new consent structures must explicitly cover, for example, the further processing of personal data in research [31]. Data producers want to retain control over personal data [49].

3.2.10. Collaborations

To address complexity, collaboration among institutions, as well as with policymakers and other stakeholders, is described as beneficial and significant [5][16][20][25][28][56]. Fylan and Fylan emphasize that collaboration with trusted institutions is essential [26]. Thus, the government health system or government institutions should play an important role, for example, in data management [18][26]. The guiding principles of equity, solidarity, and quality in data use should be considered in partnerships between public and private institutions [20]. In this regard, Kawashita et al. emphasize the beneficial synergies of public-private partnerships [11]. The responsible use of data is enormously important for all participants and the entire process [25]. A multidisciplinary approach is seen as helpful, especially in developing open-access databases [25].

3.2.11. Communication

The increasingly broad communication and dissemination of Open Data to society, for example, by committees and journalists, is described as beneficial [8][11][35]. Raising public awareness of the benefits of sharing personal data is particularly important [35]. In this regard, improving communication with individuals who have previously had little awareness of the topic is essential [33]. In addition, communication is critical to learning user needs and technical requirements [6][12]. Data sharing must be consistently communicated at all levels and in all processes [13].

3.2.12. Economic Aspects

An increased demand for efficiency and cost savings can be conducive to the reuse and re-purposing of data in general [42]. Available financial resources and existing digitization capacity are considered particularly conducive, as is a willingness to disclose data internally [11]. Fischer et al. emphasize, in particular, the savings in financial resources that would otherwise go to the usually expensive data collection [47].

3.2.13. Institutional Aspects

The trend toward increased data sharing among departments and institutions is generally described as a positive and conducive factor [51]. Sandoval-Almazan et al. describe the internal design of rules for data processes as conducive [37]. Kawashita et al. emphasize that a positive and adaptable organizational culture is perceived as conducive to this process [11]. A structured data management plan is also crucial for success [6][24][44]. A data management plan allows individual challenges in interoperability, cybersecurity, and existing and necessary infrastructure to be considered [24].

3.2.14. Participation

It is crucial to involve all stakeholders in Open Data processes [45][53]. For example, in future policy developments, diverse stakeholders, minorities, professionals, and ethicists should be involved in creating guidelines for data sharing [53]. Especially for data donation and the development of AI-based systems, the participation of the target group is of high importance, as the participation of the stakeholders involved also raises awareness of Open Data [1]. Horn and Kerasidou call for an expanded say in data processes in this context [20]. A fundamentally participatory attitude is also crucial in designing the corresponding data platforms. Such participatory platforms should focus on empowering data donors [33]. People must also be willing to participate in this process [40].

4. Conclusions

Various aspects of the data structure and technical infrastructure can be both hindering and conducive to establishing Open Data processes. In particular, aspects of data protection are perceived as a barrier, accompanied by a fear of data misuse and stigmatization. On the other hand, controlled data access with transparent access rules is seen as conducive to reducing existing concerns and fears. The process of education and consent is also described as both hindering and beneficial, depending on how many resources are put into these elementary processes and how seriously they are pursued, so that individuals end up satisfied and fully informed. The ethical, legal, and social implications (ELSI), as well as institutional and economic aspects, are also described as both hindering and promoting, depending on the priority given to these aspects and how they are pursued and observed. Commercial interests are generally described as a hindrance, whereas transparent collaborations, primarily with research institutions and state institutions, are considered trustworthy and feasible. Furthermore, personnel aspects can be both a help and a hindrance. Many skills and abilities have been reported among people involved in Open Data. In principle, person-centered training and further education focusing on creating more awareness of individual Open Data processes and establishing a data-oriented culture are seen as particularly beneficial here. The areas of trust and communication can likewise have both a positive and a negative effect; the participation of the actors involved is described as particularly beneficial, as is transparent and appreciative communication that includes all actors equally. The disclosure of possible positive outcomes and benefits and the integration of best-practice examples are considered particularly conducive to establishing Open Data processes.

References

  1. Kamikubo, R.; Lee, K.; Kacorri, H. Contributing to Accessibility Datasets: Reflections on Sharing Study Data by Blind People. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany, 23–28 April 2023.
  2. Rehman, A.; Naz, S.; Razzak, I. Leveraging big data analytics in healthcare enhancement: Trends, challenges and opportunities. Multimed. Syst. 2022, 28, 1339–1371.
  3. Ethikrat, D. Big Data und Gesundheit: Datensouveränität als Informationelle Freiheitsgestaltung: Stellungnahme: Kurzfassung; Bundesministerium für Gesundheit: Bonn, Germany, 2017.
  4. Sleigh, J. Experiences of Donating Personal Data to Mental Health Research: An Explorative Anthropological Study. Biomed. Inform. Insights 2018, 10, 1178222618785131.
  5. Dove, G.; Shanley, J.; Matuk, C.; Nov, O. Open Data Intermediaries: Motivations, Barriers and Facilitators to Engagement. Proc. ACM Hum.-Comput. Interact. 2023, 7, 1–22.
  6. Queralt-Rosinach, N.; Kaliyaperumal, R.; Bernabé, C.H.; Long, Q.; Joosten, S.A.; van der Wijk, H.J.; Flikkenschild, E.L.A.; Burger, K.; Jacobsen, A.; Mons, B.; et al. Applying the FAIR principles to data in a hospital: Challenges and opportunities in a pandemic. J. Biomed. Semant. 2022, 13, 12.
  7. Roguljić, M.; Šimunović, D.; Poklepović Peričić, T.; Viđak, M.; Utrobičić, A.; Marušić, M.; Marušić, A. Publishing Identifiable Patient Photographs in Scientific Journals: Scoping Review of Policies and Practices. J. Med. Internet Res. 2022, 24, e37594.
  8. Rempel, E.; Barnett, J.; Durrant, H. Contrasting views of public engagement on local government data use in the UK. In Proceedings of the 12th International Conference on Theory and Practice of Electronic Governance, Melbourne, Australia, 3–5 April 2019; Ben Dhaou, S., Ed.; ACM Digital Library: New York, NY, USA, 2019; pp. 118–128.
  9. Seo, J.; Kim, B.; Kwon, H.Y. Open Data Policies Analysis Disputes Mediation Cases in Korea: Based on OUR Data Index and ODB. In Proceedings of the DG.O2021: The 22nd Annual International Conference on Digital Government Research, Omaha, NE, USA, 9–11 June 2021; ACM Digital Library: New York, NY, USA, 2021; pp. 153–167.
  10. Mutambik, I.; Nikiforova, A.; Almuqrin, A.; Liu, Y.D.; Floos, A.Y.M.; Omar, T. Benefits of Open Government Data Initiatives in Saudi Arabia and Barriers to Their Implementation. J. Glob. Inf. Manag. 2022, 29, 1–22.
  11. Kawashita, I.; Baptista, A.A.; Soares, D. Open Government Data Use by the Public Sector—An Overview of its Benefits, Barriers, Drivers, and Enablers. In Proceedings of the 55th Hawaii International Conference on System Sciences, Maui, HI, USA, 4–7 January 2022.
  12. Smith, G.; Sandberg, J. Barriers to innovating with open government data: Exploring experiences across service phases and user types. Inf. Polity 2018, 23, 249–265.
  13. Crusoe, J.; Melin, U. Investigating Open Government Data Barriers; Springer: Cham, Switzerland, 2018; pp. 169–183.
  14. Pesqueira, A.; Sousa, M.J.; Rocha, Á. Big Data Skills Sustainable Development in Healthcare and Pharmaceuticals. J. Med. Syst. 2020, 44, 197.
  15. Dos Santos Rocha, A.; Albrecht, E.; El-Boghdadly, K. Open science should be a pleonasm. Anaesthesia 2023, 78, 551–556.
  16. Eva, G.; Liese, G.; Stephanie, B.; Petr, H.; Leslie, M.; Roel, V.; Martine, V.; Sergi, B.; Mette, H.; Sarah, J.; et al. Position paper on management of personal data in environment and health research in Europe. Environ. Int. 2022, 165, 107334.
  17. Phillips, M.; Knoppers, B.M. Whose Commons? Data Protection as a Legal Limit of Open Science. J. Law Med. Ethics 2019, 47, 106–111.
  18. Medley, N.; Cuthbert, A.; Crew, R.; Stewart, L.; Smith, C.T.; Alfirevic, Z. Developing a topic-based repository of clinical trial individual patient data: Experiences and lessons learned from a pilot project. Syst. Rev. 2021, 10, 162.
  19. Schwalbe, N.; Wahl, B.; Song, J.; Lehtimaki, S. Data Sharing and Global Public Health: Defining What We Mean by Data. Front. Digit. Health 2020, 2, 612339.
  20. Horn, R.; Kerasidou, A. Sharing whilst caring: Solidarity and public trust in a data-driven healthcare system. BMC Med. Ethics 2020, 21, 110.
  21. Feeney, O.; Werner-Felmayer, G.; Siipi, H.; Frischhut, M.; Zullo, S.; Barteczko, U.; Øystein Ursin, L.; Linn, S.; Felzmann, H.; Krajnović, D.; et al. European Electronic Personal Health Records initiatives and vulnerable migrants: A need for greater ethical, legal and social safeguards. Dev. World Bioeth. 2020, 20, 27–37.
  22. Bentzen, H.B.; Castro, R.; Fears, R.; Griffin, G.; ter Meulen, V.; Ursin, G. Remove obstacles to sharing health data with researchers outside of the European Union. Nat. Med. 2021, 27, 1329–1333.
  23. Alzahrani, A.G.; Alhomoud, A.; Wills, G. A Framework of the Critical Factors for Healthcare Providers to Share Data Securely Using Blockchain. IEEE Access 2022, 10, 41064–41077.
  24. Hallock, H.; Marshall, S.E.; ’t Hoen, P.A.C.; Nygård, J.F.; Hoorne, B.; Fox, C.; Alagaratnam, S. Federated Networks for Distributed Analysis of Health Data. Front. Public Health 2021, 9, 712569.
  25. Thoral, P.J.; Peppink, J.M.; Driessen, R.H.; Sijbrands, E.J.G.; Kompanje, E.J.O.; Kaplan, L.; Bailey, H.; Kesecioglu, J.; Cecconi, M.; Churpek, M.; et al. Sharing ICU Patient Data Responsibly Under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: The Amsterdam University Medical Centers Database (AmsterdamUMCdb) Example. Crit. Care Med. 2021, 49, e563–e577.
  26. Fylan, F.; Fylan, B. Co-creating social licence for sharing health and care data. Int. J. Med. Inform. 2021, 149, 104439.
  27. Johansson, J.V.; Bentzen, H.B.; Mascalzoni, D. What ethical approaches are used by scientists when sharing health data? An interview study. BMC Med. Ethics 2022, 23, 41.
  28. Deist, T.M.; Dankers, F.J.W.M.; Ojha, P.; Scott Marshall, M.; Janssen, T.; Faivre-Finn, C.; Masciocchi, C.; Valentini, V.; Wang, J.; Chen, J.; et al. Distributed learning on 20,000+ lung cancer patients—The Personal Health Train. Radiother. Oncol. J. Eur. Soc. Ther. Radiol. Oncol. 2020, 144, 189–200.
  29. McWhirter, R.; Eckstein, L.; Chalmers, D.; Critchley, C.; Nielsen, J.; Otlowski, M.; Nicol, D. A Scenario-Based Methodology for Analyzing the Ethical, Legal, and Social Issues in Genomic Data Sharing. J. Empir. Res. Hum. Res. Ethics JERHRE 2020, 15, 355–364.
  30. Kuo, T.T.; Jiang, X.; Tang, H.; Wang, X.; Harmanci, A.; Kim, M.; Post, K.; Bu, D.; Bath, T.; Kim, J.; et al. The evolving privacy and security concerns for genomic data analysis and sharing as observed from the iDASH competition. J. Am. Med. Inform. Assoc. JAMIA 2022, 29, 2182–2190.
  31. Nellåker, C.; Alkuraya, F.S.; Baynam, G.; Bernier, R.A.; Bernier, F.P.J.; Boulanger, V.; Brudno, M.; Brunner, H.G.; Clayton-Smith, J.; Cogné, B.; et al. Enabling Global Clinical Collaborations on Identifiable Patient Data: The Minerva Initiative. Front. Genet. 2019, 10, 611.
  32. Scheibner, J.; Raisaro, J.L.; Troncoso-Pastoriza, J.R.; Ienca, M.; Fellay, J.; Vayena, E.; Hubaux, J.P. Revolutionizing Medical Data Sharing Using Advanced Privacy-Enhancing Technologies: Technical, Legal, and Ethical Synthesis. J. Med. Internet Res. 2021, 23, e25120.
  33. Househ, M.; Grainger, R.; Petersen, C.; Bamidis, P.; Merolli, M. Balancing between Privacy and Patient Needs for Health Information in the Age of Participatory Health and Social Media: A Scoping Review. Yearb. Med. Inform. 2018, 27, 29–36.
  34. Mahomed, S.; Labuschaigne, M.L. The evolving role of research ethics committees in the era of open data. S. Afr. J. Bioeth. Law 2023, 15, 80–83.
  35. Nunes Vilaza, G.; Coyle, D.; Bardram, J.E. Public Attitudes to Digital Health Research Repositories: Cross-sectional International Survey. J. Med. Internet Res. 2021, 23, e31294.
  36. Floridi, L.; Luetge, C.; Pagallo, U.; Schafer, B.; Valcke, P.; Vayena, E.; Addison, J.; Hughes, N.; Lea, N.; Sage, C.; et al. Key Ethical Challenges in the European Medical Information Framework. Minds Mach. 2019, 29, 355–371.
  37. Sandoval-Almazan, R.; Valle Gonzalez, L.; Millan Vargas, A. Barriers for Open Government Implementation at Municipal Level: The Case of the State of Mexico. In Proceedings of the DG.O2021: The 22nd Annual International Conference on Digital Government Research, Omaha, NE, USA, 9–11 June 2021; ACM Digital Library: New York, NY, USA, 2021; pp. 113–122.
  38. Vianen, N.J.; Maissan, I.M.; den Hartog, D.; Stolker, R.J.; Houmes, R.J.; Gommers, D.A.M.P.J.; van Meeteren, N.L.U.; Hoeks, S.E.; van Lieshout, E.M.M.; Verhofstad, M.H.J.; et al. Opportunities and barriers for prehospital emergency medical services research in the Netherlands; results of a mixed-methods consensus study. Eur. J. Trauma Emerg. Surg. 2023.
  39. Wieczorkowski, J. Barriers to Using Open Government Data. In Proceedings of the 2019 3rd International Conference on E-commerce, E-Business and E-Government, Lyon, France, 18–21 June 2019; ACM Digital Library: New York, NY, USA, 2019; pp. 15–20.
  40. Aleixandre-Benavent, R.; Vidal-Infer, A.; Alonso-Arroyo, A.; Peset, F.; Ferrer Sapena, A. Research Data Sharing in Spain: Exploring Determinants, Practices, and Perceptions. Data 2020, 5, 29.
  41. Tan, A.C.; Askie, L.M.; Hunter, K.E.; Barba, A.; Simes, R.J.; Seidler, A.L. Data sharing-trialists’ plans at registration, attitudes, barriers and facilitators: A cohort study and cross-sectional survey. Res. Synth. Methods 2021, 12, 641–657.
  42. van Donge, W.; Bharosa, N.; Janssen, M.F.W.H.A. Future government data strategies: Data-driven enterprise or data steward? In Proceedings of the 21st Annual International Conference on Digital Government Research, Seoul, Republic of Korea, 15–19 June 2020; Eom, S.J., Ed.; ACM Digital Library: New York, NY, USA, 2020; pp. 196–204.
  43. Wolff, A.; Tylosky, N.; Hasan, T. Open Data Inclusion through Narrative Approaches. In Proceedings of the 2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS), Pittsburgh, PA, USA, 21–29 May 2022; pp. 125–129.
  44. Fischer-Hübner, S.; Alcaraz, C.; Ferreira, A.; Fernandez-Gago, C.; Lopez, J.; Markatos, E.; Islami, L.; Akil, M. Stakeholder perspectives and requirements on cybersecurity in Europe. J. Inf. Secur. Appl. 2021, 61, 102916.
  45. Broes, S.; Lacombe, D.; Verlinden, M.; Huys, I. Toward a Tiered Model to Share Clinical Trial Data and Samples in Precision Oncology. Front. Med. 2018, 5, 6.
  46. Tuler de Oliveira, M.; Amorim Reis, L.H.; Marquering, H.; Zwinderman, A.H.; Delgado Olabarriaga, S. Perceptions of a Secure Cloud-Based Solution for Data Sharing During Acute Stroke Care: Qualitative Interview Study. JMIR Form. Res. 2022, 6, e40061.
  47. Fischer, C.; Hirsbrunner, S.D.; Teckentrup, V. Producing Open Data; Pensoft Publishers: Sofia, Bulgaria, 2022; Volume 8, p. e86384.
  48. Csányi, G.M.; Nagy, D.; Vági, R.; Vadász, J.P.; Orosz, T. Challenges and Open Problems of Legal Document Anonymization. Symmetry 2021, 13, 1490.
  49. Alorwu, A.; Kheirinejad, S.; van Berkel, N.; Kinnula, M.; Ferreira, D.; Visuri, A.; Hosio, S. Assessing MyData Scenarios: Ethics, Concerns, and the Promise. In Proceedings of the CHI’21, 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 8–13 May 2021; Kitamura, Y., Quigley, A., Isbister, K., Igarashi, T., Bjørn, P., Drucker, S., Eds.; Association for Computing Machinery: New York, NY, USA, 2021; pp. 1–11.
  50. Wang, C.; Guo, F.; Ji, M. Analysis of Legal Issues of Personal Information Protection in the Field of Big Data. J. Environ. Public Health 2022, 2022, 1678360.
  51. Smart, M.A.; Sood, D.; Vaccaro, K. Understanding Risks of Privacy Theater with Differential Privacy. Proc. ACM Hum.-Comput. Interact. 2022, 6, 1–24.
  52. Burgess, J.P.; Floridi, L.; Pols, A.; van den Hoven, J. Towards a Digital Ethics: EDPS Ethics Advisory Group. 2018. Available online: https://philpapers.org/rec/BURTAD-3 (accessed on 22 November 2023).
  53. Papageorgiou, V.; Wharton-Smith, A.; Campos-Matos, I.; Ward, H. Patient data-sharing for immigration enforcement: A qualitative study of healthcare providers in England. BMJ Open 2020, 10, e033202.
  54. van der Burg, S.; Wiseman, L.; Krkeljas, J. Trust in farm data sharing: Reflections on the EU code of conduct for agricultural data sharing. Ethics Inf. Technol. 2021, 23, 185–198.
  55. Kwon, S.; Motohashi, K. Incentive or disincentive for research data disclosure? A large-scale empirical analysis and implications for open science policy. Int. J. Inf. Manag. 2021, 60, 102371.
  56. Zuiderwijk, A.; Spiers, H. Sharing and re-using open data: A case study of motivations in astrophysics. Int. J. Inf. Manag. 2019, 49, 228–241.
  57. Yerden, X.; Luna-Reyes, L.F. Promoting Government Impacts through Open Data: Key Influential Factors. In Proceedings of the DG.O2021: The 22nd Annual International Conference on Digital Government Research, Omaha, NE, USA, 9–11 June 2021; ACM Digital Library: New York, NY, USA, 2021; pp. 180–188.
  58. Galdon Clavell, G. Exploring the ethical, organisational and technological challenges of crime mapping: A critical approach to urban safety technologies. Ethics Inf. Technol. 2018, 20, 265–277.
  59. Rockhold, F.; Bromley, C.; Wagner, E.K.; Buyse, M. Open science: The open clinical trials data journey. Clin. Trials (Lond. Engl.) 2019, 16, 539–546.