Image Annotations in Cultural Heritage Platforms

Cultural heritage is one of many fields that have undergone a significant digital transformation, with assets being digitized and annotated for heritage preservation, inheritance, and dissemination. However, a lack of accurate and descriptive metadata in this field reduces the usability and discoverability of digital content. This leaves visitors to cultural heritage platforms with an unsatisfactory user experience and limits the processing capabilities needed to add new functionalities. Cultural heritage institutions have traditionally been responsible for providing metadata for their collection items with the help of professionals, which is expensive and requires significant effort and time. In this context, crowdsourcing can play a significant role in digital transformation and large-scale data processing, leveraging the crowd to enrich the metadata of digital cultural content.

  • cultural heritage
  • crowdsourcing
  • recommendation system
  • word embeddings

1. Introduction

Cultural heritage (CH) is a subfield of digital humanities that encompasses a wide range of aspects related to the study and preservation of past human activities and of societal attributes inherited from previous generations. These aspects have traditionally included tangible elements (movable or immovable), such as artwork, legacy artefacts, monuments, museums, groups of buildings, or archaeological sites, with a broad range of values from symbolic, historic, and artistic points of view. This is a long-established interpretation of heritage that usually carries significant scientific interest (ethnological, anthropological, architectural…) and a close relationship to social traits, beliefs, and behaviors. A more recent approach to CH, however, pays growing attention to the intangible features of this inherited patrimony, which have been presented in a variety of forms throughout the long history of civilization [1,2]. It extends the elements mentioned above with music, values, traditions, oral history, religious ceremonies, storytelling, and even the more mundane aspects of human life, such as cuisine or clothing, reflecting a shift from a conservation-oriented viewpoint to a value-oriented one.
The systemic perspective that now governs the study of CH depicts tangible and intangible components worthy of preservation as inextricably bound [3], a multidimensional and cross-disciplinary expression of the pillars that explain both the external attributes and the idiosyncrasy of current societies. Together, they form a shared bond between individuals and communities and between our present and our past, a complex framework that sustains and shapes our thinking and identity as members of a homogeneous local neighborhood or a universal nation. That is, CH is usually a key element in the identity formation of citizens [4].
Another peculiarity of the modern interpretation of CH lies in its living character: its definition is permanently under construction, evolves constantly according to society’s values, and has expanded almost continuously in recent times. In any case, whether these CH manifestations are places, objects, traditions, or reflections of daily life, there is intensified interdisciplinary research activity that pursues both understanding and preserving such symbols of our ancestors’ way of life, safeguarding this inheritance for future generations regardless of where they live. As a result, it has become critical to ensure that, in addition to the heritage originals, the digital material of this heritage is preserved, accessible, and comprehensible over time.
Technological advancements and the CH domain have always had a contentious relationship: the application of information and communication technologies (ICT) in CH can become a burden and a challenge during users’ cultural experiences [5,6]. The strategic integration of CH and ICT is not a new concept; for many years, the technology and tourism industries and cultural heritage organizations have collaborated closely to adapt to these developments [4,7,8]. CH institutions can use digital and communication platforms to improve their collection management and offer visitors an exciting experience that extends beyond their physical boundaries [9,10]. Today, there are more opportunities than ever for people to participate in CH activities and become content producers, with many engagement opportunities in digital heritage communities thanks to advanced technologies, applications, and services that can potentially bridge the existing gap between citizens and culture [11,12,13].
Galleries, libraries, archives, and museums (GLAMs) are among the many organizations that have made major efforts toward digital transformation in recent years, but they rarely have the resources to properly characterize their digital collection assets. The resulting metadata are typically restricted by the vocabulary, taxonomies, and perspective of the specific institution [14]. The limited annotations and the semantic gap between experts and the public have a negative impact on the CH institution, affecting its visibility and long-term relevance and providing a poor user experience on its platforms [15]. The European Union, the American Library Association, and UNESCO [16] have been urged to take advantage of such advances in ICT and launch digitization projects along with digital collections in order to encourage people to actively engage with CH content [17,18]. However, the major shortcoming of these collections is a lack of well-defined and rich annotations for most of the large number of items they contain. This issue has a significant impact on the accessibility and discoverability of the available digital content and limits the use of these resources in new and interesting ways, which in turn degrades the user experience.
Metadata quality improvement and enhancement are crucial in the CH domain, where the inventory of items that must be described or transcribed is generally too large to be addressed through normal procedures because of the significant time, effort, and resources required [19]. Since GLAMs and other CH organizations rarely have the capacity to adequately identify, describe, or enhance their digital collection items, they have turned to investigating the potential of crowdsourcing, as the digitization of their artworks is critical for their appropriate promotion on the Internet [20,21]. Exploring the potential of crowdsourcing in the CH domain is a growing trend around the globe, with CH institutions starting to share metadata descriptions that need to be enhanced, corrected, or annotated and soliciting public assistance to improve them [22,23]. Increasingly, these organizations encourage people to enrich artworks and cultural assets by contributing their knowledge and expertise to improve their descriptions [24]. Volunteers provide annotations to complete existing characterizations of paintings, statues, buildings, or even the more intangible aspects of heritage mentioned above, working just from pictures or available descriptions of the items in the organization’s collections or archives [25].
This approach falls squarely into the open debate in the literature about the advantages and disadvantages of the collaborative efforts of interested users (social tags) compared to taxonomies developed by field experts (controlled vocabularies). Although a large number of research works have been published in recent years [26,27,28,29], each technique shows complementary strengths, and there is no conclusive evidence of the superiority of either approach; the most adequate option depends largely on the application scenario, the available resources, and the evaluation criteria. In any case, the research presented in this paper is not affected by this open debate.
While crowdsourcing systems face the challenge of reaching and utilizing an adequate number of available individuals, it is ultimately up to each individual to discover the assets that best match their interests or preferences, even though there is no monetary compensation for participation (the final result should be sufficiently rewarding for these kinds of volunteers). Research on motivating contributors across different types of crowdsourcing systems emphasizes the importance of intrinsic motivation, which occurs when contributors choose tasks because they are inherently interesting or enjoyable rather than only because of extrinsic factors, such as payment [30,31,32]. This means that people’s preference for particular tasks is not solely determined by the incentives set by the task requester. Instead, it relies heavily on the relationship between the characteristics of the task and the personal interests of the contributor. Therefore, CH crowdsourcing projects should pay more attention to targeting people who might be especially interested in some types of assets and to engaging enthusiasts who can contribute through modern technologies. However, the main issue with most crowdsourcing initiatives is that items on CH platforms are assigned and displayed to all users at random, without any distinction or personalization, which frequently means assigning users annotation tasks that are far from their preferences and leads to a subsequent lack of interest.

2. Crowdsourcing Initiatives in the Field of Cultural Heritage

There are plenty of works in the literature describing the extensive use of technology in virtually every aspect of heritage study and preservation [33,34], and these technical advances have more recently incorporated a growing number of ICT tools [35,36,37,38], including the ones used in this work, that is, recommender systems [39] that make extensive use of semantic approaches (linked open data, LOD) [40,41] and machine learning [42,43]. However, much less effort has been dedicated to crowdsourcing initiatives in the field of CH [44,45].
When developing crowdsourcing initiatives in the field of cultural heritage, it is important to understand not only the feasible contribution of humans to computation tasks but also the advantages and disadvantages of the software and tools adopted to facilitate this process and, increasingly, the motivation of each individual. Launched in 2005, the Steve Museum project was one of the first projects to explore the crowdsourcing concept and encourage the public to contribute tags to some UK and US museum collections [46]. Tagging was shown to provide a quite different vocabulary from official museum descriptions, with 86% of submitted tags not found in the museum documentation [47]. The project was a proof of concept with limited user functions and no support for linking to semantic vocabularies. The Australian Newspapers Digitization Program, launched in 2008, can be considered the earliest large-scale initiative developed in this field to date, a project asking the general public to review and correct the rather poor optical character recognition (OCR) text of millions of articles extracted from its database of historical newspapers. During the life of this successful project, more than 166 million text lines from newspaper articles were reviewed, corrected, or enhanced by volunteers [48]. In 2009, the Waisda? video labeling game was released as a popular platform that used gaming to annotate television heritage [49], awarding points to players when their tags matched a tag that an opponent had already submitted within a certain time frame.
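The tag-matching rule described for Waisda? can be illustrated with a minimal sketch. It is not the game's actual implementation: the `TagEvent` type, the `score_tags` function, the 10-second window, and the one-point reward are all hypothetical choices made only to show how agreement between players within a time frame could be turned into a score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagEvent:
    player: str    # who entered the tag
    tag: str       # the tag text as typed
    time_s: float  # seconds since the start of the video


def score_tags(events, window_s=10.0, points_per_match=1):
    """Give points when a tag repeats one that a different player
    entered at most `window_s` seconds earlier (assumed rule)."""
    scores = {}
    seen = []  # events already processed, in time order
    for ev in sorted(events, key=lambda e: e.time_s):
        scores.setdefault(ev.player, 0)
        norm = ev.tag.strip().lower()  # compare tags case-insensitively
        matched = any(
            prev.player != ev.player
            and prev.tag.strip().lower() == norm
            and ev.time_s - prev.time_s <= window_s
            for prev in seen
        )
        if matched:
            scores[ev.player] += points_per_match
        seen.append(ev)
    return scores


# Hypothetical session with two players.
events = [
    TagEvent("ana", "Horse", 1.0),
    TagEvent("ben", "horse", 4.0),   # matches ana's tag within the window
    TagEvent("ana", "boat", 5.0),
    TagEvent("ben", "boat", 30.0),   # same tag, but too late to count
]
print(score_tags(events))
```

In this sketch only the later of two agreeing players is rewarded, which mirrors the description above; a real game could just as well reward both players or weight rarer tags more heavily.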
Waisda? gameplay focused on responsiveness and accuracy, implicitly assuming that tags are genuine if there is widespread agreement among players; the community itself acts as the filter, with no technological or interaction support to correct or improve the quality of the information. More recently, the European Union launched CrowdHeritage, an online crowdsourcing platform for enriching the metadata of digitized cultural heritage material available on the Europeana portal [50]. The project website allows users to leave comments on selected cultural archives or validate existing ones, but all users browse the same catalogues in the same basic way, with no customization provided. In 2014, a comprehensive survey studied some Swiss heritage institutions in depth to examine to what extent crowdsourcing and open data practices were present in their initiatives and routines [51], concluding that very few institutions had considered crowdsourcing and open data policies. However, there were indications that several institutions might adopt these innovations in the near future, as the majority of the surveyed institutions deemed them important and considered that their opportunities outweighed the identified risks.
Regarding Egyptian heritage, no documented crowdsourcing initiative was found in the literature. However, some Egyptian community-based initiatives have emerged with the goal of raising awareness about the ongoing loss of heritage, addressing acts of heritage damage, and seeking community support. The proliferation of social media facilitates the distribution of information among these initiatives, which typically focus on a single topic and share images and links to articles to highlight changes that have a negative impact on significant buildings.
According to the literature, many CH institutions have started to explore the potential of involving public users in the description, classification, and transcription of their metadata in order to build and enrich their digital cultural heritage [44]. Furthermore, a number of projects have demonstrated that crowdsourcing can create meaningful experiences around their collections while also encouraging creativity and engagement on their CH platforms [52]. These institutions must also be aware that motivating factors, such as user participation, are critical to the success of these initiatives. Regrettably, these initiatives usually provide the same set of items for all users to annotate or enhance, disregarding personal preferences. Thus, it is necessary to improve the assignment of activities or tasks by taking users’ interests and preferences into account instead of resorting to random approaches. This can be accomplished by using web technologies that help identify the data of interest to each user, according to the context where possible.
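One simple way to replace random assignment with interest-based assignment is to rank items by their similarity to a user's interest profile. The sketch below is only an illustration under stated assumptions and does not reproduce the method of the source paper: the functions `cosine` and `rank_items`, the item identifiers, and the three-dimensional toy vectors (standing in for, say, embedding vectors of item descriptions and user interests) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_items(user_vec, item_vecs, top_k=3):
    """Return up to `top_k` (item_id, score) pairs, most similar first.
    Ties are broken alphabetically by item id to keep the order stable."""
    scored = [(item_id, cosine(user_vec, vec)) for item_id, vec in item_vecs.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Toy data (assumed): each dimension is an arbitrary interest axis.
user_profile = [1.0, 0.0, 0.0]
items = {
    "statue": [0.9, 0.1, 0.0],
    "folk_song": [0.0, 1.0, 0.0],
    "temple_photo": [0.5, 0.5, 0.0],
}

for item_id, score in rank_items(user_profile, items, top_k=2):
    print(item_id, round(score, 3))
```

With these toy vectors the user would first be offered the statue and then the temple photo, while the folk song, which shares nothing with the profile, would not be suggested. How the vectors are obtained in practice (for example from tags, descriptions, or browsing history) is outside the scope of this sketch.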

This entry is adapted from the peer-reviewed paper 10.3390/app131910623
