Semantic Support of Cultural Heritage

Please note this is a comparison between Version 1 by Jean-Jacques Ponciano and Version 3 by Enzi Gong.

The usage of semantics is not new in cultural heritage disciplines. Semantics are commonly used to define standards for meta-, para-, and provenance information for documenting and archiving. Examples of such standards are LIDO and MIDAS Heritage. These XML schema standards are still used in cultural heritage. In recent years, however, the emergence of the Semantic Web has provided the much-required boost to semantic frameworks and technologies. It also dictates how semantics are defined and used today. Techniques and tools that formalize semantics through formalized knowledge representations have become the norm in different fields applying semantics.

cultural heritage
point cloud
object recognition
linked open data
semantic enrichment
knowledge model
3D acquisition

1. Overview

The signing of the 2019 Declaration of Cooperation on advancing the digitization of cultural heritage in Europe shows the important role that 3D digitization plays in the safeguarding and sustainability of cultural heritage. Digitization also aims at sharing and presenting cultural heritage. However, the process from data acquisition to presentation requires interdisciplinary collaboration, in which mutual understanding and collaborative work are difficult because different fields of expert knowledge are involved. This study proposes an end-to-end method from cultural data acquisition to presentation, based on explicit semantics that represent the different fields of expert knowledge intervening in this process. The method is composed of three knowledge-based processing steps: (i) a recommendation process of acquisition technology to support cultural data acquisition; (ii) an object recognition process to structure the unstructured acquired data; and (iii) an enrichment process based on Linked Open Data to document cultural objects with further information, such as geospatial, cultural, and historical information. The proposed method was applied in two case studies concerning the watermills of Ephesos terrace house 2 and the first Sacro Monte chapel in Varallo. These application cases show the proposed method’s ability to recognize and document digitized cultural objects in different contexts thanks to the semantics.

2. Background

Since the World Heritage Convention in 1972, UNESCO has worked actively to protect endangered world heritage sites and objects. This concerns cultural heritage that is subject to serious deterioration; significant loss of historical authenticity; loss of cultural significance; and the threat of human planning, armed conflict, or environmental factors (e.g., climatic and geologic). Actions are taken to protect cultural heritage by avoiding and mitigating threats and deterioration wherever possible. However, it is difficult to counter the passage of time and its consequences for cultural heritage. Therefore, digitization and digital preservation are an opportunity to conserve cultural heritage and share it with the public and between different organizations. The digitization process converts information into a digital format and results in a digital representation. To preserve a digital representation, it is necessary to ensure continued access to digital materials for as long as required. In 2005, Europe initiated the creation of a common access point to Europe’s cultural heritage. Since this initiative, several European projects, such as the Europeana initiative (https://pro.europeana.eu/about-us/mission, accessed on 18 March 2021), ITN-DCH (https://www.itn-dch.net/, accessed on 18 March 2021), VIMM (https://www.vi-mm.eu/vimm-results/, accessed on 18 March 2021), and Dariah-EU (https://www.dariah.eu/about/mission-vision/, accessed on 18 March 2021), have been implemented to enable digital representation and the sharing of projects by making them reusable, visible, and sustainable. These projects promote cultural heritage documentation as Linked Open Data using semantic technologies. Linked Open Data and semantic technologies facilitate the sharing of data and help cope with the evolution and change of digital formats over time, which threaten the sustainability of digital preservation.
However, the process from digital acquisition to cultural heritage presentation is a long and challenging path that still requires research to improve. Interest in this research has been reinforced, mainly for 3D digitization, by the signing of the 2019 Declaration of Cooperation on advancing the digitization of cultural heritage in Europe. This study presents the potential of semantics to facilitate the process from 3D cultural heritage data acquisition to its presentation and thus support the safeguarding of cultural heritage. Semantic technologies allow the meaning that is implicitly contained in data to be explicitly described (in logical form). This explicit meaning expressed through semantics enables machines and people to understand, share, and reason with one another. The proposed method aims at improving the process from 3D cultural heritage data acquisition to its presentation by providing an end-to-end process guided by expert knowledge through the use of semantic technologies. Challenges and motivations related to such a method are detailed in Section 1.1, and the related work is presented in Section 1.2. The method and the knowledge model it uses are described in Section 2. This method is composed of three steps:

Data acquisition guided by a recommendation system for acquisition technologies.
Data processing and structuring through knowledge-guided object recognition.
Data presentation with cultural information thanks to an enrichment process from Linked Open Data.

The proposed method was applied in two case studies presented in Section 3. The first case study concerns the archaeological site of terrace house 2 in Ephesos (Turkey). The second case study corresponds to the first chapel of the Sacro Monte in Varallo (Italy). Results of the proposed method applied to these case studies are presented in Section 4 and discussed in Section 5, and Section 6 concludes on the method.

2.1. Challenges

The process from 3D cultural heritage data acquisition to its presentation is composed of four main steps:

Data acquisition, which allows the digitization of a cultural object and produces unstructured data;
Data processing, which produces structured data through the segmentation, classification, and analysis of unstructured data;
Data enrichment, which consists of enriching the structured data with cultural heritage information and knowledge related to the structured data;
Data presentation, which allows the visualization of the structured and enriched data.
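
The four steps listed above can be read as a chain of functions that each add to a shared record. The following is a minimal sketch under stated assumptions, not the implementation used in the study: the `CulturalRecord` class, the function names, the hard-coded points, and the height-based labelling are all hypothetical and serve only to show how unstructured data becomes structured, then enriched, then presented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class CulturalRecord:
    """Hypothetical container handed from one step to the next."""
    name: str
    raw_points: List[Point] = field(default_factory=list)          # unstructured data
    segments: Dict[str, List[Point]] = field(default_factory=dict)  # structured data
    metadata: Dict[str, str] = field(default_factory=dict)          # enrichment


def acquire(record: CulturalRecord) -> CulturalRecord:
    # Stand-in for a scanner: three hard-coded 3D points (assumption).
    record.raw_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1), (0.0, 0.0, 2.5)]
    return record


def process(record: CulturalRecord) -> CulturalRecord:
    # Toy classification: label each point by its height (z coordinate).
    for point in record.raw_points:
        label = "floor" if point[2] < 0.5 else "wall"
        record.segments.setdefault(label, []).append(point)
    return record


def enrich(record: CulturalRecord) -> CulturalRecord:
    # Stand-in for a Linked Open Data lookup; the value is set by hand here.
    record.metadata["country"] = "Turkey"
    return record


def present(record: CulturalRecord) -> str:
    parts = ", ".join(
        f"{label}: {len(points)} points"
        for label, points in sorted(record.segments.items())
    )
    return f"{record.name} ({record.metadata.get('country', 'unknown')}): {parts}"


def run(name: str) -> str:
    record = CulturalRecord(name)
    for step in (acquire, process, enrich):
        record = step(record)
    return present(record)


print(run("Terrace house 2"))
```

Keeping every step behind the same record type is one way to give the isolated tasks a common target, which is the role the text assigns to the shared knowledge representation.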

Each of these steps requires expert knowledge. The data acquisition of cultural heritage requires knowledge of acquisition technologies and of cultural heritage to choose the technology best suited to the cultural object to acquire and its context [1,2,3]. Data processing requires computer science knowledge to define the most suitable processing according to the data and the cultural objects or elements to recognize. Data enrichment requires cultural heritage and historical knowledge to add metadata and information related to the digitized cultural objects. All of these requirements show that this process is an interdisciplinary one.

In addition to the challenges specific to each stage of the process, this process’s interdisciplinarity is a challenge that makes the process long and difficult. Providing an efficient process would require collaborative work between experts from different domains. However, understanding between the different experts, which is necessary to collaborate, is a difficult task that produces a sequence of isolated tasks rather than a continuous collaborative process. Such a process based on isolated and independent tasks is thus a long process that lacks a common pursued goal. A common goal would allow the optimization of each step according to the pursued final goal. This study proposes a method to facilitate and improve the process from data acquisition to its presentation by using explicit knowledge representation. The knowledge representation aims to gather knowledge from the different experts and use it to guide users and the whole process according to a common goal. This goal is the presentation of enriched and structured cultural heritage data.

2.2. Related Work

The usage of semantics is not new in cultural heritage disciplines. Semantics are commonly used to define standards for meta-, para-, and provenance information for documenting and archiving. Examples of such standards are LIDO [4] and MIDAS Heritage [5]. These XML schema standards are still used in cultural heritage. In recent years, however, the emergence of the Semantic Web has provided the much-required boost to semantic frameworks and technologies [6]. It also dictates how semantics are defined and used today. Techniques and tools that formalize semantics through formalized knowledge representations have become the norm in different fields applying semantics. Ontologies expressed through the Web Ontology Language (OWL) [7] have evolved as major computational artefacts to provide logical representations of any particular domain of interest [8]. CIDOC-CRM is the most prominent and widely used ontology within the cultural heritage community [9]. In 2006, it became an ISO standard for publishing cultural heritage. Although semantics are commonly used for documenting and archiving cultural heritage, they are not often used to guide data processing or to enrich data from Linked Open Data, which are other strengths of semantics that can be applied to the cultural heritage domain. Therefore, this section first presents works related to approaches for data acquisition and processing, and then works related to collecting data and gathering cultural heritage information thanks to semantics.

Although reviews comparing acquisition technologies for different application domains exist, such as [10,11], systems that guide users in data acquisition are rare. Cultural heritage objects are diverse (e.g., archaeological sites, heritage buildings, and paintings), with specific characteristics, documentation requirements, acquisition contexts, and application fields (e.g., preservation, restoration, and documentation). The acquisition techniques and technologies vary according to the application field and the cultural heritage objects to acquire. Therefore, the European project COST Action TD1201: “Colour and Space in Cultural Heritage (COSCH)” [12] has addressed the issue of determining preferred technical solution(s) according to data requirements, in order to guide non-technical humanities experts. The approach presented in [13] proposes the COSCH-KR, or COSCH Knowledge Representation, an ontology model to solve this issue. The ontology knowledge model comprises interrelated semantics from the technical and humanities domains involved in the optical recording of physical cultural heritage assets. Inbuilt semantic rules infer the necessity of technical solution(s). COSCH-KR further applies semantics to the processing of the generated data as required by a cultural heritage application. Concerning cultural heritage data processing, the generation of annotated 3D models is nowadays widespread within the heritage community to disseminate and share information on cultural heritage objects. Various methodologies and algorithms have been applied to generate such computer-based 3D models. A review [14] presents the most popular methodologies and algorithms used to segment and classify 3D point clouds for the geospatial and heritage community. Its authors highlight the advances made in this domain through the use of machine learning methods. Machine learning methods belong to the family of data-driven approaches.
The main techniques used in these machine learning methods are Markov random fields (MRF) (e.g., [15]), quadratic programming [16], and associative Markov networks (AMN) [17]. Other approaches, such as [18,19,20,21], use deep learning techniques based on convolutional neural networks (CNN). A review [22] presents the different categories of these approaches. The limitation of these machine learning and deep learning methods is that they require large data sets to obtain satisfactory results. Among the data-driven approaches, other popular methods are stochastic methods. Stochastic methods aim at the recognition of the context or are based on shape. The recognition of context can provide semantic information describing a scene [23] or the geometry [24]. Shape-based recognition is used in [25] to identify semantic geometric classes by taking advantage of pre-structured knowledge. Ontologies are increasingly used to represent this knowledge and semantic information, all the more as they facilitate information retrieval [26] through the Semantic Web and semantic techniques for querying cultural heritage data [27]. In these works, semantic techniques are used to represent the result, but they can also be used during recognition. The interest in using ontologies to process data is mainly visible in the domain of image processing. In [28], a domain ontology is used to develop a recognition method. In [29], the detection and classification of objects are performed using ontology and reasoning techniques. However, most of these works only use semantic techniques to achieve some steps of the processing.
The Knowledge-based object Detection in Image and Point cloud approach (KnowDIP) [30] uses semantic techniques at each step of the processing and is thus able to benefit from all advantages provided by the semantic technique, to both guide the process of computer-based modelling (through an adaptive selection of algorithms and an iterative classification) and represent the result of the 3D model understanding [31].
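
The idea of inbuilt rules that infer suitable technical solutions from stated requirements, described above for COSCH-KR, can be illustrated with a toy rule matcher. This is a sketch under stated assumptions only: COSCH-KR is an ontology with semantic rules, whereas the code below uses plain Python sets, and every rule, fact name, and recommendation here is hypothetical rather than taken from COSCH-KR.

```python
from typing import List, NamedTuple, Set


class Rule(NamedTuple):
    requires: Set[str]   # facts that must all hold for the rule to fire
    recommends: str      # technology suggested when the rule fires


# Hypothetical rules for illustration; not the content of COSCH-KR.
RULES: List[Rule] = [
    Rule({"large_object", "outdoor"}, "terrestrial laser scanning"),
    Rule({"small_object", "fine_detail"}, "structured light scanning"),
    Rule({"textured_surface"}, "photogrammetry"),
]


def recommend(facts: Set[str]) -> List[str]:
    """Return every technology whose rule conditions are a subset of the facts."""
    return [rule.recommends for rule in RULES if rule.requires <= facts]


# A large outdoor object with a textured surface matches two rules.
print(recommend({"large_object", "outdoor", "textured_surface"}))
```

In an ontology-based system the same conditions would be expressed as class restrictions or rules and evaluated by a reasoner, so that experts can extend the knowledge without changing program code.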

Semantics play an essential role in collecting, disseminating, and sharing cultural heritage data and information. Their main benefit is to solve interoperability problems. They can enrich and homogenise the schema of cultural heritage metadata to improve the searching and navigation functionalities of a cultural portal, as presented in [32]. They can also be used to publish and connect different sources of cultural data. Some approaches, such as [33], can connect a range of cultural heritage types, such as paintings, archaeological sites, archaeological exhibits, and points of interest located in contemporary urban space. Such a connection and mapping are achieved through the semantic-based design of the CrossCult knowledge base, which aims to enhance the capabilities of the CrossCult platform and mobile applications. The proposed knowledge base contains an upper-level ontology based on CIDOC-CRM concepts and some additional concepts, such as Reflective Topic. It also includes the CrossCult Classification schema incorporated into the upper-level ontology. This knowledge base aims to connect and map information and data from cultural heritage institutions based on four flagship pilot cases from eight locations across Europe. Other existing approaches publish more specific cultural heritage data types (e.g., biography, artworks, and cultural heritage buildings) as Linked Open Data. The authors of [34,35] create an Irish CH knowledge base based on CIDOC-CRM, whose knowledge is derived from the Dictionary of Irish Biography and linked to DBpedia. The work presented in [36] proposes publishing the data on artworks and authors from the web portal of the Russian Museum as Linked Open Data. The proposed method consists of transforming the data into RDF using the CIDOC-CRM vocabulary. It links the thesauri of the British Museum to skos:Concept and to specific concepts of CIDOC-CRM. Finally, it interlinks and enriches the knowledge representation with DBpedia.
This enrichment consists of first adding information about authors (e.g., dates of birth and death and the artistic movement the author belongs to) and then annotating unstructured text, such as artwork descriptions and author biographies, with links to DBpedia resources. Cultural heritage building data require gathering both BIM information and cultural information. The authors of [37,38] propose the HBIM ontology, which integrates the Getty vocabulary and ifcOWL to create a catalogue of cultural heritage buildings and architectural complexes. This study belongs to the INCEPTION project, which aims to provide a catalogue for visualizing, updating, exchanging, and disseminating cultural heritage buildings and architectural complexes. The work presented in [39] proposes a 3D model that is fully interoperable and rich in its informative content, enabling the user to query a repository composed of semantically structured and rich HBIM data. Existing approaches use semantic representation, mainly based on the CIDOC-CRM vocabulary, to publish open data, gather different data sources, and facilitate the search and navigation of cultural heritage. However, only a few of them [34,35,36] exploit the strength of existing Linked Open Data, such as DBpedia. The approach presented in this study proposes the exploitation of the rich interlinking of Wikidata entities to gather and collect information from different sources of Linked Open Data.
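
To make the idea of enrichment from Wikidata more concrete, the sketch below builds a SPARQL query that looks up an entity by its label and asks for two kinds of information mentioned in this entry, geospatial and contextual. It is an illustration under stated assumptions, not the query used in the study: the function name `build_enrichment_query` is hypothetical, the choice of properties (P625, coordinate location, and P17, country) is ours, and the code only constructs the query text without contacting any endpoint.

```python
def build_enrichment_query(label: str, language: str = "en", limit: int = 5) -> str:
    """Build a SPARQL query for entities carrying the given label (sketch only)."""
    # Escape backslashes and double quotes so the label is a valid SPARQL literal.
    safe = label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?coord ?country WHERE {{
  ?item rdfs:label "{safe}"@{language} .
  OPTIONAL {{ ?item wdt:P625 ?coord . }}
  OPTIONAL {{ ?item wdt:P17 ?country . }}
}}
LIMIT {limit}
""".strip()


print(build_enrichment_query("Ephesus"))
```

The resulting text could be submitted to a SPARQL endpoint such as the Wikidata Query Service. Following the links from the returned entities to other datasets is what allows information to be gathered from several Linked Open Data sources.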

This review of related work shows a lack of end-to-end approaches supporting cultural heritage documentation from data acquisition to its presentation. However, it highlights the potential of semantics to support this process and presents relevant semantic-based approaches that support some of its steps. It thus allows us to identify the COSCH-KR and KnowDIP approaches as relevant for supporting data acquisition and processing, respectively. These two approaches provide support in different application contexts, and each of them provides a part of the domain knowledge intervening in the end-to-end process from cultural heritage acquisition to its presentation. As far as cultural enrichment is concerned, we are not yet aware of a flexible approach adapted to different contexts. However, the review of related work shows that most Linked Open Data sources and enrichment approaches are based on the CIDOC-CRM ontology. This ontology is, therefore, unavoidable when publishing and sharing cultural heritage data, information, and knowledge.
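
As an illustration of what publishing with the CIDOC-CRM vocabulary can look like, the sketch below writes a few statements about a site and its location as N-Triples using only the Python standard library. It is a minimal example under stated assumptions: the `http://example.org/heritage/` namespace and the identifiers are hypothetical, and the choice of E27_Site, E53_Place, and P53_has_former_or_current_location is one plausible modelling, not the mapping used by any of the works cited above.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/heritage/"  # hypothetical namespace for this sketch


def uri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_site(site_id: str, label: str, place_id: str, place_label: str) -> str:
    """Return N-Triples stating that a site is located at a place."""
    site = uri(EX + site_id)
    place = uri(EX + place_id)
    triples = [
        (site, uri(RDF_TYPE), uri(CRM + "E27_Site")),
        (site, uri(RDFS_LABEL), literal(label)),
        (site, uri(CRM + "P53_has_former_or_current_location"), place),
        (place, uri(RDF_TYPE), uri(CRM + "E53_Place")),
        (place, uri(RDFS_LABEL), literal(place_label)),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(describe_site("terrace-house-2", "Terrace House 2", "ephesos", "Ephesos"))
```

Because the classes and properties come from a shared ontology, data published this way can be merged with other CIDOC-CRM-based sources without a custom mapping for each pair of datasets.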

3. Conclusions

Semantic Web technologies and Linked Open Data are increasingly utilized for the publishing of cultural heritage documentation. Documentation published as Linked Open Data provides an essential source of knowledge and information that can be used to enrich and support cultural heritage object documentation. However, using Linked Open Data as an information source for cultural documentation requires gathering different Linked Open Data sources. Existing approaches using Linked Open Data as an information source for documentation are generally specific to a domain and focus on specific Linked Open Data sources. These approaches show the potential of semantics, but they do not entirely exploit this potential to support the documentation process from acquisition to presentation. Semantics can gather different knowledge domains, guide the documentation process in different contexts, and gather Linked Open Data sources for documentation enrichment with the goal of providing rich cultural heritage documentation. This study shows the semantic potential of these two approaches to support the end-to-end documentation process from data acquisition to cultural heritage presentation. The proposed method comprises three knowledge-based processing steps: acquisition technology recommendation, object recognition to structure the data, and data enrichment through Linked Open Data. This method provides two main contributions. The first one is an end-to-end process to support the safeguarding of cultural heritage. This end-to-end process is based on acquisition technology recommendations and object recognition, which can adapt to different contexts of cultural heritage. Thanks to this flexibility, the proposed method can support data digitization in its application to various cultural heritage cases.
As shown through the two case studies, the proposed method is applicable to large cultural heritage objects, such as a terrace house, a watermill, or a chapel, but also to smaller objects, such as statues. The second contribution is the gathering and centralization of a variety of information and documents related to cultural heritage objects, thanks to Linked Open Data. The flexibility of the proposed method and the connection between its different steps are provided by the semantics.