This paper presents SemSime, a semantic similarity method for searching over a set
of digital resources previously annotated by means of concepts from a weighted reference ontology.
It is based on a frequency approach for weighting the ontology, and refines both the user request and the annotations of the digital resources with rating scores. The scores are High, Medium, and Low. In the user request, they indicate the preference the user assigns to each concept representing a search criterion; in the annotations of the digital resources, they represent the level of quality associated with each concept in describing the resource. The experiment we performed shows that SemSime outperforms the previous semantic search method SemSim.
The most significant improvement within the Semantic Web research area pertains to reasoning and searching abilities. From this perspective, semantic similarity reasoning, which relies on the knowledge encoded in a reference ontology, is a different technique from the well-known deductive reasoning used in expert systems. In previous work, we proposed SemSim, a semantic search method based on a Weighted Reference Ontology (WRO). In SemSim, both the resources in the search space and the user requests are represented by means of an Ontology Feature Vector (OFV), which is a set of concepts from the WRO. We distinguish the user request, denoted as Request Vector (RV), from the description of a resource, denoted as Annotation Vector (AV). In the search process, SemSim compares the RV with each AV, and the result is a ranking of the resources that exhibit the highest degree of similarity with respect to the user's request.
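The search process described above can be sketched in a few lines. This is only an illustration under stated assumptions: the Jaccard overlap in `placeholder_similarity` is a stand-in and is not the SemSim similarity, which relies on the weights of the WRO; the function names, the resource names, and the sample vectors are hypothetical, with concept names borrowed from the tourism example given later in this section.

```python
from typing import Dict, FrozenSet, List, Tuple

# An Ontology Feature Vector (OFV) is modeled here as a set of WRO concept names.
OFV = FrozenSet[str]


def placeholder_similarity(rv: OFV, av: OFV) -> float:
    """Jaccard overlap, used only as a stand-in (an assumption, not SemSim)."""
    union = rv | av
    return len(rv & av) / len(union) if union else 0.0


def rank_resources(rv: OFV, annotations: Dict[str, OFV]) -> List[Tuple[str, float]]:
    """Compare the RV with each AV and sort resources by decreasing similarity."""
    scored = [(name, placeholder_similarity(rv, av)) for name, av in annotations.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical request vector (RV) and annotation vectors (AV).
request: OFV = frozenset({"InternationalHotel", "CulturalActivity"})
packages: Dict[str, OFV] = {
    "package_b": frozenset({"HorseRiding", "ThaiMeal"}),
    "package_a": frozenset({"InternationalHotel", "CulturalActivity", "Entertainment"}),
}

ranking = rank_resources(request, packages)
print(ranking)  # package_a is ranked first, package_b last
```

Only the overall shape (one RV, many AVs, a ranked list as output) is taken from the text; any similarity function with the same signature could be substituted for the placeholder.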
In an earlier paper, we analyzed two different approaches to weighting the reference ontology, namely the frequency-based and the uniform probabilistic approaches. The experiment described in that paper shows that SemSim with the frequency-based approach outperforms SemSim with the uniform probabilistic approach, as well as the most representative similarity methods from the literature.
In this work, we present a new method, referred to as SemSime. It relies on the frequency-based approach and revises SemSim along two directions. According to the first direction, in comparing the RV with the AV, SemSime takes into consideration the cardinality of the set of concepts (features) in the user request rather than the maximal cardinality of the compared OFVs. This choice allows us to give more relevance to the features requested by the user than to the extra features contained in the annotation vectors available in the search space. Along the second direction, SemSim has been enhanced with the rating scores High (H), Medium (M), and Low (L) in the OFV, with regard to both the request and the resources in the search space. Within the request, rating scores denote the preferences given by the user to the concepts of the WRO used to specify the query, whereas, within the annotation vectors, rating scores represent the levels of quality associated with the concepts when they describe the resources.

Consider an example from the tourism domain, where the user is searching for a vacation package by specifying the following features: InternationalHotel (H), LocalTransportation (M), CulturalActivity (H), and Entertainment (L). On the basis of the given rating scores, the user gives a high preference to resorts that are international hotels offering cultural activities, and lower priority to the remaining features, in particular to entertainment. Analogously, a holiday package annotated with HorseRiding (H), Museum (M), and ThaiMeal (L) is characterized by a higher quality level with regard to the horse riding service than with regard to museum visits or Thai meals at lunch or dinner. Note that a proposal concerning rating scores has previously been given, in which the concepts of the WRO are weighted according to the uniform probabilistic approach rather than the frequency-based one.
Furthermore, in our approach we assume that, given a facility (for instance, HorseRiding) included in a tourist package, the higher the user’s priority for that facility, the higher the expectation about its quality and, therefore, the greater the user’s willingness to consider more expensive solutions.
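The effect of the first direction described above (normalizing by the cardinality of the request rather than by the maximal cardinality of the two vectors) can be illustrated with a minimal sketch. It is not the definition of SemSim or SemSime: the pairwise concept similarity `toy_consim` (1.0 for identical concepts, 0.0 otherwise), the best-match aggregation in `matched_score`, and all function names are assumptions made for illustration, and rating scores are deliberately left out.

```python
from typing import Callable, Set

ConSim = Callable[[str, str], float]


def toy_consim(a: str, b: str) -> float:
    """Hypothetical stand-in for a WRO-based concept similarity."""
    return 1.0 if a == b else 0.0


def matched_score(rv: Set[str], av: Set[str], consim: ConSim) -> float:
    """Sum, over request concepts, of the best similarity with any annotation concept.

    Assumed simplification: a best-match sum, not a one-to-one matching.
    """
    return sum(max((consim(r, a) for a in av), default=0.0) for r in rv)


def sim_max_norm(rv: Set[str], av: Set[str], consim: ConSim = toy_consim) -> float:
    """Normalize by the maximal cardinality of the two compared vectors."""
    denom = max(len(rv), len(av))
    return matched_score(rv, av, consim) / denom if denom else 0.0


def sim_request_norm(rv: Set[str], av: Set[str], consim: ConSim = toy_consim) -> float:
    """Normalize by the cardinality of the request only."""
    return matched_score(rv, av, consim) / len(rv) if rv else 0.0


rv = {"InternationalHotel", "CulturalActivity"}
av = {"InternationalHotel", "CulturalActivity", "HorseRiding", "ThaiMeal"}

print(sim_max_norm(rv, av))      # 0.5: the two extra annotation features lower the score
print(sim_request_norm(rv, av))  # 1.0: every requested feature is covered
```

With the request-based normalization, a resource that satisfies every requested feature is not penalized for offering additional ones, which is the behaviour the first direction aims for.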
In this paper, we evaluate SemSime experimentally in the tourism domain and compare it with the original SemSim method and with a further evolution of SemSim, referred to as SemSimRV. Essentially, SemSimRV is the original SemSim method in which, in line with the first direction adopted in SemSime illustrated above, more priority is given to the features indicated by the user in the request. The results of the experiment show that SemSime outperforms both of these methods.