Vinhoven, L. Integrating Text-Mining into the Curation of Disease Maps. Encyclopedia. Available online: https://encyclopedia.pub/entry/27257 (accessed on 11 January 2026).

Vinhoven, L. (2022, September 16). Integrating Text-Mining into the Curation of Disease Maps. In Encyclopedia. https://encyclopedia.pub/entry/27257

Integrating Text-Mining into the Curation of Disease Maps

The interactive, user-friendly disease map viewer was developed to support the creation of systems medicine models such as disease maps by text mining. It sits at the interface between computational text mining and the manual expert creation of disease maps, and brings together the time-saving advantages of text mining with the accuracy of manual data curation.

Keywords: text mining; disease map; systems biology

1. Introduction

In light of the rapidly increasing data and knowledge on diseases and their underlying biological pathways, it is becoming more and more essential to integrate, store and visualize them. To analyse and interpret the data efficiently, these knowledge representations must be human- as well as machine-readable. For this purpose, approaches such as systems medicine disease maps have been gaining importance in recent years. Disease maps were proposed by Mazein et al. and are defined as "comprehensive, knowledge-based representations of disease mechanisms" ^[1]. They are based on systems biology models and written in the Systems Biology Graphical Notation (SBGN) ^[2], but combine regulatory networks, metabolic pathways and signalling pathways, as well as extensions such as different phenotypes. These disease maps can be used for a range of applications, such as identifying disease biomarkers and drug targets, drug repositioning, structuring omics data, and developing improved diagnostics ^[1]^[3]. The largest disease map to date is the COVID-19 disease map ^[4]. It was created by 130 researchers and consists of 42 diagrams with a total of 5499 elements, connected by 1836 interactions, which were curated from 617 publications and preprints. This highlights the sheer time and manpower required to manually curate these valuable knowledge resources. One way to support the construction of disease maps is text mining. Text mining refers to the automated annotation of human-written texts to extract information and bring it into a human- and machine-readable format, thereby speeding up the curation and annotation process ^[5]. Many information technologies are applicable to this task, for example machine learning, pattern matching, or natural language processing ^[6].

In general, a text mining algorithm follows the steps below.

1. As input, the algorithm takes a human-readable sentence, in this case from a biological paper. It first highlights the named entities (NEs), the terms that are subsequently normalized and transformed into identifiers. NEs can be proteins, genes, diseases, or any other biologically relevant term, and are taken from an underlying database containing the NEs that the system should be able to identify.

2. The entities are assigned unique identifiers, which are then organized into an identifier scheme.

3. The relationships extracted from the input text are added between the named entities.

The resulting network of nodes and relationships can then be compared and expanded with additional text data. With the help of this network, new hypotheses can be formed and these can then be the subject of further research ^[6].
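The three steps above can be illustrated with a deliberately small Python sketch. It is not the algorithm of any particular text mining system: the dictionary lookup, the placeholder identifiers (`EX:P001`, `EX:P002`), the verb list, and the function names are all assumptions made for illustration, and real systems typically use far more sophisticated entity recognition and relation extraction.

```python
import re
from itertools import combinations

# Hypothetical lookup table: surface form -> placeholder identifier.
# A real system would draw these from a curated database.
ENTITY_DICTIONARY = {
    "protein a": "EX:P001",
    "protein b": "EX:P002",
}

# Hypothetical verb list used to label a relationship.
VERB_CATEGORIES = {
    "activates": "activating",
    "inhibits": "inhibiting",
    "binds": "neutral",
}


def find_entities(sentence):
    """Steps 1 and 2: find dictionary terms and map them to identifiers."""
    lowered = sentence.lower()
    found = []
    for term, identifier in ENTITY_DICTIONARY.items():
        match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
        if match:
            found.append((match.start(), identifier))
    # Return identifiers in order of appearance in the sentence.
    return [identifier for _, identifier in sorted(found)]


def extract_relations(sentence):
    """Step 3: link every pair of entities that co-occur in the sentence."""
    lowered = sentence.lower()
    category = "undefined"
    for verb, label in VERB_CATEGORIES.items():
        if re.search(r"\b" + re.escape(verb) + r"\b", lowered):
            category = label
            break
    return [(a, b, category) for a, b in combinations(find_entities(sentence), 2)]


relations = extract_relations("Protein A inhibits protein B.")
```

Each returned tuple holds two identifiers and a category, which corresponds to one edge between two nodes in the resulting network.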

In recent years, great strides have been made in the development of text mining algorithms with high sensitivity and specificity, but they cannot yet replace a human expert curator. Therefore, the researchers developed a tool that combines the speed of text mining with the accuracy of scientists' expert knowledge and experience to support the creation of systems medicine disease maps.

The tool consists of an interactive disease map viewer, which takes the output of text mining algorithms, translates it to the required format, and displays it in a cellular layout similar to a disease map. As the disease map viewer is a stand-alone tool, users can apply the text mining approach they find most suitable for their use case, or even include results from more than one system. The user can then examine the interactions identified by the text mining algorithm and evaluate them against the text passages they are based on. This results in a list of automatically parsed but expert-validated interactions, which can be used as a basis for a disease map. Ultimately, this simplifies and significantly speeds up the curation step during the construction of disease maps.

2. Application

The required input data for the disease map viewer is biological interaction data parsed by a text mining algorithm. The results have to be formatted as two simple, reproducible CSV files, one containing the interactions between the entities and the other specifying the subcellular localization of each biological entity. A flowchart of the input data, software, and output data of the system can be seen in Figure 1.

Figure 1. Flowchart of the processes included in the tool. Input knowledge and data are shown in green on the left, the software modules are shown in yellow, and the output files are shown in blue on the right. Two CSV files, one containing the list of interactions and one containing the subcellular localisation of the entities, serve as input for the CytoscapeJSON parser implemented in Python. The resulting JSON file serves as input for the disease map viewer, where the interactions are validated by expert knowledge. The validated interactions can then be exported in a cellular layout in a JSON file or as a list of interactions in a CSV file.

To prepare text mining results that are easy to store, share, and use, the researchers used a Python script to convert them from the simple CSV files to JSON format. Simply put, the JSON data structure of the text mining results is a list of every element (nodes, compartments, and edges) in the disease map. This SBML-based JSON format is used by the Cytoscape.js library to create the graphical SBGN map. The interface is built around the Cytoscape.js instance that renders and displays the disease map, so that the user can annotate and review the text-mined map conveniently.
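As a rough illustration of such a conversion, the sketch below turns two CSV texts into a flat list of elements in the general shape Cytoscape.js accepts (a `data` object with `id`, plus `source` and `target` for edges and `parent` for nodes inside a compound node). The column names (`source`, `target`, `category`, `pmid`, `entity`, `compartment`), the `compartment` class, and the function name are assumptions for this example; the original parser's actual schema is not described in the text.

```python
import csv
import io
import json


def build_elements(interactions_csv, localisation_csv):
    """Convert two CSV texts into a list of Cytoscape.js-style element dicts."""
    compartments = set()
    nodes = []
    for row in csv.DictReader(io.StringIO(localisation_csv)):
        compartments.add(row["compartment"])
        # Each entity is placed inside its compartment via the "parent" field.
        nodes.append({"data": {"id": row["entity"], "parent": row["compartment"]}})

    # Compartments become compound (parent) nodes and are listed first.
    elements = [
        {"data": {"id": name}, "classes": "compartment"}
        for name in sorted(compartments)
    ]
    elements.extend(nodes)

    for index, row in enumerate(csv.DictReader(io.StringIO(interactions_csv))):
        elements.append({
            "data": {
                "id": f"e{index}",  # hypothetical edge id scheme
                "source": row["source"],
                "target": row["target"],
                "category": row["category"],
                "pmid": row["pmid"],
            }
        })
    return elements


# Hypothetical example input with assumed column names.
interactions = "source,target,category,pmid\nA,B,activating,123\n"
localisation = "entity,compartment\nA,cytoplasm\nB,nucleus\n"
elements = build_elements(interactions, localisation)
json_text = json.dumps(elements, indent=2)
```

Keeping nodes, compartments, and edges in one flat list mirrors the description above of the JSON structure as a list of every element in the map.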

Figure 2 shows the interface with exemplary data. The main graph is shown in a cell-like layout, where the user can zoom in and out. The rectangular nodes represent the molecular entities and are localized in the subcellular compartment specified in the JSON file. The arrow-shaped edges represent molecular interactions between them. All entities (genes/proteins and compartments), as well as their respective edges, can be moved freely by dragging to improve structure and visibility to fit the user’s needs.

Figure 2. Interface of the disease map viewer. The large window in the middle shows the text mining data as a coarse disease map in a cellular layout. The left sidebar shows the legend and filter options, and the right sidebar shows the review function, where the supporting sentences from the parsed publications are displayed and the user can validate or reject an interaction. The buttons on the bottom left show the timeline option, where the interaction data can be filtered by date of publication.

The colour of an edge reflects the category of the verbs found for that interaction. All “activating” edges are coloured green, “inhibiting” edges are coloured red, “neutral” edges are coloured blue, and “undefined” edges are coloured grey, while incoherent interactions are shown in brown.
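The colour scheme can be written down as a simple lookup, sketched here in Python. The colour assignments come from the text; how an interaction is judged "incoherent" is not specified, so this sketch assumes it means that the supporting sentences disagree on the category, and that "undefined" sentences are ignored when a defined category exists. Both rules and the function names are assumptions.

```python
# Colour per category, as listed in the text.
EDGE_COLOURS = {
    "activating": "green",
    "inhibiting": "red",
    "neutral": "blue",
    "undefined": "grey",
    "incoherent": "brown",
}


def edge_category(sentence_categories):
    """Collapse the per-sentence verb categories of one edge into one category.

    Assumption: conflicting defined categories make the edge "incoherent".
    """
    defined = {c for c in sentence_categories if c != "undefined"}
    if not defined:
        return "undefined"
    if len(defined) == 1:
        return next(iter(defined))
    return "incoherent"


def edge_colour(sentence_categories):
    """Return the display colour for an edge."""
    return EDGE_COLOURS[edge_category(sentence_categories)]
```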

The left sidebar shows the legend and filter options for the edges in the graph. As a default, all edges are displayed, but the user can uncheck types of edges to hide them and thus obtain a better overview of the remaining categories of edges. This legend can be opened and closed by clicking the top button “hide/show filter”.

Edge thickness provides a second way of categorizing the text mining data. The more distinct publications mention both connected nodes in the same sentence, the thicker the edge between them. In the bottom-left corner of the filter window, the user can filter the edges by the number of supporting publications: a slider defines the minimum number of publications an edge needs in order to be displayed. Below the slider is a button that resets the filter and reloads the map.
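A minimal Python sketch of this behaviour follows. The linear width scaling, its parameters (`base`, `step`, `max_width`), the `pmids` field, and the function names are assumptions chosen for illustration; the viewer's actual scaling is not given in the text.

```python
def count_publications(edge):
    """Number of distinct publications supporting an edge."""
    return len(set(edge["pmids"]))


def edge_width(edge, base=1.0, step=0.5, max_width=8.0):
    """Grow the edge width with each additional publication, up to a cap."""
    return min(base + step * (count_publications(edge) - 1), max_width)


def filter_edges(edges, min_publications):
    """Keep only edges supported by at least min_publications publications."""
    return [e for e in edges if count_publications(e) >= min_publications]


# Hypothetical example edges; the PubMed IDs are placeholders.
edges = [
    {"id": "e0", "pmids": ["111", "111", "222"]},
    {"id": "e1", "pmids": ["333"]},
]
visible = filter_edges(edges, min_publications=2)
```

Counting distinct IDs rather than sentences means that several supporting sentences from the same publication do not inflate the thickness of an edge.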

To integrate expert knowledge and validate the text-mined data, the researchers included a review function, shown in the right-hand panel of the interface. The user can examine the interactions in two ways: by clicking the “Next edge” button to iterate through all interactions that still need to be reviewed, or by directly selecting a specific edge in the graph. The review panel then displays the two nodes connected by the selected edge and the colour of the edge between them, as well as the current review status of the interaction. Below this, a list of PubMed IDs is displayed together with the sentences that were used to identify the interaction in each reference. The verbs that were used to categorize the interaction are coloured in red. The user can load the entire text to obtain more context for a sentence and then, with all available data at hand, assign a status to the interaction. If the expert approves the text-mined interaction, the “accept” status can be selected. If the text-mined interaction is a false positive, the “decline” status is appropriate, and if more research is needed before the interaction can be approved, the “further inspection needed” status can be assigned.
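The review logic can be sketched in a few lines of Python. The three status labels are taken from the text; the data layout (a `status` key that is absent or `None` until an edge is reviewed), the wrap-around behaviour of the "Next edge" step, and the function names are assumptions.

```python
# Status labels as named in the text.
REVIEW_STATUSES = ("accept", "decline", "further inspection needed")


def next_unreviewed(edges, start=0):
    """Return the index of the next edge without a status, or None if all are reviewed.

    The search begins at `start` and wraps around to the beginning of the list.
    """
    total = len(edges)
    for offset in range(total):
        index = (start + offset) % total
        if edges[index].get("status") is None:
            return index
    return None


def set_status(edge, status):
    """Assign one of the allowed review statuses to an edge."""
    if status not in REVIEW_STATUSES:
        raise ValueError(f"unknown review status: {status!r}")
    edge["status"] = status


# Hypothetical example edges.
edges = [
    {"source": "A", "target": "B"},
    {"source": "B", "target": "C"},
]
set_status(edges[0], "accept")
pending = next_unreviewed(edges)
```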

To record the status of the review process, the data can be downloaded in two formats: as a CSV file listing all interactions, their current review status, and the PubMed IDs from which each interaction was text mined, or as a JSON file containing the entire disease map in a JSON object, which can be saved for reloading in a later session or shared with other users.
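The two export formats might look roughly like the following Python sketch. The CSV column names, the `;` separator for multiple PubMed IDs, the `unreviewed` placeholder, and the top-level `elements` key in the JSON object are assumptions, since the text does not specify the exact file layout.

```python
import csv
import io
import json


def export_csv(edges):
    """Serialize interactions, review status, and PubMed IDs as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", "status", "pmids"])
    for edge in edges:
        writer.writerow([
            edge["source"],
            edge["target"],
            edge.get("status") or "unreviewed",  # hypothetical placeholder
            ";".join(edge["pmids"]),
        ])
    return buffer.getvalue()


def export_json(elements):
    """Serialize the whole map as one JSON object for reloading or sharing."""
    return json.dumps({"elements": elements}, indent=2)


# Hypothetical example data; the PubMed IDs are placeholders.
edges = [{"source": "A", "target": "B", "status": "accept", "pmids": ["111", "222"]}]
csv_text = export_csv(edges)
json_text = export_json([{"data": {"id": "A"}}])
```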

References

1. Alexander Mazein; Marek Ostaszewski; Inna Kuperstein; Steven Watterson; Nicolas Le Novère; Diane Lefaudeux; Bertrand De Meulder; Johann Pellet; Irina Balaur; Mansoor Saqi; et al. Systems medicine disease maps: community-driven comprehensive representation of disease mechanisms. npj Systems Biology and Applications 2018, 4, 1-10, 10.1038/s41540-018-0059-y.
2. Nicolas Le Novère; Michael Hucka; Huaiyu Mi; Stuart Moodie; Falk Schreiber; Anatoly Sorokin; Emek Demir; Katja Wegner; Mirit I Aladjem; Sarala Wimalaratne; et al. The Systems Biology Graphical Notation. Nature Biotechnology 2009, 27, 735-741, 10.1038/nbt.1558.
3. Marek Ostaszewski; Stephan Gebel; Inna Kuperstein; Alexander Mazein; Andrei Zinovyev; Ugur Dogrusoz; Jan Hasenauer; Ronan M T Fleming; Nicolas Le Novère; Piotr Gawron; et al. Community-driven roadmap for integrated disease maps. Briefings in Bioinformatics 2018, 20, 659-670, 10.1093/bib/bby024.
4. Marek Ostaszewski; Anna Niarakis; Alexander Mazein; Inna Kuperstein; Robert Phair; Aurelio Orta-Resendiz; Vidisha Singh; Sara Sadat Aghamiri; Marcio Luis Acencio; Enrico Glaab; et al. COVID‐19 Disease Map, a computational knowledge repository of virus‐host interaction mechanisms. Molecular Systems Biology 2021, 17, e10851, 10.15252/msb.202110851.
5. Nathan Harmston; Wendy Filsell; Michael P H Stumpf. What the papers say: Text mining for genomics and systems biology. Human Genomics 2010, 5, 17, 10.1186/1479-7364-5-1-17.
6. Fei Zhu; Preecha Patumcharoenpol; Cheng Zhang; Yang Yang; Jonathan Chan; Asawin Meechai; Wanwipa Vongsangnak; Bairong Shen. Biomedical text mining and its applications in cancer research. Journal of Biomedical Informatics 2013, 46, 200-211, 10.1016/j.jbi.2012.10.007.

©Text is available under the terms and conditions of the Creative Commons-Attribution ShareAlike (CC BY-SA) license; additional terms may apply. By using this site, you agree to the Terms and Conditions and Privacy Policy.

Information

Subjects: Others

Contributor: Liza Vinhoven

Update Date: 20 Sep 2022
