Knowledge Extraction for Health Management from Online Communities



Subjects: Health Care Sciences & Services

Contributors: Yanli Zhang, Xinmiao Li, Yu Yang, Tao Wang

Knowledge extraction from rich text in online health communities can supplement and improve existing knowledge bases, supporting evidence-based medicine and clinical decision making. The extracted time-series health management data can also help users with similar conditions manage their health.

online post
online health communities
knowledge recovery

1. Introduction

The incidence of chronic diseases is increasing rapidly worldwide [1]. Chronic diseases such as hypertension, stroke, diabetes, and coronary heart disease seriously endanger human health. According to the World Health Organization, patients with diabetes, hypertension, and cardiovascular problems rely on functional medications every day [1]. Therefore, patients and doctors must be aware of the effects of commonly used drugs [2]. Disease management is a crucial aspect of health management [3], and determining drug effects is the primary concern of drug management [4]. The phrase “diseases, drugs, and drug effects (DDEs)” refers to the obvious improvement or other effects observed when drugs are used to treat diseases in different individuals [5]. DDEs therefore reflect the effects experienced by different individuals after taking medicine, and depending on these effects, a patient can consult a physician about continuing, stopping, replacing, or adjusting the treatment scheme. In general, drug effects corresponding to various diseases are obtained via clinical diagnoses; however, these effects vary among people, and those identified in a limited number of clinical cases are far from sufficient. Online health communities (OHCs) provide a convenient and fast communication channel in which users can share symptoms and drug effects after taking medication [6]. For example, users can follow the drug effects experienced by others over time; this gives users with similar conditions a better understanding of drug effects for various diseases [2,7,8] and helps them avoid adverse effects [9,10,11]. Although this user-generated content (UGC) in OHCs is timely and effective, a large amount of it remains unused for disease management.

Scholars have mostly studied the extraction of relationships among diseases, symptoms, and tests [12], and the relationship among diseases, drugs, and efficacy has also attracted the attention of many scholars [13,14,15]. Extracting the relationships among DDEs from the UGC of OHCs can generate a large amount of information on disease medication, which can be used to establish, supplement, and improve the existing knowledge base of DDEs and can assist clinical decision making [15]. Knowledge extraction is also a key step in establishing a medical knowledge graph [16]. At the same time, the drug effects shared by these large user groups provide support for evidence-based medicine [17]. However, research on extracting the relationships among DDEs from unstructured texts in OHCs, and on then building a knowledge base of disease medication effects from that UGC, is scarce.
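As a rough illustration of what a knowledge base of DDEs could look like once triples have been extracted, the sketch below groups (disease, drug, effect) records and counts how often each effect is reported. The `DDETriple` class, the `build_knowledge_base` function, and all records are hypothetical and invented for this example; they are not the data model used in the cited studies.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DDETriple:
    """One hypothetical (disease, drug, effect) record extracted from a post."""
    disease: str
    drug: str
    effect: str


def build_knowledge_base(triples):
    """Group triples by (disease, drug) and count reports of each effect."""
    kb = defaultdict(Counter)
    for t in triples:
        kb[(t.disease, t.drug)][t.effect] += 1
    return kb


# Invented placeholder records, not real extraction results.
triples = [
    DDETriple("hypertension", "drug_a", "blood pressure stabilized"),
    DDETriple("hypertension", "drug_a", "blood pressure stabilized"),
    DDETriple("hypertension", "drug_a", "dizziness"),
    DDETriple("diabetes", "drug_b", "blood sugar lowered"),
]

kb = build_knowledge_base(triples)

# List the effects reported for one disease-drug pair, most frequent first.
for effect, count in kb[("hypertension", "drug_a")].most_common():
    print(effect, count)
```

Counting repeated reports is one simple way to let frequently shared effects stand out from one-off mentions; a real knowledge base would also need entity normalization and provenance, which this sketch omits.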

2. Research on Knowledge Extraction from OHCs

OHCs have many users, and their user-generated content is now rich enough for knowledge extraction. For example, MedDRA was used to extract knowledge of drugs and drug effects from a Spanish drug effect database [15]. There is also a rule-based approach for extracting knowledge on dietary recommendations from open data [21]. In addition, disease- and drug-related knowledge extraction in online health communities plays a significant role in the field of relationship extraction. For example, adverse drug events (ADEs) were extracted from the user-generated content of OHCs [8,9,10], and new indications for drugs outside the drug label were found in online communities [2,7].

3. Research on Medical Text Knowledge Extraction

Initially, scholars mostly used pattern matching and machine learning methods to extract knowledge related to diseases and drugs from medical texts. Pattern matching methods analyze syntactic structure and carry out relation extraction with expert-defined rules; they generally have a low recall rate. Iqbal et al. used rules to extract the relationship between drugs and side effects from electronic medical records [20]. In the I2B2/VA challenge task, the relationships between medical concepts in patients’ clinical records were extracted; specifically, three types of medical relationships were extracted using machine learning [22]. In addition, support vector machine and kernel function methods are widely used for relationship extraction in the biomedical field. One study used support vector machines with rich features to extract knowledge about chemical substances and diseases from research articles in PubMed [23]. Other studies employed multiple machine learning algorithms to extract knowledge about cures, preventions, and side effects from clinical records [24] and to extract relationships related to patients’ medical problems (disease, examination, and treatment) from discharge summaries [19]. Machine learning has been widely used in the knowledge extraction of adverse drug events (ADEs) [9,10,11], new indications for off-label drug use [2], drug–drug interactions [14], and the relationship between chemical substances and disease [23]. Most of the relation extraction corpora in the above studies come from relatively structured texts (for instance, electronic medical records). However, few studies have extracted the relationships among DDEs from large volumes of colloquial and unstructured text.
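To make the low recall of rule-based extraction concrete, the following minimal sketch applies two hand-written patterns to short sentences. The patterns, the `extract_drug_effects` function, the placeholder drug names, and the sentences are all invented for illustration and are not the rules used in the cited studies.

```python
import re

# Hypothetical expert-defined patterns; real rule sets are far larger.
PATTERNS = [
    # "<drug> caused <effect>" or "<drug> gave me <effect>"
    re.compile(r"(?P<drug>\w+) (?:caused|gave me) (?P<effect>[\w ]+)",
               re.IGNORECASE),
    # "had (a|an) <effect> after taking <drug>"
    re.compile(r"had (?:a |an )?(?P<effect>[\w ]+?) after taking (?P<drug>\w+)",
               re.IGNORECASE),
]


def extract_drug_effects(sentence):
    """Return (drug, effect) pairs matched by any of the patterns."""
    pairs = []
    for pattern in PATTERNS:
        match = pattern.search(sentence)
        if match:
            drug = match.group("drug").lower()
            effect = match.group("effect").strip().lower()
            pairs.append((drug, effect))
    return pairs


# Invented sentences with placeholder drug names.
sentences = [
    "DrugA caused nausea",
    "I had a headache after taking DrugB",
    "DrugB, ugh, my head was pounding all night",  # colloquial phrasing
]
for s in sentences:
    print(s, "->", extract_drug_effects(s))
```

The first two sentences fit the templates and yield pairs, while the third expresses a similar experience in colloquial wording and yields nothing. This is the recall problem in miniature: every new phrasing needs another rule.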

4. Research on Medical Knowledge Extraction Based on Deep Learning

Artificial intelligence has been extensively used in industry and in medical practice. In relation extraction tasks, traditional machine learning methods require specific domain knowledge and hand-crafted features, which demands considerable manpower. Deep learning, a newer type of artificial intelligence technology, has developed to the point of influencing many fields and industries [25]. In various information extraction tasks, it uses multiple layers of interconnected nodes to build a multi-relation classification model over the input data, extracting features automatically and thus saving a tremendous amount of manual work. Deep learning has been widely used in processing health information. One study used a GRU model to extract bacteria-related knowledge from the biomedical academic literature [26]. Luo et al. achieved good results in evaluating the relationships among multiple medical problems on the i2b2/VA relationship classification challenge dataset [27]. Yadav et al. used an efficient multi-task deep learning framework to classify relationships involving drug–drug interactions, protein–protein interactions, and medical concepts [28]. Deep learning is also widely used in medical practice, for example in nodule detection [29] and in medical labeling and scanning [30]. Moreover, it is widely used in the knowledge extraction of adverse drug events (ADEs) [13], drug–drug interactions [31], and the therapeutic effect of drugs on diseases [32,33,34]. Health informatics and natural language processing have become important application fields for deep learning technology [35]. Training on word correlations converts words into word vector representations, and BERT word vectors and word translation representations are important supporting technologies for deep learning.
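The idea of turning words into vectors based on how they co-occur can be sketched with simple counts. The example below is a count-based stand-in, not BERT or any trained embedding model; the function names, the window size, and the sentences are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words seen within `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented toy corpus.
sentences = [
    "the pill eased my headache",
    "the tablet eased my headache",
    "my doctor changed the dose",
]
vecs = cooccurrence_vectors(sentences)

# Words used in the same contexts end up with similar vectors.
print(cosine(vecs["pill"], vecs["tablet"]))
print(cosine(vecs["pill"], vecs["dose"]))
```

Because "pill" and "tablet" appear in identical contexts here, their vectors are nearly identical, while "dose" shares only one context word with "pill". Learned embeddings follow the same intuition but compress it into dense vectors that a neural model can consume.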

However, current research on extracting medical knowledge primarily focuses on relatively structured, small-sample data, such as medical literature summaries. These corpora are relatively small in scale, so the knowledge obtained from them by relation extraction is limited. In addition, these research methods do not perform well on large corpora of colloquial and unstructured texts. There is little research on how to apply deep learning to large samples of colloquial, unstructured data to extract the relationships among DDEs, even though billions of users in OHCs have generated exceptional amounts of data. By using these large amounts of data, we can obtain valuable knowledge that helps improve the existing knowledge base and thereby supports clinical decision making.

This entry is adapted from the peer-reviewed paper 10.3390/ijerph192416590

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.