Knowledge Extraction for Health Management from Online Communities


Contributors: Yanli Zhang, Xinmiao Li, Yu Yang, Tao Wang

Knowledge extraction from rich text in online health communities can supplement and improve the existing knowledge base, supporting evidence-based medicine and clinical decision making. The extracted time-series health management data can also help users with similar conditions manage their own health.

online post
online health communities
knowledge recovery

1. Introduction

The incidence of chronic diseases is rapidly increasing worldwide ^[1]. Chronic diseases such as hypertension, stroke, diabetes, and coronary heart disease seriously endanger human health. According to the World Health Organization, people with diabetes, hypertension, and cardiovascular problems rely on medication every day ^[1]. Therefore, patients and doctors must be aware of the effects of commonly used drugs ^[2]. Disease management is a crucial aspect of health management ^[3], and determining drug effects is the primary concern of drug management ^[4]. The phrase “diseases, drugs, and drug effects (DDEs)” refers to the improvement or other effects observed when drugs are used to treat diseases in different individuals ^[5]. DDEs therefore reflect how individual patients respond to a medicine, and a patient can consult a physician about continuing, stopping, replacing, or adjusting the treatment scheme depending on the drug’s effects.

In general, drug effects for various diseases are established through clinical diagnoses; however, these effects vary from person to person, and those identified in a limited number of clinical cases are far from sufficient. Online health communities (OHCs) provide a convenient and fast communication channel in which users can post reviews of symptoms and drug effects after taking medication ^[6]. For example, users can learn about the drug effects experienced by others over time; this gives users with similar conditions a better understanding of drug effects for various diseases ^[2]^[7]^[8] and helps them avoid adverse effects ^[9]^[10]^[11]. Although this user-generated content (UGC) in OHCs is timely and effective, a large amount of it remains unused for disease management.

Scholars have mostly studied the extraction of relationships among diseases, symptoms, and tests ^[12], while the relationships among diseases, drugs, and efficacy have also attracted the attention of many scholars ^[13]^[14]^[15]. Extracting the relationships among DDEs from the UGC of OHCs can generate a large amount of information about disease medication, which can be used to establish, supplement, and improve the existing knowledge base of DDEs and can assist clinical decision making ^[15]. Knowledge extraction is also a key step in establishing a medical knowledge graph ^[16]. At the same time, the drug effects shared by these large user groups provide support for evidence-based medicine ^[17]. However, research on extracting the relationships among DDEs from unstructured texts in OHCs, and then establishing a knowledge base of disease medication effects from that UGC, is scarce.
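
To make the idea of a DDE knowledge base concrete, the following is a minimal sketch of how extracted (disease, drug, effect) triples might be aggregated. It is an illustration only, not the method of any cited study: the `DDE` class, the `build_knowledge_base` function, and the placeholder names `DrugA` and `DrugB` are assumptions introduced here.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DDE:
    """One hypothetical disease-drug-effect triple extracted from a post."""
    disease: str
    drug: str
    effect: str


def build_knowledge_base(triples):
    """Group triples by (disease, drug) and count how often each effect is reported."""
    kb = defaultdict(Counter)
    for t in triples:
        # Lowercase so that spelling variants in casing collapse to one key.
        kb[(t.disease.lower(), t.drug.lower())][t.effect.lower()] += 1
    return kb


# Made-up triples, as if they had already been extracted from community posts.
extracted = [
    DDE("hypertension", "DrugA", "blood pressure dropped"),
    DDE("Hypertension", "druga", "blood pressure dropped"),
    DDE("hypertension", "DrugA", "dizziness"),
    DDE("diabetes", "DrugB", "no change"),
]

kb = build_knowledge_base(extracted)
for (disease, drug), effects in sorted(kb.items()):
    print(disease, drug, effects.most_common())
```

Counting repeated reports per (disease, drug) pair is one simple way to let frequently shared experiences stand out; a real knowledge base would also need entity normalization and provenance for each post.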

2. Research on Knowledge Extraction from OHCs

OHCs have numerous users, and user-generated content is now rich enough for knowledge extraction. For example, MedDRA was used to extract knowledge of drugs and drug effects from a Spanish drug effect database ^[15]. A rule-based approach has also been used to extract knowledge on dietary recommendations from open data ^[18]. In addition, disease- and drug-related knowledge extraction from online health communities is an important topic in relation extraction. For example, adverse drug events (ADEs) were extracted from the user-generated content of OHCs ^[8]^[9]^[10], and new indications for drugs beyond those on the drug label were found in online communities ^[2]^[7].

3. Research on Medical Text Knowledge Extraction

Initially, scholars mostly used pattern matching and machine learning methods to extract knowledge related to diseases and drugs from medical texts. Pattern matching relies on syntactic structure analysis, with relation extraction carried out by expert-defined rules; these methods generally have a low recall rate. Iqbal et al. used rules to extract the relationship between drugs and side effects from electronic medical records ^[19]. In the i2b2/VA challenge task, relationships between medical concepts in patients’ clinical records were extracted; specifically, three types of medical relationships were extracted using machine learning ^[20]. In addition, support vector machines and kernel function methods are widely used for relation extraction in the biomedical field. One study used support vector machines with rich features to extract relationships between chemical substances and diseases from research articles in PubMed ^[21]. Other studies employed multiple machine learning algorithms to extract relationships among cures, preventions, and side effects from clinical records ^[22] and to extract relationships related to patients’ medical problems (disease, examination, and treatment) from discharge summaries ^[23]. Machine learning has been widely used in the knowledge extraction of adverse drug events (ADEs) ^[9]^[10]^[11], new indications for drugs ^[2], drug–drug interactions ^[14], and the relationship between chemical substances and diseases ^[21]. Most of the relation extraction corpora in the above studies come from relatively structured texts (for instance, electronic medical records). However, few studies have extracted the relationships among DDEs from large volumes of colloquial and unstructured text.
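
The low recall of rule-based extraction on colloquial text can be illustrated with a small sketch. This is a toy example under stated assumptions, not the pipeline of any cited study: the lexicons `DRUGS` and `EFFECTS`, the trigger phrases, and the function `extract_pairs` are all hypothetical.

```python
import re

# Hypothetical lexicons; a real system would draw on a curated vocabulary.
DRUGS = ["druga", "drugb"]
EFFECTS = ["headache", "dizziness", "nausea"]

# One expert-style rule: "<drug> causes/caused/gave me [a|an] <effect>".
PATTERN = re.compile(
    r"\b(?P<drug>%s)\b\s+(?:causes|caused|gave me)\s+(?:a\s+|an\s+)?(?P<effect>%s)\b"
    % ("|".join(DRUGS), "|".join(EFFECTS)),
    re.IGNORECASE,
)


def extract_pairs(sentence):
    """Return (drug, effect) pairs matched by the single hand-written rule."""
    return [
        (m.group("drug").lower(), m.group("effect").lower())
        for m in PATTERN.finditer(sentence)
    ]


posts = [
    "DrugA caused dizziness after two days.",               # fits the rule
    "Been on DrugB for a week and my head is killing me.",  # colloquial, missed
]
for post in posts:
    print(extract_pairs(post))
```

The first post yields `[('druga', 'dizziness')]`, while the second yields `[]` even though it describes a headache: the rule is precise but misses any phrasing its author did not anticipate, which is the recall problem noted above.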

4. Research on Medical Knowledge Extraction Based on Deep Learning

Artificial intelligence has been extensively used in industry and in medical practice. In relation extraction tasks, traditional machine learning methods require domain-specific knowledge and hand-crafted features, which demand considerable manual effort. Deep learning has developed rapidly and has influenced many fields and industries ^[24]. In information extraction tasks, it uses multiple layers of interconnected nodes to build a multi-relation classification model over the input data and learns features automatically, thus saving a tremendous amount of manual work. Deep learning has been widely used in processing health information. One study used a GRU model to extract bacteria-related knowledge from the biomedical literature ^[25]. Luo et al. achieved good results on the i2b2/VA relationship classification challenge dataset when classifying the relationships among multiple medical problems ^[26]. Yadav et al. used an efficient multi-task deep learning framework to classify drug–drug interactions, protein–protein interactions, and relationships between medical concepts ^[27]. Deep learning is also widely used in medical practice, in tasks such as nodule detection ^[28], medical labeling, and scanning ^[29]. Moreover, it is widely used in the knowledge extraction of adverse drug events (ADEs) ^[13], drug–drug interactions ^[30], and the therapeutic effect of drugs on diseases ^[31]^[32]^[33]. Health informatics and natural language processing have become important application fields for deep learning ^[34]. Words are converted into word vector representations through training on word correlations, and BERT word vectors and other word representations are important supporting technologies for deep learning.
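
As a rough illustration of how a GRU turns a sequence of word vectors into a single learned feature vector, the sketch below implements one common formulation of the GRU update in plain Python. It is not the model of any cited study: the helper names (`gru_step`, `encode`, `make_params`), the constant toy weights, and the two-dimensional "word vectors" are assumptions made for this example, and a real system would use trained weights from a deep learning library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(x, h_prev, p):
    """One GRU update; p holds hypothetical weights W* (input), U* (hidden), b* (bias)."""
    def gate(name, hidden, activation):
        pre = zip(matvec(p["W" + name], x), matvec(p["U" + name], hidden), p["b" + name])
        return [activation(a + b + c) for a, b, c in pre]

    z = gate("z", h_prev, sigmoid)                 # update gate
    r = gate("r", h_prev, sigmoid)                 # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = gate("h", gated, math.tanh)             # candidate state
    # Convention used here: h_t = (1 - z) * h_{t-1} + z * candidate.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def encode(vectors, p, hidden_size):
    """Run the GRU over a sequence of word vectors and return the final state."""
    h = [0.0] * hidden_size
    for x in vectors:
        h = gru_step(x, h, p)
    return h


def make_params(input_size, hidden_size, value):
    """Build toy parameters filled with one constant (stand-in for trained weights)."""
    p = {}
    for name in ("z", "r", "h"):
        p["W" + name] = [[value] * input_size for _ in range(hidden_size)]
        p["U" + name] = [[value] * hidden_size for _ in range(hidden_size)]
        p["b" + name] = [0.0] * hidden_size
    return p


# Two made-up 2-dimensional word vectors standing in for an embedded sentence.
sentence = [[0.1, 0.3], [0.2, -0.1]]
params = make_params(input_size=2, hidden_size=3, value=0.1)
state = encode(sentence, params, hidden_size=3)
print(state)
```

The final state is the kind of automatically derived feature vector that a downstream classifier could use to predict a relation label, in place of hand-crafted features.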

However, current research on extracting medical knowledge primarily focuses on relatively structured, small-sample data, such as abstracts of medical literature. These corpora are relatively small in scale, so relation extraction yields relatively limited knowledge. In addition, these methods do not perform well on large corpora of colloquial and unstructured text. There is little research on how to apply deep learning to large samples of colloquial and unstructured data to extract the relationships among DDEs, even though billions of users in OHCs have generated exceptional amounts of data. By using these data, people can obtain valuable knowledge that can help improve the existing knowledge base, thereby supporting clinical decision making.

This entry is adapted from the peer-reviewed paper 10.3390/ijerph192416590

References

Bardhan, I.; Chen, H.; Karahanna, E. Connecting systems, data, and people: A multidisciplinary research roadmap for chronic disease management. MIS Q. 2020, 44, 185–200.
Rastegar-Mojarad, M.; Liu, H.; Nambisan, P. Using social media data to identify potential candidates for drug repurposing: A feasibility study. JMIR Res. Protoc. 2016, 5, e5621.
Zhang, T.; Wang, K.; Li, N.; Hurr, C.; Luo, J. The Relationship between Different Amounts of Physical Exercise, Internal Inhibition, and Drug Craving in Individuals with Substance-Use Disorders. Int. J. Environ. Res. Public Health 2021, 18, 12436.
Lin, C.C.; Hwang, S.J. Patient-centered self-management in patients with chronic kidney disease: Challenges and implications. Int. J. Environ. Res. Public Health 2020, 17, 9443.
Mehta, D.; Jackson, R.; Paul, G.; Shi, J.; Sabbagh, M. Why do trials for Alzheimer’s disease drugs keep failing? A discontinued drug perspective for 2010–2015. Expert Opin. Investig. Drugs 2017, 26, 735–739.
Wang, L.; Alexander, C.A. Big data analytics in medical engineering and healthcare: Methods, advances and challenges. J. Med. Eng. Technol. 2020, 44, 267–283.
Zhao, M.N. Off-Label Drug Use Detection Based on Heterogeneous Network Mining. In Proceedings of the IEEE International Conference on Healthcare Informatics (ICHI), Park City, UT, USA, 23–26 August 2017; p. 331.
Nguyen, K.A.; Mimouni, Y.; Jaberi, E.; Paret, N.; Boussaha, I.; Vial, T.; Jacqz-Aigrain, E.; Alberti, C.; Guittard, L.; Remontet, L.; et al. Relationship between adverse drug reactions and unlicensed/off-label drug use in hospitalized children (EREMI): A study protocol. Therapies 2021, 76, 675–685.
Antipov, E.A.; Pokryshevskaya, E.B. The Effects of Adverse Drug Reactions on Patients’ Satisfaction: Evidence From Publicly Available Data on Tamiflu (Oseltamivir). Int. J. Med. Inf. 2019, 125, 30–36.
Swathi, D.N. Predicting Drug Side-Effects From Open Source Health Forums Using Supervised Classifier Approach. In Proceedings of the 5th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 10–12 June 2020; pp. 796–800.
Kang, K.; Tian, S.; Yu, L. Drug Adverse Reaction Discovery Based on Attention Mechanism and Fusion of Emotional Information. Autom. Control. Comput. Sci. 2020, 54, 391–402.
Zhang, Y.L.; Li, X.M.; Zhang, Z. Disease-Pertinent Knowledge Extraction in Online Health Communities Using GRU Based on a Double Attention Mechanism. IEEE Access 2020, 8, 95947–95955.
Fan, B.; Fan, W.; Smith, C.; Garner, H. Adverse Drug Event Detection and Extraction from Open Data: A Deep Learning Approach. Inf. Process. Manag. 2020, 57, 102131.
Zheng, W.; Lin, H.F.; Zhao, Z.H.; Xu, B.; Zhang, Y.; Yang, Z.; Wang, J. A Graph Kernel Based on Context Vectors for Extracting Drug–Drug Interactions. J. Biomed. Inf. 2016, 61, 34–43.
Martínez, P.; Martínez, J.L.; Segura-Bedmar, I.; Moreno-Schneider, J.; Luna, A.; Revert, R. Turning User Generated Health-Related Content Into Actionable Knowledge Through Text Analytics Services. Comput. Ind. 2016, 78, 43–56.
Yu, T.; Li, J.H.; Yu, Q.; Tian, Y.; Shun, X.; Xu, L.; Zhu, L.; Gao, H. Knowledge Graph for TCM Health Preservation: Design, Construction, and Applications. Artif. Intell. Med. 2017, 77, 48–52.
Anastopoulos, I.N.; Herczeg, C.K.; Davis, K.N.; Dixit, A.C. Multi-drug Featurization and Deep Learning Improve Patient-Specific Predictions of Adverse Events. Int. J. Environ. Res. Public Health 2021, 18, 2600.
Eftimov, T.; Koroušić Seljak, B.; Korošec, P. A Rule-Based Named-Entity Recognition Method for Knowledge Extraction of Evidence-Based Dietary Recommendations. PLoS ONE 2017, 12, e0179488.
Iqbal, E.; Mallah, R.; Rhodes, D.; Wu, H.; Romero, A.; Chang, N.; Dzahini, O.; Pandey, C.; Broadbent, M.; Stewart, R.; et al. ADEPt, a Semantically Enriched Pipeline for Extracting Adverse Drug Events From Free-Text Electronic Health Records. PLoS ONE 2017, 12, e0187121.
Kholghi, M.; Sitbon, L.; Zuccon, G.; Nguyen, A. Active learning: A step towards automating medical concept extraction. J. Am. Med. Inform. Assoc. 2016, 23, 289–296.
Peng, Y.F.; Wei, C.H.; Lu, Z.Y. Improving Chemical Disease Relation Extraction With Rich Features and Weakly Labeled Data. J. Cheminform. 2016, 8, 53.
Mahendran, D.; McInnes, B.T. Extracting adverse drug events from clinical notes. AMIA Summits Transl. Sci. Proc. 2021, 2021, 420–429.
Lv, X.; Guan, Y.; Yang, J.; Wu, J. Clinical relation extraction with deep learning. Int. J. Hybrid Inf. Technol. 2016, 9, 237–248.
LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
Li, L.S.; Wan, J.; Zheng, J.Q.; Wang, J. Biomedical Event Extraction Based on GRU Integrating Attention Mechanism. BMC Bioinform. 2018, 19, 285.
Luo, Y.; Cheng, Y.; Uzuner, Ö.; Szolovits, P.; Starren, J. Segment convolutional neural networks (Seg-CNNs) for classifying relations in clinical notes. J. Am. Med. Inform. Assoc. 2018, 25, 93–98.
Yadav, S.; Ramesh, S.; Saha, S.; Ekbal, A. Relation extraction from biomedical and clinical text: Unified multitask learning framework. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 19, 1105–1116.
Gruetzemacher, R.; Gupta, A.; Paradice, D. 3D Deep Learning for Detecting Pulmonary Nodules in CT Scans. J. Am. Med. Inform. Assoc. 2018, 25, 1301–1310.
Xiao, C.; Choi, E.; Sun, J. Opportunities and Challenges in Developing Deep Learning Models Using Electronic Health Records Data: A Systematic Review. J. Am. Med. Inform. Assoc. 2018, 25, 1419–1428.
Jimenez, C.; Molina, M.; Montenegro, C. Deep Learning—Based Models for Drug-Drug Interactions Extraction in the Current Biomedical Literature. In Proceedings of the International Conference on Information Systems and Software Technologies (ICI2ST), Quito, Ecuador, 13–15 November 2019; pp. 174–181.
Dua, M.; Makhija, D.; Manasa, P.Y.L.; Mishra, P. A CNN–RNN–LSTM Based Amalgamation for Alzheimer’s Disease Detection. J. Med. Biol. Eng. 2020, 40, 688–706.
Zeng, X.; Song, X.; Ma, T.; Pan, X.; Zhou, Y.; Hou, Y.; Zhang, Z.; Li, K.; Karypis, G.; Cheng, F. Repurpose Open Data to Discover Therapeutics for COVID-19 Using Deep Learning. J. Proteome Res. 2020, 19, 4624–4636.
Watts, J.; Khojandi, A.; Vasudevan, R.; Ramdhani, R. Optimizing Individualized Treatment Planning for Parkinson’s Disease Using Deep Reinforcement Learning. In Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 20–24 July 2020; pp. 5406–5409.
Yuan, S.; Yu, B. HClaimE: A Tool for Identifying Health Claims in Health News Headlines. Inf. Process. Manag. 2019, 56, 1220–1233.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.