Prompt Engineering in Medical Education: Comparison
Please note this is a comparison between Version 1 by Thomas F Heston and Version 2 by Fanny Huang.

Prompt engineering is a systematic approach to communicating effectively with generative language models (GLMs) to achieve desired results. Well-crafted prompts yield good responses, while poorly constructed prompts lead to unsatisfactory ones. Beyond the challenges of prompt engineering itself, significant concerns accompany the use of GLMs in medical education, including ensuring accuracy, mitigating bias, maintaining privacy, and avoiding excessive reliance on the technology.

  • prompt engineering
  • medical education

1. Introduction

Generative language models (GLMs) are neural networks trained primarily on language data gathered from the Internet. GLMs are large language models specifically designed to generate high-quality, human-like text. GLMs are built upon a generative pre-trained transformer model (GPT). The first version, GPT-1, was released in 2018 [1]. This version had approximately 117 million parameters utilizing just over 100,000 nodes. Since then, the scale of GPT models has rapidly increased. GPT-2, released in 2019, had around 1.5 billion parameters, followed by GPT-3 in 2020, with 175 billion parameters. The latest version, GPT-4, released in 2023, is estimated to utilize 1 trillion parameters [2].
One notable development in the GPT series is the introduction of GPT-3.5, which includes an online chat interface. OpenAI introduced ChatGPT in 2022, allowing users to interact directly with GPT-3.5 and GPT-4. ChatGPT employs natural language processing and can respond to a wide range of inputs from human users. It can understand multiple languages, including computer coding languages, and perform data analysis and basic mathematical calculations. Other GLM chatbots, such as Google Bard and Bing AI, have real-time access to the Internet, and Anthropic's Claude allows files to be uploaded easily for analysis. For all GLMs, however, structuring the input in a specialized manner ensures the most appropriate output. This process, called prompt engineering, is the practice of communicating effectively with GLMs to achieve desired results [3]. Although in existence for less than a year, GLM chatbots have had a dramatic impact on society, including medical education.
Prompt engineering is a crucial process in maximizing the benefits of GLMs. Not only is it a method of optimizing responses from ChatGPT, Google Bard, and similar chatbots, it is also a method to challenge thinking and develop greater understanding in students [4].

2. Generative Language Models in Medical Education

GLMs have great potential to improve learning and comprehension in medical education. They can interact with a human user in real time using a natural language such as English or Spanish. Because of this ability to communicate in natural languages, GLMs have the potential to simulate realistic patient scenarios, provide useful information on various medical topics, and assist in developing patient communication skills [5].
GLMs, due to their extensive training databases, contain a tremendous volume of medical information. A recent study looked at the performance of ChatGPT on the United States Medical Licensing Exam (USMLE). The researchers found that ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Moreover, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that GLMs have significant potential to assist with medical education and even potentially aid in clinical decision-making [6][7].
The use of GLMs in medical education is part of a broader trend toward digitization and the incorporation of technology in teaching. This trend has been accelerated by the COVID-19 pandemic, which has required remote learning and reliance on online resources. Utilizing GLMs is a critical component of this trend, offering the potential to enhance personalized learning, foster critical thinking, and improve evidence-based thinking in medicine [8][9].
GLMs can also create realistic patient simulations and give personalized feedback to the student. They can help overcome language barriers and assist students in learning a foreign language, focusing on healthcare settings. However, despite these advantages, ensuring content quality, addressing biases, and managing ethical and legal concerns remain challenges in using artificial intelligence (AI) and GLMs in medical education [10].

3. Prompt Engineering in Generative Language Models

Prompt engineering is crucial to utilizing large language models effectively, especially in medical education. It involves designing the input or ‘prompt’ in a way that guides the model to produce the desired output [11].
In medical education, prompt engineering can create realistic patient scenarios, generate multiple-choice questions, or provide explanations of complex medical concepts. It can also control the length, complexity, and style of the model's output. For example, prompts can be designed to elicit short, simple responses for beginner students or more complex, detailed responses for advanced learners. Prompt engineering can also generate messages appropriate for patient education and mass-media campaigns [12]. Moreover, it can help minimize potential pitfalls, such as the generation of incorrect or misleading information: with carefully crafted prompts, educators can guide the model to provide more accurate and reliable information.
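As a minimal sketch of this idea, a prompt template can encode the desired length and complexity directly in its wording. The function name, learner levels, and template phrasing below are illustrative assumptions, not content from any particular GLM's documentation:

```python
def tailored_prompt(topic: str, level: str) -> str:
    """Build a prompt whose wording controls the length, complexity,
    and style of the model's response (hypothetical templates)."""
    styles = {
        "beginner": "In three short sentences and plain language, explain",
        "advanced": "In roughly 300 words, give a detailed, mechanism-level explanation of",
    }
    return f"{styles[level]} {topic}."

# The same topic yields prompts pitched at different learners.
print(tailored_prompt("type-2 diabetes", "beginner"))
print(tailored_prompt("type-2 diabetes", "advanced"))
```

The substantive request is identical in both cases; only the framing changes, which is what steers the model toward a beginner-friendly or an advanced response.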

4. Types of Prompts

4.1. Zero-Shot and Few-Shot Prompts

A zero-shot prompt asks the GLM a question about a task it was not specifically trained on. The “zero” in “zero-shot” indicates that the GLM has little or no training on the specific task or question in the prompt. A “shot” is an example given to the GLM, so “zero-shot” means both that the GLM was not specifically trained for the task and that the prompt itself supplies no example to work from. Translation tasks are a classic case: a GLM may never have been given training examples for a particular translation request, yet, based on its extensive training on language data, it can generalize and produce a plausible translation. Few-shot prompts likewise address tasks the GLM was not specifically trained on, but the prompt includes one or more examples to help the GLM understand the request. For example, “Give me a quiz on alcoholic cirrhosis” is a zero-shot prompt; a few-shot version would first supply a sample quiz question in the desired format and then ask the GLM to produce more questions in the same style.
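The distinction can be sketched as plain prompt strings. The sample quiz item below is a made-up illustration (its medical content and answer key are assumptions for demonstration, not vetted exam material):

```python
# Zero-shot: the request carries no worked example.
zero_shot = "Write one multiple-choice question about alcoholic cirrhosis."

# Few-shot: the same request, preceded by a sample item the model can imitate.
sample_item = (
    "Q: Which laboratory pattern suggests alcoholic liver disease?\n"
    "A) AST:ALT ratio > 2   B) ALT:AST ratio > 2   C) Isolated ALP rise\n"
    "Answer: A\n"
)
few_shot = (
    "Here is an example question in the format I want:\n"
    + sample_item
    + "Now write one multiple-choice question about alcoholic cirrhosis "
    "in the same format."
)
```

The few-shot version gives the model a concrete format to copy, which typically yields more consistent output than the bare zero-shot request.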

4.2. Prompting Levels

It has been proposed that prompts can be categorized into levels 1 to 4 [13]. The first level is a simple question. The second level adds context about the writer and the role the GLM should play. The third level provides examples for the GLM to work from, and the fourth level has the GLM break the request down into components. This is similar to how telling a GPT model to work through a mathematical problem step by step helps it handle the prompt more accurately: the components give the model smaller units to reason over. Sometimes it helps to have the GLM assist in creating a prompt. Table 1 gives an example of an iterative prompt that helps generate a prompt the GLM can understand and use to provide the desired output.
Table 1. Iterative prompts have the GLM help create the ideal prompt.
Prompt

Your first response will be to ask me what the prompt should be about. Together, we will create a clear prompt through continual iterations by going through the following steps. Based on my input, you will generate two sections: (a) Revised prompt (provide your rewritten prompt; it should be clear, concise, and easily understood by you); (b) Questions (ask two relevant questions about what additional information you need to improve the prompt). We will continue this iterative process, with me providing additional information to you and you updating the prompt in the Revised prompt section. When I say we are done, you will reply with the final revised prompt and nothing else.

  • Level 1 prompts ask simple questions like “Tell me about type-2 diabetes.”
  • Level 2 prompts add context to Level 1, e.g., “You are to play the role of a Professor of Medicine at Oxford, and I am your student. Tell me about type-2 diabetes.”
  • Level 3 prompts add examples to Level 2. For example, users may start with this prompt: “I learn best by reading short essays. Here is an example of an essay that was particularly educational to me: [paste an example essay here].” Then, submitting the Level 2 prompt given previously should produce output closer to the desired result.
  • Level 4 prompts break the request into components so the GLM can work through each part step by step.
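The leveling scheme above can be sketched as a small helper that builds a Level 1 to 3 prompt from its pieces. The function name and argument defaults are assumptions for illustration:

```python
def leveled_prompt(level: int, question: str,
                   context: str = "", example: str = "") -> str:
    """Assemble a prompt: Level 1 is the bare question, Level 2 prepends
    context/role, and Level 3 also prepends an example to imitate."""
    parts = []
    if level >= 2 and context:
        parts.append(context)  # Level 2: who is asking, what role to play
    if level >= 3 and example:
        parts.append("Here is an example of the style I learn best from:\n"
                     + example)  # Level 3: a worked example
    parts.append(question)  # Level 1: the plain question
    return "\n\n".join(parts)

q = "Tell me about type-2 diabetes."
ctx = ("You are to play the role of a Professor of Medicine at Oxford, "
       "and I am your student.")
print(leveled_prompt(2, q, context=ctx))
```

Moving from Level 1 to Level 3 is then just a matter of supplying more of the optional pieces.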

4.3. Structured Prompts

Another proposed method for consistently getting good results is to reliably include key components in your prompts. One approach is to build each prompt from the following components: context, general request, how the GLM is to act, and output format. The context describes who is asking the question, for example, “I am a college freshman taking my first biology class”; this helps the GLM tailor the response to the prompter. The general request is a broad overview of what you want from the GLM, for example, “I need some help understanding the Krebs Cycle”. Next, the GLM is told how to act, commonly by assigning it a role, for example, “You are to play the role of my college professor, who is knowledgeable about the Krebs Cycle and an outstanding teacher”. Finally, the GLM is told exactly what to do and how to format the output. Continuing the example, one would state, “Please provide me with a frequently asked questions (FAQ) list of the most fundamental features of the Krebs Cycle. Please provide 15 items in the FAQ. Each question should be 25 words or less, and each answer should be 50 words or less”. The completed prompt would read, “I am a college freshman taking my first biology class. I need some help understanding the Krebs Cycle. You are to play the role of my college professor, knowledgeable about the Krebs Cycle, and an outstanding teacher. Please provide me with an FAQ listing the most fundamental features of the Krebs Cycle. Please provide 15 items in the FAQ. Each question should be 25 words or less, and each answer should be 50 words or less”.
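This four-component structure amounts to simple string assembly, as the sketch below shows. The helper name is an assumption; the component text is taken from the Krebs Cycle example above:

```python
def structured_prompt(context: str, request: str,
                      role: str, output_format: str) -> str:
    """Join the four components: context, general request,
    how the GLM is to act, and output format."""
    return " ".join([context, request, role, output_format])

prompt = structured_prompt(
    "I am a college freshman taking my first biology class.",
    "I need some help understanding the Krebs Cycle.",
    "You are to play the role of my college professor, knowledgeable about "
    "the Krebs Cycle, and an outstanding teacher.",
    "Please provide me with an FAQ listing the most fundamental features of "
    "the Krebs Cycle. Please provide 15 items in the FAQ. Each question "
    "should be 25 words or less, and each answer should be 50 words or less.",
)
print(prompt)
```

Keeping the components as separate arguments makes it easy to swap in a different learner context or output format while reusing the rest of the prompt.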

4.4. Iterative Prompts

Iterative prompting, illustrated in Table 1, has the GLM itself help refine the prompt over multiple rounds: the user supplies feedback, the GLM updates a revised prompt and asks clarifying questions, and the cycle repeats until the user is satisfied with the final prompt.

5. Ethical Implications

Prompt engineering helps enable the ethical use of GLMs such as ChatGPT and Google Bard, but significant challenges remain. Major concerns in medical education include (a) facilitating cheating, (b) privacy violations, (c) copyright violations, (d) decreased education in teamwork, and (e) reinforcement of biases [14]. While the neural network architecture underlying GLMs has advanced rapidly, additional work is needed to establish strong guardrails within the software itself in order to maintain safe and ethical boundaries [15].

References

  1. Improving Language Understanding by Generative Pre-Training . OpenAI. Retrieved 2023-9-6
  2. GPT-4 . Wikipedia. Retrieved 2023-9-6
  3. Learn Prompting . Learn Prompting. Retrieved 2023-9-6
  4. Heston, T.F. Prompt Engineering For Students of Medicine and Their Teachers. arXiv 2023. https://arxiv.org/abs/2308.11628. doi: 10.48550/arXiv.2308.11628
  5. Rehan Ahmed Khan; Masood Jawaid; Aymen Rehan Khan; Madiha Sajjad; ChatGPT - Reshaping medical education and clinical management. Pak. J. Med Sci. 2023, 39, 605-607.
  6. Tiffany H. Kung; Morgan Cheatham; Arielle Medenilla; Czarina Sillos; Lorie De Leon; Camille Elepaño; Maria Madriaga; Rimel Aggabao; Giezel Diaz-Candido; James Maningo; Victor Tseng; et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit. Health 2023, 2, e0000198.
  7. Aidan Gilson; Conrad W Safranek; Thomas Huang; Vimig Socrates; Ling Chi; Richard Andrew Taylor; David Chartash; How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment. JMIR Med Educ. 2023, 9, e45312.
  8. Malik Sallam; ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare 2023, 11, 887.
  9. Hyunsu Lee; The rise of ChatGPT: Exploring its potential in medical education. Anat. Sci. Educ. 2023, 00, 1-6.
  10. Mert Karabacak; Burak Berksu Ozkara; Konstantinos Margetis; Max Wintermark; Sotirios Bisdas; The Advent of Generative Language Models in Medical Education. JMIR Med Educ. 2023, 9, e48163.
  11. Jiaqi Wang; Enze Shi; Sigang Yu; Zihao Wu; Chong Ma; Haixing Dai; Qiushi Yang; Yanqing Kang; Jinru Wu; Huawen Hu; Chenxi Yue; Haiyang Zhang; Yiheng Liu; Xiang Li; Bao Ge; Dajiang Zhu; Yixuan Yuan; Dinggang Shen; Tianming Liu; Shu Zhang. Prompt Engineering for Healthcare: Methodologies and Applications. arXiv preprint, 2023.
  12. Sue Lim; Ralf Schmälzle. Artificial Intelligence for Health Message Generation: Theory, Method, and an Empirical Study Using Prompt Engineering. Preprint, 2022.
  13. Improve ChatGPT Prompts with Priming . YouTube. Retrieved 2023-9-6
  14. Heston, T.F.; Khun, C.; The good, the bad, and the ugly of chat gpt in medical education. International Journal of Current Research 2023, 15(8), 25496-25499.
  15. Thomas F. Heston; Charya Khun; Prompt Engineering in Medical Education. Int. Med Educ. 2023, 2, 198-205.