Qualitative Research Methods for Large Language Models

In the current era of artificial intelligence, large language models are increasingly being used for various applications, such as language translation, text generation, and human-like conversation. The fact that these models are trained on large amounts of data, encompassing many different opinions and perspectives, opens up the possibility of a new qualitative research approach: due to the probabilistic character of their answers, “interviewing” these large language models could give insights into public opinion in a way that otherwise only interviews with large groups of subjects could deliver.

  • large language models
  • qualitative research
  • artificial intelligence
  • machine learning

1. What Are Large Language Models (LLMs)?

Large language models (LLMs) are artificial intelligence (AI) models based on deep learning (i.e., neural networks) that are used to generate text [1][2]. These models have a complex underlying architecture with a large number of parameters and are trained on very large collections of existing documents. While many older natural language processing approaches used supervised learning for specific tasks, most LLMs use semi-supervised approaches, which makes it easier to train them on large quantities of data.
The introduction of the transformer architecture by Google [3] also helped to train large models more quickly, since it allows for greater parallelization of training and thus reduces training times compared to older architectures such as recurrent neural networks (RNNs). This enabled the creation of models pretrained on large amounts of text, such as Google’s BERT (bidirectional encoder representations from transformers) [4].
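To make the idea of a pretrained model more concrete, the following minimal sketch (assuming the Hugging Face transformers library and the publicly available bert-base-uncased checkpoint; the example sentence is invented) asks BERT’s masked-language-modelling head to propose the words it considers most probable for a blanked-out token:

  from transformers import pipeline

  # Load a publicly available pretrained BERT checkpoint together with its
  # masked-language-modelling head (illustrative example, not from the source text).
  fill_mask = pipeline("fill-mask", model="bert-base-uncased")

  # BERT fills the [MASK] slot with the words it learned to expect during
  # pretraining on large text corpora, together with their scores.
  for prediction in fill_mask("Qualitative research relies on [MASK] data."):
      print(f"{prediction['token_str']:>12}  score = {prediction['score']:.3f}")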
In 2018, OpenAI introduced the first generative pretrained transformer (GPT) [5]. While there had been other pretrained models before, GPT also had generative capabilities. The model was trained with a mixture of unsupervised pretraining, which set the general parameters, and a supervised fine-tuning step to adapt it to specific tasks. This first version of GPT was trained on 4.5 GB of text from unpublished books [6]. In the following years, OpenAI released several updated GPT models. In 2020, they published GPT 3 [7], which was trained on around 570 GB of text from a filtered version of Common Crawl, an openly available crawl of the internet. For newer models such as GPT 3.5 and GPT 4 [8] (the latter released in early 2023), no official information about the training data is available. In November 2022, OpenAI released ChatGPT, which is based on GPT 3.5 and fine-tuned for conversation. With the release of GPT 4, a version of ChatGPT using the newer model was also made available to paying subscribers. As with the GPT 3.5 and GPT 4 models, no official information on training data and parameters is available for ChatGPT. In general, the abilities of GPT lie in “its learning conditional probabilities in language (its so-called ‘statistical capabilities’)” [9].
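These “statistical capabilities” can be made tangible with a short sketch that inspects a model’s next-token distribution. Since the GPT 3.5 and GPT 4 weights are not public, the sketch below uses the openly available GPT-2 model via the Hugging Face transformers library; the prompt is invented for illustration:

  import torch
  from transformers import GPT2LMHeadModel, GPT2Tokenizer

  tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
  model = GPT2LMHeadModel.from_pretrained("gpt2")

  # Ask the model for the conditional probabilities of the next token
  # given an (invented) prompt.
  prompt = "In my opinion, artificial intelligence is"
  inputs = tokenizer(prompt, return_tensors="pt")

  with torch.no_grad():
      logits = model(**inputs).logits[0, -1]   # scores for the next token
  probs = torch.softmax(logits, dim=-1)

  # Print the five most probable continuations and their probabilities.
  top = torch.topk(probs, 5)
  for p, idx in zip(top.values, top.indices):
      print(f"{tokenizer.decode(int(idx)):>12}  p = {p.item():.3f}")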
While OpenAI’s models are the most prominent LLMs, they are not the only ones available. The popularity of ChatGPT led other companies to release their own competitors. For example, Meta released LLaMA (Large Language Model Meta AI) in February 2023 [10]. Google, which laid the groundwork for LLMs with the development of the transformer architecture, has also developed its own LLMs. One of these is LaMDA (Language Model for Dialogue Applications) [11], which was first announced in 2021. Another model, PaLM (Pathways Language Model) [12], was first made available in March 2023, and an updated version called PaLM 2 was announced later that year. The PaLM model is trained on a collection of web documents, books, Wikipedia, conversations, and GitHub code. In response to the release of ChatGPT, Google launched its own LLM-powered chatbot, named Bard, in March 2023. It was originally based on LaMDA models but now uses PaLM [12].

2. Potential Problems in Conducting Qualitative Research

In qualitative research, there are several challenges that need to be addressed. Like quantitative research, “qualitative research” does not have a singular definition, and different qualitative methods are required to serve specific research objectives [13]. However, qualitative research methods are often grouped together, leading to the application of inconsistent standards that fail to capture the essence of qualitative research.

As interpretation and analysis play a crucial role in qualitative research, there is the potential for subjective biases to influence the findings [14]. Researchers’ personal beliefs, experiences, and preconceptions can inadvertently affect participant selection, data collection methods, and analysis, leading to biased results [15]. For instance, even the formulation of interview questions can be affected by judgment and subjectivity, resulting in suggestive questions or limiting participants’ freedom of expression. All of these potential issues require careful consideration on the researchers’ part throughout the process. Moreover, conflicts of interest may arise concerning the stakeholders involved in a study; evaluating and including critical responses, as well as acknowledging possible conflicts of interest, are challenges for researchers. These complications are not limited to the interview method but can occur with other forms of qualitative research as well.

Purposive or convenience sampling is often used in qualitative studies, which may not provide representative results for a broader population, and limited sample sizes raise concerns about the generalizability of findings [16]. While sample size is less critical in qualitative than in quantitative studies, the randomization of participants still poses a potential problem. Efforts are often made to enhance the reproducibility of qualitative studies; however, significant challenges remain. For example, if a study is conducted with randomization in a school setting, different results may be obtained depending on which school and city are chosen. Online surveys also present challenges, as responses tend to come primarily from ambitious participants, which may bias a study’s results. Consequently, researchers must carefully consider their sampling strategy and acknowledge the limitations of their study.

Establishing the trustworthiness, validity, and reliability of qualitative findings is another persistent challenge [17]. Unlike quantitative research, which can rely on statistical measures for objectivity, qualitative research depends heavily on the researcher’s interpretation [18]. Strategies such as triangulation, member checking, and inter-rater reliability can enhance validity and reliability but are not foolproof. Qualitative content analysis (e.g., [19]) addresses the issue of objectivity in particular by providing verifiable guidelines for the analysis of a text (e.g., transcripts).

Additionally, qualitative research often involves engaging in personal and sensitive discussions with participants (e.g., [20]). Researchers must ensure informed consent, protect confidentiality and privacy, and navigate ethical dilemmas, including power imbalances and the potential for harm. These ethical considerations add another layer of complexity to qualitative research.

In conclusion, qualitative research poses various challenges that researchers need to address.
From issues related to inconsistent standards and subjective biases to sampling limitations, reproducibility challenges, and establishing validity and reliability, conducting qualitative research requires careful consideration and strategic planning. Furthermore, ethical concerns must be taken into account to ensure the well-being and privacy of participants. As large language models are trained on large collections of existing documents, and thus (for the most part) reflect real opinions and perspectives on certain topics, AI could potentially offer solutions to some of these challenges. However, its introduction into qualitative research also raises questions and considerations about bias, transparency, and data privacy.
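As a rough sketch of what “interviewing” an LLM could look like in practice (assuming the OpenAI Python client, a configured API key, and an illustrative model name and question), the snippet below samples the same question several times at a non-zero temperature, so that the variation across answers loosely mirrors the variation one might expect across human interviewees:

  from openai import OpenAI

  client = OpenAI()  # assumes an API key in the OPENAI_API_KEY environment variable

  # Invented interview question for illustration.
  QUESTION = "How do you feel about the use of AI in school classrooms?"

  answers = []
  for _ in range(10):  # treat each sample as one "interviewee"
      response = client.chat.completions.create(
          model="gpt-3.5-turbo",   # assumed model name
          messages=[{"role": "user", "content": QUESTION}],
          temperature=1.0,         # keep the answers probabilistic rather than deterministic
          max_tokens=200,
      )
      answers.append(response.choices[0].message.content)

  for i, text in enumerate(answers, 1):
      print(f"--- Answer {i} ---\n{text}\n")

The resulting set of answers could then be analyzed with the same coding or content-analysis procedures that would be applied to human interview transcripts, subject to the bias and transparency caveats noted above.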

3. Existing Work on Qualitative Research Methods with AI and LLMs

Artificial intelligence and LLMs have not been a topic of research only since the release of ChatGPT (based on GPT 3.5) and the associated breakthrough in public perception; they already support numerous disciplines in practice, such as medicine [21], healthcare [22], and economics [23]. All of these studies, however, primarily explore artificial intelligence through the use of qualitative research methods; they do not employ qualitative research methods with artificial intelligence. In education, on the other hand, the use of AI and LLMs is largely unexplored. For example, one of the few research papers in this area addressed how education, teaching, and learning could be improved by using AI for qualitative data analysis [24].

Christou [25] examined the widespread impact of artificial intelligence in research and academia, particularly in qualitative research through literature and systematic reviews, addressing its strengths, limitations, ethical dilemmas, and potential biases. He proposed five key considerations for its appropriate and reliable use: understanding AI-generated data, addressing biases and ethical concerns, cross-referencing information, controlling the analysis process, and demonstrating the cognitive input and skills of the researcher throughout the study. In Christou’s discussion of the role of AI in qualitative research, the example of InfraNodus was given: this AI system was designed to perform various tasks related to textual data analysis, such as categorizing information, creating clusters, and generating visual graphs, and it can, for example, be applied to texts created from interviews [25].

Furthermore, because analytical software and AI systems often employ predefined rules or algorithms to identify patterns, themes, or keywords in text data, it can be assumed that the researcher should still engage in some degree of manual coding or categorization. Additionally, the researcher is responsible for providing comprehensive documentation of the analysis methodology, its justification, and the precise execution procedure employed. It is essential for the researcher to be able to explain the rationale and algorithms employed by the AI system in conducting the analysis, and the researcher’s cognitive and evaluative skills play a valuable role in the analytical process and the formulation of conclusions [25]. Drawing such conclusions from an AI’s analyses to answer practical research questions requires the expertise and contextual knowledge of the researcher [26][27]. It can be concluded that AI can be used in qualitative research (e.g., for systematic reviews, qualitative empirical studies, and conceptual studies), but only if the researcher adheres to certain key considerations and guidelines [25].
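To illustrate the kind of algorithmic pattern and theme identification such tools perform (this is a generic sketch using scikit-learn, not the actual InfraNodus system; the interview excerpts are invented), one could cluster transcript segments by their TF-IDF term profiles and read the dominant terms of each cluster as candidate themes:

  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.cluster import KMeans

  # Invented interview excerpts; in practice these would be transcript segments.
  responses = [
      "I worry that AI tools collect too much data about students.",
      "Data privacy is my biggest concern with these systems.",
      "AI tutoring could give every learner individual feedback.",
      "Personalised feedback from AI might really help weaker students.",
      "Teachers need training before AI enters the classroom.",
      "Without proper training, teachers cannot use these tools well.",
  ]

  # Represent each excerpt by its TF-IDF term weights and group similar excerpts.
  vectorizer = TfidfVectorizer(stop_words="english")
  X = vectorizer.fit_transform(responses)
  kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

  # Report the most characteristic terms of each cluster as candidate "themes";
  # the researcher still has to interpret and label them.
  terms = vectorizer.get_feature_names_out()
  for label, centre in enumerate(kmeans.cluster_centers_):
      top_terms = [terms[i] for i in centre.argsort()[::-1][:3]]
      print(f"Theme {label}: " + ", ".join(top_terms))

As the last step of the sketch indicates, such output is only raw material: interpreting, labeling, and contextualizing the clusters remains the researcher’s task, in line with the considerations above.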

References

  1. Shen, Y.; Heacock, L.; Elias, J.; Hentel, K.D.; Reig, B.; Shih, G.; Moy, L. ChatGPT and other large language models are double-edged swords. Radiology 2023, 307, e230163.
  2. Rillig, M.C.; Ågerstrand, M.; Bi, M.; Gould, K.A.; Sauerland, U. Risks and benefits of large language models for the environment. Environ. Sci. Technol. 2023, 57, 3464–3466.
  3. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
  4. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
  5. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://www.mikecaptain.com/resources/pdf/GPT-1.pdf (accessed on 18 July 2023).
  6. Zhu, Y.; Kiros, R.; Zemel, R.; Salakhutdinov, R.; Urtasun, R.; Torralba, A.; Fidler, S. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 19–27.
  7. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
  8. OpenAI. GPT-4 Technical Report. 2023. Available online: http://xxx.lanl.gov/abs/2303.08774 (accessed on 18 July 2023).
  9. Sobieszek, A.; Price, T. Playing games with AIs: The limits of GPT-3 and similar large language models. Minds Mach. 2022, 32, 341–364.
  10. Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971.
  11. Thoppilan, R.; De Freitas, D.; Hall, J.; Shazeer, N.; Kulshreshtha, A.; Cheng, H.T.; Jin, A.; Bos, T.; Baker, L.; Du, Y.; et al. Lamda: Language models for dialog applications. arXiv 2022, arXiv:2201.08239.
  12. Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. Palm: Scaling language modeling with pathways. arXiv 2022, arXiv:2204.02311.
  13. Oppong, S.H. The problem of sampling in qualitative research. Asian J. Manag. Sci. Educ. 2013, 2, 202–210.
  14. Mruck, K.; Breuer, F. Subjectivity and reflexivity in qualitative research—A new FQS issue. Hist. Soc. Res./Hist. Sozialforschung 2003, 28, 189–212.
  15. Chenail, R.J. Interviewing the investigator: Strategies for addressing instrumentation and researcher bias concerns in qualitative research. Qual. Rep. 2011, 16, 255–262.
  16. Higginbottom, G.M.A. Sampling issues in qualitative research. Nurse Res. 2004, 12, 7.
  17. Whittemore, R.; Chase, S.K.; Mandle, C.L. Validity in qualitative research. Qual. Health Res. 2001, 11, 522–537.
  18. Thomson, S.B. Qualitative research: Validity. Joaag 2011, 6, 77–82.
  19. Mayring, P. Qualitative content analysis. Companion Qual. Res. 2004, 1, 159–176.
  20. Surmiak, A.D. Confidentiality in Qualitative Research Involving Vulnerable Participants: Researchers’ Perspectives; Forum Qualitative Sozialforschung/Forum: Qualitative Social Research (FQS): Berlin, Germany, 2018; Volume 19.
  21. Laï, M.C.; Brian, M.; Mamzer, M.F. Perceptions of artificial intelligence in healthcare: Findings from a qualitative survey study among actors in France. J. Transl. Med. 2020, 18, 14.
  22. Haan, M.; Ongena, Y.P.; Hommes, S.; Kwee, T.C.; Yakar, D. A qualitative study to understand patient perspective on the use of artificial intelligence in radiology. J. Am. Coll. Radiol. 2019, 16, 1416–1419.
  23. Yang, Y.; Siau, K.L. A Qualitative Research on Marketing and Sales in the Artificial Intelligence Age. Available online: https://www.researchgate.net/profile/Keng-Siau-2/publication/325934359_A_Qualitative_Research_on_Marketing_and_Sales_in_the_Artificial_Intelligence_Age/links/5b9733644585153a532634e3/A-Qualitative-Research-on-Marketing-and-Sales-in-the-Artificial-Intelligence-Age.pdf (accessed on 18 July 2023).
  24. Longo, L. Empowering qualitative research methods in education with artificial intelligence. In World Conference on Qualitative Research; Springer: Cham, Switzerland, 2019; pp. 1–21.
  25. Christou, P. How to Use Artificial Intelligence (AI) as a Resource, Methodological and Analysis Tool in Qualitative Research? Qual. Rep. 2023, 28, 1968–1980.
  26. Christou, P.A. How to use thematic analysis in qualitative research. J. Qual. Res. Tour. 2023, 1, 79–95.
  27. Guest, G.; MacQueen, K.M.; Namey, E.E. Applied Thematic Analysis; Sage Publications: Thousand Oaks, CA, USA, 2011.