Health services research (HSR) benefits from qualitative approaches, which build understanding of complex issues and the ‘hows and whys’ of the way health services function. Rigorous qualitative studies can, for example, offer nuanced insights into patient and provider experiences or systemic barriers to health care access and quality. However, the volume of data in qualitative research demands substantial human effort to analyze. A recent study we conducted involved 12 clinical sites with 12 hour-long interviews per site, which generated about 170 hours of data, not including archival material.
Unlike earlier natural language processing models, generative AI and large language models (LLMs) allow researchers to participate more actively and creatively in the analysis through natural language interaction. LLMs offer powerful tools that could potentially improve the efficiency, creativity, and quality of qualitative research. By accelerating labor-intensive processes such as transcription, coding, and analysis, LLMs may enable researchers to study larger qualitative datasets. Qualitative data analysis software companies have made many bold claims about the time and cost savings gained by harnessing ‘the power of AI’, and even about its potential to ‘uncover’ novel or deeper insights than human-only analysis.
Despite the proliferation of new LLM-based qualitative data analysis tools, work is still being done to determine how best to operationalize the use of LLMs to enhance qualitative data analysis for HSR. Doing so requires rethinking traditional approaches to qualitative analysis, as well as careful consideration of what constitutes ‘enhanced’ data analysis. As such, health services researchers looking to use LLMs must grapple with at least two key questions: (1) how to evaluate the outcomes of LLM-driven/AI-assisted qualitative analyses, and (2) how to assess the utility of LLMs while models are continuously improving.
1. Evaluating AI-assisted Qualitative Analysis
Evaluating AI-assisted qualitative analysis presents unique challenges compared to AI-assisted quantitative analysis, where outcomes can be objectively measured. The quality of qualitative research is typically evaluated in relation to the rigor, transparency, and coherence of its methods, as well as the extent to which it makes a meaningful contribution to theory or practice. Qualitative researchers bring diverse frameworks and literature to the data they analyze: two rigorous and valid investigations of the same dataset may lead to different interpretations of the data and different recommendations, in ways that reflect the specific questions and theoretical lenses that informed the data analysis. As a result, replicability (i.e., whether different researchers produce the ‘same’ analysis) is contested as a marker of quality in qualitative research. Therefore, assessing AI output based on its ability to produce the same output as human investigators represents a limited strategy, and may even be misleading.
Appropriate questions for evaluating AI use in qualitative research might include:
- How do AI-assisted processes compare to manual approaches in terms of efficiency?
- How, if at all, do AI-assisted workflows strengthen outcomes and conclusions or the applied or theoretical contribution of the analysis?
- Does AI assistance enhance creativity or facilitate identification of novel themes or patterns in the data?
- Which tasks or aspects of analysis are AI-assisted processes best suited to, and which require more or sole input from the researcher?
2. Assessing Utility While Models Are Continuously Improving
The rapid evolution of LLMs necessitates a flexible approach to assessing their utility. For example, OpenAI’s GPT-4o model was released in May 2024, followed by OpenAI’s o1 model in December 2024. The advances are evident in their performance: o1 achieved 78.0 percent accuracy on PhD-level science questions, significantly outperforming GPT-4o at 56.1 percent and even surpassing expert human performance at 69.7 percent. In January 2025, DeepSeek-R1 was released with performance comparable to o1 while being 90-95 percent cheaper for consumers to use. There are at least five important implications of this rapid technological progress for AI-assisted qualitative analysis.
- Anticipate Emerging Models: Qualitative investigators should remain open to AI’s potential even if current models do not fully meet their needs because emerging versions may quickly do so.
- Invest in Flexible Infrastructure: Research teams and institutions need to build processes for using AI with sensitive data in compliance with IRB or institutional protocols, along with systems for securely storing and accessing data in AI-assisted workflows. Developing adaptable AI infrastructure can be time-consuming and resource-intensive, so we urge researchers to start this process early.
- Experiment with Human-AI Collaboration: Qualitative investigators should actively experiment with integrating AI into their end-to-end workflow to better understand how AI works and where it can provide the most value as LLMs continue to evolve. This process requires investments in time and resources as researchers seek to find the right balance between manual methods and AI-based analysis and determine how the AI can complement existing workflows. For example, within the qualitative analysis process, researchers might use AI-assisted tools to automate portions of coding and theme generation, such as those offered by software like ATLAS.ti or NVivo. Alternatively, some may move away from conventional coding entirely and adopt conversational analysis, using LLMs interactively to explore data through iterative, dynamic prompts.
- Develop Model-Agnostic Methods: Qualitative investigators should design workflows and evaluation frameworks that are not tied to specific AI models. It is important not to over-rely on specific methods that can become obsolete as models improve. For example, researchers could work to develop prompt engineering techniques suited to a current model, only to find the techniques do not transfer to an updated model. Instead, focusing on methods that emphasize human-AI collaboration (as suggested above) will produce insights and capabilities valuable for the longer term.
- Foster Interdisciplinary Collaboration: By engaging with cutting-edge developments in computer science now, qualitative investigators can shape the development of AI tools that are more relevant and effective for their field. Through interdisciplinary partnerships, qualitative investigators will benefit from early access to emerging technologies, while computer scientists will gain valuable input on the practical challenges and opportunities of their technologies. Together, researchers can establish best practices and standards that both harness AI’s transformative potential and expose its pitfalls, preventing misguided or inappropriate uses of these new tools that would damage the quality of qualitative research. This relationship creates feedback loops that drive innovation and ensure that AI can be integrated into qualitative research in effective and impactful ways.
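The model-agnostic, human-in-the-loop workflow described above can be sketched in code. The sketch below is illustrative only: the function names are hypothetical, and the stub stands in for a real LLM backend (OpenAI, DeepSeek, or a local model), which a team would swap in without changing the surrounding workflow.

```python
from typing import Callable, List

# Model-agnostic interface: any backend is plugged in as a function from
# prompt -> text, so the workflow survives model upgrades. All names here
# are hypothetical, for illustration only.
CompleteFn = Callable[[str], str]

def stub_model(prompt: str) -> str:
    # Stand-in for a real LLM API call, used so this sketch runs offline.
    return "candidate themes: access barriers; staffing; trust in providers"

def iterative_theme_exploration(excerpts: List[str],
                                complete: CompleteFn,
                                rounds: int = 2) -> List[str]:
    """Iteratively prompt the model about the same interview excerpts.
    In practice the researcher reads each reply and writes the next prompt;
    here a fixed follow-up simulates that step."""
    history: List[str] = []
    prompt = ("Suggest preliminary themes for these interview excerpts:\n"
              + "\n".join(excerpts))
    for _ in range(rounds):
        reply = complete(prompt)
        history.append(reply)
        # Researcher-authored refinement would replace this fixed follow-up.
        prompt = f"Refine these themes, flagging weak evidence:\n{reply}"
    return history

excerpts = ["Patients wait weeks for a callback.",
            "Staff turnover disrupts continuity of care."]
themes = iterative_theme_exploration(excerpts, stub_model)
print(len(themes))  # prints 2: one model reply per round
```

Because the backend is passed in as a plain function, evaluation frameworks built around this loop (e.g., comparing replies across models or rounds) need not be rewritten when a new model is released.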
In qualitative research, AI should not be expected to replace human researchers but to serve as a tool that augments their capabilities. Rather than asking whether AI can exactly replicate the outcomes of manual methods, researchers should explore how AI can complement and enrich traditional qualitative approaches through human-AI collaboration. This requires institutional flexibility and investment, adaptable infrastructure, and experimentation, so that integration remains relevant as technology improves and stays coherent with the goals and contexts of diverse health services research studies.