Updated: Jul 17
There’s a lot of discussion happening right now about the potential implications of advances in AI for qualitative data analysis. The cynical might even say there’s a whole bunch of jumping on the bandwagon going on, heralding shortcuts to analysis (mainly around time-saving) and new “insights” supposedly afforded by these technologies. Others have their literal and metaphorical arms in the air in horror at the idea of computer assistance for qualitative analysis in this form, aghast at the idea of Qualitative-AI and its potential to harm our “craft”. As is often the case, it’s the more measured middle ground where the most meaningful and useful debates are to be had.
AI is here to stay and clearly has an impact on the practice of qualitative data analysis, but before the qualitative research community dismisses it out of hand or embraces it without critical thinking, let’s remember and reflect on two things:
1. AI, and in particular machine learning (likely the most useful form for qualitative data analysis, in my opinion), is actually not new in the CAQDAS-field;
2. the way AI is viewed and the uses to which it can usefully be put are contingent on several factors, not least methodological paradigm, the amounts and kinds of qualitative materials being analysed, and the purposes the AI tools serve within our workflow.
This post looks briefly at the history of AI in qualitative software (CAQDAS-packages), giving a quick overview of the options and the topics being discussed right now in this space. I’ll make some comments about point 2 in a later post, The methodological piece: the role of AI in qualitative analysis (coming soon).
It’s not new, folks
When you look at the trajectory of the Computer Assisted Qualitative Data AnalysiS (CAQDAS) field, AI actually has a relatively long history. The first qualitative software programs were available from the late 1980s; more than 30 years ago now. Debate about the use of computers in qualitative analysis has ensued ever since the beginning, so the discussions we’re now having are themselves not new (this article by Kristi Jackson, Trena Paulus & Nick Woolf is a good one to read if you’re interested in these debates and how unsubstantiated criticisms of CAQDAS are propagated).
Qualrus: the first ‘intelligent CAQDAS’ (no longer available)
The first CAQDAS-package to include real assistance - i.e. beyond searching for words/phrases and auto-coding the hits - came around 15 years into the CAQDAS-history, with Qualrus, developed by Prof Ed Brent (University of Missouri and Ideaworks Inc.) and available from 2002. Although no longer available, this CAQDAS-package used case-based reasoning, natural language understanding, machine learning and semantic networks to suggest codes based on patterns in qualitative data. The user could accept or reject these suggestions as appropriate, and the program learnt from those decisions, using them to inform the codes it suggested subsequently.
DiscoverText: balancing what humans and computers do best (free for academics)
DiscoverText, an online app developed by Dr Stuart Shulman and colleagues under the auspices of U.S. National Science Foundation-funded academic research, has been balancing human interpretation and the power of machine learning since 2009, almost fifteen years now. It developed out of the Coding Analysis Toolkit (CAT), a tool for qualitative measurement and adjudication that was available from 2007 to 2020. Amongst the key sets of AI tools embedded in DiscoverText are automatic duplicate detection and near-duplicate clustering (analogous to plagiarism detection), and machine-learning coding tools. The latter start from initial human coding (undertaken collaboratively by ‘peers’) that is adjudicated by humans and then used to train a machine classifier. The classifier scores the likelihood that additional data falls into each category and codes it on that basis.
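To make the idea of training a classifier on human coding a little more concrete, here is a minimal sketch in Python. It is not DiscoverText’s implementation: I’m assuming a simple multinomial naive Bayes model, and the function names, example texts and codes (“transport”, “health”) are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def train(coded_items):
    """Build per-code word counts from (text, code) pairs adjudicated by humans."""
    word_counts = defaultdict(Counter)
    code_counts = Counter()
    for text, code in coded_items:
        code_counts[code] += 1
        word_counts[code].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, code_counts, vocabulary


def score(text, model):
    """Return {code: probability} for one uncoded item (add-one smoothing)."""
    word_counts, code_counts, vocabulary = model
    total_items = sum(code_counts.values())
    log_scores = {}
    for code in code_counts:
        total_words = sum(word_counts[code].values())
        log_p = math.log(code_counts[code] / total_items)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are ignored
                log_p += math.log(
                    (word_counts[code][word] + 1) / (total_words + len(vocabulary))
                )
        log_scores[code] = log_p
    # Normalise the log scores into probabilities that sum to 1.
    highest = max(log_scores.values())
    weights = {c: math.exp(s - highest) for c, s in log_scores.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}


# Hypothetical human-coded, adjudicated training data.
adjudicated = [
    ("the bus service is unreliable and late", "transport"),
    ("trains are overcrowded every morning", "transport"),
    ("my doctor never has appointments available", "health"),
    ("the hospital waiting list is too long", "health"),
]
model = train(adjudicated)
scores = score("the morning bus is always late", model)
suggested_code = max(scores, key=scores.get)
```

The point of the sketch is the division of labour: humans supply and adjudicate the initial coding, the machine only scores how likely each remaining item is to belong to each category, and a human can still review what it suggests.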
If you want to know more about DiscoverText, you can check out:
the DiscoverText website (where you can access it for free if you’re an academic),
the #CAQDASchat podcast episode I did with Stu Shulman recently where he discusses the tool and describes its core features,
the scholarly and other published mentions of DiscoverText where you can see how others have been using it
a webinar Stuart Shulman gave for the CAQDAS Networking Project on humans and machines learning together
Provalis Research tools (WordStat and QDA Miner)
Provalis Research develops several analytic products, among them QDA Miner and WordStat, which provide tools for the qualitative analysis and data mining of textual material. These tools include both unsupervised machine learning models, such as topic extraction using clustering (available in WordStat since 1999), clustered coding (available in QDA Miner since 2011) and topic modelling (available in WordStat since 2014), and supervised machine learning, such as automatic document classification (available in WordStat since 2005), query-by-example (available in QDA Miner since 2007) and code similarity searching (available in QDA Miner since 2011).
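As a rough illustration of what a query-by-example search does, the sketch below ranks text segments by how similar they are to an example segment. It assumes plain bag-of-words vectors and cosine similarity, which is a common textbook approach and not a description of how QDA Miner works internally; the function names and segments are invented.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text segment into a bag-of-words count vector."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return Counter(w for w in words if w)


def cosine_similarity(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_example(example, segments):
    """Return (similarity, segment) pairs, most similar to the example first."""
    example_vec = vectorize(example)
    scored = [(cosine_similarity(example_vec, vectorize(s)), s) for s in segments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical uncoded segments.
segments = [
    "Paying the rent each month is a constant worry",
    "We enjoy walking the dog in the park",
    "My manager rarely listens to the team",
]
ranked = rank_by_example("I worry about paying the rent", segments)
```

The researcher would then look through the top-ranked segments and decide which, if any, deserve the same code as the example.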
If you want to know more about these tools, you can check out:
The Provalis website, where a search for ‘machine learning’ will bring up several resources, including:
The recording of a webinar presented by Provalis Research CEO Normand Péladeau on Automatic Document Classification
A webinar Normand Péladeau gave for the CAQDAS Networking Project on The black box of sentiment analysis: What's in it, and how to do it better.
A review of QDA Miner that Ann Lewins and I wrote for the CAQDAS Networking Project in 2020
Leximancer
Developed by Andrew Smith and Michael Humphreys in 2000, Leximancer includes unsupervised machine learning tools for automatic content analysis that generate concept models of textual material, presented as visualisations of categories and relationships. The language models used are based on the data being analysed and, using semi-supervised learning, can be trained on user-specified variables.
If you want to know more about Leximancer, you can check out:
A review of Leximancer that Ann Lewins and I wrote for the CAQDAS Networking Project in 2020
A webinar given by Andrew Smith for the CAQDAS Networking Project on managing patterns of meaning latent in text using Leximancer
This article about unsupervised semantic mapping of natural language with Leximancer concept mapping, by Andrew Smith and Michael Humphreys
The Leximancer website
Other forms of assistance for qualitative data analysis: automated transcription
It’s also worth mentioning that AI has been present in the qualitative research field for several years in the form of automated transcription, and the Covid pandemic accelerated its normalisation as qualitative researchers, many of whom may previously have balked at the idea of gathering qualitative data online, were forced to do so. Suddenly it became the norm to interview participants and conduct focus-group discussions via video-conferencing platforms such as Zoom, Microsoft Teams and Google Hangouts. With the recordings came the automated transcripts, and their use began to become more widely accepted in our community of practice. Note also that some CAQDAS-packages had previously developed AI-driven automated transcription tools, such as NVivo Transcription and the more recently developed Quirkos Transcribe.
Newer Qual-AI tools
Okay, so now we’ve established that computer-assistance in qualitative analysis isn’t actually “new”, what’s causing all the hullabaloo right now? It seems to me there are three genres of newer Qual-AI tools on the deck that qualitative researchers are variously embracing, exploring and resisting:
using chatbot tools like ChatGPT alongside CAQDAS-packages to facilitate different aspects of qualitative data analysis;
the integration of these newer AI capabilities into existing CAQDAS-packages; and
the development of new apps designed specifically to harness new AI capabilities for qualitative analysis.
Let’s check each out…
ChatGPT in qualitative analysis
The release of OpenAI’s ChatGPT is what prompted much of the hullabaloo we’re seeing in the qualitative community; it’s what many researchers are responding to – some in delight, some in interest, some in despair! Many of us have experimented with and commented on the use of ChatGPT alongside CAQDAS-packages, coming to different conclusions…here are three…
Philip Adu discusses using ChatGPT as a data summarization tool, advocating this when working with amounts of data so large that it is “not humanly possible” to analyse them. The example Philip gives in this presentation is extracting transcripts from YouTube and having ChatGPT summarize the content as a precursor to analysis that is then undertaken using NVivo. Summaries of different lengths are generated by ChatGPT using prompts such as ‘Summarize the following’ (producing a short paragraph) and ‘Continue with 1000 words of summary’ (producing a more detailed summary); see circa 30 minutes into the presentation.
Andreas Muller discusses in this video using ChatGPT to summarize coded data, generate ideas for developing a coding framework and define codes, in addition to summarizing the data itself. He shows these uses in combination with MAXQDA, including prompts for summarizing an interview transcript, summarizing coded data segments relating to a topic, suggesting potential codes from qualitative material, and generating summaries of coded segments that can be used as code definitions (including examples).
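If you want to try this kind of prompting yourself, a small helper script can keep the prompts consistent across transcripts. The sketch below only builds prompt text and does not call ChatGPT. The two summary prompts are the ones quoted above; the code-suggestion and code-definition wording, the function names and the 1500-word chunk size are my own assumptions, not prompts taken from either presentation.

```python
def chunk_text(text, max_words=1500):
    """Split a long transcript into word-limited chunks (chunk size is an assumption)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summary_prompt(text):
    """Short-summary prompt, using the wording quoted in the post."""
    return "Summarize the following:\n\n" + text


# Follow-up prompt for a more detailed summary, as quoted in the post.
FOLLOW_UP_PROMPT = "Continue with 1000 words of summary"


def code_suggestion_prompt(text, max_codes=5):
    """Hypothetical wording for asking for potential codes."""
    return (
        f"Suggest up to {max_codes} potential codes for the interview excerpt below. "
        "Give each code a short name and a one-sentence description.\n\n" + text
    )


def code_definition_prompt(code_name, coded_segments):
    """Hypothetical wording for turning coded segments into a code definition."""
    numbered = "\n".join(f"{i}. {seg}" for i, seg in enumerate(coded_segments, start=1))
    return (
        f"The following segments were all coded as '{code_name}'. "
        "Summarize them as a code definition and include one example.\n\n" + numbered
    )


# A stand-in for a long transcript: 3200 words, so three chunks of up to 1500 words.
transcript = "word " * 3200
prompts = [summary_prompt(chunk) for chunk in chunk_text(transcript)]
```

Each prompt would then be pasted (or sent) to the chatbot, with the output treated as a starting point for analysis in the CAQDAS-package and not as the analysis itself.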
The integration of OpenAI into existing CAQDAS-packages
We’ve recently seen two of the pioneer CAQDAS-packages integrate OpenAI tools into their products (subsequent posts will look at these in more detail).
ATLAS.ti has recently released a beta version of its OpenAI-powered “open coding” feature, which automatically suggests codes for selected textual transcripts and applies them. The user is presented with the suggested codes and has the option to accept or reject them (and also to add their own codes to quotations) before choosing to implement the coding. The coding happens at the level of paragraphs (from which the coded quotations are created). To find out more, check out the following post in this series.
MAXQDA has also recently released a beta version of its “AI Assist” tool, which creates summaries at different levels (standard, shorter, or bullet points) from segments of text that have already been coded. These summaries can be added to Code Memos or generated within Summary Grids, and then used in the same way as summaries of coded data that researchers had written themselves. The MAXQDA AI Assist tool is also powered by OpenAI. To find out more, check out the following post in this series.
CoLoop - a new app designed for Qualitative-AI
In addition, new apps are emerging that are designed specifically to facilitate qualitative analysis using OpenAI tools. One example, currently in beta, is CoLoop, developed by Genei.io (keep an eye out for a subsequent post from me detailing more about this tool).
CoLoop is an “AI Copilot” for qualitative research that allows the user to ask questions of the data through prompts in a “chat” function. This can be used to summarize textual material that you upload into the system (e.g. transcripts of interviews and focus-groups or other textual material). The supporting evidence – i.e. the qualitative data that the summaries are based on – is accessible and navigable within the system. In addition, users specify at the outset what the project objectives are, and can optionally upload an overview, interview questions etc. that are used to teach the system what you’re trying to achieve with the analysis. CoLoop uses this to generate an editable project description. Speakers labelled in a customary way are automatically identified, so you can ask questions about what certain speakers say about a topic in the transcripts.
Once data are uploaded, CoLoop will automatically generate codes based on the content of the uploaded material and allow you to review them. You can also add your own codes to the system. This produces a high-level overview of the material, which can be refined by using prompts to ask specific questions (a series of suggestions is provided based on the content of the material uploaded). The chat can be reset to avoid getting stuck in loops, so a series of separate ‘conversations’ can be held in CoLoop without losing earlier content. CoLoop also includes AI-generated transcription, allowing audio files to be uploaded and transcribed, and an AI-generated ‘analysis grid’ that allows the overviews and verbatim segments to be compared across speakers.
The key difference for qualitative data analysis between CoLoop and the use of OpenAI-powered tools like ChatGPT is that what CoLoop generates is based on the transcripts you upload, rather than on understanding generated from online sources. In addition, the algorithm has been instructed against hypothesising or offering suggestions unless explicitly asked to, and all information is referenced back to its sources in the transcripts. Therefore, where there is no relevant answer in the transcripts to a prompt given by the researcher, CoLoop will say so.
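The general principle here (answer only from the uploaded material, cite where each piece of evidence comes from, and say so when nothing relevant is found) can be sketched in a few lines of Python. This is emphatically not how CoLoop works under the hood: I’m assuming simple keyword overlap in place of a language model, and the stop-word list, threshold, function names and transcripts are all invented for illustration.

```python
# A tiny, hypothetical stop-word list so that common words do not count as evidence.
STOP_WORDS = {"a", "about", "an", "and", "do", "in", "is", "of", "the", "to", "what", "say"}

NO_ANSWER = "No relevant answer was found in the transcripts."


def keywords(text):
    """Lowercase the text and return its set of non-stop-words."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return {w for w in words if w and w not in STOP_WORDS}


def find_evidence(question, transcripts, min_overlap=2):
    """Return (transcript name, line number, line) for lines sharing enough keywords."""
    wanted = keywords(question)
    evidence = []
    for name, lines in transcripts.items():
        for number, line in enumerate(lines, start=1):
            if len(wanted & keywords(line)) >= min_overlap:
                evidence.append((name, number, line))
    return evidence


def answer(question, transcripts):
    """Answer only from the transcripts, with references, or say nothing was found."""
    evidence = find_evidence(question, transcripts)
    if not evidence:
        return NO_ANSWER
    return "\n".join(f"[{name}, line {number}] {line}" for name, number, line in evidence)


# Hypothetical uploaded transcripts.
transcripts = {
    "interview_01": ["Childcare costs take up half my salary.", "I like my job."],
    "interview_02": ["The commute is fine."],
}
grounded = answer("What do participants say about childcare costs?", transcripts)
ungrounded = answer("What do participants say about pension plans?", transcripts)
```

The design choice worth noticing is the explicit “nothing found” branch: the system is allowed to come back empty-handed, which is what keeps every answer traceable to the data.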
To find out more check out the following post in this series
There’s much afoot
So it’s clear there’s a lot going on in the Qual-AI space – and there has been for a while. You might not have been aware of the use of AI in CAQDAS before the more recent developments, and if you weren’t then it’s definitely worth exploring those earlier tools in more depth – as well as the newer ones that have arrived on the block in the past few months.