
AI "open coding" beta from ATLAS.ti - what's it good for?

Updated: Jul 6, 2023

As discussed in my previous post, What’s afoot in the Qualitative AI space?, various forms of AI, including machine learning, are not new in the qualitative space, with many CAQDAS packages having implemented such tools over the past 15+ years. What we’re seeing now, though, in terms of generative AI, is different.


I’ve experimented with three of these, which I discuss in my next few posts. The first two are (very different) AI additions to existing and well-established qualitative software (CAQDAS) programs, and the third is one of the new players in the field: ATLAS.ti’s “open coding” beta, MAXQDA’s AI Assist beta, and CoLoop, respectively.


Framing my experimentations

When would these tools actually be useful? That’s always my starting point in considering new tools. We should never use a tool just because it’s available, but because it allows us to better accomplish an analytic need. That’s the frame within which I experimented with these tools and make comments about them here. I’ll pick up on these comments to discuss the broader methodological implications of generative AI in the qualitative data analysis space in a subsequent post.


Disclaimer: please note that at the time of writing all these tools are in beta and are therefore likely to change significantly over the coming months. So if you’re reading this post more than a few weeks after I wrote it, you’ll also need to check out the new developments that are likely to have been released since.





ATLAS.ti’s “open coding” beta

The recent beta release by ATLAS.ti, one of the pioneering CAQDAS packages, offers a pretty quick version of what they’re calling “open coding” of texts, using OpenAI’s GPT language models.


It codes at the paragraph level, so preparing text for the purpose needs to be considered in order to get the most out of it. This includes following the advice in the help menu concerning speaker identifiers, short paragraphs (which will be ignored), etc.


Is it useful for ‘open coding’?

As it’s currently implemented I’m yet to be entirely convinced of its usefulness for the “open coding” of primary qualitative data generated for a specific research project that has (even loose) research questions guiding the analysis. I came to this conclusion by trying it out in a few different contexts. First, on data I know very well - transcripts of interviews I conducted for my PhD. I asked ATLAS.ti to “open code” eight of these transcripts. It said this would take more than an hour; in fact it did it in less than 10 minutes. “Hurrah”, you might be thinking, “that’s saved a mighty load of time”.


Time saving? Erm, depends what you’re trying to accomplish

But did it really? I’m not convinced. Why? Because it produced more than 450 codes, and most of them aren’t really “codes” as most qualitative researchers would understand ‘codes’ to be. With that many, the time it would take me to review them all and refine the coding framework (e.g. by merging AI-generated codes that I decide are really capturing the same concept, getting rid of the ones that are not relevant to my analytic focus, etc.) would not be any less than if I had just done the coding myself from scratch. I still have to consider what has been coded by the system and review it in the context of the project objectives – so what it’s giving me at the moment isn’t time-saving if that’s my purpose.


Adjusting the AI-generated codes

What I do really like about how ATLAS.ti’s AI ‘open coding’ has been implemented is the way the results are initially displayed and the ease and flexibility with which you can review and refine them. It’s just that with this many codes, it isn’t time-saving. (In my methodological implications post I’ll come back to this in relation to the timing and role of computer and human when considering harnessing AI tools for qualitative analysis.)


Isn’t open coding meant to generate a lot of codes?

The process of open coding when researchers do it themselves can also generate a lot of codes – I’ve seen dozens and dozens of projects over the years with more than 450 codes after a human-generated open coding process. So what’s the difference?

  • The human has done the work and so knows the data intimately after the process, meaning the reviewing and refining of codes is informed by this in-depth understanding. With AI-generated open coding this is not the case, so the process of reviewing and refining the codes needs to involve careful consideration of the data that have been coded.

  • The codes developed are completely different – human-generated open codes tend to be much more analytic, because they have been generated through interpretation in the context of the research project’s objectives and the interpretive lens of the researcher. AI-generated open codes are necessarily much more descriptive, and do not take the context of the research or the focus of the project into account.


Generating that many codes from open coding is always a problem, even when human-generated, and I’ve spent many hours over the years helping researchers move on to think more conceptually, and reduce the codes to a manageable and analytically useful framework. Sometimes that is indeed done by reviewing/refining 450+ codes, but often the decision is made to leave that version of the project, and start coding again, this time at a higher conceptual level. In those instances, the open coding is not seen as a waste of work, but as data familiarisation.


It is useful for data familiarisation

And that’s how I currently see the main value of ATLAS.ti’s AI-generated open coding: as an additional tool in our toolbox for data familiarisation. It’s certainly true that the list of codes generated gives an overview of content, which would be helpful when working with data that wasn’t generated specifically for a research project.


And so, at the moment, I’m much more convinced of its usefulness for exploring secondary qualitative materials that researchers are not already familiar with, by which I mean as an additional means of data familiarisation (a common initial phase of analysis in many established qualitative data analysis methods, including Grounded Theory, IPA, and Thematic Analysis). For such a purpose, AI “open coding” as it’s currently implemented in ATLAS.ti is very useful, along with related text searching tools, such as Word Frequency, Concept Search, Opinion Search, etc.


How well you already ‘know’ the data informs which tools are useful

When you gather data through primary data collection methods (e.g. observations, interviews, focus-group discussions, etc.) you ‘know’ the data before analysis ‘formally’ begins (particularly if you decide not to use automated transcription tools like those mentioned in my previous post, in favour of transcribing yourself, in which case transcription becomes an important moment of contact with the data). But when data has been generated by others or for a non-research purpose, you, the analyst, aren’t familiar with it in the same way before ‘formal’ analysis begins.


In such projects there may be a place for generative-AI tools, and the main value I see in the ‘open coding’ tool provided by ATLAS.ti, as it is currently implemented, is for such purposes – namely, to quickly explore text at a high level as a means of becoming familiar with it, potentially useful as a precursor to (but not as yet a replacement for) human coding.


Exciting times

That said, I am excited for the future. I see the application of generative-AI tools in CAQDAS packages such as ATLAS.ti as another tool in our toolbox. These programs are like garden sheds, full of tools that we pick and choose as appropriate to the task at hand. For me, an appropriate task for AI open coding is data familiarisation, not coding, but that may well change as the tools develop.


I’ll pull out some of these reflections and place them in a methodological context in a later post, but next will come my thoughts on the beta of AI Assist in MAXQDA, and then a new generative-AI tool for qualitative analysis, CoLoop.



Make informed decisions: try it out for yourself

It's always best to make your own decisions about what tools work best for you. It's great to hear other researchers' thoughts and experiences, and hopefully this post is useful in that respect, but your project is yours; you are the expert in what is needed, whether it's data, methods, or tools. The best way to decide whether a new development is appropriate is to try it out for yourself. That's the only way any of us can make truly informed decisions.


