How many codes are needed in a qualitative analysis?

Updated: Dec 1, 2020

There is no answer to this perennial question – not even any guidelines. You need as many codes as you need – in other words, however many are needed to capture what’s going on in the data in relation to your analytic focus and research objectives. How many depends on what you’re using the codes to represent, how you derive them, and how you intend to use them in the analysis. I’ve done substantial projects with as few as 22 codes, and others that required several hundred.

Coding in qualitative analysis

Coding is understood and applied in qualitative analysis in incredibly varied ways. Just as there are many vastly different qualitative methodologies, there are many vastly different approaches to coding. So it’s not particularly helpful to provide ‘best-practice tips’ concerning coding. What works in one project may be entirely inappropriate in another. That’s why it’s so important when using CAQDAS packages to ensure that the analytic strategies drive the software tactics.

Terminological confusions

Part of the difficulty is the terminology. Most CAQDAS packages use the term “code” at the tactics level – to refer to a component in the software rather than anything in the project’s analytic strategies (there are exceptions: NVivo uses the term “node”, and Transana uses “keyword”). But there is a critical difference between what we call “codes” at the strategies level (what you plan to do) and “codes” at the tactics level of your chosen software program (how you plan to do it). (Check out our other blog posts and Part 1 of our books for the importance of always distinguishing between analytic strategies and software tactics.)

What are codes used to represent?

The intuitive way of thinking about codes is as representations of concepts: you label and define a code, then link it to data segments to gather together all instances of the concept. This is appropriate for many methodologies and a very common use of codes at the tactics level. But codes can be used for all sorts of other purposes within a CAQDAS program. For example, in our books we describe several:

for counting. Codes can be created to count instances of meaning in data whenever there is a need to quantify. For example, in a video project studying nonverbal interaction, each use of a particular hand gesture could be linked to a code for that gesture in order to quantify its frequency. These codes would be used for different purposes than other codes used to conceptualise other characteristics of gestures.
for evaluation. A separate set of evaluative codes can supplement concept codes. This avoids having a single code serve two purposes, which would reduce the power of later interrogations. If we are exploring the impact of bullying on aspects of self-esteem, we might create two kinds of codes: evaluative codes named low, medium, or high impact (to code for the degree of impact) and various self-esteem concept codes.
for housekeeping. Codes can serve administrative or organisational purposes. For example, at the end of your working day, you may create a code named “where I am” and link it to the last data segment you worked on—this allows you to jump straight there the next morning. Or a code named “good quote” might be linked to segments that are candidates for illustrating a concept in a report.

Directions of analysis

Contrasting ways of working can broadly be distinguished according to the direction of work. Most projects work predominantly top-down (deductive), bottom-up (inductive), or iterate between the two (abductive). There’s no right or wrong; what’s appropriate depends on the research objectives and analytic plan. (See Lewins & Silver 2014 for discussion of this topic.)

Working inductively

For example, if working inductively, it’s common to initially generate quite a lot of codes because you create them as you see interesting concepts in the data. That is one reason researchers sometimes end up with hundreds of codes and then don’t always know what to do next. This becomes a problem if you’re coding helter-skelter, without having planned what you’re going to do with the codes.

Working deductively

If working deductively, perhaps testing a hypothesis or using concepts derived from existing theories to frame your analysis, you’ll likely start with fewer codes. Your hypothesis or theoretical framework will tell you the key concepts that you’ll be looking for in the data and that you need to capture through coding.

Iterating back and forth

But of course, it’s not that simple. Even though many projects work predominantly in one or other general direction, most actually go back and forth a little between the two. For example, when working inductively, aiming to generate an interpretation that is grounded in the data being analysed, it’s often the case that prior ideas frame the lens through which we interpret data. That’s what a conceptual framework is all about. Similarly, when working deductively we will usually want to incorporate flexibility to see things in the data other than the main concepts within the theoretical framework.

The number of codes isn’t static

And it’s likely that you’ll have a different number of codes at different stages of an analysis. For example, the few broad codes that a deductive project starts with often need to be split later on to account for nuances identified within the data, and the many detailed codes generated in an inductive project often need to be combined later on. CAQDAS programs are designed to facilitate refining codes and their organisation within the software in these ways.