What is AI Coding?

Use AI Coding to automatically generate inductive codes for your text data.

ATLAS.ti is proud to offer AI Coding, the first of many truly automated features, powered by the world-leading GPT models from OpenAI. 

AI Coding supports you, the researcher, by "reading" your documents and conducting fully automated descriptive coding. Of course, you can review and refine the codes that are generated to suit your research, but it can save valuable time by processing your data and doing the menial task of initial coding for you. The time you save can be used to refine and merge codes, conduct analysis, and produce visualizations when reporting your research.
Please note that AI Coding is a beta feature. You are welcome to use it freely, and we welcome your feedback on all aspects of it.

How does AI Coding work?

1. Select your text document(s)

To analyze multiple text documents, select your documents in the Document Manager, click the three dots at the bottom of the page, select "Tools", and then select "AI Coding". To analyze a single document, you can also open the document, click "Tools", and then click "AI Coding".

2. Start AI Coding 

After clicking "Start Coding," ATLAS.ti will ask if you want to proceed and show you a rough estimate of the time it will take to process your data.

AI Coding will upload your document content to ATLAS.ti and OpenAI servers. We will never upload your data without your explicit consent. If you wish to proceed, tick the checkbox to confirm that you agree to our EULA and Privacy Policy. OpenAI will NOT use ATLAS.ti user data to train OpenAI's models.

3. Review the results

After AI Coding has finished, ATLAS.ti will present you with an overview of the results. You can then click "Create quotations".

To easily review your results, we recommend clicking on "Save as view" to save a view of the AI Coding results. This view will include the date, time, and all of the created codes and their associated quotations. You can always access this view from your View page, and this view also makes it easy for you to review the codings and make any edits as necessary (e.g., add/remove codes, rename codes, merge codes in the Code Manager, etc.).

  • It’s best to submit documents that belong together thematically in the same round of AI Coding. This improves coding quality and enables ATLAS.ti to ignore patterns that repeat across documents, such as interview questions.
  • AI Coding works per paragraph. For best results, interview questions or participant names in transcripts should each be in their own paragraph, and the paragraph structure of the document should be well-defined. When editing your document, use the numbers in the left margin to see where paragraphs are separated.
  • PDF documents do not contain paragraph information, even if they appear to, so results from PDF documents will likely be poor.
  • AI Coding skips very short paragraphs.
  • AI Coding only looks at the plain text of your documents. Existing codes, quotations, and formatting have no bearing on its results.

AI Coding In Depth

AI Coding works with the GPT family of large language models. These models are trained on vast amounts of varied text and refined through additional human training, which enables them to be used in a general-purpose way.
We at ATLAS.ti offer easy access to these capable models through our AI Coding feature, without the need to understand and navigate their technical details and limitations. ATLAS.ti automatically splits your text into chunks that the AI can handle and passes them to the GPT models for repeated analysis. The results of the analysis are algorithmically combined to offer the best mix of codes covering different topics without producing an overwhelming number of codes.

Here is an in-depth view of these steps:

Large language models are sensitive to the surrounding text, as they have a window of attention, also called the context. Choosing the right context is key when working with a large language model. A context that is too short means that not a lot of information can be processed, because some words or even sentences only make sense in a larger context. A context that is too large overwhelms the model with information and makes it harder for code-extraction algorithms to find meaningful codes. ATLAS.ti chooses contexts that are no less than 100 characters long, but breaks at natural paragraph boundaries. We found that these confines have a positive impact on coding quality while leaving out most headings and titles, which are of less importance to qualitative researchers. As a further step to reduce noise in the analyzed data, ATLAS.ti excludes paragraphs that occur more than once, as these are likely recurring interview phrases or speaker names.
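To make this concrete, the paragraph selection described above can be sketched in a few lines of Python. This is a simplified illustration, not ATLAS.ti's actual implementation: the function name and the double-newline paragraph split are assumptions; only the 100-character minimum and the removal of repeated paragraphs come from the description above.

```python
from collections import Counter

MIN_CHARS = 100  # minimum context length mentioned above

def select_paragraphs(text: str) -> list[str]:
    """Split a document at paragraph boundaries, then drop paragraphs
    that are too short (mostly headings and titles) or that occur more
    than once (recurring interview questions, speaker names)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    counts = Counter(paragraphs)
    return [p for p in paragraphs if len(p) >= MIN_CHARS and counts[p] == 1]

# A transcript-like example: the repeated question and the short
# heading are filtered out; only the substantive answer remains.
doc = "\n\n".join([
    "Interview 1",
    "Q: How do you feel about remote work?",
    "A: I enjoy the flexibility, and I am more productive without a "
    "commute. That said, I miss the spontaneous chats with colleagues.",
    "Q: How do you feel about remote work?",
])
```

On the example transcript, only the long answer paragraph survives this filtering, which is why well-separated paragraphs in your documents matter for coding quality.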
In the first step of the analysis, the AI models are tasked with finding meaningful codes for each segment of data, as if the model were a qualitative research assistant. Because the models of the GPT family are quite creative, AI Coding may produce codes that are near-duplicates, as well as codes that touch on the same topic but are phrased in dissimilar ways. This procedure can produce hundreds of different codes, far too many to review in a time-saving manner.
To address this, we use a characteristic of language models called embeddings, where each word or phrase can be associated with a point in a multidimensional space. One of the useful properties of this feature is that words and phrases that are more similar to one another are located nearer to each other than words or phrases that are less similar. Leveraging this property, ATLAS.ti clusters the codes by combining the ones nearest to each other into a collection. This clustering is repeated until a satisfactory number of collections is reached.
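The clustering described above can be illustrated with a toy sketch. Everything here is an assumption for illustration: the hand-made 2-D vectors stand in for real embeddings (which have hundreds of dimensions), and the greedy merge below is one simple form of agglomerative clustering, not necessarily the exact algorithm ATLAS.ti uses.

```python
import math

def cosine(a, b):
    """Cosine similarity: higher means the two points are more alike."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def cluster_codes(codes, embeddings, target):
    """Greedy agglomerative clustering: repeatedly merge the two
    clusters whose (averaged) embeddings are most similar, until
    only `target` clusters remain."""
    clusters = [([code], list(emb)) for code, emb in zip(codes, embeddings)]
    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        names_j, vec_j = clusters.pop(j)
        names_i, vec_i = clusters[i]
        merged_vec = [(x + y) / 2 for x, y in zip(vec_i, vec_j)]
        clusters[i] = (names_i + names_j, merged_vec)
    return [names for names, _ in clusters]

# Toy embeddings: codes about the same topic sit close together.
codes = ["remote work", "working from home", "salary", "pay raise"]
embs = [(0.9, 0.1), (0.85, 0.15), (0.1, 0.9), (0.15, 0.85)]
```

With `target=2`, the two work-location codes merge into one collection and the two compensation codes into another, mirroring how hundreds of raw codes are condensed into a manageable set.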
The combined codes are associated with the generated segments and proposed in the ATLAS.ti interface, where you can view the resulting quotations.
Please keep in mind that, while it is generally beneficial that the GPT models were trained on vast amounts of text, this may in some situations result in incorrect output that does not accurately reflect real people, places, or facts. In some cases, GPT models may encode social biases such as stereotypes or negative sentiment towards certain groups. You should evaluate the accuracy of any output as appropriate for your use case.