Data saturation and decision making about when to stop or go further in verbal protocol analysis

Analyzing verbal protocols is an iterative process. A critical aspect of analyzing verbal report data is choosing, revising, or creating a coding framework that is best suited to describing the data (Bogdan & Biklen, 2007). A typical first consideration is how a particular coding framework can help determine the units of analysis to be used. Chi (1997) recommended considering the following when working with verbal reports: “(a) the grain size of the segment, (b) the correspondence of the grain size to the questions one is asking, (c) the characteristics in the data used for segmenting, and (d) when it may not be necessary to segment” (p. 284). While it is important to draw on existing literature and the data collected to inform the nature of the analysis, the units of analysis may need to be adjusted once coding has begun if the unit in use is not sufficient to determine whether a particular code describes the strategic processes underlying the corresponding verbal reports.

Coding frameworks in qualitative data analysis are either established a priori from existing literature or theoretical frameworks, or they are informed by existing theory but emerge from the data itself, typically drawing on grounded theory approaches (Corbin & Strauss, 2015; Glaser & Strauss, 1967). It is important to refine the coding framework until it sufficiently describes the data, especially when verbal reports involve “discovery,” that is, aspects of a phenomenon that have not been previously cataloged. We again take an example of this process from our study of epistemic processing in online reading (Cho et al., 2018). Figure 23.3 shows the multilevel structure of the epistemic processes involved in online reading, grounded in the verbal report data of the participating high school students.

In this structure, the top-level categories (i.e., epistemic judgment, epistemic monitoring, epistemic regulation) were not only related to dimensions found in previous literature (Barzilai & Zohar, 2012; Greene, Yu, & Copeland, 2014; Hofer, 2004) but were also supported by our data. The second-level categories (e.g., acritical, surface-level, and critical processing within epistemic judgment) describe the qualities of strategy use in relation to judgments of internet sources. The third-level categories (e.g., noticing authority and probing authority within surface-level processing) identify specific actions or dispositions and their qualities found in the data. This multilayered coding structure shows how the initial framework was further refined, interrelated, and reorganized to best describe the data collected in our study.

This process can be described as data saturation, which Corbin and Strauss (2015) outline as having the following characteristics: “no new or relevant data seem to emerge regarding a category, the category is well developed in terms of its properties and dimensions demonstrating variation, and the relationships among categories are well established and validated” (p. 212).

Figure 23.3 Multilayered Coding Structure of the Epistemic Processes in a Critical Internet Reading: A Grounded Theory

In our example, the existing literature and the recursive analysis of our data worked together to inform a robust coding scheme that sufficiently captured the multilayered and multifaceted structure of the epistemic processing apparent in our data set. As a coding scheme becomes more developed and represents more of the data, the coding process becomes less challenging because the data fit the descriptive categories of the scheme more readily.
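One part of the saturation criteria quoted above, that no new codes seem to emerge, can be tracked informally while coding proceeds. The following is a minimal sketch, not the procedure used in our study: it assumes each coded transcript is represented as a set of code labels, and the function names, the `window` threshold, and the toy data (including the placeholder label "hypothetical code C") are illustrative assumptions. It does not address the other criteria, namely how well developed each category is or whether relationships among categories are validated, which remain matters of analytic judgment.

```python
def new_codes_per_transcript(coded_transcripts):
    """Count how many previously unseen codes each transcript introduces.

    `coded_transcripts` is a list of sets of code labels, in the order
    the transcripts were coded (an assumed representation).
    """
    seen = set()
    counts = []
    for codes in coded_transcripts:
        fresh = codes - seen          # codes not met in earlier transcripts
        counts.append(len(fresh))
        seen |= fresh
    return counts


def first_stable_index(counts, window=2):
    """Return the index that starts the first run of `window` consecutive
    transcripts adding no new codes, or None if no such run exists.

    `window` is an arbitrary illustrative threshold, not a standard.
    """
    run = 0
    for i, count in enumerate(counts):
        run = run + 1 if count == 0 else 0
        if run == window:
            return i - window + 1
    return None


# Toy data: two labels come from Figure 23.3; the third is a placeholder.
transcripts = [
    {"noticing authority", "probing authority"},
    {"noticing authority", "hypothetical code C"},
    {"probing authority", "hypothetical code C"},
    {"noticing authority"},
]

counts = new_codes_per_transcript(transcripts)
stable_from = first_stable_index(counts, window=2)
```

A run of transcripts that adds no new codes is at most a signal to examine the categories more closely; it does not by itself establish data saturation, and it says nothing about theoretical saturation.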

While this process is important for describing the data within one study, we note that theoretical saturation may not always occur within a single study. As described in the previous section, the mental models of strategic processing in internet reading have been challenged, questioned, revised, and refined by attending to underexamined aspects of reading (i.e., epistemic processing) and the underlying factors that influence them (e.g., uncertainties of the online information space and readers’ responses to contextual features). This window into previously unresearched phenomena is an important affordance of verbal protocol analysis and has contributed to the subsequent development of research questions and data analysis. As such, by building on previous work, researchers can foreground the importance of replication and of multiple contexts for understanding what a single study does, or does not, contribute to an overall understanding of strategic processing. Data saturation may be reachable within a single study, but theoretical saturation requires subsequent research.

