The framework of empirical investigation
In an ambitious research project comprising various subprojects and lasting several years, seven teams conducted participant observation of groups in six European countries.17 In each country, one or two researchers observed most or all meetings of local groups belonging to the GJMs for periods ranging from several months to almost two years.
Local GJM groups usually meet one to four times per month. For the most part, their meetings consist of reports, information exchange, preparation and assessment of activities, brainstorming, chatting, joking, and so on. We observed these activities and produced three kinds of documentation: first, a general portrait of the group under investigation, with information on its history, structure, cleavages, personnel, etc.; second, a record of each group session, including the agenda items, the patterns of participation and decision making, and the occurrence of controversies; and third, a rather detailed protocol of the controversies.
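The three documentation layers described above can be sketched as a simple data model. This is only an illustration of how such records nest; all class and field names here are hypothetical and do not reproduce the project's actual codesheet:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ControversyProtocol:
    # Third layer: a detailed protocol of one explicit,
    # extended verbal disagreement in the group
    topic: str
    summary: str

@dataclass
class SessionRecord:
    # Second layer: one group session, with agenda items,
    # participation, decision making, and any controversies
    date: str
    agenda_items: List[str]
    participants: int
    decision_mode: str  # e.g. "consensus" or "majority vote" (assumed labels)
    controversies: List[ControversyProtocol] = field(default_factory=list)

@dataclass
class GroupPortrait:
    # First layer: a general portrait of the group
    # (history, structure, cleavages, personnel)
    name: str
    country: str
    history: str
    sessions: List[SessionRecord] = field(default_factory=list)

session = SessionRecord(
    date="2003-05-12",
    agenda_items=["report on last demonstration", "planning next action"],
    participants=14,
    decision_mode="consensus",
    controversies=[ControversyProtocol("next action", "route of the march disputed")],
)
group = GroupPortrait("local social forum", "Italy", "founded 2002", [session])
print(len(group.sessions[0].controversies))
```

The nesting mirrors the text: each group portrait collects session records, and each session record collects the protocols of the controversies observed in it.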
Controversies, defined as ‘an explicit and extended verbal disagreement in the group’,18 were at the centre of our empirical interest. For each controversy, participant observers filled in a codesheet during the interaction or, based on their notes, immediately after the group session. In pilot exercises, we found that coding every speech act during a controversy was impossible: too much had to be registered in too short a time. Almost all groups disliked tape recording, which was another reason not to apply some sort of ‘discourse quality index’ as proposed by Steenbergen et al. (2003). Instead, after a series of tests, we opted for a more holistic or comprehensive coding of sequences of speech acts, relying on the intelligence and empathic judgement of a trained coder who, most of the time, was able to distinguish, for example, between an interaction in which the speakers treat each other as equals and one in which they do not, between an intervention based on hard power and one based on soft power, and between a relaxed, a tense and a mixed atmosphere during a controversy. For each of these categories, we provided a number of examples. Admittedly, the boundaries are sometimes difficult to draw, for example when discerning degrees of hard power (‘rather hard power’ vs ‘hard power clearly prevailing’).19 While some individual coder decisions remained arbitrary, we are still confident that the coders’ decisions were generally both valid and reliable. However, apart from the training phase, in which we reached satisfying levels of inter-coder reliability, no further reliability tests were conducted.
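Inter-coder reliability of the kind mentioned above is commonly quantified with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The text does not name the statistic actually used in the training phase, so the following is only an illustrative sketch, here applied to a nominal category such as the hard/soft power distinction:

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning nominal codes to the same items."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    # Observed agreement: share of items both coders labelled identically
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: derived from each coder's marginal code frequencies
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical data: two coders rating eight controversies on the power dimension
a = ["soft", "soft", "hard", "soft", "hard", "soft", "soft", "hard"]
b = ["soft", "soft", "hard", "hard", "hard", "soft", "soft", "soft"]
print(round(cohen_kappa(a, b), 2))  # → 0.47
```

Values near 1 indicate near-perfect agreement beyond chance, values near 0 indicate agreement no better than chance; conventional rules of thumb treat values above roughly 0.6 to 0.7 as satisfactory for this kind of coding.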