Corpus material and methodology
The language data analysed and discussed in this chapter are taken from two specialized corpora, one representing RAs (Swales 2004) written by native speakers of English (about 75,000 words) and the other comprising articles written by Czech expert writers (about 58,000 words). The former comprises ten RAs selected from the journal Applied Linguistics and published between 2000 and 2008, while the texts of the latter were written by ten Czech expert writers, some of whom are the author’s colleagues, for the linguistics journal Discourse and Interaction between 2008 and 2011. Both corpora thus represent recent data. In addition, a third, slightly larger specialized corpus has been used for purposes of comparison, namely a sample of about 88,000 words taken from a learner corpus of Master’s theses written in the field of linguistics by Czech students of English at Masaryk University, Brno. Since these theses represent the top 20 per cent of the results achieved in the period 2005 to 2010, it is believed that they can be compared with RAs produced by expert writers.
Although relatively small in size, all three corpora described above are considered sufficient and useful for the present study. In agreement with Flowerdew (2004: 18), it is assumed that, despite certain limitations in terms of size, representativeness and generalizability of results, specialized corpora are more appropriate than large general corpora for a comparative study of academic written discourse, especially for an analysis of particular language features such as causal and contrastive DMs in one particular genre.
It remains to be noted that, in order to obtain comparable data for the analysis, it was necessary to exclude from all three corpora all parts of the texts comprising tables, figures, graphs, references, sources and quotations. All the results discussed and exemplified below have been normalized, that is, expressed as the frequency of occurrence of the selected DMs per 1,000 words.
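The normalization step can be illustrated with a minimal sketch. The corpus sizes are the approximate word counts given above; the function name, the dictionary keys and, above all, the raw counts are hypothetical values chosen only to show the arithmetic and are not findings of this study.

```python
def per_thousand(raw_count, corpus_size, base=1000):
    """Convert a raw frequency into a rate per `base` words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * base / corpus_size


# Approximate corpus sizes in words, as reported in the text.
corpus_sizes = {
    "native_ras": 75_000,
    "czech_ras": 58_000,
    "czech_theses": 88_000,
}

# HYPOTHETICAL raw counts of one DM, for illustration only (not real data).
hypothetical_counts = {
    "native_ras": 150,
    "czech_ras": 145,
    "czech_theses": 264,
}

normalized = {
    name: per_thousand(hypothetical_counts[name], size)
    for name, size in corpus_sizes.items()
}

for name, rate in normalized.items():
    print(f"{name}: {rate:.2f} per 1,000 words")
```

Expressing counts as rates in this way is what allows corpora of unequal size to be compared directly.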
As regards the methods applied during the investigation, all the texts were first computer-processed using the AntConc concordancer and then examined manually, since some of the language items under examination can perform functions other than those of DMs in written discourse. This two-step procedure yielded both qualitative and quantitative results. The most important findings are exemplified (and given mostly in normalized numbers) in the tables below.
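The two-step procedure, automatic retrieval followed by manual checking, can be sketched as follows. This is not AntConc's own implementation but a simplified keyword-in-context search written for illustration; the function name, the context width and the sample sentences are all assumptions and are not taken from the corpora.

```python
import re


def find_candidates(text, items, width=40):
    """Return keyword-in-context hits for each candidate item (case-insensitive)."""
    hits = []
    for item in items:
        pattern = re.compile(r"\b" + re.escape(item) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            hits.append((item, left, m.group(0), right))
    return hits


# HYPOTHETICAL sample text, not taken from the corpora.
sample = (
    "The sample is small. However, the results are consistent. "
    "However carefully the data are coded, some cases remain ambiguous."
)

candidates = find_candidates(sample, ["however"])

# Manual step: the analyst reads each context line and keeps only genuine DM uses.
for item, left, match, right in candidates:
    print(f"{left:>40} [{match}] {right}")
```

In this sample the search retrieves both occurrences of "however", yet only the first functions as a contrastive DM; the second modifies "carefully" and would be discarded during the manual check.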