Now that we have investigated machine performance on the reverse association task, the next question is how humans perform on this task, and how human and machine results compare. For this purpose, we conducted an experiment to collect human reverse associations, and then compared its results with those obtained in a simulation.
As our dataset we used the test set that we had prepared, as a follow-up activity of [RAP 13], for the CogALex-IV Shared Task on the Lexical Access Problem [RAP 14]. The aim of this shared task had been to compare different automatic methods for computing reverse associations. In contrast, here we investigate human performance on this task. To compare human with machine associations, it is therefore helpful that both studies use a dataset based on the same source, i.e. the EAT.
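The dataset was produced in four steps: adjusting the capitalization of the EAT, retaining only the top five associations for each stimulus word, removing items that contained nonalphabetical characters or words absent from the British National Corpus (BNC), and randomly selecting 2,000 of the remaining items.
Let us now look a bit closer at this procedure. The EAT lists, for each of its 8,400 stimulus words, the associative responses obtained from about 100 test persons, who were asked to produce the word that came spontaneously to mind.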
Given its origin in the 1970s, the EAT uses uppercase characters only, so we decided to modify its capitalization. For each word occurring in the EAT, we looked up which capitalization variant had the highest occurrence frequency in the British National Corpus [BUR 98] and replaced the word with that variant, e.g. DOOR was replaced by door, and GOD was replaced by God. In this way we hoped to come close to what might have been produced during compilation of the EAT if case distinctions had been taken into account. This method is not perfect: for example, words that often occur in sentence-initial position might be falsely capitalized. We therefore did some manual checking, but cannot claim to have achieved perfection.
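The recasing step can be illustrated with a minimal sketch. It assumes the corpus is available as a flat list of tokens; the function names `build_case_map` and `recase`, the toy token list, and the lowercase fallback for unseen words are illustrative assumptions, not the procedure actually used to prepare the dataset.

```python
from collections import Counter

def build_case_map(corpus_tokens):
    """Map each lowercased word to its most frequent surface form in the corpus."""
    form_counts = {}
    for token in corpus_tokens:
        form_counts.setdefault(token.lower(), Counter())[token] += 1
    # most_common(1) returns the single most frequent (form, count) pair
    return {key: counts.most_common(1)[0][0] for key, counts in form_counts.items()}

def recase(word, case_map):
    """Replace an uppercase word by its preferred corpus form.

    Falling back to lowercase for words missing from the corpus is an
    assumption of this sketch.
    """
    return case_map.get(word.lower(), word.lower())

# Toy token list standing in for the BNC (hypothetical data).
toy_corpus = ["The", "door", "was", "open", "Door", "door", "God", "God", "god"]
case_map = build_case_map(toy_corpus)

print(recase("DOOR", case_map))
print(recase("GOD", case_map))
```

A frequency-based rule of this kind shows the weakness mentioned above: a word that occurs mostly in sentence-initial position would receive a capitalized form, which is why manual checking remains necessary.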
For each stimulus word, only the top five associations (i.e. the associations produced by the largest numbers of test persons) were retained, and all other associations were discarded. The decision to keep only a small number of associations was motivated by the results shown in Figure 4.3, which indicate that associations produced by only very few test persons tend to be of an arbitrary nature. We also wanted to avoid unnecessary complications, which is why we decided on a fixed number, although the choice of exactly five associations is somewhat arbitrary.
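As a rough illustration of this cutoff, the following sketch keeps the five most frequent responses for one stimulus word. The response list is invented for the example, and the handling of ties at the cutoff (here: first occurrence wins) is an assumption, since the text does not specify it.

```python
from collections import Counter

def top_associations(responses, k=5):
    """Return the k responses given by the largest numbers of test persons."""
    # Counter.most_common sorts by count; equal counts keep first-seen order.
    return [word for word, _ in Counter(responses).most_common(k)]

# Hypothetical responses to a single stimulus word, one entry per test person.
toy_responses = (
    ["white"] * 5 + ["night"] * 4 + ["dark"] * 3
    + ["cat"] * 2 + ["coal"] * 2 + ["sheep", "magic"]
)

print(top_associations(toy_responses))
```

In this toy case the two responses given by a single person each are discarded, which mirrors the motivation stated above.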
Table 4.5. Top 12 items of the dataset. The target words were of course undisclosed to the test takers.
From the remaining dataset, we removed all items that contained nonalphabetical characters. We also removed items that contained words that did not occur in the BNC, because quite a few of these words are misspellings. These measures reduced the number of EAT items from initially 8,400 to 7,416. From these, we randomly selected 2,000 items that were used as our dataset. Table 4.5 shows the (alphabetically) first 12 items in this dataset.
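The filtering and sampling steps might be sketched as follows. The item layout (a dictionary with hypothetical `cues` and `target` fields), the vocabulary set standing in for the BNC word list, and the fixed random seed are all assumptions made for illustration.

```python
import random

def is_clean(item, known_words):
    """True if every word of the item is purely alphabetic and in the vocabulary."""
    words = item["cues"] + [item["target"]]
    # str.isalpha() rejects hyphens, apostrophes, digits and spaces.
    return all(w.isalpha() and w in known_words for w in words)

def select_items(items, known_words, n, seed=0):
    """Filter the items, then draw a random sample of at most n of them."""
    clean = [item for item in items if is_clean(item, known_words)]
    return random.Random(seed).sample(clean, min(n, len(clean)))

# Hypothetical items and vocabulary (not taken from the EAT or the BNC).
items = [
    {"target": "door", "cues": ["open", "handle", "key"]},
    {"target": "God", "cues": ["church", "heaven"]},
    {"target": "x-ray", "cues": ["bone"]},   # contains a nonalphabetical character
    {"target": "cat", "cues": ["dogg"]},     # contains a word missing from the vocabulary
]
known_words = {"door", "open", "handle", "key", "God", "church", "heaven", "bone", "cat"}

selected = select_items(items, known_words, n=2)
print(sorted(item["target"] for item in selected))
```

With the real data, `n` would be 2,000 and the filter would leave 7,416 candidates. The seed only makes the toy run repeatable and says nothing about how the original selection was drawn.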