

Human performance

Now that we have investigated machine performance on the reverse association task, the next question is how humans perform on it, and how human and machine results compare. For this purpose, we conducted an experiment aimed at collecting human reverse associations, and later compared its results with those obtained in a simulation.


As our dataset, we used the test set that we had prepared, as a follow-up activity of [RAP 13], for the CogALex-IV Shared Task on the Lexical Access Problem [RAP 14]. The aim of this shared task had been to compare different automatic methods for computing reverse associations[1]. In contrast, here we investigate human performance on this task. To compare human with machine associations, it is therefore helpful that both studies use a dataset based on the same source, i.e. on the EAT [2].

The dataset has been produced by conducting the following steps:

  • 1) take the EAT as the basis;
  • 2) modify capitalization;
  • 3) extract a 2,000-item subset (mainly at random);
  • 4) retain only the top five associations for each stimulus word;
  • 5) remove stimulus words.

Let us now look a bit closer at this procedure. For each of its 8,400 stimulus words, the EAT lists the associative responses obtained from about 100 test persons[3] who were asked to produce the word that came spontaneously to their mind.

As, given its origin in the 1970s, the EAT uses uppercase characters only, we decided to modify its capitalization. For this purpose, for each word occurring in the EAT, we looked up which form of capitalization had the highest occurrence frequency in the British National Corpus [BUR 98], and replaced the word by this form; e.g. DOOR was replaced by door, and GOD was replaced by God. In this way, we hoped to come close to what might have been produced during compilation of the EAT if case distinctions had been taken into account. Since this method is not perfect (e.g. words often occurring at the beginning of a sentence might be falsely capitalized), we did some manual checking, but cannot claim to have achieved perfection.
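The lookup just described can be sketched as follows. The token list here is a toy stand-in for the BNC, so the counts are illustrative only:

```python
from collections import Counter

# Toy token list standing in for the British National Corpus
corpus = ["The", "door", "was", "open", "and", "the", "door",
          "creaked", ".", "Door", "hinges", ".", "God", "knows", "."]

def truecase(word, tokens):
    """Replace an all-caps EAT word by its most frequent
    capitalization variant in a reference corpus."""
    # Count every surface form whose lowercased version matches the word
    variants = Counter(t for t in tokens if t.lower() == word.lower())
    if not variants:
        return word.lower()  # no corpus evidence: fall back to lowercase
    return variants.most_common(1)[0][0]

print(truecase("DOOR", corpus))  # door  (2 lowercase vs. 1 capitalized)
print(truecase("GOD", corpus))   # God
```

The fallback for words absent from the corpus is an assumption of this sketch; as noted above, the dataset later simply discards items containing such words.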

For each stimulus word, only the top five associations (i.e. the associations produced by the largest numbers of test persons) were retained, and all other associations were discarded. The decision to keep only a small number of associations was motivated by the results shown in Figure 4.3, which indicate that associations produced by only very few test persons tend to be of an arbitrary nature. We also wanted to avoid unnecessary complications, which is why we decided on a fixed number, although the choice of exactly five associations is somewhat arbitrary.
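This retention step amounts to sorting each stimulus word's responses by the number of test persons who produced them and keeping the first five. A minimal sketch, with an invented stimulus entry and invented counts (the real EAT lists each response together with its frequency):

```python
# Hypothetical EAT entry for the stimulus "door"; counts are invented
responses = {
    "open": 25, "knob": 14, "bell": 12, "handle": 10,
    "shut": 7, "key": 3, "wood": 1,
}

# Keep only the five responses produced by the largest numbers of
# test persons; all others are discarded
top_five = sorted(responses, key=responses.get, reverse=True)[:5]
print(top_five)  # ['open', 'knob', 'bell', 'handle', 'shut']
```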

Given words

able incapable brown clever good
able knowledge skill clever can
about near nearly almost roughly
above earth clouds God skies
above meditation crosses passes rises
abuse wrong bad destroy use
accusative calling case Latin nominative
ache courage blood stomach intestine
ache nail dentist pick paste
aches hurt agony stomach period
action arc knee reaction jerk
actor theatre door coach act

Table 4.5. Given words of the top 12 items of the dataset. The target words were of course undisclosed to the test takers

From the remaining dataset, we removed all items that contained non-alphabetical characters. We also removed items containing words that did not occur in the BNC, the reason being that quite a few of these are misspellings. By these measures, the number of EAT items was reduced from initially 8,400 to 7,416. From these, we randomly selected 2,000 items that were used as our dataset. Table 4.5 shows the (alphabetically) first 12 items in this dataset[4].
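These filtering and sampling steps can be sketched as follows; the item list and vocabulary are toy stand-ins for the EAT items and the BNC word list:

```python
import random

# Toy vocabulary standing in for the set of words occurring in the BNC
bnc_vocab = {"able", "knowledge", "skill", "clever", "can",
             "about", "near", "nearly", "almost", "roughly"}

items = [
    ["able", "knowledge", "skill", "clever", "can"],
    ["about", "near", "nearly", "almost", "roughly"],
    ["e.g.", "near", "nearly", "almost", "roughly"],   # non-alphabetical
    ["abuot", "near", "nearly", "almost", "roughly"],  # OOV misspelling
]

def keep(item):
    # Drop items containing non-alphabetical characters or words
    # that never occur in the reference corpus
    return all(w.isalpha() and w in bnc_vocab for w in item)

filtered = [it for it in items if keep(it)]
random.seed(0)
dataset = random.sample(filtered, k=2)  # the paper samples 2,000 of 7,416
print(len(filtered), len(dataset))  # 2 2
```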

  • [1] The best performing systems in this shared task all used methodologies similar to the system described in this paper (i.e. based on word co-occurrences in large corpora), and their results were of similar quality. As they have been discussed in detail in [RAP 13], we do not repeat this discussion here.
  • [2] Note that, in addition to the dataset described here, the shared task at CogALex-IV used an additional so-called “training set”, which was intended for the development and optimization of the automatic algorithms. This training set was produced in exactly the same way, just using a different selection of 2,000 items.
  • [3] To distribute work, different groups of test persons operated on the stimulus words.
  • [4] The full dataset can be downloaded from