Discussion, conclusions and outlook
Reverse associations by a human
Producing a single word that is associated with several given words, as the reverse association task requires, is not easy. Several test subjects remarked on this, and the considerable number of omissions in the test sheets confirms it. There are several reasons why it is difficult to come up with the expected word:
- 1) In many cases, the given words point almost as strongly to other target words. For example, given the words gin, drink, scotch, bottle and soda, the alternative spelling whiskey should be just as acceptable as the target word whisky, and some other alcoholic beverages, such as rum or vodka, might possibly be acceptable as well.
- 2) The target vocabulary was not restricted in any way, so in principle hundreds of thousands of words had to be considered as candidates.
- 3) Although most of the target words were base forms, the dataset also contains a good number of cases where the target words were inflected forms. Of course, it is hard to get these inflected forms exactly right.
Owing to these difficulties, we expected low performance figures, and this expectation was confirmed.
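The effect of the first and third difficulty on the reported figures can be illustrated with a minimal sketch that compares strict exact-match scoring with scoring that also accepts listed alternatives. Everything in it is an assumption made for illustration: the function name `accuracy`, the toy items and the alternatives table are hypothetical and are not taken from the dataset or the test sheets. Only the whisky/whiskey pair comes from the example above.

```python
def accuracy(responses, targets, alternatives=None):
    """Fraction of items whose response equals the target word or,
    if given, one of the alternatives accepted for that target.
    An omitted response is represented by None and counts as a miss."""
    alternatives = alternatives or {}
    hits = 0
    for response, target in zip(responses, targets):
        # Accepted answers: the target itself plus any listed alternatives.
        accepted = {target.lower()}
        accepted |= {alt.lower() for alt in alternatives.get(target, ())}
        if response is not None and response.strip().lower() in accepted:
            hits += 1
    return hits / len(targets)


# Hypothetical toy data, invented for illustration only.
targets = ["whisky", "cheese", "children", "soil"]
responses = ["whiskey", "cheese", "child", None]  # None marks an omission

# Hypothetical table of acceptable alternatives (spelling variant, base form).
alternatives = {"whisky": {"whiskey"}, "children": {"child"}}

strict = accuracy(responses, targets)                  # exact match only
lenient = accuracy(responses, targets, alternatives)   # alternatives accepted

print(f"strict: {strict:.2f}, lenient: {lenient:.2f}")
```

In this toy setting, the spelling variant and the base form are counted as errors under strict scoring and as correct under lenient scoring, while the omission is a miss under both.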
As mentioned in section 4.4.2, we had not disclosed the nature of the dataset to the test subjects, i.e. we did not tell them that the underlying idea was the reverse association task. Alternatively, we could have informed them about this and asked them to come up with a word with which they would associate each of the five given words. But this would probably have been perceived as an even more sophisticated task, potentially blocking spontaneity. We also tend to think that, although associations can occasionally be asymmetric, asymmetry is unlikely to have a decisive effect in our scenario. However, this is certainly a question that requires further investigation.
While in previous studies involving multi-stimulus associations human performance had always been much better than what simulation programs produced (see e.g. [RAP 08]), this is not the case here. In the CogALex shared task, a number of teams tested their algorithms, which were mostly based on the analysis of word co-occurrences in large text corpora, on the very same dataset, and achieved performances of up to 30.45%. In the current paper, we presented a very similar result on this dataset. These results are much better than the human performance of our non-native speakers shown in Table 4.6. Although native speakers can be expected to do better, we think that it will be challenging for them to outperform the automatic results. This might well mean that in one of its core disciplines, namely association, human intelligence is not better than that of a machine, although further investigation is of course required to confirm this finding.
-  As our data source (the EAT) did not provide any, it was not practical for us to try to come up with alternative solutions in the chosen reverse association framework. However, we think that doing so would mainly affect absolute but not relative performance (i.e. the ranking between different systems should remain similar).
-  For example, flower pot → soil is a strong association, but soil → flower pot is not strong to the same extent.