Databases of Paradigmatic Semantic Relation Pairs for German and English Adjectives, Nouns, Verbs
The databases are collections of semantically related word pairs in German and English which were compiled via human judgement experiments hosted on Amazon Mechanical Turk. They address the three paradigmatic relations antonymy, hypernymy and synonymy.
The German database was collected by Silke Scheible and Sabine Schulte im Walde. It consists of three parts:
- A representative selection of target lexical units drawn from GermaNet, using a principled sampling technique and taking into account the three major word classes adjectives, nouns, and verbs, which are balanced according to semantic category, polysemy, and type frequency.
- A set of 8,910 human-generated semantically related word pairs, based on the target lexical units.
- A subset of 1,684 semantically related word pairs, rated for the strength of the relation.
The English database was collected by Giulia Benotto and Alessandro Lenci (generation experiment), and Gabriella Lapesa (rating experiment).
The datasets
- focus on multiple paradigmatic relations;
- systematically work across word classes;
- explicitly balance the targets according to semantic category, polysemy and type frequency;
- explicitly provide positive and negative rating evidence.
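The balancing of targets by semantic category, polysemy and type frequency can be pictured as stratified sampling. The following is only a minimal sketch under stated assumptions, not the authors' actual procedure: the field names (`category`, `polysemy_band`, `frequency_band`), the band labels, the toy candidates, and the `per_stratum` parameter are all hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(candidates, per_stratum, seed=0):
    """Draw up to `per_stratum` items from every
    (category, polysemy_band, frequency_band) cell."""
    rng = random.Random(seed)

    # Group candidates by stratum (hypothetical field names).
    strata = defaultdict(list)
    for item in candidates:
        key = (item["category"], item["polysemy_band"], item["frequency_band"])
        strata[key].append(item)

    # Sample the same number of items from each stratum, so that
    # no category, polysemy band or frequency band dominates.
    sample = []
    for key in sorted(strata):
        pool = strata[key]
        k = min(per_stratum, len(pool))
        sample.extend(rng.sample(pool, k))
    return sample


# Toy candidate list: 2 categories x 2 polysemy bands x 3 frequency
# bands, with 5 invented lemmas per cell (60 candidates in total).
cells = [
    (c, p, f)
    for c in ("motion", "emotion")
    for p in ("mono", "poly")
    for f in ("low", "mid", "high")
    for _ in range(5)
]
candidates = [
    {"lemma": f"w{i}", "category": c, "polysemy_band": p, "frequency_band": f}
    for i, (c, p, f) in enumerate(cells)
]

targets = stratified_sample(candidates, per_stratum=2)
```

Drawing a fixed number of items per cell keeps the selection balanced across all three criteria at once, at the cost of under-filling any cell that has fewer candidates than requested.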
See here on how to obtain the data.
References:
Gabriella Lapesa, Sabine Schulte im Walde, Stefan Evert (2014)
Judging Paradigmatic Relations: A New Collection of English Ratings
In: Proceedings of the 20th Architectures and Mechanisms for Language Processing Conference (AMLaP).
Edinburgh, Scotland, UK.
Silke Scheible, Sabine Schulte im Walde (2014)
A Database of Paradigmatic Semantic Relation Pairs for German Nouns, Verbs, and Adjectives
In: Proceedings of the COLING Workshop on Lexical and Grammatical Resources for Language Processing (LG-LP). Dublin, Ireland.
Sabine Schulte im Walde (2020)
Distinguishing between Paradigmatic Semantic Relations across Word Classes: Human Ratings and Distributional Similarity
Journal of Language Modelling 8(1):53-101.