Ratings of (Non-)Literality for German Particle Verbs

Three native speakers of German with a linguistic background annotated 8,128 sentences, covering 165 particle verbs and 10 different particles, with literality ratings. The sentences were randomly extracted (50 sentences per particle verb) from DECOW14AX, a German web corpus containing 12 billion tokens (Schäfer and Bildhauer, 2012; Schäfer, 2015).

The annotation was done on a 6-point scale [0,5], ranging from clearly literal (0) to clearly non-literal (5) usage. The total agreement of the annotators on all six categories was 43%, with Fleiss' kappa = 0.35. When the scale is divided into two disjoint ranges of three categories each ([0,2] and [3,5]), the total agreement of the annotators on the two resulting categories was 79%, with Fleiss' kappa = 0.70.
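
As an illustration of how agreement figures of this kind can be computed, the following is a minimal sketch in Python. It is not the authors' evaluation script. It assumes that "total agreement" means all annotators chose the same category for a sentence, and the function names (fleiss_kappa, total_agreement, binarize) and the toy ratings are hypothetical and are not taken from the data set.

```python
from collections import Counter


def total_agreement(ratings):
    """Share of items on which all annotators chose the same category."""
    return sum(len(set(item)) == 1 for item in ratings) / len(ratings)


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each a list of labels.

    Every item must be rated by the same number of annotators.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()   # how often each category was assigned overall
    p_bar_sum = 0.0      # sum of per-item observed agreement
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        p_bar_sum += agreeing_pairs / (n_raters * (n_raters - 1))
    p_bar = p_bar_sum / n_items
    # Expected agreement from the overall category proportions.
    p_e = sum((totals[c] / (n_items * n_raters)) ** 2 for c in categories)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined if only one category is used
    return (p_bar - p_e) / (1 - p_e)


def binarize(rating):
    """Map the 6-point scale to two ranges: [0,2] -> 0, [3,5] -> 1."""
    return 0 if rating <= 2 else 1


# Hypothetical toy ratings: one inner list per sentence, one value per annotator.
toy = [[0, 1, 2], [3, 4, 5], [0, 0, 0], [5, 5, 4]]
toy_binary = [[binarize(r) for r in item] for item in toy]

six_way = (total_agreement(toy), fleiss_kappa(toy, range(6)))
two_way = (total_agreement(toy_binary), fleiss_kappa(toy_binary, range(2)))
```

On these toy ratings the annotators rarely match exactly on the 6-point scale but always fall into the same half of it, which mirrors the direction of the difference between the six-category and two-category figures reported above.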

The data are freely available for education, research and other non-commercial purposes. See here for how to obtain the data.


Maximilian Köper, Sabine Schulte im Walde
Distinguishing Literal and Non-Literal Usage of German Particle Verbs
In: Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). San Diego, CA, June 2016.