Ratings of (Non-)Literal Language Usage for Particle Verbs
Across projects and languages, we collected ratings of (non-)literal language usage for particle verbs (PVs). In the following, we describe the datasets; see the individual resource links for how to obtain the data.
- GERMAN-NLIT (Köper and Schulte im Walde, 2016):
Three German native speakers with a linguistic background annotated 8,128 sentences across 165 particle verbs and 10 different particles with literality ratings. The sentences were randomly extracted (50 sentences per particle verb) from DECOW14AX, a German web corpus containing 12 billion tokens (Schäfer, 2015).
The annotation was done on a 6-point scale [0,5], ranging from clearly literal (0) to clearly non-literal (5) usage. The total agreement of the annotators on all six categories was 43%, with Fleiss' kappa=0.35. When the scale was divided into two disjunctive ranges with three categories each ([0,2] and [3,5]), the total agreement of the annotators on the two categories was 79%, with Fleiss' kappa=0.70. After discarding all cases of disagreement, the final dataset comprises 6,436 sentences: 4,174 literal and 2,262 non-literal uses across 159 particle verbs and 10 particles.
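For readers who want to reproduce this kind of two-step agreement analysis (full scale, then binarized), the following is a minimal Python sketch. It assumes a sentences × annotators matrix of ratings; the `ratings` values below are invented for illustration, and Fleiss' kappa is taken from statsmodels.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical ratings: one row per sentence, one column per annotator,
# each cell a literality rating on the 6-point scale [0,5].
ratings = np.array([
    [0, 1, 0],   # clearly literal
    [5, 4, 5],   # clearly non-literal
    [2, 3, 3],   # borderline case
    [1, 1, 1],
])

def total_agreement(r):
    """Share of sentences on which all annotators chose the same category."""
    return np.mean([len(set(row)) == 1 for row in r])

def kappa(r):
    """Fleiss' kappa over a sentences x annotators matrix of labels."""
    table, _ = aggregate_raters(r)  # sentences x categories count table
    return fleiss_kappa(table)

# Agreement on the full 6-point scale.
print(total_agreement(ratings), kappa(ratings))

# Agreement after collapsing the scale into the two disjunctive ranges:
# [0,2] -> 0 (literal), [3,5] -> 1 (non-literal).
binary = (ratings >= 3).astype(int)
print(total_agreement(binary), kappa(binary))
```

Here, `aggregate_raters` turns the raw label matrix into the per-category count table that `fleiss_kappa` expects.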
- ESTONIAN-NLIT (Aedmaa et al., 2018):
We selected 210 PVs across 34 particles: we started with a list of 1,676 PVs that occurred at least once in a 170-million-token newspaper subcorpus of the Estonian Reference Corpus (ERC) and removed all PVs with a frequency ≤ 9. We then sorted the remaining PVs by frequency and selected PVs across different frequency ranges for the dataset. In addition, we included the 20 most frequent PVs.
For each of the 210 target PVs, we then automatically extracted 16 sentences from the ERC. The sentences were manually double-checked to make sure that the verb and the adverb formed a PV rather than appearing as independent word units in a clause.
The resulting set of sentences was evaluated by three annotators with a linguistic background on a 6-point scale [0,5], ranging from clearly literal (0) to clearly non-literal (5) usage. The agreement among the three annotators on all six categories was fair (Fleiss' kappa=0.36). A binary distinction based on the average sentence scores into literal (average ≤ 2.4) and non-literal (average ≥ 2.5) resulted in substantial agreement (Fleiss' kappa=0.73). The final dataset includes 1,490 sentences: 1,102 non-literal and 388 literal usages across 184 PVs with 120 different base verbs and 32 particle types.
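The average-based split can be written down as a small helper. This is a sketch under the thresholds stated above (the function and parameter names are ours); note that with three integer ratings on [0,5], the average is always a multiple of 1/3 and therefore never falls strictly between 2.4 and 2.5, so the two conditions cover every sentence.

```python
def binarize_by_average(ratings, literal_max=2.4, nonliteral_min=2.5):
    """Map one sentence's annotator ratings (6-point scale [0,5])
    to a binary label via their average score."""
    avg = sum(ratings) / len(ratings)
    if avg <= literal_max:
        return "literal"
    if avg >= nonliteral_min:
        return "non-literal"
    # Unreachable for three integer ratings (see note above).
    raise ValueError(f"average {avg:.2f} falls between the thresholds")

print(binarize_by_average([0, 1, 2]))  # literal (average 1.0)
print(binarize_by_average([2, 2, 3]))  # literal (average ~2.33)
print(binarize_by_average([2, 3, 3]))  # non-literal (average ~2.67)
```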
- GERMAN-DOMAIN-NLIT (Schulte im Walde et al., 2018):
To create a dataset for assessing meaning components in German particle verbs, which frequently undergo meaning shifts, we collected base verb (BV) and particle verb (PV) sentences from 15 human participants across a specified set of 14 BV source domains and 12 PV target domains.
Three German native speakers annotated the 2,933 BV and 4,487 PV sentences with ratings on a 6-point scale [0,5], ranging from clearly literal (0) to clearly non-literal (5) language. Dividing the scale into two disjunctive ranges ([0,2] and [3,5]) broke the ratings down into binary decisions. The agreement of the annotators was Fleiss' kappa=0.27 on the full scale and kappa=0.47 on the binary decisions.
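Across all three datasets, the verbal labels attached to the kappa values ("fair", "substantial") follow the conventional Landis and Koch (1977) interpretation scale. The following sketch makes that mapping explicit; the function is ours, the bands are the standard ones, and the values are the ones reported above.

```python
def landis_koch(kappa):
    """Interpret a kappa value on the Landis & Koch (1977) scale."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"

# Reported agreement values from the three datasets above.
for name, k in [("GERMAN-NLIT, full scale", 0.35),
                ("GERMAN-NLIT, binary", 0.70),
                ("ESTONIAN-NLIT, full scale", 0.36),
                ("ESTONIAN-NLIT, binary", 0.73),
                ("GERMAN-DOMAIN-NLIT, full scale", 0.27),
                ("GERMAN-DOMAIN-NLIT, binary", 0.47)]:
    print(f"{name}: kappa={k:.2f} -> {landis_koch(k)}")
```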
References:
Eleri Aedmaa, Maximilian Köper, Sabine Schulte im Walde (2018)
Combining Abstractness and Language-specific Theoretical Indicators for Detecting Non-Literal Usage of Estonian Particle Verbs
In: Proceedings of the NAACL 2018 Student Research Workshop (NAACL-SRW). New Orleans, LA.
Maximilian Köper, Sabine Schulte im Walde (2016)
Distinguishing Literal and Non-Literal Usage of German Particle Verbs
In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). San Diego, CA.
Sabine Schulte im Walde, Maximilian Köper, Sylvia Springorum (2018)
Assessing Meaning Components in German Complex Verbs: A Collection of Source-Target Domains and Directionality
In: Proceedings of the 7th Joint Conference on Lexical and Computational Semantics (*SEM). New Orleans, LA.