Dataset of Sentence Generation for German Particle Verb Neologisms

For this collection, participants generated sentences with attested particle verbs (PVs) and with not yet attested formations, which we call systematic neologisms of German particle verbs (neoPVs). We consider a PV a neoPV if it was neither listed in the Duden online dictionary nor attested in the German web corpus SdeWaC (Faaß and Eckart, 2013).

The data comprise a total of 125 PVs. Five particles (ab, an, auf, aus, nach) were combined with base verbs (BVs) from five (not necessarily disjoint) semantic verb classes: (1) DE-ADJECTIVAL, e.g. kürzen "shorten", (2) ACHIEVEMENT/ACCOMPLISHMENT, e.g. finden "find", (3) PHYSICAL PROCESS, e.g. stricken "knit", (4) MENTAL PROCESS, e.g. denken "think", and (5) STATE, e.g. lieben "love". The chosen BVs were balanced for their corpus frequencies in the SdeWaC.
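The crossing of particles and base verbs can be illustrated with a minimal sketch. It uses only the five example verbs named above, so it yields 25 candidate PVs rather than the full 125; the complete verb list is not given here. The function name build_candidates and the record fields are hypothetical and chosen for illustration only. Assuming every base verb was combined with every particle, 125 PVs would correspond to 25 base verbs, but that is an inference and not stated in the description.

```python
from itertools import product

# The five particles named in the description.
PARTICLES = ["ab", "an", "auf", "aus", "nach"]

# One example base verb per semantic class, taken from the description.
# The full list of base verbs is NOT given there; this is only a subset.
EXAMPLE_BASE_VERBS = {
    "DE-ADJECTIVAL": "kürzen",
    "ACHIEVEMENT/ACCOMPLISHMENT": "finden",
    "PHYSICAL PROCESS": "stricken",
    "MENTAL PROCESS": "denken",
    "STATE": "lieben",
}


def build_candidates(particles, base_verbs_by_class):
    """Cross every particle with every base verb (hypothetical helper).

    The infinitive of a German particle verb is written as one word,
    so the candidate form is simply particle + base verb.
    """
    return [
        {"particle": p, "verb_class": cls, "base_verb": bv, "pv": p + bv}
        for p, (cls, bv) in product(particles, base_verbs_by_class.items())
    ]


candidates = build_candidates(PARTICLES, EXAMPLE_BASE_VERBS)
print(len(candidates))        # 5 particles x 5 example verbs
print(candidates[0]["pv"])    # first combination: "ab" + "kürzen"
```

Whether a given candidate counts as an attested PV or a neoPV would then be decided separately, by looking it up in the dictionary and the corpus.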

The participants in the study were presented with a PV and two tasks. First, they were asked to rate on a scale of 0-3 whether the PV was known or unknown to them. Then they were asked to generate at least one sentence using the PV, such that the sentence illustrated the verb meaning. After the generation task, the participants could mark a checkbox if they felt it was difficult to generate a sentence for the particular PV.
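A single participant response, as described above, could be represented roughly as follows. This is a sketch under assumptions: the class name Response, its field names, and the example sentence are hypothetical and do not reflect the actual file format of the released data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Response:
    """One participant's answer for one PV (hypothetical layout)."""

    pv: str                  # the particle verb shown to the participant
    familiarity: int         # known/unknown rating on the 0-3 scale
    sentences: List[str]     # at least one generated sentence
    difficult: bool = False  # optional "difficult to generate" checkbox

    def __post_init__(self):
        # The rating must lie on the 0-3 scale.
        if not (isinstance(self.familiarity, int) and 0 <= self.familiarity <= 3):
            raise ValueError("familiarity must be an integer from 0 to 3")
        # Participants had to provide at least one (non-empty) sentence.
        if not any(s.strip() for s in self.sentences):
            raise ValueError("at least one sentence is required")


# Illustrative record; the sentence is invented, not taken from the data.
example = Response(
    pv="abkürzen",
    familiarity=3,
    sentences=["Wir kürzen den Weg durch den Park ab."],
)
print(example.difficult)
```

Validating the rating range and the sentence count at construction time keeps malformed records from entering any later analysis.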

See here for information on how to obtain the data.


Sylvia Springorum, Sabine Schulte im Walde, Antje Roßdeutscher (2013)
Sentence Generation and Compositionality of Systematic Neologisms of German Particle Verbs
In: Proceedings of the 5th Conference on Quantitative Investigations in Theoretical Linguistics (QITL). Leuven, Belgium.