GhoSt-NN - A Representative Gold Standard of German Noun-Noun Compounds
GhoSt-NN is a gold standard of 868 German noun-noun compounds, annotated with corpus frequencies of the compounds and their constituents, productivity and ambiguity of the constituents, semantic relations between the constituents, and compositionality ratings of compound-constituent pairs. In addition, a subset of 180 compounds is balanced for the productivity of the modifiers (low, mid or high productivity) and the ambiguity of the heads (heads with 1, 2 or more than 2 senses).
The resource comprises three parts:
- a set of 154,960 noun-noun candidate compounds and their constituents, accompanied by corpus frequency, productivity and degree of ambiguity;
- the final gold standard GhoSt-NN of 868 noun-noun compounds and their constituents, accompanied by corpus frequency, productivity and ambiguity, and annotated with semantic relations and compositionality ratings;
- the carefully balanced GhoSt-NN subsets of 20x9 (180) and 5x9 (45) compounds and their constituents, categorised according to 9 criteria combinations of modifier productivity and head ambiguity.
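The layout of the balanced subsets can be illustrated with a minimal Python sketch. It assumes only what is stated above: three modifier productivity classes crossed with three head ambiguity classes, with 20 or 5 compounds per combination. The function and constant names are hypothetical and do not reflect the format of the distributed files; the frequency thresholds that separate low, mid and high productivity are not given here and are therefore not modelled.

```python
from itertools import product

# Class labels taken from the description above; the names are illustrative.
MODIFIER_PRODUCTIVITY = ("low", "mid", "high")
HEAD_AMBIGUITY = ("1", "2", ">2")


def criteria_combinations():
    """Return all (modifier productivity, head ambiguity) combinations."""
    return list(product(MODIFIER_PRODUCTIVITY, HEAD_AMBIGUITY))


def subset_size(compounds_per_combination):
    """Size of a balanced subset with a fixed number of compounds per combination."""
    return compounds_per_combination * len(criteria_combinations())


def ambiguity_class(n_senses):
    """Map a head's number of senses to its ambiguity class (1, 2 or >2)."""
    if n_senses < 1:
        raise ValueError("a head must have at least one sense")
    if n_senses == 1:
        return "1"
    if n_senses == 2:
        return "2"
    return ">2"


if __name__ == "__main__":
    for modifier_class, head_class in criteria_combinations():
        print(f"modifier productivity={modifier_class}, head senses={head_class}")
    print(subset_size(20), subset_size(5))
```

Crossing the two three-way distinctions yields the 9 combinations, so 20 compounds per combination give the 180-compound subset and 5 per combination give the 45-compound subset.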
The data are freely available for education, research and other purposes. See here on how to obtain the data.
Sabine Schulte im Walde, Anna Hätty, Stefan Bott, Nana Khvtisavrishvili
GhoSt-NN: A Representative Gold Standard of German Noun-Noun Compounds
In: Proceedings of the 10th Conference on Language Resources and Evaluation (LREC). Portorož, Slovenia, May 2016.