Prof. Dr. Sabine Schulte im Walde

ComposiGen — Noun Compounds with Compositionality and Abstractness Ratings, plus Generated Images

We created two datasets of English noun compounds in which the target compounds and their constituent nouns are annotated with both compositionality and concreteness ratings. The nouns are accompanied by automatically generated LLM-based definitions and by images obtained by using these definitions as input to text-to-image diffusion models. The images are used in compositionality prediction approaches.

In Kurtyigit et al. (2025), we relied on compound-constituent compositionality ratings for 88 compounds from Reddy et al. (2011), and additionally annotated the compounds with human concreteness ratings on a scale. The resource can be downloaded from here.
In Godbersen et al. (2026), we extracted noun compounds from a web corpus and selected a total of 200 compounds according to the combination of the concreteness ratings of their two constituents, relying on Brysbaert et al. (2014). To control the selection, we focused on targets that share a head constituent (e.g., cup cake, potato cake, wedding cake). Furthermore, we collected human compound-constituent compositionality ratings. The resource can be downloaded from here.
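The selection idea described above (combining the concreteness of the two constituents and keeping targets that share a head) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used in the papers: the function names, the binary concrete/abstract split, the threshold of 3.0 and the example ratings are all hypothetical. Only the 1-to-5 rating scale follows Brysbaert et al. (2014).

```python
from collections import defaultdict

# Hypothetical ratings on a 1 (abstract) to 5 (concrete) scale.
# The numbers are made up for illustration, not taken from any dataset.
EXAMPLE_RATINGS = {
    "cup": 5.0, "potato": 4.9, "wedding": 4.0,
    "cake": 4.9, "memory": 2.0, "lane": 4.5,
}

# Compounds as (modifier, head) pairs.
EXAMPLE_COMPOUNDS = [
    ("cup", "cake"), ("potato", "cake"), ("wedding", "cake"),
    ("memory", "lane"),
]


def concreteness_bin(rating, threshold=3.0):
    """Map a rating to a coarse label (the threshold is an assumption)."""
    return "concrete" if rating >= threshold else "abstract"


def group_by_head(compounds, ratings):
    """Group compounds by head and record each constituent combination."""
    groups = defaultdict(list)
    for modifier, head in compounds:
        # Skip compounds for which a constituent has no rating.
        if modifier not in ratings or head not in ratings:
            continue
        combo = (concreteness_bin(ratings[modifier]),
                 concreteness_bin(ratings[head]))
        groups[head].append((modifier, combo))
    return dict(groups)


def heads_with_min_targets(groups, min_targets=2):
    """Keep only heads shared by at least `min_targets` compounds."""
    return {head: targets for head, targets in groups.items()
            if len(targets) >= min_targets}


if __name__ == "__main__":
    groups = group_by_head(EXAMPLE_COMPOUNDS, EXAMPLE_RATINGS)
    for head, targets in heads_with_min_targets(groups).items():
        print(head, targets)
```

With the toy data, only the head "cake" is kept, since "lane" occurs in a single compound. A real selection would presumably also balance the number of compounds per concreteness combination.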

References:

Jule Godbersen, Sinan Cem Kurtyigit, Emma Raimundo Schulz, Tonmoy Rakshit, Diego Frassinelli, Sabine Schulte im Walde, Carina Silberer (2026)
Fruitcakes and Cupcakes Emerging from Noise: The ComposiGen Dataset of Compounds and their Compositionality
In: Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC). Palma de Mallorca, Spain.

Sinan Kurtyigit, Diego Frassinelli, Carina Silberer, Sabine Schulte im Walde (2025)
A Couch Potato is not a Potato on a Couch: Prompting Strategies, Image Generation, and Compositionality Prediction for Noun Compounds
In: Findings of the Association for Computational Linguistics: ACL. Vienna, Austria.

Marc Brysbaert, Amy Beth Warriner, Victor Kuperman (2014)
Concreteness Ratings for 40 Thousand Generally Known English Word Lemmas
Behavior Research Methods 46:904-911.

Siva Reddy, Diana McCarthy, Suresh Manandhar (2011)
An Empirical Study on Compositionality in Compound Nouns
In: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP). Chiang Mai, Thailand.
