

ComposiGen — Noun Compounds with Compositionality and Abstractness Ratings, plus Generated Images

We created two datasets of English noun compounds in which the target compounds and their constituent nouns are annotated with both compositionality and concreteness ratings. Each item is accompanied by automatically generated LLM-based noun definitions and by images obtained by feeding these definitions to text-to-image diffusion models. The images are used in compositionality prediction approaches.
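
As a rough illustration of how images can feed into compositionality prediction, the sketch below compares a feature vector for a compound's image with feature vectors for its constituents' images, and treats higher cosine similarity as a sign of higher compositionality. This is a common similarity-based setup, assumed here for illustration only; it is not necessarily the method used in the papers listed below. The function names and the toy vectors are hypothetical, and in practice the vectors would come from an image encoder applied to the generated images.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_compositionality(compound_vec, modifier_vec, head_vec):
    """Return (modifier score, head score, compound score).

    The compound score is the mean of the two constituent scores,
    which is one simple way to combine them (an assumption).
    """
    mod_score = cosine(compound_vec, modifier_vec)
    head_score = cosine(compound_vec, head_vec)
    return mod_score, head_score, (mod_score + head_score) / 2


# Toy, made-up image feature vectors (hypothetical, not from the dataset).
toy_features = {
    "compound": [0.9, 0.1, 0.3],
    "modifier": [0.1, 0.9, 0.2],
    "head": [0.8, 0.2, 0.4],
}

mod_score, head_score, compound_score = predict_compositionality(
    toy_features["compound"],
    toy_features["modifier"],
    toy_features["head"],
)
print(f"modifier: {mod_score:.3f}, head: {head_score:.3f}, compound: {compound_score:.3f}")
```

Predicted scores of this kind are typically compared against the human compositionality ratings with a rank correlation such as Spearman's rho.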


References:

Jule Godbersen, Sinan Cem Kurtyigit, Emma Raimundo Schulz, Tonmoy Rakshit, Diego Frassinelli, Sabine Schulte im Walde, Carina Silberer (2026)
Fruitcakes and Cupcakes Emerging from Noise: The ComposiGen Dataset of Compounds and their Compositionality
In: Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC). Palma de Mallorca, Spain.

Sinan Kurtyigit, Diego Frassinelli, Carina Silberer, Sabine Schulte im Walde (2025)
A Couch Potato is not a Potato on a Couch: Prompting Strategies, Image Generation, and Compositionality Prediction for Noun Compounds
In: Findings of the Association for Computational Linguistics: ACL. Vienna, Austria.



Marc Brysbaert, Amy Beth Warriner, Victor Kuperman (2014)
Concreteness Ratings for 40 Thousand Generally Known English Word Lemmas
Behavior Research Methods 46:904-911.

Siva Reddy, Diana McCarthy, Suresh Manandhar (2011)
An Empirical Study on Compositionality in Compound Nouns
In: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP). Chiang Mai, Thailand.