

BilderNetle - A Dataset of German Noun-to-ImageNet Mappings

BilderNetle ("little ImageNet" in Swabian German) is a dataset of German noun-to-ImageNet synset mappings. ImageNet is a large-scale, widely used image database built on top of WordNet; it associates words with groups of images, called synsets (Deng et al., 2009). A word with several meanings has a separate synset for each meaning. For example, ImageNet contains two different synsets for the word mouse: one contains images of the animal, while the other contains images of the computer peripheral. BilderNetle provides mappings from German noun types to images of the nouns via ImageNet.
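The one-word, several-synsets relationship described above can be pictured as a small lookup table. The following is only a minimal sketch: the words, the synset labels, and the function names are hypothetical placeholders chosen for illustration, not real ImageNet synset IDs and not the file format in which BilderNetle is distributed.

```python
from collections import defaultdict

# Hypothetical (word, synset) pairs. The labels are placeholders, not real
# ImageNet synset IDs and not entries taken from BilderNetle.
PAIRS = [
    ("Maus", "mouse-animal"),
    ("Maus", "mouse-computer-peripheral"),
    ("Hund", "dog-animal"),
]


def build_mapping(pairs):
    """Group synsets by word, keeping first-seen order and dropping duplicates."""
    mapping = defaultdict(list)
    for word, synset in pairs:
        if synset not in mapping[word]:
            mapping[word].append(synset)
    return dict(mapping)


def polysemous_words(mapping):
    """Return the words that map to more than one synset, sorted."""
    return sorted(word for word, synsets in mapping.items() if len(synsets) > 1)


mapping = build_mapping(PAIRS)
print(mapping["Maus"])
print(polysemous_words(mapping))
```

A word with a single listed meaning simply has a one-element list, so the same structure covers both ambiguous and unambiguous nouns.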

Starting with a set of noun compounds and their nominal constituents (von der Heide and Borgwaldt, 2009), five native German speakers and one native English speaker (including the authors of this paper) worked together to map German nouns to ImageNet synsets. With the assistance of a German-English dictionary, the participants annotated each word with all its possible meanings. After discussing the annotations with the German speakers, the English speaker manually mapped the word meanings to synset senses in ImageNet. Finally, the German speakers reviewed samples of the images for each word to ensure that the pictures accurately reflected the noun in question. Not all words or meanings could be mapped, because a number of words have no entries in ImageNet, but the resulting dataset still contains a considerable amount of polysemy: it comprises 2,022 word-synset mappings for 309 words, an average of about 6.5 mappings per word. After extracting sections of images using bounding boxes where ImageNet provides them (and using the entire image where it does not), the dataset contains 1,305,602 images.
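The bounding-box fallback described above can be sketched as follows. This is a minimal illustration, not the authors' extraction code. The function name, the (xmin, ymin, xmax, ymax) box layout, the clamping to the image borders, and the choice to produce one region per box are all assumptions made for the example.

```python
def crop_regions(width, height, boxes=None):
    """Return the (left, top, right, bottom) regions to extract from one image.

    Assumptions (not taken from the dataset description): boxes are
    (xmin, ymin, xmax, ymax) pixel tuples, each valid box yields one region,
    and boxes are clamped to the image borders.
    """
    regions = []
    for xmin, ymin, xmax, ymax in boxes or []:
        left, top = max(0, xmin), max(0, ymin)
        right, bottom = min(width, xmax), min(height, ymax)
        # Skip boxes that are empty after clamping.
        if right > left and bottom > top:
            regions.append((left, top, right, bottom))
    # No usable bounding box: fall back to the entire image.
    return regions or [(0, 0, width, height)]


print(crop_regions(100, 80))
print(crop_regions(100, 80, [(10, 10, 50, 40), (60, -5, 120, 30)]))
```

The returned tuples use the left, top, right, bottom order that common image libraries accept for cropping, so they could be passed on to whichever library performs the actual extraction.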


Reference:

Stephen Roller, Sabine Schulte im Walde (2013)
A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities
In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Seattle, WA.