BilderNetle - A Dataset of German Noun-to-ImageNet Mappings
BilderNetle ("little ImageNet" in Swabian German) is a dataset of German noun-to-ImageNet synset mappings. ImageNet is a large-scale, widely used image database built on top of WordNet; it maps words to groups of images, called synsets (Deng et al., 2009). A word with several meanings has a separate synset for each meaning. For example, ImageNet contains two different synsets for the word mouse: one contains images of the animal, while the other contains images of the computer peripheral. The BilderNetle dataset provides mappings from German noun types to images of the nouns via ImageNet.
Starting with a set of noun compounds and their nominal
constituents (von der Heide and Borgwaldt, 2009), five native German
speakers and one native English speaker (including the authors of the
paper cited below) worked together to map German nouns to ImageNet synsets. With
the assistance of a German-English dictionary, the participants
annotated each word with all its possible meanings. After discussing
the annotations with the German speakers, the English speaker manually
mapped the word meanings to synset senses in ImageNet. Finally, the
German speakers reviewed samples of the images for each word to ensure
the pictures accurately reflected the noun in question. Not all
words or meanings were mapped to ImageNet, because a number of
words have no entry in ImageNet, but the resulting dataset contains
a considerable amount of polysemy. The final dataset contains 2,022
word-synset mappings for 309 words, or about 6.5 synsets per word on
average. After cropping images to their bounding boxes where ImageNet
provides them (and using the entire image where it does not), the dataset
contains 1,305,602 images.
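The structure described above can be illustrated with a minimal Python sketch. It is not the authors' code and does not reflect the dataset's actual file format, which is not specified here. The example words, the placeholder synset IDs, and the function names `group_by_word`, `polysemous_words`, and `crop_regions` are all assumptions made for illustration. An image is modeled as a plain list of pixel rows so that the sketch needs no image library.

```python
from collections import defaultdict

# Hypothetical (noun, synset ID) pairs. The words are illustrative and the IDs
# are placeholders, not real ImageNet synset identifiers.
MAPPINGS = [
    ("Maus", "n00000001"),  # the animal
    ("Maus", "n00000002"),  # the computer peripheral
    ("Hund", "n00000003"),
]


def group_by_word(pairs):
    """Collect all synset IDs mapped to each noun, preserving input order."""
    grouped = defaultdict(list)
    for word, synset in pairs:
        grouped[word].append(synset)
    return dict(grouped)


def polysemous_words(grouped):
    """Return, in sorted order, the nouns mapped to more than one synset."""
    return sorted(word for word, synsets in grouped.items() if len(synsets) > 1)


def crop_regions(image, boxes):
    """Return one sub-image per bounding box, or the whole image if none exist.

    `image` is a list of pixel rows. Each box is (left, top, right, bottom),
    with the right and bottom edges exclusive (an assumed convention).
    """
    if not boxes:
        return [image]
    return [
        [row[left:right] for row in image[top:bottom]]
        for (left, top, right, bottom) in boxes
    ]
```

Under these assumptions, a noun with two meanings appears twice in the mapping list, and an image with several bounding boxes contributes several cropped regions, which is one way the image count can exceed the number of source images.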
Reference:
Stephen Roller, Sabine Schulte im Walde (2013)
A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities
In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Seattle, WA.