German Verb Subcategorisation Database extracted from MATE Dependency Parses
Based on the SubCat-Extractor, we induced verb subcategorisation information from German MATE dependency parses. The subcategorisation database is represented in a compact but linguistically detailed and flexible format, comprising various aspects of verb information, complement information and sentence information, within a one-line-per-clause style. Here are a few examples from the paper:
As a natural and immediately subsequent step, we induced a
subcategorisation frame lexicon from the verb data. Taking voice into
account, we summed over the various complement combinations a verb
lemma appeared with. For example, among the most frequent
subcategorisation frames for the verb glauben ‘believe’ are a
subcategorised clause ‘believe that’ (freq: 52,710), a subcategorised
prepositional phrase with preposition anacc ‘believe in’
(freq: 4,596) and an indirect object ‘trust s.o.’ (freq: 2,514). In
addition, we took the actual complement heads into account. For
example, among the most frequent combinations of heads that are
subjects and indirect objects of glauben are So far, we have applied the SubCat-Extractor to the German web corpus
SdeWaC
(Faaß and Eckart, 2013), which contains approx. 880 million words, and
a Wikipedia dump from April 10, 2011, containing approx. 430 million
words.
See here
on how to obtain the data.
Reference:
Silke Scheible, Sabine Schulte im Walde, Marion Weller, Max Kisselew (2013)
A Compact but Linguistically Detailed Database for German Verb Subcategorisation relying on Dependency Parses from a Web Corpus: Tool, Guidelines and Resource
In: Proceedings of the 8th Web as Corpus Workshop. Lancaster, UK.