German Verb Subcategorisation Database extracted from MATE Dependency Parses
Using the SubCat-Extractor, we induced verb subcategorisation information from German MATE dependency parses. The resulting subcategorisation database uses a compact but linguistically detailed and flexible format with one line per clause, where each line combines verb information, complement information and sentence information.
So far, we have applied the SubCat-Extractor to two corpora: the German web corpus SdeWaC (Faaß and Eckart, 2013), which contains approx. 880 million words, and a Wikipedia dump from April 10, 2011, which contains approx. 430 million words.
The data are freely available for education, research and other purposes. See here for details on how to obtain the data.
Silke Scheible, Sabine Schulte im Walde, Marion Weller, Max Kisselew
A Compact but Linguistically Detailed Database for German Verb Subcategorisation relying on Dependency Parses from a Web Corpus: Tool, Guidelines and Resource
In: Proceedings of the 8th Web as Corpus Workshop. Lancaster, UK, July 2013.