German Noun Chunker

The noun chunker for German was developed by Helmut Schmid and Sabine Schulte im Walde. It is based on a German Head-Lexicalised Probabilistic Context-Free Grammar. The manually developed grammar was semi-automatically extended with robustness rules to allow parsing of unrestricted text. The model parameters were learned from unlabelled training data using the probabilistic parser LoPar.


Helmut Schmid, Sabine Schulte im Walde (2000)
Robust German Noun Chunking with a Probabilistic Context-Free Grammar
In: Proceedings of the 18th International Conference on Computational Linguistics. Saarbrücken, Germany.

Predicate Argument Clustering (PAC)

The clustering approach was developed within the SFB 732 project D4 (Modular Lexicalization of Probabilistic Context-Free Grammars). PAC provides a cluster analysis for verb-frame-argument tuples with varying numbers of arguments, and generalises over the arguments by means of WordNet-based selectional preferences.

You can download the software here.


Sabine Schulte im Walde, Christian Hying, Christian Scheible, Helmut Schmid (2008)
Combining EM Training and the MDL Principle for an Automatic Verb Classification incorporating Selectional Preferences
In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics. Columbus, OH.

Sabine Schulte im Walde, Helmut Schmid, Wiebke Wagner, Christian Hying, Christian Scheible (2010)
A Clustering Approach to Automatic Verb Classification incorporating Selectional Preferences: Model, Implementation, and User Manual
SinSpeC Working Papers of the SFB 732 "Incremental Specification in Context", Volume 7.

SubCat-Extractor - Induction of Verb Subcategorisation from Dependency Parses

The SubCat-Extractor is a tool for obtaining verb subcategorisation data from parsed German corpora. It is based on a set of detailed rules that go beyond what is directly accessible in the parses. The extracted subcategorisation database is represented in a compact but linguistically detailed and flexible format, comprising verb information, complement information and sentence information, in a one-line-per-clause style.

The SubCat-Extractor requires as input parsed text produced by Bernd Bohnet's MATE dependency parser (Bohnet, 2010). The parses must be in the tab-separated CoNLL format. The extraction rules are specified for part-of-speech tags from the STTS tagset (Schiller et al., 1999) and syntactic functions from TIGER (Brants et al., 2004).
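To make the input format more concrete, the following Python sketch reads tab-separated CoNLL-style parses and lists, for each verb, the syntactic functions of its direct dependents. It is only an illustration and not the SubCat-Extractor's rule set, which is considerably more detailed. The column positions assume the CoNLL-2009 layout with the predicted lemma, part-of-speech tag, head and dependency relation in columns 4, 6, 10 and 12; the function names and the sample sentence are hypothetical and hand-made.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed 0-based column positions (CoNLL-2009 layout, predicted columns).
# Adjust these if your parser output uses a different layout.
ID, FORM, LEMMA, POS, HEAD, DEPREL = 0, 1, 3, 5, 9, 11

Token = List[str]


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of column lists; blank lines separate sentences."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def verb_frames(sentence: List[Token]) -> List[Tuple[str, List[str]]]:
    """For every token with an STTS verb tag (all start with 'V'), collect the
    sorted function labels of the tokens that have it as their head."""
    frames = []
    for tok in sentence:
        if tok[POS].startswith("V"):
            deps = sorted(d[DEPREL] for d in sentence if d[HEAD] == tok[ID])
            frames.append((tok[LEMMA], deps))
    return frames


def row(idx: int, form: str, lemma: str, pos: str, head: int, rel: str) -> str:
    """Build one 12-column line for the hand-made sample (unused columns are '_')."""
    cols = ["_"] * 12
    cols[ID], cols[FORM], cols[LEMMA] = str(idx), form, lemma
    cols[POS], cols[HEAD], cols[DEPREL] = pos, str(head), rel
    return "\t".join(cols)


# Hand-made sample: "Der Hund jagt die Katze ." with STTS tags and TIGER labels.
SAMPLE = "\n".join([
    row(1, "Der", "der", "ART", 2, "NK"),
    row(2, "Hund", "Hund", "NN", 3, "SB"),
    row(3, "jagt", "jagen", "VVFIN", 0, "--"),
    row(4, "die", "die", "ART", 5, "NK"),
    row(5, "Katze", "Katze", "NN", 3, "OA"),
    row(6, ".", "--", "$.", 5, "--"),
])

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE.splitlines()):
        for lemma, functions in verb_frames(sent):
            print(lemma, " ".join(functions))
```

In the sample, the verb "jagen" is listed with a subject (SB) and an accusative object (OA). A real extraction would also need to handle auxiliaries, verb particles, subordinate clauses and similar phenomena, which is what the tool's rules address.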

See here for how to obtain the tool.


Silke Scheible, Sabine Schulte im Walde, Marion Weller, Max Kisselew (2013)
A Compact but Linguistically Detailed Database for German Verb Subcategorisation relying on Dependency Parses from a Web Corpus: Tool, Guidelines and Resource
In: Proceedings of the 8th Web as Corpus Workshop. Lancaster, UK.