German Chunker

The chunker for German was developed by Helmut Schmid and Sabine Schulte im Walde. It is based on a German Head-Lexicalised Probabilistic Context-Free Grammar. The manually developed grammar was semi-automatically extended with robustness rules to allow parsing of unrestricted text. The model parameters were learned from unlabelled training data by the probabilistic parser LoPar.


Helmut Schmid, Sabine Schulte im Walde
Robust German Noun Chunking with a Probabilistic Context-Free Grammar [pdf/bib]
In: Proceedings of the 18th International Conference on Computational Linguistics. Saarbrücken, Germany, August 2000.

Predicate Argument Clustering (PAC)

The clustering approach was developed within the SFB-732 project D4 (Modular Lexicalization of Probabilistic Context-Free Grammars). PAC provides a cluster analysis for verb-frame-argument tuples with varying numbers of arguments and generalises over the arguments by means of WordNet-based selectional preferences.

PAC is freely available for education, research and other non-commercial purposes. You can download the software here.


Sabine Schulte im Walde, Christian Hying, Christian Scheible, Helmut Schmid
Combining EM Training and the MDL Principle for an Automatic Verb Classification incorporating Selectional Preferences [pdf/poster/bib]
In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics. Columbus, OH, June 2008.

Sabine Schulte im Walde, Helmut Schmid, Wiebke Wagner, Christian Hying, Christian Scheible
A Clustering Approach to Automatic Verb Classification incorporating Selectional Preferences: Model, Implementation, and User Manual [url/pdf/bib]
SinSpeC: Working Papers of the SFB 732 "Incremental Specification in Context", Volume 7, December 2010.

SubCat-Extractor - Induction of Verb Subcategorisation from Dependency Parses

The SubCat-Extractor is a tool to obtain verb subcategorisation data from parsed German corpora. It is based on a set of detailed rules that go beyond what is directly accessible in the parses. The extracted subcategorisation database is represented in a compact but linguistically detailed and flexible format that covers verb information, complement information and sentence information in a one-line-per-clause style.

The input format required by the SubCat-Extractor is parsed text produced by Bernd Bohnet's MATE dependency parser (Bohnet, 2010). The parses follow the tab-separated CoNLL format. The extraction rules are specified for part-of-speech tags from the STTS tagset (Schiller et al., 1999) and syntactic functions from TIGER (Brants et al., 2004).
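To make the input format more concrete, the following Python sketch reads tab-separated CoNLL parses and lists, for each verb, the syntactic functions of its direct dependents. It is only an illustration and not the SubCat-Extractor's rule set, which goes well beyond direct dependents. The column positions assume the CoNLL-2009 layout with predicted lemma, tag, head and function columns; the helper names (`read_sentences`, `verb_frames`, `make_row`) and the sample sentence are hypothetical and hand-built, not actual parser output.

```python
from collections import defaultdict

# ASSUMPTION: CoNLL-2009 column layout, 0-based indices of the predicted columns.
# Adjust these if your parser output uses a different layout.
ID, FORM, PLEMMA, PPOS, PHEAD, PDEPREL = 0, 1, 3, 5, 9, 11


def read_sentences(lines):
    """Yield sentences as lists of token rows; blank lines separate sentences."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def verb_frames(sentence):
    """Return (lemma, tag, sorted function labels of direct dependents) per verb."""
    dependents = defaultdict(list)
    for row in sentence:
        dependents[row[PHEAD]].append(row[PDEPREL])
    frames = []
    for row in sentence:
        if row[PPOS].startswith("V"):  # STTS verb tags such as VVFIN, VAFIN, VMFIN
            frames.append((row[PLEMMA], row[PPOS], sorted(dependents.get(row[ID], []))))
    return frames


def make_row(idx, form, lemma, pos, head, deprel):
    """Build one 14-column line for the hand-made sample; unused columns are '_'."""
    cols = ["_"] * 14
    cols[ID], cols[FORM], cols[PLEMMA], cols[PPOS] = str(idx), form, lemma, pos
    cols[PHEAD], cols[PDEPREL] = str(head), deprel
    return "\t".join(cols)


# Hand-built sample ("Der Hund jagt die Katze."), labelled with STTS tags and
# TIGER functions for illustration only.
SAMPLE = "\n".join([
    make_row(1, "Der", "der", "ART", 2, "NK"),
    make_row(2, "Hund", "Hund", "NN", 3, "SB"),
    make_row(3, "jagt", "jagen", "VVFIN", 0, "--"),
    make_row(4, "die", "der", "ART", 5, "NK"),
    make_row(5, "Katze", "Katze", "NN", 3, "OA"),
    make_row(6, ".", "--", "$.", 5, "--"),
]) + "\n\n"

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE.splitlines()):
        for lemma, tag, functions in verb_frames(sent):
            print(lemma, tag, functions)
```

For the sample sentence, the verb jagen is paired with a subject (SB) and an accusative object (OA). A real extraction additionally has to handle auxiliaries, verb particles, subordinate clauses and similar phenomena, which this sketch ignores.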

The SubCat-Extractor is freely available for education, research and other non-commercial purposes. See here on how to obtain the tool.


Silke Scheible, Sabine Schulte im Walde, Marion Weller, Max Kisselew
A Compact but Linguistically Detailed Database for German Verb Subcategorisation relying on Dependency Parses from a Web Corpus: Tool, Guidelines and Resource [pdf/bib]
In: Proceedings of the 8th Web as Corpus Workshop. Lancaster, UK, July 2013.