German Noun Chunker

The noun chunker for German was developed by Helmut Schmid and Sabine Schulte im Walde. It is based on a German Head-Lexicalised Probabilistic Context-Free Grammar. The manually developed grammar was semi-automatically extended with robustness rules so that it can parse unrestricted text. The model parameters were trained on unlabelled data using the probabilistic parser LoPar.

Reference:

Helmut Schmid, Sabine Schulte im Walde (2000)
Robust German Noun Chunking with a Probabilistic Context-Free Grammar
In: Proceedings of the 18th International Conference on Computational Linguistics. Saarbrücken, Germany.


Predicate Argument Clustering (PAC)

The clustering approach was developed within the SFB-732 project D4 (Modular Lexicalization of Probabilistic Context-Free Grammars). PAC provides a cluster analysis for verb-frame-argument tuples with varying numbers of arguments, and generalises over the arguments by means of WordNet-based selectional preferences.
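To make the input data more concrete, the following minimal sketch shows one way to represent verb-frame-argument tuples with different numbers of arguments and to generalise the argument heads to coarser concepts. It is not the PAC implementation: the tuples, the frame notation `subj:obj`, the hand-written `HYPERNYM` map (a stand-in for a WordNet lookup) and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Toy hypernym map standing in for a WordNet lookup (hypothetical data).
HYPERNYM = {
    "Apfel": "food", "Brot": "food",
    "Kind": "person", "Lehrer": "person",
    "Buch": "artifact",
}

# Hypothetical (verb, frame, argument heads) tuples.
# The frame determines how many arguments follow.
TUPLES = [
    ("essen", "subj:obj", ("Kind", "Apfel")),
    ("essen", "subj:obj", ("Lehrer", "Brot")),
    ("lesen", "subj:obj", ("Kind", "Buch")),
    ("schlafen", "subj", ("Kind",)),
]


def generalise(args):
    """Map each argument head to its concept; unknown heads stay as they are."""
    return tuple(HYPERNYM.get(arg, arg) for arg in args)


def preference_counts(tuples):
    """Count how often each concept fills each slot of a verb-frame pair."""
    counts = Counter()
    for verb, frame, args in tuples:
        slots = frame.split(":")
        if len(slots) != len(args):
            raise ValueError(f"frame {frame!r} does not match arguments {args!r}")
        for slot, concept in zip(slots, generalise(args)):
            counts[(verb, frame, slot, concept)] += 1
    return counts


counts = preference_counts(TUPLES)
for key, n in sorted(counts.items()):
    print(key, n)
```

Counts of this kind only illustrate the idea of generalising arguments to concepts; the actual model and its training procedure are described in the references below.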

You can download the software here.

References:

Sabine Schulte im Walde, Christian Hying, Christian Scheible, Helmut Schmid (2008)
Combining EM Training and the MDL Principle for an Automatic Verb Classification incorporating Selectional Preferences
In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics. Columbus, OH.

Sabine Schulte im Walde, Helmut Schmid, Wiebke Wagner, Christian Hying, Christian Scheible (2010)
A Clustering Approach to Automatic Verb Classification incorporating Selectional Preferences: Model, Implementation, and User Manual
SinSpeC Working Papers of the SFB 732 "Incremental Specification in Context", Volume 7.


SubCat-Extractor - Induction of Verb Subcategorisation from Dependency Parses

The SubCat-Extractor is a tool to obtain verb subcategorisation data from parsed German corpora. It is based on a set of detailed rules that go beyond what is directly accessible in the parses. The extracted subcategorisation database is represented in a compact but linguistically detailed and flexible format with one line per clause, comprising various aspects of verb information, complement information and sentence information.

The input format required by the SubCat-Extractor is parsed text produced by Bernd Bohnet's MATE dependency parser (Bohnet, 2010). The parses must follow the tab-separated CoNLL format. The extraction rules are specified for part-of-speech tags from the STTS tagset (Schiller et al., 1999) and syntactic functions from TIGER (Brants et al., 2004).
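As a rough illustration of this kind of input, the sketch below reads tab-separated parses and collects, for each full verb, the syntactic functions of its direct dependents. It is not the SubCat-Extractor and does not reproduce its rules. The column positions assume the CoNLL-2009 layout with predicted annotations (PLEMMA, PPOS, PHEAD, PDEPREL), and the helper names `read_sentences`, `verb_frames` and `row`, as well as the sample sentence, are hypothetical.

```python
from collections import namedtuple

# Assumed 0-based CoNLL-2009 column positions for the predicted annotations.
FORM, PLEMMA, PPOS, PHEAD, PDEPREL = 1, 3, 5, 9, 11

Token = namedtuple("Token", "idx form lemma pos head deprel")


def read_sentences(lines):
    """Yield one list of Token objects per blank-line-separated sentence."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append(Token(int(cols[0]), cols[FORM], cols[PLEMMA],
                              cols[PPOS], int(cols[PHEAD]), cols[PDEPREL]))
    if sentence:
        yield sentence


def verb_frames(sentence, functions=("SB", "OA", "DA", "OP")):
    """Return (verb lemma, sorted complement functions) for each STTS full verb."""
    frames = []
    for tok in sentence:
        if tok.pos.startswith("VV"):  # STTS full-verb tags such as VVFIN, VVINF
            deps = sorted(d.deprel for d in sentence
                          if d.head == tok.idx and d.deprel in functions)
            frames.append((tok.lemma, deps))
    return frames


def row(idx, form, lemma, pos, head, deprel):
    """Build one 14-column line for the example; unused columns hold '_'."""
    cols = ["_"] * 14
    cols[0], cols[FORM], cols[PLEMMA] = str(idx), form, lemma
    cols[PPOS], cols[PHEAD], cols[PDEPREL] = pos, str(head), deprel
    return "\t".join(cols)


# Hand-built parse of "Der Hund jagt die Katze" with TIGER function labels.
sample = [
    row(1, "Der", "der", "ART", 2, "NK"),
    row(2, "Hund", "Hund", "NN", 3, "SB"),
    row(3, "jagt", "jagen", "VVFIN", 0, "--"),
    row(4, "die", "der", "ART", 5, "NK"),
    row(5, "Katze", "Katze", "NN", 3, "OA"),
    "",
]

sentences = list(read_sentences(sample))
frames = verb_frames(sentences[0])
print(frames)
```

Looking only at direct dependents is a deliberate simplification. The tool's rules are described as going beyond what is directly accessible in the parses, which this sketch does not attempt.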

See here for how to obtain the tool.

Reference:

Silke Scheible, Sabine Schulte im Walde, Marion Weller, Max Kisselew (2013)
A Compact but Linguistically Detailed Database for German Verb Subcategorisation relying on Dependency Parses from a Web Corpus: Tool, Guidelines and Resource
In: Proceedings of the 8th Web as Corpus Workshop. Lancaster, UK.