German Chunker

The chunker for German was developed by Helmut Schmid and Sabine Schulte im Walde. It is based on a German head-lexicalised probabilistic context-free grammar. The manually developed grammar was extended semi-automatically with robustness rules so that it can parse unrestricted text. The model parameters were learned from unlabelled training data using the probabilistic parser LoPar.

Reference:

Helmut Schmid, Sabine Schulte im Walde
Robust German Noun Chunking with a Probabilistic Context-Free Grammar
In: Proceedings of the 18th International Conference on Computational Linguistics. Saarbrücken, Germany, August 2000.


Predicate Argument Clustering (PAC)

The clustering approach was developed within the SFB 732 project D4 (Modular Lexicalization of Probabilistic Context-Free Grammars). PAC provides a cluster analysis of verb-frame-argument tuples with varying numbers of arguments, and generalises over the arguments by means of WordNet-based selectional preferences.

PAC is freely available for education, research and other non-commercial purposes. You can download the software here.

References:

Sabine Schulte im Walde, Christian Hying, Christian Scheible, Helmut Schmid
Combining EM Training and the MDL Principle for an Automatic Verb Classification incorporating Selectional Preferences
In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics. Columbus, OH, June 2008.

Sabine Schulte im Walde, Helmut Schmid, Wiebke Wagner, Christian Hying, Christian Scheible
A Clustering Approach to Automatic Verb Classification incorporating Selectional Preferences: Model, Implementation, and User Manual
SinSpeC: Working Papers of the SFB 732 "Incremental Specification in Context", Volume 7, December 2010.


SubCat-Extractor - Induction of Verb Subcategorisation from Dependency Parses

The SubCat-Extractor is a tool to obtain verb subcategorisation data from parsed German corpora. It is based on a set of detailed rules that go beyond what is directly accessible in the parses. The extracted subcategorisation database is represented in a compact but linguistically detailed and flexible format, comprising various aspects of verb information, complement information and sentence information, within a one-line-per-clause style.

The SubCat-Extractor requires parsed text produced by Bernd Bohnet's MATE dependency parser (Bohnet, 2010) as input. The parses must be in the tab-separated CoNLL format. The extraction rules are specified for part-of-speech tags from the STTS tagset (Schiller et al., 1999) and syntactic functions from TIGER (Brants et al., 2004).
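To make the input format more concrete, the following Python sketch reads tab-separated CoNLL sentences and lists, for each full verb, the syntactic functions of its direct dependents. It is only an illustration and not the SubCat-Extractor's own rule set, which goes well beyond this. The column positions are an assumption based on the CoNLL-2009 layout (ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, HEAD, PHEAD, DEPREL, PDEPREL, ...); parser output may carry its predictions in the neighbouring predicted columns instead, so the indices are kept as constants that can be changed. The function names, the `row` helper and the sample sentence are hypothetical.

```python
from collections import defaultdict

# Assumed CoNLL-2009 column positions (0-based); adjust to the actual parser output.
ID, FORM, LEMMA, POS, HEAD, DEPREL = 0, 1, 2, 4, 8, 10


def read_sentences(lines):
    """Yield each sentence as a list of column lists; blank lines separate sentences."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def verb_frames(sentence):
    """Return (verb lemma, sorted function labels of its direct dependents)."""
    dependents = defaultdict(list)
    for token in sentence:
        dependents[token[HEAD]].append(token[DEPREL])
    frames = []
    for token in sentence:
        if token[POS].startswith("VV"):  # STTS full-verb tags such as VVFIN
            frames.append((token[LEMMA], sorted(dependents[token[ID]])))
    return frames


def row(idx, form, lemma, pos, head, deprel):
    """Build one hypothetical 14-column line for the sample below."""
    cols = ["_"] * 14
    cols[ID], cols[FORM], cols[LEMMA], cols[POS] = str(idx), form, lemma, pos
    cols[HEAD], cols[DEPREL] = str(head), deprel
    return "\t".join(cols)


# Hand-written sample: "Der Hund jagt die Katze ." with TIGER function labels.
SAMPLE = "\n".join([
    row(1, "Der", "der", "ART", 2, "NK"),
    row(2, "Hund", "Hund", "NN", 3, "SB"),
    row(3, "jagt", "jagen", "VVFIN", 0, "--"),
    row(4, "die", "der", "ART", 5, "NK"),
    row(5, "Katze", "Katze", "NN", 3, "OA"),
    row(6, ".", "--", "$.", 5, "--"),
    "",
])

for sentence in read_sentences(SAMPLE.split("\n")):
    print(verb_frames(sentence))  # prints [('jagen', ['OA', 'SB'])]
```

In this sample the verb "jagen" is found with a subject (SB) and an accusative object (OA). A real extraction would additionally have to handle auxiliaries, verb particles, clause boundaries and similar phenomena, which is where the tool's detailed rules come in.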

The SubCat-Extractor is freely available for education, research and other non-commercial purposes. See here on how to obtain the tool.

Reference:

Silke Scheible, Sabine Schulte im Walde, Marion Weller, Max Kisselew
A Compact but Linguistically Detailed Database for German Verb Subcategorisation relying on Dependency Parses from a Web Corpus: Tool, Guidelines and Resource
In: Proceedings of the 8th Web as Corpus Workshop. Lancaster, UK, July 2013.