This page lists the tools and datasets related to sentiment analysis to which I contributed.
First shared task on abusive language detection for German. The shared task comes with a large dataset of tweets (5000 training and 3500 test instances).
Lexicon of Abusive Words
A large lexicon of English abusive words bootstrapped from a small set of manually labeled negative expressions (annotated as either abusive or not abusive). Currently, classifiers based on such lexicons produce by far the best results in cross-domain classification.
German Verbal Polarity Shifters
A large list of German verbal polarity shifters. This resource has been bootstrapped from a small base lexicon of German verbal shifters and a large set of English verbal shifters using a cross-lingual approach.
Sense-level Lexicon of Verbal Shifters
A complete sense-level lexicon of English verbal polarity shifters and their shifting scope. Our lexicon covers all verbs of WordNet v3.1 that are single-word or particle verbs. Polarity shifter and scope labels are given for each lemma-synset pair (i.e. each word sense of a lemma).
Annotated Corpus for Disambiguating Verbal Shifters
A gold standard of 2000 labeled sentences where each sentence contains a mention of an ambiguous shifter (e.g. spoil).
The sentence label indicates whether the usage of the ambiguous shifter conveys shifting (as in spoil the chances of success) or not (as in spoil a child) in that particular sentence.
A Large Word List of English Verbal Shifters
A list of about 1000 verbal shifters. Shifters, such as abandon, are similar to negations (e.g. not) in that they move the polarity of a phrase towards its inverse, as in abandon all hope.
This resource has been bootstrapped from a small base lexicon in which a random sample of 2000 verbs from WordNet has been manually annotated.
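The effect of a shifter on phrase polarity can be sketched in a few lines of Python. This is only a toy illustration, not the format or method of the resource: the word sets, the polarity scores and the function name phrase_polarity are all hypothetical, and the sketch ignores shifting scope entirely.

```python
# Hypothetical toy data; none of it is taken from the actual word list.
SHIFTERS = {"abandon"}            # verbal shifters
NEGATIONS = {"not"}               # plain negation words
POLARITY = {"hope": 1, "fear": -1}  # prior polarity of polar words


def phrase_polarity(tokens):
    """Sum the prior polarities and invert the sign once per shifter or negation."""
    polarity = sum(POLARITY.get(token, 0) for token in tokens)
    inversions = sum(1 for token in tokens if token in SHIFTERS or token in NEGATIONS)
    # An odd number of inverting words flips the polarity; an even number cancels out.
    return -polarity if inversions % 2 else polarity


print(phrase_polarity(["all", "hope"]))
print(phrase_polarity(["abandon", "all", "hope"]))
```

Under these assumptions, the positive phrase all hope becomes negative once abandon is added. A realistic system would restrict the inversion to the words inside the shifter's scope.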
Negation Modeling for German Sentiment Analysis
A data set focusing on the scope of German negation and a rule-based tool that automatically detects the scope of a wide range of different negation words. The tool also supports sentence-level polarity classification, and the classifier incorporates the negation modeling.
Morphologically Complex Words
A data set comprising about 9000 complex polar expressions (e.g. compounds) along with their polarity label. This resource also includes very rare complex expressions (taken from Wortwarte.de) along with their polarity label and morphological analysis.
German Opinion Role Extractor
This software is designed for the extraction of subjective expressions, sentiment sources and sentiment targets from German text. It has been developed according to the specification of the STEPS Shared Task (see below). The tool comes with pre-processing scripts (i.e. part-of-speech tagging, named entity recognition and syntactic parsing).
STEPS Shared Task 2016
Second iteration of the Shared Task on Source, Subjective Expression and Target Extraction from Political Speeches. Annotation guidelines were heavily refined. New annotated data were produced.
Opinion Compound Dataset
Resource comprising German compounds (e.g. Expertenmeinung) that have been annotated with regard to opinion roles. The release comprises two datasets: one with 2000 opinion compounds in which the modifier is annotated as either conveying some opinion role or none, and one with 1000 opinion compounds in which the modifier is annotated as either conveying an opinion holder or an opinion target.
Verb View Lexicon
Resource that classifies all opinion verbs from the English Subjectivity Lexicon and from the German Zurich Sentiment Lexicon according to their sentiment views. Each verb is categorized into one of three view categories. Categories are inspired by the different argument positions an opinion holder can assume. The categories are: agent view, where the opinion holder is realized as the agent of the opinion verb (e.g. love, hate, think), patient view, where the opinion holder is realized as the patient of the opinion verb (e.g. please, disappoint, surprise), and speaker view, where the opinion holder is the implicit speaker of the utterance (e.g. succeed, cheat, lie).
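As a rough illustration of how the three view categories determine the opinion holder, the following Python sketch maps the example verbs above to their views. It is a toy under stated assumptions: the names View, VERB_VIEWS and opinion_holder are hypothetical, the dictionary only contains the example verbs listed above, and the actual file format of the lexicon may look quite different.

```python
from enum import Enum


class View(Enum):
    AGENT = "agent"      # opinion holder is the agent of the verb
    PATIENT = "patient"  # opinion holder is the patient of the verb
    SPEAKER = "speaker"  # opinion holder is the implicit speaker


# Hypothetical in-memory excerpt built from the example verbs in the description.
VERB_VIEWS = {
    "love": View.AGENT, "hate": View.AGENT, "think": View.AGENT,
    "please": View.PATIENT, "disappoint": View.PATIENT, "surprise": View.PATIENT,
    "succeed": View.SPEAKER, "cheat": View.SPEAKER, "lie": View.SPEAKER,
}


def opinion_holder(verb, agent, patient=None):
    """Return the argument that acts as opinion holder, or None for unknown verbs."""
    view = VERB_VIEWS.get(verb)
    if view is View.AGENT:
        return agent
    if view is View.PATIENT:
        return patient
    if view is View.SPEAKER:
        return "<speaker>"  # placeholder for the implicit speaker of the utterance
    return None


print(opinion_holder("love", agent="Mary", patient="the film"))
print(opinion_holder("disappoint", agent="the film", patient="Mary"))
print(opinion_holder("cheat", agent="Peter"))
```

In this sketch, Mary is the opinion holder both in Mary loves the film and in The film disappoints Mary, while for cheat the opinion is attributed to the speaker rather than to either argument.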
MLSA: A Multi-Layered Reference Corpus for German Sentiment Analysis
This corpus consists of 270 sentences manually annotated for objectivity and subjectivity (Layer 1), word and phrase polarity (Layer 2) and expressions of private states (Layer 3).