Resources

  • ASSERT
    ASSERT is an automatic statistical semantic role tagger that can annotate naturally occurring text with semantic arguments. When presented with a sentence, it performs a full syntactic analysis of the sentence, automatically identifies all the verb predicates in that sentence, extracts features for all constituents in the parse tree relative to the predicate, and identifies and tags the constituents with the appropriate semantic arguments.
    Version: 0.14beta
  • Alchemy
    Alchemy is a software package providing a series of algorithms for statistical relational learning and probabilistic logic inference, based on the Markov logic representation.
  • BART
    BART, the Beautiful/Baltimore Anaphora Resolution Toolkit, is a tool for fully automatic, machine-learning-based coreference annotation of written text.
  • Berkeley Aligner
    The BerkeleyAligner is a software package that combines the innovations of recent work in unsupervised word alignment at Berkeley. This package is meant both as an alternative to the ubiquitous GIZA++ and as a test bed for new alignment ideas.
    Versions: 2.0, 1.0
  • Berkeley Parser
    A version of the Berkeley Parser trained on TueBa-D/Z. It is kept separate because its compiled grammar is not compatible with other grammars, e.g. those from the original distribution on Google Code.
    Version: 1.1
  • Bohnet
    Dependency parser described in: Bernd Bohnet. 2010. Top Accuracy and Fast Dependency Parsing is not a Contradiction. The 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China.
  • Brill Tagger
    This is the original implementation of the Brill Tagger.
  • Buckwalter Arabic Morphological Analyzer
    The Buckwalter Arabic Morphological Analyzer is used for POS-tagging Arabic text. The data consists primarily of three Arabic-English lexicon files: prefixes (299 entries), suffixes (618 entries), and stems (82,158 entries representing 38,600 lemmas).
  • CDG
    Constraint dependency grammar parsing system
  • Collins Parser
    The Collins Parser is a statistical parser for English.
  • CorScorer
    Perl package for scoring coreference resolution systems using different metrics. This scorer was used in the 2011 CoNLL Shared Task.
  • Dan Bikel's Parser
    The software is an extensible, parallel parsing engine that accommodates many different types of generative, statistical parsing models (including an emulation of Mike Collins's parsing model with equally good performance), and can easily be extended to new domains and new languages.
  • ECLiPSe
    The ECLiPSe Constraint Programming System
  • Extracting syntactically constrained paraphrases
    Paraphrase extraction software and data used in Chris Callison-Burch's EMNLP-08 paper "Syntactic Constraints on Paraphrases Extracted from Parallel Corpora."
  • GADeL
    GADeL is a Genetic Algorithm for Default Logic implemented in Sicstus Prolog Objects.
  • GIZA++
    GIZA++ is an extension of the program GIZA. It is a program for learning statistical translation models from bitext.
    Versions: 1.02, 1.0 (part of EGYPT)
  • GWSD
    GWSD is a system for Unsupervised Graph-based All-Words Word Sense Disambiguation. Please refer to (Sinha and Mihalcea, 2007) for a description of the graph-based disambiguation method, as well as for brief descriptions of all the similarity measures and the graph-centrality algorithms used by GWSD.
  • German Topological Parser
    This package contains parsing models trained on the TueBa-D/Z corpus (specifically the version that was released for the ACL 2008 Parsing German workshop) for use with the Berkeley Parser.
  • HILDA
    HILDA (HIgh-Level Discourse Analyzer) is a discourse parser: it analyzes a text and uncovers the underlying functional relations between its different parts. The text is annotated under a theory of text organization called Rhetorical Structure Theory.
  • JNET
    The JULIE Lab Named Entity Tagger (JNET) is a generic and configurable multi-class named entity recognizer. JNET's comprehensive feature set allows it to be employed for most domains and entity classes.
  • JSBD
    The JULIE Sentence Boundary Detector (JSBD) is an ML-based sentence splitter. It can be retrained on supported training material and is thus neither language nor domain dependent.
  • JULIE Token Boundary Detector (JTBD)
    The JULIE Lab Sentence Boundary Detector (JSBD) and the JULIE Lab Token Boundary Detector (JTBD) are machine-learning-based tools, developed and optimized for handling life science documents containing many tricky cases that many other tools, especially rule-based ones, do not handle appropriately.
  • JavaRAP
    JavaRAP is an implementation of the classic Resolution of Anaphora Procedure (RAP) given by Lappin and Leass (1994). It resolves third person pronouns and lexical anaphors, and identifies pleonastic pronouns. The original purpose of the implementation is to provide anaphora resolution results to our TREC 2003 Q&A system.
    Version: 1.11
  • LBJ NER Tagger
    This is a state-of-the-art NER tagger that tags plain text with named entities (people / organizations / locations / miscellaneous).
  • LKB
    The LKB system is a grammar and lexicon development environment for use with unification-based linguistic formalisms. While not restricted to HPSG, the LKB implements the DELPH-IN reference formalism of typed feature structures (jointly with other DELPH-IN software using the same formalism).
  • LingPipe
    LingPipe is a suite of Java libraries for information extraction and data mining.
    Versions: 3.9.2, 3.7.0, 3.5.1
  • Link Grammar Parser
    The Link Grammar Parser is a syntactic parser of English, based on link grammar, an original theory of English syntax.
    Version: 4.1b
  • LoPar
    LoPar is an implementation of a parser for head-lexicalised probabilistic context-free grammars.
  • MINIPAR
    MINIPAR is a broad-coverage parser for the English language. An evaluation with the SUSANNE corpus shows that MINIPAR achieves about 88% precision and 80% recall with respect to dependency relationships. MINIPAR is very efficient: on a Pentium II 300 with 128MB of memory, it parses about 300 words per second.
  • MSTParser
    MSTParser is a non-projective dependency parser that searches for maximum spanning trees over directed graphs. Models of dependency structure are based on large-margin discriminative training methods. Projective parsing is also supported.
  • MXPOST
    MXPOST is a Java (JDK 1.1) implementation of the part-of-speech tagger described in: Adwait Ratnaparkhi. A Maximum Entropy Part-Of-Speech Tagger. In Proceedings of the Empirical Methods in Natural Language Processing Conference, May 17-18, 1996. University of Pennsylvania.
  • MaltParser
    MaltParser is a system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data and to parse new data using an induced model.
    Versions: 1.4.1, 1.4, 1.3
  • Mate-SRL
    Semantic role labeling system. See A. Björkelund, L. Hafdell, and P. Nugues. Multilingual semantic role labeling. In Proceedings of The Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009), pages 43-48, Boulder, June 4-5, 2009.
  • Memory-based Tagger Generator and Tagger
    The tagger-generator part can generate a sequence tagger on the basis of a training set of tagged sequences; the tagger part can tag new sequences.
  • MorphAdorner
    MorphAdorner is a Java command-line program which acts as a pipeline manager for processes performing morphological adornment of words in a text. We use the term "adornment" in preference to terms such as "annotation" or "tagging" which carry too many alternative and confusing meanings. Adornment harkens back to the medieval sense of manuscript adornment or illumination -- attaching pictures and marginal comments to texts.
  • IRST-LM
    The IRST Language Modeling Toolkit features algorithms and data structures suitable to estimate, store, and access very large LMs. Our software has been integrated into a popular open source Statistical Machine Translation decoder called Moses, and is compatible with language models created with other tools, such as the SRILM Toolkit.
    Versions: 3712-srilm, 3712-irstlm, 2010-08-13, 1.6.0, 1.15.11
  • Named Entity Tagger
    The Named Entity Tagger is a self-contained package which incorporates versions of SNoW and FEX, together with an inference module. It includes a network trained to recognize Person, Location, Organization and Misc. entities in English.
  • OpenCCG
    OpenCCG, the OpenNLP CCG Library, is an open source natural language processing library written in Java, which provides parsing and realization services based on Mark Steedman's Combinatory Categorial Grammar (CCG) formalism.
  • ParseBanker
    The LFG Parsebanker Interface is a Web-based tool for building LFG treebanks (parsebanks). It includes a discriminant-based mechanism for disambiguation of parses.
  • RASP
    RASP is a domain-independent, robust parsing system for English.
  • Reranking Parser
    A reranking parser which uses a regularized MaxEnt reranker to select the best parse from the 50-best parses returned by a generative parsing model.
  • SPASS
    SPASS is an automated theorem prover for first-order logic with equality.
    Version: 3.0
  • Semafor
    SEMAFOR: Semantic Analysis of Frame Representations is a tool for automatic analysis of the frame-semantic structure of English text.
  • SenseLearner
    The goal of the SenseLearner project is to conduct exploratory research of various WSD techniques to enable the development of a tool for semantic tagging of all words in unrestricted text.
  • Shalmaneser
    Shalmaneser is a supervised learning toolbox for shallow semantic parsing, i.e. the automatic assignment of semantic classes and roles to text. The system was developed for Frame Semantics; thus we use Frame Semantics terminology and call the classes frames and the roles frame elements. However, the architecture is reasonably general: It can handle any role-semantic paradigm (e.g., PropBank roles) and any set of word senses (e.g., WordNet synsets), provided the input data is offered in SalsaTigerXML.
  • Sleepy Student Parser
    'Sleepy' is a simple unlexicalized parser for German, returning both syntactic category and grammatical function labels in the tree. It will not be able to parse some sentences - coverage is only 93% on newspaper text.
  • Stanford POS Tagger
    This software is a Java implementation of the log-linear part-of-speech taggers described in: Kristina Toutanova and Christopher D. Manning. 2000. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), pp. 63-70. Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL 2003, pp. 252-259.
    Versions: 2.0, 1.6
  • Stanford Parser
    This package is a Java implementation of probabilistic natural language parsers, both highly optimized PCFG and lexicalized dependency parsers, and a lexicalized PCFG parser. The original version of this parser was mainly written by Dan Klein, with support code and linguistic grammar development by Christopher Manning. Extensive additional work (internationalization and language-specific modeling, flexible input/output, grammar compaction, lattice parsing, typed dependencies output, user support, etc.) has been done by Roger Levy, Christopher Manning, Teg Grenager, Galen Andrew, Marie-Catherine de Marneffe, Bill MacCartney, Huihsin Tseng, Pi-Chuan Chang, Wolfgang Maier, and Jenny Finkel.
    Versions: 1.6.3, 1.6
  • Stanford Named Entity Recognizer
    CRFClassifier is a Java implementation of a Named Entity Recognizer. The software provides an implementation of Conditional Random Field sequence models, of the sort pioneered by Lafferty, McCallum, and Pereira (2001), coupled with well-engineered feature extractors for Named Entity Recognition.
    Versions: 1.1.1, 1.1
  • Tarsqi Toolkit
    The Tarsqi Toolkit (TTK) is a set of components for extracting temporal information from newswire text. TTK extracts time expressions, events, subordination links and temporal links; in addition, it ensures consistency of temporal information.
  • Theorist
    A compiler from Theorist into Prolog (it has been tested on both Sicstus and Quintus Prologs), and many example Theorist programs, including most of the standard nonmonotonic (but very monotonous) examples, diagnostic examples, and examples of scene interpretation.
  • TreeTagger
    The TreeTagger is a tool for annotating text with part-of-speech and lemma information which has been developed within the TC project at the Institute for Computational Linguistics of the University of Stuttgart.
  • UKB
    UKB
    Versions: 0.1.5, 0.1.3, 0.1.0
  • WFSC
    WFSC compiles regular expressions into multi-tape weighted finite-state machines (n-WFSMs) with symbol classes. These machines define regular (also called rational) n-ary relations which assign a weight from some semiring to any n-tuple of strings (0 if the n-tuple is not accepted). Special cases of n-WFSMs are weighted acceptors (n=1) and weighted transducers (n=2).
  • XLE
    XLE consists of algorithms for parsing and generating Lexical Functional Grammars (LFGs) along with a rich graphical user interface for writing and debugging such grammars.
    Versions: 2010-10-06, 2010-02-19, 2009-09-18, 2009-08-12, 2009-01-21, 2008-10-27, 2008-08-28 (64bit), 2008-08-28, 2008-02-25, 2007-04, 18. April 2008
  • XRay
    XRay is a Prolog-technology theorem prover for reasoning from incomplete information; it is based on an approach to query-answering in default logics described in Schaub (1995).
  • YamCha
    YamCha (Yet Another Multipurpose CHunk Annotator) is a generic, customizable, and open source text chunker oriented toward many NLP tasks, such as POS tagging, Named Entity Recognition, base NP chunking, and Text Chunking. YamCha uses a state-of-the-art machine learning algorithm called Support Vector Machines (SVMs), first introduced by Vapnik in 1995.
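
The WFSC entry above describes machines that assign a semiring weight to each string tuple, with 0 for anything not accepted. As a rough illustration of the n=1 case (a weighted acceptor), the following Python sketch sums, over all accepting paths, the product of the arc weights in the real semiring. It does not use WFSC or its API; the states, arcs, weights and the `weight` function are all hypothetical and chosen only to make the idea concrete.

```python
# Hypothetical toy weighted acceptor (n = 1) over the real semiring (+, *).
# All states, arcs and weights are invented for illustration; this is not WFSC.

START = 0
FINAL = {2: 1.0}  # final state -> final weight

# (source state, input symbol) -> list of (target state, arc weight)
ARCS = {
    (0, "a"): [(1, 0.5), (2, 0.2)],
    (1, "b"): [(2, 0.4)],
    (2, "b"): [(2, 0.5)],
}


def weight(string):
    """Return the weight of `string`: the sum over all accepting paths
    of the product of arc weights, times the final weight."""
    current = {START: 1.0}  # state -> accumulated weight of paths reaching it
    for symbol in string:
        nxt = {}
        for state, w in current.items():
            for target, arc_w in ARCS.get((state, symbol), []):
                nxt[target] = nxt.get(target, 0.0) + w * arc_w
        current = nxt
    # Strings with no accepting path get weight 0, as in the description.
    return sum(w * FINAL[s] for s, w in current.items() if s in FINAL)


if __name__ == "__main__":
    for s in ["a", "ab", "b"]:
        print(repr(s), weight(s))
```

Here the string "ab" has two accepting paths (through state 1 and through state 2), so their weights are added, while "b" has no path from the start state and receives weight 0. A transducer (n=2) would follow the same pattern with pairs of symbols on each arc.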