# MONTY TAGGER - A Brill-Based POS Tagger for Python/Java
# 
# Author: Hugo Liu <hugo@media.mit.edu>
# Project Page: <http://web.media.mit.edu/~hugo/montytagger>
# 
# Copyright (c) 2002,2003 by Hugo Liu, MIT Media Lab
# Original Brill data (c) Eric Brill, UPenn, M.I.T.
#
# Use is granted under the GNU General Public License (GPL):
# <http://www.gnu.org/licenses/gpl.html>
#
# About MontyTagger:
#   - tokenizes and tags English texts
#   - uses Penn Treebank tagset
#   - basic tagging based on Brill94
#   - uses Brill94-compatible lexicon and rule files
#     (LEXICON,LEXICALRULEFILE,CONTEXTUALRULEFILE) included
#   - basic tagging at about 500 words/sec in Python as of v1.2
#     (200 words/sec in v1.0)
#   - basic tagging has 96% word-level accuracy
#     on English non-fiction (same as Brill94)
#   - written in Python; fully cross-platform
#   - also available as a Java .jar file
#
# Suggestions for Use and API:
#   - running "python MontyTagger.py" from the command line
#     will bring up the interactive interpreter
#   - type "python MontyTagger.py /?" for command line usage
#   - Python API:
#       - tag(text,expand_contractions_p,all_pos_p)
#           - use this to tokenize & tag text
#           - returns text in word/TAG format (e.g. dog/NN)
#           - expand_contractions_p = 0 or 1; changes
#             contraction handling in tokenizer
#           - all_pos_p = 0 or 1; if set to 1, all
#             plausible tags for each word are returned
#             as word/TAG1/TAG2
#       - tag_tokenized(text,all_pos_p)
#           - use this to tag already tokenized text
#
#
# New in Version 1.2:
#   - lexicon reimplemented; additional optimizations
#   - 2.5x tagging speed (a 150% improvement)
#      - Python: (v1.0: 200 words/s, v1.2: 500 words/s)
#      - Java: (v1.0: 80 words/s, v1.2: 200 words/s)
#
#   - 1.6x-4x reduction in memory usage
#      - Python: (v1.0: 20 MB, v1.2: 5 MB)
#      - Java: (v1.0: 40 MB, v1.2: 25 MB)
#
#   - roughly 4x-10x faster tagger loading
#      - Python: (v1.0: 10 secs, v1.2: 1 sec)
#      - Java: (v1.0: 22 secs, v1.2: 5 secs)
#
# New in Version 1.0:
#   - Python version tested and benchmarked
#   - TBL (transformation-based learning) training is not yet
#     implemented
#
# --please send bugs & suggestions to hugo@media.mit.edu--
#
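# Illustrative sketch (not part of the original MontyTagger code):
# the Python API notes above say that tag() returns text in
# word/TAG format, or word/TAG1/TAG2 when all_pos_p is set to 1.
# The helper below shows one way a caller might split such a string
# back into (word, tags) pairs. The name parse_tagged is
# hypothetical, and the code ASSUMES that tokens are separated by
# whitespace and that a word never contains "/" itself.
def parse_tagged(tagged_text):
    """Split 'word/TAG' or 'word/TAG1/TAG2' tokens into (word, [tags]) pairs."""
    pairs = []
    for token in tagged_text.split():
        # First field is the word; every remaining field is a tag.
        parts = token.split('/')
        pairs.append((parts[0], parts[1:]))
    return pairs
#
# Example (hand-written input string, not real tagger output):
#   parse_tagged("the/DT dog/NN runs/VBZ/NNS")
#   returns [('the', ['DT']), ('dog', ['NN']), ('runs', ['VBZ', 'NNS'])]
# A token with no "/" comes back with an empty tag list.
#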