PhraseHunter::Token Class Reference

Virtual base class for all Token descendants.

#include <token.h>

Inherits safe_bool< T >.

Inherited by PhraseHunter::CorpusTokenBase, PhraseHunter::EmptyToken, and PhraseHunter::MutableToken.

Public Member Functions

virtual ~Token ()
virtual size_t length () const
 Return the real length (bytes) of a token string in the index.
bool isUniform () const
 Returns true if all occurrences in all files are expected to look *exactly* the same.
virtual bool isEmpty () const
 Returns true if there are no actual occurrences of this Token in a corpus, i.e. its occurrence map is empty.
virtual unsigned int corpusFrequency () const=0
 Return the number of times this Token occurs in the corpus.
schma::UnicodePtr tokenString () const
 Returns the string to which this Token belongs as a UnicodePtr.
virtual unsigned int documentFrequency () const
 Returns the number of documents a Token occurs in.
bool inDoc (DocID docID) const
 Returns true if this Token occurs in a document.
const PositionList & documentOccurrences (DocID docID)
 Returns a reference to all offsets of a Token in a particular document as a PositionList.
const OccurrenceMap & allOccurrences () const
 Return a reference to the entire OccurrenceMap of a Token.
virtual std::vector< DocID > documentIDs () const
 Returns the DocIDs of all documents in which this Token occurs.
virtual unsigned int numTokens () const
virtual TokenID id () const
 Returns the ID for this Token.

Protected Member Functions

 Token (const char *token)
 Token (schma::UnicodePtr tokenstring)

Protected Attributes

schma::UnicodePtr m_tokenstring
OccurrenceMap m_occurrences

Static Protected Attributes

static const int SPACE_BETWEEN_TWO_TOKENS = 2

Detailed Description

Virtual base class for all Token descendants.

A Token consists of a particular string (i.e. word type) and a map of its occurrences, i.e. documents and offsets, in the corpus. Tokens may be used in boolean expressions.

Definition at line 46 of file token.h.
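
The description above can be pictured with a small stand-in. The sketch below is not the PhraseHunter source: it assumes that DocID is an unsigned integer, that PositionList is a vector of offsets, and that OccurrenceMap maps a DocID to a PositionList. SketchToken, its addOccurrence() signature, and its explicit operator bool are hypothetical; the real class obtains its boolean behaviour from safe_bool, and treating a token with at least one occurrence as "true" is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assumed stand-ins for the PhraseHunter typedefs; the real definitions may differ.
typedef unsigned int DocID;
typedef std::vector<std::size_t> PositionList;        // offsets within one document
typedef std::map<DocID, PositionList> OccurrenceMap;  // document -> offsets

// Hypothetical, simplified token: one string plus its occurrence map.
class SketchToken {
public:
    explicit SketchToken(const std::string& s) : m_tokenstring(s) {}

    // Record one occurrence of the token at `offset` in document `doc`.
    void addOccurrence(DocID doc, std::size_t offset) {
        m_occurrences[doc].push_back(offset);
    }

    bool isEmpty() const { return m_occurrences.empty(); }
    bool inDoc(DocID doc) const { return m_occurrences.count(doc) != 0; }
    unsigned int documentFrequency() const {
        return static_cast<unsigned int>(m_occurrences.size());
    }

    // Assumption: a token is "true" when it has at least one occurrence.
    explicit operator bool() const { return !isEmpty(); }

private:
    std::string m_tokenstring;
    OccurrenceMap m_occurrences;
};

// Example of two tokens combined in a boolean expression.
inline bool bothOccur(const SketchToken& a, const SketchToken& b) {
    return a && b;
}
```

With this layout the per-document queries (inDoc, documentFrequency) are lookups on the map, and the boolean test reduces to asking whether the map is empty.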


Constructor & Destructor Documentation

PhraseHunter::Token::Token ( const char *  token  )  [inline, protected]

Definition at line 51 of file token.h.

PhraseHunter::Token::Token ( schma::UnicodePtr  tokenstring  )  [inline, protected]

Definition at line 53 of file token.h.

virtual PhraseHunter::Token::~Token (  )  [inline, virtual]

Definition at line 59 of file token.h.


Member Function Documentation

virtual size_t PhraseHunter::Token::length (  )  const [inline, virtual]

Return the real length (bytes) of a token string in the index.

Reimplemented in PhraseHunter::Phrase.

Definition at line 62 of file token.h.

References m_tokenstring, and SPACE_BETWEEN_TWO_TOKENS.

bool PhraseHunter::Token::isUniform (  )  const [inline]

Returns true if all occurrences in all files are expected to look *exactly* the same.

Definition at line 67 of file token.h.

virtual bool PhraseHunter::Token::isEmpty (  )  const [inline, virtual]

Returns true if there are no actual occurrences of this Token in a corpus, i.e. its occurrence map is empty.

Reimplemented in PhraseHunter::EmptyToken, and PhraseHunter::CorpusToken.

Definition at line 72 of file token.h.

References m_occurrences.

Referenced by PhraseHunter::Phrase::getAdjacent().

virtual unsigned int PhraseHunter::Token::corpusFrequency (  )  const [pure virtual]

Return the number of times this Token occurs in the corpus.

Implemented in PhraseHunter::EmptyToken, PhraseHunter::CorpusTokenBase, and PhraseHunter::MutableToken.
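
Because corpusFrequency() is pure virtual, each concrete subclass decides how to count. One plausible strategy, given here as a sketch under assumptions and not as the code of any PhraseHunter class, is to sum the sizes of all position lists in the occurrence map. The typedefs and the names TokenSketch and CountingToken are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Assumed stand-ins for the PhraseHunter typedefs.
typedef unsigned int DocID;
typedef std::vector<std::size_t> PositionList;
typedef std::map<DocID, PositionList> OccurrenceMap;

// Hypothetical abstract base mirroring the pure virtual member of Token.
class TokenSketch {
public:
    virtual ~TokenSketch() {}
    virtual unsigned int corpusFrequency() const = 0;

protected:
    OccurrenceMap m_occurrences;
};

// Hypothetical subclass: every stored offset counts as one occurrence.
class CountingToken : public TokenSketch {
public:
    void addOccurrence(DocID doc, std::size_t offset) {
        m_occurrences[doc].push_back(offset);
    }

    virtual unsigned int corpusFrequency() const {
        std::size_t total = 0;
        for (OccurrenceMap::const_iterator it = m_occurrences.begin();
             it != m_occurrences.end(); ++it) {
            total += it->second.size();  // offsets recorded for this document
        }
        return static_cast<unsigned int>(total);
    }
};
```

A subclass that keeps a precomputed count could override the same function and return the stored value without walking the map.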

schma::UnicodePtr PhraseHunter::Token::tokenString (  )  const [inline]

Returns the string to which this Token belongs as a UnicodePtr.

Definition at line 78 of file token.h.

References m_tokenstring.

virtual unsigned int PhraseHunter::Token::documentFrequency (  )  const [inline, virtual]

Returns the number of documents a Token occurs in.

Reimplemented in PhraseHunter::LightCorpusToken.

Definition at line 81 of file token.h.

References m_occurrences.

bool PhraseHunter::Token::inDoc ( DocID  docID  )  const [inline]

Returns true if this Token occurs in a document.

Definition at line 84 of file token.h.

References m_occurrences.

Referenced by PhraseHunter::MutableToken::removeDocument().

const PositionList& PhraseHunter::Token::documentOccurrences ( DocID  docID  )  [inline]

Returns a reference to all offsets of a Token in a particular document as a PositionList.

Definition at line 87 of file token.h.

References m_occurrences.
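
Unlike the neighbouring accessors, documentOccurrences() is not declared const. If OccurrenceMap behaves like std::map and the lookup goes through operator[] (both are assumptions; the body at line 87 of token.h is not reproduced here), asking for a document in which the token does not occur would insert an empty entry and so change the result of documentFrequency(). Callers can avoid the question by checking inDoc() first. The sketch below contrasts the two lookup styles using hypothetical stand-in types and free functions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Assumed stand-ins for the PhraseHunter typedefs.
typedef unsigned int DocID;
typedef std::vector<std::size_t> PositionList;
typedef std::map<DocID, PositionList> OccurrenceMap;

// Guarded lookup: returns the offsets for `doc`, or a shared empty list
// when `doc` is absent. Never modifies the map.
inline const PositionList& occurrencesOrEmpty(const OccurrenceMap& occ, DocID doc) {
    static const PositionList kEmpty;
    OccurrenceMap::const_iterator it = occ.find(doc);
    return it == occ.end() ? kEmpty : it->second;
}

// Unguarded lookup via operator[]: inserts an empty list when `doc` is absent.
inline const PositionList& occurrencesInserting(OccurrenceMap& occ, DocID doc) {
    return occ[doc];
}
```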

const OccurrenceMap& PhraseHunter::Token::allOccurrences (  )  const [inline]

Return a reference to the entire OccurrenceMap of a Token.

Definition at line 90 of file token.h.

References m_occurrences.

std::vector< DocID > PhraseHunter::Token::documentIDs (  )  const [virtual]

Returns the DocIDs of all documents in which this Token occurs.

Definition at line 33 of file token.cpp.

References m_occurrences.
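
Since documentIDs() references only m_occurrences, it presumably gathers the keys of the occurrence map. A minimal sketch of that idea follows, assuming a std::map-based OccurrenceMap; the actual implementation lives in token.cpp and may differ, and collectDocumentIDs is a hypothetical free function.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Assumed stand-ins for the PhraseHunter typedefs.
typedef unsigned int DocID;
typedef std::vector<std::size_t> PositionList;
typedef std::map<DocID, PositionList> OccurrenceMap;

// Copy the key of every map entry into a vector.
inline std::vector<DocID> collectDocumentIDs(const OccurrenceMap& occ) {
    std::vector<DocID> ids;
    ids.reserve(occ.size());
    for (OccurrenceMap::const_iterator it = occ.begin(); it != occ.end(); ++it) {
        ids.push_back(it->first);
    }
    return ids;
}
```

Under the std::map assumption the IDs come back in ascending order, and the function returns the vector by value, matching the by-value return type in the signature above.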

virtual unsigned int PhraseHunter::Token::numTokens (  )  const [inline, virtual]

Reimplemented in PhraseHunter::EmptyToken, and PhraseHunter::Phrase.

Definition at line 94 of file token.h.

virtual TokenID PhraseHunter::Token::id (  )  const [inline, virtual]

Returns the ID for this Token.

Reimplemented in PhraseHunter::CorpusTokenBase.

Definition at line 96 of file token.h.

References PhraseHunter::InvalidTokenID.


Member Data Documentation

schma::UnicodePtr PhraseHunter::Token::m_tokenstring [protected]

Definition at line 49 of file token.h.

Referenced by PhraseHunter::Phrase::length(), length(), and tokenString().

OccurrenceMap PhraseHunter::Token::m_occurrences [protected]

Definition at line 50 of file token.h.

Referenced by PhraseHunter::MutableToken::addOccurrence(), allOccurrences(), documentFrequency(), documentIDs(), documentOccurrences(), inDoc(), PhraseHunter::CorpusToken::insertPositions(), isEmpty(), and PhraseHunter::MutableToken::removeDocument().

const int PhraseHunter::Token::SPACE_BETWEEN_TWO_TOKENS = 2 [static, protected]

Definition at line 56 of file token.h.

Referenced by length().


The documentation for this class was generated from the following files:
token.h
token.cpp

Generated on Thu Dec 21 16:14:44 2006 for The Phrasehunter by doxygen 1.5.1