PhraseHunter::Token Class Reference

Virtual base class for all Token descendants.

#include <token.h>

Inherits safe_bool<>.

Inherited by PhraseHunter::CorpusTokenBase, PhraseHunter::EmptyToken, and PhraseHunter::MutableToken.


Public Member Functions
virtual	~Token ()
virtual size_t	length () const
	Returns the actual length, in bytes, of the token string in the index.
bool	isUniform () const
	Returns true if all occurrences in all files are expected to look exactly the same.
virtual bool	isEmpty () const
	Returns true if there are no actual occurrences of this Token in a corpus, i.e. its occurrence map is empty.
virtual unsigned int	corpusFrequency () const=0
	Returns the number of times this Token occurs in the corpus.
schma::UnicodePtr	tokenString () const
	Returns the string to which this Token belongs as a UnicodePtr.
virtual unsigned int	documentFrequency () const
	Returns the number of documents a Token occurs in.
bool	inDoc (DocID docID) const
	Returns true if this Token occurs in the document identified by docID.
const PositionList &	documentOccurrences (DocID docID)
	Returns a reference to all offsets of a Token in a particular document as a PositionList.
const OccurrenceMap &	allOccurrences () const
	Returns a reference to the entire OccurrenceMap of a Token.
virtual std::vector< DocID >	documentIDs () const
	Returns the DocIDs of all documents in which this Token occurs.
virtual unsigned int	numTokens () const
virtual TokenID	id () const
	Returns the ID for this Token.
Protected Member Functions
	Token (const char *token)
	Token (schma::UnicodePtr tokenstring)
Protected Attributes
schma::UnicodePtr	m_tokenstring
OccurrenceMap	m_occurrences
Static Protected Attributes
static const int	SPACE_BETWEEN_TWO_TOKENS = 2

Detailed Description

Virtual base class for all Token descendants.

A Token consists of a particular string (i.e. word type) and a map of its occurrences, i.e. documents and offsets, in the corpus. Tokens may be used in boolean expressions.
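
The structure described above can be sketched as follows. This is a simplified model, not the PhraseHunter implementation: the typedefs for DocID, PositionList and OccurrenceMap, the class SketchToken and the helper countPresent() are all hypothetical, and a C++11 explicit operator bool stands in for the safe_bool base class. The rule that a token tests true when it has at least one occurrence is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real typedefs are defined in the PhraseHunter headers.
typedef unsigned int DocID;
typedef std::vector<unsigned int> PositionList;       // offsets within one document
typedef std::map<DocID, PositionList> OccurrenceMap;  // document -> offsets

// Simplified model of a token: one string plus its occurrence map.
class SketchToken {
public:
    explicit SketchToken(const std::string& s) : m_tokenstring(s) {}

    void addOccurrence(DocID doc, unsigned int offset) {
        m_occurrences[doc].push_back(offset);
    }

    bool isEmpty() const { return m_occurrences.empty(); }

    // Stand-in for the safe_bool base class (requires C++11).
    // Assumption: a token is "true" when it has at least one occurrence.
    explicit operator bool() const { return !isEmpty(); }

    const std::string& tokenString() const { return m_tokenstring; }

private:
    std::string m_tokenstring;
    OccurrenceMap m_occurrences;
};

// Counts the tokens that occur somewhere, testing each one as a boolean.
inline unsigned int countPresent(const std::vector<SketchToken>& tokens) {
    unsigned int n = 0;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (tokens[i]) ++n;
    }
    return n;
}

// Example: only the token that has an occurrence counts as present.
inline void example() {
    std::vector<SketchToken> tokens;
    tokens.push_back(SketchToken("phrase"));
    tokens.push_back(SketchToken("hunter"));
    tokens[0].addOccurrence(7, 42);
    assert(countPresent(tokens) == 1);
}
```

Making the conversion explicit keeps a token from silently converting to an integer, which is the same problem the safe bool idiom addresses in pre-C++11 code.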

Definition at line 46 of file token.h.

Constructor & Destructor Documentation

PhraseHunter::Token::Token ( const char * token ) [inline, protected]

Definition at line 51 of file token.h.

PhraseHunter::Token::Token ( schma::UnicodePtr tokenstring ) [inline, protected]

Definition at line 53 of file token.h.

virtual PhraseHunter::Token::~Token ( ) [inline, virtual]

Definition at line 59 of file token.h.

Member Function Documentation

virtual size_t PhraseHunter::Token::length ( ) const [inline, virtual]

Returns the actual length, in bytes, of the token string in the index.

Reimplemented in PhraseHunter::Phrase.

Definition at line 62 of file token.h.

References m_tokenstring and SPACE_BETWEEN_TWO_TOKENS.

bool PhraseHunter::Token::isUniform ( ) const [inline]

Returns true if all occurrences in all files are expected to look *exactly* the same.

Definition at line 67 of file token.h.

virtual bool PhraseHunter::Token::isEmpty ( ) const [inline, virtual]

Returns true if there are no actual occurrences of this Token in a corpus, i.e. its occurrence map is empty.

Reimplemented in PhraseHunter::EmptyToken and PhraseHunter::CorpusToken.

Definition at line 72 of file token.h.

References m_occurrences.

Referenced by PhraseHunter::Phrase::getAdjacent().

virtual unsigned int PhraseHunter::Token::corpusFrequency ( ) const [pure virtual]

Returns the number of times this Token occurs in the corpus.

Implemented in PhraseHunter::EmptyToken, PhraseHunter::CorpusTokenBase, and PhraseHunter::MutableToken.
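
Because corpusFrequency() is pure virtual, every concrete descendant has to supply its own count. The sketch below shows one way a descendant might do so, by summing the offsets stored per document. FrequencySource and CountingToken are hypothetical names, the typedefs are assumptions, and the actual implementers listed above may compute the value differently.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-ins for the PhraseHunter typedefs.
typedef unsigned int DocID;
typedef std::vector<unsigned int> PositionList;
typedef std::map<DocID, PositionList> OccurrenceMap;

// Abstract interface mirroring the pure virtual declaration.
class FrequencySource {
public:
    virtual ~FrequencySource() {}
    virtual unsigned int corpusFrequency() const = 0;
};

// Hypothetical implementer: the corpus frequency is the total number
// of offsets recorded across all documents.
class CountingToken : public FrequencySource {
public:
    void addOccurrence(DocID doc, unsigned int offset) {
        m_occurrences[doc].push_back(offset);
    }

    virtual unsigned int corpusFrequency() const {
        unsigned int total = 0;
        for (OccurrenceMap::const_iterator it = m_occurrences.begin();
             it != m_occurrences.end(); ++it) {
            total += static_cast<unsigned int>(it->second.size());
        }
        return total;
    }

private:
    OccurrenceMap m_occurrences;
};

// Example: two offsets in document 1 and one in document 2 give a count of 3.
inline void example() {
    CountingToken t;
    t.addOccurrence(1, 4);
    t.addOccurrence(1, 9);
    t.addOccurrence(2, 0);
    const FrequencySource& base = t;  // call through the abstract interface
    assert(base.corpusFrequency() == 3);
}
```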

schma::UnicodePtr PhraseHunter::Token::tokenString ( ) const [inline]

Returns the string to which this Token belongs as a UnicodePtr.

Definition at line 78 of file token.h.

References m_tokenstring.

virtual unsigned int PhraseHunter::Token::documentFrequency ( ) const [inline, virtual]

Returns the number of documents a Token occurs in.

Reimplemented in PhraseHunter::LightCorpusToken.

Definition at line 81 of file token.h.

References m_occurrences.

bool PhraseHunter::Token::inDoc ( DocID docID ) const [inline]

Returns true if this Token occurs in the document identified by docID.

Definition at line 84 of file token.h.

References m_occurrences.

Referenced by PhraseHunter::MutableToken::removeDocument().
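
Both inDoc() and documentFrequency() reference m_occurrences. Assuming OccurrenceMap behaves like a std::map keyed by DocID (an assumption, as are the typedefs and the free functions occursIn() and docFrequency() below), the two queries could reduce to a key lookup and a size check:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-ins for the PhraseHunter typedefs.
typedef unsigned int DocID;
typedef std::vector<unsigned int> PositionList;
typedef std::map<DocID, PositionList> OccurrenceMap;

// True when the map holds an entry for docID (the role inDoc() plays).
inline bool occursIn(const OccurrenceMap& occ, DocID docID) {
    return occ.find(docID) != occ.end();
}

// Number of distinct documents in the map (the role documentFrequency() plays).
inline unsigned int docFrequency(const OccurrenceMap& occ) {
    return static_cast<unsigned int>(occ.size());
}

// Example: occurrences in documents 4 and 9, none in document 5.
inline void example() {
    OccurrenceMap occ;
    occ[4].push_back(10);
    occ[4].push_back(25);
    occ[9].push_back(3);
    assert(occursIn(occ, 4));
    assert(!occursIn(occ, 5));
    assert(docFrequency(occ) == 2);
}
```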

const PositionList& PhraseHunter::Token::documentOccurrences ( DocID docID ) [inline]

Returns a reference to all offsets of a Token in a particular document as a PositionList.

Definition at line 87 of file token.h.

References m_occurrences.

const OccurrenceMap& PhraseHunter::Token::allOccurrences ( ) const [inline]

Returns a reference to the entire OccurrenceMap of a Token.

Definition at line 90 of file token.h.

References m_occurrences.

std::vector< DocID > PhraseHunter::Token::documentIDs ( ) const [virtual]

Returns the DocIDs of all documents in which this Token occurs.

Definition at line 33 of file token.cpp.

References m_occurrences.
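
A minimal sketch of how the document IDs could be collected from the occurrence map, assuming it is a std::map keyed by DocID. The typedefs and the helper collectDocIDs() are hypothetical; the real code is in token.cpp and may differ.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-ins for the PhraseHunter typedefs.
typedef unsigned int DocID;
typedef std::vector<unsigned int> PositionList;
typedef std::map<DocID, PositionList> OccurrenceMap;

// Copies every key of the occurrence map into a vector.
inline std::vector<DocID> collectDocIDs(const OccurrenceMap& occ) {
    std::vector<DocID> ids;
    ids.reserve(occ.size());
    for (OccurrenceMap::const_iterator it = occ.begin(); it != occ.end(); ++it) {
        ids.push_back(it->first);
    }
    return ids;  // ascending here, because std::map iterates in key order
}

// Example: IDs come back sorted regardless of insertion order.
inline void example() {
    OccurrenceMap occ;
    occ[8].push_back(1);
    occ[2].push_back(1);
    std::vector<DocID> ids = collectDocIDs(occ);
    assert(ids.size() == 2);
    assert(ids[0] == 2 && ids[1] == 8);
}
```

The ascending order is a property of the std::map stand-in only; the source does not state whether documentIDs() guarantees any ordering.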

virtual unsigned int PhraseHunter::Token::numTokens ( ) const [inline, virtual]

Reimplemented in PhraseHunter::EmptyToken and PhraseHunter::Phrase.

Definition at line 94 of file token.h.

virtual TokenID PhraseHunter::Token::id ( ) const [inline, virtual]

Returns the ID for this Token.

Reimplemented in PhraseHunter::CorpusTokenBase.

Definition at line 96 of file token.h.

References PhraseHunter::InvalidTokenID.
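
The reference to InvalidTokenID suggests that the base implementation returns a sentinel and that descendants with a real index entry override it. The sketch below illustrates that pattern under those assumptions; BaseToken, IndexedToken, hasValidID() and the sentinel value chosen here are hypothetical and not taken from the PhraseHunter sources.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real type and sentinel value are not shown in this page.
typedef unsigned int TokenID;
const TokenID InvalidTokenID = static_cast<TokenID>(-1);

// Base behaviour: a token without an index entry reports the sentinel.
class BaseToken {
public:
    virtual ~BaseToken() {}
    virtual TokenID id() const { return InvalidTokenID; }
};

// Descendant that carries a real ID, in the way CorpusTokenBase reimplements id().
class IndexedToken : public BaseToken {
public:
    explicit IndexedToken(TokenID id) : m_id(id) {}
    virtual TokenID id() const { return m_id; }

private:
    TokenID m_id;
};

// Callers can test for the sentinel before using an ID.
inline bool hasValidID(const BaseToken& t) {
    return t.id() != InvalidTokenID;
}

// Example: only the indexed token reports a usable ID.
inline void example() {
    BaseToken plain;
    IndexedToken indexed(12u);
    assert(!hasValidID(plain));
    assert(hasValidID(indexed));
}
```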

Member Data Documentation

schma::UnicodePtr PhraseHunter::Token::m_tokenstring [protected]

Definition at line 49 of file token.h.

Referenced by PhraseHunter::Phrase::length(), length(), and tokenString().

OccurrenceMap PhraseHunter::Token::m_occurrences [protected]

Definition at line 50 of file token.h.

Referenced by PhraseHunter::MutableToken::addOccurrence(), allOccurrences(), documentFrequency(), documentIDs(), documentOccurrences(), inDoc(), PhraseHunter::CorpusToken::insertPositions(), isEmpty(), and PhraseHunter::MutableToken::removeDocument().

const int PhraseHunter::Token::SPACE_BETWEEN_TWO_TOKENS = 2 [static, protected]

Definition at line 56 of file token.h.

Referenced by length().

The documentation for this class was generated from the following files:

phrasehunter/include/phrasehunter/token.h
phrasehunter/lib/token.cpp

Generated on Thu Dec 21 16:14:44 2006 for The Phrasehunter by doxygen 1.5.1