#include <token.h>
Inherits safe_bool< T >< >.
Inherited by PhraseHunter::CorpusTokenBase, PhraseHunter::EmptyToken, and PhraseHunter::MutableToken.
Public Member Functions | |
virtual | ~Token () |
virtual size_t | length () const |
Returns the real length (in bytes) of the token string in the index. | |
bool | isUniform () const |
Returns true if all occurrences in all files are expected to look *exactly* the same. | |
virtual bool | isEmpty () const |
Returns true if there are no actual occurrences of this Token in a corpus, i.e. its occurrence map is empty. | |
virtual unsigned int | corpusFrequency () const=0 |
Returns the number of times this Token occurs in the corpus. | |
schma::UnicodePtr | tokenString () const |
Returns the string to which this Token belongs as a UnicodePtr. | |
virtual unsigned int | documentFrequency () const |
Returns the number of documents a Token occurs in. | |
bool | inDoc (DocID docID) const |
Returns true if this Token occurs in the document with the given docID. | |
const PositionList & | documentOccurrences (DocID docID) |
Returns a reference to the PositionList of all offsets of this Token in the document with the given docID. | |
const OccurrenceMap & | allOccurrences () const |
Returns a reference to the entire OccurrenceMap of this Token. | |
virtual std::vector< DocID > | documentIDs () const |
Returns the DocIDs of all documents in which this Token occurs. | |
virtual unsigned int | numTokens () const |
virtual TokenID | id () const |
Returns the ID for this Token. | |
Protected Member Functions | |
Token (const char *token) | |
Token (schma::UnicodePtr tokenstring) | |
Protected Attributes | |
schma::UnicodePtr | m_tokenstring |
OccurrenceMap | m_occurrences |
Static Protected Attributes | |
static const int | SPACE_BETWEEN_TWO_TOKENS = 2 |
A Token consists of a particular string (i.e. word type) and a map of its occurrences, i.e. documents and offsets, in the corpus. Tokens may be used in boolean expressions.
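The remark about boolean expressions follows from the safe_bool base class. The sketch below shows one common form of the safe bool idiom and is not the PhraseHunter implementation. The names safe_bool_sketch, TokenSketch, boolean_test and hasOccurrences are hypothetical, the typedefs are stand-ins for the real DocID, PositionList and OccurrenceMap, and the rule that a token tests true when it has at least one occurrence is an assumption.

```cpp
#include <map>
#include <vector>

// Hypothetical stand-ins; the real typedefs are defined elsewhere in PhraseHunter.
typedef unsigned int DocID;
typedef std::vector<unsigned int> PositionList;
typedef std::map<DocID, PositionList> OccurrenceMap;

// Simplified safe bool idiom (CRTP variant). The conversion target is a
// pointer to a private member function, so the object can be tested in a
// condition but cannot be silently converted to int or used in arithmetic.
template <typename T>
class safe_bool_sketch {
    typedef void (safe_bool_sketch::*bool_type)() const;
    void this_type_does_not_support_comparisons() const {}
public:
    operator bool_type() const {
        return static_cast<const T*>(this)->boolean_test()
            ? &safe_bool_sketch::this_type_does_not_support_comparisons
            : 0;
    }
protected:
    ~safe_bool_sketch() {}
};

class TokenSketch : public safe_bool_sketch<TokenSketch> {
public:
    void addOccurrence(DocID doc, unsigned int offset) {
        m_occurrences[doc].push_back(offset);
    }
    bool isEmpty() const { return m_occurrences.empty(); }
    // Assumption: a token is "true" when it has at least one occurrence.
    bool boolean_test() const { return !isEmpty(); }
private:
    OccurrenceMap m_occurrences;
};

// Uses the token directly in a boolean expression.
bool hasOccurrences(const TokenSketch& token) {
    return token ? true : false;
}
```

With a base class like this, client code can write `if (token) { ... }` without the class exposing a plain `operator bool`, which would also allow unintended conversions to integer types.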
Definition at line 46 of file token.h.
PhraseHunter::Token::Token | ( | const char * | token | ) | [inline, protected] |
PhraseHunter::Token::Token | ( | schma::UnicodePtr | tokenstring | ) | [inline, protected] |
virtual size_t PhraseHunter::Token::length | ( | ) | const [inline, virtual] |
Returns the real length (in bytes) of the token string in the index.
Reimplemented in PhraseHunter::Phrase.
Definition at line 62 of file token.h.
References m_tokenstring, and SPACE_BETWEEN_TWO_TOKENS.
bool PhraseHunter::Token::isUniform | ( | ) | const [inline] |
Returns true if all occurrences in all files are expected to look *exactly* the same.
virtual bool PhraseHunter::Token::isEmpty | ( | ) | const [inline, virtual] |
Returns true if there are no actual occurrences of this Token in a corpus, i.e. its occurrence map is empty.
Reimplemented in PhraseHunter::EmptyToken, and PhraseHunter::CorpusToken.
Definition at line 72 of file token.h.
References m_occurrences.
Referenced by PhraseHunter::Phrase::getAdjacent().
virtual unsigned int PhraseHunter::Token::corpusFrequency | ( | ) | const [pure virtual] |
Returns the number of times this Token occurs in the corpus.
Implemented in PhraseHunter::EmptyToken, PhraseHunter::CorpusTokenBase, and PhraseHunter::MutableToken.
schma::UnicodePtr PhraseHunter::Token::tokenString | ( | ) | const [inline] |
Returns the string to which this Token belongs as a UnicodePtr.
Definition at line 78 of file token.h.
References m_tokenstring.
virtual unsigned int PhraseHunter::Token::documentFrequency | ( | ) | const [inline, virtual] |
Returns the number of documents a Token occurs in.
Reimplemented in PhraseHunter::LightCorpusToken.
Definition at line 81 of file token.h.
References m_occurrences.
bool PhraseHunter::Token::inDoc | ( | DocID | docID | ) | const [inline] |
Returns true if this Token occurs in the document with the given docID.
Definition at line 84 of file token.h.
References m_occurrences.
Referenced by PhraseHunter::MutableToken::removeDocument().
const PositionList& PhraseHunter::Token::documentOccurrences | ( | DocID | docID | ) | [inline] |
Returns a reference to the PositionList of all offsets of this Token in the document with the given docID.
Definition at line 87 of file token.h.
References m_occurrences.
const OccurrenceMap& PhraseHunter::Token::allOccurrences | ( | ) | const [inline] |
Returns a reference to the entire OccurrenceMap of this Token.
Definition at line 90 of file token.h.
References m_occurrences.
std::vector< DocID > PhraseHunter::Token::documentIDs | ( | ) | const [virtual] |
Returns the DocIDs of all documents in which this Token occurs.
Definition at line 33 of file token.cpp.
References m_occurrences.
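The frequency and lookup members above all read from m_occurrences. As a rough illustration of how such queries could be answered from an occurrence map, the sketch below assumes that OccurrenceMap is a std::map from DocID to PositionList and that PositionList is a vector of offsets. Both typedefs are assumptions, since the real definitions are not shown on this page, and the free functions only mirror the member names for readability. makeSampleOccurrences is a hypothetical helper.

```cpp
#include <map>
#include <vector>

// Assumed shapes of the PhraseHunter types; the real typedefs may differ.
typedef unsigned int DocID;
typedef std::vector<unsigned int> PositionList;
typedef std::map<DocID, PositionList> OccurrenceMap;

// One map entry per document, so the map size is the document frequency.
unsigned int documentFrequency(const OccurrenceMap& occ) {
    return static_cast<unsigned int>(occ.size());
}

// A token occurs in a document if the map has an entry for that DocID.
bool inDoc(const OccurrenceMap& occ, DocID docID) {
    return occ.find(docID) != occ.end();
}

// Corpus frequency: total number of offsets summed over all documents.
unsigned int corpusFrequency(const OccurrenceMap& occ) {
    unsigned int total = 0;
    for (OccurrenceMap::const_iterator it = occ.begin(); it != occ.end(); ++it) {
        total += static_cast<unsigned int>(it->second.size());
    }
    return total;
}

// Collects the keys of the map, in ascending DocID order.
std::vector<DocID> documentIDs(const OccurrenceMap& occ) {
    std::vector<DocID> ids;
    ids.reserve(occ.size());
    for (OccurrenceMap::const_iterator it = occ.begin(); it != occ.end(); ++it) {
        ids.push_back(it->first);
    }
    return ids;
}

// Hypothetical sample data: two hits in document 3, one hit in document 8.
OccurrenceMap makeSampleOccurrences() {
    OccurrenceMap occ;
    occ[3].push_back(10);
    occ[3].push_back(57);
    occ[8].push_back(4);
    return occ;
}
```

For the sample map, the document frequency is 2 and the corpus frequency is 3. In the real class, corpusFrequency() is pure virtual, so subclasses may compute it differently.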
virtual unsigned int PhraseHunter::Token::numTokens | ( | ) | const [inline, virtual] |
Reimplemented in PhraseHunter::EmptyToken, and PhraseHunter::Phrase.
virtual TokenID PhraseHunter::Token::id | ( | ) | const [inline, virtual] |
Returns the ID for this Token.
Reimplemented in PhraseHunter::CorpusTokenBase.
Definition at line 96 of file token.h.
References PhraseHunter::InvalidTokenID.
schma::UnicodePtr PhraseHunter::Token::m_tokenstring [protected] |
Definition at line 49 of file token.h.
Referenced by PhraseHunter::Phrase::length(), length(), and tokenString().
OccurrenceMap PhraseHunter::Token::m_occurrences [protected] |
Definition at line 50 of file token.h.
Referenced by PhraseHunter::MutableToken::addOccurrence(), allOccurrences(), documentFrequency(), documentIDs(), documentOccurrences(), inDoc(), PhraseHunter::CorpusToken::insertPositions(), isEmpty(), and PhraseHunter::MutableToken::removeDocument().
const int PhraseHunter::Token::SPACE_BETWEEN_TWO_TOKENS = 2 [static, protected] |