java.lang.Object
  ws.qe.MatrixVector
This class contains the methods for creating matrices and calculating the
corresponding vectors, based on the so-called "Local Association Clustering"
and "Local Metric Clustering" algorithms.
The implementation of this class is built on Jakarta Lucene; for more
information, see the Jakarta Lucene javadoc.
See Also:
Tools.queryExpansionResult(String, ArrayList, ArrayList, float[], String)
| Field Summary | |
private int |
numDocs
The number of documents. |
private int |
numStems
The number of stems found in all the documents. |
private static java.lang.String |
stemIndexPath
The index directory for the documents of stems. |
private java.util.ArrayList |
stemSet
The set of the stems found in all the documents. |
private static java.lang.String |
tokenIndexPath
The index directory for the documents of tokens. |
private java.util.ArrayList |
tokenSet
The set of the tokens found in all the documents. |
| Constructor Summary | |
MatrixVector(java.lang.String stemIndexPath,
java.lang.String tokenIndexPath,
java.util.ArrayList tokenSet,
java.util.ArrayList stemSet)
Initializes a MatrixVector object. |
|
| Method Summary | |
private int |
Correlation_AC(java.util.Hashtable docStemMatrix,
int u,
int v)
This method calculates C(u, v), the so-called "unnormalized association correlation factor" based on the algorithm "Association Clustering". |
private static float |
Correlation_MC(java.util.ArrayList tokenListu,
int sizeu,
java.util.ArrayList tokenListv,
int sizev)
This method calculates C(u, v), the so-called "unnormalized metric correlation factor" based on the algorithm "Metric Clustering". |
private static float |
distanceTokens_MC(int[] posListu,
int[] posListv)
This method calculates, within one document, the sum of the distances between all token occurrences of two words. |
private static float |
distanceWords_MC(java.util.Hashtable docPosu,
java.util.Hashtable docPosv)
This method calculates the distance between two words in a document. |
protected java.util.Hashtable |
getDocStemMatrix_AC()
Creates the so-called "document-stem-matrix" based on the algorithm "Association Clustering". |
protected float[] |
getStemStemVector_AC(java.util.Hashtable docStemMatrix,
java.lang.String query)
This method calculates one "stem-stem-vector" for a given stemmed query in the normalized "stem-stem-matrix" based on the algorithm "Association Clustering". |
static float[] |
getTopStemStemVector_MC(java.lang.String query,
java.util.ArrayList tokenSet,
java.util.ArrayList stemSet,
int[] topStemsPosition)
This method calculates only part of the "stem-stem-vector" for a given stemmed query in the normalized "stem-stem-matrix" based on the algorithm "Metric Clustering". |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
private static java.lang.String stemIndexPath
See Also:
Tools.indexDocs(String, String),
Tools.doIndexing(IndexWriter, File)
private static java.lang.String tokenIndexPath
See Also:
Tools.indexDocs(String, String),
Tools.doIndexing(IndexWriter, File)
private java.util.ArrayList stemSet
See Also:
Tools.getSet(String)
private java.util.ArrayList tokenSet
See Also:
Tools.getSet(String)
private int numDocs
See Also:
Google.getURLsList(JDialog, String, int),
Tools.WebTextTokenStem(String[], String, String, String, String),
Tools.UrlWebTextTokenStem(String, int)
private int numStems
See Also:
Tools.getSet(String)
| Constructor Detail |
public MatrixVector(java.lang.String stemIndexPath,
java.lang.String tokenIndexPath,
java.util.ArrayList tokenSet,
java.util.ArrayList stemSet)
throws java.io.IOException
Parameters:
stemIndexPath - The index directory for the documents of stems.
tokenIndexPath - The index directory for the documents of tokens.
tokenSet - The set of the tokens found in all the documents.
stemSet - The set of the stems found in all the documents.
Throws:
java.io.IOException
See Also:
Tools.indexDocs(String, String),
Tools.getSet(String)
| Method Detail |
protected java.util.Hashtable getDocStemMatrix_AC()
throws java.io.IOException
Throws:
java.io.IOException
See Also:
org.apache.lucene.index.TermDocs
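The layout of the returned Hashtable is not documented here. As a minimal sketch, assume the "document-stem-matrix" maps each document number to a row of stem frequencies, with one column per entry in the stem set. The class name DocStemMatrixSketch, the method build, and the in-memory document lists are hypothetical stand-ins; the real method reads its data from the Lucene index.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;

// Hypothetical illustration only; not the ws.qe.MatrixVector implementation.
final class DocStemMatrixSketch {

    // Builds docId -> frequency row, where row[i] counts occurrences of stemSet.get(i).
    // Assumption: each document is already available as a list of stems.
    static Hashtable<Integer, int[]> build(List<List<String>> docs, List<String> stemSet) {
        Hashtable<Integer, int[]> matrix = new Hashtable<>();
        for (int d = 0; d < docs.size(); d++) {
            int[] row = new int[stemSet.size()];
            for (String stem : docs.get(d)) {
                int i = stemSet.indexOf(stem);
                if (i >= 0) {
                    row[i]++; // stems outside the stem set are ignored
                }
            }
            matrix.put(d, row);
        }
        return matrix;
    }

    public static void main(String[] args) {
        List<String> stemSet = Arrays.asList("cluster", "metric", "queri");
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("cluster", "queri", "cluster"),
                Arrays.asList("metric", "cluster"));
        Hashtable<Integer, int[]> m = build(docs, stemSet);
        System.out.println(Arrays.toString(m.get(0))); // [2, 0, 1]
        System.out.println(Arrays.toString(m.get(1))); // [1, 1, 0]
    }
}
```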
protected float[] getStemStemVector_AC(java.util.Hashtable docStemMatrix,
java.lang.String query)
throws java.io.IOException
Parameters:
docStemMatrix - The so-called "document-stem-matrix".
query - The query, an English word; the same word delivered by the "Word" input field.
Throws:
java.io.IOException
See Also:
Correlation_AC(Hashtable, int, int),
getTopStemStemVector_MC(String, ArrayList, ArrayList, int[]),
Tools.queryExpansionResult(String, ArrayList, ArrayList, float[], String)
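In the textbook formulation of association clusters, the normalized factor is S(u, v) = C(u, v) / (C(u, u) + C(v, v) - C(u, v)), and the "stem-stem-vector" for a query is row u of that matrix. The sketch below assumes this formula applies here, assumes the matrix layout (document number mapped to a row of stem frequencies), and takes the row index u directly instead of a query string. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.Hashtable;

// Hypothetical illustration only; not the ws.qe.MatrixVector implementation.
final class StemStemVectorSketch {

    // Unnormalized factor C(u, v): sum over documents of f(u, d) * f(v, d).
    static int correlation(Hashtable<Integer, int[]> docStemMatrix, int u, int v) {
        int sum = 0;
        for (int[] row : docStemMatrix.values()) {
            sum += row[u] * row[v];
        }
        return sum;
    }

    // Row u of the normalized matrix: S(u, v) = C(u, v) / (C(u, u) + C(v, v) - C(u, v)).
    static float[] normalizedRow(Hashtable<Integer, int[]> docStemMatrix, int u, int numStems) {
        float[] row = new float[numStems];
        int cuu = correlation(docStemMatrix, u, u);
        for (int v = 0; v < numStems; v++) {
            int cuv = correlation(docStemMatrix, u, v);
            int denom = cuu + correlation(docStemMatrix, v, v) - cuv;
            row[v] = denom == 0 ? 0f : (float) cuv / denom; // both stems absent: define as 0
        }
        return row;
    }

    public static void main(String[] args) {
        Hashtable<Integer, int[]> m = new Hashtable<>();
        m.put(0, new int[] {2, 0, 1});
        m.put(1, new int[] {1, 1, 0});
        System.out.println(Arrays.toString(normalizedRow(m, 0, 3))); // [1.0, 0.2, 0.5]
    }
}
```

The entry for the query stem itself is always 1 when the stem occurs at all, so a caller ranking expansion candidates would normally skip position u.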
private int Correlation_AC(java.util.Hashtable docStemMatrix,
int u,
int v)
throws java.io.IOException
Parameters:
docStemMatrix - The so-called "document-stem-matrix".
u - The u-th row in "stem-stem-matrix".
v - The v-th column in "stem-stem-matrix".
Throws:
java.io.IOException
See Also:
getStemStemVector_AC(Hashtable, String)
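In the textbook definition of association clusters, the unnormalized factor is the sum, over all documents, of the product of the two stems' frequencies in that document. A minimal sketch of that sum follows, assuming the matrix maps a document number to a row of stem frequencies; the class and method names are hypothetical, and the actual private method may differ.

```java
import java.util.Hashtable;

// Hypothetical illustration only; not the ws.qe.MatrixVector implementation.
final class AssociationCorrelationSketch {

    // C(u, v) = sum over documents d of f(u, d) * f(v, d),
    // where f(s, d) is the frequency of stem s in document d.
    static int correlation(Hashtable<Integer, int[]> docStemMatrix, int u, int v) {
        int sum = 0;
        for (int[] row : docStemMatrix.values()) {
            sum += row[u] * row[v];
        }
        return sum;
    }

    public static void main(String[] args) {
        Hashtable<Integer, int[]> m = new Hashtable<>();
        m.put(0, new int[] {2, 0, 1});
        m.put(1, new int[] {1, 1, 0});
        System.out.println(correlation(m, 0, 0)); // 5
        System.out.println(correlation(m, 0, 1)); // 1
        System.out.println(correlation(m, 0, 2)); // 2
    }
}
```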
public static float[] getTopStemStemVector_MC(java.lang.String query,
java.util.ArrayList tokenSet,
java.util.ArrayList stemSet,
int[] topStemsPosition)
throws java.io.IOException
Parameters:
query - The query, an English word; the same word delivered by the "Word" input field.
tokenSet - The set of the tokens found in all the documents.
stemSet - The set of the stems found in all the documents.
topStemsPosition - The positions in the stem set of the stems that have the top
values in the "stem-stem-vector" based on the algorithm
"Association Clustering".
Throws:
java.io.IOException
See Also:
Correlation_MC(ArrayList, int, ArrayList, int),
distanceWords_MC(Hashtable, Hashtable),
distanceTokens_MC(int[], int[]),
getStemStemVector_AC(Hashtable, String),
Tools.queryExpansionResult(String, ArrayList, ArrayList, float[], String)
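The topStemsPosition argument holds the positions of the highest-scoring stems from the association-clustering vector. How the caller selects them is not shown on this page; one plausible way, sketched here under that assumption, is to sort the positions by descending score and keep the first k. TopStemsSketch and topPositions are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration only; the selection used by the real caller is not documented.
final class TopStemsSketch {

    // Returns the positions of the k largest values in the vector, best first.
    static int[] topPositions(float[] vector, int k) {
        Integer[] idx = new Integer[vector.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        // Sort positions by descending score; the object sort is stable, so ties keep index order.
        Arrays.sort(idx, (a, b) -> Float.compare(vector[b], vector[a]));
        int n = Math.max(0, Math.min(k, idx.length));
        int[] top = new int[n];
        for (int i = 0; i < n; i++) {
            top[i] = idx[i];
        }
        return top;
    }

    public static void main(String[] args) {
        float[] acVector = {1.0f, 0.2f, 0.5f, 0.0f};
        System.out.println(Arrays.toString(topPositions(acVector, 2))); // [0, 2]
    }
}
```

Restricting the metric computation to these positions is what lets the method calculate "only part" of the vector, since the position-based distances are more expensive to evaluate than frequency products.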
private static float Correlation_MC(java.util.ArrayList tokenListu,
int sizeu,
java.util.ArrayList tokenListv,
int sizev)
throws java.io.IOException
Parameters:
tokenListu - A list of words for the u-th stem in "stem-stem-matrix".
sizeu - The size of tokenListu.
tokenListv - A list of words for the v-th stem in "stem-stem-matrix".
sizev - The size of tokenListv.
Throws:
java.io.IOException
See Also:
getTopStemStemVector_MC(String, ArrayList, ArrayList, int[]),
distanceTokens_MC(int[], int[]),
distanceWords_MC(Hashtable, Hashtable),
org.apache.lucene.index.TermPositions
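In the textbook definition of metric clusters, the unnormalized factor sums 1 / r(ki, kj) over every word ki that reduces to stem u and every word kj that reduces to stem v, where r is the distance in word positions between the two words. The sketch below assumes that formula and simplifies heavily: each word occurs at most once, in a single document, at a known position. The real method reads positions from the Lucene index, and all names here are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the ws.qe.MatrixVector implementation.
final class MetricCorrelationSketch {

    // C(u, v) = sum over word pairs (ku, kv) of 1 / r(ku, kv).
    // Assumption: 'position' gives the single position of each word in one document.
    static float correlation(List<String> wordsU, List<String> wordsV, Map<String, Integer> position) {
        float sum = 0f;
        for (String ku : wordsU) {
            for (String kv : wordsV) {
                Integer pu = position.get(ku);
                Integer pv = position.get(kv);
                if (pu == null || pv == null) {
                    continue; // absent word: treated as infinitely far away, contributes 0
                }
                int r = Math.abs(pu - pv);
                if (r > 0) {
                    sum += 1f / r; // skip r == 0, which means the same occurrence
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> position = new HashMap<>();
        position.put("clusters", 3);
        position.put("clustering", 10);
        position.put("metric", 5);
        float c = correlation(Arrays.asList("clusters", "clustering"),
                Arrays.asList("metric"), position);
        System.out.println(c); // about 0.7, from 1/2 + 1/5
    }
}
```

In the same textbook formulation, the normalized factor divides this sum by the product of the two word-list sizes, which may be why the method also receives sizeu and sizev.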
private static float distanceWords_MC(java.util.Hashtable docPosu,
java.util.Hashtable docPosv)
Parameters:
docPosu - The hashtable storing all words and their positions for the u-th
stem in "stem-stem-matrix".
docPosv - The hashtable storing all words and their positions for the v-th
stem in "stem-stem-matrix".
See Also:
getTopStemStemVector_MC(String, ArrayList, ArrayList, int[]),
distanceTokens_MC(int[], int[]),
distanceWords_MC(Hashtable, Hashtable)
private static float distanceTokens_MC(int[] posListu,
int[] posListv)
Parameters:
posListu - The positions of all the tokens belonging to one word in a
document; this word is one of the words sharing the same u-th
stem in "stem-stem-matrix".
posListv - The positions of all the tokens belonging to one word in a
document; this word is one of the words sharing the same v-th
stem in "stem-stem-matrix".
See Also:
getTopStemStemVector_MC(String, ArrayList, ArrayList, int[]),
distanceTokens_MC(int[], int[]),
distanceWords_MC(Hashtable, Hashtable)
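Read literally, the description of distanceTokens_MC is a sum of distances over every pair of token positions, one taken from each list. The sketch below follows that literal reading; whether the real method sums raw distances or their reciprocals (as the textbook metric-clustering formula does) is not stated on this page, so treat the choice as an assumption. The names are hypothetical.

```java
// Hypothetical illustration only; not the ws.qe.MatrixVector implementation.
final class DistanceTokensSketch {

    // Sums |p - q| over every pair (p, q) with p from posListU and q from posListV.
    static float sumOfDistances(int[] posListU, int[] posListV) {
        float sum = 0f;
        for (int p : posListU) {
            for (int q : posListV) {
                sum += Math.abs(p - q);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Pairs: |1-2| + |1-7| + |4-2| + |4-7| = 1 + 6 + 2 + 3
        System.out.println(sumOfDistances(new int[] {1, 4}, new int[] {2, 7})); // 12.0
    }
}
```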