java.lang.Object
  ws.qe.MatrixVector
This class provides methods for building matrices and computing the
corresponding vectors, based on the "Local Association Clustering" and
"Local Metric Clustering" algorithms.
The implementation relies on Jakarta Lucene; for more information, see the
Jakarta Lucene javadoc.
See Also:
Tools.queryExpansionResult(String, ArrayList, ArrayList, float[], String)
Field Summary | |
private int |
numDocs
The number of documents. |
private int |
numStems
The number of stems found in all the documents. |
private static java.lang.String |
stemIndexPath
The index directory for the documents of stems. |
private java.util.ArrayList |
stemSet
The set of the stems found in all the documents. |
private static java.lang.String |
tokenIndexPath
The index directory for the documents of tokens. |
private java.util.ArrayList |
tokenSet
The set of the tokens found in all the documents. |
Constructor Summary | |
MatrixVector(java.lang.String stemIndexPath,
java.lang.String tokenIndexPath,
java.util.ArrayList tokenSet,
java.util.ArrayList stemSet)
Initializes a MatrixVector object. |
Method Summary | |
private int |
Correlation_AC(java.util.Hashtable docStemMatrix,
int u,
int v)
This method calculates C(u, v), the so-called "unnormalized association correlation factor" based on the algorithm "Association Clustering". |
private static float |
Correlation_MC(java.util.ArrayList tokenListu,
int sizeu,
java.util.ArrayList tokenListv,
int sizev)
This method calculates C(u, v), the so-called "unnormalized metric correlation factor" based on the algorithm "Metric Clustering". |
private static float |
distanceTokens_MC(int[] posListu,
int[] posListv)
This method calculates, within one document, the sum of the distances between all token occurrences that belong to two words. |
private static float |
distanceWords_MC(java.util.Hashtable docPosu,
java.util.Hashtable docPosv)
This method calculates the distance between two words in a document. |
protected java.util.Hashtable |
getDocStemMatrix_AC()
Creates the so-called "document-stem-matrix" based on the algorithm "Association Clustering". |
protected float[] |
getStemStemVector_AC(java.util.Hashtable docStemMatrix,
java.lang.String query)
This method calculates one "stem-stem-vector" for a given stemmed query in the normalized "stem-stem-matrix" based on the algorithm "Association Clustering". |
static float[] |
getTopStemStemVector_MC(java.lang.String query,
java.util.ArrayList tokenSet,
java.util.ArrayList stemSet,
int[] topStemsPosition)
This method calculates only part of the "stem-stem-vector" for a given stemmed query in the normalized "stem-stem-matrix" based on the algorithm "Metric Clustering". |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
private static java.lang.String stemIndexPath
See Also:
Tools.indexDocs(String, String), Tools.doIndexing(IndexWriter, File)
private static java.lang.String tokenIndexPath
See Also:
Tools.indexDocs(String, String), Tools.doIndexing(IndexWriter, File)
private java.util.ArrayList stemSet
See Also:
Tools.getSet(String)
private java.util.ArrayList tokenSet
See Also:
Tools.getSet(String)
private int numDocs
See Also:
Google.getURLsList(JDialog, String, int), Tools.WebTextTokenStem(String[], String, String, String, String), Tools.UrlWebTextTokenStem(String, int)
private int numStems
See Also:
Tools.getSet(String)
Constructor Detail |
public MatrixVector(java.lang.String stemIndexPath, java.lang.String tokenIndexPath, java.util.ArrayList tokenSet, java.util.ArrayList stemSet) throws java.io.IOException
Parameters:
stemIndexPath - The index directory for the documents of stems.
tokenIndexPath - The index directory for the documents of tokens.
tokenSet - The set of the tokens found in all the documents.
stemSet - The set of the stems found in all the documents.
Throws:
java.io.IOException
See Also:
Tools.indexDocs(String, String), Tools.getSet(String)
Method Detail |
protected java.util.Hashtable getDocStemMatrix_AC() throws java.io.IOException
Throws:
java.io.IOException
See Also:
org.apache.lucene.index.TermDocs
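To illustrate what a "document-stem-matrix" holds, the following is a minimal sketch that counts stem frequencies per document. It is not the class's implementation: the original method reads a Lucene index and returns a Hashtable, whereas this sketch assumes in-memory lists of already stemmed documents and a plain int array. The names DocStemMatrixSketch and build are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; the real method works on a Lucene index.
final class DocStemMatrixSketch {

    /** Returns m where m[d][s] is how often stemSet.get(s) occurs in document d. */
    static int[][] build(List<List<String>> stemmedDocs, List<String> stemSet) {
        int[][] matrix = new int[stemmedDocs.size()][stemSet.size()];
        for (int d = 0; d < stemmedDocs.size(); d++) {
            for (String stem : stemmedDocs.get(d)) {
                int s = stemSet.indexOf(stem);
                if (s >= 0) { // ignore stems that are not in the stem set
                    matrix[d][s]++;
                }
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        List<String> stemSet = Arrays.asList("index", "search", "queri");
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("index", "search", "index"),
                Arrays.asList("queri", "search"));
        // prints [[2, 1, 0], [0, 1, 1]]
        System.out.println(Arrays.deepToString(build(docs, stemSet)));
    }
}
```

Each row corresponds to one document and each column to one stem, so numDocs and numStems would be the two dimensions of this array.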
protected float[] getStemStemVector_AC(java.util.Hashtable docStemMatrix, java.lang.String query) throws java.io.IOException
Parameters:
docStemMatrix - The so-called "document-stem-matrix".
query - The query, an English word; the same word delivered by the "Word" input field.
Throws:
java.io.IOException
See Also:
Correlation_AC(Hashtable, int, int), getTopStemStemVector_MC(String, ArrayList, ArrayList, int[]), Tools.queryExpansionResult(String, ArrayList, ArrayList, float[], String)
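The normalization behind a "stem-stem-vector" can be sketched as follows, assuming the class follows the usual textbook definition of association clusters, s(u, v) = c(u, v) / (c(u, u) + c(v, v) - c(u, v)). This page does not state the formula, so treat it as an assumption. The sketch takes a precomputed matrix of unnormalized factors as an int array; the name AssociationVectorSketch and its method are hypothetical.

```java
// Hypothetical illustration of normalizing one row of a stem-stem matrix.
final class AssociationVectorSketch {

    /**
     * Returns row u of the normalized matrix, where c[u][v] is the
     * unnormalized association correlation factor of stems u and v.
     */
    static float[] normalizedRow(int[][] c, int u) {
        float[] row = new float[c.length];
        for (int v = 0; v < c.length; v++) {
            int denominator = c[u][u] + c[v][v] - c[u][v];
            // Guard against stems that never occur (all factors zero).
            row[v] = denominator == 0 ? 0f : (float) c[u][v] / denominator;
        }
        return row;
    }
}
```

With this normalization a stem always has the value 1 against itself, and the remaining entries rank the other stems as expansion candidates for the query stem.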
private int Correlation_AC(java.util.Hashtable docStemMatrix, int u, int v) throws java.io.IOException
Parameters:
docStemMatrix - The so-called "document-stem-matrix".
u - The u-th row in "stem-stem-matrix".
v - The v-th column in "stem-stem-matrix".
Throws:
java.io.IOException
See Also:
getStemStemVector_AC(Hashtable, String)
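As a rough illustration of the unnormalized factor C(u, v), the sketch below uses the textbook definition for association clusters: the sum, over all documents, of the product of the two stems' frequencies. That definition is an assumption here, as is the int array standing in for the Hashtable-based "document-stem-matrix". The names are hypothetical.

```java
// Hypothetical illustration; the real method reads a Hashtable-based matrix.
final class AssociationCorrelationSketch {

    /** C(u, v) = sum over documents d of f(u, d) * f(v, d). */
    static int correlation(int[][] docStemMatrix, int u, int v) {
        int sum = 0;
        for (int[] doc : docStemMatrix) {
            sum += doc[u] * doc[v];
        }
        return sum;
    }
}
```

Two stems get a high factor only when they occur frequently in the same documents; stems that never share a document get 0.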
public static float[] getTopStemStemVector_MC(java.lang.String query, java.util.ArrayList tokenSet, java.util.ArrayList stemSet, int[] topStemsPosition) throws java.io.IOException
Parameters:
query - The query, an English word; the same word delivered by the "Word" input field.
tokenSet - The set of the tokens found in all the documents.
stemSet - The set of the stems found in all the documents.
topStemsPosition - The positions in the stem set of the stems that have the top values in the "stem-stem-vector" computed by the "Association Clustering" algorithm.
Throws:
java.io.IOException
See Also:
Correlation_MC(ArrayList, int, ArrayList, int), distanceWords_MC(Hashtable, Hashtable), distanceTokens_MC(int[], int[]), getStemStemVector_AC(Hashtable, String), Tools.queryExpansionResult(String, ArrayList, ArrayList, float[], String)
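The idea of computing "only part" of the vector can be sketched as below: only the positions listed in topStemsPosition are filled in, which avoids the more expensive metric computation for every stem. The sketch assumes the textbook normalization for metric clusters, s(u, v) = c(u, v) / (|V(u)| * |V(v)|), where |V(x)| is the number of words sharing stem x. Whether the original returns a full-length or a shortened array is not stated, so the full-length result is an assumption, and all names other than topStemsPosition are hypothetical.

```java
// Hypothetical illustration of a partially filled, normalized metric vector.
final class TopMetricVectorSketch {

    /**
     * cRow[v] is the unnormalized metric correlation of stem u with stem v,
     * clusterSizes[x] the number of words sharing stem x. Only entries named
     * in topStemsPosition are computed; all others stay 0.
     */
    static float[] partialVector(float[] cRow, int[] clusterSizes, int u,
                                 int[] topStemsPosition) {
        float[] vector = new float[cRow.length];
        for (int v : topStemsPosition) {
            int product = clusterSizes[u] * clusterSizes[v];
            vector[v] = product == 0 ? 0f : cRow[v] / product;
        }
        return vector;
    }
}
```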
private static float Correlation_MC(java.util.ArrayList tokenListu, int sizeu, java.util.ArrayList tokenListv, int sizev) throws java.io.IOException
Parameters:
tokenListu - A list of words for the u-th stem in "stem-stem-matrix".
sizeu - The size of the list above.
tokenListv - A list of words for the v-th stem in "stem-stem-matrix".
sizev - The size of the list above.
Throws:
java.io.IOException
See Also:
getTopStemStemVector_MC(String, ArrayList, ArrayList, int[]), distanceTokens_MC(int[], int[]), distanceWords_MC(Hashtable, Hashtable), org.apache.lucene.index.TermPositions
private static float distanceWords_MC(java.util.Hashtable docPosu, java.util.Hashtable docPosv)
Parameters:
docPosu - The hashtable storing all words and their positions for the u-th stem in "stem-stem-matrix".
docPosv - The hashtable storing all words and their positions for the v-th stem in "stem-stem-matrix".
See Also:
getTopStemStemVector_MC(String, ArrayList, ArrayList, int[]), distanceTokens_MC(int[], int[]), distanceWords_MC(Hashtable, Hashtable)
private static float distanceTokens_MC(int[] posListu, int[] posListv)
Parameters:
posListu - The positions of all the tokens belonging to one word in a document; this word is one of the words sharing the same u-th stem in "stem-stem-matrix".
posListv - The positions of all the tokens belonging to one word in a document; this word is one of the words sharing the same v-th stem in "stem-stem-matrix".
See Also:
getTopStemStemVector_MC(String, ArrayList, ArrayList, int[]), distanceTokens_MC(int[], int[]), distanceWords_MC(Hashtable, Hashtable)
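A possible shape for the token-level computation is sketched below. In the textbook formulation of metric clusters, each pair of occurrences contributes the inverse of its distance, so closer occurrences weigh more. This page only says that distances are summed, so the inverse weighting is an assumption, as are the class and method names.

```java
// Hypothetical illustration; the original may weight distances differently.
final class TokenDistanceSketch {

    /** Sums 1 / |p - q| over all pairs p in posListU, q in posListV. */
    static float inverseDistanceSum(int[] posListU, int[] posListV) {
        float sum = 0f;
        for (int p : posListU) {
            for (int q : posListV) {
                int distance = Math.abs(p - q);
                if (distance > 0) { // skip identical positions
                    sum += 1f / distance;
                }
            }
        }
        return sum;
    }
}
```

For example, positions {1, 5} and {3} give two pairs at distance 2, so the sum is 1.0.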