Resources / stat_ml
Resources
-
Banjo
-
Cluto
-
GibbsLDA++GibbsLDA++ is a C/C++ implementation of Latent Dirichlet Allocation (LDA) using Gibbs Sampling technique for parameter estimation and inference. It is very fast and is designed to analyze hidden/latent topic structures of large-scale datasets including large collections of text/Web documents. Note: This one does not run on ella.
-
LibSVM
-
Lucene
-
MALLET
-
Maximum Entropy Toolkit
-
PyLucene
-
PySVMLight
-
Reverend
-
SNoWThe SNoW (Sparse Network of Winnows) learning architecture is a multi-class classifier that is specifically tailored for large scale learning tasks and fpr domains in which the potential number of features taking part in decisions is very large, but may be unknown a priori. It learns a sparse network of linear functions in which the targets concepts (class labels) are represented as linear functions over a common feature space.
-
SVDLIBCSVDLIBC is a C library based on the SVDPACKC library. SVDLIBC offers a cleaned-up version of the code with a sane library interface and a front-end executable that performs matrix file type conversions, along with computing singular value decompositions. Currently the only SVDPACKC algorithm implemented in SVDLIBC is las2, because it seems to be consistently the fastest. This algorithm has the drawback that the low order singular values may be relatively imprecise, but that is not a problem for most users who only want the higher-order values or who can tolerate some imprecision.
-
Semantic VectorsSemantic Vector indexes, created by applying a Random Projection algorithm to term-document matrices created using Apache Lucene. The package was created as part of a project by the University of Pittsburgh Office of Technology Management, to explore the potential for automatically matching related concepts in them technology management domain, e.g., mapping new technologies to potentatially interested licensors.
-
TiMBL
-
TinySVMTinySVM is an implementation of Support Vector Machines (SVMs) for the problem of pattern recognition. Support Vector Machines is a new generation learning algorithms based on recent advances in statistical learning theory, and applied to large number of real-world applications, such as text categorization, hand-written character recognition.
-
Weka
-
crf++
-
gensim
