Resources

  • Banjo
    Banjo is a software application and framework for structure learning of static and dynamic Bayesian networks.
    2.0.1: index
  • Cluto
    CLUTO is a family of computationally efficient, high-quality data clustering and cluster analysis programs and libraries that are well suited to both low- and high-dimensional data sets.
  • GibbsLDA++
    GibbsLDA++ is a C/C++ implementation of Latent Dirichlet Allocation (LDA) that uses Gibbs sampling for parameter estimation and inference. It is very fast and is designed to analyze hidden/latent topic structures of large-scale datasets, including large collections of text/Web documents. Note: This one does not run on ella.
  • LibSVM
    LIBSVM is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). It supports multi-class classification.
  • Lucene
    Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
    3.0.1: index | 2.4.0: index | 2.3.2: index
  • MALLET
    MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
    0.4: index
  • Maximum Entropy Toolkit
    The Maximum Entropy Toolkit provides a set of tools and a library for constructing maximum entropy (maxent) models in either Python or C++.
  • PyLucene
    Python extension for accessing Java Lucene
  • PySVMLight
    A Python binding to the SVM-Light support vector machine library by Thorsten Joachims.
  • Reverend
    Reverend is a simple Bayesian classifier. It is designed to be easy to adapt and extend for your application.
  • SNoW
    The SNoW (Sparse Network of Winnows) learning architecture is a multi-class classifier that is specifically tailored for large-scale learning tasks and for domains in which the potential number of features taking part in decisions is very large, but may be unknown a priori. It learns a sparse network of linear functions in which the target concepts (class labels) are represented as linear functions over a common feature space.
  • SVDLIBC
    SVDLIBC is a C library based on the SVDPACKC library. SVDLIBC offers a cleaned-up version of the code with a sane library interface and a front-end executable that performs matrix file type conversions, along with computing singular value decompositions. Currently the only SVDPACKC algorithm implemented in SVDLIBC is las2, because it seems to be consistently the fastest. This algorithm has the drawback that the low order singular values may be relatively imprecise, but that is not a problem for most users who only want the higher-order values or who can tolerate some imprecision.
  • Semantic Vectors
    Semantic Vectors creates semantic vector indexes by applying a Random Projection algorithm to term-document matrices created using Apache Lucene. The package was created as part of a project by the University of Pittsburgh Office of Technology Management to explore the potential for automatically matching related concepts in the technology management domain, e.g., mapping new technologies to potentially interested licensors.
    1.10: index
  • TiMBL
    Tilburg Memory Based Learner
    6.2.1: index | 6.1.2: index
  • TinySVM
    TinySVM is an implementation of Support Vector Machines (SVMs) for the problem of pattern recognition. Support Vector Machines are a new generation of learning algorithms based on recent advances in statistical learning theory, and they have been applied to a large number of real-world applications, such as text categorization and hand-written character recognition.
  • Weka
    Weka is a collection of machine learning algorithms for data mining tasks.
    3.6.1: index
  • crf++
    CRF++ is a simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data.
  • gensim
    Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.
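
As a rough illustration of the random projection idea mentioned in the Semantic Vectors entry above, the following sketch builds a small term-document matrix and projects each term into a lower-dimensional space. It is plain Python and does not use the Semantic Vectors or Lucene APIs; the function names, the choice of +1/-1 random entries, and the sample documents are all assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def term_document_counts(docs):
    """Build a sparse term-document matrix: term -> {doc_id: count}."""
    matrix = {}
    for doc_id, text in enumerate(docs):
        for term, count in Counter(text.lower().split()).items():
            matrix.setdefault(term, {})[doc_id] = count
    return matrix


def random_projection(matrix, n_docs, dim=64, seed=0):
    """Project each term row onto `dim` dimensions.

    Every document gets a fixed random vector of +1/-1 entries
    (an illustrative choice). A term's vector is the count-weighted
    sum of the random vectors of the documents it occurs in, which is
    the term-document matrix multiplied by a random matrix.
    """
    rng = random.Random(seed)
    doc_vectors = [
        [rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(n_docs)
    ]
    term_vectors = {}
    for term, row in matrix.items():
        vec = [0.0] * dim
        for doc_id, count in row.items():
            for i, x in enumerate(doc_vectors[doc_id]):
                vec[i] += count * x
        term_vectors[term] = vec
    return term_vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical sample documents, not taken from any real collection.
    docs = [
        "patent license technology",
        "patent license agreement",
        "gene protein sequence",
    ]
    vectors = random_projection(term_document_counts(docs), len(docs))
    print(cosine(vectors["patent"], vectors["license"]))
    print(cosine(vectors["patent"], vectors["gene"]))
```

Terms that occur in the same documents with the same counts receive the same projected vector, so their cosine similarity is 1 up to rounding. Terms with no documents in common usually score much lower, although the exact value depends on the random vectors.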