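A Roundup of Machine Learning Resources

Description

The tool libraries below are grouped by programming language and application area. The list is updated continuously.

C

General-Purpose Machine Learning

  • Recommender - A C library for product recommendations that uses collaborative filtering.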
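
As a rough illustration of the collaborative filtering idea named in the Recommender entry, here is a minimal user-based sketch in Python. It does not use Recommender's API; the ratings, the `cosine` and `recommend` helpers, and the choice of cosine similarity are all assumptions made for this example.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}.
RATINGS = {
    "ann": {"book": 5.0, "lamp": 3.0, "mug": 4.0},
    "bob": {"book": 4.0, "lamp": 2.0, "pen": 5.0},
    "cat": {"lamp": 5.0, "mug": 1.0, "pen": 2.0},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by the similarity-weighted
    average of the ratings that other users gave them."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0.0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only suggest unseen items
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:top_n]]


print(recommend("ann", RATINGS))
```

A production recommender would normally add rating normalization and a neighborhood size limit; both are left out here to keep the sketch short.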

Computer Vision

  • CCV - C-based/Cached/Core Computer Vision Library, a modern computer vision library.

  • VLFeat - An open-source library of computer vision algorithms, with a Matlab toolbox.

C++

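Computer Vision

  • OpenCV - The most widely used vision library. It has C++, C, Python, and Java interfaces and supports Windows, Linux, Android, and Mac OS.

  • DLib - DLib has C++ and Python interfaces for face recognition and object detection.

  • EBLearn - Eblearn is an object-oriented C++ library that implements various machine learning models.

  • VIGRA - VIGRA is a cross-platform computer vision and machine learning library that handles data of arbitrary dimensionality and has Python bindings.

General-Purpose Machine Learning

  • MLPack - A scalable C++ machine learning library.

  • DLib - Designed to be easy to embed in other systems.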

  • encog-cpp

  • shark

  • Vowpal Wabbit (VW) - A fast out-of-core learning system.

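  • sofia-ml - A suite of fast incremental algorithms.

  • Shogun - The Shogun Machine Learning Toolbox

  • Caffe - A deep learning framework with a clean structure, readable code, and good speed.

  • CXXNET - A lean framework whose core code is under 1,000 lines.

  • XGBoost - A gradient boosting library optimized for parallel computation.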

  • CUDA - A fast C++/CUDA implementation of convolutional neural networks [DEEP LEARNING]

  • Stan - A probabilistic programming language implementing full Bayesian statistical inference with Hamiltonian Monte Carlo sampling

  • BanditLib - A simple Multi-armed Bandit library.

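  • Timbl - Implements several memory-based learning algorithms, among which IB1-IG (a k-nearest-neighbor classifier) and IGTree (a decision tree) are widely used in NLP.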
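
To make the memory-based (k-nearest-neighbor) idea behind the Timbl entry concrete, here is a small Python sketch. It is plain k-NN with an unweighted overlap distance, not Timbl's IB1-IG (which additionally weights features by information gain), and the feature tuples and labels are invented for illustration.

```python
from collections import Counter


def overlap_distance(a, b):
    """Number of feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_classify(query, examples, k=3):
    """Return the majority label among the k stored examples
    that are closest to the query."""
    nearest = sorted(examples, key=lambda ex: overlap_distance(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data:
# (previous tag, word suffix, capitalized?) -> part-of-speech tag
EXAMPLES = [
    (("DET", "og", "no"), "NOUN"),
    (("DET", "at", "no"), "NOUN"),
    (("NOUN", "ed", "no"), "VERB"),
    (("NOUN", "ks", "no"), "VERB"),
    (("DET", "ed", "no"), "ADJ"),
]

print(knn_classify(("DET", "og", "no"), EXAMPLES, k=3))
```

"Memory-based" here means that training only stores the examples; all the work happens when a query is compared against them.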

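Natural Language Processing

  • MIT Information Extraction Toolkit - C, C++, and Python tools for named entity recognition and relation extraction.

  • CRF++ - An open-source implementation of conditional random fields (CRFs), usable for tasks such as word segmentation and part-of-speech tagging.

  • CRFsuite - An implementation of conditional random fields, usable for tasks such as part-of-speech tagging.

  • BLLIP Parser - Also known as the Charniak-Johnson parser.

  • colibri-core - A set of C++ libraries, command-line tools, and Python bindings for working efficiently with n-grams and skipgrams.

  • ucto - A multilingual tokenizer that supports Unicode-aware regular expressions and the FoLiA format.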

  • libfolia - C++ library for the FoLiA format

  • MeTA - MeTA (ModErn Text Analysis) mines data from massive amounts of text.
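
Several entries above deal with n-grams and skipgrams. The Python sketch below shows one way to enumerate them from a token list. It is not the colibri-core API; the `{*}` gap marker and the rule that a skipgram is an n-gram with one contiguous run of interior tokens blanked out are assumptions chosen for this example.

```python
GAP = "{*}"  # placeholder for a skipped token (notation chosen for this sketch)


def ngrams(tokens, n):
    """All contiguous n-token windows, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n):
    """n-grams in which one contiguous run of interior tokens is
    replaced by gap markers; the first and last tokens are always kept."""
    result = []
    for gram in ngrams(tokens, n):
        for start in range(1, n - 1):        # gap never covers the first token
            for end in range(start + 1, n):  # ... nor the last token
                result.append(gram[:start] + (GAP,) * (end - start) + gram[end:])
    return result


tokens = "to be or not to be".split()
print(ngrams(tokens, 2))
print(skipgrams(tokens, 3))
```

Under this definition a skipgram needs at least three tokens, so `skipgrams(tokens, 2)` returns an empty list.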

Machine Translation

  • EGYPT (GIZA++)

  • Moses

  • pharaoh

  • SRILM

  • NiuTrans

  • jane

  • SAMT

Speech Recognition

  • Kaldi - Kaldi is a C++ toolkit released under the Apache License v2.0 and intended for speech recognition research.

Sequence Analysis

  • ToPS - An object-oriented framework that facilitates the integration of probabilistic models for sequences over a user-defined alphabet.

Java

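Natural Language Processing

  • Cortical.io - Retina: an API that performs complex NLP operations (disambiguation, classification, streaming text filtering, etc.) as quickly and intuitively as the brain.

  • CoreNLP - Stanford CoreNLP provides a set of natural language analysis tools that take raw English text as input and give the base forms of words.

  • Stanford Parser - A parser is a program that works out the grammatical structure of sentences.

  • Stanford POS Tagger - A part-of-speech tagger.

  • Stanford Named Entity Recognizer - Stanford NER is a Java implementation of a named entity recognizer.

  • Stanford Word Segmenter - Tokenization of raw text is a standard preprocessing step for many NLP tasks.

  • Tregex, Tsurgeon and Semgrex - Tregex is a utility for matching patterns in trees, based on tree relationships and regular-expression matches on nodes (the name is short for "tree regular expressions").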

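  • Stanford Phrasal: A Phrase-Based Translation System - Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java.

  • Stanford English Tokenizer - A tokenizer divides text into a sequence of tokens, which roughly correspond to "words".

  • Stanford Tokens Regex - A framework for defining regular-expression patterns over sequences of tokens.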

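  • Stanford Temporal Tagger - SUTime is a library for recognizing and normalizing time expressions.

  • Stanford SPIED - Learns entities from unlabeled text, starting from a seed set and applying patterns iteratively.

  • Stanford Topic Modeling Toolbox - A topic modeling tool that social scientists use to analyze datasets.

  • Twitter Text Java - A Java implementation of Twitter's text processing library.

  • MALLET - A Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications.

  • OpenNLP - A machine-learning-based toolkit for natural language processing.

  • LingPipe - A toolkit for computational linguistics.

  • ClearTK - ClearTK provides a framework for developing statistical natural language processing components, built on top of Apache UIMA.

  • Apache cTAKES - Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source system for extracting information from electronic medical records and clinical text.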

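General-Purpose Machine Learning

  • aerosolve - A machine learning library from Airbnb, designed from the ground up to be easy to use.

  • Datumbox - A framework for rapid development of machine learning and statistical applications.

  • ELKI - Data mining software (unsupervised learning: clustering, outlier detection, etc.).

  • Encog - An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI-based workbench is also provided.

  • H2O - A machine learning engine that supports distributed systems such as Hadoop and Spark as well as personal computers, with APIs callable from R, Python, Scala, and REST/JSON.

  • htm.java - A general machine learning library using Numenta’s Cortical Learning Algorithm.

  • java-deeplearning - A distributed deep learning platform for Java, Clojure, and Scala.

  • JAVA-ML - A general machine learning library for Java with a common interface for all algorithms.

  • JSAT - Offers many machine learning algorithms for classification, regression, clustering, and more.

  • Mahout - Distributed machine learning tools.

  • Meka - An open-source implementation of multi-label classification and evaluation methods, built as an extension to Weka.

  • MLlib in Apache Spark - The distributed machine learning library in Spark.

  • Neuroph - A lightweight Java neural network framework.

  • ORYX - A Lambda Architecture framework that uses Apache Spark and Apache Kafka for real-time, large-scale machine learning.

  • RankLib - A library of learning-to-rank algorithms.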

  • Stanford Classifier - A classifier is a machine learning tool that will take data items and place them into one of k classes.

  • SmileMiner - Statistical Machine Intelligence & Learning Engine

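  • SystemML - A flexible, scalable machine learning language.

  • WalnutiQ - An object-oriented model of the human brain.

  • Weka - Weka is a collection of machine learning algorithms for data mining tasks.

Speech Recognition

  • CMU Sphinx - An open-source toolkit for speech recognition, with a speech recognition library written entirely in Java.

Data Analysis and Visualization

  • Hadoop - Hadoop/HDFS

  • Spark - A fast, general-purpose engine for large-scale data processing.

  • Impala - Real-time queries for Hadoop.

  • DataMelt - Mathematics software covering numerical computation, statistics, symbolic computation, data analysis, and data visualization.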

  • Dr. Michael Thomas Flanagan's Java Scientific Library

Deep Learning

  • Deeplearning4j - Scalable, industrial-strength deep learning that makes use of parallel GPUs.

Python

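Computer Vision

  • Scikit-Image - A collection of image processing algorithms in Python.

  • SimpleCV - An open-source computer vision framework that gives access to several high-performance computer vision libraries, such as OpenCV. It runs on Mac, Windows, and Ubuntu Linux.

  • Vigranumpy - Python bindings for the VIGRA C++ computer vision library.

Natural Language Processing

  • NLTK - A leading platform for building Python programs that work with human language data.

  • Pattern - A web mining module for Python, with tools for natural language processing, machine learning, and more.

  • Quepy - Transforms natural language questions into queries in a database query language.

  • TextBlob - Provides a consistent API for common natural language processing (NLP) tasks. It is built on NLTK and Pattern and works well with both.

  • YAlign - A sentence aligner that extracts parallel sentences from comparable corpora.

  • jieba - A Chinese word segmentation tool.

  • SnowNLP - A library for processing Chinese text.

  • loso - A Chinese word segmentation tool.

  • genius - A Chinese word segmentation tool based on conditional random fields.

  • KoNLPy - Natural language processing for Korean.

  • nut - A natural language understanding toolkit.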

  • Rosetta - Text processing tools and wrappers (e.g. Vowpal Wabbit)

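  • BLLIP Parser - Python bindings for the BLLIP Natural Language Parser (also known as the Charniak-Johnson parser).

  • PyNLPl - A Python library for natural language processing. It also includes tools for parsing common NLP formats such as FoLiA, as well as ARPA language models, Moses phrase tables, and GIZA++ alignments.

  • python-ucto - Python bindings for ucto (a Unicode-aware, rule-based tokenizer).

  • python-frog - Python bindings for Frog, which provides part-of-speech tagging, lemmatisation, dependency parsing, and NER for Dutch.

  • python-zpar - Python bindings for ZPar (a statistical part-of-speech tagger, constituency parser, and dependency parser for English).

  • colibri-core - Python bindings for a C++ library that extracts n-grams and skipgrams efficiently.

  • spaCy - Industrial-strength NLP with Python and Cython.

  • PyStanfordDependencies - A Python interface for converting Penn Treebank trees to Stanford Dependencies.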

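General-Purpose Machine Learning

  • machine learning - Builds a support vector machine API that is compatible with both a web interface and a programmatic interface. The corresponding datasets are stored in a SQL database, and the models generated for prediction are then stored in a NoSQL database.

  • XGBoost - Python bindings for the eXtreme Gradient Boosting (Tree) library.

  • Featureforge - A set of tools for creating and testing machine learning features, with a scikit-learn-compatible API.

  • scikit-learn - A Python module for machine learning built on top of SciPy.

  • metric-learn - A Python module for metric learning.

  • SimpleAI - Implements many of the artificial intelligence algorithms described in the book "Artificial Intelligence: A Modern Approach". It focuses on providing an easy-to-use, well-documented, and tested library.

  • astroML - A machine learning and data mining library for astronomy.

  • graphlab-create - A library built on a disk-backed DataFrame that implements various machine learning models (regression, clustering, recommender systems, graph analytics, etc.).

  • BigML - A library for communicating with external servers.

  • pattern - A web mining module.

  • NuPIC - The Numenta Platform for Intelligent Computing.

  • Pylearn2 - A machine learning library based on Theano.

  • keras - A neural network library based on Theano.

  • hebel - A GPU-accelerated deep learning library in Python.

  • Chainer - A flexible neural network framework.

  • gensim - An easy-to-use topic modeling tool.

  • topik - A topic modeling toolkit.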

  • PyBrain - Another Python Machine Learning Library.

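  • Crab - A flexible, fast recommender engine.

  • python-recsys - A Python library for implementing a recommender system.

  • Restricted Boltzmann Machines - An implementation of restricted Boltzmann machines.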

  • CoverTree - Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree

  • nilearn - A machine learning library for neuroimaging.

  • Shogun - Shogun Machine Learning Toolbox

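  • Pyevolve - A genetic algorithm framework.

  • Caffe - A deep learning framework with a clean structure, readable code, and good speed.

  • breze - Deep neural networks based on Theano.

  • pyhsmm - Approximate unsupervised inference in Bayesian hidden Markov models (HMMs) and explicit-duration hidden semi-Markov models (HSMMs), focusing on the Bayesian nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.

  • mrjob - Lets Python programs run on Hadoop.

  • SKLL - A simplified interface to scikit-learn that makes it easier to run experiments.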

  • neurolab - https://github.com/zueve/neurolab

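  • Spearmint - Bayesian optimization, following the method described in the paper: Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle and Ryan P. Adams. Advances in Neural Information Processing Systems, 2012.

  • Pebl - A Python environment for Bayesian learning.

  • Theano - An optimizing, array-oriented math compiler that generates optimized GPU code through metaprogramming.

  • TensorFlow - An open-source software library for numerical computation using data flow graphs.

  • yahmm - Hidden Markov models, implemented in Cython.

  • python-timbl - Wraps the full TiMBL C++ programming interface. TiMBL is an elaborate k-nearest-neighbor machine learning toolkit.

  • deap - An evolutionary algorithm framework.

  • pydeep - Deep learning in Python.

  • mlxtend - A library of useful tools for data science and machine learning tasks.

  • neon - A high-performance deep learning framework.

  • Optunity - A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search.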

  • Annoy - Approximate nearest neighbours implementation

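  • skflow - A simplified interface for TensorFlow, modeled on scikit-learn.

  • TPOT - Automatically creates and optimizes machine learning pipelines using genetic programming. Think of it as your data science assistant, automating much of the tedious work in machine learning.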
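
The Optunity entry above describes itself as a drop-in replacement for grid search. For readers unfamiliar with that baseline, the following Python sketch shows exhaustive grid search over a small hyperparameter grid. It does not use Optunity or any other listed library; `grid_search`, `toy_score`, and the parameter names `C` and `gamma` are hypothetical, and `toy_score` merely stands in for a cross-validated model score.

```python
from itertools import product


def grid_search(objective, grid):
    """Evaluate every parameter combination in `grid` and
    return (best_score, best_params), maximizing the objective."""
    names = list(grid)
    best_score, best_params = None, None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


def toy_score(C, gamma):
    """Hypothetical stand-in for a validation score; peaks at C=10, gamma=0.1."""
    return -((C - 10) ** 2) - 100 * (gamma - 0.1) ** 2


grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
print(grid_search(toy_score, grid))
```

The cost grows with the product of the list lengths (nine evaluations here), which is the main reason dedicated hyperparameter optimizers exist.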

Data Analysis and Visualization

  • SciPy - A Python-based ecosystem of open-source software for mathematics, science, and engineering.

  • NumPy - A fundamental package for scientific computing with Python.

  • Numba - Python JIT (just-in-time) compiler to LLVM aimed at scientific Python by the developers of Cython and NumPy.

  • NetworkX - A high-productivity software for complex networks.

  • Pandas - A library providing high-performance, easy-to-use data structures and data analysis tools.

  • Open Mining - Business Intelligence (BI) in Python (Pandas web interface)

  • PyMC - Markov Chain Monte Carlo sampling toolkit.

  • zipline - A Pythonic algorithmic trading library.

  • PyDy - Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion based around NumPy, SciPy, IPython, and matplotlib.

  • SymPy - A Python library for symbolic mathematics.

  • statsmodels - Statistical modeling and econometrics in Python.

  • astropy - A community Python library for Astronomy.

  • matplotlib - A Python 2D plotting library.

  • bokeh - Interactive Web Plotting for Python.

  • plotly - Collaborative web plotting for Python and matplotlib.

  • vincent - A Python to Vega translator.

  • d3py - A plotting library for Python, based on D3.js.

  • ggplot - Same API as ggplot2 for R.

  • ggfortify - A unified ggplot2 plotting interface for popular R packages.

  • Kartograph.py - Rendering beautiful SVG maps in Python.

  • pygal - A Python SVG Charts Creator.

  • PyQtGraph - A pure-python graphics and GUI library built on PyQt4 / PySide and NumPy.

  • pycascading

  • Petrel - Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python.

  • Blaze - NumPy and Pandas interface to Big Data.

  • emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.

  • windML - A Python Framework for Wind Energy Analysis and Prediction

  • vispy - GPU-based high-performance interactive OpenGL 2D/3D data visualization library

  • cerebro2 - A web-based visualization and debugging platform for NuPIC.

  • NuPIC Studio - An all-in-one NuPIC Hierarchical Temporal Memory visualization and debugging super-tool!

  • SparklingPandas - Pandas on PySpark (POPS)

  • Seaborn - A Python visualization library based on matplotlib

  • bqplot - An API for plotting in Jupyter (IPython)

Common Lisp

General-Purpose Machine Learning

  • mgl - Neural networks (boltzmann machines, feed-forward and recurrent nets), Gaussian Processes

  • mgl-gpr - Evolutionary algorithms

  • cl-libsvm - Wrapper for the libsvm support vector machine library

Clojure

Natural Language Processing

  • Clojure-openNLP - Natural Language Processing in Clojure (opennlp)

  • Inflections-clj - Rails-like inflection library for Clojure and ClojureScript

General-Purpose Machine Learning

  • Touchstone - Clojure A/B testing library

  • Clojush - The Push programming language and the PushGP genetic programming system implemented in Clojure

  • Infer - Inference and machine learning in Clojure

  • Clj-ML - A machine learning library for Clojure built on top of Weka and friends

  • Encog - Clojure wrapper for Encog (v3) (Machine-Learning framework that specializes in neural-nets)

  • Fungp - A genetic programming library for Clojure

  • Statistiker - Basic Machine Learning algorithms in Clojure.

  • clortex - General Machine Learning library using Numenta’s Cortical Learning Algorithm

  • comportex - Functionally composable Machine Learning library using Numenta’s Cortical Learning Algorithm

Data Analysis and Visualization

  • Incanter - Incanter is a Clojure-based, R-like platform for statistical computing and graphics.

  • PigPen - Map-Reduce for Clojure.

  • Envision - Clojure Data Visualisation library, based on Statistiker and D3

Matlab

Computer Vision

  • Contourlets - MATLAB source code that implements the contourlet transform and its utility functions.

  • Shearlets - MATLAB code for shearlet transform

  • Curvelets - The Curvelet transform is a higher dimensional generalization of the Wavelet transform designed to represent images at different scales and different angles.

  • Bandlets - MATLAB code for bandlet transform

  • mexopencv - Collection and a development kit of MATLAB mex functions for OpenCV library

Natural Language Processing

  • NLP - An NLP library for Matlab

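General-Purpose Machine Learning

  • t-Distributed Stochastic Neighbor Embedding - t-SNE is an award-winning technique for dimensionality reduction that is particularly well suited to visualizing high-dimensional data.

  • Spider - The Spider aims to be a complete object-oriented environment for machine learning in Matlab.

  • LibSVM - A well-known library for support vector machines.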

  • LibLinear - A Library for Large Linear Classification

  • Caffe - A deep learning framework with a clean structure, readable code, and good speed.

  • Pattern Recognition Toolbox - A complete object-oriented environment for machine learning in Matlab.

  • Optunity - A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly with MATLAB.

Data Analysis and Visualization

  • matlab_gbl - MatlabBGL is a Matlab package for working with graphs.

  • gamic - Efficient pure-Matlab implementations of graph algorithms to complement MatlabBGL's mex functions.

.NET

Computer Vision

  • OpenCVDotNet - A wrapper for the OpenCV project to be used with .NET applications.

  • Emgu CV - Cross-platform wrapper of OpenCV which can be compiled in Mono to be run on Windows, Linux, Mac OS X, iOS, and Android.

  • AForge.NET - Open source C# framework for developers and researchers in the fields of Computer Vision and Artificial Intelligence. Development has now shifted to GitHub.

  • Accord.NET - Together with AForge.NET, this library can provide image processing and computer vision algorithms to Windows, Windows RT and Windows Phone. Some components are also available for Java and Android.

Natural Language Processing

  • Stanford.NLP for .NET - A full port of Stanford NLP packages to .NET and also available precompiled as a NuGet package.

General-Purpose Machine Learning

  • Accord-Framework - A complete framework for machine learning, computer vision, computer audition, signal processing, and statistical applications.

  • Accord.MachineLearning - Support Vector Machines, Decision Trees, Naive Bayesian models, K-means, Gaussian Mixture models and general algorithms such as Ransac, Cross-validation and Grid-Search for machine-learning applications. This package is part of the Accord.NET Framework.

  • DiffSharp - An automatic differentiation (AD) library providing exact and efficient derivatives (gradients, Hessians, Jacobians, directional derivatives, and matrix-free Hessian- and Jacobian-vector products) for machine learning and optimization applications. Operations can be nested to any level, meaning that you can compute exact higher-order derivatives and differentiate functions that are internally making use of differentiation, for applications such as hyperparameter optimization.

  • Vulpes - Deep belief and deep learning implementation written in F# that leverages CUDA GPU execution with Alea.cuBase.

  • Encog - An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks.

  • Neural Network Designer - DBMS management system and designer for neural networks. The designer application is developed using WPF, and is a user interface which allows you to design your neural network, query the network, create and configure chat bots that are capable of asking questions and learning from your feed back. The chat bots can even scrape the internet for information to return in their output as well as to use for learning.

Data Analysis and Visualization

  • numl - numl is a machine learning library intended to ease the use of standard modeling techniques for both prediction and clustering.

  • Math.NET Numerics - Numerical foundation of the Math.NET project, aiming to provide methods and algorithms for numerical computations in science, engineering and every day use. Supports .Net 4.0, .Net 3.5 and Mono on Windows, Linux and Mac; Silverlight 5, WindowsPhone/SL 8, WindowsPhone 8.1 and Windows 8 with PCL Portable Profiles 47 and 344; Android/iOS with Xamarin.

  • Sho - Sho is an interactive environment for data analysis and scientific computing that lets you seamlessly connect scripts (in IronPython) with compiled code (in .NET) to enable fast and flexible prototyping. The environment includes powerful and efficient libraries for linear algebra as well as data visualization that can be used from any .NET language, as well as a feature-rich interactive shell for rapid development.

Ruby

Natural Language Processing

  • Treat - Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I’ve encountered so far for Ruby

  • Ruby Linguistics - Linguistics is a framework for building linguistic utilities for Ruby objects in any language. It includes a generic language-independent front end, a module for mapping language codes into language names, and a module which contains various English-language utilities.

  • Stemmer - Expose libstemmer_c to Ruby

  • Ruby Wordnet - This library is a Ruby interface to WordNet

  • Raspel - raspell is an interface binding for ruby

  • UEA Stemmer - Ruby port of UEALite Stemmer - a conservative stemmer for search and indexing

  • Twitter-text-rb - A library that does auto linking and extraction of usernames, lists and hashtags in tweets

General-Purpose Machine Learning

  • Ruby Machine Learning - Some Machine Learning algorithms, implemented in Ruby

  • Machine Learning Ruby

  • jRuby Mahout - JRuby Mahout is a gem that unleashes the power of Apache Mahout in the world of JRuby.

  • CardMagic-Classifier - A general classifier module to allow Bayesian and other types of classifications.

Data Analysis and Visualization

  • rsruby - Ruby - R bridge

  • data-visualization-ruby - Source code and supporting content for my Ruby Manor presentation on Data Visualisation with Ruby

  • ruby-plot - gnuplot wrapper for Ruby, especially for plotting ROC curves into SVG files

  • plot-rb - A plotting library in Ruby built on top of Vega and D3.

  • scruffy - A beautiful graphing toolkit for Ruby

  • SciRuby

  • Glean - A data management tool for humans

  • Bioruby

  • Arel

Misc

  • Big Data For Chimps

  • Listof - Community based data collection, packed in gem. Get list of pretty much anything (stop words, countries, non words) in txt, json or hash. Demo/Search for a list

R

General-Purpose Machine Learning

  • ahaz - ahaz: Regularization for semiparametric additive hazards regression

  • arules - arules: Mining Association Rules and Frequent Itemsets

  • bigrf - bigrf: Big Random Forests: Classification and Regression Forests for Large Data Sets

  • bigRR - bigRR: Generalized Ridge Regression (with special advantage for p >> n cases)

  • bmrm - bmrm: Bundle Methods for Regularized Risk Minimization Package

  • Boruta - Boruta: A wrapper algorithm for all-relevant feature selection

  • bst - bst: Gradient Boosting

  • C50 - C50: C5.0 Decision Trees and Rule-Based Models

  • caret - Classification and Regression Training: Unified interface to ~150 ML algorithms in R.

  • caretEnsemble - caretEnsemble: Framework for fitting multiple caret models as well as creating ensembles of such models.

  • Clever Algorithms For Machine Learning

  • CORElearn - CORElearn: Classification, regression, feature evaluation and ordinal evaluation

  • CoxBoost - CoxBoost: Cox models by likelihood based boosting for a single survival endpoint or competing risks

  • Cubist - Cubist: Rule- and Instance-Based Regression Modeling

  • e1071 - e1071: Misc Functions of the Department of Statistics (e1071), TU Wien

  • earth - earth: Multivariate Adaptive Regression Spline Models

  • elasticnet - elasticnet: Elastic-Net for Sparse Estimation and Sparse PCA

  • ElemStatLearn - ElemStatLearn: Data sets, functions and examples from the book: "The Elements of Statistical Learning, Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani and Jerome Friedman

  • evtree - evtree: Evolutionary Learning of Globally Optimal Trees

  • fpc - fpc: Flexible procedures for clustering

  • frbs - frbs: Fuzzy Rule-based Systems for Classification and Regression Tasks

  • GAMBoost - GAMBoost: Generalized linear and additive models by likelihood based boosting

  • gamboostLSS - gamboostLSS: Boosting Methods for GAMLSS

  • gbm - gbm: Generalized Boosted Regression Models

  • glmnet - glmnet: Lasso and elastic-net regularized generalized linear models

  • glmpath - glmpath: L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model

  • GMMBoost - GMMBoost: Likelihood-based Boosting for Generalized mixed models

  • grplasso - grplasso: Fitting user specified models with Group Lasso penalty

  • grpreg - grpreg: Regularization paths for regression models with grouped covariates

  • h2o - A framework for fast, parallel, and distributed machine learning algorithms at scale -- Deeplearning, Random forests, GBM, KMeans, PCA, GLM

  • hda - hda: Heteroscedastic Discriminant Analysis

  • Introduction to Statistical Learning

  • ipred - ipred: Improved Predictors

  • kernlab - kernlab: Kernel-based Machine Learning Lab

  • klaR - klaR: Classification and visualization

  • lars - lars: Least Angle Regression, Lasso and Forward Stagewise

  • lasso2 - lasso2: L1 constrained estimation aka ‘lasso’

  • LiblineaR - LiblineaR: Linear Predictive Models Based On The Liblinear C/C++ Library

  • LogicReg - LogicReg: Logic Regression

  • Machine Learning For Hackers

  • maptree - maptree: Mapping, pruning, and graphing tree models

  • mboost - mboost: Model-Based Boosting

  • medley - medley: Blending regression models, using a greedy stepwise approach

  • mlr - mlr: Machine Learning in R

  • mvpart - mvpart: Multivariate partitioning

  • ncvreg - ncvreg: Regularization paths for SCAD- and MCP-penalized regression models

  • nnet - nnet: Feed-forward Neural Networks and Multinomial Log-Linear Models

  • oblique.tree - oblique.tree: Oblique Trees for Classification Data

  • pamr - pamr: Pam: prediction analysis for microarrays

  • party - party: A Laboratory for Recursive Partytioning

  • partykit - partykit: A Toolkit for Recursive Partytioning

  • penalized - penalized: L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model

  • penalizedLDA - penalizedLDA: Penalized classification using Fisher's linear discriminant

  • penalizedSVM - penalizedSVM: Feature Selection SVM using penalty functions

  • quantregForest - quantregForest: Quantile Regression Forests

  • randomForest - randomForest: Breiman and Cutler's random forests for classification and regression

  • randomForestSRC - randomForestSRC: Random Forests for Survival, Regression and Classification (RF-SRC)

  • rattle - rattle: Graphical user interface for data mining in R

  • rda - rda: Shrunken Centroids Regularized Discriminant Analysis

  • rdetools - rdetools: Relevant Dimension Estimation (RDE) in Feature Spaces

  • REEMtree - REEMtree: Regression Trees with Random Effects for Longitudinal (Panel) Data

  • relaxo - relaxo: Relaxed Lasso

  • rgenoud - rgenoud: R version of GENetic Optimization Using Derivatives

  • rgp - rgp: R genetic programming framework

  • Rmalschains - Rmalschains: Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R

  • rminer - rminer: Simpler use of data mining methods (e.g. NN and SVM) in classification and regression

  • ROCR - ROCR: Visualizing the performance of scoring classifiers

  • RoughSets - RoughSets: Data Analysis Using Rough Set and Fuzzy Rough Set Theories

  • rpart - rpart: Recursive Partitioning and Regression Trees

  • RPMM - RPMM: Recursively Partitioned Mixture Model

  • RSNNS - RSNNS: Neural Networks in R using the Stuttgart Neural Network Simulator (SNNS)

  • RWeka - RWeka: R/Weka interface

  • RXshrink - RXshrink: Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression

  • sda - sda: Shrinkage Discriminant Analysis and CAT Score Variable Selection

  • SDDA - SDDA: Stepwise Diagonal Discriminant Analysis

  • SuperLearner and subsemble - Multi-algorithm ensemble learning packages.

  • svmpath - svmpath: the SVM Path algorithm

  • tgp - tgp: Bayesian treed Gaussian process models

  • tree - tree: Classification and regression trees

  • varSelRF - varSelRF: Variable selection using random forests

  • XGBoost.R - R binding for eXtreme Gradient Boosting (Tree) Library

  • Optunity - A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly to R.

Data Analysis and Visualization

  • ggplot2 - A data visualization package based on the grammar of graphics.

Scala

Natural Language Processing

  • ScalaNLP - ScalaNLP is a suite of machine learning and numerical computing libraries.

  • Breeze - Breeze is a numerical processing library for Scala.

  • Chalk - Chalk is a natural language processing library.

  • FACTORIE - FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.

Data Analysis and Visualization

  • MLlib in Apache Spark - Distributed machine learning library in Spark

  • Scalding - A Scala API for Cascading

  • Summing Bird - Streaming MapReduce with Scalding and Storm

  • Algebird - Abstract Algebra for Scala

  • xerial - Data management utilities for Scala

  • simmer - Reduce your data. A unix filter for algebird-powered aggregation.

  • PredictionIO - PredictionIO, a machine learning server for software developers and data engineers.

  • BIDMat - CPU and GPU-accelerated matrix library intended to support large-scale exploratory data analysis.

  • Wolfe - Declarative Machine Learning

General-Purpose Machine Learning

  • Conjecture - Scalable Machine Learning in Scalding

  • brushfire - Distributed decision tree ensemble learning in Scala

  • ganitha - Scalding-powered machine learning

  • adam - A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.

  • bioscala - Bioinformatics for the Scala programming language

  • BIDMach - CPU and GPU-accelerated Machine Learning Library.

  • Figaro - A Scala library for constructing probabilistic models.

  • H2O Sparkling Water - H2O and Spark interoperability.
