Rosetta – Text processing tools and wrappers (e.g. Vowpal Wabbit)
BLLIP Parser – Python bindings for the BLLIP Natural Language Parser (also known as the Charniak-Johnson parser)
PyNLPl – Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for FoLiA, but also ARPA language models, Moses phrasetables, GIZA++ alignments.
python-ucto – Python binding to ucto (a unicode-aware rule-based tokenizer for various languages)
python-frog – Python binding to Frog, an NLP suite for Dutch (POS tagging, lemmatisation, dependency parsing, NER).
python-zpar – Python bindings for ZPar, a statistical part-of-speech tagger, constituency parser, and dependency parser for English.
colibri-core – Python binding to a C++ library for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
spaCy – Industrial strength NLP with Python and Cython.
auto_ml – Automated machine learning pipelines for analytics and production. Handles some standard feature engineering, feature selection, model selection, model tuning, ensembling, and advanced scoring, in addition to logging output for analysts trying to understand their datasets.
machine learning – Automated build consisting of a web interface and a set of programmatic APIs for support vector machines. Datasets are stored in a SQL database; the generated models, used for prediction, are stored in a NoSQL datastore.
XGBoost – Python bindings for eXtreme Gradient Boosting (Tree) Library
SimpleAI – Python implementation of many of the artificial intelligence algorithms described in the book “Artificial Intelligence: A Modern Approach”. It focuses on providing an easy-to-use, well-documented, and tested library.
astroML – Machine Learning and Data Mining for Astronomy.
graphlab-create – A library with various machine learning models (regression, clustering, recommender systems, graph analytics, etc.) implemented on top of a disk-backed DataFrame.
Caffe – A deep learning framework developed with cleanliness, readability, and speed in mind.
breze – Theano-based library for deep and recurrent neural networks.
pyhsmm – library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs), focusing on the Bayesian Nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.
mrjob – A library that lets Python programs run on Hadoop.
SKLL – A wrapper around scikit-learn that makes it simpler to conduct experiments.
Spearmint – Spearmint is a package to perform Bayesian optimization according to the algorithms outlined in the paper: Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle and Ryan P. Adams. Advances in Neural Information Processing Systems, 2012.
skflow – Simplified interface for TensorFlow, mimicking Scikit Learn.
TPOT – Tool that automatically creates and optimizes machine learning pipelines using genetic programming. Consider it your personal data science assistant, automating a tedious part of machine learning.
pgmpy – A Python library for working with Probabilistic Graphical Models.
DIGITS – The Deep Learning GPU Training System (DIGITS) is a web application for training deep learning models.
Orange – Open source data visualization and data analysis for novices and experts.
milk – Machine learning toolkit focused on supervised classification.
TFLearn – Deep learning library featuring a higher-level API for TensorFlow.
REP – An IPython-based environment for conducting data-driven research in a consistent and reproducible way. REP does not try to replace scikit-learn; it extends it and provides a better user experience.
rgf_python – Python bindings for Regularized Greedy Forest (Tree) Library.
gym – OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.
skbayes – Python package for Bayesian Machine Learning with scikit-learn API
Data Analysis / Data Visualization
SciPy – A Python-based ecosystem of open-source software for mathematics, science, and engineering.
NumPy – A fundamental package for scientific computing with Python.
Numba – Python JIT (just-in-time) compiler to LLVM, aimed at scientific Python, by the developers of Cython and NumPy.
NetworkX – High-productivity software for complex networks.
igraph – Python binding to the igraph library, a general-purpose graph library.
Pandas – A library providing high-performance, easy-to-use data structures and data analysis tools.
Open Mining – Business Intelligence (BI) in Python (Pandas web interface)
group-lasso – Some experiments with the coordinate descent algorithm used in the (Sparse) Group Lasso model
jProcessing – Kanji / Hiragana / Katakana to Romaji Converter. Edict Dictionary & parallel sentences Search. Sentence Similarity between two JP Sentences. Sentiment Analysis of Japanese Text. Run Cabocha (ISO-8859-1 configured) in Python.
You can leave a comment or email us.
If you want to contribute, please email us.
Topics can be: