Machine Learning | Learn for Master - Part 3
• ### Best articles to learn deep learning

A Step by Step Backpropagation Example

Background

Backpropagation is a common method for training a neural network. There is no shortage of papers online that attempt to explain how backpropagation works, but few that include an example with actual numbers. This post is my attempt to explain how it works with a concrete example that folks can compare their own calculations to in order to ensure they understand backpropagation correctly.
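Before the worked example, a minimal numeric sketch of the same idea may help: a single sigmoid neuron trained by gradient descent on one example. The numbers are arbitrary and not taken from the post, which walks through a larger 2-2-2 network.

```python
import math

# A single sigmoid neuron trained with squared error on one example.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w, b = 0.5, 0.1       # initial weight and bias (arbitrary)
x, target = 1.0, 0.8  # one training example
lr = 0.5              # learning rate

for _ in range(1000):
    out = sigmoid(w * x + b)                  # forward pass
    delta = (out - target) * out * (1 - out)  # dE/dnet via the chain rule
    w -= lr * delta * x                       # dE/dw = delta * x
    b -= lr * delta                           # dE/db = delta

# After training, the neuron's output is very close to the target 0.8.
```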

If this kind of thing interests you, you should sign up for my newsletter where I post about AI-related projects that I’m working on.


• ### DUMMY VARIABLE TRAP IN REGRESSION MODELS


Using categorical data in Multiple Regression Models is a powerful method to include non-numeric data types in a regression model. Categorical data refers to data values which represent categories – data values with a fixed and unordered number of values, for instance gender (male/female) or season (summer/winter/spring/fall). In a regression model, these values can be represented by dummy variables – variables containing values such as 1 or 0 representing the presence or absence of the categorical value.

When including dummy variables in a regression model, however, one should be wary of the Dummy Variable Trap.
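To make the trap concrete: with k categories you keep only k-1 dummy columns, letting the dropped level serve as the baseline; encoding all k levels alongside an intercept makes the columns perfectly collinear. A plain-Python sketch (the column values are invented for illustration):

```python
# One-hot encode a categorical column, dropping the first level to avoid
# the dummy variable trap (k categories -> k-1 dummy columns).
seasons = ["summer", "winter", "spring", "fall", "winter"]

levels = sorted(set(seasons))  # ['fall', 'spring', 'summer', 'winter']
kept = levels[1:]              # drop 'fall' as the reference level
dummies = [[1 if s == lvl else 0 for lvl in kept] for s in seasons]

# Each row now has 3 dummies; 'fall' is encoded as all zeros.
print(kept)        # ['spring', 'summer', 'winter']
print(dummies[0])  # 'summer' -> [0, 1, 0]
```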

• ### Four tutorial-style articles on text analysis worth reading carefully


These four articles are frequently mentioned; the original post is from: http://blog.sciencenet.cn/blog-611051-535693.html
Detailed, in-depth introductions to text analysis are certainly not limited to these four; these are the ones I have read so far, and recommendations of other good tutorial articles are welcome.

The first article: it gives a detailed introduction to parameter estimation for discrete data, rather than using the Gaussian distribution as the running example as most textbooks do. In my view, the most valuable part is its use of Gibbs sampling for inference in LDA; the derivations of the relevant formulas are very detailed, making it essential reading for anyone who wants to understand LDA and related topic models.
@TECHREPORT{Hei09,
  author = {Heinrich, Gregor},
  title = {Parameter Estimation for Text Analysis},
  institution = {vsonix GmbH and University of Leipzig},
  year = {2009},
  type = {Technical Report Version 2.9},
  abstract = {Presents parameter estimation methods common with discrete probability
    distributions, which is of particular interest in text modeling.
    Starting with maximum likelihood, a posteriori and Bayesian estimation,
    central concepts like conjugate distributions and Bayesian networks
    are reviewed. As an application, the model of latent Dirichlet allocation
    (LDA) is explained in detail with a full derivation of an approximate
    inference algorithm based on Gibbs sampling.}
}
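The collapsed Gibbs update that the report derives can be sketched in a few dozen lines of plain Python. The toy corpus, number of topics, and hyperparameters below are invented for illustration:

```python
import random

# Collapsed Gibbs sampling for LDA on a toy corpus. Resampling follows
# p(z=k | rest) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta).
random.seed(0)
docs = [[0, 1, 1, 2], [2, 3, 3, 1], [0, 0, 2, 1]]  # word ids per document
K, V = 2, 4                                        # topics, vocabulary size
alpha, beta = 0.5, 0.1                             # Dirichlet hyperparameters

ndk = [[0] * K for _ in docs]      # document-topic counts
nkw = [[0] * V for _ in range(K)]  # topic-word counts
nk = [0] * K                       # tokens assigned to each topic
z = []                             # topic assignment per token
for d, doc in enumerate(docs):     # random initialization
    zd = []
    for w in doc:
        k = random.randrange(K)
        zd.append(k)
        ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    z.append(zd)

for _ in range(200):               # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]            # remove the current assignment
            ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
            p = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                 for t in range(K)]
            r = random.random() * sum(p)
            k = K - 1              # sample a new topic in proportion to p
            for t in range(K - 1):
                if r < p[t]:
                    k = t
                    break
                r -= p[t]
            z[d][i] = k
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
```

After the sweeps, `nkw` normalized per topic estimates the topic-word distributions and `ndk` per document the document-topic mixtures.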

• ### Resources for article extraction from HTML pages

Here are some good resources for learning how to extract articles from HTML pages.

Research papers and articles on article extraction from HTML pages

• ### Good blogs about LDA topic model

I have read some great articles about LDA. In particular, I like the posts with gensim LDA examples. Gensim is a popular library for text mining; it is written in Python and easy to use. Here are some good posts that are helpful for learning LDA.

If you are the author and you don’t want me to include your post here, please let me know and I will remove it.

Introduction to Latent Dirichlet Allocation

Introduction

Suppose you have the following set of sentences:

• I like to eat broccoli and bananas.
• ### Chi-square test for feature selection

Feature selection is an important problem in machine learning. There are many feature selection methods available, such as mutual information, information gain, and the chi-square test. In this post, I will use simple examples to describe how to conduct feature selection using the chi-square test, and show that it is easy to use Spark or MapReduce for chi-square-based feature selection on large-scale datasets.

Problem Statement

Suppose there are N instances and two classes: positive and negative. Given a feature X, we can use the chi-square test to evaluate its importance for distinguishing the classes.
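As a preview of the computation, the chi-square statistic for a binary feature against a binary class can be computed from a 2x2 contingency table of observed counts. The counts below are invented for illustration:

```python
# Chi-square statistic for one binary feature X against the class label.
#            positive  negative
# X present      a         b
# X absent       c         d
def chi_square(a, b, c, d):
    N = a + b + c + d
    stat = 0.0
    for obs, row, col in [(a, a + b, a + c), (b, a + b, b + d),
                          (c, c + d, a + c), (d, c + d, b + d)]:
        exp = row * col / N        # expected count under independence
        stat += (obs - exp) ** 2 / exp
    return stat

print(round(chi_square(30, 10, 20, 40), 2))  # → 16.67
```

A larger statistic means the feature's presence is more strongly associated with the class, so features can be ranked by this value; on a cluster, the per-feature counts a, b, c, d are exactly what a Spark or MapReduce aggregation would produce.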

• ### Good blogs to learn machine learning and data science

• Occam’s Razor by Avinash Kaushik, examining web analytics and Digital Marketing.
• OpenGardens, Data Science for Internet of Things (IoT), by Ajit Jaokar.
• O’Reilly Radar, a wide range of research topics and books.
• Observational Epidemiology A college professor and a statistical consultant offer their comments, observations and thoughts on applied statistics, higher education and epidemiology.
• Overcoming Bias, by Robin Hanson and Eliezer Yudkowsky. Presents statistical analysis in reflections on honesty, signaling, disagreement, forecasting, and the far future.
• Probability &
• ### Parse libsvm data for spark MLlib

The LibSVM data format is widely used in machine learning, and Spark MLlib is a powerful tool for training large-scale machine learning models. If your data is well formatted in LibSVM, it is straightforward to use the loadLibSVMFile method to load it into an RDD.

`val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")`

However, in certain cases your data is not well formatted in LibSVM. For example, you may have different models, each with its own labeled data. Suppose your data is stored in HDFS and each line looks like this: (model_key, training_instance_in_libsvm_format).

In this case,
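One way to handle such keyed lines is to parse the model key and the libsvm payload separately, then group instances by model before training. A plain-Python sketch; the tab separator and the model names are assumptions:

```python
from collections import defaultdict

# Assumed line layout: "model_key\tlabel idx1:val1 idx2:val2 ..."
def parse_line(line):
    key, payload = line.rstrip("\n").split("\t", 1)
    parts = payload.split()
    label = float(parts[0])
    features = {int(i): float(v) for i, v in (p.split(":") for p in parts[1:])}
    return key, label, features

lines = [
    "modelA\t1 1:0.5 3:1.2",
    "modelB\t0 2:0.7",
    "modelA\t0 1:0.1",
]
by_model = defaultdict(list)       # model key -> list of (label, features)
for line in lines:
    key, label, features = parse_line(line)
    by_model[key].append((label, features))
```

On Spark, the same split-and-group step could be expressed with `map` and `groupByKey` on the RDD, converting each group's payload into labeled points before fitting one model per key.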

• ### A collection of Parameter Server resources

An introduction to the parameter server
Author: Superjom
Source: Zhihu
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.
See Mu Li's paper "Parameter Server for Distributed Machine Learning", which includes an introduction to his framework.

Concept:
A parameter server is a programming framework that simplifies writing distributed parallel programs, with an emphasis on distributed storage of large-scale parameters and coordinated access to them.

Industry needs to train large machine learning models, and some widely used models share two characteristics at scale:
1. The parameters are huge, exceeding the capacity of a single machine (e.g., large logistic regression models and neural networks).
2. The training data is huge, requiring distributed parallelism for speed (big data).

Under these requirements, existing MapReduce-style frameworks are not a good fit.

Like MapReduce, the parameter server is one of the frameworks abstracted out of the ongoing practice of large-scale machine learning. Its central contribution is distributed support for parameters; after all, a huge model is essentially a huge set of parameters.

Parameter Server (Mli)
—————————-
Architecture:
Nodes in the cluster are of two kinds: compute nodes and parameter-server nodes. Compute nodes learn from the training-data blocks assigned to them locally and update the corresponding parameters; parameter-server nodes store the global parameters in a distributed fashion, each holding a portion, and serve parameter queries and update requests from the compute nodes.

In short, compute nodes do the work and update parameters, while parameter-server nodes store them.

Redundancy and recovery:
As in MapReduce, every parameter is replicated on several different nodes of the parameter-server cluster (three replicas work well), so that when a node fails, the redundant copies keep the service available. When a new node joins, it copies the failed node's parameters from the replicas and takes over as its successor.
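The replication scheme described above can be sketched as a placement function: each parameter key maps to r nodes on a hashed ring, and after a failure its keys are re-placed among the survivors. The ring layout and node names are simplifying assumptions, not the paper's exact scheme:

```python
import hashlib

# Each key lives on r consecutive nodes of a hashed ring, so its value
# survives up to r-1 node failures.
NODES = ["s0", "s1", "s2", "s3", "s4"]

def replicas(key, nodes, r=3):
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(r)]

def recover(failed, nodes, keys):
    # re-place every key that lived on the failed node among the survivors
    survivors = [n for n in nodes if n != failed]
    return {k: replicas(k, survivors) for k in keys if failed in replicas(k, nodes)}
```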

Parallel computation:
Parallel computation happens mainly on the compute nodes. As in MapReduce, the data is split across worker nodes when tasks are assigned.
Before learning starts, the parameter server likewise splits the large-scale training data across the compute nodes. Each compute node then learns from its local data alone, and when done uploads the parameter update gradients to the corresponding parameter-server nodes.
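The pull-compute-push loop described above can be simulated in a few lines. The shard layout, the least-squares objective, and all numbers below are invented for illustration:

```python
# Servers store disjoint shards of a weight vector; each worker pulls the
# current weights, computes a gradient on its local example, and pushes
# the update back.
dim, lr = 4, 0.1
servers = {0: {0: 0.0, 1: 0.0}, 1: {2: 0.0, 3: 0.0}}  # shard id -> {index: weight}

def pull():
    w = [0.0] * dim
    for shard in servers.values():
        for i, v in shard.items():
            w[i] = v
    return w

def push(grad):
    for shard in servers.values():
        for i in shard:
            shard[i] -= lr * grad[i]    # server nodes apply the update

# Two workers, each holding one local example (x, y); model: y ≈ w·x.
data = [([1.0, 0.0, 2.0, 0.0], 3.0), ([0.0, 1.0, 0.0, 1.0], 2.0)]
for _ in range(200):
    for x, y in data:
        w = pull()                      # worker fetches current weights
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        push([err * xi for xi in x])    # worker uploads its gradient
```

In a real system the two workers run concurrently and each pulls only the weight indices its data touches; the sequential loop here only illustrates the division of labor.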

Detailed workflow:

1. Distribute the training data -> node 1, node 2, node 3, …, node i, …, node N

2.