Machine Learning | Learn for Master
  • Good Articles to learn how to implement a neural network 1

    This series of posts lists some good articles on how to implement a neural network. Thanks to the authors for their excellent work.
    If you are an author and do not want your articles listed here, please email learn4master and we will remove them from the site.
     

    How to implement a neural network Part 1

     From: http://peterroelants.github.io/posts/neural_network_implementation_part01/

    This page is part of a 5 (+2) part tutorial on how to implement a simple neural network model. You can find the links to the rest of the tutorial here:

     

    The tutorials are generated from Python 2 IPython Notebook files,

    [Read More...]
  • Good blogs to learn machine learning and data science

    • Occam’s Razor by Avinash Kaushik, examining web analytics and Digital Marketing.
    • OpenGardens, Data Science for Internet of Things (IoT), by Ajit Jaokar.
    • O’Reilly Radar, covering a wide range of research topics and books.
    • Observational Epidemiology: a college professor and a statistical consultant offer their comments, observations and thoughts on applied statistics, higher education and epidemiology.
    • Overcoming Bias, by Robin Hanson and Eliezer Yudkowsky, presents statistical analysis in reflections on honesty, signaling, disagreement, forecasting and the far future.
    • Probability &
    [Read More...]
  • Parameter Server: a roundup of resources


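    Introduction to the parameter server
    Author: Superjom
    Link: https://www.zhihu.com/question/26998075/answer/40577680
    Source: Zhihu
    Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, please cite the source.
    Take a look at Mu Li's paper "Parameter Server for Distributed Machine Learning", which includes an introduction to his framework.
    I later read the Project Adam paper from Microsoft Research. Its overall approach is similar, but the paper is richer in detail, so some complementary information from it is described here as well.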

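    Concept:
    A parameter server is a programming framework that makes distributed parallel programs easier to write. Its focus is support for the distributed storage and coordination of large-scale parameters.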

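    Industry needs to train large machine learning models. Some widely used models have two characteristics in terms of scale:
    1. The parameters are very large and exceed the capacity of a single machine (for example, large logistic regression models and neural networks).
    2. The training data is huge and needs distributed parallelism to speed things up (big data).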

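    Under these requirements, existing MapReduce-like frameworks are not a good fit.
    You therefore have to implement the distributed parallel program yourself. In fact, before Hadoop appeared, large-scale data processing always meant writing your own distributed programs (MPI). Google's engineers later distilled and abstracted that workflow into the MapReduce framework, which unified the field.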

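    The parameter server is similar to MapReduce: it is one of the frameworks abstracted from the ongoing practice of large-scale machine learning. Its main focus is distributing the parameters, since a huge model is really just a huge set of parameters.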

    Parameter Server(Mli)
    —————————-
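    Architecture:
    The nodes in the cluster fall into two kinds: compute nodes and parameter server nodes. Compute nodes learn from the training data (blocks) assigned to them locally and update the corresponding parameters. Parameter server nodes use distributed storage, each holding a part of the global parameters, and act as the service side that accepts parameter query and update requests from compute nodes.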

    In short, compute nodes do the work and update the parameters, while parameter server nodes store the parameters.
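
    The split between compute nodes and parameter server nodes can be sketched in a few lines of single-process Python. This is only an illustration under stated assumptions, not the framework's actual API: the class names `ServerNode` and `ParameterServer`, the `pull`/`push` method names, the modulo partitioning rule and the plain gradient-descent update are all hypothetical choices made for the example.

```python
from typing import Dict, List


class ServerNode:
    """Holds one shard of the global parameters (hypothetical class)."""

    def __init__(self) -> None:
        self.params: Dict[int, float] = {}

    def pull(self, keys: List[int]) -> Dict[int, float]:
        # Answer a compute node's query; unseen parameters default to 0.0.
        return {k: self.params.get(k, 0.0) for k in keys}

    def push(self, grads: Dict[int, float], lr: float) -> None:
        # Apply a compute node's gradient update to the locally stored shard.
        for k, g in grads.items():
            self.params[k] = self.params.get(k, 0.0) - lr * g


class ParameterServer:
    """Routes each parameter key to the server node that owns it."""

    def __init__(self, num_servers: int) -> None:
        self.nodes = [ServerNode() for _ in range(num_servers)]

    def _owner(self, key: int) -> ServerNode:
        # Assumed partitioning rule: key modulo the number of server nodes.
        return self.nodes[key % len(self.nodes)]

    def pull(self, keys: List[int]) -> Dict[int, float]:
        out: Dict[int, float] = {}
        for k in keys:
            out.update(self._owner(k).pull([k]))
        return out

    def push(self, grads: Dict[int, float], lr: float = 0.1) -> None:
        for k, g in grads.items():
            self._owner(k).push({k: g}, lr)


# A compute node pushes gradients for parameters 0 and 4, then queries 0, 4 and 7.
ps = ParameterServer(num_servers=3)
ps.push({0: 1.0, 4: -2.0}, lr=0.5)
weights = ps.pull([0, 4, 7])
```

    Each server node only ever sees the keys it owns, which is what lets the full parameter set exceed the capacity of one machine.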

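    Redundancy and recovery:
    As in MapReduce, every parameter is backed up on several different nodes in the parameter server cluster (three copies work well), so that when a node fails the redundant copies keep the service available. When a new node is inserted, the failed node's parameters are copied over from the redundant copies, and the failed node's successor joins the cluster.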
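
    The redundancy and recovery idea can be illustrated with a toy in-memory store. This is a sketch under assumptions rather than the scheme used by any real parameter server: the class `ReplicatedStore`, its method names, and the rule of placing replicas on consecutive nodes are hypothetical, and the replica count of three follows the suggestion in the text.

```python
from typing import Dict, List, Optional

NUM_REPLICAS = 3  # assumed replica count, following the "three copies" remark


class ReplicatedStore:
    """Toy parameter store that keeps each key on several nodes (hypothetical)."""

    def __init__(self, num_nodes: int) -> None:
        # One dict per server node; None marks a failed node.
        self.nodes: List[Optional[Dict[int, float]]] = [{} for _ in range(num_nodes)]

    def _replica_ids(self, key: int) -> List[int]:
        # Assumed placement rule: the owner node plus the next two nodes.
        n = len(self.nodes)
        return [(key + i) % n for i in range(NUM_REPLICAS)]

    def write(self, key: int, value: float) -> None:
        for i in self._replica_ids(key):
            node = self.nodes[i]
            if node is not None:
                node[key] = value

    def read(self, key: int) -> float:
        # Any surviving replica can serve the read.
        for i in self._replica_ids(key):
            node = self.nodes[i]
            if node is not None and key in node:
                return node[key]
        raise KeyError(key)

    def fail(self, node_id: int) -> None:
        self.nodes[node_id] = None

    def recover(self, node_id: int) -> None:
        # The replacement node copies its keys from the surviving replicas.
        fresh: Dict[int, float] = {}
        for other in self.nodes:
            if other is None:
                continue
            for key, value in other.items():
                if node_id in self._replica_ids(key):
                    fresh[key] = value
        self.nodes[node_id] = fresh


store = ReplicatedStore(num_nodes=5)
for k in range(10):
    store.write(k, k * 1.5)

store.fail(2)
value_during_failure = store.read(2)  # served by a surviving replica
store.recover(2)
recovered_keys = sorted(store.nodes[2])
```

    Reads keep working while node 2 is down, and the replacement node ends up holding exactly the keys the failed node was responsible for.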

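    Parallel computation:
    Parallel computation mainly takes place on the compute nodes. As in MapReduce, when tasks are assigned the data is split among the worker nodes.
    Before learning starts, the parameter server also splits the large-scale training data across the compute nodes. Each compute node then only has to learn from its local data. When it finishes, it uploads the gradient updates for its parameters to the corresponding parameter server nodes, which apply them.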
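
    A minimal sketch of this data-parallel loop, assuming a one-parameter linear model y ≈ w * x, a round-robin data split and synchronous rounds. None of these choices come from the original text, and the function names `split_data`, `local_gradient` and `train` are hypothetical. The workers are simulated sequentially in one process.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (x, y)


def split_data(data: List[Example], num_workers: int) -> List[List[Example]]:
    # Round-robin split: worker i receives every num_workers-th example.
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w: float, shard: List[Example]) -> float:
    # Gradient of the mean squared error for y ≈ w * x on one worker's shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def train(data: List[Example], num_workers: int = 3,
          lr: float = 0.05, rounds: int = 200) -> float:
    shards = split_data(data, num_workers)
    w = 0.0  # the single global parameter held by the "server"
    for _ in range(rounds):
        # Each worker computes a gradient on its local data only.
        grads = [local_gradient(w, shard) for shard in shards]
        # The server applies every uploaded gradient to the global parameter.
        for g in grads:
            w -= lr * g / num_workers
    return w


data = [(float(x), 3.0 * x) for x in range(1, 7)]  # generated from y = 3x
w = train(data)
```

    With this toy data the learned `w` converges to the true slope of 3. A real system would run the workers on separate machines and typically send only the gradients for the parameters each worker touched.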

    Detailed flow:

    1.
    Distribute the training data -> node 1, node 2, node 3, ..., node i, ..., node N

    2.

    [Read More...]
  • Popular Python libraries for Data Science and Machine Learning

    Python is almost a must-have skill for data scientists, as many data scientist positions require Python programming skills. This post introduces some of the most popular Python modules for data science. They are widely used in data mining and machine learning projects as well as in general data analysis.

    1. SciPy. SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. It provides a wide range of algorithms and mathematical tools for data scientists. 

    2. NumPy. NumPy is the fundamental package for scientific computing with Python. 

    [Read More...]
  • 8 skills you should learn to be a data scientist

    8 types of data science jobs with a breakdown of the 8 skills you need to get the job

    Data Scientists get assigned different names in different organizations. Contrary to popular belief, data science is not entirely about numbers, though it is a lot about them. A statistician, an astrologer, a survey designer, a biostatistician all play a data scientist’s role at some point without being known as one.

    There are a number of programming languages and software applications that support data analysis functions and they require different levels of programming skills. The following section explores different types of data scientists and corresponding functions performed by them:

    7 Types of Data Scientist
    1) Data Scientist as Statistician

    This is data analysis in the traditional sense.

    [Read More...]
  • Data Science and Machine Learning Interview Questions

    Here are some data science and machine learning interview questions asked by big companies such as Facebook, Amazon, Microsoft, Yelp, Pinterest, Square, Google, Glassdoor and Groupon. I have also posted an article that briefly describes the popular machine learning interview questions.

    1. You are given a coin and do not know whether it is fair. You toss it 6 times and get 1 tail and 5 heads. Determine whether it is fair. What is your confidence level?

    2. Given Amazon data, how would you predict which users are going to be top shoppers this holiday season?

    3.

    [Read More...]
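
    One possible way to approach the coin question (question 1 above) is an exact two-sided binomial test. The sketch below is an illustration and not necessarily the answer an interviewer expects. The function name `two_sided_p_value` is hypothetical, and "at least as extreme" is taken to mean every outcome that is no more probable than the observed one under a fair coin.

```python
from math import comb


def two_sided_p_value(heads: int, tosses: int) -> float:
    # Probability of each possible head count under a fair coin.
    pmf = [comb(tosses, k) / 2 ** tosses for k in range(tosses + 1)]
    observed = pmf[heads]
    # Add up all outcomes that are no more likely than the observed one.
    return sum(p for p in pmf if p <= observed)


p_value = two_sided_p_value(heads=5, tosses=6)
print(p_value)  # 14/64 = 0.21875
```

    A p-value of about 0.22 is well above the usual 0.05 threshold, so six tosses do not give enough evidence to reject the hypothesis that the coin is fair.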
  • Popular Machine Learning Interview Questions

    We list some popular questions related to machine learning. You should prepare for them if you are looking for a job as a Machine Learning Engineer, Data Scientist or Research Scientist working on machine learning. 

    I put the questions into three categories: Machine Learning Theories, Machine Learning Algorithms and Machine Learning Tools. 

    Machine Learning Theories

    When we talk about machine learning theories, we often refer to machine learning models such as Support Vector Machines, Decision Trees, Logistic Regression, Topic Models, Bayesian Networks and Deep Learning. 

    Here are some books that must be read:

    [Read More...]