all | Learn for Master
  • Most Popular Deep Learning Projects

    Top Deep Learning Projects

    A list of popular github projects related to deep learning (ranked by stars).

    Last Update: 2016.08.09

    Project Name | Stars | Description
    TensorFlow | 29622 | Computation using data flow graphs for scalable machine learning.
    Caffe | 11799 | Caffe: a fast open framework for deep learning.
    Neural Style | 10148 | Torch implementation of neural style algorithm.
    Deep Dream | 9042 | Deep Dream.
    Keras | 7502 | Deep Learning library for Python. Convnets, recurrent neural networks, and more. Runs on Theano and TensorFlow.
    Roc AlphaGo | 7170 | An independent, student-led replication of DeepMind’s 2016 Nature publication,

    [Read More...]
  • Good articles to learn Convolutional Neural Networks

    Understanding Convolutions

    from: https://colah.github.io/posts/2014-07-Understanding-Convolutions/

    neural networks, convolutional neural networks, convolution, math, probability

    In a previous post, we built up an understanding of convolutional neural networks, without referring to any significant mathematics. To go further, however, we need to understand convolutions.

    If we just wanted to understand convolutional neural networks, it might suffice to roughly understand convolutions. But the aim of this series is to bring us to the frontier of convolutional neural networks and explore new options.

    [Read More...]
  • Machine learning in 10 pictures

    Machine learning in 10 pictures

     from: http://www.denizyuret.com/2014/02/machine-learning-in-5-pictures.html
     
    I find myself coming back to the same few pictures when explaining basic machine learning concepts. Below is a list I find most illuminating.

    1. Test and training error: Why lower training error is not always a good thing: ESL Figure 2.11. Test and training error as a function of model complexity.

    2. Under and overfitting: PRML Figure 1.4. Plots of polynomials having various orders M, shown as red curves, fitted to the data set generated by the green curve.
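The first two pictures can be reproduced numerically. The following is a minimal sketch, not taken from the original post: it assumes a k-nearest-neighbour regressor as the model (smaller k means a more complex model) and noisy samples of sin(2πx) as the data. The function names, the noise level, and the sample sizes are all illustrative choices.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable


def make_data(n):
    """Draw n noisy samples of sin(2*pi*x) with x uniform on [0, 1]."""
    xs = [random.random() for _ in range(n)]
    return [(x, math.sin(2 * math.pi * x) + random.gauss(0, 0.3)) for x in xs]


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, points, k):
    """Mean squared error of the k-NN model fitted on `train`, evaluated on `points`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)


train, test = make_data(50), make_data(200)

# Smaller k = more flexible model. Compare training and test error side by side.
for k in (25, 9, 3, 1):
    print(f"k={k:2d}  train MSE={mse(train, train, k):.3f}  test MSE={mse(train, test, k):.3f}")
```

With k=1 every training point is its own nearest neighbour, so the training error is exactly zero while the error on fresh data is not. That is the sense in which a lower training error is not always a good thing.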

    [Read More...]
  • Good resources to learn how to use a WebSocket push API in Python

    How to connect to the poloniex.com WebSocket API using a Python library

    The problem:

    I am trying to connect to wss://api.poloniex.com and subscribe to the ticker. I can’t find any working example in Python. I have tried to use autobahn/twisted and websocket-client 0.32.0.
    The purpose of this is to get real-time ticker data and store it in a MySQL database.

    The solution:

    What you are trying to accomplish can be done by using WAMP, specifically by using the WAMP modules of the autobahn library (that you are already trying to use).
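To see what WAMP adds on top of a raw WebSocket, it can help to look at the messages themselves. The sketch below uses only the standard library and builds the JSON frames that a WAMP v2 client exchanges when subscribing to a topic. It is an illustration of the protocol, not the code from the original answer: the realm name `realm1`, the request id, and the sample event payload are assumptions made up for the example. In practice the autobahn library's `ApplicationSession` and `ApplicationRunner` classes produce and consume these frames for you.

```python
import json

# WAMP v2 message type codes used during a subscription.
HELLO, WELCOME, SUBSCRIBE, SUBSCRIBED, EVENT = 1, 2, 32, 33, 36


def hello(realm):
    """Opening frame: [HELLO, realm, details], announcing the subscriber role."""
    return json.dumps([HELLO, realm, {"roles": {"subscriber": {}}}])


def subscribe(request_id, topic):
    """Subscription request: [SUBSCRIBE, request_id, options, topic]."""
    return json.dumps([SUBSCRIBE, request_id, {}, topic])


def parse_event(raw):
    """Return (subscription_id, args) for an EVENT frame, or None for other frames."""
    msg = json.loads(raw)
    if msg[0] != EVENT:
        return None
    subscription_id = msg[1]
    args = msg[4] if len(msg) > 4 else []  # positional payload is optional
    return subscription_id, args


# "realm1" and request id 1 are assumptions for illustration.
print(hello("realm1"))
print(subscribe(1, "ticker"))

# A made-up EVENT frame: [EVENT, subscription_id, publication_id, details, args].
sample_event = json.dumps([EVENT, 5512315355, 4429313566, {}, ["BTC_XMR", "0.0123"]])
print(parse_event(sample_event))
```

A plain WebSocket client such as websocket-client would have to implement this handshake by hand, which is why a WAMP-aware library is the more direct route.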

    [Read More...]
  • Best examples to learn machine learning

     

    Here are some good examples for learning machine learning and data science with Python and pandas.

    The following resources are from https://github.com/savarin/pyconuk-introtutorial

    The tutorial will start with data manipulation using pandas: loading and cleaning data. We’ll then use scikit-learn to make predictions. By the end of the session, we will have worked on the Kaggle Titanic competition from start to finish, through a number of iterations in increasing order of sophistication. We’ll also have a brief discussion on cross-validation and making visualisations.

    [Read More...]
  • Feature engineering in PySpark

    We often need to do feature transformation to build a training data set before training a model. 

    Here are some good examples showing how to transform your data, especially when you need to derive new features from other columns of a Spark DataFrame.

    Encode and assemble multiple features in PySpark

    First of all StringIndexer.

    Next OneHotEncoder:

    VectorIndexer and VectorAssembler:

    Finally you can wrap all of that using pipelines:

    Arguably this is a much more robust and cleaner approach than writing everything from scratch.
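The code for these steps is not included in this excerpt. As a rough guide to what the stages compute, here is a plain-Python sketch (deliberately not PySpark code) that mimics the default behaviour of `StringIndexer` (most frequent label gets index 0), `OneHotEncoder` (the last category is dropped), and `VectorAssembler` (columns are concatenated into one feature vector). The helper names, the tie-breaking rule, and the sample rows are assumptions for illustration only.

```python
from collections import Counter


def fit_string_indexer(values):
    """Map each label to an index, most frequent label first.

    Ties are broken alphabetically here; that is an assumption of this sketch.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def one_hot(index, n_categories):
    """One-hot encode an index, dropping the last category (all zeros)."""
    vec = [0.0] * (n_categories - 1)
    if index < n_categories - 1:
        vec[int(index)] = 1.0
    return vec


def assemble(*parts):
    """Concatenate scalars and vectors into a single flat feature vector."""
    out = []
    for part in parts:
        out.extend(part if isinstance(part, list) else [float(part)])
    return out


# Made-up rows: one categorical column and one numeric column.
rows = [("red", 1.5), ("blue", 0.2), ("red", 3.0),
        ("green", 2.2), ("red", 0.7), ("blue", 1.1)]

mapping = fit_string_indexer(color for color, _ in rows)
features = [assemble(one_hot(mapping[color], len(mapping)), x) for color, x in rows]

print(mapping)
for row, vec in zip(rows, features):
    print(row, "->", vec)
```

In PySpark the same chain is expressed by passing the stage objects to a `Pipeline`, so that fitting and transforming happen in a single call instead of through hand-written glue like the list comprehension above.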

    [Read More...]
  • Good resources to learn backtesting for automated trading

    Research Backtesting Environments in Python with pandas
    By Michael Halls-Moore on January 16th, 2014

    from: https://www.quantstart.com/articles/Research-Backtesting-Environments-in-Python-with-pandas

    Backtesting is the research process of applying a trading strategy idea to historical data in order to ascertain past performance. In particular, a backtester makes no guarantee about the future performance of the strategy. Backtesters are, however, an essential component of the strategy pipeline research process, allowing strategies to be filtered out before being placed into production.
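To make the idea concrete, here is a minimal sketch of a backtest, assuming a simple moving-average crossover rule on a made-up price series. It is not the object-oriented system from the article: the function names, window lengths, and prices are all hypothetical, and transaction costs are ignored.

```python
def moving_average(prices, window, t):
    """Mean of the `window` prices ending at index t (inclusive)."""
    return sum(prices[t - window + 1 : t + 1]) / window


def backtest(prices, short=3, long=5):
    """Run a long-only moving-average crossover strategy over historical prices.

    Returns the daily positions (1 = long, 0 = flat) and the equity curve,
    starting from 1.0.
    """
    positions = []
    for t in range(len(prices)):
        if t + 1 < long:
            positions.append(0)  # not enough history yet
        else:
            fast = moving_average(prices, short, t)
            slow = moving_average(prices, long, t)
            positions.append(1 if fast > slow else 0)

    equity = [1.0]
    for t in range(1, len(prices)):
        daily_return = prices[t] / prices[t - 1] - 1
        # The position chosen at the close of day t-1 earns the return of day t,
        # which avoids using information from the future.
        equity.append(equity[-1] * (1 + positions[t - 1] * daily_return))
    return positions, equity


# Made-up closing prices for illustration.
prices = [100, 101, 102, 101, 103, 105, 107, 106, 104, 102, 101, 103]
positions, equity = backtest(prices)
print(positions)
print(f"final equity: {equity[-1]:.4f}")
```

The one-day lag between signal and return is the detail that matters most here; applying a position to the same day's return would quietly leak future information into the result.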

    In this article (and those that follow it) a basic object-oriented backtesting system written in Python will be outlined. This early system will primarily be a “teaching aid”,

    [Read More...]
  • Visualize the Iris dataset using Python

    This notebook demos Python data visualizations on the Iris dataset

    from: https://www.kaggle.com/benhamner/d/uciml/iris/python-data-visualizations

    This Python 3 environment comes with many helpful analytics libraries installed. It is defined by the kaggle/python Docker image.

    We’ll use three libraries for this tutorial: pandas, matplotlib, and seaborn.

    Press “Fork” at the top-right of this screen to run this notebook yourself and build each of the examples.

    In [1]:

    Out[1]:

     
       Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
    0   1            5.1           3.5            1.4           0.2  Iris-setosa
    1   2            4.9           3.0            1.4           0.2  Iris-setosa
    2   3            4.7           3.2            1.3           0.2  Iris-setosa
    3   4            4.6           3.1            1.5           0.2  Iris-setosa
    4   5            5.0           3.6            1.4           0.2  Iris-setosa

    Wrapping Up

    I hope you enjoyed this quick introduction to some of the quick,

    [Read More...]
  • Adding Multiple Columns to Spark DataFrames

    Adding Multiple Columns to Spark DataFrames

    from: https://p058.github.io/spark/2017/01/08/spark-dataframes.html

    I have been using Spark’s DataFrame API for quite some time, and I often want to add many columns to a DataFrame (for example, creating more features from existing features for a machine learning model) and find it hard to write many withColumn statements. So I monkey patched the Spark DataFrame to make it easy to add multiple columns.
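The post's own code is not included in this excerpt. The general technique, monkey patching a class with a method that folds a single-column operation over many column definitions, can be sketched without Spark. In the example below `MiniFrame` is a hypothetical stand-in for a DataFrame, and `with_column` plays the role that `withColumn` plays in Spark; none of these names come from the original post.

```python
from functools import reduce


class MiniFrame:
    """Hypothetical stand-in for a DataFrame: an immutable list of row dicts."""

    def __init__(self, rows):
        self.rows = [dict(row) for row in rows]

    def with_column(self, name, fn):
        """Return a new frame with one extra column computed from each row."""
        return MiniFrame([{**row, name: fn(row)} for row in self.rows])


def with_columns(self, new_columns):
    """Add several columns at once by folding with_column over a dict."""
    return reduce(
        lambda frame, item: frame.with_column(*item),
        new_columns.items(),
        self,
    )


# The monkey patch: attach the new method to the existing class.
MiniFrame.with_columns = with_columns

usage = MiniFrame([
    {"user_id": 1, "sessions": 3},
    {"user_id": 2, "sessions": 10},
])

features = usage.with_columns({
    "sessions_squared": lambda row: row["sessions"] ** 2,
    "is_heavy_user": lambda row: row["sessions"] >= 5,
})
print(features.rows)
```

Because each call returns a new frame, the fold leaves the original `usage` object untouched, which mirrors how chained withColumn calls behave on an immutable DataFrame.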

    First, let’s create a udf_wrapper decorator to keep the code concise.

    Let’s create a Spark DataFrame with columns user_id, app_usage (app and number of sessions of each app),

    [Read More...]
  • How to Rank in the Top 10% in Your First Kaggle Competition

    How to Rank in the Top 10% in Your First Kaggle Competition

    from: https://dnc1994.com/2016/04/rank-10-percent-in-first-kaggle-competition/

    Introduction

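    Kaggle is currently the largest gathering place for data scientists. Many companies put up their own data along with prize money and run data competitions on Kaggle. I recently finished my first competition, ranking 98th out of 2125 teams (~5%). Since it was my first competition, I am quite satisfied with this result. Besides the ranking, the result of a Kaggle competition also shows one of three tiers: Prize Winner, 10%, or 25%. So many people who are new to Kaggle aim for the top 25% or the top 10%. In this post, drawing on my experience from that first competition and on what I learned from other Kagglers, I try to offer practical guidance on reaching the top 10% for newcomers who have just heard of Kaggle and want to compete.

    An English version of this post is available here.

    Kaggle Profile

    The vast majority of Kagglers use Python and R. Since I mainly use Python, the examples in this post are all in Python. R users should still be able to grasp the ideas behind the tools without much effort.

    First, a brief introduction to how Kaggle competitions work:

    • Different competitions have different tasks: classification, regression, recommendation, ranking, and so on. The training and test sets become available for download once the competition starts.
    • A competition usually lasts 2 to 3 months. Each team has a limited number of submissions per day, usually 5.
    • One week before the competition ends there is a deadline, after which teams can no longer be formed and no one new can join. So if you want to take part, make sure you have made at least one valid submission before this deadline.
    • In most cases you get your score right after submitting. Different competitions use different evaluation metrics; the metric in use is shown at the top of the score column.
    • The score you get back is computed on one part of the test set; the remaining part is used to compute the final result. So the final ranking will change.
    • LB refers to the score on the Leaderboard. As follows from the previous point, there is a Public LB and a Private LB.
    • The score from your own cross-validation is usually called CV or Local CV. Generally, the CV result is more reliable than the LB.
    • Newcomers can find plenty of useful experience and insight in a competition's Forum and Scripts. Don't hesitate to ask questions; Kagglers are very welcoming.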
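The Local CV score mentioned above can be sketched in a few lines. The example below is a minimal k-fold cross-validation under loud assumptions: the "model" is just a baseline that predicts the mean of the training targets, the metric is mean squared error, and the target values are made up. None of it comes from the original post.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def local_cv(targets, k=5):
    """Return the mean CV score and the per-fold scores for a mean-predictor baseline."""
    folds = k_fold_indices(len(targets), k)
    scores = []
    for held_out in folds:
        held = set(held_out)
        # Fit on everything except the held-out fold.
        train = [y for i, y in enumerate(targets) if i not in held]
        prediction = sum(train) / len(train)
        # Score on the held-out fold only.
        mse = sum((targets[i] - prediction) ** 2 for i in held_out) / len(held_out)
        scores.append(mse)
    return sum(scores) / k, scores


# Made-up regression targets for illustration.
targets = [3.1, 2.7, 4.0, 3.6, 2.9, 3.3, 4.4, 2.5, 3.8, 3.0,
           3.5, 2.8, 4.1, 3.2, 3.9, 2.6, 3.7, 3.4, 4.2, 3.0]

mean_score, fold_scores = local_cv(targets, k=5)
print(f"Local CV (MSE): {mean_score:.4f}")
print([round(s, 4) for s in fold_scores])
```

Every sample is held out exactly once, so the averaged score uses all of the training data, whereas the Public LB score depends on only one part of the test set.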

    So let's get started!

    P.S.

    [Read More...]