data science | Learn for Master
  • Best examples to learn machine learning

     

    Here are some good examples for learning machine learning and data science with Python and pandas.

    The following resources are from https://github.com/savarin/pyconuk-introtutorial

    The tutorial will start with data manipulation using pandas: loading and cleaning data. We’ll then use scikit-learn to make predictions. By the end of the session, we will have worked on the Kaggle Titanic competition from start to finish, through a number of iterations in increasing order of sophistication. We’ll also have a brief discussion on cross-validation and making visualisations.

    [Read More...]
  • Feature engineering in PySpark

    We often need to do feature transformation to build a training data set before training a model. 

    Here are some good examples that show how to transform your data, especially if you need to derive new features from other columns using a Spark DataFrame.

    Encode and assemble multiple features in PySpark


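    The approach chains four steps:

    1. StringIndexer maps each string category to a numeric index.
    2. OneHotEncoder turns those indices into sparse binary vectors.
    3. VectorIndexer and VectorAssembler index categorical features stored inside vectors and combine the columns into a single feature vector.
    4. Finally, a Pipeline wraps all of these stages so they can be fitted and applied together.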

    Arguably, this is a much more robust and cleaner approach than writing everything from scratch.
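    As a rough sketch of how the stages named above can fit together, the following assumes Spark 3.x, where OneHotEncoder accepts inputCols/outputCols. The column names, the _idx/_vec suffixes, and the helpers intermediate_names, build_feature_pipeline and demo are hypothetical and not taken from the original post; VectorIndexer is left out to keep the example short.

```python
def intermediate_names(categorical_cols):
    """Name the indexed and one-hot encoded version of each categorical column."""
    index_cols = [f"{c}_idx" for c in categorical_cols]
    onehot_cols = [f"{c}_vec" for c in categorical_cols]
    return index_cols, onehot_cols


def build_feature_pipeline(categorical_cols, numeric_cols, output_col="features"):
    """Build an unfitted Pipeline: StringIndexer -> OneHotEncoder -> VectorAssembler."""
    # Imported here so the naming helper above stays usable without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler

    index_cols, onehot_cols = intermediate_names(categorical_cols)

    # One StringIndexer per categorical column: string label -> numeric index.
    indexers = [
        StringIndexer(inputCol=col, outputCol=idx)
        for col, idx in zip(categorical_cols, index_cols)
    ]
    # Assumption: Spark 3.x, where one OneHotEncoder can encode several columns.
    encoder = OneHotEncoder(inputCols=index_cols, outputCols=onehot_cols)
    # Concatenate the encoded and numeric columns into a single feature vector.
    assembler = VectorAssembler(
        inputCols=onehot_cols + list(numeric_cols), outputCol=output_col
    )
    return Pipeline(stages=indexers + [encoder, assembler])


def demo():
    """Fit the pipeline on a tiny made-up DataFrame and show the feature column."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("features-sketch").getOrCreate()
    )
    df = spark.createDataFrame(
        [("a", "x", 1.0), ("b", "y", 2.0), ("a", "y", 3.0)],
        ["cat1", "cat2", "num"],
    )
    pipeline = build_feature_pipeline(["cat1", "cat2"], ["num"])
    pipeline.fit(df).transform(df).select("features").show(truncate=False)
    spark.stop()
```

    Calling demo() in an environment with PySpark installed fits every stage in order and applies them in one pass. Keeping the stages inside a Pipeline means the same fitted transformations can later be reused on new data.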

    [Read More...]
  • Good resources for learning automated trading backtesting

    Research Backtesting Environments in Python with pandas
    By Michael Halls-Moore on January 16th, 2014

    from: https://www.quantstart.com/articles/Research-Backtesting-Environments-in-Python-with-pandas

    Backtesting is the research process of applying a trading strategy idea to historical data in order to ascertain past performance. In particular, a backtester makes no guarantee about the future performance of the strategy. Backtesters are, however, an essential component of the strategy pipeline research process, allowing strategies to be filtered out before being placed into production.
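    To make the idea concrete, here is a minimal sketch of a backtest in plain Python. It is not the system from the article: the moving-average crossover rule, the function names and the price series are all assumptions chosen for illustration, and costs such as commissions and slippage are ignored.

```python
def moving_average(prices, window):
    """Trailing simple moving average; None until enough history exists."""
    averages = []
    for i in range(len(prices)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(prices[i + 1 - window : i + 1]) / window)
    return averages


def backtest_ma_crossover(prices, short_window=2, long_window=3):
    """Return the equity curve of a long-only moving-average crossover rule."""
    short_ma = moving_average(prices, short_window)
    long_ma = moving_average(prices, long_window)

    # Signal known at the close of day i: 1 = hold the asset, 0 = stay in cash.
    signals = [
        1 if s is not None and l is not None and s > l else 0
        for s, l in zip(short_ma, long_ma)
    ]

    equity = [1.0]  # start with one unit of capital
    for i in range(1, len(prices)):
        daily_return = prices[i] / prices[i - 1] - 1.0
        # Trade on yesterday's signal so the rule never sees today's price early.
        equity.append(equity[-1] * (1.0 + signals[i - 1] * daily_return))
    return equity


# Made-up historical prices, used only to exercise the functions above.
example_prices = [10, 10, 10, 11, 12, 12]
example_equity = backtest_ma_crossover(example_prices)
print(example_equity)
```

    The resulting equity curve describes only how the rule would have behaved on this particular history, which is the sense in which a backtest can filter out a strategy without guaranteeing anything about its future performance.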

    In this article (and those that follow it) a basic object-oriented backtesting system written in Python will be outlined. This early system will primarily be a “teaching aid”,

    [Read More...]
  • Good blogs to learn machine learning and data science

    • Occam’s Razor by Avinash Kaushik, examining web analytics and Digital Marketing.
    • OpenGardens, Data Science for Internet of Things (IoT), by Ajit Jaokar.
    • O’Reilly Radar, a wide range of research topics and books.
    • Observational Epidemiology. A college professor and a statistical consultant offer their comments, observations and thoughts on applied statistics, higher education and epidemiology.
    • Overcoming Bias, by Robin Hanson and Eliezer Yudkowsky. Presents statistical analysis in reflections on honesty, signaling, disagreement, forecasting and the far future.
    • Probability &
    [Read More...]
  • Popular Python libraries for Data Science and Machine Learning

    Python is almost a must-have skill for data scientists, as many data scientist positions require Python programming skills. This post introduces some of the most popular Python modules for data science. They are widely used to conduct projects related to data mining, machine learning, and general data analysis.

    1. SciPy. SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. It provides a wide range of algorithms and mathematical tools for data scientists.

    2. NumPy. NumPy is the fundamental package for scientific computing with Python. 

    [Read More...]
  • 8 skills you should learn to be a data scientist

    8 types of data science jobs with a breakdown of the 8 skills you need to get the job

    Data Scientists get assigned different names in different organizations. Contrary to popular belief, data science is not entirely about numbers, though it is a lot about them. A statistician, an astrologer, a survey designer, a biostatistician all play a data scientist’s role at some point without being known as one.

    There are a number of programming languages and software applications that support data analysis functions and they require different levels of programming skills. The following section explores different types of data scientists and corresponding functions performed by them:

    7 Types of Data Scientist
    1) Data Scientist as Statistician

    This is data analysis in the traditional sense.

    [Read More...]