Popular Machine Learning Interview Questions
This post lists some popular questions about machine learning. You should prepare for them if you are looking for a job as a Machine Learning Engineer, Data Scientist or Research Scientist working on machine learning.
I put the questions into three categories: Machine Learning Theories, Machine Learning Algorithms and Machine Learning Tools.
Machine Learning Theories
When we talk about machine learning theories, we often refer to machine learning models such as Support Vector Machines, Decision Trees, Logistic Regression, Topic Models, Bayesian Networks and Deep Learning.
Here are some books that must be read:
 Pattern Recognition and Machine Learning
 The Elements of Statistical Learning: Data Mining, Inference, and Prediction
 Data Mining: Concepts and Techniques
Supervised learning vs unsupervised learning
These models can be put into two high-level categories: supervised learning and unsupervised learning.
Some typical questions are:
What is supervised learning and unsupervised learning?
What kind of problems can be solved by supervised learning or unsupervised learning?
Give some examples for supervised learning and unsupervised learning.
Note that these are often introductory questions asked before more specific and harder ones.
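As a rough illustration of the distinction, the sketch below runs a supervised method (a nearest-centroid classifier, which needs labels) and an unsupervised one (k-means with k=2, which does not) on the same toy data. The data and all function names are made up for this example.

```python
# Toy 1-D data: two groups of points, around 1.0 and around 9.0.
points = [0.8, 1.0, 1.3, 8.7, 9.0, 9.4]
labels = ["low", "low", "low", "high", "high", "high"]  # used only by the supervised part


def mean(values):
    return sum(values) / len(values)


def fit_nearest_centroid(xs, ys):
    """Supervised: use the labels to compute one centroid per class."""
    return {c: mean([x for x, y in zip(xs, ys) if y == c]) for c in set(ys)}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def two_means(xs, iterations=10):
    """Unsupervised: 1-D k-means with k=2, no labels involved."""
    centers = [min(xs), max(xs)]  # simple deterministic initialisation
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


centroids = fit_nearest_centroid(points, labels)
print(predict(centroids, 2.0))   # label predicted for a new point
print(two_means(points))         # cluster centres found without labels
```

The supervised part can name the class of a new point because it was trained with labels; the unsupervised part only discovers that the data forms two groups.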
Generative and Discriminative Models
Models in machine learning can also be classified as generative or discriminative, so you need to prepare for the question:
What is the difference between a Generative and Discriminative Model?
A simple answer is:
Suppose you are given input data x and want to classify it into labels y. A generative model learns the joint probability distribution p(x, y), while a discriminative model learns the conditional probability distribution p(y | x), which you should read as "the probability of y given x".
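One way to see the relationship: if you know the joint distribution p(x, y), you can always recover p(y | x) by dividing by p(x), whereas a discriminative model estimates p(y | x) directly and never models x itself. The sketch below does this for a small, hypothetical joint table; the values and names are invented for illustration.

```python
# Hypothetical joint distribution p(x, y) over a binary feature x and a binary label y.
joint = {
    ("rain", "late"): 0.20,
    ("rain", "on_time"): 0.10,
    ("dry", "late"): 0.10,
    ("dry", "on_time"): 0.60,
}


def conditional(joint, x):
    """Return p(y | x) = p(x, y) / sum over y' of p(x, y')."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}


def classify(joint, x):
    """Pick the label with the highest conditional probability."""
    cond = conditional(joint, x)
    return max(cond, key=cond.get)


print(conditional(joint, "rain"))
print(classify(joint, "dry"))
```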
To answer this question well, read some articles and come up with your own examples so that you fully understand the difference. Here are some resources:
1. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes
2. What are the differences between generative and discriminative machine learning
3. Machine Learning: Generative and Discriminative Models
4. Pattern Recognition and Machine Learning
5. Machine Learning: A Probabilistic Perspective
6. An Introduction to Statistical Learning: with Applications in R
If you can answer the above questions correctly, it is time to move on to questions about specific models:
 What is the XXX model? (XXX can be Support Vector Machines, Logistic Regression, Decision Trees, Random Forest, Bayesian Networks, Topic Models, Deep Learning, KMeans, KNN, Collaborative Filtering, Markov Networks, Hidden Markov Models, etc.) I will write more articles to describe each of these models.
 Given a specific model, talk about its assumptions and the types of problems it tries to solve. Why does it perform well on one problem but badly on another?
 Is the model prone to overfitting? If so – how do you overcome this?
 How do we examine the data to test whether an assumption of the model is satisfied? For example, linear regression assumes that the residuals are approximately normally distributed, so you need to know about Q-Q plots and residual plot analysis.
 Does the model have a random component, or will the same training data always generate the same model? How do we deal with random effects in training?
 What types of data (numerical, categorical etc…) can the model handle?
 How do you handle missing data?
 How interpretable is the model?
 What alternative models might we use for the same type of problem that this one attempts to solve, and how does it compare to those?
 How fast is prediction compared to other models? How fast is training compared to other models?
 Does the model have any hyperparameters that require tuning? How do we tune them?
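For the tuning question in particular, a common answer is cross-validation. The sketch below is one minimal way to do it, assuming a 1-D k-nearest-neighbours classifier and leave-one-out cross-validation; the data set and helper names are hypothetical.

```python
from collections import Counter

# Hypothetical 1-D training set: (feature, label) pairs.
data = [(1.0, "a"), (1.2, "a"), (1.5, "a"), (2.0, "a"),
        (5.0, "b"), (5.3, "b"), (5.8, "b"), (6.1, "b")]


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out cross-validation accuracy for a given k."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]   # hold out one point
        correct += knn_predict(train, x, k) == y
    return correct / len(data)


def tune_k(data, candidates):
    """Return the candidate k with the best leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))


scores = {k: loo_accuracy(data, k) for k in (1, 3, 7)}
best_k = tune_k(data, (1, 3, 7))
print(scores, best_k)
```

On this toy data a very large k hurts: with k=7 every held-out point is outvoted by the four points of the other class, which shows why the value has to be chosen on held-out data rather than guessed.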
Machine Learning Algorithms
Some people may think that machine learning algorithms are equivalent to machine learning theories. In this post, I use the term machine learning algorithms to refer to the procedures used to train a model on input data and to use the model to make predictions on new data.
Gradient Descent Algorithm
The most famous algorithm is Gradient Descent.
The usual paradigm for training a model is to first define a loss function of some parameters and then use the input data to minimize that loss. A common method for solving such a minimization problem is Gradient Descent, which searches for a locally optimal solution.
With (batch) Gradient Descent, you have to run through ALL the samples in your training set to make a single update to the parameters in each iteration. This is too inefficient if the dataset is large. A faster method called Stochastic Gradient Descent (SGD) updates the parameters using one sample, or a small sample (mini-batch), of the dataset in each iteration.
See this article for the difference between GD and SGD.
SGD is an important algorithm because it is used to train a wide range of machine learning models such as linear regression, logistic regression, and deep learning models.
 Logistic Regression and Gradient Descent
 Optimization: Stochastic Gradient Descent
 Stochastic gradient methods for machine learning
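The difference between batch Gradient Descent and SGD described above can be sketched on a one-parameter linear model y = w * x with a squared-error loss. This is an illustrative sketch only: the data, learning rate and function names are assumptions made for the example.

```python
import random

# Hypothetical noise-free data generated from y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def batch_gd(xs, ys, lr=0.05, epochs=50):
    """Each update uses the gradient averaged over ALL samples."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


def sgd(xs, ys, lr=0.05, epochs=50, seed=0):
    """Each update uses the gradient of ONE sample, visited in random order."""
    rng = random.Random(seed)
    w = 0.0
    samples = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            w -= lr * 2 * x * (w * x - y)
    return w


print(batch_gd(xs, ys))
print(sgd(xs, ys))
```

Batch Gradient Descent makes one update per pass over the data, while SGD makes one update per sample, so SGD performs many more (noisier) updates for the same number of passes.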
Expectation–maximization (EM) algorithm
The EM algorithm is a popular method that you should understand if you want to apply for a machine learning job. It is used to fit latent-variable models such as Gaussian mixtures, Hidden Markov Models and Topic Models, and KMeans can be viewed as a hard-assignment variant of it. Here are some resources:
 A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
 Expectation-Maximization Algorithm and Applications
 A Note on the Expectation-Maximization (EM) Algorithm
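To make the E-step and M-step concrete, here is a minimal sketch of EM for a two-component 1-D Gaussian mixture. The initialisation, the variance floor and the toy data are simplifying assumptions for this example, not part of any referenced tutorial.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs, iterations=50):
    """EM for a two-component 1-D Gaussian mixture."""
    means = [min(xs), max(xs)]     # simple deterministic initialisation
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            scores = [w * gaussian_pdf(x, m, v)
                      for w, m, v in zip(weights, means, variances)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: re-estimate parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a collapsing component
    return weights, means, variances


# Hypothetical data: one group centred on 0, one centred on 10.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
weights, means, variances = em_two_gaussians(xs)
print(weights, means, variances)
```

The latent variable here is which component generated each point: the E-step estimates it softly, and the M-step treats those estimates as fractional counts.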
Gibbs Sampling
The Gibbs sampling algorithm is often used to train a topic model such as LDA. Read the following article:
 Gibbs Sampling for the Uninitiated
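A full LDA sampler is too long to show here, so the sketch below illustrates the core idea of Gibbs sampling on a much simpler target: a standard bivariate normal with correlation rho. Each variable is repeatedly redrawn from its conditional distribution given the other. The target distribution, parameter values and function names are assumptions chosen for illustration.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Sample (x, y) from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    sd = math.sqrt(1 - rho ** 2)   # std. dev. of each conditional distribution
    x, y = 0.0, 0.0
    samples = []
    for i in range(n_samples + burn_in):
        x = rng.gauss(rho * y, sd)  # draw x from p(x | y)
        y = rng.gauss(rho * x, sd)  # draw y from p(y | x)
        if i >= burn_in:            # discard early draws
            samples.append((x, y))
    return samples


def correlation(samples):
    """Sample correlation coefficient of the (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    cov = sum((x - mx) * (y - my) for x, y in samples) / n
    vx = sum((x - mx) ** 2 for x, _ in samples) / n
    vy = sum((y - my) ** 2 for _, y in samples) / n
    return cov / math.sqrt(vx * vy)


samples = gibbs_bivariate_normal(rho=0.6, n_samples=20000)
print(round(correlation(samples), 2))
```

In a topic model the same loop structure applies, except that the variables being resampled are the topic assignments of individual words.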
Algorithms to train a Decision Tree
Here are some important questions:
 How do you split a node?
 How do you prune the tree?
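For the splitting question, one common criterion is Gini impurity (entropy-based information gain is another). The sketch below finds the best threshold on a single numeric feature; it is a simplified illustration with made-up names and does not cover pruning.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs, ys):
    """Try a threshold between each pair of adjacent sorted values and
    return (threshold, weighted impurity) for the purest split."""
    pairs = sorted(zip(xs, ys))
    best = (None, gini(ys))          # baseline: impurity with no split
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                 # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical feature values and class labels.
xs = [1, 2, 3, 10, 11, 12]
ys = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(xs, ys))
```

A tree-growing algorithm applies this search to every feature at every node and recurses on the two resulting subsets.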
Algorithms for Association Rule Mining
Association Rule Mining aims to identify patterns in a dataset such as "customers who buy X and Y tend to buy Z", written as {X, Y} => Z.
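Rules of this kind are usually scored by support and confidence. The sketch below computes both for the rule {X, Y} => Z on a small set of hypothetical transactions; the data and function names are invented for the example.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"X", "Y", "Z"},
    {"X", "Y", "Z"},
    {"X", "Y"},
    {"X", "Z"},
    {"Y"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


print(support({"X", "Y"}, transactions))
print(confidence({"X", "Y"}, {"Z"}, transactions))
```

Algorithms such as Apriori avoid scoring every possible itemset by only extending itemsets whose support is already above a threshold.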
Bagging and Boosting
Bagging is used to train Random Forests, while boosting is used by methods such as AdaBoost and gradient boosted trees.
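Here is a minimal sketch of the bagging half: train several weak models on bootstrap samples and combine them by majority vote. The weak learner (a one-threshold stump that assumes class 1 has the larger feature values), the data and all names are assumptions made for illustration; boosting is not shown.

```python
import random
from collections import Counter


def train_stump(sample):
    """Weak learner: threshold halfway between the two class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:      # degenerate bootstrap sample: constant predictor
        constant = 1 if ones else 0
        return lambda x: constant
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: int(x > threshold)


def bagging(data, n_models=25, seed=0):
    """Train each model on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]
        models.append(train_stump(sample))
    return models


def vote(models, x):
    """Aggregate the individual predictions by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


# Hypothetical 1-D data: (feature, class) pairs.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (7, 1), (8, 1), (9, 1), (10, 1)]
models = bagging(data)
print(vote(models, 1.5), vote(models, 9.5))
```

A Random Forest follows the same pattern with decision trees as the base models, and additionally samples a random subset of features at each split.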
Graph Related Algorithms
 PageRank
 HITS Algorithm
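As an illustration of the first item, PageRank can be computed by power iteration. The sketch below assumes the graph is a dict mapping each page to the pages it links to, that every linked page also appears as a key, and a damping factor of 0.85; the example graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration on a dict mapping each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical three-page web: A and B both link to C, C links back to A.
graph = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
print(ranks)
```

Page C ends up with the highest rank because two pages link to it, and B the lowest because nothing links to it.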
General Algorithms
There are other algorithms that you should be familiar with. They are general algorithms in Computer Science.
 Random Sampling
 Reservoir Sampling
 Weighted Sampling
 BitMap
 Bloom Filtering
 Trie
 AC (Aho-Corasick) Algorithm
 Depth-First Search
 Breadth-First Search
 Shortest Path
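Of the items above, reservoir sampling is a frequent interview exercise: pick k items uniformly at random from a stream whose length is not known in advance. Here is a minimal sketch of the classic approach; the function name and the seed parameter are choices made for this example.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)           # inclusive on both ends
            if j < k:
                reservoir[j] = item         # replace with probability k / (i + 1)
    return reservoir


print(reservoir_sample(range(1000), 5, seed=1))
```

The stream is read once and only k items are kept in memory, which is why the technique suits very large datasets.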
Machine Learning Tools
 Do you have any research experience in machine learning or a related field?
 What tools and environments have you used to train and assess models?
 Do you have experience with Spark ML or another platform for building machine learning models using very large datasets?
These books are very helpful to master machine learning skills:

Abhiram Sharma