The chi-square test is a popular method for feature selection. In this post, I describe how to use Spark to implement the chi-square test for feature selection.

Implement Chi Square test for feature selection using Spark

We have described how to use the chi-square test for feature selection. Here, I use an example to show how to implement the chi-square test with Spark, which makes the algorithm scalable to very large datasets.
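For reference, here is the statistic being computed. This assumes each feature is treated as binary (present or absent) and that counts are weighted by the instance weight. For a feature $f$, build the contingency table of feature occurrence against the class label. Let $O_{ic}$ be the total weight of instances with class $c$ in which $f$ is present ($i=1$) or absent ($i=0$), $R_i$ the row totals, $K_c$ the class totals, and $N$ the total weight:

$$
\chi^2(f) = \sum_{i \in \{0,1\}} \sum_{c} \frac{(O_{ic} - E_{ic})^2}{E_{ic}}, \qquad E_{ic} = \frac{R_i \, K_c}{N}
$$

A larger $\chi^2(f)$ indicates stronger dependence between the feature and the label, so features are ranked by this score and the top ones are kept.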

Given the following data in VW (Vowpal Wabbit) format:

label instanceWeight | feature1 feature2 feature3 feature4 …
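The core of the computation is counting, for every feature, how much instance weight falls into each cell of its contingency table with the label. Below is a minimal sketch in plain Python, not the post's Spark code. It makes several assumptions: every line carries both a label and an instance weight, there is a single unnamed namespace, and features are binary and given by name only. `parse_vw` and `chi_square_scores` are hypothetical names. The counting loop mirrors what a Spark job would express as a `flatMap` emitting `((feature, label), weight)` pairs followed by `reduceByKey`. Here it runs locally so the arithmetic can be checked without a cluster.

```python
from collections import defaultdict


def parse_vw(line):
    """Parse 'label weight | f1 f2 ...' into (label, weight, feature set).

    Assumption: label and instance weight are both present, there is a
    single unnamed namespace, and features are binary (name only).
    """
    head, _, body = line.partition("|")
    label, weight = head.split()
    return label, float(weight), set(body.split())


def chi_square_scores(lines):
    """Return {feature: chi-square statistic of feature presence vs. label}."""
    records = [parse_vw(line) for line in lines if line.strip()]

    # Column totals K_c: total instance weight per label; n is the grand total N.
    label_totals = defaultdict(float)
    for label, weight, _ in records:
        label_totals[label] += weight
    n = sum(label_totals.values())

    # "flatMap" each record to ((feature, label), weight) pairs and
    # "reduceByKey" by summing: weighted count of feature present per label.
    present = defaultdict(float)
    for label, weight, features in records:
        for f in features:
            present[(f, label)] += weight

    scores = {}
    for f in {f for f, _ in present}:
        # Row totals R_1 / R_0: weight of instances with / without the feature.
        row_present = sum(present.get((f, y), 0.0) for y in label_totals)
        row_absent = n - row_present
        chi2 = 0.0
        for y, col in label_totals.items():
            obs_present = present.get((f, y), 0.0)
            cells = ((obs_present, row_present), (col - obs_present, row_absent))
            for observed, row in cells:
                expected = row * col / n
                if expected > 0:  # skip cells in an empty row or column
                    chi2 += (observed - expected) ** 2 / expected
        scores[f] = chi2
    return scores
```

For example, with the lines `1 1.0 | a b`, `1 1.0 | a`, `-1 1.0 | b` and `-1 1.0 | c`, feature `a` scores 4.0 because it separates the two labels perfectly. Feature `b` scores 0.0 because it occurs equally often under both labels. Only the per-feature counts depend on the full dataset, which is why this step is the part worth distributing.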

