Lesson 8: Machine Learning - Classification, Dimensionality Reduction

You already started doing Machine Learning when you performed the linear regression analysis. Let’s first talk about Machine Learning in general; then, as promised earlier, we’ll move to our larger dataset.

Machine Learning (learn from data and make decisions)

  • Supervised Learning (predictive model)
  • Unsupervised Learning (non-predictive model)
  • Dimensionality Reduction

Supervised Learning:

Use a training set of inputs with known correct outputs to fit a model, then predict the outputs for test-data inputs.

Classification:

  • Inputs (X): Features
  • Outputs (y): Binary or multiple classes

Regression:

  • Inputs (X): Independent variable
  • Outputs (y): Dependent variable (continuous)
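
To make the classification case concrete, here is a minimal sketch using only NumPy. It is a nearest-centroid classifier, chosen for brevity rather than because the Kaggle Notebook uses it; the data values and the helper names `fit_centroids` and `predict` are made up for illustration.

```python
import numpy as np

# Toy training set (made-up numbers): 2 features per sample, binary labels.
X_train = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
                    [3.0, 3.2], [3.1, 2.9], [2.9, 3.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def fit_centroids(X, y):
    """Return the class labels and the mean feature vector of each class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids):
    """Assign each sample to the class whose centroid is closest."""
    # distances has shape (n_samples, n_classes)
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[distances.argmin(axis=1)]

# "Training": learn one centroid per class from the labeled data.
classes, centroids = fit_centroids(X_train, y_train)

# "Testing": predict labels for inputs the model has not seen.
X_test = np.array([[0.9, 1.1], [3.2, 3.1]])
y_pred = predict(X_test, classes, centroids)
print(y_pred)  # [0 1]
```

The same train-then-predict pattern applies to regression; the only difference is that y is a continuous value instead of a class label.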

Unsupervised Learning:

Find patterns among the inputs (features); the data has no labels.


  • Clustering: Find groups within the data (Example: Phylogenetic tree)
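
As a small preview of finding groups without labels, the sketch below runs a bare-bones k-means on made-up points. Note that k-means is a different technique from the hierarchical methods used to build phylogenetic trees; it is used here only because it is short. The function name `kmeans` and the choice to start from the first k samples are assumptions for illustration.

```python
import numpy as np

# Toy unlabeled data (made-up numbers): two visibly separate groups.
X = np.array([[1.0, 1.2], [3.0, 3.2], [0.8, 1.0],
              [3.1, 2.9], [1.1, 0.9], [2.9, 3.0]])

def kmeans(X, k, n_iter=10):
    """Very small k-means: start from the first k samples as centers."""
    centers = X[:k].copy()
    for _ in range(n_iter):
        # Step 1: assign every sample to its nearest center.
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 2: move each center to the mean of its assigned samples
        # (keep the old center if a cluster ends up empty).
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels, centers

labels, centers = kmeans(X, k=2)
print(labels)  # [0 1 0 1 0 1]
```

No y values are supplied anywhere: the group numbers come purely from how close the samples are to each other.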

Dimensionality Reduction:

  • Find a lower-dimensional representation of higher-dimensional data
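
One common way to do this is principal component analysis (PCA). The sketch below implements PCA with NumPy's singular value decomposition on made-up data; it is an illustration of the idea, not necessarily the method used in the Kaggle Notebook, and the function name `pca` is hypothetical.

```python
import numpy as np

# Toy data (made-up numbers): 5 samples, 3 features.
X = np.array([[2.0, 1.0, 3.1],
              [1.0, 0.5, 1.4],
              [3.0, 2.0, 5.2],
              [4.0, 1.5, 5.4],
              [0.5, 0.2, 0.8]])

def pca(X, n_components):
    """Project X onto its first n_components principal components."""
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of each sample in the lower-dimensional space.
    X_reduced = X_centered @ components.T
    # Fraction of total variance captured by each kept component.
    explained_ratio = (S ** 2 / np.sum(S ** 2))[:n_components]
    return X_reduced, components, explained_ratio

X_reduced, components, explained_ratio = pca(X, n_components=2)
print(X_reduced.shape)   # (5, 2)
print(explained_ratio)
```

Each row of `components` holds one weight per original feature, so features with large absolute weights contribute most to that component.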

We built a linear regression model in the last lesson. The Classification and Dimensionality Reduction component of this ML lesson is currently available as a Kaggle Notebook, which classifies tumors as AML or ALL and finds the top genes contributing to the classification.

Next, we will explore Clustering.