What is Dimensionality Reduction?


Many Machine Learning problems involve hundreds or even thousands of features. Having such a large number of features poses certain problems.

This problem is sometimes known as the Curse of Dimensionality. Dimensionality reduction (or dimension reduction) is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.

In other words, the goal is to take something that is very high-dimensional and reduce it to something that is easier to work with, without losing much of the information.

Why Dimensionality Reduction?

  • We live in a time when devices are increasingly connected and equipped with more sensors and measuring technologies that control different actions. As a result, the number of features we have to analyze keeps growing and becomes harder to interpret.
  • These techniques reduce the amount of information we need to store, which significantly lowers storage costs.
  • High-dimensional data is difficult to train on; it requires more computational power and time.
  • Most datasets contain a large amount of redundant data: columns with a single value, or with a variance so small that they cannot provide the information the model needs to learn. Dimensionality reduction helps us filter out this unnecessary information.
  • Another important factor is the human eye. We do not have the same capabilities as a machine, so the data has to be adapted to be understood through our senses. These algorithms make it easier to plot the data distribution in two or three dimensions.
  • Multicollinearity. Detecting redundant information is important so that the unnecessary part can be removed. It often happens that the same variable is represented in different units of measure (for example, m and cm). Such strongly correlated variables do not help model efficiency or model learning (see the sketch after this list).
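As a rough illustration of the last two points, the sketch below drops near-constant columns and then one column out of every pair that is almost perfectly correlated. The column names and threshold values are invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Toy feature table (names and values are invented for the example).
df = pd.DataFrame({
    "length_m": [1.0, 2.0, 3.0, 4.0],
    "length_cm": [100.0, 200.0, 300.0, 400.0],  # same information in other units
    "constant": [7.0, 7.0, 7.0, 7.0],           # zero variance, no information
    "weight_kg": [10.0, 12.0, 9.0, 15.0],
})

# 1) Drop columns whose variance is (almost) zero.
selector = VarianceThreshold(threshold=1e-8).fit(df)
low_variance = df.columns[~selector.get_support()]

# 2) Drop one column from every pair with near-perfect correlation (multicollinearity).
corr = df.drop(columns=low_variance).corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
redundant = [c for c in upper.columns if (upper[c] > 0.95).any()]

reduced = df.drop(columns=list(low_variance) + redundant)
print(list(reduced.columns))  # expected here: ['length_m', 'weight_kg']
```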

Real Dimension vs Apparent Dimension

  • The real dimension of the data is generally not equal to the apparent dimension of our dataset.
  • The real dimension is given by the degrees of freedom that remain once the restrictions among the variables are taken into account (see the sketch below).
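A minimal sketch of that difference, with invented numbers: three observed features, but only two degrees of freedom, because the third feature is fully determined by the other two.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
a = rng.normal(size=500)                    # first degree of freedom
b = rng.normal(size=500)                    # second degree of freedom
X = np.column_stack([a, b, 2 * a - 3 * b])  # apparent dimension: 3 features

ratios = PCA().fit(X).explained_variance_ratio_
print(ratios.round(3))   # last value is ~0: the real dimension of this data is 2
```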

Projection vs Manifold Learning

Projection: This technique projects every high-dimensional data point onto a suitable lower-dimensional subspace, in a way that approximately preserves the distances between the points.

For instance, in the figure below, the points in 3D are projected onto a 2D plane. This plane is a lower-dimensional (2D) subspace of the high-dimensional (3D) space, and its axes correspond to new features z1 and z2 (the coordinates of the projections on the plane).

[Figure: points in 3D projected onto a 2D plane whose axes are the new features z1 and z2]
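A minimal sketch of such a projection in code. The plane, its basis vectors w1 and w2, and the data are all invented for the illustration; z1 and z2 are simply the coordinates of each point in that basis.

```python
import numpy as np

rng = np.random.default_rng(1)
X3d = rng.normal(size=(200, 3))            # synthetic points in 3D

# Two orthonormal vectors spanning the projection plane (chosen arbitrarily here).
w1 = np.array([1.0, 0.0, 1.0]) / np.sqrt(2)
w2 = np.array([0.0, 1.0, 0.0])
W = np.column_stack([w1, w2])              # shape (3, 2)

Z = X3d @ W                                # each row holds (z1, z2) for one point
print(X3d.shape, "->", Z.shape)            # (200, 3) -> (200, 2)
```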

Manifold Learning: This is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many datasets is only artificially high.

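A minimal sketch of manifold learning with scikit-learn. The swiss-roll dataset is 3D data lying on a rolled-up 2D surface; the algorithm choice (locally linear embedding) and its parameters are only illustrative.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# 3D points that actually lie on a 2D surface rolled up in space.
X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

# "Unroll" the manifold into 2 dimensions.
embedding = LocallyLinearEmbedding(n_components=2, n_neighbors=12)
X_unrolled = embedding.fit_transform(X)
print(X.shape, "->", X_unrolled.shape)   # (1000, 3) -> (1000, 2)
```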

Linear vs Nonlinear

Linear subspaces may be inefficient for some datasets. If the data is embedded on a manifold, we should capture its structure by unfolding it, as the sketch below illustrates.

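A minimal sketch of that contrast on two concentric circles, where no one-dimensional linear subspace can keep the two rings apart. The dataset and the kernel settings (RBF, gamma=10) are illustrative choices, not prescriptions.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: the interesting structure (the radius) is nonlinear.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_linear = PCA(n_components=1).fit_transform(X)                                 # linear projection
X_kernel = KernelPCA(n_components=1, kernel="rbf", gamma=10).fit_transform(X)   # nonlinear "unfolding"

# Compare how far apart the two rings end up along each 1D representation.
for name, Z in [("linear", X_linear), ("kernel", X_kernel)]:
    print(name, abs(Z[y == 0].mean() - Z[y == 1].mean()))
```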

PCA – Principal Component Analysis

The idea behind PCA is very simple:

  • Identify the hyperplane that lies closest to the data.
  • Project the data onto that hyperplane (see the sketch below).
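A minimal sketch of these two steps with scikit-learn, on synthetic data that is 5-dimensional on the surface but essentially 2-dimensional underneath (all names and sizes here are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 5))   # 5 features, ~2 real dimensions

pca = PCA(n_components=2)
pca.fit(X)                 # step 1: find the 2D hyperplane (spanned by pca.components_) closest to the data
X2d = pca.transform(X)     # step 2: project the data onto that hyperplane
print(X.shape, "->", X2d.shape)   # (300, 5) -> (300, 2)
```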

[Figures: variance and variance visualization]

Variance maximization


PCA is a variance maximizer. It projects the original data onto the directions where variance is maximum.
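A standard way to write this objective is: the first principal direction w1 is the unit vector that maximizes the variance of the projected data,

```latex
w_1 \;=\; \arg\max_{\lVert w \rVert = 1} \operatorname{Var}(Xw)
    \;=\; \arg\max_{\lVert w \rVert = 1} w^{\top} S\, w,
\qquad
S \;=\; \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top},
```

where S is the sample covariance matrix of the data. The maximizer is the eigenvector of S with the largest eigenvalue, and each subsequent principal direction maximizes the same quantity subject to being orthogonal to the ones already found.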


In this technique, the variables are transformed into a new set of variables, which are linear combinations of the original variables. This new set of variables is known as the principal components.

They are obtained in such a way that the first principal component accounts for as much of the variation in the original data as possible, and each succeeding component has the highest possible variance while being orthogonal to the preceding ones.
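A minimal sketch of this ordering, using scikit-learn's iris dataset purely as an example:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                    # 4 original features
pca = PCA().fit(X)

# Each principal component is a linear combination of the original features ...
print(pca.components_.shape)            # (4, 4): one row of weights per component

# ... and the explained variance decreases from the first component onwards.
print(pca.explained_variance_ratio_.round(3))   # roughly [0.925, 0.053, 0.017, 0.005]
```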


Principal Component

The axis that explains the maximum amount of variance in the training set is called the first principal component.

The axis orthogonal to this axis is called the second principal component.

Thus, in 2D there will be 2 principal components. For higher dimensions, PCA would find a third component orthogonal to the first two, and so on, up to as many components as there are dimensions.
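A quick check of that orthogonality (the random data here is irrelevant; only the geometry of the components matters):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(100, 3))
components = PCA().fit(X).components_    # one unit-length direction per row

# Pairwise dot products: ~1 on the diagonal (unit length), ~0 elsewhere (orthogonal).
print(np.round(components @ components.T, 6))
```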
