# DIMENSIONALITY REDUCTION

Michael Staniek - February 21st, 2019


In this blog post we want to learn how to do dimensionality reduction on datasets.

This can be used to visualise word embeddings or other data with more than two or three dimensions.

## Dimensionality reduction for 2d and 3d visualisation

For this, two algorithms in particular, t-SNE and PCA, are easy to use because they are already implemented in sklearn.

First, a dataset has to be loaded; in this case, let's use a simple dataset from sklearn:

In [1]:
%matplotlib notebook

from sklearn.datasets import load_digits

# load the 8x8 handwritten-digit dataset
digits = load_digits()

print("Data:")
print(digits.data)
print("Maximum Value:")
print(digits.data.max())
print("Normalized Data:")
print(digits.data / digits.data.max())

# scale pixel values into [0, 1] and keep the first 500 samples
data = (digits.data / digits.data.max())[:500]
labels = digits.target[:500]

print(labels)

Data:
[[  0.   0.   5. ...,   0.   0.   0.]
[  0.   0.   0. ...,  10.   0.   0.]
[  0.   0.   0. ...,  16.   9.   0.]
...,
[  0.   0.   1. ...,   6.   0.   0.]
[  0.   0.   2. ...,  12.   0.   0.]
[  0.   0.  10. ...,  12.   1.   0.]]
Maximum Value:
16.0
Normalized Data:
[[ 0.      0.      0.3125 ...,  0.      0.      0.    ]
[ 0.      0.      0.     ...,  0.625   0.      0.    ]
[ 0.      0.      0.     ...,  1.      0.5625  0.    ]
...,
[ 0.      0.      0.0625 ...,  0.375   0.      0.    ]
[ 0.      0.      0.125  ...,  0.75    0.      0.    ]
[ 0.      0.      0.625  ...,  0.75    0.0625  0.    ]]
[0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 9 5 5 6 5 0
9 8 9 8 4 1 7 7 3 5 1 0 0 2 2 7 8 2 0 1 2 6 3 3 7 3 3 4 6 6 6 4 9 1 5 0 9
5 2 8 2 0 0 1 7 6 3 2 1 7 4 6 3 1 3 9 1 7 6 8 4 3 1 4 0 5 3 6 9 6 1 7 5 4
4 7 2 8 2 2 5 7 9 5 4 8 8 4 9 0 8 9 8 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
8 9 0 1 2 3 4 5 6 7 8 9 0 9 5 5 6 5 0 9 8 9 8 4 1 7 7 3 5 1 0 0 2 2 7 8 2
0 1 2 6 3 3 7 3 3 4 6 6 6 4 9 1 5 0 9 5 2 8 2 0 0 1 7 6 3 2 1 7 3 1 3 9 1
7 6 8 4 3 1 4 0 5 3 6 9 6 1 7 5 4 4 7 2 8 2 2 5 5 4 8 8 4 9 0 8 9 8 0 1 2
3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 9 5 5 6 5 0 9 8 9
8 4 1 7 7 3 5 1 0 0 2 2 7 8 2 0 1 2 6 3 3 7 3 3 4 6 6 6 4 9 1 5 0 9 5 2 8
2 0 0 1 7 6 3 2 1 7 4 6 3 1 3 9 1 7 6 8 4 3 1 4 0 5 3 6 9 6 1 7 5 4 4 7 2
8 2 2 5 7 9 5 4 8 8 4 9 0 8 9 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
1 2 3 4 5 6 7 8 9 0 9 5 5 6 5 0 9 8 9 8 4 1 7 7 3 5 1 0 0 2 2 7 8 2 0 1 2
6 3 3 7 3 3 4 6 6 6 4 9 1 5 0 9 5 2 8 2 0 0 1 7 6 3 2 1 7 4 6 3 1 3 9 1 7
6 8 4 3 1 4 0 5 3 6 9 6 1 7 5 4 4 7 2]
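As a standalone sanity check, the normalisation above divides by the global maximum of the data (16.0, since the digit pixels take intensities from 0 to 16), which maps every value into [0, 1]:

```python
from sklearn.datasets import load_digits

digits = load_digits()

# pixel intensities run from 0 to 16, so dividing by the global
# maximum maps every value into the [0, 1] range
normalised = digits.data / digits.data.max()
print(normalised.min(), normalised.max())  # 0.0 1.0
```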


Now we need to do dimensionality reduction on the data:

In [2]:
from sklearn.decomposition import PCA  # also available, not used below
from sklearn.manifold import TSNE

# note: despite the variable names, t-SNE (not PCA) does the reduction here
twod_pca_data = TSNE(n_components=2, perplexity=100.0).fit_transform(data)
threed_pca_data = TSNE(n_components=3, perplexity=100.0).fit_transform(data)
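PCA is imported above but never called. As a sketch of what the same reduction would look like with PCA (which is linear and deterministic, and much faster than t-SNE on larger datasets), assuming the same normalised 500-sample slice of the digits data:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
data = (digits.data / digits.data.max())[:500]

# project the 64-dimensional vectors onto the top two principal components
pca = PCA(n_components=2)
twod = pca.fit_transform(data)
print(twod.shape)  # (500, 2)

# PCA also reports how much of the original variance the projection keeps
print(pca.explained_variance_ratio_.sum())
```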


### 2D visualisation

Now we try a simple visualisation of the two-dimensional data, where every class gets its own colour.

In [4]:
import matplotlib.pyplot as plt

for label in set(digits.target_names):
    data_for_label = twod_pca_data[labels == label]
    plt.scatter(data_for_label[:, 0], data_for_label[:, 1], label=str(label))
plt.legend()
plt.tight_layout()
plt.show()
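Outside a notebook there is no `%matplotlib notebook` magic, so the plot can be written to a file instead. A minimal sketch using the non-interactive Agg backend, with random stand-in points of the same shape as the t-SNE output (the filename `digits_2d.png` is an arbitrary choice):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, needs no display
import matplotlib.pyplot as plt
import numpy as np

# random stand-in data with the same shape as the reduced digits data
rng = np.random.default_rng(0)
points = rng.normal(size=(500, 2))
labels = rng.integers(0, 10, size=500)

for label in range(10):
    data_for_label = points[labels == label]
    plt.scatter(data_for_label[:, 0], data_for_label[:, 1], label=str(label))
plt.legend()
plt.tight_layout()
plt.savefig("digits_2d.png")  # hypothetical output filename
```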


### 3D visualisation

We can do the same, using the data that has been reduced to three dimensions, to generate a 3D plot.

This is a bit more complicated code-wise, but once you have done it, it's very easy to replicate.

In [5]:
import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(15, 10))
ax = fig.add_subplot(111, projection='3d')
for label in set(digits.target_names):
    data_for_label = threed_pca_data[labels == label]
    ax.scatter(data_for_label[:, 0], data_for_label[:, 1], data_for_label[:, 2], label=str(label), s=300)
plt.legend()
plt.tight_layout()
plt.show()
