Dimensionality reduction

From WikiEducator

Data are usually represented by vectors in a d-dimensional space, meaning that each point in this space (each sample) is described by d variables. Human beings are very good at spotting differences and similarities between things, thanks to the complexity of our visual system. One of the basic principles of Data Mining is "taking a look" at the data, but this is not simple, because computer screens and printers can only display 2D or 3D graphics. The d original variables must therefore be transformed into two or three components, which are then used to visualize the data. This process is called Multidimensional Scaling [1]. In fact, a 2D or 3D graphic can show more than two or three variables, because additional variables can be encoded in the color and shape used to draw each point.
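As a rough illustration of how d variables can be reduced to two components, the sketch below implements classical (Torgerson) MDS with NumPy. This is only one variant of multidimensional scaling, and it assumes Euclidean distances between samples; the function name `classical_mds` and the random input data are hypothetical, chosen for illustration.

```python
import numpy as np


def classical_mds(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Embed the rows of X (n samples x d variables) in k dimensions.

    Sketch of classical (Torgerson) MDS, assuming Euclidean distances.
    """
    n = X.shape[0]
    # Squared Euclidean distance between every pair of samples (n x n).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Double centering: B = -1/2 * J d2 J, where J = I - (1/n) * ones.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d2 @ j
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(b)
    top = np.argsort(eigvals)[::-1][:k]
    # Clip small negative eigenvalues caused by rounding before the square root.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))  # 100 samples, d = 5 variables (synthetic)
    Y = classical_mds(X, k=2)      # 2D coordinates, e.g. for a scatter plot
    print(Y.shape)                 # (100, 2)
```

Classical MDS is used here because it has a closed-form solution through a single eigendecomposition; iterative variants (metric or non-metric MDS that minimize a stress function) can accommodate other definitions of distance.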

The basic idea of multidimensional scaling is that distances between elements are preserved: elements that are close to each other in the original d-dimensional space also appear close in the 2D or 3D space. Of course, "distance" must be defined appropriately for the characteristics of the available data.
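To make "preserving distances" concrete, the following sketch measures how much a low-dimensional view distorts the pairwise distances of the original data. The distance is a parameter because its definition depends on the data; Euclidean and Manhattan distances are just two example choices. The function names and the stress formula (a normalized squared-error stress, one of several variants found in the MDS literature) are assumptions for illustration.

```python
import numpy as np


def pairwise_distances(X: np.ndarray, metric: str = "euclidean") -> np.ndarray:
    """n x n matrix of distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    if metric == "euclidean":
        return np.sqrt(np.sum(diff ** 2, axis=-1))
    if metric == "manhattan":
        return np.sum(np.abs(diff), axis=-1)
    raise ValueError(f"unsupported metric: {metric}")


def normalized_stress(d_high: np.ndarray, d_low: np.ndarray) -> float:
    """0 when every pairwise distance is preserved exactly; larger is worse."""
    iu = np.triu_indices_from(d_high, k=1)  # each unordered pair counted once
    num = np.sum((d_high[iu] - d_low[iu]) ** 2)
    den = np.sum(d_high[iu] ** 2)
    return float(np.sqrt(num / den))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))  # synthetic data, d = 3
    Y = X[:, :2]                  # naive 2D view: drop the third variable
    for metric in ("euclidean", "manhattan"):
        s = normalized_stress(pairwise_distances(X, metric),
                              pairwise_distances(Y, metric))
        print(metric, round(s, 3))
```

Dropping a variable serves only as a baseline here; a multidimensional scaling method would instead search for the 2D coordinates that make such a stress value as small as possible.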