Feature extraction is a special form of dimensionality reduction. The main goal is to represent each sample in the data set with a reduced set of derived variables that capture, according to some criterion, the most relevant information in the original data.
The most common tool for feature extraction is Principal Components Analysis (PCA). PCA is a linear transformation that obtains the directions (components) that best explain the variance of the original data. Each component is a linear combination of the original variables, and the number of useful components is usually lower than the number of variables in the original data set. A common rule (the Kaiser criterion) is to consider relevant only those components with an eigenvalue larger than 1, since after standardization each original variable contributes a variance of 1. For each component, its correlation with the original variables (the loadings) is computed in order to determine which variables have the strongest influence on it.
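The procedure above can be sketched with NumPy on a small synthetic data set (the data and variable pairing are hypothetical, chosen only to make the eigenvalue pattern visible): standardize the variables, eigendecompose the correlation matrix, keep components with eigenvalue larger than 1, and compute the loadings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 samples, 4 variables forming two correlated pairs
base1 = rng.normal(size=100)
base2 = rng.normal(size=100)
X = np.column_stack([
    base1 + 0.1 * rng.normal(size=100),
    base1 + 0.1 * rng.normal(size=100),
    base2 + 0.1 * rng.normal(size=100),
    base2 + 0.1 * rng.normal(size=100),
])

# Standardize so each variable contributes a variance of 1
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigendecomposition of the correlation matrix, sorted by decreasing eigenvalue
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Kaiser criterion: keep components with eigenvalue > 1
kept = eigvals > 1.0
print("eigenvalues:", np.round(eigvals, 2))
print("components kept:", kept.sum())

# Loadings: correlation of each component with the original variables
loadings = eigvecs * np.sqrt(eigvals)
```

With two correlated pairs of variables, two eigenvalues come out close to 2 and two close to 0, so the criterion retains two components; the loadings then show which pair of variables drives each retained component.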
Each component "explains" a share of the variance of the original data, and components are ordered by decreasing eigenvalue. The first two components can be used to generate a 2D plot of the data, although PCA does not pursue class separation, as Linear Discriminant Analysis (LDA) does. PCA can be used to reduce dimensionality to improve clustering performance, while LDA is a supervised technique used for classification.
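Projecting the data onto the first two components gives the coordinates for such a 2D plot. A minimal sketch using SVD-based PCA on hypothetical data (the mixing matrix is made up, only to give the variables unequal variance along different directions):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 50 samples, 3 variables with unequal spread
X = rng.normal(size=(50, 3)) @ np.array([[2.0, 0.5, 0.1],
                                         [0.0, 1.0, 0.3],
                                         [0.0, 0.0, 0.2]])

# Center the data; SVD of the centered matrix yields the components
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Scores on the first two components: the coordinates for a 2D scatter plot
scores2d = Xc @ Vt[:2].T
print(scores2d.shape)
```

The first column of `scores2d` has the largest variance, the second the next largest; plotting one against the other gives the usual 2D PCA view, without any use of class labels.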