From WikiEducator

BREAD: Basic REsources for Analyzing Data

Most research today combines qualitative and quantitative analysis to improve our understanding of the world we live in, collecting data from almost any real-life context. BREAD is meant to be a basic set of open educational resources for analyzing such data, with the aim of identifying the right tools and methodologies and thus avoiding the classic mistakes made when moving from a qualitative approach to a quantitative one. BREAD is what one should know (or, at least, be aware of) about Statistics [1] and Data Mining [2], including:

  • What are Qualitative and Quantitative analysis?

Qualitative analysis is oriented towards explaining the world through exploration; this means providing answers to "how" and "why" things happen. On the other hand, Quantitative analysis tries to provide answers to "what", "where" and "when", using measurable elements and pursuing conclusive results.

  • Which are the basic tools of Qualitative analysis [3]?
  • Which are the basic tools of Quantitative analysis [4]?

There is a whole branch of Mathematics dedicated to Quantitative analysis; in case you are not familiar with it, it is called Statistics [5]. Building on the development of computers and algorithms, Data Mining [6] is about extracting useful knowledge from (hopefully) huge amounts of data.

  • What is Data Mining with respect to Statistics?

According to Gregory Piatetsky-Shapiro [7], Statistics is at the core of Data Mining: it helps to distinguish between random noise and significant findings, and it provides a theory for estimating the probabilities of predictions. However, Data Mining is more than Statistics. It covers the entire process of data analysis, including data cleaning and preparation, visualization of the results, and the production of predictions in real time.
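As a rough illustration of that broader process, the following sketch chains a cleaning step, a simple statistical model and a prediction step. The data, the function names (`clean`, `fit_line`, `predict`) and the choice of a least-squares line are assumptions made for this example only; they are not taken from the cited source, and visualization is left out for brevity.

```python
from statistics import mean

# Hypothetical raw records: (hours_studied, exam_score).
# None marks a missing value that the cleaning step must handle.
raw = [(1.0, 52.0), (2.0, 55.0), (None, 61.0), (3.0, 61.0), (4.0, None), (5.0, 67.0)]


def clean(records):
    """Data cleaning: keep only records with no missing field."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def fit_line(pairs):
    """Modelling: ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Prediction: apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


cleaned = clean(raw)
model = fit_line(cleaned)
prediction = predict(model, 6.0)
print(f"{len(cleaned)} clean records, slope={model[0]:.3f}, "
      f"intercept={model[1]:.3f}, prediction for x=6: {prediction:.2f}")
```

The statistical part is confined to `fit_line`; the surrounding steps are what the paragraph above describes as going beyond Statistics.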

  • What are the basic statistical tools for analyzing data?

Statistics is about description and inference. Descriptive statistics [8] summarizes the basic features of the collected data. Statistical inference [9] uses the collected data to answer questions about the population it was drawn from, through hypothesis testing, estimation, correlation and regression.
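The difference between the two can be sketched with Python's standard `statistics` module. The sample values, the hypothesized mean `mu0` and the variable names are invented for this illustration; the critical value 2.262 is the tabulated two-sided 5% point of Student's t distribution with 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical sample of 10 measurements (invented for illustration).
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.9, 5.2, 5.7]

# Descriptive statistics: summarize the data at hand.
sample_mean = mean(sample)
sample_median = median(sample)
sample_sd = stdev(sample)  # sample standard deviation (n - 1 denominator)

# Statistical inference: one-sample t-test of H0 "population mean equals mu0".
mu0 = 5.0                                   # hypothesized mean (an assumption)
n = len(sample)
standard_error = sample_sd / sqrt(n)
t_statistic = (sample_mean - mu0) / standard_error

# Two-sided critical value for alpha = 0.05 and n - 1 = 9 degrees of freedom,
# taken from a standard t table.
T_CRITICAL = 2.262
reject_h0 = abs(t_statistic) > T_CRITICAL

print(f"mean={sample_mean:.2f} median={sample_median:.2f} sd={sample_sd:.3f}")
print(f"t={t_statistic:.2f} reject H0 at the 5% level: {reject_h0}")
```

The first group of values only describes the ten measurements, while the t statistic is used to make a claim about the population they were drawn from.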

  • Which statistical tests [10] are available for accepting or rejecting a hypothesis?
  • Which are the steps of the Data Mining process?

Basic bibliography

Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P. (1996) "From Data Mining to Knowledge Discovery in Databases", AI Magazine 17(3), pp. 37-54 [11]

For any comments or questions, please contact jminguillona<at>uoc<dot>edu