Data Mining vs OLAP
Data mining and OLAP are two common Business Intelligence (BI) technologies. Business intelligence refers to computer-based methods for identifying and extracting useful information from business data. Data mining is the field of computer science that deals with extracting interesting patterns from large data sets. It combines methods from artificial intelligence, statistics, and database management. OLAP (online analytical processing), as the name suggests, is a collection of techniques for querying multi-dimensional databases.
Data mining is also known as Knowledge Discovery in Databases (KDD). As mentioned above, it is a field of computer science that deals with extracting previously unknown and interesting information from raw data. Because of the exponential growth of data, especially in areas such as business, data mining has become a very important tool for converting this wealth of data into business intelligence, since manual extraction of patterns has become practically impossible over the past few decades. For example, it is currently used in applications such as social network analysis, fraud detection, and marketing. Data mining usually deals with the following four tasks: clustering, classification, regression, and association. Clustering is identifying groups of similar items in unstructured data. Classification is learning rules that can be applied to new data, and it typically includes the following steps: data preprocessing, model design, learning/feature selection, and evaluation/validation. Regression is finding functions that model the data with minimal error. Association is looking for relationships between variables. Data mining is usually used to answer questions such as: which products are most likely to bring Wal-Mart high profits next year?
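As a small illustration of the regression task described above, the following sketch fits a straight line to a handful of points by ordinary least squares. The data set (advertising spend versus sales) and the function name fit_line are hypothetical and chosen only for illustration; real data mining tools use far richer models.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (x) and resulting sales (y).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
print(f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f}")
```

The fitted line can then be applied to spend levels that were not in the original data, which is the sense in which regression "models" the data.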
OLAP is a class of systems that provide answers to multi-dimensional queries. Typically, OLAP is used for marketing, budgeting, forecasting, and similar applications. The databases used for OLAP are configured for complex, ad hoc queries with fast performance in mind. Typically, a matrix is used to display the output of an OLAP query. The rows and columns are formed by the dimensions of the query. OLAP systems often aggregate over multiple tables to obtain summaries. For example, OLAP can be used to answer questions such as: How do this year's sales at Wal-Mart compare with last year's? What is the prediction for sales in the next quarter? What can be said about the trend by looking at the percentage change?
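The matrix-style output described above can be sketched in a few lines. This example aggregates hypothetical sales rows along two dimensions (region and year) and then computes the year-over-year percentage change. The rows and the names pivot_sum and percent_change are assumptions made for illustration; an actual OLAP server would run such a query against a pre-built multi-dimensional store.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, year, sales amount).
facts = [
    ("East", 2022, 100.0),
    ("East", 2023, 120.0),
    ("West", 2022, 80.0),
    ("West", 2023, 60.0),
    ("East", 2023, 30.0),
]


def pivot_sum(rows):
    """Sum the sales amount for every (region, year) cell."""
    cube = defaultdict(float)
    for region, year, amount in rows:
        cube[(region, year)] += amount
    return dict(cube)


def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


cube = pivot_sum(facts)
regions = sorted({region for region, _ in cube})
years = sorted({year for _, year in cube})

# Print the matrix: one row per region, one column per year.
print("region", *years)
for region in regions:
    print(region, *(cube.get((region, year), 0.0) for year in years))

east_change = percent_change(cube[("East", 2022)], cube[("East", 2023)])
west_change = percent_change(cube[("West", 2022)], cube[("West", 2023)])
```

Here the dimensions of the query become the rows and columns, and every cell is a summary obtained by aggregation.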
Although data mining and OLAP are similar in that both operate on data to gain intelligence, the main difference lies in how they operate on it. OLAP tools provide multidimensional data analysis and summaries of the data, whereas data mining focuses on ratios, patterns, and influences in the data set. That is, OLAP deals with aggregation, which boils down to operating on data through “addition”, while data mining corresponds to “division”. Another notable difference is that data mining tools model data and return actionable rules, while OLAP performs comparison and contrast along business dimensions in real time.
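One way to picture the “addition” versus “division” contrast is with a toy set of shopping baskets. An OLAP-style summary adds up how many baskets contain an item, while a data-mining-style association rule divides one count by another to estimate how strongly one item implies another. The baskets and function names below are hypothetical, and confidence is only one of several measures used for association rules.

```python
# Hypothetical shopping baskets, each a set of purchased items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def count_containing(baskets, items):
    """Aggregation ("addition"): number of baskets containing all items."""
    return sum(1 for basket in baskets if items <= basket)


def confidence(baskets, antecedent, consequent):
    """Ratio ("division"): share of antecedent baskets that also hold the consequent."""
    base = count_containing(baskets, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the baskets")
    return count_containing(baskets, antecedent | consequent) / base


# OLAP-style summary: how many baskets contain bread?
bread_count = count_containing(baskets, {"bread"})

# Data-mining-style rule: how often does bread imply milk?
bread_to_milk = confidence(baskets, {"bread"}, {"milk"})

print(f"baskets with bread: {bread_count}")
print(f"confidence(bread -> milk): {bread_to_milk:.2f}")
```

A rule with high confidence is the kind of actionable output the paragraph above attributes to data mining, whereas the plain count is the kind of summary an OLAP query returns.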