Data Mining vs Data Warehousing
Data Mining and Data Warehousing are both powerful and popular techniques for analyzing data. Users who are inclined toward statistics tend to use Data Mining: they apply statistical models to look for hidden patterns in data. Data miners are interested in finding useful relationships between different data elements, relationships that can ultimately be profitable for businesses. Data experts who analyze the dimensions of the business directly, on the other hand, tend to use Data Warehouses.
Data mining is also known as Knowledge Discovery in Data (KDD). It is a field of computer science that deals with the extraction of previously unknown and interesting information from raw data. Due to the exponential growth of data, especially in areas such as business, data mining has become a very important tool for converting this wealth of data into business intelligence, as manual extraction of patterns has become practically impossible over the past few decades. For example, it is currently being used for applications such as social network analysis, fraud detection and marketing. Data mining usually deals with the following four tasks: clustering, classification, regression, and association. Clustering is identifying groups of similar items in unstructured data. Classification is learning rules that can be applied to new data, and it typically includes the following steps: preprocessing of data, model design, learning/feature selection, and evaluation/validation. Regression is finding functions that model the data with minimal error. Association is looking for relationships between variables. Data mining is usually used to answer questions such as: what are the main products that might help Wal-Mart obtain high profit next year?
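Two of the tasks named above, association and regression, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the baskets, the numbers, and the helper names `pair_support` and `fit_line` are invented for the example, and real data mining tools use far more elaborate methods.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets; the items are made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_support(baskets):
    """Association: fraction of baskets in which each pair of items co-occurs."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


support = pair_support(baskets)
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

Here `support` maps each item pair to the share of baskets containing both items, which is one simple way to look for relationships between variables, and `fit_line` returns the straight line with the smallest squared error for the given points.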
As mentioned above, Data Warehousing is also used for analyzing data, but by a different set of users and with a slightly different goal in mind. For example, in the retail sector, Data Warehousing users are more concerned with what kinds of purchases are popular among customers, so that the results of the analysis can be used to improve the customer experience. Data miners, by contrast, first conjecture a hypothesis, such as which customers buy a certain type of product, and then analyze the data to test it. As an example of Data Warehousing, consider a major retailer that initially stocks all of its stores with the same mix of product sizes and later finds that its New York stores sell smaller sizes much faster than its Chicago stores. Based on this result, the retailer can stock the New York stores with more of the smaller sizes than the Chicago stores.
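The retailer example can be sketched as a small aggregation over sales records, grouped by the dimensions of interest. This is a minimal sketch, not how a real warehouse is queried: the sales figures are invented, and the helper `units_by` is a hypothetical stand-in for the grouping a warehouse query would perform.

```python
from collections import defaultdict

# Hypothetical sales facts: (city, size, units_sold). The numbers are invented.
sales = [
    ("New York", "small", 120),
    ("New York", "large", 40),
    ("Chicago", "small", 50),
    ("Chicago", "large", 110),
    ("New York", "small", 80),
]


def units_by(sales, *dims):
    """Total units sold, grouped by the chosen dimensions ("city", "size")."""
    index = {"city": 0, "size": 1}
    totals = defaultdict(int)
    for row in sales:
        key = tuple(row[index[d]] for d in dims)
        totals[key] += row[2]
    return dict(totals)


# Group by city and size, then by city alone.
by_city_size = units_by(sales, "city", "size")
by_city = units_by(sales, "city")

# Share of each city's sales that came from small sizes.
small_share = {
    city: by_city_size.get((city, "small"), 0) / total
    for (city,), total in by_city.items()
}
```

With these made-up figures, `small_share` is higher for New York than for Chicago, which is the kind of result that would lead the retailer to shift small-size inventory toward the New York stores.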
So, as you can see, these two types of analysis appear at first glance to be of the same nature. Both are concerned with increasing profits based on historical data. But there are key differences. In simple terms, Data Mining and Data Warehousing furnish different types of analytics for different types of users. In other words, Data Mining looks for correlations and patterns to support a statistical hypothesis, whereas Data Warehousing answers a comparatively broader question and then slices and dices the data from there to identify ways to improve in the future.