KDD vs Data Mining
KDD (Knowledge Discovery in Databases) is a field of computer science that covers the tools and theories used to help humans extract useful and previously unknown information (i.e. knowledge) from large collections of digitized data. KDD consists of several steps, and Data Mining is one of them. Data Mining is the application of a specific algorithm to extract patterns from data. Nonetheless, the terms KDD and Data Mining are often used interchangeably.
What is KDD?
As mentioned above, KDD is a field of computer science that deals with the extraction of previously unknown and interesting information from raw data. KDD is the whole process of trying to make sense of data by developing appropriate methods or techniques. This process deals with mapping low-level data into other forms that are more compact, abstract and useful. This is achieved by creating short reports, modeling the process that generates the data, and developing predictive models that can predict future cases. Due to the rapid growth of data, especially in areas such as business, KDD has become a very important process for converting this large wealth of data into business intelligence, as manual extraction of patterns has become impractical over the past few decades. For example, it is currently used in applications such as social network analysis, fraud detection, science, investment, manufacturing, telecommunications, data cleaning, sports, information retrieval and, to a large extent, marketing. KDD is typically used to answer questions such as "Which products are likely to yield high profits at Wal-Mart next year?"
This process has several steps. It starts with developing an understanding of the application domain and the goal, and then creating a target dataset. This is followed by cleaning, preprocessing, reduction and projection of the data. The next step is using Data Mining (explained below) to identify patterns. Finally, the discovered knowledge is consolidated by visualizing and/or interpreting it.
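The steps above can be sketched as a small pipeline. The following is only an illustrative sketch in Python, not a standard implementation: the sales records, the field names and every function name (select_target, clean, project, mine, interpret, kdd_pipeline) are hypothetical, and the Data Mining step is reduced to a simple per-product total so that the overall shape of the process stays visible.

```python
from collections import defaultdict

# Hypothetical raw sales records (invented for illustration).
# None marks a missing value that the cleaning step must handle.
RAW_SALES = [
    {"product": "milk", "region": "north", "profit": 120.0},
    {"product": "milk", "region": "south", "profit": 80.0},
    {"product": "bread", "region": "north", "profit": None},
    {"product": "bread", "region": "south", "profit": 60.0},
    {"product": "tv", "region": "north", "profit": 400.0},
    {"product": "tv", "region": "east", "profit": 350.0},
]


def select_target(records, regions):
    """Create the target dataset: keep only records relevant to the goal."""
    return [r for r in records if r["region"] in regions]


def clean(records):
    """Cleaning and preprocessing: drop records with a missing profit."""
    return [r for r in records if r["profit"] is not None]


def project(records):
    """Reduction and projection: keep only the fields the mining step needs."""
    return [(r["product"], r["profit"]) for r in records]


def mine(pairs):
    """Data Mining step (assumed here to be a simple per-product total)."""
    totals = defaultdict(float)
    for product, profit in pairs:
        totals[product] += profit
    return dict(totals)


def interpret(totals, top_n=1):
    """Consolidation: turn the mined totals into a short ranked report."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


def kdd_pipeline(records, regions, top_n=1):
    """Run the steps in order; Data Mining is only one call among several."""
    target = select_target(records, regions)
    pairs = project(clean(target))
    return interpret(mine(pairs), top_n)


if __name__ == "__main__":
    # Which two products earned the most in the north and south regions?
    print(kdd_pipeline(RAW_SALES, {"north", "south"}, top_n=2))
    # prints [('tv', 400.0), ('milk', 200.0)]
```

In a real project each function would be far more involved, but the sketch shows that the mining call is a single stage surrounded by selection, cleaning, projection and interpretation.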
What is Data Mining?
As mentioned above, Data Mining is only one step within the overall KDD process. There are two major Data Mining goals, determined by the goal of the application: verification and discovery. Verification is checking the user’s hypothesis about the data, while discovery is automatically finding interesting patterns. There are four major Data Mining tasks: clustering, classification, regression, and association (summarization). Clustering is identifying groups of similar items in data that has no predefined groups. Classification is learning rules that assign new data to known classes. Regression is finding functions that model the data with minimal error. Association is looking for relationships between variables. Once the task is chosen, a specific Data Mining algorithm needs to be selected. Depending on the goal, algorithms such as linear regression, logistic regression, decision trees and Naïve Bayes can be selected. Patterns of interest are then searched for in one or more representational forms. Finally, the models are evaluated in terms of either predictive accuracy or understandability.
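Two of these tasks can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions rather than a reference implementation: the function names (fit_line, mean_squared_error, support, confidence) and all data are invented for illustration. Regression is shown as an ordinary least-squares fit of a straight line, evaluated by its mean squared error, and association is shown through the common support and confidence measures of a rule such as "bread implies butter".

```python
def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Evaluation by predictive accuracy: average squared prediction error."""
    errors = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def support(transactions, items):
    """Association: fraction of transactions that contain all given items."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Association: how often the consequent appears when the antecedent does."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


if __name__ == "__main__":
    # Hypothetical measurements that lie exactly on y = 2x + 1.
    xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
    slope, intercept = fit_line(xs, ys)
    print(slope, intercept, mean_squared_error(xs, ys, slope, intercept))

    # Hypothetical shopping baskets.
    baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread"}, {"milk"}]
    print(support(baskets, {"bread", "butter"}))
    print(confidence(baskets, {"bread"}, {"butter"}))
```

For the baskets above, bread and butter appear together in half of the transactions, and two of the three baskets containing bread also contain butter, so the rule has a confidence of about 0.67.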
What is the difference between KDD and Data Mining?
Although the two terms KDD and Data Mining are often used interchangeably, they refer to two related yet slightly different concepts. KDD is the overall process of extracting knowledge from data, while Data Mining is a step inside the KDD process that deals with identifying patterns in data. In other words, Data Mining is only the application of a specific algorithm, chosen according to the overall goal of the KDD process.