The main difference between data mining and data warehousing is that data mining is the process of identifying patterns from a huge amount of data while data warehousing is the process of integrating data from multiple data sources into a central location.
Data mining is the process of discovering patterns in large data sets. It uses techniques such as classification and regression to support business decisions. Data warehousing, on the other hand, is the process of extracting, transforming and loading data from multiple data sources into a data warehouse. Data mining techniques can then be applied to the data warehouse to discover useful patterns.
Key Areas Covered
1. What is Data Mining
– Definition, Functionality
2. What is Data Warehousing
– Definition, Functionality
3. Difference Between Data Mining and Data Warehousing
– Comparison of Key Differences
Key Terms
Data Mining, Data Warehousing, Data
What is Data Mining
Data mining is the process of discovering patterns in a large dataset. In other words, data mining extracts new patterns and relationships among data entities. The mined patterns should be new, correct, and potentially useful.
The process of extracting useful information from data involves several steps. The first step is data selection. Data comes from multiple sources and in multiple formats. Therefore, all the data is integrated and stored in a single location called a data warehouse. The second step is preprocessing. It involves summarization, normalization and aggregation. These transformations make the data suitable for data mining. The third step is data mining. It uses techniques or algorithms such as clustering, regression and classification to extract patterns from the data. The fourth step is pattern evaluation. It checks the accuracy of the obtained output. The final step is to present the outcomes, for example as graphs.
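The five steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library. The records, the field names, and the mean-based split used as the "mining" step are all hypothetical, chosen to keep the example short, and do not come from any particular tool.

```python
from statistics import mean

# Hypothetical records from two sources with different formats.
source_a = [{"customer": "c1", "spend": "120.0"}, {"customer": "c2", "spend": "80.0"}]
source_b = [("c3", 400.0), ("c4", 95.0)]


def select(a_rows, b_rows):
    """Step 1: integrate both sources into one uniform list of (id, spend)."""
    unified = [(r["customer"], float(r["spend"])) for r in a_rows]
    unified.extend((cid, float(spend)) for cid, spend in b_rows)
    return unified


def preprocess(rows):
    """Step 2: min-max normalize spend into the range [0, 1]."""
    values = [spend for _, spend in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values match
    return [(cid, (spend - lo) / span) for cid, spend in rows]


def mine(rows):
    """Step 3: a toy pattern that splits customers into two groups around the mean."""
    centre = mean(spend for _, spend in rows)
    return {cid: ("high" if spend > centre else "low") for cid, spend in rows}


def evaluate(labels, expected):
    """Step 4: fraction of labels that match a hand-labelled reference."""
    hits = sum(1 for cid, label in labels.items() if expected.get(cid) == label)
    return hits / len(labels)


def present(labels):
    """Step 5: a plain-text bar per group stands in for a graph."""
    counts = {}
    for label in labels.values():
        counts[label] = counts.get(label, 0) + 1
    return [f"{label:<5}{'#' * n}" for label, n in sorted(counts.items())]


rows = preprocess(select(source_a, source_b))
labels = mine(rows)
accuracy = evaluate(labels, {"c1": "low", "c2": "low", "c3": "high", "c4": "low"})
report = present(labels)
```

In practice each step would be far richer (a real warehouse for selection, a proper algorithm for mining, charts for presentation), but the order of the stages stays the same.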
The main techniques used in data mining are anomaly detection, association rule mining, clustering, classification, and regression. Firstly, anomaly detection identifies unusual records that deviate from the normal variation in the data. Secondly, association rule mining finds interesting associations among variables. Thirdly, clustering groups records that are similar to each other, without using predefined classes. Fourthly, classification identifies the class to which an observation belongs. Finally, regression models the relationship among variables.
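Three of these techniques are small enough to illustrate with the Python standard library. The sketch below is deliberately simplified: anomaly detection is shown as a z-score test, association rule mining as the support and confidence of a single rule, and regression as a least squares line. The function names, the threshold, and the sample data are assumptions made for this example.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Anomaly detection: flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


def rule_metrics(transactions, antecedent, consequent):
    """Association rule mining: support and confidence of antecedent -> consequent."""
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    has_antecedent = sum(1 for t in transactions if antecedent in t)
    support = both / len(transactions)
    confidence = both / has_antecedent if has_antecedent else 0.0
    return support, confidence


def fit_line(xs, ys):
    """Regression: least squares fit of y = slope * x + intercept.

    Assumes the x values are not all equal.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical sample data.
daily_sales = [10, 11, 9, 10, 12, 10, 11, 50]
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "eggs"}, {"milk"}]

outliers = find_anomalies(daily_sales)
support, confidence = rule_metrics(baskets, "bread", "milk")
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

Here the value 50 is flagged as an anomaly, the rule "bread implies milk" holds in two of the three baskets that contain bread, and the fitted line recovers y = 2x + 1. Clustering and classification usually rely on a dedicated library, so they are left out of this sketch.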
What is Data Warehousing
In a business organization, data resides in various databases. First, data from multiple sources is extracted and transformed. Then, it is loaded into a central location called a data warehouse. Data warehousing is this process of loading data from various data sources into a data warehouse. Various strategies can then be applied to analyze the data and help end users make business decisions. Moreover, the data in the data warehouse can be divided into data marts. Each data mart holds data for a particular set of users. For example, the human resources department can use its own data mart, the sales department can use the sales mart, and so on.
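The extract, transform and load sequence, together with data marts, can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions: the two source systems, the table name employee_fact, and the mart names are hypothetical, and a database view is used as a simple stand-in for a data mart, which in a real deployment is often a separate store.

```python
import sqlite3

# Hypothetical source systems with different formats.
hr_source = [{"emp": "Ana", "dept": "HR", "salary_usd": "52000"}]
sales_source = [("Ben", "SALES", 61000.0), ("Cy", "sales", 58000.0)]


def extract():
    """Extract: pull raw rows from each source as-is."""
    return list(hr_source), list(sales_source)


def transform(hr_rows, sales_rows):
    """Transform: map both formats to one (name, department, salary) shape."""
    unified = [(r["emp"], r["dept"].upper(), float(r["salary_usd"])) for r in hr_rows]
    unified += [(name, dept.upper(), float(pay)) for name, dept, pay in sales_rows]
    return unified


def load(conn, rows):
    """Load: write the unified rows into the central warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS employee_fact "
        "(name TEXT, department TEXT, salary REAL)"
    )
    conn.executemany("INSERT INTO employee_fact VALUES (?, ?, ?)", rows)
    conn.commit()


def create_mart(conn, mart_name, department):
    """Expose one department's slice of the warehouse as a view (a simple data mart)."""
    # SQLite does not allow bound parameters inside a view definition, so the
    # values are interpolated. Only pass trusted, hard-coded names here.
    conn.execute(
        f"CREATE VIEW {mart_name} AS "
        f"SELECT name, salary FROM employee_fact WHERE department = '{department}'"
    )


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(*extract()))
create_mart(warehouse, "sales_mart", "SALES")
create_mart(warehouse, "hr_mart", "HR")

sales_rows = warehouse.execute("SELECT name, salary FROM sales_mart ORDER BY name").fetchall()
hr_rows = warehouse.execute("SELECT name, salary FROM hr_mart ORDER BY name").fetchall()
```

Each department queries only its own mart, while the warehouse table remains the single integrated copy of the data.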
Data warehouses are subject oriented, integrated, time variant and nonvolatile. A data warehouse is subject oriented because it gives knowledge about a subject rather than about the ongoing operations. It is integrated because it consolidates data from various data sources. It is time variant because the warehouse data provides information with respect to a specific time period. Finally, it is nonvolatile because, once data is loaded into the warehouse, it should not be deleted or updated. In brief, data warehousing supports decision making in the organization.
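The time variant and nonvolatile properties can be made concrete with a small sqlite3 sketch. It assumes a hypothetical sales_fact table in which every row carries a load_date, and it uses SQLite triggers to reject updates and deletes. This is one possible way to enforce the rule, not how any specific warehouse product does it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_fact (
        product   TEXT NOT NULL,
        amount    REAL NOT NULL,
        load_date TEXT NOT NULL   -- time variant: every row records its period
    );

    -- Nonvolatile: reject changes to rows that are already loaded.
    CREATE TRIGGER sales_fact_no_update BEFORE UPDATE ON sales_fact
    BEGIN
        SELECT RAISE(ABORT, 'warehouse rows are read-only');
    END;

    CREATE TRIGGER sales_fact_no_delete BEFORE DELETE ON sales_fact
    BEGIN
        SELECT RAISE(ABORT, 'warehouse rows are read-only');
    END;
    """
)

# Hypothetical monthly loads: new periods are appended, old ones are kept.
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [("widget", 100.0, "2024-01-31"), ("widget", 130.0, "2024-02-29")],
)


def try_statement(sql):
    """Run a statement and report whether the warehouse accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False


update_allowed = try_statement("UPDATE sales_fact SET amount = 0 WHERE product = 'widget'")
delete_allowed = try_statement("DELETE FROM sales_fact WHERE product = 'widget'")
history = conn.execute(
    "SELECT load_date, amount FROM sales_fact ORDER BY load_date"
).fetchall()
```

Both the update and the delete are rejected, so the history for each period stays intact and can be compared across time.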
Difference Between Data Mining and Data Warehousing
Definition
Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data warehousing is the process of extracting, transforming and loading data from multiple data sources to a central location called a data warehouse.
Process
In data mining, data is analyzed regularly. In data warehousing, data is loaded into the warehouse periodically.
Data
Data mining typically analyzes a selected sample of the data, while data warehousing stores a huge amount of data.
Usage
Data mining discovers patterns in data for better decision making. On the other hand, data warehousing provides a mechanism for an organization to store a huge amount of data.
Conclusion
The difference between data mining and data warehousing is that data mining is the process of identifying patterns from a huge amount of data while data warehousing is the process of integrating data from multiple data sources into a central location. Usually, engineers perform data warehousing, and business users perform data mining with the help of engineers.