Have you ever heard the term data mining? Maybe you have also heard of data science. So what is data mining, what are its functions and methods, and what are the stages of the process? All of this is discussed in more detail here.
Understanding Data Mining
Data mining is the process of extracting important information from large data sets. It often relies on statistical and mathematical methods, as well as artificial intelligence techniques.
Alternative names include knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, and others.
As the picture of the KDD process shows, data mining relies on many concepts and techniques, and it takes several steps to arrive at the desired knowledge.
The KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation.
Data Mining Function
Data mining has many functions. The two main ones are the descriptive function and the predictive function. The other functions are discussed below.
The descriptive function helps you understand the observed data better. By analyzing the data, you learn how it behaves, and that knowledge can then be used to determine the characteristics of the data in question.
With the descriptive function you can find patterns hidden in the data. If a pattern is repeated and valuable, it reveals a characteristic of the data.
The predictive function finds patterns in the data that can be used to make predictions. These patterns are identified from the various variables in the data.
That is why it is called the predictive function: it performs predictive analysis. It can also be used to predict the value of a variable that is missing from the data.
This makes the predictive function useful for anyone who needs reliable predictions to support important decisions.
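As a minimal sketch of the predictive function, the Python below fits a straight line to known pairs of values by ordinary least squares and then estimates a value that is not in the data. The function names (`fit_line`, `predict`) and the sample figures are assumptions made for illustration; real projects would normally use a statistics or machine learning library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Estimate y for a value of x that may not appear in the data."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: advertising spend (x) and units sold (y).
spend = [1.0, 2.0, 3.0, 4.0]
sold = [3.0, 5.0, 7.0, 9.0]
model = fit_line(spend, sold)
print(predict(model, 5.0))  # 11.0
```

The pattern found here is the relationship between the two variables; the prediction simply extends that pattern to an unseen value.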
Other data mining functions are: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
- Multidimensional concept description: characterization and discrimination. Generalizes, summarizes, and contrasts data characteristics.
- Frequent patterns, association, correlation
- Classification and prediction: builds a model (function) that describes and distinguishes classes or concepts for future prediction. For example, classifying countries by climate, or classifying cars by gas mileage.
- Cluster analysis: groups data to form new classes, by maximizing intra-class similarity and minimizing inter-class similarity.
- Outlier analysis: an outlier is a data object that does not conform to the general behavior of the data. Useful in fraud detection and the analysis of rare events.
- Trend and evolution analysis: trends and deviations (e.g., regression analysis), sequential pattern mining (e.g., digital cameras), periodicity analysis, and similarity-based analysis.
- Other pattern-directed or statistical analyses
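To make cluster analysis more concrete, here is a small sketch of k-means (Lloyd's algorithm) on one-dimensional data. It is only one of many clustering methods, and the function name `k_means_1d`, the starting centers, and the mileage figures are all assumptions chosen for illustration.

```python
def k_means_1d(values, centers, iterations=10):
    """Lloyd's algorithm on one-dimensional data.

    `centers` holds the initial guesses; one cluster is built per center.
    """
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, clusters


# Hypothetical gas mileage figures (miles per gallon) for eight cars.
mpg = [14, 15, 16, 17, 30, 32, 33, 35]
centers, clusters = k_means_1d(mpg, centers=[14, 35])
print(centers)   # [15.5, 32.5]
print(clusters)  # [[14, 15, 16, 17], [30, 32, 33, 35]]
```

Values inside each group end up close to each other (high intra-class similarity) and far from the other group (low inter-class similarity).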
Data Mining Method
Collecting information requires methods, and these methods guide the process of finding useful data. A data mining project runs from the initial idea through to the final implementation.
1. Data retrieval process
How does the data retrieval process work? KDD, or knowledge discovery (mining) in databases, was introduced above. The data retrieval process follows the KDD stages.
These stages start with raw data and end with processed knowledge or information. The process is as follows:
- Data Cleansing: the process in which incomplete, erroneous, and inconsistent data is removed from the data collection.
- Data Integration: the process in which data from multiple sources is combined.
- Data Selection: the process of selecting the data relevant to the analysis and retrieving it from the existing data collection.
- Data Transformation: the process of transforming the selected data into a form suitable for mining, for example through data aggregation.
- Data Mining: the most important process, in which various techniques are applied to extract potential patterns and obtain useful information.
- Pattern Evaluation: the process in which the truly interesting patterns among those found are identified based on a given measure.
- Knowledge Presentation: the last stage of the process. Visualization techniques are used to help the user understand and interpret the results of data mining.
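The stages above can be sketched as a chain of small functions, one per stage. This is only an illustration under stated assumptions: the records, field names, function names, and the spending threshold are all hypothetical, and the "mining" step is deliberately trivial so that the flow from raw data to presented knowledge stays visible.

```python
from collections import defaultdict

# Hypothetical raw records from two sources (store and web sales).
store_sales = [
    {"customer": "ana", "amount": 120.0, "channel": "store"},
    {"customer": "ben", "amount": None, "channel": "store"},  # incomplete
    {"customer": "cho", "amount": -5.0, "channel": "store"},  # invalid
]
web_sales = [
    {"customer": "ana", "amount": 80.0, "channel": "web"},
    {"customer": "cho", "amount": 40.0, "channel": "web"},
]


def clean(records):
    """Data cleansing: drop records with missing or negative amounts."""
    return [r for r in records if r["amount"] is not None and r["amount"] >= 0]


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Data selection: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data transformation: aggregate the amount spent per customer."""
    totals = defaultdict(float)
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


def mine(totals):
    """Data mining: rank customers by total spend (a very simple pattern)."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def evaluate(patterns, min_total):
    """Pattern evaluation: keep only patterns that pass the given measure."""
    return [(name, total) for name, total in patterns if total >= min_total]


def present(patterns):
    """Knowledge presentation: format the result for a human reader."""
    return [f"{name}: {total:.2f}" for name, total in patterns]


combined = integrate(clean(store_sales), clean(web_sales))
selected = select(combined, ["customer", "amount"])
totals = transform(selected)
report = present(evaluate(mine(totals), min_total=100.0))
print("\n".join(report))  # ana: 200.00
```

In practice each stage is far more involved, and the stages are often repeated as the results of one pass suggest changes to an earlier step.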
2. Techniques in the Data Mining Process
There are various techniques used in the data mining process. What are the techniques that can be used in the Data Mining process?
- Predictive modeling: covers two techniques, classification and value prediction.
- Database segmentation: partitions the database into a number of segments, or clusters, of similar records.
- Link analysis: a technique for establishing relationships between individual records, or sets of records, in a database.
- Deviation detection: a technique for identifying outliers, which express a deviation from a previously known expectation.
- Nearest neighbor: a technique that makes a prediction for a record based on the most similar records. It is one of the oldest techniques used in data mining.
- Clustering: a technique for grouping data based on the characteristics of each record.
- Decision tree: a next-generation technique in which the predictive model can be drawn as a tree. Each node in the tree represents a question that is used to classify the data.
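As an illustration of the nearest neighbor technique, the sketch below classifies a new record by majority vote among its k closest training records. The function name `knn_predict`, the choice of Euclidean distance, and the car data are assumptions for the example, not a prescribed implementation.

```python
from collections import Counter
import math


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `training` is a list of (features, label) pairs; distance is Euclidean.
    """
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical cars described by (weight in tonnes, engine size in litres).
cars = [
    ((1.0, 1.2), "high mileage"),
    ((1.1, 1.4), "high mileage"),
    ((1.2, 1.6), "high mileage"),
    ((2.0, 3.0), "low mileage"),
    ((2.2, 3.5), "low mileage"),
    ((2.4, 4.0), "low mileage"),
]
print(knn_predict(cars, (1.15, 1.5)))  # high mileage
print(knn_predict(cars, (2.1, 3.2)))   # low mileage
```

No model is built in advance: the training records themselves are the model, which keeps the technique simple but makes each prediction more expensive on large databases.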
Problems in Data Mining
Collecting information and mining data so that it will be useful in the future is not easy. Many problems can arise along the way. What are the main problems in data mining?
1. Mining Methodology
- Mining different types of knowledge from different data types
- Performance: efficiency, effectiveness and scalability
- Pattern evaluation: the interestingness problem
- Incorporating background knowledge
- Handling noise and incomplete data
- Parallel, distributed and incremental mining methods
- Integration of discovered knowledge with existing knowledge: knowledge fusion
2. User interaction
- Data mining query language and ad-hoc mining
- Expression and visualization of data mining results
- Interactive knowledge mining at multiple levels of abstraction
3. Applications and social impacts
- Domain-specific data mining and invisible data mining
- Protection of data security, integrity and privacy
Example of Data Mining Application
Data mining can be used in many sectors, including business, management, and finance. The following are examples of how data mining is applied in several of them:
1. Market Analysis and Management
In the marketing sector, data mining is usually used for target marketing, customer relationship management (CRM), market analysis, cross-selling, and market segmentation.
- Target marketing: for example, finding groups of “model” customers who share the same characteristics (interests, income level, spending habits, etc.), or determining customer purchasing patterns over time.
- Cross-market analysis: finding associations and correlations between product sales, and making predictions based on those associations.
- Customer profiling: which types of customers buy which products (clustering or classification).
- Customer needs analysis: for example, identifying the best products for different customer groups and predicting which factors will attract new customers.
- Provision of summary information: multidimensional summary reports and statistical summary information (central tendency and variation of the data).
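Associations between product sales are commonly measured with support (how often two products are bought together) and confidence (how often the second product appears when the first one does). The sketch below computes both for pairs of items. The function name `pair_rules`, the thresholds, and the baskets are hypothetical, and production systems typically use algorithms such as Apriori that handle larger item sets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) meaning "lhs implies rhs"."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Test the rule in both directions; confidence is not symmetric.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical shopping baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "milk"],
]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these baskets, the rule "milk -> butter" has a confidence of 1.00 because every basket that contains milk also contains butter, which is the kind of association a cross-selling campaign could build on.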
2. Corporate Analysis & Risk Management
In the corporate sector, data mining is usually used for forecasting, customer retention, better underwriting, quality control, and competitive analysis.
- Financial planning and asset evaluation: for example, cash flow analysis and prediction, contingent claim analysis for evaluating assets, and cross-sectional and time series analysis (financial ratios, trend analysis, etc.).
- Resource planning: for example, summarizing and comparing resources and spending.
- Competition: for example, monitoring competitors and market direction, grouping customers into classes with class-based pricing procedures, and setting pricing strategies in highly competitive markets.
3. Fraud Detection & Mining Unusual Patterns
Data mining is also used to find and detect fraud in a system. With data mining, you can screen millions of incoming transactions for unusual patterns.
- Approach: clustering and model construction for fraud, outlier analysis
- Applications: health care, retail, credit card services, and telecommunications. Examples include auto insurance, money laundering, health insurance, telecommunications (analyzing patterns that deviate from the expected norm), and the retail industry.
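One simple form of outlier analysis is to flag transactions that sit far from the typical value. The sketch below uses the modified z-score, which is based on the median and the median absolute deviation (MAD). The constant 0.6745 and the cut-off of 3.5 are a commonly used convention rather than a rule, and the function name `flag_outliers` and the amounts are assumptions for illustration.

```python
import statistics


def flag_outliers(amounts, threshold=3.5):
    """Flag values whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute deviation
    (MAD), which are less distorted by extreme values than the mean and
    standard deviation.
    """
    median = statistics.median(amounts)
    mad = statistics.median([abs(a - median) for a in amounts])
    if mad == 0:
        return []  # no spread to scale against; this sketch gives up here
    return [a for a in amounts if 0.6745 * abs(a - median) / mad > threshold]


# Hypothetical card transactions for one customer.
transactions = [20, 22, 19, 21, 23, 20, 500]
print(flag_outliers(transactions))  # [500]
```

A flagged transaction is only a candidate for review. Whether it is actually fraudulent still has to be decided by further analysis or by a person.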
That is an overview of data mining. Learning more about it will help you collect and extract information that is useful for the future.