Data Mining Techniques
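This article introduces several core machine learning and deep learning concepts and their applications. Classification predicts a class from a record's attributes; clustering groups data points by similarity; association rule discovery finds dependencies among items that occur together; sequential pattern discovery finds sequential dependencies among events; regression predicts continuous values; deviation/anomaly detection identifies abnormal behavior. The article also mentions deep learning and graph learning as frontier areas of machine learning.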
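Contents
- Classification
- Clustering
- Association Rule Discovery
- Sequential Pattern Discovery
- Regression
- Deviation/Anomaly Detection
- Deep Learning
- Graph Learning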
Classification
Given a collection of records (the training set), each record consists of a set of attributes, one of which is designated as the class.
Find a model for the class attribute as a function of the values of the other attributes.
The goal is to assign previously unseen records to a class as accurately as possible.
A test set is used to evaluate the accuracy of the model. Typically, the given data set is divided into a training set and a test set: the training set is used to build the model, and the test set is used to validate its performance.
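The train/test workflow described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a recommended classifier: the toy records, the 1-nearest-neighbor rule, and the helper names `train_test_split`, `predict`, and `accuracy` are all assumptions introduced here.

```python
import math
import random

# Hypothetical toy records: (attribute_1, attribute_2, class_label).
RECORDS = [
    (1.0, 1.2, "A"), (0.8, 1.0, "A"), (1.2, 0.9, "A"), (0.9, 1.1, "A"),
    (1.1, 1.3, "A"), (0.7, 0.8, "A"),
    (3.0, 3.2, "B"), (3.1, 2.9, "B"), (2.8, 3.0, "B"), (3.2, 3.1, "B"),
    (2.9, 3.3, "B"), (3.3, 2.8, "B"),
]


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle the records and split them into a training set and a test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def predict(train, attributes):
    """Return the class of the training record closest to `attributes`."""
    nearest = min(train, key=lambda rec: math.dist(rec[:-1], attributes))
    return nearest[-1]


def accuracy(train, test):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(predict(train, rec[:-1]) == rec[-1] for rec in test)
    return correct / len(test)


train, test = train_test_split(RECORDS)
print(f"train={len(train)} test={len(test)} accuracy={accuracy(train, test):.2f}")
```

The model is built only from `train`; `test` is held back and used solely to measure how well the model generalizes to records it has not seen.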

Clustering
Given a set of data points, each with a set of attributes, and a similarity measure among them, find clusters such that:
- Data points within the same cluster are more similar to one another.
- Data points in different clusters are less similar to one another.

Measures of similarity:
- Euclidean distance when the attributes are continuous.
- Other problem-specific measures, such as cosine similarity, Hamming distance, and Gaussian distance.
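To make the similarity measures concrete, here is a small sketch of three of them, followed by a naive k-means loop that uses Euclidean distance to form clusters. The choice of k-means, the first-k-points initialization, and the sample points are assumptions made for illustration; the text above does not prescribe a particular clustering algorithm.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two continuous attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_similarity(p, q):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def hamming(p, q):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(p, q))


def k_means(points, k, iterations=10):
    """Naive k-means: assign each point to its nearest centroid, then re-average."""
    centroids = list(points[:k])  # simplistic initialization (assumption)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters


# Hypothetical two-dimensional data points forming two obvious groups.
POINTS = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
clusters = k_means(POINTS, k=2)
for index, cluster in enumerate(clusters):
    print(index, cluster)
```

Swapping `euclidean` for another measure inside `k_means` changes which points count as "similar", which is why the choice of measure is problem-specific.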

Association Rule Discovery
- Given a set of records, each containing a certain number of items from a given collection;
- Generate dependency rules that can predict the occurrence of an item based on the occurrences of other items.
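A dependency rule such as {diaper} -> {beer} is usually judged by its support and confidence. The sketch below computes both by direct counting; the market-basket transactions and the function names are hypothetical, and a full rule miner (which would also enumerate candidate itemsets) is deliberately left out.

```python
# Hypothetical market-basket records; each record is the set of items bought together.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "coke"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "coke"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing `antecedent`, the fraction that also contain `consequent`."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


# Rule under test: {diaper} -> {beer}
print("support:", support({"diaper", "beer"}, TRANSACTIONS))         # 3 of 5 records
print("confidence:", confidence({"diaper"}, {"beer"}, TRANSACTIONS))  # 3 of 4 diaper records
```

A rule is typically kept only when both values exceed thresholds chosen by the analyst.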

Sequential Pattern Discovery
- Given a set of objects, each associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
- Rules are formed by first discovering patterns. Event occurrences within a pattern are governed by timing constraints, for example a maximum gap between consecutive events.
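As an illustration of a timing constraint, the sketch below checks whether an ordered pattern of events occurs in a timeline with at most `max_gap` time units between consecutive matched events, then measures how many objects support the pattern. The customer timelines, the event names, and the single maximum-gap constraint are assumptions chosen for this example.

```python
# Hypothetical timelines: object id -> list of (time, event), sorted by time.
TIMELINES = {
    "customer_1": [(1, "laptop"), (3, "mouse"), (4, "laptop_bag")],
    "customer_2": [(1, "laptop"), (20, "mouse")],
    "customer_3": [(2, "mouse"), (5, "laptop"), (6, "mouse")],
    "customer_4": [(1, "phone"), (2, "case")],
}


def contains_pattern(timeline, pattern, max_gap):
    """True if the events in `pattern` occur in order in `timeline`, with at
    most `max_gap` time units between consecutive matched events."""

    def search(start, prev_time, remaining):
        if not remaining:
            return True
        for i in range(start, len(timeline)):
            time, event = timeline[i]
            if prev_time is not None and time - prev_time > max_gap:
                break  # timeline is sorted, so later events are even further away
            if event == remaining[0] and search(i + 1, time, remaining[1:]):
                return True
        return False

    return search(0, None, list(pattern))


def pattern_support(timelines, pattern, max_gap):
    """Fraction of objects whose timeline contains the pattern."""
    hits = sum(contains_pattern(t, pattern, max_gap) for t in timelines.values())
    return hits / len(timelines)


print(pattern_support(TIMELINES, ("laptop", "mouse"), max_gap=7))
```

Here `customer_2` buys both items but 19 time units apart, so that timeline does not count toward the pattern when `max_gap=7`.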

Regression
- Predict the value of a continuous dependent variable from the values of independent variables, assuming a linear or nonlinear model of dependency. The simplest linear case is Y = aX + b, where a is the slope and b is the intercept.
- Extensively studied in statistics and in the neural network field.
- Examples:
  - Predicting the sales of a new product based on advertising expenditure.
  - Predicting wind velocity as a function of temperature, humidity, and air pressure.
  - Time series prediction of stock market indices.
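For the linear case Y = aX + b, the coefficients can be estimated with ordinary least squares. The sketch below uses the closed-form solution for a single independent variable; the advertising and sales figures are made-up numbers that lie exactly on a line, and `fit_line` is a name introduced here.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimate of (a, b) in Y = aX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data: advertising budget (X) and resulting sales (Y).
budgets = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

a, b = fit_line(budgets, sales)
predicted = a * 6 + b  # estimated sales for a budget of 6
print(f"a={a:.2f} b={b:.2f} predicted={predicted:.2f}")
```

Real data will not fall exactly on a line, in which case `a` and `b` are the values that minimize the sum of squared prediction errors.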

Deviation/Anomaly Detection
- Detect significant deviations from normal behavior, a task with wide practical use in data processing.
- Applications include:
  - Credit card fraud detection
  - Network intrusion detection
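One simple way to flag a deviation from normal behavior is a z-score test: a value is suspicious when it lies many standard deviations from the mean. This is only a sketch under strong assumptions (a single numeric attribute, roughly stable "normal" values, and an arbitrary threshold of 2.5); real fraud or intrusion detection systems rely on far richer models. The transaction amounts are hypothetical.

```python
import statistics


def find_anomalies(values, threshold=2.5):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical card transaction amounts; one is far outside the usual range.
AMOUNTS = [20, 22, 19, 21, 23, 20, 22, 21, 500, 20]

print(find_anomalies(AMOUNTS))
```

Because the outlier itself inflates the mean and standard deviation, a z-score test becomes less sensitive on small samples; median-based statistics are a common alternative.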
