Data Mining Techniques

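This article introduces several core concepts of machine learning and deep learning and their applications. Classification predicts a class from the other attributes of a record; clustering groups data points by similarity; association rule discovery finds dependencies among items that occur together in records; sequential pattern discovery finds sequential dependencies among events; regression predicts continuous values; and deviation/anomaly detection identifies abnormal behavior. Deep learning and graph learning are also mentioned as frontier areas of machine learning.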

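Contents

  • Classification
  • Clustering
  • Association Rule Discovery
  • Sequential Pattern Discovery
  • Regression
  • Deviation/Anomaly Detection
  • Deep Learning
  • Graph Learning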
Classification

Given a collection of records (the training set):

Each record consists of a set of attributes, one of which is designated as the class.

Build a model that expresses the class attribute as a function of the values of the other attributes.

The goal is to assign previously unseen records to a class as accurately as possible.

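A test set is used to evaluate the accuracy of the model. Typically, the given data set is divided into a training set and a test set: the training set is used to build the model, and the test set is used to validate its performance.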
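The split-train-evaluate workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy records are made up, a 1-nearest-neighbour classifier is swapped in as the simplest possible model (the text does not name one), and the helper names `split_records`, `predict_1nn`, and `accuracy` are hypothetical.

```python
import random

# Hypothetical toy records: (attribute_1, attribute_2, class_label).
RECORDS = [
    (1.0, 1.2, "A"), (0.8, 0.9, "A"), (1.1, 1.0, "A"), (0.9, 1.3, "A"),
    (5.0, 5.1, "B"), (5.2, 4.8, "B"), (4.9, 5.3, "B"), (5.1, 5.0, "B"),
]

def split_records(records, test_fraction=0.25, seed=0):
    """Shuffle the records and split them into (training set, test set)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(training_set, attributes):
    """Return the class of the training record closest to `attributes`."""
    def squared_distance(record):
        return sum((a - b) ** 2 for a, b in zip(record[:-1], attributes))
    return min(training_set, key=squared_distance)[-1]

def accuracy(training_set, test_set):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(predict_1nn(training_set, r[:-1]) == r[-1] for r in test_set)
    return correct / len(test_set)

train, test = split_records(RECORDS)
print(f"train={len(train)} test={len(test)} accuracy={accuracy(train, test):.2f}")
```

The test records are never shown to the model while it is built, which is what makes the measured accuracy an estimate of performance on unseen records.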

Clustering

Given a set of data points, each described by a set of attributes, and a similarity measure defined among them, group the data points into clusters such that

  • Data points within the same cluster are more similar to each other.

  • Data points in distinct clusters are less similar to each other.

  • Measures of Similarity:

    • Euclidean Distance when attributes are continuous.
    • Other problem-specific measures, such as cosine similarity, Hamming distance, and Gaussian distance.
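Three of the measures listed above can be written directly in Python. The sketch below is illustrative only: the function names, the sample points, and the `centers` used in the single assignment step are assumptions, and a full clustering algorithm would also have to choose and update those centers.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with continuous attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cosine_similarity(p, q):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

def hamming_distance(p, q):
    """Number of positions at which two equal-length sequences differ."""
    if len(p) != len(q):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(p, q))

def assign_to_nearest(points, centers):
    """Group points by the index of their nearest center (one clustering step)."""
    clusters = {i: [] for i in range(len(centers))}
    for point in points:
        nearest = min(range(len(centers)),
                      key=lambda i: euclidean_distance(point, centers[i]))
        clusters[nearest].append(point)
    return clusters

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centers = [(1.0, 1.5), (8.5, 9.0)]  # hypothetical cluster centers
print(assign_to_nearest(points, centers))
```

Note that Euclidean and Hamming distance are dissimilarities (smaller means more alike), while cosine similarity grows as points become more alike.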
Association Rule Discovery
  • Given a set of records, each containing a certain number of items from a given collection;
  • Generate dependency rules that can predict the occurrence of an item based on the occurrences of other items.
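How strongly one item predicts another can be quantified with support and confidence. These two measures are not named in the text above; they are introduced here as a common way to score a rule. The transactions and the function names `support` and `confidence` are hypothetical.

```python
# Hypothetical market-basket records; each record is a set of items.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)

# Candidate rule: records containing diapers also contain beer.
rule = ({"diapers"}, {"beer"})
print("support:", support(rule[0] | rule[1], TRANSACTIONS))
print("confidence:", confidence(*rule, TRANSACTIONS))
```

A rule is usually kept only if both values clear chosen thresholds: support filters out item combinations that are too rare to matter, and confidence filters out weak predictions.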
Sequential Pattern Discovery
  • Given a set of objects, each associated with its own timeline of events, find rules that capture strong sequential dependencies among different events.
  • Rules are formed by first discovering patterns; the events within a pattern are subject to timing constraints.
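To make the timing constraint concrete, the sketch below checks whether an ordered pattern of events occurs in a timeline with a bounded gap between consecutive events, then counts the fraction of objects that contain it. The timelines, the event names, the `max_gap` constraint, and the function names are all assumptions made for illustration; real miners also search for the patterns rather than testing one that is given.

```python
# Hypothetical timelines: object id -> list of (timestamp, event), sorted by time.
TIMELINES = {
    "customer_1": [(1, "buy_phone"), (3, "buy_case"), (10, "buy_charger")],
    "customer_2": [(2, "buy_phone"), (20, "buy_case")],
    "customer_3": [(5, "buy_case"), (6, "buy_phone"), (8, "buy_case")],
}

def contains_pattern(timeline, pattern, max_gap):
    """True if the events of `pattern` occur in order in `timeline`, with at
    most `max_gap` time units between consecutive matched events."""
    def search(start, pattern_index, previous_time):
        if pattern_index == len(pattern):
            return True
        for i in range(start, len(timeline)):
            time, event = timeline[i]
            if previous_time is not None and time - previous_time > max_gap:
                break  # timeline is sorted, so later events are even further away
            if event == pattern[pattern_index] and search(i + 1, pattern_index + 1, time):
                return True
        return False
    return search(0, 0, None)

def pattern_support(timelines, pattern, max_gap):
    """Fraction of objects whose timeline contains the pattern."""
    hits = sum(contains_pattern(t, pattern, max_gap) for t in timelines.values())
    return hits / len(timelines)

print(pattern_support(TIMELINES, ["buy_phone", "buy_case"], max_gap=5))
```

Here `customer_2` does buy a case after a phone, but too late to satisfy the gap constraint, so that timeline does not count toward the pattern.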
Regression
  • Predict the value of a continuous dependent variable from the values of independent variables, assuming a linear or nonlinear model of dependency. For example, a linear model has the form Y = aX + b.
  • Extensively studied in statistics and in the neural network field.
  • Illustrative examples include:
    • Estimating sales figures for new products based on advertising budgets.
    • Forecasting wind speeds as a function of temperature, humidity, and atmospheric pressure.
    • Time series analysis for stock market index prediction.
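For the linear case Y = aX + b, the coefficients can be estimated by ordinary least squares. The sketch below is a minimal version for a single independent variable; the function name `fit_line` and the budget and sales figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (a, b) for the model Y = aX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # The fitted line passes through the point of means.
    b = mean_y - a * mean_x
    return a, b

# Hypothetical data: advertising budget (X) versus sales (Y).
budgets = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_line(budgets, sales)
print(f"Y = {a:.2f} X + {b:.2f}")
print("predicted sales for budget 6.0:", round(a * 6.0 + b, 2))
```

The same idea extends to several independent variables, as in the wind speed example, but that requires solving a small linear system instead of two closed-form expressions.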

Deviation/Anomaly Detection

  • Detect significant deviations from normal behavior.
    • Applications include:
      • Credit card fraud detection
      • Network intrusion detection
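One simple way to flag a deviation from normal behavior is a z-score test: a value is suspicious if it lies many standard deviations from the mean. This is a sketch of one basic technique, not of how production fraud or intrusion systems work; the function name `find_anomalies`, the threshold of 2.5, and the transaction amounts are assumptions.

```python
import statistics

def find_anomalies(values, threshold=2.5):
    """Return the values lying more than `threshold` population standard
    deviations away from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical card transaction amounts; one is far outside the usual range.
amounts = [25, 30, 27, 22, 31, 28, 26, 29, 24, 950]
print(find_anomalies(amounts, threshold=2.5))
```

The threshold is set below the commonly quoted value of 3 on purpose: in a sample of n values, no z-score computed with the population standard deviation can exceed sqrt(n - 1), so with ten values a threshold of 3 would never fire.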
Deep Learning
Graph Learning
