Chapter01.introduction(Machine Learning)
Contents
- Chapter01.introduction(Machine Learning)
- 1 Welcome
- 1.1 Machine Learning
- 1.2 Example
- 2 What is Machine Learning
- 2.1 Definition
- 2.2 Application
- 3 Supervised Learning
- 4 Unsupervised Learning
- 4.2 Application
- 4.3 Cocktail party problem algorithm
- 4.4 Exercise
- 5 Test
1 Welcome
1.1 Machine Learning
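- Grew out of work in AI
- New capability for computers
1.2 Example
- Data mining
Large datasets arising from the growth of automation and the web.
Examples: autonomous helicopter flight, handwriting recognition, and most work in natural language processing (NLP) and computer vision.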
2 What is Machine Learning
2.1 Definition
What is Machine Learning?
- Two definitions of machine learning are offered. Arthur Samuel described it as "the field of study that gives computers the ability to learn without being explicitly programmed." This is an older, informal definition.
- Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Example: playing checkers.
- E equals the experience gained from playing numerous games of checkers.
- T is defined as the task of playing checkers.
- P represents the probability that the program will win the next game.
2.2 Application
In general, any machine learning problem can be categorized into one of two broad classifications.
- Supervised learning
- Unsupervised learning
3 Supervised Learning
In supervised learning, we are given a dataset and already know what the correct output should look like, on the assumption that there is a relationship between the input and the output.
Supervised learning problems are divided into regression problems and classification problems.
- In a regression problem, we try to predict results within a continuous output, meaning that we try to map input variables to some continuous function.
- In a classification problem, we instead try to predict results in a discrete output. In other words, we try to map input variables into discrete categories.
Example 1:
Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.
We could turn this example into a classification problem by instead making the output whether the house "sells for more or less than the asking price." Here we are classifying the houses into two discrete categories based on price.
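The difference between the two framings of Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sizes, prices, and asking prices are made-up numbers, and `fit_line` is a hypothetical helper that performs an ordinary least-squares fit with one input variable.

```python
# Toy data: house sizes (square feet) and prices (thousands of dollars).
# All numbers are made up for illustration.
sizes = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
prices = [200.0, 290.0, 410.0, 500.0, 600.0]
asking = [210.0, 280.0, 400.0, 520.0, 590.0]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Regression: the target is a continuous number (the price itself).
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 1750.0 + intercept

# Classification: the target is a discrete label.
# 1 = sold for more than the asking price, 0 = sold for the asking price or less.
labels = [1 if p > a else 0 for p, a in zip(prices, asking)]

print(f"predicted price for 1750 sq ft: {predicted_price:.1f}")
print("classification targets:", labels)
```

The input (house size) is the same in both cases; only the kind of output the algorithm is asked to produce changes.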
Example 2:
(a) Regression - Given a picture of a person, predict their age on the basis of the picture.
(b) Classification - Given a patient with a tumor, predict whether the tumor is malignant or benign.
3.1 Predicting house prices
- Collect a housing price dataset

- Fit a straight line

- Fit a curve
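Fitting a straight line versus fitting a curve can be compared with a small pure-Python sketch. The data are made up (prices that flatten out for larger houses), and the helpers `solve_linear_system`, `fit_polynomial`, `predict`, and `mean_squared_error` are hypothetical names introduced only for this illustration. The fit uses the least-squares normal equations, which is one of several possible ways to do it.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so rows can be reduced together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] for c0 + c1*x + c2*x**2 + ...
    """
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(a, b)


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def mean_squared_error(coeffs, xs, ys):
    """Average squared difference between predictions and targets."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: size in thousands of square feet, price in thousands of dollars.
sizes = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
prices = [100.0, 190.0, 260.0, 310.0, 340.0, 350.0]

line = fit_polynomial(sizes, prices, degree=1)   # straight line
curve = fit_polynomial(sizes, prices, degree=2)  # quadratic curve

print("line  MSE:", round(mean_squared_error(line, sizes, prices), 2))
print("curve MSE:", round(mean_squared_error(curve, sizes, prices), 2))
```

On this particular toy dataset the curve has a much lower training error than the line, because the data were constructed to follow a quadratic. That says nothing about which model would be better on real housing data.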

3.2 Definition
- Supervised learning: "right answers" given
- Regression: predict continuous valued output (price)
3.3 Predicting whether a breast tumor is benign or malignant

- Labeled dataset

- Given a tumor's position, classify the tumor

Classification
Discrete valued output (0 or 1)
Discrete valued output (0, 1, 2, 3, …)

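This example has only one feature (attribute): tumor size.
In practice there will be multiple attributes.
For example, both age and tumor size are known.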

Labeled dataset

Given a new data point

Classify it

Other features
Clump Thickness
Uniformity of Cell Size
Uniformity of Cell Shape
…
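The "labeled dataset, new data point, classify" flow above can be sketched with a nearest-neighbor rule. This is only one possible classifier, chosen here because it is short; the notes do not say which algorithm is used. The ages, tumor sizes, and labels are made-up values, and `classify_nearest_neighbor` is a hypothetical helper.

```python
import math

# Made-up labeled dataset: (age in years, tumor size in cm) -> label.
# Label 0 = benign, 1 = malignant. These values are illustrative only.
training_data = [
    ((35.0, 1.0), 0),
    ((42.0, 1.5), 0),
    ((50.0, 2.0), 0),
    ((55.0, 4.0), 1),
    ((63.0, 4.5), 1),
    ((70.0, 5.5), 1),
]


def classify_nearest_neighbor(point, data):
    """Return the label of the training example closest to `point`.

    Distance is plain Euclidean distance on the raw features. Age and
    tumor size are on different scales, so a realistic version would
    probably normalize the features first.
    """
    _, nearest_label = min(
        data, key=lambda example: math.dist(point, example[0])
    )
    return nearest_label


# Classify two new, unlabeled patients.
print(classify_nearest_neighbor((45.0, 1.2), training_data))
print(classify_nearest_neighbor((66.0, 5.0), training_data))
```

Adding further features such as clump thickness would only lengthen each feature tuple; the classification step itself stays the same.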
3.4 Summary
- Supervised learning: for every example in the given dataset we are told the correct answer, and we want the algorithm to learn to predict the correct answer for new inputs.
- Regression problem: the target variable is continuous, and the goal is for the algorithm to predict a continuous-valued output.
- Classification problem: the target variable is discrete, and the goal is for the algorithm to predict a discrete class label.
3.5 Exercise

Answer: 3, 2
4 Unsupervised Learning
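- Unsupervised learning lets us approach problems with little or no idea of what the results should look like.
- We can derive structure from data where the relationships between the variables are not yet known.
- We derive this structure by clustering the data based on relationships among its variables, without specifying the results in advance.
- With unsupervised learning there is no feedback based on the prediction results.
- Unsupervised learning is especially suited to exploratory data analysis, where discovering latent patterns and relationships in a dataset is essential.
Example:
- Clustering: take a collection of 1,000,000 different genes and find a way to automatically group them so that the genes in each group are similar or related with respect to certain variables.
- Non-clustering: the "cocktail party algorithm" lets you find structure in a chaotic environment (that is, picking out individual voices and music from a mesh of sounds).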
4.1 Introduction
- No labels

- Clustering algorithm
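As an illustration of clustering without labels, here is a minimal k-means sketch. The notes do not name a specific clustering algorithm, so k-means is an assumption made for this example. The points are made up, and `k_means` is a hypothetical helper with a fixed number of iterations rather than a convergence test.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                dims = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
                )
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[k])
        centroids = new_centroids
    return centroids, assignments


# Made-up, unlabeled 2-D points forming two loose groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Initial centroids are simply the first and last points (an arbitrary choice).
centroids, assignments = k_means(points, [points[0], points[-1]])
print(assignments)
print(centroids)
```

No labels are passed in anywhere: the grouping comes only from the distances between the points.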

4.2 Application
- Google search

- DNA


- Others

4.3 Cocktail party problem algorithm


Code
[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
svd function: singular value decomposition
It is built into Octave.
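For readers more familiar with Python, the Octave expression above can be read piece by piece in NumPy. This is a sketch that only mirrors the matrix computation; it assumes `x` holds one recording per row, and it uses random numbers as a stand-in for real audio, so it does not demonstrate actual source separation.

```python
import numpy as np

# Hypothetical stand-in for the recordings: 2 microphones, 1000 samples each.
# Real input would be two mixed audio signals; random data is used here only
# so the snippet runs on its own.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 1000))

# sum(x.*x, 1): sum of squares down each column, one value per sample.
sample_energy = np.sum(x * x, axis=0)

# repmat(..., size(x,1), 1) .* x: scale every column of x by its energy.
# NumPy broadcasting does the job of repmat here.
weighted = sample_energy * x

# (...) * x': a 2 x 2 matrix, which is then decomposed.
m = weighted @ x.T

# Note: np.linalg.svd returns the singular values as a 1-D array and the
# transpose of V, whereas Octave's svd returns a diagonal matrix s and V.
W, s, vh = np.linalg.svd(m)
print(W.shape, s.shape, vh.shape)
```

The main practical difference from the Octave call is the return format of the decomposition, as noted in the final comment.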
4.4 Exercise

Answer: 2, 3
5 Test
Introduction section, total score: 5.
1. A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. What would be a reasonable choice for P?
- The probability of it correctly predicting a future date's weather.
- None of these.
- The process of the algorithm examining a large amount of historical weather data.
- The weather prediction task.
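2. Suppose you are working on weather prediction and would like to predict whether or not it will be raining at 5pm tomorrow. You want to use a learning algorithm for this. Would you treat this as a classification or a regression problem?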
- Classification
- Regression
3. Suppose you are working on stock market prediction. You would like to predict whether or not a certain company will win a patent infringement lawsuit (by training on data of companies that had to defend against similar lawsuits). Would you treat this as a classification or a regression problem?
- Regression
- Classification
4. Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to? (Select all that apply.) In each case, assume some appropriate dataset is available for your algorithm to learn from.
Key difficulty: telling clustering apart from classification into discrete classes.
- Given data on how 1,000 medical patients respond to an experimental drug (such as effectiveness of the treatment, side effects, etc.), discover whether there are different categories or "types" of patients in terms of how they respond to the drug, and if so what these categories are.
- Given a large dataset of medical records from patients suffering from heart disease, try to learn whether there might be different clusters of such patients for which we might tailor separate treatments.
- In farming, given data on crop yields over the last 50 years, learn to predict future crop yields.
- Examine a web page, and classify whether the content on the web page should be considered "child friendly" (e.g., non-pornographic, etc.) or "adult."
5. Which of these is a reasonable definition of machine learning?
- Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.
- Machine learning means learning from labeled data.
- Machine learning is the field of allowing robots to act intelligently.
- Machine learning is the science of programming computers.
