Statistics with R-Linear Regression-Week 2-Introduction to linear regression


Plot the data

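To study the relationship between two variables, treat one as the dependent (response) variable and the other as the independent (explanatory) variable, and draw a scatterplot. The code in these notes assumes the statsr, dplyr and ggplot2 packages are loaded and that the mlb11 data set from statsr is available.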

    ggplot(data = mlb11, aes(x = at_bats, y = runs)) +
      geom_point()

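The plot shows a positive, roughly linear relationship. A plot alone does not say how strong that relationship is, so the next step is to compute the correlation coefficient R.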

    mlb11 %>%
      summarise(cor(runs, at_bats))

    cor(runs, at_bats)
              0.610627

R = 0.61
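As a cross-check on what cor() computes, here is a minimal sketch that evaluates the Pearson correlation formula by hand. It runs on simulated data, not on mlb11, and the names x, y, dx, dy and r_manual are made up for illustration.

```r
# Simulated stand-in data (NOT the mlb11 values)
set.seed(1)
x <- rnorm(30, mean = 5500, sd = 80)
y <- -2800 + 0.63 * x + rnorm(30, sd = 60)

# Pearson correlation: joint variation of x and y,
# scaled by the variation of each variable on its own
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The built-in function should give the same number
r_builtin <- cor(x, y)
print(c(manual = r_manual, builtin = r_builtin))
```

Because the formula is symmetric in x and y, cor(runs, at_bats) and cor(at_bats, runs) return the same value.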

Sum of squared residuals

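The previous section showed a positive, moderately strong linear relationship between the two variables (R = 0.61). This section looks at residuals, the differences between the observed values and the values a line predicts, which is also where outliers and their effect on the model show up.

The following command reports the sum of squared residuals for a line through the data, together with the regression model obtained by least squares.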


    plot_ss(x = at_bats, y = runs, data = mlb11)

Call:
lm(formula = y ~ x, data = pts)

Coefficients:
(Intercept)            x
 -2789.2429       0.6305

Sum of Squares: 123721.9
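To make "sum of squared residuals" concrete, here is a small sketch that computes it for two candidate lines. It runs on simulated data rather than mlb11, and sum_sq_resid is a hypothetical helper written for this note, not a statsr function.

```r
# Simulated stand-in data (NOT the mlb11 values)
set.seed(1)
x <- rnorm(30, mean = 5500, sd = 80)
y <- -2800 + 0.63 * x + rnorm(30, sd = 60)

# Sum of squared residuals for the line y = b0 + b1 * x
sum_sq_resid <- function(b0, b1, x, y) {
  sum((y - (b0 + b1 * x))^2)
}

# Least-squares line, and the same line with its intercept moved by 20
fit <- lm(y ~ x)
b <- unname(coef(fit))
ss_least_squares <- sum_sq_resid(b[1], b[2], x, y)
ss_shifted <- sum_sq_resid(b[1] + 20, b[2], x, y)

print(c(least_squares = ss_least_squares, shifted = ss_shifted))
```

Any line other than the least-squares line gives a larger sum, which is why the least-squares sum is the minimum you can reach when choosing a line by hand.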

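To see the squared residuals drawn on the plot, add the argument showSquares = TRUE.

    plot_ss(x = at_bats, y = runs, data = mlb11, showSquares = TRUE)

(Figure: scatterplot of runs against at_bats with each squared residual drawn as a square.)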

The linear model

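Searching by hand for the line with the smallest sum of squared residuals does lead to the regression model, but it is slow and error-prone. The lm function fits the least-squares line directly.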


    m1 <- lm(runs ~ at_bats, data = mlb11)

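lm stands for "linear model". Its first argument is a formula of the form y ~ x, here runs as a function of at_bats. Calling summary on the fitted model prints everything we need.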


    summary(m1)

The summary output is organized into these parts:

Residuals
Coefficients
Residual standard error
Multiple R-squared and Adjusted R-squared
F-statistic and p-value
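The pieces of the summary can also be pulled out in code. The sketch below, on simulated data rather than mlb11, compares the lm coefficients with the closed-form least-squares estimates for a single predictor and checks that R-squared equals the squared correlation. The variable names are illustrative.

```r
# Simulated stand-in data (NOT the mlb11 values)
set.seed(1)
x <- rnorm(30, mean = 5500, sd = 80)
y <- -2800 + 0.63 * x + rnorm(30, sd = 60)

fit <- lm(y ~ x)
s <- summary(fit)

# Closed-form least-squares estimates for one predictor
b1 <- cor(x, y) * sd(y) / sd(x)   # slope
b0 <- mean(y) - b1 * mean(x)      # intercept

print(coef(fit))                  # estimates from lm
print(c(b0 = b0, b1 = b1))        # estimates from the formulas

# With one predictor, Multiple R-squared is the squared correlation
print(c(r_squared = s$r.squared, cor_squared = cor(x, y)^2))

# Residual standard error
print(s$sigma)
```

Applied to the notes' own numbers, R = 0.61 corresponds to an R-squared of about 0.37, meaning at_bats explains roughly 37% of the variability in runs.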

Prediction and prediction errors

So far we have drawn the scatterplot and fitted the regression line with lm. Now we combine the two in a single plot.

    ggplot(data = mlb11, aes(x = at_bats, y = runs)) +
      geom_point() +
      stat_smooth(method = "lm", se = FALSE)

stat_smooth creates the line by fitting a linear model.

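method = "lm" tells stat_smooth to fit the line by least squares.

se controls whether the standard-error band is drawn around the regression line. It is switched off here with se = FALSE; set it to TRUE to compare.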

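The line can be used to predict y for any given value of x. Predicting for x values outside the range of the observed data is called extrapolation; the results are often unreliable, so it is not recommended.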
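A prediction from the fitted line is just arithmetic on the two coefficients. The sketch below uses the intercept and slope reported in the plot_ss output earlier in these notes. The input of 5500 at-bats, the x_obs vector, and the helper names predict_runs and is_extrapolation are hypothetical and chosen only for illustration.

```r
# Coefficients as reported in the plot_ss output in these notes
b0 <- -2789.2429
b1 <- 0.6305

# Predicted runs for a given number of at-bats
predict_runs <- function(at_bats) {
  b0 + b1 * at_bats
}

# TRUE when a new x value lies outside the range of the observed x values
is_extrapolation <- function(x_new, x_obs) {
  x_new < min(x_obs) | x_new > max(x_obs)
}

# Hypothetical observed values, NOT the real mlb11 at_bats column
x_obs <- c(5400, 5500, 5600, 5700)

print(predict_runs(5500))
print(is_extrapolation(c(5500, 6500), x_obs))
```

With the fitted model object, predict(m1, newdata = data.frame(at_bats = 5500)) does the same calculation, and range(mlb11$at_bats) gives the real bounds to check against.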

Model diagnostics

To judge whether the linear model is reliable, we need to check three conditions:

linearity
nearly normal residuals
constant variability

Linearity

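The scatterplot already suggested a linear relationship between the variables. We should also verify this condition with a plot of the residuals against the fitted (predicted) values.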

    ggplot(data = m1, aes(x = .fitted, y = .resid)) +
      geom_point() +
      geom_hline(yintercept = 0, linetype = "dashed") +
      xlab("Fitted values") +
      ylab("Residuals")

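The model object m1 serves as the data set here, because it stores the fitted values and the residuals. The x-axis shows the fitted (predicted) values of the response, and the y-axis shows the residuals, that is, the differences between observed and predicted values. geom_hline draws a dashed horizontal reference line at y = 0.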
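To show what the two plotted columns contain, here is a short sketch that rebuilds the residuals from the fitted values. It uses simulated data instead of mlb11, and the variable names are illustrative.

```r
# Simulated stand-in data (NOT the mlb11 values)
set.seed(1)
x <- rnorm(30, mean = 5500, sd = 80)
y <- -2800 + 0.63 * x + rnorm(30, sd = 60)

fit <- lm(y ~ x)

# Fitted values: the model's prediction for each observed x
fitted_vals <- fitted(fit)

# Residual = observed value minus fitted value
residuals_manual <- y - fitted_vals

# Matches what resid() returns
print(all.equal(unname(residuals_manual), unname(resid(fit))))

# For a least-squares fit with an intercept, residuals sum to zero
# (up to floating-point error)
print(sum(resid(fit)))
```

Since the residuals always balance around zero, what matters in the plot is the pattern: a curve or trend in the points signals that a straight line is not an adequate description.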

Nearly normal residuals

To check this condition, draw a histogram of the residuals:

    ggplot(data = m1, aes(x = .resid)) +
      geom_histogram(binwidth = 25) +
      xlab("Residuals")

or a normal probability plot:

    ggplot(data = m1, aes(sample = .resid)) +
      stat_qq()

Here the sample aesthetic is set to the residuals, instead of x, and stat_qq draws a quantile-quantile (Q-Q) plot, a common way to check whether a distribution is approximately normal.
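For readers who want to see what a Q-Q plot actually plots, the sketch below builds the two sets of quantiles by hand and compares them with base R's qqnorm. It uses simulated data, not mlb11, and the variable names are illustrative.

```r
# Simulated stand-in data (NOT the mlb11 values)
set.seed(1)
x <- rnorm(30, mean = 5500, sd = 80)
y <- -2800 + 0.63 * x + rnorm(30, sd = 60)

res <- unname(resid(lm(y ~ x)))
n <- length(res)

# Theoretical quantiles: standard normal quantiles at evenly spread probabilities
theoretical <- qnorm(ppoints(n))

# Sample quantiles: the residuals sorted from smallest to largest
sample_q <- sort(res)

# Base R computes the same coordinates (plotting suppressed)
qq <- qqnorm(res, plot.it = FALSE)
print(all.equal(sort(qq$x), theoretical))

# Informal summary: points close to a straight line give a correlation near 1
print(cor(theoretical, sample_q))
```

If the residuals are nearly normal, the points (theoretical, sample_q) fall close to a straight line; systematic bends at the ends suggest skew or heavy tails.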

Constant variability

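Reuse the residuals-versus-fitted plot from the linearity check. The condition is met when the residuals stay evenly scattered around 0, with roughly the same spread, as the fitted values increase.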
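As a rough numerical companion to reading the plot, one can compare the spread of the residuals in the lower and upper halves of the fitted values. This is an informal heuristic on simulated data, not a formal test and not part of the original lab; the variable names are illustrative.

```r
# Simulated stand-in data (NOT the mlb11 values)
set.seed(1)
x <- rnorm(30, mean = 5500, sd = 80)
y <- -2800 + 0.63 * x + rnorm(30, sd = 60)

fit <- lm(y ~ x)
res <- resid(fit)

# Split the observations at the median fitted value
lower_half <- fitted(fit) <= median(fitted(fit))

# Standard deviation of the residuals in each half
spread <- c(lower = sd(res[lower_half]), upper = sd(res[!lower_half]))
print(spread)

# A ratio far from 1 hints that the variability is not constant
print(unname(spread["upper"] / spread["lower"]))
```

With small samples the two standard deviations will differ somewhat even when the condition holds, so the plot remains the primary check.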
