Hypothesis with R and Understanding of P-value and confidence-interval


Hypothesis with R
- Dataset description
  - Data visualization
  - Two-sided hypothesis test using the t-test (small samples)

Hypothesis with R

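Dataset description

Using the Galton dataset, we examine how the heights of sons and of daughters relate to the height of their mothers.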

    library("AzureML")
    ws <- workspace()
    galton <- download.datasets(ws, "GaltonFamilies.csv")
    head(galton)

The first 6 rows of the data and the columns:

    dim(galton)

939 rows; the number of columns (attributes) is not recoverable from the original output.
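
If an Azure ML workspace is not at hand, a copy of Galton's family data also ships with the HistData package on CRAN under the name GaltonFamilies, with the mother, childHeight and gender columns used below. The following is a minimal sketch, assuming that package is installed; that copy may not match the Azure ML file row for row.

```r
# Assumption: the HistData package is installed (install.packages("HistData")).
# It is not part of the original post, which loads the data from Azure ML.
if (requireNamespace("HistData", quietly = TRUE)) {
  data("GaltonFamilies", package = "HistData")
  galton <- GaltonFamilies

  print(dim(galton))            # rows and columns
  print(head(galton))           # first 6 rows
  print(table(galton$gender))   # number of daughters and sons
} else {
  message("HistData is not installed; skipping the local load.")
}
```

The rest of the code only relies on the columns gender, mother and childHeight, so either source should work as long as those names are present.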

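Data visualization

Draw histograms that compare the heights of mothers with those of their sons, and of mothers with those of their daughters.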

    hist.plot = function(df, col, bw, max, min){
       ggplot(df, aes_string(col)) + geom_histogram( binwidth = bw ) + xlim(min,max)
    }
    
    hist.family = function(df, col1, col2, num.bin = 30){
       require(ggplot2)
       require(gridExtra)
       ## compute bin width
       max = max(c(df[, col1], df[, col2]))
       min = min(c(df[, col1], df[, col2]))
       bin.width = (max - min)/num.bin
       ## create a first histogram
       p1 = hist.plot(df, col1, bin.width, max, min)
       p1 = p1 + geom_vline(xintercept = mean(df[, col1]), color = 'red', size = 1)
       ## create a second histogram
       p2 = hist.plot(df, col2, bin.width, max, min)
       p2 = p2 + geom_vline(xintercept = mean(df[, col2]), color = 'red', size = 1)
       ## stack the plot
       grid.arrange(p1,p2, nrow = 2, ncol = 1)
    }
    
    sons = galton[galton$gender=='male',]
    hist.family(sons,'childHeight','mother')

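In the plots, geom_vline() marks each mean so that the two distributions can be compared. The results are as follows:

Sons and mothers

Daughters and mothers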

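The height distributions of sons and mothers overlap only slightly, whereas the distribution of daughters' heights closely resembles that of the mothers. Based on this we state the null hypothesis that the mean heights of child and mother are equal (μ₁ - μ₂ = 0); the alternative hypothesis is μ₁ ≠ μ₂.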
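
For reference, the hypotheses and the test statistic can be written out explicitly. This is a sketch assuming two independent samples with possibly unequal variances (Welch's form, which R's t.test uses by default when paired = FALSE); the symbols $\bar{x}_i$, $s_i^2$, $n_i$ and $\nu$ are introduced here for illustration and do not appear in the original post.

$$
H_0:\ \mu_1 - \mu_2 = 0 \qquad H_1:\ \mu_1 - \mu_2 \neq 0
$$

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \qquad p = 2\,P\left(T_\nu \ge |t|\right)
$$

Here $\bar{x}_i$, $s_i^2$ and $n_i$ are the sample mean, sample variance and size of group $i$, and $T_\nu$ is a $t$-distributed variable with $\nu$ degrees of freedom. The test is two-sided because both large positive and large negative values of $t$ count as evidence against $H_0$.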

Two-sided hypothesis test using the t-test (small samples)

    ## H0: there is no significant difference between the means
    ## (helper kept from the original; note that paired defaults to TRUE here)
    families.test <- function(df, col1, col2, paired = TRUE){
       t.test(df[, col1], df[, col2], paired = paired)
    }
    
    hist.family.conf <- function(df, col1, col2, num.bin = 30, paired = FALSE){
       require(ggplot2)
       require(gridExtra)
    
       ## compute bin width from the joint range of both columns
       max = max(c(df[, col1], df[, col2]))
       min = min(c(df[, col1], df[, col2]))
       bin.width = (max - min)/num.bin
    
       mean1 <- mean(df[, col1])
       mean2 <- mean(df[, col2])
       ## two-sided t-test; conf.int is the interval for mean1 - mean2
       t <- t.test(df[, col1], df[, col2], paired = paired)
       ## shift the interval by mean2 so it can be drawn on the col1 axis
       pv1 <- mean2 + t$conf.int[1]
       pv2 <- mean2 + t$conf.int[2]
    
       ## histogram of col1: mean (solid line) and shifted interval (dashed lines)
       p1 <- hist.plot(df, col1, bin.width, max, min)
       p1 <- p1 + geom_vline(xintercept = mean1,
                             color = 'red', size = 1) +
                  geom_vline(xintercept = pv1,
                             color = 'red', size = 1, linetype = 2) +
                  geom_vline(xintercept = pv2,
                             color = 'red', size = 1, linetype = 2)
    
       ## histogram of col2 with its mean
       p2 <- hist.plot(df, col2, bin.width, max, min)
       p2 <- p2 + geom_vline(xintercept = mean2,
                             color = 'red', size = 1.5)
    
       ## stack the plots
       grid.arrange(p1, p2, nrow = 2)
    
       print(t)
    }
    hist.family.conf(sons,'mother','childHeight')

Result of testing whether the son-mother mean height difference is 0:

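How to read the confidence interval and the p-value:
Assume the test statistic for the height difference follows a $t$ distribution with $k-1$ degrees of freedom. In the son-mother test the call passes 'mother' first, so $\mu_1$ is the mothers' mean height and $\mu_2$ the sons'. The 95% confidence interval for $\mu_1-\mu_2$ computed from that $t$ distribution is $[-5.514, -4.887]$. Strictly speaking, this means that intervals built by this procedure cover the true $\mu_1-\mu_2$ in 95% of repeated samples; it is not a statement that this particular interval contains the true value with probability 95%. The p-value, $<2.2\times10^{-16}$, is the probability, under the null hypothesis $\mu_1-\mu_2=0$, of observing a $t$ statistic at least as extreme as the one obtained; it is not the probability that $\mu_1-\mu_2=0$. Because the p-value is far below the significance level $\alpha=0.05$ and the interval excludes 0, we have enough evidence to reject the null hypothesis and conclude that the mean heights of sons and mothers differ.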
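
To make the link between the $t$ distribution, the confidence interval and the p-value concrete, the quantities that t.test reports can be recomputed by hand. The sketch below uses simulated heights (the means, standard deviations and sample sizes are made-up illustration values, not the Galton data) and assumes the unpaired Welch form that t.test applies when paired = FALSE.

```r
# Simulated heights in inches; illustration values only, not the Galton data
set.seed(1)
mother <- rnorm(200, mean = 64, sd = 2.3)
son    <- rnorm(200, mean = 69, sd = 2.6)

# Welch two-sample t statistic for mean(mother) - mean(son)
n1 <- length(mother)
n2 <- length(son)
v1 <- var(mother) / n1          # squared standard error of mean(mother)
v2 <- var(son) / n2             # squared standard error of mean(son)
diff.means <- mean(mother) - mean(son)
se <- sqrt(v1 + v2)
t.stat <- diff.means / se

# Welch-Satterthwaite degrees of freedom
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value: probability, under H0, of a statistic at least this extreme
p.value <- 2 * pt(-abs(t.stat), df)

# 95% confidence interval for the difference in means
ci <- diff.means + c(-1, 1) * qt(0.975, df) * se

# Compare with the built-in test
res <- t.test(mother, son, paired = FALSE)
print(c(manual = t.stat, builtin = unname(res$statistic)))
print(rbind(manual = ci, builtin = as.numeric(res$conf.int)))
print(c(manual = p.value, builtin = res$p.value))
```

The manual and built-in values should agree, which shows that the interval and the p-value are two views of the same $t$ statistic: the interval excludes 0 exactly when the two-sided p-value is below 0.05.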

Applying the same method gives the daughter-mother mean height test result:

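The 95% confidence interval for $\mu_1-\mu_2$ is $[-0.25, 0.34]$, which contains 0. The p-value computed from the $t$ distribution is 0.7701, well above the significance level $\alpha=0.05$, so we fail to reject the null hypothesis: the data show no evidence that the mean heights of daughters and mothers differ. Failing to reject is not proof that the two means are equal.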
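
The repeated-sampling reading of the 95% level can be checked with a small simulation. The sketch below assumes both groups come from the same normal population, so the null hypothesis is true by construction; the sample size, number of repetitions and population parameters are arbitrary illustration values.

```r
# Assumption: both samples are drawn from the same population, so H0 holds
set.seed(42)
n.rep <- 2000   # number of repeated experiments
n <- 50         # observations per group

p.values <- numeric(n.rep)
covers.zero <- logical(n.rep)

for (i in seq_len(n.rep)) {
  x <- rnorm(n, mean = 64, sd = 2.3)
  y <- rnorm(n, mean = 64, sd = 2.3)
  res <- t.test(x, y)
  p.values[i] <- res$p.value
  # does the 95% interval contain the true difference, 0?
  covers.zero[i] <- res$conf.int[1] <= 0 && 0 <= res$conf.int[2]
}

coverage <- mean(covers.zero)          # share of intervals containing 0
reject.rate <- mean(p.values < 0.05)   # share of false rejections
print(c(coverage = coverage, reject.rate = reject.rate))
```

Roughly 95% of the intervals should contain 0 and roughly 5% of the tests should reject, which is why a single large p-value such as 0.7701 is unremarkable when the means really are equal.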

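The p-value is a probability computed on the assumption that the null hypothesis is true. In statistics it only serves as a criterion for rejecting or not rejecting the hypothesis; it does not directly reflect how important, or how large, the observed effect is.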
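
A short sketch of that last point: with a large enough sample, a difference far too small to matter in practice can still produce a very small p-value. The data are simulated, and the 0.1 inch true difference, the sample size and the use of Cohen's d as an effect-size measure are illustrative assumptions rather than part of the original analysis.

```r
# Assumption: a true mean difference of only 0.1 inch, with very large samples
set.seed(7)
n <- 100000
x <- rnorm(n, mean = 64.0, sd = 2.3)
y <- rnorm(n, mean = 64.1, sd = 2.3)

res <- t.test(x, y)

# Standardized effect size (Cohen's d with a pooled standard deviation)
pooled.sd <- sqrt((var(x) + var(y)) / 2)
cohen.d <- (mean(x) - mean(y)) / pooled.sd

print(res$p.value)    # statistical significance
print(res$conf.int)   # plausible range for the difference, in inches
print(cohen.d)        # practical size of the effect
```

Reporting the confidence interval or an effect size next to the p-value shows the magnitude of the difference, which the p-value alone does not.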
