3 Distributions of Several Discrete Variables and Their Applications

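This chapter covers two discrete distributions and their applications: the binomial distribution and the Poisson distribution. It corresponds to Chapter 6 of the textbook, "Distributions of Several Discrete Variables and Their Applications".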

3.1 Binomial distribution

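Closely related to the Bernoulli distribution: a binomial variable counts the successes in n independent Bernoulli trials, and the Bernoulli distribution is the special case n = 1.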

3.1.1 Interval estimation of a population proportion

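Example 6-2. Computed directly from the binomial distribution. Among 13 women of childbearing age with tubal ligation, 6 conceived after surgery. Find the 95% confidence interval.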

binom.test(x = 6, n = 13, conf.level = 0.95)
## 
##  Exact binomial test
## 
## data:  6 and 13
## number of successes = 6, number of trials = 13, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.1922324 0.7486545
## sample estimates:
## probability of success 
##              0.4615385

The output gives a 95% confidence interval of (0.1922324, 0.7486545).
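The interval reported by binom.test() is the Clopper-Pearson (exact) interval, which can be written with quantiles of the beta distribution. A minimal sketch that reproduces it by hand; the variable names are illustrative only:

```r
# Clopper-Pearson interval for x successes in n trials
x <- 6
n <- 13
alpha <- 0.05

lower <- qbeta(alpha / 2, x, n - x + 1)      # lower limit
upper <- qbeta(1 - alpha / 2, x + 1, n - x)  # upper limit
c(lower = lower, upper = upper)
```

The two limits should agree with the `conf.int` component returned by binom.test(x = 6, n = 13).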

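Example 6-3. Of 100 patients, 55 responded to treatment; find the confidence interval. With the normal approximation, the 95% confidence interval is computed in the same way as for a mean: estimate ± 1.96 × standard error.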

Compute Sp using formula 6-8 in the textbook:

# Compute Sp, the standard error of the sample proportion
Sp <- sqrt((0.55 * (1-0.55)) / 100)
Sp
## [1] 0.04974937

Compute the 95% confidence interval:

0.55+1.96*Sp
## [1] 0.6475088
0.55-1.96*Sp
## [1] 0.4524912

The output gives a confidence interval of (0.4524912, 0.6475088).
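The same normal-approximation steps can be wrapped in a small helper. This is only a sketch: `wald_ci` is a hypothetical name, not a function from any package, and it uses qnorm() for the multiplier (about 1.959964) instead of the rounded 1.96, so the last digits differ slightly from the values above.

```r
# Normal-approximation (Wald) interval for a proportion.
# wald_ci is an illustrative helper, not part of base R.
wald_ci <- function(x, n, conf.level = 0.95) {
  p  <- x / n                               # sample proportion
  z  <- qnorm(1 - (1 - conf.level) / 2)     # two-sided normal quantile
  se <- sqrt(p * (1 - p) / n)               # standard error Sp
  c(lower = p - z * se, upper = p + z * se)
}

wald_ci(55, 100)
```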

Alternatively, call prop.test() directly (the result differs slightly but is very close):

prop.test(x = 55, n = 100, correct = FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  55 out of 100, null probability 0.5
## X-squared = 1, df = 1, p-value = 0.3173
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.4524460 0.6438546
## sample estimates:
##    p 
## 0.55
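The small difference from the hand calculation arises because prop.test() without continuity correction reports the Wilson score interval rather than the Wald interval. A sketch of the Wilson formula, with illustrative variable names:

```r
# Wilson score interval for 55 successes out of 100
x <- 55
n <- 100
p <- x / n
z <- qnorm(0.975)

centre     <- p + z^2 / (2 * n)
half_width <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2))
wilson     <- c(centre - half_width, centre + half_width) / (1 + z^2 / n)
wilson
```

The two limits should match the interval printed by prop.test(x = 55, n = 100, correct = FALSE).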

3.1.2 Comparing a sample proportion with a population proportion

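Example 6-4. Direct method, i.e. the exact binomial test. The known pregnancy rate is 0.55, and 9 of 10 women of childbearing age became pregnant. Is the pregnancy rate with method A higher than with method B?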

binom.test(x = 9, n = 10, p = 0.55, alternative = "greater")
## 
##  Exact binomial test
## 
## data:  9 and 10
## number of successes = 9, number of trials = 10, p-value = 0.02326
## alternative hypothesis: true probability of success is greater than 0.55
## 95 percent confidence interval:
##  0.6058367 1.0000000
## sample estimates:
## probability of success 
##                    0.9

Conclusion: reject H0 and accept H1; the pregnancy rate with method A can be considered higher than with method B.
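The one-sided p-value of the exact test is simply the binomial probability of observing 9 or more successes under H0. A short sketch of that sum:

```r
# P(X >= 9) for X ~ Binomial(10, 0.55)
p_greater <- sum(dbinom(9:10, size = 10, prob = 0.55))
p_greater
```

This should agree with the p-value printed by binom.test() above.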

Example 6-5. Direct method, i.e. the exact binomial test. Of 10 patients, 9 responded to treatment. Do drug A and drug B differ in efficacy?

binom.test(x=9, n=10, p=0.6, alternative = "two.sided")
## 
##  Exact binomial test
## 
## data:  9 and 10
## number of successes = 9, number of trials = 10, p-value = 0.05865
## alternative hypothesis: true probability of success is not equal to 0.6
## 95 percent confidence interval:
##  0.5549839 0.9974714
## sample estimates:
## probability of success 
##                    0.9

Conclusion: do not reject H0; there is not yet sufficient evidence that the two drugs differ in efficacy.
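For the two-sided exact test, binom.test() adds up the probabilities of all outcomes that are no more likely than the observed one. A minimal sketch of that rule; the small relative tolerance mirrors the one used inside binom.test() to guard against floating-point ties:

```r
# Two-sided exact p-value for 9 successes in 10 trials, H0: p = 0.6
probs <- dbinom(0:10, size = 10, prob = 0.6)   # P(X = 0), ..., P(X = 10)
d_obs <- dbinom(9, size = 10, prob = 0.6)      # probability of the observed count

p_two_sided <- sum(probs[probs <= d_obs * (1 + 1e-7)])
p_two_sided
```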

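Example 6-6. Of 180 patients, 117 were cured. Is the new method better than the conventional one? This uses the normal approximation: compute the u statistic with formula 6-13 and look up the critical value in a table.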

Or call prop.test() directly:

prop.test(x = 117, n = 180, p = 0.45, alternative = "greater", correct = FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  117 out of 180, null probability 0.45
## X-squared = 29.091, df = 1, p-value = 3.453e-08
## alternative hypothesis: true p is greater than 0.45
## 95 percent confidence interval:
##  0.5896943 1.0000000
## sample estimates:
##    p 
## 0.65

Conclusion: reject H0 and accept H1; the new therapy is more effective than the conventional therapy.
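Formula 6-13 is not reproduced in this section. Assuming it is the usual one-sample statistic u = (p − π0) / sqrt(π0(1 − π0) / n), the hand calculation can be sketched as follows; the variable names are illustrative:

```r
# One-sample u (z) statistic for a proportion, normal approximation
x   <- 117
n   <- 180
pi0 <- 0.45                                 # proportion under H0

p_hat <- x / n
u     <- (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
p_val <- pnorm(u, lower.tail = FALSE)       # one-sided, H1: p > pi0

c(u = u, u_squared = u^2, p_value = p_val)
```

Under that assumption, u squared equals the X-squared value reported by prop.test(), and the one-sided p-values coincide.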

3.1.3 Comparing two sample proportions

Example 6-7. Test whether the incidence of cervical spondylosis differs between the sexes.

Prepare the data as a matrix or table:

t67 <- matrix(c(36,84,22,88), ncol = 2, byrow = TRUE,
              dimnames = list(c("male","female"),c("success","failure")))
t67
##        success failure
## male        36      84
## female      22      88

Compare the two sample proportions:

prop.test(t67, correct = FALSE)
## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  t67
## X-squared = 3.0433, df = 1, p-value = 0.08107
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.01095102  0.21095102
## sample estimates:
## prop 1 prop 2 
##    0.3    0.2

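Conclusion: do not reject H0; there is not yet sufficient evidence that the incidence of cervical spondylosis in this occupational group differs by sex.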
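The two-sample test without continuity correction is equivalent to a u test built on the pooled proportion. A sketch under that standard formulation, with illustrative variable names:

```r
# Two-sample u (z) test for proportions using the pooled estimate
x1 <- 36; n1 <- 36 + 84    # male: cases, total
x2 <- 22; n2 <- 22 + 88    # female: cases, total

p1 <- x1 / n1
p2 <- x2 / n2
pc <- (x1 + x2) / (n1 + n2)                       # pooled proportion

u     <- (p1 - p2) / sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
p_val <- 2 * pnorm(abs(u), lower.tail = FALSE)    # two-sided

c(u = u, u_squared = u^2, p_value = p_val)
```

Here u squared should reproduce the X-squared statistic printed by prop.test().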

Example 6-8. Test for familial aggregation.

# Observed counts in each category
x <- c(26,10,28,18)
# Probabilities expected under H0 (they sum to 1)
p <- c(0.13265,0.38235,0.36735,0.11765)

Run the chi-squared goodness-of-fit test:

chisq.test(x=x,p=p)
## 
##  Chi-squared test for given probabilities
## 
## data:  x
## X-squared = 42.949, df = 3, p-value = 2.523e-09

Conclusion: reject H0 and accept H1; the disease shows familial aggregation.
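The goodness-of-fit statistic can also be computed by hand as the sum of (observed − expected)² / expected. A minimal sketch using the same counts and probabilities; `obs`, `prob`, and the other names are illustrative:

```r
# Pearson chi-squared goodness-of-fit statistic by hand
obs  <- c(26, 10, 28, 18)
prob <- c(0.13265, 0.38235, 0.36735, 0.11765)

expected <- sum(obs) * prob                       # expected counts under H0
chi_sq   <- sum((obs - expected)^2 / expected)
df       <- length(obs) - 1
p_val    <- pchisq(chi_sq, df = df, lower.tail = FALSE)

c(chi_sq = chi_sq, df = df, p_value = p_val)
```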

3.2 Poisson distribution

Describes how often rare events occur.

3.2.1 Interval estimation of a population mean

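Example 6-10. As part of environmental monitoring, a factory measured the airborne dust concentration in a workshop that had undergone technical renovation. A count of 21 dust particles was found in 1 liter of air. Assuming the dust is evenly distributed, estimate the 95% and 99% confidence intervals for the mean number of dust particles per liter of air in the workshop.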

Direct method, based on the exact Poisson test.

# 95% confidence interval
poisson.test(x=21)
## 
##  Exact Poisson test
## 
## data:  21 time base: 1
## number of events = 21, time base = 1, p-value < 2.2e-16
## alternative hypothesis: true event rate is not equal to 1
## 95 percent confidence interval:
##  12.99933 32.10073
## sample estimates:
## event rate 
##         21

# 99% confidence interval
poisson.test(x=21, conf.level = 0.99)
## 
##  Exact Poisson test
## 
## data:  21 time base: 1
## number of events = 21, time base = 1, p-value < 2.2e-16
## alternative hypothesis: true event rate is not equal to 1
## 99 percent confidence interval:
##  11.06923 35.94628
## sample estimates:
## event rate 
##         21
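The exact limits printed by poisson.test() can be reproduced with gamma quantiles. A sketch for a single observed count; `exact_pois_ci` is a hypothetical helper name, and the function assumes a count greater than zero:

```r
# Exact Poisson confidence interval for one observed count x > 0.
# exact_pois_ci is an illustrative helper, not part of base R.
exact_pois_ci <- function(x, conf.level = 0.95) {
  alpha <- 1 - conf.level
  c(lower = qgamma(alpha / 2, shape = x),
    upper = qgamma(1 - alpha / 2, shape = x + 1))
}

exact_pois_ci(21, 0.95)
exact_pois_ci(21, 0.99)
```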

Example 6-11. Normal approximation, computed directly from formula (6-18).

# 95% confidence interval
68-1.96*sqrt(68)
## [1] 51.83743
68+1.96*sqrt(68)
## [1] 84.16257

# 99% confidence interval
68-2.58*sqrt(68)
## [1] 46.72477
68+2.58*sqrt(68)
## [1] 89.27523
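The normal-approximation interval X ± z·sqrt(X) can be written once and reused for any confidence level. This is a sketch: `pois_normal_ci` is a hypothetical helper, and it uses qnorm() for the multiplier (about 1.959964 and 2.575829) instead of the rounded 1.96 and 2.58, so the last digits differ slightly from the hand calculation.

```r
# Normal-approximation interval for a Poisson count.
# pois_normal_ci is an illustrative helper, not part of base R.
pois_normal_ci <- function(x, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  c(lower = x - z * sqrt(x), upper = x + z * sqrt(x))
}

pois_normal_ci(68, 0.95)
pois_normal_ci(68, 0.99)
```

For comparison, poisson.test(68)$conf.int gives the exact interval for the same count.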

3.2.2 Comparing a sample mean with a population mean

Example 6-12. Direct method. Does maternal smoking increase the risk of congenital heart disease in children?

poisson.test(x = 4, T=120, r=0.008,alternative = "greater")
## 
##  Exact Poisson test
## 
## data:  4 time base: 120
## number of events = 4, time base = 120, p-value = 0.01663
## alternative hypothesis: true event rate is greater than 0.008
## 95 percent confidence interval:
##  0.01138599        Inf
## sample estimates:
## event rate 
## 0.03333333
# Or, equivalently, the upper-tail Poisson probability P(X >= 4)
1-ppois(q=4-1,lambda = 120*0.008)
## [1] 0.01663305

Conclusion: reject H0 and accept H1; maternal smoking increases the risk of congenital heart disease in children.

Example 6-13. Normal approximation.

prop.test(x = 123, n = 25000, p = 0.003, alternative = "greater", correct = FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  123 out of 25000, null probability 0.003
## X-squared = 30.812, df = 1, p-value = 1.421e-08
## alternative hypothesis: true p is greater than 0.003
## 95 percent confidence interval:
##  0.004243748 1.000000000
## sample estimates:
##       p 
## 0.00492

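Conclusion: reject H0 and accept H1; the incidence of intellectual disability among offspring of consanguineous marriages is higher than in the general population.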
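The prop.test() call treats the count as binomial. A hedged alternative that stays within the Poisson framework is to compare the observed count with the expected count λ0 = nπ0 using u = (X − λ0) / sqrt(λ0). Whether this matches the textbook's formula for this example is an assumption; the sketch below only illustrates the idea:

```r
# Poisson normal approximation: observed count vs. expected count under H0
x       <- 123
n       <- 25000
pi0     <- 0.003
lambda0 <- n * pi0                          # expected count under H0 (75)

u     <- (x - lambda0) / sqrt(lambda0)
p_val <- pnorm(u, lower.tail = FALSE)       # one-sided, H1: rate > pi0
c(u = u, p_value = p_val)

# Exact Poisson test on the same data, for comparison
poisson.test(x = 123, T = 25000, r = 0.003, alternative = "greater")$p.value
```

The statistic differs slightly from the prop.test() result because the binomial variance π0(1 − π0) is replaced by the Poisson variance π0; with such a small π0 the conclusions agree.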

3.2.3 Comparing two sample means (has issues)

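Example 6-14. A 1 ml sample of each of two kinds of purified water was cultured, yielding 4 and 7 E. coli respectively. Is there a difference?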

poisson.test(c(4,7),c(1,1))
## 
##  Comparison of Poisson rates
## 
## data:  c(4, 7) time base: c(1, 1)
## count1 = 4, expected count1 = 5.5, p-value = 0.5488
## alternative hypothesis: true rate ratio is not equal to 1
## 95 percent confidence interval:
##  0.1226664 2.2477580
## sample estimates:
## rate ratio 
##  0.5714286

Conclusion: do not reject H0; there is not yet sufficient evidence of a difference.
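poisson.test() performs an exact conditional test, so its p-value need not match a normal-approximation calculation. A common normal-approximation statistic for two counts observed over equal units is u = (X1 − X2) / sqrt(X1 + X2); that this is the formula the textbook uses here is an assumption. A sketch:

```r
# Normal approximation for two Poisson counts with equal observation units
x1 <- 4
x2 <- 7

u     <- (x1 - x2) / sqrt(x1 + x2)
p_val <- 2 * pnorm(abs(u), lower.tail = FALSE)   # two-sided
c(u = u, p_value = p_val)
```

With counts this small the approximation is rough, which may explain discrepancies between hand-calculated and exact results.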

Example 6-15. Test for regional differences in the incidence of a rare non-communicable disease.

poisson.test(c(32,12),c(4,3))
## 
##  Comparison of Poisson rates
## 
## data:  c(32, 12) time base: c(4, 3)
## count1 = 32, expected count1 = 25.143, p-value = 0.04653
## alternative hypothesis: true rate ratio is not equal to 1
## 95 percent confidence interval:
##  1.002761 4.264145
## sample estimates:
## rate ratio 
##          2

Conclusion: reject H0 and accept H1; the incidence of the disease can be considered to differ by region.
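For two samples, poisson.test() conditions on the total count: under H0 the first count is binomial with size X1 + X2 and probability T1 / (T1 + T2). A minimal sketch that reproduces the p-value with binom.test(); the variable names are illustrative:

```r
# Conditional (binomial) form of the two-sample Poisson comparison
counts    <- c(32, 12)
time_base <- c(4, 3)

p0  <- time_base[1] / sum(time_base)          # share of exposure in sample 1
res <- binom.test(counts[1], sum(counts), p = p0)

res$p.value
sum(counts) * p0                              # expected count1 under H0
```

The expected count should equal the "expected count1" shown in the poisson.test() output.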