s q x_observe x_true
1 Male Chinese 108.40180 105
2 Male Chinese 118.55324 105
3 Male Chinese 106.22300 105
4 Male Chinese 93.34974 105
5 Male Chinese 99.61207 105
6 Male Chinese 106.41466 105
7 Male Chinese 86.60831 105
8 Male Chinese 97.82226 105
9 Male Chinese 109.24610 105
10 Male Chinese 102.48240 105
11 Male English 72.83765 80
12 Male English 68.16789 80
13 Male English 73.91391 80
14 Male English 83.33698 80
15 Male English 71.99952 80
16 Male English 87.73515 80
17 Male English 81.87990 80
18 Male English 88.16211 80
19 Male English 53.36006 80
20 Male English 87.37134 80
21 Female Chinese 102.30977 101
22 Female Chinese 92.93527 101
23 Female Chinese 103.93785 101
24 Female Chinese 94.64800 101
25 Female Chinese 96.57304 101
26 Female Chinese 104.57913 101
27 Female Chinese 95.55705 101
28 Female Chinese 88.27712 101
29 Female Chinese 99.58412 101
30 Female Chinese 112.70983 101
31 Female English 84.90379 90
32 Female English 86.13183 90
33 Female English 71.40795 90
34 Female English 82.29537 90
35 Female English 76.74490 90
36 Female English 91.51879 90
37 Female English 72.88005 90
38 Female English 87.69148 90
39 Female English 71.29789 90
40 Female English 102.46125 90
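The last column, x_true (the population value), is not observable in practice.
Goal: study how IQ scores differ by sex (s) and language (q),
that is, whether sex and language can partly predict IQ.
Regression formulation: score = intercept + sex increment + language increment + sex-by-language interaction increment + residual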
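Written out as an equation, the regression formulation might look like the sketch below. The symbols are assumed notation, not taken from the source: y_i is the observed score, Male_i and Chinese_i are 0/1 indicators, and Female and English are the reference levels, as the coefficient names sMale and qChinese in the lm output suggest.

```latex
\[
  y_i = \beta_0
      + \beta_1\,\mathrm{Male}_i
      + \beta_2\,\mathrm{Chinese}_i
      + \beta_3\,\mathrm{Male}_i\,\mathrm{Chinese}_i
      + \varepsilon_i
\]
```

Under this coding, \beta_0 is the Female English mean and \beta_3 is the extra amount needed to reach the Male Chinese mean once both single differences have been added.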
First, the true parameter values: the population means of the four groups
Call:
lm(formula = x_true ~ 0 + s:q)
Coefficients:
sFemale:q English    sMale:q English   sFemale:qChinese     sMale:qChinese
               90                 80                101                105
The same true parameter values, with the differences among the four group populations decomposed into three difference terms
Call:
lm(formula = x_true ~ 1 + s + q + s:q)
Coefficients:
   (Intercept)           sMale        qChinese  sMale:qChinese
            90             -10              11              14
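The decomposition can be checked by hand. The Python sketch below assumes the treatment coding implied by the coefficient names (Female and English as reference levels); the function name decompose is hypothetical and not part of the original analysis.

```python
def decompose(female_english, male_english, female_chinese, male_chinese):
    """Turn four cell means into treatment-coded terms (reference: Female, English)."""
    intercept = female_english                    # mean of the reference cell
    s_male = male_english - female_english        # sex difference among English
    q_chinese = female_chinese - female_english   # language difference among Female
    # Interaction: what is still missing from the Male Chinese cell
    # after adding both single differences to the intercept.
    interaction = male_chinese - (intercept + s_male + q_chinese)
    return intercept, s_male, q_chinese, interaction


true_terms = decompose(90, 80, 101, 105)
print(true_terms)  # (90, -10, 11, 14)
```

The four cell means and the four terms carry the same information; the second lm call only re-expresses the first.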
Predictions based on the observed values are perturbed by random error.
Observed within-group means of the four groups
Call:
lm(formula = x_observe ~ 0 + s:q)
Coefficients:
sFemale:q English    sMale:q English   sFemale:qChinese     sMale:qChinese
            82.73              76.88              99.11             102.87
Observed values of the three difference terms for the four groups
Call:
lm(formula = x_observe ~ 1 + s + q + s:q)
Coefficients:
   (Intercept)           sMale        qChinese  sMale:qChinese
        82.733          -5.857          16.378           9.617
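Because the design is balanced and the model is saturated, the observed coefficients should be recoverable from the four cell means of x_observe alone. The following Python sketch recomputes them from the data table; it is a re-derivation under that assumption, not the R code that produced the output, and the variable names are illustrative.

```python
from statistics import mean

# x_observe values copied from the data table, keyed by (sex, language)
x_observe = {
    ("Male", "Chinese"): [108.40180, 118.55324, 106.22300, 93.34974, 99.61207,
                          106.41466, 86.60831, 97.82226, 109.24610, 102.48240],
    ("Male", "English"): [72.83765, 68.16789, 73.91391, 83.33698, 71.99952,
                          87.73515, 81.87990, 88.16211, 53.36006, 87.37134],
    ("Female", "Chinese"): [102.30977, 92.93527, 103.93785, 94.64800, 96.57304,
                            104.57913, 95.55705, 88.27712, 99.58412, 112.70983],
    ("Female", "English"): [84.90379, 86.13183, 71.40795, 82.29537, 76.74490,
                            91.51879, 72.88005, 87.69148, 71.29789, 102.46125],
}

# Observed mean of each of the four groups
cell_mean = {group: mean(values) for group, values in x_observe.items()}

# Treatment coding with Female and English as reference levels
intercept = cell_mean[("Female", "English")]
s_male = cell_mean[("Male", "English")] - intercept
q_chinese = cell_mean[("Female", "Chinese")] - intercept
interaction = cell_mean[("Male", "Chinese")] - (intercept + s_male + q_chinese)

for name, value in [("(Intercept)", intercept), ("sMale", s_male),
                    ("qChinese", q_chinese), ("sMale:qChinese", interaction)]:
    print(f"{name:15s} {value:8.3f}")
```

Compared with the true values (90, -10, 11, 14), every term is shifted by sampling noise, which is the perturbation the next step quantifies.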
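Taking this perturbation into account, the lower and upper bounds of the 95% confidence interval for each coefficient
If an interval excludes 0, the coefficient differs from 0 at the 5% significance level, i.e. the term has some predictive effect, however small.
However, the interaction term makes interpretation harder. For example, the male minus female effect averaged over the two languages is sMale + (1/2) sMale:qChinese = -5.857 + 9.617/2, about -1.05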
2.5 % 97.5 %
(Intercept) 76.710620 88.756036
sMale -14.374274 2.660517
qChinese 7.860394 24.895185
sMale:qChinese -2.428299 21.662534
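The interval bounds can be approximated from the cell means and the pooled within-group variance. The Python sketch below makes two assumptions that are not in the source: the t critical value for 36 degrees of freedom is hard-coded (the standard library has no t distribution), and each coefficient is treated as a sum or difference of k independent cell means, which holds for this balanced 2 x 2 design.

```python
from math import sqrt
from statistics import mean

# x_observe values copied from the data table, keyed by (sex, language)
x_observe = {
    ("Male", "Chinese"): [108.40180, 118.55324, 106.22300, 93.34974, 99.61207,
                          106.41466, 86.60831, 97.82226, 109.24610, 102.48240],
    ("Male", "English"): [72.83765, 68.16789, 73.91391, 83.33698, 71.99952,
                          87.73515, 81.87990, 88.16211, 53.36006, 87.37134],
    ("Female", "Chinese"): [102.30977, 92.93527, 103.93785, 94.64800, 96.57304,
                            104.57913, 95.55705, 88.27712, 99.58412, 112.70983],
    ("Female", "English"): [84.90379, 86.13183, 71.40795, 82.29537, 76.74490,
                            91.51879, 72.88005, 87.69148, 71.29789, 102.46125],
}
n = 10                 # observations per group
df_resid = 4 * n - 4   # 40 observations minus 4 estimated cell means
T_CRIT = 2.028094      # assumed: 97.5% quantile of t with 36 df, hard-coded

cell_mean = {g: mean(xs) for g, xs in x_observe.items()}

# Pooled residual variance: squared deviations from each group's own mean
ss_resid = sum((x - cell_mean[g]) ** 2 for g, xs in x_observe.items() for x in xs)
ms_resid = ss_resid / df_resid

intercept = cell_mean[("Female", "English")]
s_male = cell_mean[("Male", "English")] - intercept
q_chinese = cell_mean[("Female", "Chinese")] - intercept
interaction = cell_mean[("Male", "Chinese")] - (intercept + s_male + q_chinese)

# (estimate, k) where k is the number of cell means combined in the estimate
estimates = {
    "(Intercept)": (intercept, 1),
    "sMale": (s_male, 2),
    "qChinese": (q_chinese, 2),
    "sMale:qChinese": (interaction, 4),
}

intervals = {}
for name, (est, k) in estimates.items():
    se = sqrt(ms_resid * k / n)  # each cell mean has variance ms_resid / n
    intervals[name] = (est - T_CRIT * se, est + T_CRIT * se)
    print(f"{name:15s} {intervals[name][0]:10.4f} {intervals[name][1]:10.4f}")

# Male minus female effect averaged over the two languages
avg_sex_effect = s_male + 0.5 * interaction
print(f"average sex effect: {avg_sex_effect:.3f}")
```

The interaction estimate combines all four cell means, so its interval is the widest; that is part of why it can be hard to detect even when the true interaction (14) is larger than the true sex difference.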
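ANOVA formulation: variation in score = variation from the sex increment + variation from the language increment + variation from their interaction + residual variation
"Variation" is operationalized as the Sum of Squares (SS): the sum of squared deviations of a column of numbers from its mean
Df is the theoretical share of sampling-error variation that each term would absorb
Sum Sq is the observed share of variation that each term explains.
If it is far out of proportion to the Df share (F much larger than 1), this supports the term containing non-random variation brought in by different populations
The last column, Pr(>F), is the usual hypothesis-test p-value reflecting how extreme F is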
Analysis of Variance Table
Response: x_observe
Df Sum Sq Mean Sq F value Pr(>F)
s 1 11.0 11.0 0.1246 0.7261
q 1 4488.6 4488.6 50.8985 2.190e-08 ***
s:q 1 231.2 231.2 2.6219 0.1141
Residuals 36 3174.8 88.2
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
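The Sum Sq and F value columns can be re-derived from the data. The Python sketch below relies on the design being balanced (10 observations in each of the four groups), in which case each effect's SS is N times the squared deviation that the effect adds to every observation. It does not compute Pr(>F), since the standard library has no F distribution; variable names are illustrative.

```python
from statistics import mean

# x_observe values copied from the data table, keyed by (sex, language)
x_observe = {
    ("Male", "Chinese"): [108.40180, 118.55324, 106.22300, 93.34974, 99.61207,
                          106.41466, 86.60831, 97.82226, 109.24610, 102.48240],
    ("Male", "English"): [72.83765, 68.16789, 73.91391, 83.33698, 71.99952,
                          87.73515, 81.87990, 88.16211, 53.36006, 87.37134],
    ("Female", "Chinese"): [102.30977, 92.93527, 103.93785, 94.64800, 96.57304,
                            104.57913, 95.55705, 88.27712, 99.58412, 112.70983],
    ("Female", "English"): [84.90379, 86.13183, 71.40795, 82.29537, 76.74490,
                            91.51879, 72.88005, 87.69148, 71.29789, 102.46125],
}
N = sum(len(xs) for xs in x_observe.values())  # 40 observations in total

m = {g: mean(xs) for g, xs in x_observe.items()}
mc, me = m[("Male", "Chinese")], m[("Male", "English")]
fc, fe = m[("Female", "Chinese")], m[("Female", "English")]

# Differences between marginal means, and the interaction contrast
sex_diff = (mc + me) / 2 - (fc + fe) / 2
lang_diff = (mc + fc) / 2 - (me + fe) / 2
inter_contrast = mc - me - fc + fe

# Each observation deviates by +/- diff/2 (main effect) or +/- contrast/4 (interaction)
ss_s = N * (sex_diff / 2) ** 2
ss_q = N * (lang_diff / 2) ** 2
ss_sq = N * (inter_contrast / 4) ** 2

# Residual SS: squared deviations from each group's own mean, on N - 4 df
ss_resid = sum((x - m[g]) ** 2 for g, xs in x_observe.items() for x in xs)
ms_resid = ss_resid / (N - 4)

# Total SS around the grand mean; equals the sum of the four parts when balanced
all_x = [x for xs in x_observe.values() for x in xs]
grand_mean = mean(all_x)
ss_total = sum((x - grand_mean) ** 2 for x in all_x)

# Each effect has 1 df, so its Mean Sq equals its Sum Sq
f_value = {"s": ss_s / ms_resid, "q": ss_q / ms_resid, "s:q": ss_sq / ms_resid}

for name, ss in [("s", ss_s), ("q", ss_q), ("s:q", ss_sq)]:
    print(f"{name:10s} {ss:8.1f} {f_value[name]:9.4f}")
print(f"{'Residuals':10s} {ss_resid:8.1f}")
```

Language accounts for most of the total variation, while sex on its own accounts for very little, because the true sex differences have opposite signs in the two language groups and largely cancel in the marginal means.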