s q x_observe x_true
1 Male Chinese 108.14496 105
2 Male Chinese 108.09389 105
3 Male Chinese 103.96771 105
4 Male Chinese 106.03703 105
5 Male Chinese 97.77368 105
6 Male Chinese 104.05958 105
7 Male Chinese 109.32603 105
8 Male Chinese 89.91419 105
9 Male Chinese 112.88127 105
10 Male Chinese 103.03040 105
11 Male English 101.71382 80
12 Male English 92.62139 80
13 Male English 69.63017 80
14 Male English 88.48965 80
15 Male English 84.41235 80
16 Male English 85.66448 80
17 Male English 74.50572 80
18 Male English 87.30249 80
19 Male English 84.51486 80
20 Male English 74.98211 80
21 Female Chinese 104.62559 101
22 Female Chinese 106.62011 101
23 Female Chinese 92.47780 101
24 Female Chinese 110.88588 101
25 Female Chinese 103.71868 101
26 Female Chinese 97.13806 101
27 Female Chinese 112.15624 101
28 Female Chinese 92.64551 101
29 Female Chinese 92.22782 101
30 Female Chinese 98.11580 101
31 Female English 92.61760 90
32 Female English 79.16152 90
33 Female English 79.15314 90
34 Female English 91.61179 90
35 Female English 92.03364 90
36 Female English 65.12303 90
37 Female English 93.96194 90
38 Female English 89.80092 90
39 Female English 95.70645 90
40 Female English 92.73166 90
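The last column (x_true, the population value) cannot be observed in practice.
The goal is to study how IQ scores differ by sex and by language,
that is, whether sex and language can partly predict IQ.
Regression formulation: score = constant intercept + sex increment + language increment + sex-by-language interaction increment + prediction residual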
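To make the regression formulation concrete, here is a minimal sketch of the dummy-coded design matrix for the four cells. The level order (Female and English as baselines) is an assumption chosen to match the coefficient names sMale and qChinese in the output; the object names `cells` and `X` are only illustrative.

```r
# One row per cell of the 2 x 2 design. Level order is set explicitly so that
# Female and English act as the baselines (assumed from the coefficient names).
cells <- data.frame(
  s = factor(c("Female", "Male", "Female", "Male"),
             levels = c("Female", "Male")),
  q = factor(c("English", "English", "Chinese", "Chinese"),
             levels = c("English", "Chinese"))
)

# Columns: intercept, sex increment, language increment, interaction increment.
X <- model.matrix(~ 1 + s + q + s:q, data = cells)
rownames(X) <- paste(cells$s, cells$q)
print(X)
```

Each row shows which increments are switched on for that cell: only the Male Chinese cell receives all four terms.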
First, look at the true parameter values: the population means of the four groups.
Call:
lm(formula = x_true ~ 0 + s:q)
Coefficients:
sFemale:q English    sMale:q English   sFemale:qChinese     sMale:qChinese
               90                 80                101                105
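The cell-means fit above can be reproduced from the x_true column alone. This is a sketch that rebuilds the factors in the row order of the data listing (Male Chinese, Male English, Female Chinese, Female English, ten rows each); the explicit level order is an assumption, so the English label prints without the leading space seen in the original output.

```r
# Rebuild the design in the same row order as the data listing.
s <- factor(rep(c("Male", "Female"), each = 20), levels = c("Female", "Male"))
q <- factor(rep(rep(c("Chinese", "English"), each = 10), times = 2),
            levels = c("English", "Chinese"))
x_true <- rep(c(105, 80, 101, 90), each = 10)

# No intercept and only the s:q term: one coefficient per cell.
fit_true <- lm(x_true ~ 0 + s:q)
cell_coef <- coef(fit_true)
print(round(cell_coef, 6))
```

Because x_true has no noise, each coefficient is simply the population mean of its cell.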
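Next, the same true values under a different parameterization: the population differences among the four groups are decomposed into three difference terms, with Female English as the baseline.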
Call:
lm(formula = x_true ~ 1 + s + q + s:q)
Coefficients:
   (Intercept)           sMale        qChinese  sMale:qChinese
            90             -10              11              14
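The three difference terms follow from the four cell means by plain subtraction. A small sketch, where the names `mu` and `beta` and the cell labels FE, ME, FC, MC are illustrative shorthand for Female/Male crossed with English/Chinese:

```r
# True cell means taken from the cell-means fit.
mu <- c(FE = 90, ME = 80, FC = 101, MC = 105)

beta <- c(
  "(Intercept)"    = mu[["FE"]],                 # baseline cell
  "sMale"          = mu[["ME"]] - mu[["FE"]],    # sex gap within English
  "qChinese"       = mu[["FC"]] - mu[["FE"]],    # language gap within Female
  # difference of differences: sex gap in Chinese minus sex gap in English
  "sMale:qChinese" = (mu[["MC"]] - mu[["FC"]]) - (mu[["ME"]] - mu[["FE"]])
)
print(beta)
```

Applying the same subtractions to the observed cell means (87.19, 84.38, 101.06, 104.32) gives the observed coefficients up to rounding.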
Predictions based on the observed values are perturbed by random error.
Observed within-group means of the four groups:
Call:
lm(formula = x_observe ~ 0 + s:q)
Coefficients:
sFemale:q English    sMale:q English   sFemale:qChinese     sMale:qChinese
            87.19              84.38             101.06             104.32
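The coefficients of the cell-means model are just the sample means within each group. The sketch below retypes the x_observe column from the data listing and averages it by cell with `tapply`; the explicit factor level order is an assumption made so that Female and English come first.

```r
# x_observe in listing order: rows 1-10 Male Chinese, 11-20 Male English,
# 21-30 Female Chinese, 31-40 Female English.
x_observe <- c(
  108.14496, 108.09389, 103.96771, 106.03703,  97.77368,
  104.05958, 109.32603,  89.91419, 112.88127, 103.03040,
  101.71382,  92.62139,  69.63017,  88.48965,  84.41235,
   85.66448,  74.50572,  87.30249,  84.51486,  74.98211,
  104.62559, 106.62011,  92.47780, 110.88588, 103.71868,
   97.13806, 112.15624,  92.64551,  92.22782,  98.11580,
   92.61760,  79.16152,  79.15314,  91.61179,  92.03364,
   65.12303,  93.96194,  89.80092,  95.70645,  92.73166
)
s <- factor(rep(c("Male", "Female"), each = 20), levels = c("Female", "Male"))
q <- factor(rep(rep(c("Chinese", "English"), each = 10), times = 2),
            levels = c("English", "Chinese"))

# 2 x 2 table of within-group sample means (rows: sex, columns: language).
cell_means <- tapply(x_observe, list(s = s, q = q), mean)
print(round(cell_means, 2))
```

The four entries should agree with the coefficients printed above to two decimals.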
Observed estimates of the three difference terms across the four groups:
Call:
lm(formula = x_observe ~ 1 + s + q + s:q)
Coefficients:
   (Intercept)           sMale        qChinese  sMale:qChinese
        87.190          -2.806          13.871           6.068
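Taking this perturbation into account, the lower and upper bounds of the 95% confidence interval for each coefficient are listed below.
If an interval excludes 0, the coefficient differs from 0 at the 5% significance level, i.e. it has some predictive effect, whatever its size.
However, the presence of the interaction term makes interpretation harder. For example, the male-minus-female effect averaged over the two languages is sMale + (1/2) sMale:qChinese; sMale by itself is only the sex difference within the English group.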
                    2.5 %    97.5 %
(Intercept)     81.800386 92.579951
sMale          -10.428768  4.815838
qChinese         6.248677 21.493284
sMale:qChinese  -4.711374 16.847756
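These intervals can be approximately rebuilt from summary numbers alone, which shows where they come from. The sketch assumes a balanced design with 10 observations per cell and uses the residual Mean Sq (70.63) and residual Df (36) reported in the ANOVA table; the names `est`, `k`, `se` and `ci` are illustrative. It also computes the sex effect within each language and averaged over both.

```r
# Rounded estimates from the fitted model.
est <- c("(Intercept)" = 87.190, "sMale" = -2.806,
         "qChinese" = 13.871, "sMale:qChinese" = 6.068)
ms_resid <- 70.63   # residual Mean Sq from the ANOVA table
df_resid <- 36      # residual degrees of freedom
n_cell   <- 10      # observations per cell (balanced design)

# A coefficient that is a +1/-1 combination of k cell means has
# variance k * ms_resid / n_cell.
k  <- c(1, 2, 2, 4)
se <- sqrt(k * ms_resid / n_cell)

t_crit <- qt(0.975, df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 3))

# Male-minus-female effect within each language, and averaged over both.
sex_english <- est[["sMale"]]
sex_chinese <- est[["sMale"]] + est[["sMale:qChinese"]]
sex_average <- est[["sMale"]] + 0.5 * est[["sMale:qChinese"]]
print(c(english = sex_english, chinese = sex_chinese, average = sex_average))
```

Because the inputs are rounded, the bounds match the table only to about two or three decimals. The averaged sex effect is close to zero, which is consistent with the very small Sum Sq for s in the ANOVA table.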
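ANOVA formulation: variation in scores = variation due to sex + variation due to language + variation due to the sex-by-language interaction + residual variation
"Variation" is operationalized as the Sum of Squares (SS): the sum of squared deviations of a column of numbers from its mean.
Df is the theoretical share of sampling-error variation that each term is expected to absorb.
Sum Sq is the observed share of variation that each term accounts for.
If Sum Sq is far out of proportion to the theoretical share given by Df (that is, F, the term's Mean Sq = Sum Sq / Df divided by the residual Mean Sq, is much larger than 1), this supports the view that the term contains non-random variation introduced by different populations.
The last column, Pr(>F), is the usual hypothesis-test p-value reflecting how extreme F is.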
Analysis of Variance Table
Response: x_observe
          Df  Sum Sq Mean Sq F value    Pr(>F)
s          1    0.52    0.52  0.0073    0.9322
q          1 2857.82 2857.82 40.4639 2.294e-07 ***
s:q        1   92.06   92.06  1.3034    0.2611
Residuals 36 2542.55   70.63
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
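The Mean Sq, F value and Pr(>F) columns can be recomputed from the Df and Sum Sq columns, which makes the comparison of observed and theoretical shares explicit. A minimal sketch using the rounded numbers from the table; the variable names are illustrative.

```r
# Df and Sum Sq copied from the table.
ss       <- c(s = 0.52, q = 2857.82, "s:q" = 92.06)
df       <- c(s = 1, q = 1, "s:q" = 1)
ss_resid <- 2542.55
df_resid <- 36

# Mean Sq: variation per degree of freedom.
ms       <- ss / df
ms_resid <- ss_resid / df_resid

# F: each term's Mean Sq relative to the residual Mean Sq.
f_value <- ms / ms_resid

# Pr(>F): upper tail of the F distribution with (df, df_resid) degrees of freedom.
p_value <- pf(f_value, df, df_resid, lower.tail = FALSE)

print(cbind(Df = df, "Sum Sq" = ss, "Mean Sq" = ms,
            "F value" = f_value, "Pr(>F)" = p_value))
```

Only q has an F far above 1, so only language shows variation clearly beyond what sampling error alone would be expected to produce.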