s        q x_observe x_true
1    Male Chinese  108.14496    105
2    Male Chinese  108.09389    105
3    Male Chinese  103.96771    105
4    Male Chinese  106.03703    105
5    Male Chinese   97.77368    105
6    Male Chinese  104.05958    105
7    Male Chinese  109.32603    105
8    Male Chinese   89.91419    105
9    Male Chinese  112.88127    105
10   Male Chinese  103.03040    105
11   Male  English 101.71382     80
12   Male  English  92.62139     80
13   Male  English  69.63017     80
14   Male  English  88.48965     80
15   Male  English  84.41235     80
16   Male  English  85.66448     80
17   Male  English  74.50572     80
18   Male  English  87.30249     80
19   Male  English  84.51486     80
20   Male  English  74.98211     80
21 Female Chinese  104.62559    101
22 Female Chinese  106.62011    101
23 Female Chinese   92.47780    101
24 Female Chinese  110.88588    101
25 Female Chinese  103.71868    101
26 Female Chinese   97.13806    101
27 Female Chinese  112.15624    101
28 Female Chinese   92.64551    101
29 Female Chinese   92.22782    101
30 Female Chinese   98.11580    101
31 Female  English  92.61760     90
32 Female  English  79.16152     90
33 Female  English  79.15314     90
34 Female  English  91.61179     90
35 Female  English  92.03364     90
36 Female  English  65.12303     90
37 Female  English  93.96194     90
38 Female  English  89.80092     90
39 Female  English  95.70645     90
40 Female  English  92.73166     90

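The last column, x_true (the population value), is not observable in practice.
We want to study how IQ scores differ by sex and by language,
that is, whether sex and language can partly predict IQ.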

Regression formulation: score = intercept + sex increment + language increment + sex-by-language interaction increment + residual
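
The formulation above can be made concrete by looking at the dummy columns R builds for it. This is a minimal sketch, not the original script: the names `cells` and `X` are made up for illustration, and the language level is written as "English" (the data print suggests the original level has a leading space), with the reference levels set explicitly to Female and English so the column names match the lm() output in these notes.

```r
# One row per sex-by-language cell. Reference levels are set explicitly
# (assumption: Female and English are the baselines, as in the lm() output).
cells <- data.frame(
  s = factor(c("Female", "Male", "Female", "Male"),
             levels = c("Female", "Male")),
  q = factor(c("English", "English", "Chinese", "Chinese"),
             levels = c("English", "Chinese"))
)

# Design matrix for: intercept + sex + language + interaction
X <- model.matrix(~ 1 + s + q + s:q, data = cells)
rownames(X) <- paste(cells$s, cells$q)
print(X)
# The interaction column is 1 only for the Male Chinese cell, so that
# cell's predicted score is the sum of all four coefficients.
```

Each row shows which increments are switched on for a cell: Female English gets only the intercept, while Male Chinese gets all four terms.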

First, look at the true parameter values: the population means of the four groups

Call:
lm(formula = x_true ~ 0 + s:q)

Coefficients:
sFemale:q English    sMale:q English  sFemale:qChinese     sMale:qChinese   
               90                 80                101                105  


Again with the true parameter values: the differences among the four population means are decomposed into three difference terms

Call:
lm(formula = x_true ~ 1 + s + q + s:q)

Coefficients:
    (Intercept)            sMale        qChinese   sMale:qChinese   
             90              -10               11               14  
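
The decomposition can be checked by hand from the four true group means. The sketch below only uses the x_true values from the data; the variable names (`mu`, `b0`, `bS`, `bQ`, `bSQ`, `coefs`) are illustrative and not part of the original analysis.

```r
# True cell means, copied from the x_true column.
mu <- c(FemaleEnglish = 90, MaleEnglish = 80,
        FemaleChinese = 101, MaleChinese = 105)

b0  <- mu[["FemaleEnglish"]]                          # baseline cell
bS  <- mu[["MaleEnglish"]]   - mu[["FemaleEnglish"]]  # sex gap among English speakers
bQ  <- mu[["FemaleChinese"]] - mu[["FemaleEnglish"]]  # language gap among females
# Interaction: how much the sex gap among Chinese speakers differs
# from the sex gap among English speakers.
bSQ <- (mu[["MaleChinese"]] - mu[["FemaleChinese"]]) - bS

coefs <- c("(Intercept)" = b0, sMale = bS, qChinese = bQ,
           "sMale:qChinese" = bSQ)
print(coefs)

# Going the other way, the Male Chinese mean is the sum of all four terms.
stopifnot(sum(coefs) == mu[["MaleChinese"]])
```

Note that sMale and qChinese are gaps within the baseline level of the other factor, not averages over it.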


Estimates based on the observed values are perturbed by random error.

Observed within-group means of the four groups

Call:
lm(formula = x_observe ~ 0 + s:q)

Coefficients:
sFemale:q English    sMale:q English  sFemale:qChinese     sMale:qChinese   
            87.19              84.38             101.06             104.32  


Observed values of the three difference terms across the four groups

Call:
lm(formula = x_observe ~ 1 + s + q + s:q)

Coefficients:
    (Intercept)            sMale        qChinese   sMale:qChinese   
         87.190           -2.806           13.871            6.068  


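Taking this perturbation into account, the lower and upper bounds of the 95% confidence interval for each coefficient are:
If an interval excludes 0, the coefficient differs from 0 at the 5% significance level, i.e. it has some predictive effect, whatever its size.
However, the interaction term makes interpretation harder. With this coding, sMale alone is the male-minus-female gap among English speakers only; the male-minus-female effect averaged over both languages is sMale + (1/2) sMale:qChinese.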
                     2.5 %    97.5 %
(Intercept)      81.800386 92.579951
sMale           -10.428768  4.815838
qChinese          6.248677 21.493284
sMale:qChinese   -4.711374 16.847756
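
The averaged male-minus-female effect, sMale + (1/2) sMale:qChinese, can be given its own estimate and interval as a linear combination of coefficients. This is a sketch under assumptions: the x_observe values are retyped from the data print above, the language level is written as "English" without the leading space, and the names `w`, `est`, `se`, `ci` are illustrative.

```r
# Observed scores in row order: Male Chinese, Male English,
# Female Chinese, Female English (10 each).
x_observe <- c(
  108.14496, 108.09389, 103.96771, 106.03703,  97.77368,
  104.05958, 109.32603,  89.91419, 112.88127, 103.03040,
  101.71382,  92.62139,  69.63017,  88.48965,  84.41235,
   85.66448,  74.50572,  87.30249,  84.51486,  74.98211,
  104.62559, 106.62011,  92.47780, 110.88588, 103.71868,
   97.13806, 112.15624,  92.64551,  92.22782,  98.11580,
   92.61760,  79.16152,  79.15314,  91.61179,  92.03364,
   65.12303,  93.96194,  89.80092,  95.70645,  92.73166
)
s <- factor(rep(c("Male", "Female"), each = 20), levels = c("Female", "Male"))
q <- factor(rep(rep(c("Chinese", "English"), each = 10), times = 2),
            levels = c("English", "Chinese"))

fit <- lm(x_observe ~ 1 + s + q + s:q)
print(confint(fit))  # per-coefficient 95% intervals

# Weights that pick out sMale + 0.5 * sMale:qChinese
w   <- c("(Intercept)" = 0, sMale = 1, qChinese = 0, "sMale:qChinese" = 0.5)
est <- sum(w * coef(fit))
se  <- sqrt(drop(t(w) %*% vcov(fit) %*% w))   # SE of the linear combination
ci  <- est + c(-1, 1) * qt(0.975, df.residual(fit)) * se
print(c(estimate = est, lower = ci[1], upper = ci[2]))
```

Because the design is balanced, this estimate equals the plain difference between the male and female marginal means.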

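ANOVA formulation: variation in score = variation from sex + variation from language + variation from the sex-by-language interaction + residual variation
"Variation" is operationalized as the Sum of Squares (SS): the sum of squared deviations of a column of numbers from its mean
Df gives the theoretical share of sampling-error variation that each term is expected to absorb
Sum Sq gives the observed share of variation that each term accounts for.
If it is far out of proportion to the Df share (F much larger than 1), that supports the term containing non-random variation brought in by differing populations
The last column, Pr(>F), is the usual hypothesis-test p-value, reflecting how extreme the F value is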

Analysis of Variance Table

Response: x_observe
          Df  Sum Sq Mean Sq F value    Pr(>F)    
s          1    0.52    0.52  0.0073    0.9322    
q          1 2857.82 2857.82 40.4639 2.294e-07 ***
s:q        1   92.06   92.06  1.3034    0.2611    
Residuals 36 2542.55   70.63                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
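
The Sum Sq, F and Pr(>F) columns can be rebuilt by hand, which makes the "variation" bookkeeping explicit. This sketch relies on the design being balanced (10 observations per cell); the x_observe values are retyped from the data print, the language level is written as "English" without the leading space, and names such as `cell`, `ss_s` and `F_vals` are illustrative.

```r
x_observe <- c(
  108.14496, 108.09389, 103.96771, 106.03703,  97.77368,
  104.05958, 109.32603,  89.91419, 112.88127, 103.03040,
  101.71382,  92.62139,  69.63017,  88.48965,  84.41235,
   85.66448,  74.50572,  87.30249,  84.51486,  74.98211,
  104.62559, 106.62011,  92.47780, 110.88588, 103.71868,
   97.13806, 112.15624,  92.64551,  92.22782,  98.11580,
   92.61760,  79.16152,  79.15314,  91.61179,  92.03364,
   65.12303,  93.96194,  89.80092,  95.70645,  92.73166
)
s <- factor(rep(c("Male", "Female"), each = 20), levels = c("Female", "Male"))
q <- factor(rep(rep(c("Chinese", "English"), each = 10), times = 2),
            levels = c("English", "Chinese"))

cell  <- tapply(x_observe, list(s, q), mean)  # 2 x 2 table of cell means
grand <- mean(x_observe)

# Balanced design: each marginal mean averages 20 scores, each cell mean 10.
ss_s     <- 20 * sum((rowMeans(cell) - grand)^2)   # sex
ss_q     <- 20 * sum((colMeans(cell) - grand)^2)   # language
ss_cells <- 10 * sum((cell - grand)^2)             # all between-cell variation
ss_sq    <- ss_cells - ss_s - ss_q                 # what is left: interaction
ss_res   <- sum((x_observe - ave(x_observe, s, q))^2)  # within-cell variation

ms_res <- ss_res / 36                # residual Df = 40 observations - 4 cells
F_vals <- c(s = ss_s, q = ss_q, "s:q" = ss_sq) / ms_res  # each term has Df = 1
p_vals <- pf(F_vals, df1 = 1, df2 = 36, lower.tail = FALSE)

print(rbind(F = F_vals, p = p_vals))
print(anova(lm(x_observe ~ s + q + s:q)))  # for comparison with the table above
```

With unequal cell sizes these shortcuts no longer hold, and the sequential sums of squares reported by anova() depend on the order of the terms.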