User:Szpku.lixiaoxu
From WikiTraba
Xiaoxu LI szpku.lixiaoxu@gmail.com Shenzhen Graduate School of Peking University, Guangdong, China
Regression of Inputted Data
Regression analysis teaching materials
Exercises
Observe how [tex]R^2[/tex] changes when [tex]\vec{X}_2[/tex] is added, i.e., from the regression equation
- [tex]\vec{Y}=\beta_1\vec{X}_1+\vec{\epsilon}[/tex]
to
- [tex]\vec{Y}=\beta_1\vec{X}_1+\beta_2\vec{X}_2+\vec{\epsilon}[/tex]
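Two IVs with near-zero correlations with the DV can nevertheless predict the DV very well
The three angles (between Y and X_1, between Y and X_2, and between X_1 and X_2) are 89°, 89°, and 177.9°.
Two IVs highly positively correlated with the DV, yet a negative regression coefficient appears
The three angles are 5°, 2.6°, and 2.6°.
The predictive power ([tex]R^2[/tex]) of two uncorrelated IVs for the DV is additive
The third angle (between X_1 and X_2) is 90°.
Cases where [tex]R^2_1+R^2_2-R^2_{12}[/tex] starts at 0, grows, then shrinks and even becomes negative
Zero: the three angles are 60°, 45°, and 90°.
Positive: the three angles are 60°, 45°, and 45°.
Negative: the three angles are 60°, 45°, and 15.1°.
Relation to redundancy: Cohen & Cohen (2003, p. 76)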
R code
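## Angles between the standardized variable vectors, in degrees
cy1 <- 89     ## angle between Y and X_1
cy2 <- 89     ## angle between Y and X_2
c12 <- 177.9  ## angle between X_1 and X_2
N <- 100      ## number of simulated observations
rawdata <- TRUE  ## print the simulated data at the end?
## Correlation matrix: cos(angle) is the correlation of two standardized vectors
S <- diag(3)
S[1,2] <- S[2,1] <- cos(cy1/180*pi)
S[1,3] <- S[3,1] <- cos(cy2/180*pi)
S[2,3] <- S[3,2] <- cos(c12/180*pi)
colnames(S) <- rownames(S) <- c('Y','X_1','X_2')
library(MASS)  ## install.packages('MASS') if it is not installed
## empirical = TRUE makes the sample means exactly 0 and the sample correlations exactly S,
## so the no-intercept models below give the usual (centered) R^2
x <- mvrnorm(n = N, mu = c(0,0,0), Sigma = S, empirical = TRUE)
colnames(x) <- colnames(S)
Y <- x[,1]; X_1 <- x[,2]; X_2 <- x[,3]
lm1  <- lm(Y ~ 0 + X_1)
lm2  <- lm(Y ~ 0 + X_2)
lm12 <- lm(Y ~ 0 + X_1 + X_2)
R2 <- matrix(c(summary(lm1)$r.squared, summary(lm2)$r.squared, summary(lm12)$r.squared), nrow = 3,
             dimnames = list(c('Y = b_1*X_1 + e', 'Y = b_2*X_2 + e', 'Y = b_1*X_1 + b_2*X_2 + e'), 'R^2'))
R2
R2[1,1] + R2[2,1] - R2[3,1]  ## R^2_1 + R^2_2 - R^2_12
summary(lm1)
summary(lm2)
summary(lm12)
cat('\ncorr\n')
S
cat('\nraw data\n')
if (rawdata) print(x)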
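The same quantities can also be obtained in closed form, without simulation, because for standardized variables cos(angle) is the correlation and the two-IV [tex]R^2[/tex] equals r' R_xx^{-1} r. The sketch below assumes standardized variables; `r2_from_angles` is a hypothetical helper name, and the angle settings are the ones listed in the exercises.

```r
# Closed-form R^2 from the three angles (degrees), assuming standardized variables,
# so that cos(angle) is the correlation between two variable vectors.
r2_from_angles <- function(cy1, cy2, c12) {
  r    <- cos(c(cy1, cy2) / 180 * pi)   # correlations of X_1 and X_2 with Y
  r12  <- cos(c12 / 180 * pi)           # correlation between X_1 and X_2
  Rxx  <- matrix(c(1, r12, r12, 1), 2)  # correlation matrix of the two IVs
  beta <- solve(Rxx, r)                 # standardized regression coefficients
  c(R2_1 = r[1]^2, R2_2 = r[2]^2,
    R2_12 = sum(beta * r),              # R^2 of the two-IV model: r' Rxx^{-1} r
    beta_1 = beta[1], beta_2 = beta[2])
}

# Angle settings (angle YX_1, angle YX_2, angle X_1X_2) from the exercises
settings <- list(c(89, 89, 177.9), c(5, 2.6, 2.6),
                 c(60, 45, 90), c(60, 45, 45), c(60, 45, 15.1))
for (a in settings) {
  v <- r2_from_angles(a[1], a[2], a[3])
  cat(sprintf("angles %s: R2_12 = %.4f, R2_1 + R2_2 - R2_12 = %.4f, betas = %.3f, %.3f\n",
              paste(a, collapse = "/"), v[["R2_12"]],
              v[["R2_1"]] + v[["R2_2"]] - v[["R2_12"]], v[["beta_1"]], v[["beta_2"]]))
}
```

The printed values should agree with the simulated R^2 table above, since `empirical = TRUE` reproduces the correlation matrix exactly.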
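ANOVA teaching materials
Understanding that ANOVA is a regression
Population and sample-size settings
In R, ANOVA and regression are the same thing and take only a few lines. The tedious part is simulating the data:
Observable data and the statistical conclusions they lead to
Every observed IQ score contains a random component, so the observed data are not the population means of the groups.
R code
Copy and paste the code below into R, or paste the statements into Rweb (UTF-8 encoding), and use the comments after the # signs to understand ANOVA.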
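## Data: sampled IQ scores; error is each participant's random deviation
## (technically "sampling error": not necessarily a mistake, but always a product of sampling)
## Sex of the 4 groups of participants, in order, and the language of their test paper
(s_group = as.factor(c('Male','Male','Female','Female')));
(q_group = as.factor(c('Chinese','English','Chinese','English')));
## Population IQ: males on the Chinese test 105, females on the Chinese test 101,
## males on the English test 80, females on the English test 90 (listed below in group order)
x_true_group = c(105,80,101,90);
## Within-group population standard deviation
sigma = 10;
## Draw n (= 10) participants per group
n = 10;
s = rep(s_group, each = n); q = rep(q_group, each = n); x_true = rep(x_true_group, each = n);
data.frame(s, q, x_true)
## But each participant deviates randomly because of sampling, with standard deviation sigma (= 10)
error = rnorm(4*n, mean = 0, sd = sigma);
x_observe = x_true + error;
## So what we actually observe is
data.frame(s, q, x_observe)
boxplot(x_observe ~ s*q)
## Everything above simulates the data; if you already have data, just run the steps below
## A linear model, i.e. a regression
model = lm(x_observe ~ 1 + s + q + s:q) ## the 1 after ~ is the intercept; s:q is the interaction term
anova(model)
plot(model, which = c(1,2))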
Online calculator for critical values, cumulative probabilities, and critical noncentral parameters
z and noncentral distributions (chi-square, t, and F)
Noncentral chi-square
Let [tex]Z_i[/tex], i = 0, 1, 2, ..., denote a sequence of independent standard normal random variables. Then
- [tex]V = \sum_{i=1}^{df}{Z_i}^2[/tex]
is a random variable with the [tex]\chi^2[/tex] distribution with df degrees of freedom. For any given constants [tex]\mu_i[/tex], i = 1, 2, ..., df,
- [tex]\sum_{i=1}^{df}(Z_i+\mu_i)^2[/tex]
is a random variable with the noncentral [tex]\chi^2[/tex] distribution with the same df and noncentrality parameter
- [tex]ncp = \sum_{i=1}^{df}{\mu_i}^2[/tex]
This differs from the random variable [tex]V + ncp = \sum_{i=1}^{df}{Z_i}^2 + \sum_{i=1}^{df}{\mu_i}^2[/tex], which is a central [tex]\chi^2[/tex] variable shifted by the constant ncp. Both have mean df + ncp, but the noncentral variable has variance 2(df + 2 ncp), whereas the shifted central variable keeps variance 2 df.
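A quick simulation makes the difference concrete: both constructions share the mean df + ncp, but their variances differ. The constants below (df = 3, μ = (1, 2, 0.5), 10^5 replications, the seed) are illustrative assumptions, not values from the page.

```r
set.seed(1)
df   <- 3
mu   <- c(1, 2, 0.5)          # illustrative constants mu_i
ncp  <- sum(mu^2)             # noncentrality parameter
reps <- 1e5

Z <- matrix(rnorm(reps * df), ncol = df)          # independent standard normals
noncentral <- rowSums(sweep(Z, 2, mu, "+")^2)     # sum of (Z_i + mu_i)^2
shifted    <- rowSums(Z^2) + ncp                  # V + ncp: central chi-square plus a constant

rbind(simulated_noncentral = c(mean = mean(noncentral), var = var(noncentral)),
      simulated_shifted    = c(mean(shifted), var(shifted)),
      theory_noncentral    = c(df + ncp, 2 * (df + 2 * ncp)),
      theory_shifted       = c(df + ncp, 2 * df))
```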
Noncentral t
For any given constant [tex]\mu_0[/tex],
- [tex]\frac{Z_0+\mu_0}{\sqrt{V/df}}=\frac{Z_0+\mu_0}{\sqrt{\sum_{i=1}^{df}{Z_i}^2/df}}[/tex]
is a noncentral t-distributed random variable with noncentrality parameter
- [tex]ncp=\mu_0[/tex],
which differs from [tex]\frac{Z_0}{\sqrt{V/df}}+\mu_0[/tex], a central t-distributed random variable shifted by the same constant.
If df in the calculator is set to [tex]\infty[/tex] (Inf in R) and the noncentrality parameter to 0, the standard normal distribution is plotted and the critical z score is calculated.
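The sketch below compares a simulated tail probability of each construction with R's `pt()`, which accepts an `ncp` argument, and checks that `qt()` with `df = Inf` returns the critical z. The values df = 5, μ_0 = 2, the cutoff 4, and the seed are illustrative assumptions.

```r
set.seed(2)
df <- 5; mu0 <- 2; reps <- 1e5
Z0 <- rnorm(reps)
V  <- rchisq(reps, df)

noncentral_t <- (Z0 + mu0) / sqrt(V / df)   # noncentral t with ncp = mu0
shifted_t    <- Z0 / sqrt(V / df) + mu0     # central t shifted by mu0

# Simulated vs. theoretical upper-tail probabilities beyond 4
c(sim_noncentral = mean(noncentral_t > 4),
  pt_noncentral  = pt(4, df, ncp = mu0, lower.tail = FALSE),
  sim_shifted    = mean(shifted_t > 4),
  pt_shifted     = pt(4 - mu0, df, lower.tail = FALSE))

# df = Inf with ncp = 0: the t quantile is the standard normal critical z
c(t_crit = qt(0.975, df = Inf), z_crit = qnorm(0.975))
```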
Noncentral F
For F, the noncentrality parameter is defined on the numerator only. The noncentral F-distributed random variable
- [tex]\frac{\sum_{i=1}^{df_1}(Z_i+\mu_i)^2/df_1}{\sum_{i=df_1+1}^{df_1+df_2}Z_i^2/df_2}[/tex]
with noncentrality parameter
- [tex]ncp=\sum_{i=1}^{df_1}\mu_i^2[/tex]
differs from the central F-distributed random variable plus the corresponding constant, [tex]\frac{\sum_{i=1}^{df_1}Z_i^2/df_1}{\sum_{i=df_1+1}^{df_1+df_2}Z_i^2/df_2}+\frac{\sum_{i=1}^{df_1}\mu_i^2}{df_1}[/tex].
Confidence interval of standardized effect size by noncentrality parameters
Confidence intervals for unstandardized effect sizes, such as a difference of means [tex](\mu_1-\mu_2)[/tex], are available in common statistics textbooks and software, whereas confidence intervals for standardized effect sizes, especially Cohen's [tex]\tilde{d}:=\frac{\mu_1-\mu_2}{\sigma}[/tex] and [tex]\tilde{f}^2:=\frac{SS(\mu_1,\mu_2,...,\mu_K)}{K \cdot \sigma^2}[/tex], rely on confidence intervals for the noncentrality parameter (ncp).
A common way to find the confidence limits of ncp is to solve for the critical ncp at which the observed statistic falls at the chosen tail quantile. The ncp of the black curve in the calculator's diagram can be used directly. For example, if the observed [tex]t_{df=4}=5.1[/tex], then [tex]\left(-\infty,8.968\right)[/tex] is a 97.5% one-sided confidence interval for ncp; changing the quantile from .025 to .975 gives the lower limit, so the two-sided interval (1.139, 8.968) has a 95% confidence level.
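The inversion described above can be sketched with `uniroot()` on `pt()`, since the cumulative probability of the observed t decreases as ncp grows. `ncp_limit` is a hypothetical helper, and its search interval (-10, 25) is an assumption that must bracket the root for the t and df at hand.

```r
# Solve for the ncp at which the observed t is the p-quantile of the noncentral t
ncp_limit <- function(t_obs, df, p, interval = c(-10, 25)) {
  f <- function(ncp) pt(t_obs, df, ncp = ncp) - p   # decreasing in ncp
  uniroot(f, interval = interval, tol = 1e-10)$root
}

t_obs <- 5.1
df    <- 4
upper <- ncp_limit(t_obs, df, p = 0.025)  # (-Inf, upper): one-sided 97.5% interval
lower <- ncp_limit(t_obs, df, p = 0.975)  # (lower, upper): two-sided 95% interval
round(c(lower = lower, upper = upper), 3)
```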
t test for mean difference of a single group or two related groups
For a single group, M ([tex]\mu[/tex]) denotes the sample (population) mean and SD ([tex]\sigma[/tex]) the sample (population) standard deviation; N is the sample size. The t test addresses the hypothesis about the difference between the mean and a baseline [tex]\mu_{baseline}[/tex], which is usually, but not necessarily, zero. For two related groups, the single group consists of the differences within each pair of observations, and SD ([tex]\sigma[/tex]) denotes the sample (population) standard deviation of those differences rather than of the two original groups.
- [tex]t:=\frac{M-\mu_{baseline}}{SD/\sqrt{N}}=\frac{\sqrt{N}\frac{M-\mu}{\sigma} + \sqrt{N}\frac{\mu-\mu_{baseline}}{\sigma}}{\frac{SD}{\sigma}}[/tex]
- [tex]ncp=\sqrt{N}\frac{\mu-\mu_{baseline}}{\sigma}[/tex] and Cohen's [tex]d:=\frac{M-\mu_{baseline}}{SD}[/tex] is the point estimate of [tex]\frac{\mu-\mu_{baseline}}{\sigma}[/tex].
So,
- [tex]\tilde{d}=\frac{ncp}{\sqrt{N}}[/tex].
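Because [tex]\tilde{d}=ncp/\sqrt{N}[/tex], a confidence interval for the population d follows by dividing the ncp limits by [tex]\sqrt{N}[/tex]. The sketch below assumes a hypothetical data vector `x` and a hypothetical helper `ci_d_one_sample`; its ±20 search interval around the observed t is an assumption that may need widening for extreme t or very small df.

```r
# Hypothetical example data (e.g., difference scores of paired observations)
x <- c(1.2, 0.8, 2.1, 1.5, 0.3, 1.9, 1.1, 0.7)

ci_d_one_sample <- function(x, mu_baseline = 0, level = 0.95) {
  N  <- length(x)
  df <- N - 1
  t_obs <- (mean(x) - mu_baseline) / (sd(x) / sqrt(N))
  alpha <- 1 - level
  # ncp at which t_obs is the p-quantile of the noncentral t (pt() decreases in ncp)
  ncp_at <- function(p) {
    uniroot(function(ncp) pt(t_obs, df, ncp = ncp) - p,
            interval = c(t_obs - 20, t_obs + 20), tol = 1e-10)$root
  }
  ncp_lower <- ncp_at(1 - alpha / 2)
  ncp_upper <- ncp_at(alpha / 2)
  # Sample d = t / sqrt(N); ncp limits divided by sqrt(N) bound the population d
  c(t = t_obs, d = t_obs / sqrt(N),
    d_lower = ncp_lower / sqrt(N), d_upper = ncp_upper / sqrt(N))
}

ci_d_one_sample(x)
```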
t test for mean difference between two independent groups
[tex]n_1[/tex] and [tex]n_2[/tex] are the sample sizes of the two groups.
- [tex]t:=\frac{M_1-M_2}{SD_{within}/\sqrt{\frac{n_1 n_2}{n_1+n_2}}}[/tex], wherein [tex]SD_{within}:=\sqrt{\frac{SS_{within}}{df_{within}}}=\sqrt{\frac{(n_1-1)SD_1^2+(n_2-1)SD_2^2}{n_1+n_2-2}}[/tex].
- [tex]ncp=\sqrt{\frac{n_1 n_2}{n_1+n_2}}\frac{\mu_1-\mu_2}{\sigma}[/tex] and Cohen's [tex]d:=\frac{M_1-M_2}{SD_{within}}[/tex] is the point estimate of [tex]\frac{\mu_1-\mu_2}{\sigma}[/tex].
So,
- [tex]\tilde{d}=\frac{ncp}{\sqrt{\frac{n_1 n_2}{n_1+n_2}}}[/tex].
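The formulas above can be checked numerically: the pooled-SD t should match R's `t.test(..., var.equal = TRUE)`, and the sample d equals t divided by [tex]\sqrt{n_1 n_2/(n_1+n_2)}[/tex], mirroring the population relation between [tex]\tilde{d}[/tex] and ncp. The two data vectors and the helper name `two_group_d` are hypothetical.

```r
# Hypothetical example data for two independent groups
x1 <- c(5.1, 4.8, 6.0, 5.5, 5.9, 4.7)
x2 <- c(4.2, 4.9, 4.0, 4.6, 3.8, 4.4, 4.1)

two_group_d <- function(x1, x2) {
  n1 <- length(x1); n2 <- length(x2)
  # pooled within-group SD: sqrt(SS_within / df_within)
  sd_within <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
  k <- sqrt(n1 * n2 / (n1 + n2))          # factor linking t and d (and ncp and d~)
  t_obs <- (mean(x1) - mean(x2)) / (sd_within / k)
  c(t = t_obs, d = (mean(x1) - mean(x2)) / sd_within, t_over_k = t_obs / k)
}

res <- two_group_d(x1, x2)
res
t.test(x1, x2, var.equal = TRUE)$statistic   # pooled-variance t for comparison
```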
One-way ANOVA test for mean difference across multiple independent groups
The one-way ANOVA test uses the noncentral F distribution; when the population standard deviation [tex]\sigma[/tex] is known, the same test uses the noncentral chi-square distribution instead.
- [tex]F:=\frac{\frac{SS_{between}}{\sigma^{2}}/df_{between}}{\frac{SS_{within}}{\sigma^{2}}/df_{within}}[/tex]
For the j-th observation [tex]X_{i,j}[/tex] in the i-th group, denote [tex]M_{i}\left(X_{i,j}\right):=\frac{\sum_{w=1}^{n_{i}}X_{i,w}}{n_{i}}[/tex] and [tex]\mu_{i}\left(X_{i,j}\right):=\mu_{i}[/tex].
Then,
- [tex]SS_{between} \over \sigma^{2}[/tex]
[tex]= \frac{SS\left(M_{i}\left(X_{i,j}\right);i=1,2,\cdots,K,\; j=1,2,\cdots,n_{i}\right)}{\sigma^{2}}[/tex]
[tex]= SS\left(\frac{M_{i}\left(X_{i,j}-\mu_{i}\right)}{\sigma}+\frac{\mu_{i}}{\sigma};i=1,2,\cdots,K,\; j=1,2,\cdots,n_{i}\right)[/tex]
[tex]\sim \chi^{2}\left(df=K-1,\; ncp=SS\left(\frac{\mu_{i}\left(X_{i,j}\right)}{\sigma};i=1,2,\cdots,K,\; j=1,2,\cdots,n_{i}\right)\right)[/tex]
So the ncp of both F and [tex]\chi^2[/tex] equals
- [tex]SS\left(\mu_i(X_{i,j})/\sigma;i=1,2,\cdots,K,\; j=1,2,\cdots,n_{i}\right)[/tex].
When the K independent groups have the same size [tex]n:=n_1=n_2=\cdots=n_K[/tex], the total sample size is [tex]N:=n\cdot K[/tex].
- Cohen's [tex]\tilde{f}^{2}:=\frac{SS(\mu_{1},\mu_{2},...,\mu_{K})}{K\cdot\sigma^{2}}=\frac{SS\left(\mu_i\left(X_{i,j}\right)/\sigma;i=1,2,\cdots,K,\; j=1,2,\cdots,n_{i}\right)}{n\cdot K}=\frac{ncp}{n\cdot K}=\frac{ncp}{N}[/tex].
The t test for two independent groups is a special case of one-way ANOVA. Note that the noncentrality parameter [tex]ncp_F[/tex] of F is not on the same scale as the noncentrality parameter [tex]ncp_t[/tex] of the corresponding t: in fact [tex]ncp_F=ncp_t^2[/tex], and, with equal group sizes, [tex]\tilde{f}=\left|\frac{\tilde{d}}{2}\right|[/tex].
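One way to see [tex]ncp_F=ncp_t^2[/tex] at work: the power of a two-sided t test equals the power of the F test with one numerator df, critical value [tex]t_{crit}^2[/tex], and ncp [tex]ncp_t^2[/tex]. The group sizes, true d of 0.8, alpha, and the equal-n means below are illustrative assumptions.

```r
# Illustrative assumptions: group sizes, true standardized difference, alpha
n1 <- 12; n2 <- 15; d_true <- 0.8; alpha <- 0.05
df <- n1 + n2 - 2
ncp_t  <- sqrt(n1 * n2 / (n1 + n2)) * d_true
t_crit <- qt(1 - alpha / 2, df)
# Two-sided t power vs. F power with ncp_F = ncp_t^2 and critical F = t_crit^2
power_t <- pt(t_crit, df, ncp = ncp_t, lower.tail = FALSE) + pt(-t_crit, df, ncp = ncp_t)
power_F <- pf(t_crit^2, 1, df, ncp = ncp_t^2, lower.tail = FALSE)
c(power_t = power_t, power_F = power_F)

# Equal group sizes: f = |d| / 2 and ncp_F = N * f^2 = ncp_t^2
n <- 10; mu <- c(0.8, 0); sigma <- 1
f2 <- sum((mu - mean(mu))^2) / (length(mu) * sigma^2)
c(f = sqrt(f2), half_d = abs(mu[1] - mu[2]) / sigma / 2,
  ncp_F = 2 * n * f2, ncp_t_squared = (sqrt(n * n / (2 * n)) * (mu[1] - mu[2]) / sigma)^2)
```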
RMSEA of Structural Equation Model
The ncp of [tex]\chi^2[/tex] reported by structural equation modeling (SEM) software is proportional to the population value of [tex]RMSEA^2[/tex], i.e., the squared distance per df from the population variance-covariance matrix to the model space.
- [tex]\tilde{RMSEA}=\sqrt{\frac{ncp}{(N-1)df}}[/tex]
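Applying the same ncp-inversion idea to [tex]\chi^2[/tex] gives a point estimate and interval for RMSEA. This is a sketch under stated assumptions: the point estimate uses [tex]\hat{ncp}=\max(\chi^2-df,0)[/tex], the 90% level is the convention commonly reported for RMSEA, `rmsea_ci` is a hypothetical helper, and the inputs (χ² = 85.3, df = 40, N = 300) are hypothetical fit results.

```r
rmsea_ci <- function(chisq, df, N, level = 0.90) {
  ncp_hat <- max(chisq - df, 0)
  point   <- sqrt(ncp_hat / ((N - 1) * df))
  # ncp at which the observed chisq is the p-quantile of the noncentral chi-square
  limit <- function(p) {
    f <- function(ncp) pchisq(chisq, df, ncp = ncp) - p   # decreasing in ncp
    if (f(0) < 0) return(0)                               # no positive root: truncate at 0
    uniroot(f, c(0, max(10 * chisq, 100)), tol = 1e-10)$root
  }
  alpha  <- 1 - level
  ncp_lo <- limit(1 - alpha / 2)
  ncp_hi <- limit(alpha / 2)
  c(rmsea = point,
    lower = sqrt(ncp_lo / ((N - 1) * df)),
    upper = sqrt(ncp_hi / ((N - 1) * df)))
}

rmsea_ci(chisq = 85.3, df = 40, N = 300)   # hypothetical fit results
```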
Power vs. Standardized Effect Size or ncp
Power of t test for a given Cohen's [tex]\delta[/tex]
Example of one-group mean test
Two-related-group mean test
For two related groups, the one-group mean test can be applied to the difference scores between the members of each pair. Usually [tex]\mu_{null}[/tex] is set to zero.
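Without the calculator's input form, the one-group (or paired) power can be sketched directly from [tex]ncp=\sqrt{N}\,\delta[/tex]. `power_one_sample_t` is a hypothetical helper, and δ = 0.5, N = 30, α = .05 are illustrative assumptions; for paired data, δ should be computed with the SD of the differences. The result is cross-checked against `stats::power.t.test` with `strict = TRUE`, which also counts the lower rejection tail.

```r
power_one_sample_t <- function(delta, N, alpha = 0.05) {
  df     <- N - 1
  ncp    <- sqrt(N) * delta               # ncp = sqrt(N) * (mu - mu_null) / sigma
  t_crit <- qt(1 - alpha / 2, df)         # two-sided critical value
  pt(t_crit, df, ncp = ncp, lower.tail = FALSE) + pt(-t_crit, df, ncp = ncp)
}

power_one_sample_t(delta = 0.5, N = 30)
power.t.test(n = 30, delta = 0.5, sd = 1, type = "one.sample", strict = TRUE)$power
```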
Power of F test for a given Cohen's f
Let [tex]\tilde{f}^2[/tex] denote the population value of Cohen's [tex]f^2[/tex], specifically
- [tex]SS(\mu_1,\mu_2,\cdots,\mu_K) \over {K\times \sigma^2}[/tex]
in a one-way ANOVA of [tex]K[/tex] groups, each with sample size n, common within-group standard deviation [tex]\sigma[/tex], and population means [tex]\mu_1,\mu_2,\cdots,\mu_K[/tex] respectively. The noncentrality parameter of the corresponding F or [tex]\chi^2[/tex] distribution is [tex]ncp=n\times K \times \tilde{f}^2[/tex].
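A minimal power sketch built on [tex]ncp=n\times K\times\tilde{f}^2[/tex]: compute the critical F under the null and evaluate the noncentral F beyond it. The means (100, 105, 110), σ = 15, n = 20, α = .05 and the helper name `power_oneway_anova` are illustrative assumptions. `stats::power.anova.test` uses the variance of the group means, which gives the same ncp, so it serves as a cross-check.

```r
power_oneway_anova <- function(mu, sigma, n, alpha = 0.05) {
  K   <- length(mu)
  f2  <- sum((mu - mean(mu))^2) / (K * sigma^2)   # population Cohen's f^2
  ncp <- n * K * f2
  df1 <- K - 1
  df2 <- n * K - K
  F_crit <- qf(1 - alpha, df1, df2)
  c(f2 = f2, ncp = ncp, power = pf(F_crit, df1, df2, ncp = ncp, lower.tail = FALSE))
}

res <- power_oneway_anova(mu = c(100, 105, 110), sigma = 15, n = 20)
res
power.anova.test(groups = 3, n = 20, between.var = var(c(100, 105, 110)),
                 within.var = 15^2)$power
```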
Power of SEM close-fit test for a given RMSEA
- [tex](N-1)\cdot df\cdot RMSEA^2+df\sim \chi^2_{df,ncp=(N-1)\cdot df\cdot Distance^2perDf(\tilde{\Sigma},Model\,Space)}[/tex]
- [tex]RMSEA=\sqrt{\frac{\hat{\chi}^2-df}{df\cdot(N-1)}}[/tex], set to 0 when [tex]\hat{\chi}^2<df[/tex]
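Using the distributional statement above, the power of a close-fit test can be sketched by placing the critical value under the null RMSEA and evaluating the alternative RMSEA beyond it. The null value .05, the alternative .08, df = 40, α = .05 and the sample sizes are conventional or illustrative assumptions, and `power_close_fit` is a hypothetical helper.

```r
power_close_fit <- function(N, df, rmsea0 = 0.05, rmsea_a = 0.08, alpha = 0.05) {
  ncp0  <- (N - 1) * df * rmsea0^2     # ncp under H0: RMSEA = rmsea0 (close fit)
  ncp_a <- (N - 1) * df * rmsea_a^2    # ncp under the alternative RMSEA
  crit  <- qchisq(1 - alpha, df, ncp = ncp0)
  pchisq(crit, df, ncp = ncp_a, lower.tail = FALSE)
}

sapply(c(100, 200, 400), power_close_fit, df = 40)
```

Power should rise with N because the two ncp values grow apart in proportion to N - 1.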
How to cite this page in APA style
In APA style, this page can be cited in reference lists as follows:
Comparison of noncentral and central distributions. (yyyy, Month dd). In SlideWiki. Retrieved MM:SS, Month dd, yyyy, from http://mars.wiwi.hu-berlin.de/mediawiki/slides/index.php/Comparison_of_noncentral_and_central_distributions
For other styles, refer to the examples on Wikipedia.
External links
- Noncentral t-distribution on Wikipedia
- Noncentral chi-square distribution on Wikipedia
- Noncentral F-distribution on Wikipedia
- Confidence interval of Effect Size on Wikipedia

