Fitting a bivariate regression model

Attach the data set and get a quick listing of the data for the first example

 

> attach(regrex1)

> x

 [1]  9.8492 11.1450  5.1209  8.1085  3.5599  1.9883 15.5322

 [8]  7.5202 14.0202  7.5226 14.3272  3.8950  1.8707  0.8347

[15] 18.3413  1.6645 17.0579  9.2375 21.0000  3.5286 25.0000

[22]  9.9008 17.5990 13.8790 10.8690 12.6238  0.9719 14.4814

[29]  2.1792 18.4052

> y

 [1]  6.8102  9.8437  4.9767  5.0006  3.5047  3.6419  7.8725

 [8]  5.1442  8.3883  6.8204  9.0185  2.6148  3.7367  2.8651

[15] 10.7469  2.3710  9.6653  7.1926 11.7833  3.9537 15.0000

[22]  7.9152  9.3368  8.7587  6.7813  7.9924  2.2806 10.4050

[29]  3.7891 10.9067

 

Plot the data

 

plot(y ~ x)

Fit the regression model:

 

 

> # fit the model

> ex1.lm <- lm(y ~ x)

 

 

Type "attributes(ex1.lm)" to see the contents of ex1.lm

 

> attributes(ex1.lm)

$names

 [1] "coefficients"  "residuals"     "effects"     

 [4] "rank"          "fitted.values" "assign"      

 [7] "qr"            "df.residual"   "xlevels"     

[10] "call"          "terms"         "model"       

 

$class

[1] "lm"

 

 

Type "ex1.lm" to see a brief summary of model:

> ex1.lm

 

Call:

lm(formula = y ~ x)

 

Coefficients:

(Intercept)            x 

     2.2481       0.4691 

 

 

Type "summary(ex1.htm)" to see more details

 

> summary(ex1.lm)

 

Call:

lm(formula = y ~ x)

 

Residuals:

     Min       1Q   Median       3Q      Max

-1.66121 -0.53286 -0.02869  0.50436  2.36786

 

Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  2.24814    0.29365   7.656 2.44e-08 ***

x            0.46906    0.02444  19.194  < 2e-16 ***

---

Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

 

Residual standard error: 0.8779 on 28 degrees of freedom

Multiple R-Squared: 0.9294,     Adjusted R-squared: 0.9268

F-statistic: 368.4 on 1 and 28 DF,  p-value: < 2.2e-16

 

 

Plot the regression line, and the deviations that were minimized while fitting the regression equation

 

> plot(y ~ x)

> abline(ex1.lm, col="red")

> segments(x, fitted(ex1.lm), x, y, col="red")

 

 

A second example:

 

> attach(regrex2)

> regrex2

         e       x      y1      y2      y3      y4      y5      y6

1  -0.0578  9.8492  6.8102  6.8390  6.8535  6.8622  6.7523  6.6367

2   2.3679 11.1450  9.8437  8.6597  8.0677  7.7125 12.2115 16.9473

3   0.3266  5.1209  4.9767  4.8134  4.7317  4.6828  5.3033  5.9564

4  -1.0509  8.1085  5.0006  5.5260  5.7887  5.9464  3.9496  1.8478

5  -0.4132  3.5599  3.5047  3.7113  3.8146  3.8766  3.0915  2.2650

6   0.4611  1.9883  3.6419  3.4113  3.2960  3.2268  4.1030  5.0252

7  -1.6612 15.5322  7.8725  8.7030  9.1183  9.3675  6.2112  2.8888

8  -0.6313  7.5202  5.1442  5.4598  5.6177  5.7124  4.5128  3.2502

9  -0.4362 14.0202  8.3883  8.6063  8.7154  8.7808  7.9520  7.0795

10  1.0437  7.5226  6.8204  6.2985  6.0376  5.8810  7.8640  9.9513

11  0.0500 14.3272  9.0185  8.9934  8.9809  8.9734  9.0684  9.1684

12 -1.4603  3.8950  2.6148  3.3449  3.7100  3.9290  1.1544 -1.7662

13  0.6111  1.8707  3.7367  3.4311  3.2783  3.1867  4.3478  5.5700

14  0.2254  0.8347  2.8651  2.7523  2.6960  2.6622  3.0905  3.5413

15 -0.1045 18.3413 10.7469 10.7990 10.8252 10.8408 10.6423 10.4333

16 -0.6579  1.6645  2.3710  2.6999  2.8644  2.9630  1.7130  0.3971

17 -0.5841 17.0579  9.6653  9.9572 10.1033 10.1909  9.0811  7.9129

18  0.6114  9.2375  7.1926  6.8868  6.7339  6.6422  7.8039  9.0268

19 -0.3151 21.0000 11.7833 11.9408 12.0196 12.0669 11.4682 10.8380

20  0.0504  3.5286  3.9537  3.9284  3.9158  3.9083  4.0040  4.1048

21  1.0253 25.0000 15.0000 14.4873 14.2309 14.0771 16.0252 18.0758

22  1.0230  9.9008  7.9152  7.4036  7.1479  6.9945  8.9381 10.9840

23 -1.1664 17.5990  9.3368  9.9199 10.2115 10.3865  8.1703  5.8375

24  0.0005 13.8790  8.7587  8.7584  8.7583  8.7582  8.7591  8.7600

25 -0.5650 10.8690  6.7813  7.0638  7.2050  7.2898  6.2162  5.0861

26 -0.1771 12.6238  7.9924  8.0809  8.1251  8.1517  7.8153  7.4611

27 -0.4234  0.9719  2.2806  2.4923  2.5981  2.6616  1.8572  1.0104

28  1.3642 14.4814 10.4050  9.7228  9.3818  9.1772 11.7691 14.4974

29  0.5187  2.1792  3.7891  3.5297  3.4000  3.3222  4.3077  5.3452

30  0.0254 18.4052 10.9067 10.8939 10.8876 10.8838 10.9320 10.9827

 

> summary(regrex2)

       e                  x                 y1       

 Min.   :-1.66120   Min.   : 0.8347   Min.   : 2.281 

 1st Qu.:-0.53280   1st Qu.: 3.6437   1st Qu.: 3.830 

 Median :-0.02865   Median : 9.8750   Median : 7.007 

 Mean   : 0.00001   Mean   :10.0678   Mean   : 6.971 

 3rd Qu.: 0.50430   3rd Qu.:14.4429   3rd Qu.: 9.257 

 Max.   : 2.36790   Max.   :25.0000   Max.   :15.000 

       y2               y3               y4       

 Min.   : 2.492   Min.   : 2.598   Min.   : 2.662 

 1st Qu.: 3.766   1st Qu.: 3.840   1st Qu.: 3.913 

 Median : 6.975   Median : 7.001   Median : 6.928 

 Mean   : 6.970   Mean   : 6.970   Mean   : 6.971 

 3rd Qu.: 8.935   3rd Qu.: 9.084   3rd Qu.: 9.126 

 Max.   :14.487   Max.   :14.231   Max.   :14.077 

       y5               y6       

 Min.   : 1.154   Min.   :-1.766 

 1st Qu.: 4.154   1st Qu.: 3.682 

 Median : 7.278   Median : 6.297 

 Mean   : 6.971   Mean   : 6.970 

 3rd Qu.: 9.036   3rd Qu.: 9.756 

 Max.   :16.025   Max.   :18.076 

 

 

The variables x and y1 are the same as in the previous example, so as a second example

look at the relationship between y2 and x:

 

> plot(y2 ~ x)

 

Note the stronger relationship.  The second regression example results are:

 

> ex2.lm <- lm(y2 ~ x)

> summary(ex2.lm)

 

Call:

lm(formula = y2 ~ x)

 

Residuals:

     Min       1Q   Median       3Q      Max

-0.83062 -0.26641 -0.01437  0.25222  1.18393

 

Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  2.24810    0.14682   15.31  3.9e-15 ***

x            0.46906    0.01222   38.39  < 2e-16 ***

---

Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

 

Residual standard error: 0.439 on 28 degrees of freedom

Multiple R-Squared: 0.9814,     Adjusted R-squared: 0.9807

F-statistic:  1474 on 1 and 28 DF,  p-value: < 2.2e-16

 

 

Compare these with the results of the first regression:

 

> summary(ex1.lm)

 

Call:

lm(formula = y ~ x)

 

Residuals:

     Min       1Q   Median       3Q      Max

-1.66121 -0.53286 -0.02869  0.50436  2.36786

 

Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  2.24814    0.29365   7.656 2.44e-08 ***

x            0.46906    0.02444  19.194  < 2e-16 ***

---

Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

 

Residual standard error: 0.8779 on 28 degrees of freedom

Multiple R-Squared: 0.9294,     Adjusted R-squared: 0.9268

F-statistic: 368.4 on 1 and 28 DF,  p-value: < 2.2e-16

 

Both regressions are significant, but all of the diagnostic statistics

for the second regression are higher.

 

Compare the two regressions and the size of the deviations that were minimized

 

> plot(y2 ~ x)

> abline(ex2.lm, col="blue")

> segments(x, fitted(ex2.lm), x, y2, col="blue")

 

 

Back to the first example--plot the regression line on top of the scatter plot

 

Get prediction intervals and confidence intervals, and plot them

 

> pred.data <- data.frame(x=1:25)

> pred.int <- predict(ex1.lm, int="p", newdata=pred.data)

> conf.int <- predict(ex1.lm, int="c", newdata=pred.data)

>

> plot(x, y, ylim=range(y, pred.int, na.rm=T))

> pred.ex1 <- pred.data$x

> matlines(pred.ex1, pred.int, lty=c(1,2,2), col="black")

> matlines(pred.ex1, conf.int, lty=c(1,2,2), col="red")

 

 

Examine some regression diagnostic plots

 

> # standard regression diagnostics (4-up)

> oldpar <- par(mfrow = c(2, 2))

> plot(ex1.lm)

> par(oldpar)

 

R-F spread plot

> # r-f spread plot

> library(lattice)

... [output deleted]

> trellis.device(color=TRUE, theme = "col.whitebg")

> rfs(ex1.lm)