Bivariate regression
The simplest regression model is the bivariate one, in which there is one response or dependent variable, and one predictor or independent variable, and the relationship between the two is represented by a straight line.
Fitting a regression equation (model)
Building a bivariate linear regression model, which represents the relationship between two variables by a straight line, involves determining the coefficients of that line (its intercept and slope), a process known as "fitting" the regression line. The first step is always to plot the data. [regrex1.csv]
Scatter diagram for the example data set [Example regr_fit01]
# attach and plot the data
# (assumes regrex1.csv has already been read into a data frame named regrex1)
attach(regrex1)
plot(y ~ x)
# fit the model
ex1.lm <- lm(y ~ x)
# examine the model object
ex1.lm
summary(ex1.lm)
attributes(ex1.lm)
# plot the regression line
abline(ex1.lm, col="red")
Minimization of deviations
# plot deviations
segments(x, fitted(ex1.lm), x, y)
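The vertical segments drawn above are the residuals, and ordinary least squares chooses the line that minimizes the sum of their squares. The sketch below illustrates this with a small made-up data set (the vectors `x.demo` and `y.demo` are illustrative assumptions, not regrex1): it computes the intercept and slope from the usual closed-form expressions, and defines a hypothetical helper `sse()` that returns the sum of squared deviations for any candidate line.

```r
# illustrative data only (not regrex1)
x.demo <- c(1, 2, 3, 4, 5, 6, 7, 8)
y.demo <- c(2.9, 3.1, 3.8, 4.0, 4.9, 5.2, 5.4, 6.1)

# closed-form least-squares estimates of the slope and intercept
b1.hat <- sum((x.demo - mean(x.demo)) * (y.demo - mean(y.demo))) /
  sum((x.demo - mean(x.demo))^2)
b0.hat <- mean(y.demo) - b1.hat * mean(x.demo)

# sum of squared deviations for any candidate line (hypothetical helper)
sse <- function(b0, b1) sum((y.demo - b0 - b1 * x.demo)^2)

# the same fit using lm(), for comparison
demo.lm <- lm(y.demo ~ x.demo)

# the OLS line has a smaller sum of squares than a shifted line
sse.ols <- sse(b0.hat, b1.hat)
sse.other <- sse(b0.hat + 0.1, b1.hat)
c(sse.ols, sse.other)
```

Any line other than the fitted one, such as the shifted line used for `sse.other`, gives a larger sum of squared deviations.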
Another example data set [regrex2.csv]
Iterative fitting of a regression equation
A regression equation can also be fit iteratively, by trying a range of values for the intercept and slope and choosing the combination that minimizes the sum of squared residuals.
# fit a linear regression equation by
# minimizing the sum of squared residuals
# uses regrex1
plot(y ~ x)
n <- length(y)
k <- 1
n.b0 <- 11
b0.min <- 1.0 # 2.00 # 2.24
b0.max <- 3.0 # 2.40 # 2.25
n.b1 <- 11
b1.min <- 0.0 # 0.4 # 0.46
b1.max <- 1.0 # 0.5 # 0.47
b0 <- seq(b0.min, b0.max, len=n.b0)
b1 <- seq(b1.min, b1.max, len=n.b1)
rse <- matrix(nrow=n.b0, ncol=n.b1)
dimnames(rse) <- list(as.character(b0),as.character(b1))
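# j indexes intercepts, m indexes slopes
# (k, set above, is the number of predictors and must not be reused as a loop index)
for (j in 1:n.b0) {
  for (m in 1:n.b1) {
    sse <- 0.0
    for (i in 1:n) {
      sse <- sse + (y[i] - b0[j] - b1[m]*x[i])^2
    }
    # residual standard error for this (b0, b1) combination
    rse[j,m] <- sqrt(sse/(n-k-1))
    abline(b0[j], b1[m], col="gray")
    #print(cbind(b0[j], b1[m], rse[j,m]))
  }
}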
rse
# plot the OLS regression line
abline(ex1.lm, col="red")
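The grid search above tabulates the criterion, but the minimizing combination still has to be picked out of the table. A minimal vectorized sketch of that last step follows. It assumes a small made-up data set (`x.demo`, `y.demo`) and illustrative grid limits, not the values used for regrex1. The helper `sse()` is hypothetical.

```r
# illustrative data only (not regrex1)
x.demo <- c(1, 2, 3, 4, 5, 6, 7, 8)
y.demo <- c(2.9, 3.1, 3.8, 4.0, 4.9, 5.2, 5.4, 6.1)

# candidate intercepts and slopes (grid limits are assumptions)
b0.grid <- seq(1.0, 4.0, len = 31)
b1.grid <- seq(0.0, 1.0, len = 21)

# sum of squared residuals for one (b0, b1) pair
sse <- function(b0, b1) sum((y.demo - b0 - b1 * x.demo)^2)

# evaluate every combination; rows index b0, columns index b1
sse.grid <- outer(b0.grid, b1.grid, Vectorize(sse))

# locate the combination with the smallest sum of squares
best <- which(sse.grid == min(sse.grid), arr.ind = TRUE)
b0.best <- b0.grid[best[1, "row"]]
b1.best <- b1.grid[best[1, "col"]]

# compare the grid-search result with the OLS estimates
ols <- coef(lm(y.demo ~ x.demo))
rbind(grid = c(b0.best, b1.best), ols = unname(ols))
```

The grid-search estimates can only be as precise as the grid spacing, which is why the search ranges are narrowed in successive passes.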
Examining the regression equation
Once the regression equation has been fit to the data, the next step is to examine the results and the significance of several statistics.
# examine the model object
ex1.lm
summary(ex1.lm)
# plot the regression line
plot(y ~ x)
abline(ex1.lm)
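The statistics printed by `summary()` can also be extracted individually, which is convenient when they are needed in later calculations. The sketch below assumes a small made-up data frame `demo` (not regrex1). The component names used (`r.squared`, `sigma`, and the columns of the coefficient table) are those of the standard `summary.lm` object.

```r
# illustrative data only (not regrex1)
demo <- data.frame(x = 1:8,
                   y = c(2.9, 3.1, 3.8, 4.0, 4.9, 5.2, 5.4, 6.1))
demo.lm <- lm(y ~ x, data = demo)
demo.sum <- summary(demo.lm)

# coefficient table: estimates, standard errors, t-statistics, p-values
coef.table <- coef(demo.sum)
coef.table

# individual statistics
r.squared <- demo.sum$r.squared           # proportion of variance explained
sigma.hat <- demo.sum$sigma               # residual standard error
slope.p   <- coef.table["x", "Pr(>|t|)"]  # p-value for the slope
c(r.squared = r.squared, sigma.hat = sigma.hat, slope.p = slope.p)
```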
The fit of the regression model can also be displayed by plotting confidence intervals (which allow variability in the regression line to be visually assessed) and prediction intervals (which allow variability in the data to be assessed).
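# get prediction intervals and confidence intervals
# (each result is a matrix with columns fit, lwr, and upr)
pred.data <- data.frame(x=1:25)
pred.int <- predict(ex1.lm, interval="prediction", newdata=pred.data)
conf.int <- predict(ex1.lm, interval="confidence", newdata=pred.data)
# set the y-axis limits so that the intervals fit on the plot
plot(x, y, ylim=range(y, pred.int, na.rm=TRUE))
pred.ex1 <- pred.data$x
# solid line = fitted values, dashed lines = interval limits
matlines(pred.ex1, pred.int, lty=c(1,2,2), col="black")
matlines(pred.ex1, conf.int, lty=c(1,2,2), col="red")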
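The difference between the two kinds of interval can be checked numerically as well as visually. The sketch below assumes a small made-up data frame `demo` (not regrex1). It compares the interval widths, and locates the value of x at which the confidence interval is narrowest.

```r
# illustrative data only (not regrex1)
demo <- data.frame(x = 1:8,
                   y = c(2.9, 3.1, 3.8, 4.0, 4.9, 5.2, 5.4, 6.1))
demo.lm <- lm(y ~ x, data = demo)

# values of x at which to evaluate the intervals
new.x <- data.frame(x = seq(1, 8, by = 0.5))
pred.demo <- predict(demo.lm, newdata = new.x, interval = "prediction")
conf.demo <- predict(demo.lm, newdata = new.x, interval = "confidence")

# interval widths (upper limit minus lower limit)
pred.width <- pred.demo[, "upr"] - pred.demo[, "lwr"]
conf.width <- conf.demo[, "upr"] - conf.demo[, "lwr"]

# prediction intervals are wider than confidence intervals at every x
all(pred.width > conf.width)

# the confidence interval is narrowest at the mean of x
x.narrowest <- new.x$x[which.min(conf.width)]
c(x.narrowest = x.narrowest, mean.x = mean(demo$x))
```

The prediction interval is wider because it includes the scatter of individual observations about the line in addition to the uncertainty in the line itself.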
Residual plots and case-wise statistics
# standard regression diagnostics (4-up)
oldpar <- par(mfrow = c(2, 2))
plot(ex1.lm, which=c(1,2,4,5))
par(oldpar)
# r-f spread plot
library(lattice)
rfs(ex1.lm)
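The case-wise statistics that underlie the diagnostic plots can also be listed directly. The sketch below assumes a small made-up data frame `demo` (not regrex1) and collects, for each observation, the fitted value, residual, standardized residual, leverage, and Cook's distance, using the standard extractor functions from the stats package.

```r
# illustrative data only (not regrex1)
demo <- data.frame(x = 1:8,
                   y = c(2.9, 3.1, 3.8, 4.0, 4.9, 5.2, 5.4, 6.1))
demo.lm <- lm(y ~ x, data = demo)

# one row of diagnostic statistics per observation
case.stats <- data.frame(
  fitted   = fitted(demo.lm),         # fitted values
  residual = resid(demo.lm),          # raw residuals
  std.res  = rstandard(demo.lm),      # standardized residuals
  leverage = hatvalues(demo.lm),      # diagonal of the hat matrix
  cooks.d  = cooks.distance(demo.lm)  # influence of each case on the fit
)
case.stats
```

Cases with a large standardized residual, high leverage, or a large Cook's distance are the ones to inspect more closely in the plots above.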