Lecture 8

Author

Jiaye Xu

Published

April 11, 2022

Lecture 8 Overview

  • MLR with Categorical Predictors

  • MLR with Interaction Terms


MLR with Categorical Predictors

In Lecture 1, we introduced two data types: numeric and categorical. Most of the data we have encountered in this course so far have been numeric (e.g., people’s blood pressure, car speed data).

Categorical variables are those that are non-numeric or, if numeric, whose values are essentially arbitrary labels. For example, a variable taking on the values “T” or “F” for “True” or “False” is categorical, and it would remain categorical (not numeric) even if it took on the values “1” or “0” in place of “T” or “F”.

A categorical variable recoded with numerical values is said to be transformed into a “dummy variable”. In R, the “factor” variable type allows categorical variables to be handled as dummy variables.
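As a small illustrative sketch (the vector `status` below is hypothetical), `factor()` stores the categories as integer codes with text labels:

```r
# hypothetical example: convert a character vector to a factor
status <- c("T", "F", "T", "T", "F")
status <- factor(status)
levels(status)     # the distinct categories (sorted alphabetically): "F" "T"
as.numeric(status) # the underlying integer codes R assigns to each value
```

The numerical codes are arbitrary; only the grouping they define matters.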

Let’s look at a scientific question involving a categorical variable as a predictor in MLR:

Q: Is a baby’s birth weight related to the mother’s smoking during pregnancy?

To answer this question, a data set was collected on a random sample of \(n=32\) births:

  • Potential predictor (\(x_1\)): Smoking status of mother (yes or no)

  • Potential predictor (\(x_2\)): length of gestation (Gest) in weeks

  • Response (\(y\)): birth weight (Weight) in grams of baby

Note that smoking is a qualitative predictor (categorical variable). It is a “binary variable” with only two values (yes or no). The other predictor variable (Gest) is quantitative (numeric).

dat = read.table('birthsmokers.txt', header = T)
head(dat)
   Wgt Gest Smoke
1 2940   38   yes
2 3130   38    no
3 2420   36   yes
4 2450   34    no
5 2760   39   yes
6 2440   35   yes
colnames(dat)[1] = 'Weight'
dat$Smoke = as.factor(dat$Smoke) # convert character to factor (categorical variable)
pairs(dat) # factor columns are converted to numeric

Interpretation:

  • The scatter plot matrix suggests that there is a positive linear relationship between length of gestation and birth weight.

  • It is hard to see if any kind of relationship exists between birth weight and smoking status, or between length of gestation and smoking status.

We can also plot birth weight against length of gestation, using different colors for the smoking status:

plot(dat$Gest, dat$Weight, type = 'n', xlab = 'Length of Gestation', ylab = 'Birth Weight', main = 'Birth Weight vs Length of Gestation') # type = "n" for no plotting
points(dat$Gest[dat$Smoke == 'yes'], dat$Weight[dat$Smoke == 'yes'], col = 'red')
points(dat$Gest[dat$Smoke == 'no'], dat$Weight[dat$Smoke == 'no'], col = 'blue')
legend('topleft', bty = 'n', col = c('blue', 'red'), legend = c('Non-Smoker', 'Smoker'), pch = 1) # bty = "n", no box around legend

Recall that the scientific question remains – after taking into account length of gestation, is there a significant difference in the average birth weights of babies born to smoking and non-smoking mothers?

A first-order model with one binary predictor and one quantitative predictor (a parallel model) that helps us answer the question is:

\[y_i = \beta_0 +\beta_1x_{i1}+\beta_2x_{i2}+\epsilon_i\]

where

  • \(y_i\) is the birth weight of baby \(i\)

  • \(x_{i1}\) is length of gestation of baby \(i\)

  • \(x_{i2}\) is a binary variable coded as \(1\) if the baby’s mother smoked during pregnancy and \(0\) if she did not, and the independent error terms \(\epsilon_i\) follow a normal distribution with mean \(0\) and equal variance \(\sigma^2\).

Notice that in order to include a categorical variable (as a qualitative predictor) in a regression model, we have to “code” the variable, that is, assign a unique number to each of the possible categories.

A common coding scheme is “zero-one-indicator variable.” Using such a variable here, we code the binary predictor smoking as:

  • \(x_{i2} = 1\), if mother \(i\) smokes

  • \(x_{i2} = 0\), if mother \(i\) does not smoke

Comments:

  • In doing so, we follow the convention of assigning the value \(1\) to those having the characteristic of interest and \(0\) to those not having it.

  • Incidentally, other terms sometimes used instead of “zero-one-indicator variable” are “dummy variable” or “binary variable”.
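To see the zero-one coding R uses for a factor, we can inspect the model matrix on a small toy factor (a sketch; `smoke` below is a made-up vector, not the data set):

```r
# sketch of R's zero-one indicator coding for a two-level factor
smoke <- factor(c("yes", "no", "yes", "no"))
model.matrix(~ smoke) # the column 'smokeyes' is 1 for "yes" and 0 for "no";
                      # "no" (the first level alphabetically) is the reference
```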

fit = lm(Weight ~ Gest + Smoke, data = dat)
betas = coef(fit)
yhat1 = betas[1] + betas[2]*(dat$Gest[dat$Smoke == 'yes']) + betas[3]
yhat2 = betas[1] + betas[2]*(dat$Gest[dat$Smoke == 'no'])

plot(dat$Gest, dat$Weight, type = 'n', ylim = c(2370, 3621), xlab = 'Length of Gestation', ylab = 'Birth Weight', main = 'Birth Weight vs Length of Gestation')
points(dat$Gest[dat$Smoke == 'yes'], dat$Weight[dat$Smoke == 'yes'], col = 'red')
points(dat$Gest[dat$Smoke == 'no'], dat$Weight[dat$Smoke == 'no'], col = 'blue')
lines(sort(dat$Gest[dat$Smoke == 'yes']), yhat1[order(dat$Gest[dat$Smoke == 'yes'])], col = 'red')
lines(sort(dat$Gest[dat$Smoke == 'no']), yhat2[order(dat$Gest[dat$Smoke == 'no'])], col = 'blue')
legend('topleft', bty = 'n', col = c('blue', 'red'), c('Non-Smoker', 'Smoker'), lty = c(1, 1))

Interpretations:

  • The blue circles represent the data on non-smoking mothers (\(x_2 = 0\)), while the red circles represent the data on smoking mothers (\(x_2 = 1\)).

  • The blue line represents the estimated linear relationship between length of gestation and birth weight for non-smoking mothers, while the red line represents the estimated linear relationship for smoking mothers.

summary(fit)

Call:
lm(formula = Weight ~ Gest + Smoke, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-223.693  -92.063   -9.365   79.663  197.507 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2389.573    349.206  -6.843 1.63e-07 ***
Gest          143.100      9.128  15.677 1.07e-15 ***
Smokeyes     -244.544     41.982  -5.825 2.58e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 115.5 on 29 degrees of freedom
Multiple R-squared:  0.8964,    Adjusted R-squared:  0.8892 
F-statistic: 125.4 on 2 and 29 DF,  p-value: 5.289e-15

The estimated regression equation is \[\text{Weight} = -2389.6 + 143.10 \text{Gest} - 244.5 \text{Smoke}\]

Therefore, the estimated regression equation for non-smoking mothers (smoke = 0) is: \[\text{Weight} = -2389.6 + 143.10 \text{Gest} \] and the estimated regression equation for smoking mothers (smoke = 1) is: \[\text{Weight} = -2634.1 + 143.10 \text{Gest}\] That is, we obtain two different parallel estimated lines. The difference between the two lines, \(-244.5\) (grams), represents the difference in the average birth weights for a fixed gestation length for smoking and non-smoking mothers in the sample.
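As a quick arithmetic check (using the coefficient estimates from the summary output above), the smokers’ intercept is the overall intercept plus the Smoke coefficient:

```r
# check: intercepts of the two parallel lines from the fitted coefficients
b0 <- -2389.573; b.smoke <- -244.544  # estimates from summary(fit)
b0            # intercept for non-smokers
b0 + b.smoke  # intercept for smokers: -2634.117, i.e., -2634.1 rounded
```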

Comments:

The general interpretation of the regression coefficients in a regression model with one \((0, 1)\) binary indicator variable and one quantitative predictor is:

  1. \(\beta_1\) represents the change in the mean response for each additional unit increase in the quantitative predictor \(x_1\) for both groups.

  2. \(\beta_2\) represents how much higher (or lower) the mean response function of the second group is than that of the first group for any value of \(x_1\).

Now, let’s use our model and analysis to answer the research question:

Q: Is a baby’s birth weight related to smoking during pregnancy, after taking into account length of gestation?

A: We can answer our research question by testing the null hypothesis \(H_0 : \beta_2 = 0\) vs \(H_1 : \beta_2\neq 0\).

In conclusion:

  • There is sufficient evidence to conclude that there is a statistically significant difference between the mean birth weight of babies born to smoking mothers and that of babies born to non-smoking mothers, after taking into account length of gestation.

  • A 95% confidence interval for \(\beta_2\) tells us the magnitude of the difference.

confint(fit, level = 0.95)
                 2.5 %     97.5 %
(Intercept) -3103.7795 -1675.3663
Gest          124.4312   161.7694
Smokeyes     -330.4064  -158.6817

We can be 95% confident that the mean birth weight of smoking mothers is between \(158.7\) and \(330.4\) grams less than the mean birth weight of non-smoking mothers, regardless of the length of gestation.

Similarly, we can answer the question:

Q: How is birth weight related to gestation, after taking into account a mother’s smoking status? We answer it by testing the null hypothesis \(H_0 : \beta_1 = 0\) vs \(H_1 : \beta_1\neq 0\).

In conclusion:

  • There is sufficient evidence to conclude that there is a statistically significant linear relationship between birth weight and gestation, after taking into account a mother’s smoking status.

  • A 95% confidence interval for \(\beta_1\) tells us the magnitude of the effect.

We can be 95% confident that the mean birth weight increases by between \(124.4\) and \(161.8\) grams for each additional week of gestation, regardless of the smoking status of the mother.

MLR with multiple categorical variables

In general, a categorical variable defining \(K\) groups should be represented by \(K - 1\) indicator variables. For example:

  • If your categorical variable defines 2 groups, then you need 1 indicator variable. (We have looked at an example of this case.)

  • If your categorical variable defines 3 groups, then you need 2 indicator variables.

Choose one group or category to be the “reference group”:

Often it will be clear from the application which group should be the reference group, such as a control group in a medical experiment (e.g., the “non-smoking” group as the reference group in the birth weight example), but, if not, then the group with the most observations is often a good choice.
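In R, the first level of a factor (alphabetical by default) serves as the reference group, and `relevel()` lets us choose it explicitly. A sketch with a made-up factor `trt`:

```r
# sketch: setting the reference group of a factor with relevel()
trt <- factor(c("A", "B", "C", "B"))
levels(trt)                      # "A" "B" "C": "A" is the reference by default
trt2 <- relevel(trt, ref = "B")  # make "B" the reference group instead
levels(trt2)                     # "B" "A" "C"
```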

What is Implied by the Parallelness of the Two Lines?

In light of the plot, let’s investigate those questions again:

  1. Does the effect of the gestation length on mean birth weight depend on the mother’s smoking status?
  • The answer is NO. Regardless of whether or not the mother is a smoker, for each additional one-week of gestation, the mean birth weight is predicted to increase by 143 grams.

  • This lack of interaction between the two predictors is exhibited by the parallelness of the two lines.

  2. Does the effect of smoking on mean birth weight depend on the length of gestation?
  • The answer is NO. For a fixed length of gestation, the mean birth weight of babies born to smoking mothers is predicted to be 245 grams lower than the mean birth weight of babies born to non-smoking mothers.

  • Again, this lack of interaction between the two predictors is exhibited by the parallelness of the two lines.

MLR with Interaction Terms

Loosely speaking, interaction effects occur in regression when the influence of an independent variable (predictor) on the target variable depends on the behavior of another independent variable. (When two predictors do not interact, we say that each predictor has an “additive effect” on the response.)

Now let’s take a look at an example where including “interaction terms” is appropriate.

Interaction between Categorical and Numerical Variables

Example: Some researchers were interested in comparing the effectiveness of three treatments for severe depression.

For the sake of simplicity, we denote the three treatments A, B, and C. The researchers collected the following data on a random sample of \(n = 36\) severely depressed individuals:

  • response: \(y_i\), measure of the effectiveness of the treatment for individual \(i\)

  • quantitative predictor: \(x_{i1}\), age (in years) of individual \(i\)

  • qualitative predictor: \(x_{i2} = 1\) if individual \(i\) received treatment A, and 0 if not

  • \(x_{i3} = 1\) if individual \(i\) received treatment B, and 0 if not

A scatter plot of the data with treatment effectiveness on the y-axis and age on the x-axis looks like:

dat = read.table('depression.txt', header = T)
head(dat)
   y age x2 x3 TRT
1 56  21  1  0   A
2 41  23  0  1   B
3 40  30  0  1   B
4 28  19  0  0   C
5 55  28  1  0   A
6 25  23  0  0   C
names(dat)
[1] "y"   "age" "x2"  "x3"  "TRT"
y = dat$y
age = dat$age
x2 = dat$x2
x3 = dat$x3
plot(age, y, type = 'n', xlab = 'Age', ylab = 'Treatment Effectiveness', main = 'Treatment Effectiveness vs Age')
points(age[x2 == 1], y[x2 == 1], col = 'blue')
points(age[x3 == 1], y[x3 == 1], col = 'green')
points(age[x3 == 0 & x2 == 0], y[x3 == 0 & x2 == 0], col = 'red')
legend('bottomright', bty = 'n', col = c('blue','green', 'red'), c('A', 'B', 'C'), pch = c(1, 1, 1))

The blue circles represent the data for individuals receiving treatment A, the green circles treatment B, and the red circles treatment C.

Fit a (second-order) multiple regression model with interaction terms

\[y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \beta_3x_{i3} + \beta_{12}x_{i1}x_{i2} + \beta_{13}x_{i1}x_{i3} + \epsilon_i\]

(The terms \(x_{i1}x_{i2}\) and \(x_{i1}x_{i3}\) are the interaction terms in the model.)

Let’s investigate the way the predictors have an “interaction effect” on the response:

  • If the patient receives A (\(x_{i2} = 1\), \(x_{i3} = 0\)), then

\[\operatorname{E}(y_i) = (\beta_0 + \beta_2) + (\beta_1 + \beta_{12})x_{i1}\]

  • If the patient receives B (\(x_{i2} = 0\), \(x_{i3} = 1\)), then

\[\operatorname{E}(y_i) = (\beta_0 + \beta_3) + (\beta_1 + \beta_{13})x_{i1}\]

  • If the patient receives C (\(x_{i2} = 0\), \(x_{i3} = 0\)), then

\[\operatorname{E}(y_i) =\beta_0 + \beta_1x_{i1}\]

Comments:

  • The predictors having an “interaction effect” on the mean response implies that

    • the effect of the individual’s age (\(x_{i1}\)) on the treatment’s mean effectiveness (\(\operatorname{E}(y_i)\)) depends on the treatment (\(x_{i2}\) and \(x_{i3}\))

    • the effect of treatment (\(x_{i2}\) and \(x_{i3}\)) on the treatment’s mean effectiveness (\(\operatorname{E}(y_i)\)) depends on the individual’s age (\(x_{i1}\)).

  • In general, saying that two predictors “interact” means:

    • Two predictors interact if the effect on the response variable of one predictor depends on the value of the other.

    • A slope parameter can no longer be interpreted as the change in the mean response for each unit increase in the predictor, while the other predictors are held constant.

fit = lm(y ~ age + x2 + x3 + age*x2 + age*x3)
summary(fit)

Call:
lm(formula = y ~ age + x2 + x3 + age * x2 + age * x3)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.4366 -2.7637  0.1887  2.9075  6.5634 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.21138    3.34964   1.854 0.073545 .  
age          1.03339    0.07233  14.288 6.34e-15 ***
x2          41.30421    5.08453   8.124 4.56e-09 ***
x3          22.70682    5.09097   4.460 0.000106 ***
age:x2      -0.70288    0.10896  -6.451 3.98e-07 ***
age:x3      -0.50971    0.11039  -4.617 6.85e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.925 on 30 degrees of freedom
Multiple R-squared:  0.9143,    Adjusted R-squared:  0.9001 
F-statistic: 64.04 on 5 and 30 DF,  p-value: 4.264e-15

The estimated regression equation is

\[\hat y_i = 6.21 + 1.0334 \text{age}_i + 41.30 x_{i2} + 22.71 x_{i3} - 0.703 \text{age}_i \times x_{i2} - 0.510 \text{age}_i \times x_{i3}\]

That is, we obtain three “best fitting” lines, one for each treatment (A, B, and C):

  • If the patient receives A (\(x_{i2} = 1\), \(x_{i3} = 0\)), then

\[\hat y_i = 47.5 + 0.33x_{i1}\]

  • If the patient receives B (\(x_{i2} = 0\), \(x_{i3} = 1\)), then

\[\hat y_i = 28.9 + 0.52x_{i1}\]

  • If the patient receives C (\(x_{i2} = 0\), \(x_{i3} = 0\)), then

\[\hat y_i =6.21 + 1.03x_{i1}\]
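The three lines above can be recovered directly from the coefficient vector; a sketch using the estimates from the summary output:

```r
# group-specific intercepts and slopes from the fitted coefficients
b <- c(6.21138, 1.03339, 41.30421, 22.70682, -0.70288, -0.50971) # from summary(fit)
c(intercept = b[1] + b[3], slope = b[2] + b[5]) # treatment A: about 47.5, 0.33
c(intercept = b[1] + b[4], slope = b[2] + b[6]) # treatment B: about 28.9, 0.52
c(intercept = b[1],        slope = b[2])        # treatment C: about 6.21, 1.03
```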

Interpretations:

The effect of age on the predicted treatment effectiveness depends on the treatment given. That is, age appears to interact with treatment in its impact on treatment effectiveness. Specifically,

  • For patients in this study receiving treatment A, the effectiveness of the treatment is predicted to increase 0.33 units for every additional year in age.

  • For patients in this study receiving treatment B, the effectiveness of the treatment is predicted to increase 0.52 units for every additional year in age.

  • For patients in this study receiving treatment C, the effectiveness of the treatment is predicted to increase 1.03 units for every additional year in age.

The interaction is exhibited graphically by the “non-parallelness” of the lines.

bs = coef(fit)
yhat1 = bs[1] + bs[3] + (bs[2] + bs[5])*(age[x2 == 1])
yhat2 = bs[1] + bs[4] + (bs[2] + bs[6])*(age[x3 == 1])
yhat3 = bs[1] + bs[2]*(age[x3 == 0 & x2 == 0])
plot(age, y, type = 'n', xlab = 'Age', ylab = 'Treatment Effectiveness', main = 'Treatment Effectiveness vs Age')
points(age[x2 == 1], y[x2 == 1], col = 'blue')
points(age[x3 == 1], y[x3 == 1], col = 'green')
points(age[x3 == 0 & x2 == 0], y[x3 == 0 & x2 == 0], col = 'red')
lines(age[x2 == 1], yhat1, col = 'blue')
lines(age[x3 == 1], yhat2, col = 'green')
lines(age[x3 == 0 & x2 == 0], yhat3, col = 'red')
legend('bottomright', bty = 'n', col = c('blue','green', 'red'), c('A', 'B', 'C'), lty = c(1,1,1), pch = c(1, 1, 1))

Comments:

  1. As usual, we first should evaluate the model (via diagnostics) before we use our estimated model to draw conclusions about the larger population of depressed individuals.

  2. Answer scientific research questions with hypothesis testing tools. For example,

  1. Q: For every age, is there a difference in the mean effectiveness for the three treatments? That is, to test the significance of the treatment effect,

\[H_0 : \beta_2 =\beta_3 = \beta_{12} = \beta_{13} = 0 \text{ vs } H_1 : \text{ at least one of these parameters is not } 0\]

  2. Q: Does the effect of age on the treatment’s effectiveness depend on treatment? That is, to test the interaction between age and treatment effect,

\[H_0 : \beta_{12} = \beta_{13} = 0 \text{ vs } H_1 : \text{ at least one of these parameters is not } 0\]

We implement the F-tests for nested model comparison in R:

# 1. F-test for treatment effect
mod.reduced = lm(y ~ age)
mod.full = lm(y ~ age + x2 + x3 + age*x2 + age*x3)
anova(mod.reduced, mod.full)
Analysis of Variance Table

Model 1: y ~ age
Model 2: y ~ age + x2 + x3 + age * x2 + age * x3
  Res.Df     RSS Df Sum of Sq     F    Pr(>F)    
1     34 1970.57                                 
2     30  462.15  4    1508.4 24.48 4.458e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 2. F-test for interaction between age and treatment effect
mod.reduced = lm(y ~ age + x2 + x3)
mod.full = lm(y ~ age + x2 + x3 + age*x2 + age*x3)
anova(mod.reduced, mod.full)
Analysis of Variance Table

Model 1: y ~ age + x2 + x3
Model 2: y ~ age + x2 + x3 + age * x2 + age * x3
  Res.Df     RSS Df Sum of Sq      F   Pr(>F)    
1     32 1165.57                                 
2     30  462.15  2    703.43 22.831 9.41e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can conclude that there is a significant difference in the mean effectiveness for the three treatments, and the effect of age on the treatment’s effectiveness depends on the treatment.

A Supplementary Example: Interaction between Categorical and Numeric Variables

In data set UN11, the places are divided into three groups. Note group is a categorical variable.

Dataset Description

National health, welfare, and education statistics for 210 places, mostly UN members, but also other areas like Hong Kong that are not independent countries.

library(alr4)
attach(UN11)
names(UN11)
[1] "region"    "group"     "fertility" "ppgdp"     "lifeExpF"  "pctUrban" 

region: region of the world

group: a factor with levels oecd for countries that are members of the OECD, the Organization for Economic Co-operation and Development, as of May 2012, africa for countries on the African continent, and other for all other countries. No OECD countries are located in Africa.

fertility: number of children per woman

ppgdp: Per capita gross domestic product in US dollars

lifeExpF: Female life expectancy, years

pctUrban: Percent Urban

Research Goal: We might want to investigate how lifeExpF depends on the categorical variable group. (In contrast, recall the research goal of fitting a one-way ANOVA model.)

In R, categorical variables are represented as factors, so we should first check whether group is a factor before we fit the model. Also recall that the function to convert a variable to a factor is as.factor().

# check if the class of object group is factor
is.factor(group)
[1] TRUE
# different categories of nations
levels(group)
[1] "oecd"   "other"  "africa"
factor.lm = lm(lifeExpF ~ group, data = UN11)
summary(factor.lm)

Call:
lm(formula = lifeExpF ~ group, data = UN11)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.8367  -3.3045   0.3635   2.7183  18.2277 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   82.446      1.128  73.095  < 2e-16 ***
groupother    -7.120      1.271  -5.602  7.1e-08 ***
groupafrica  -22.674      1.420 -15.968  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.28 on 196 degrees of freedom
Multiple R-squared:  0.6191,    Adjusted R-squared:  0.6152 
F-statistic: 159.3 on 2 and 196 DF,  p-value: < 2.2e-16

Notice that the fit does not contain an estimated slope for the group oecd. This is because R represents factor variables by numeric dummy variables, which are \(1\) if an observation comes from a group and \(0\) if not. In this case, R creates two dummy variables: \(x_{i1}=1\) if country \(i\) is in group other and \(x_{i1}=0\) otherwise; \(x_{i2}=1\) if country \(i\) is in group africa and \(x_{i2}=0\) otherwise. So, if a country has \(x_{i1}=x_{i2}=0\), it is in group oecd.
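Accordingly, the three estimated group means can be recovered from the coefficients (values taken from the summary(factor.lm) output above):

```r
# sketch: group means of lifeExpF implied by the dummy-variable coefficients
b <- c(82.446, -7.120, -22.674)  # (Intercept), groupother, groupafrica
b[1]        # mean for the reference group oecd: 82.446
b[1] + b[2] # mean for other:  75.326
b[1] + b[3] # mean for africa: 59.772
```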

Now, we consider adding a numerical predictor \(\log(\text{ppgdp})\) to the model. Before fitting a model, let’s do an exploratory plot first. We create a scatterplot of lifeExpF against \(\log(\text{ppgdp})\). Because we also have categorical information in the variable group, we can use different plotting symbols for each group of nations to visualize the differences.

plot(log(ppgdp), lifeExpF, pch = c(15, 1, 2)[group])
legend('topleft', pch = c(15, 1, 2), legend = levels(group))

As we know, a multiple linear regression (MLR) model involving a categorical predictor and a continuous predictor essentially combines the separate simple linear regression models for the three groups into one. So, in order to model separate slopes for each group, we should add an interaction between the numerical and categorical variables.

Let \(Y_i\) and \(x_{i3}\) be the values of lifeExpF and \(\log(\text{ppgdp})\) for the \(i\)th country, and \(x_{i1}\) and \(x_{i2}\) be the dummy variables for the categories in group. We fit a non-parallel model:

\[Y_{i}=\beta_{0}+\beta_{1} x_{i 1}+\beta_{2} x_{i 2}+\beta_{3} x_{i 3}+\beta_{13} x_{i 1} x_{i 3}+\beta_{23} x_{i 2} x_{i 3}+\varepsilon_{i}\]

# fit non-parallel MLR
np.lm = lm(lifeExpF ~ group + log(ppgdp) + group*log(ppgdp), data = UN11)
# alternatively, use an equivalent way to fit the same model:
# np.lm = lm(lifeExpF ~ group*(log(ppgdp)))
summary(np.lm)

Call:
lm(formula = lifeExpF ~ group + log(ppgdp) + group * log(ppgdp), 
    data = UN11)

Residuals:
    Min      1Q  Median      3Q     Max 
-18.634  -2.089   0.301   2.255  14.489 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)             59.2137    15.2203   3.890 0.000138 ***
groupother             -11.1731    15.5948  -0.716 0.474572    
groupafrica            -22.9848    15.7838  -1.456 0.146954    
log(ppgdp)               2.2425     1.4664   1.529 0.127844    
groupother:log(ppgdp)    0.9294     1.5177   0.612 0.540986    
groupafrica:log(ppgdp)   1.0950     1.5785   0.694 0.488703    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.129 on 193 degrees of freedom
Multiple R-squared:  0.7498,    Adjusted R-squared:  0.7433 
F-statistic: 115.7 on 5 and 193 DF,  p-value: < 2.2e-16