By beginning this assignment, you affirm that you will not give or receive any
unauthorized help, and that all work will be your own. You agree to abide by Seneca’s
Academic Integrity Policy, and you understand any violation of academic integrity will be
subject to the penalties outlined in the policy.
Problem 1 (35%) – File: MALL.XLS
A national chain of women’s clothing stores with locations in large shopping malls
thinks that it can do a better job of planning renovations and expansions if it
understands which variables affect sales. It plans a small pilot study of stores in 25
different mall locations. The data it collects consist of monthly sales, store size (sq. ft),
number of linear feet of window display, number of competitors located in the mall, size of
the mall (sq. ft), and distance to the nearest competitor (ft).
1. Define a multiple regression model for the data. (6 marks)
2. Interpret the values of the coefficients in the model. (15 marks)
3. Test whether the model as a whole is significant. At the 0.05 level of significance,
what is your conclusion? (2 marks)
4. Use the model to predict monthly sales for each of the stores in the study. (6
marks)
5. Find and interpret the value of R² for this model. (2 marks)
6. Test the individual regression coefficients (i.e., check the result of test statistics
that SAS or Excel provides). At the 0.05 level of significance, what are your
conclusions? (2 marks)
7. If you were going to drop just one variable from the model, which one would you
choose? Why? (2 marks)
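
As a rough illustration of what items 3 and 5 ask for, the sketch below computes R² and the overall F statistic from observed and fitted values. The function name and the toy numbers are hypothetical and are not taken from MALL.XLS; for the mall data the model has k = 5 predictors and n = 25 stores, so the F statistic would have 5 and 19 degrees of freedom. SAS or Excel reports the same quantities directly.

```python
def r_squared_and_f(y, fitted, k):
    """Return (R^2, overall F) for a least-squares fit with an intercept.

    y      : observed responses
    fitted : fitted values from the regression
    k      : number of predictors (excluding the intercept)
    """
    n = len(y)
    y_mean = sum(y) / n
    # Total and residual sums of squares.
    sst = sum((yi - y_mean) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    # For a least-squares fit with an intercept, SSR = SST - SSE.
    ssr = sst - sse
    r2 = 1.0 - sse / sst
    f_stat = (ssr / k) / (sse / (n - k - 1))
    return r2, f_stat


# Toy example (made-up numbers, one predictor): y regressed on x = 1..5
# gives fitted values 0.8, 2.0, 3.2, 4.4, 5.6.
y_demo = [1, 2, 3, 4, 6]
fitted_demo = [0.8, 2.0, 3.2, 4.4, 5.6]
r2_demo, f_demo = r_squared_and_f(y_demo, fitted_demo, k=1)
print(round(r2_demo, 4), round(f_demo, 2))
```

The F statistic is then compared with the critical value (or its p-value is compared with 0.05) to judge whether the model as a whole is significant.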
Problem 2 (35%) – File: Bank.xlsx
Community Bank would like to increase the number of customers who use payroll direct
deposit. Management is considering a new sales campaign that will require each branch
manager to call each customer who does not currently use payroll direct deposit. As an
incentive to sign up for payroll direct deposit, each customer contacted will be offered
free checking for two years. Because of the time and cost associated with the new
campaign, management would like to focus their efforts on customers who have the
highest probability of signing up for payroll direct deposit. Management believes that the
average monthly balance in a customer’s checking account may be a useful predictor of
whether the customer will sign up for direct payroll deposit. To investigate the
relationship between these two variables, Community Bank tried the new campaign
using a sample of 50 checking account customers who do not currently use payroll
direct deposit. The sample data show the average monthly checking account balance
(in hundreds of dollars) and whether the customer contacted signed up for payroll direct
deposit (coded 1 if the customer signed up for payroll direct deposit and 0 if not).
1. For the Community Bank data, use SAS to formulate the estimated logistic
regression equation. (5 marks)
2. Estimate the probability that customers with an average monthly balance of
$1000 will sign up for direct payroll deposit. (5 marks)
3. Suppose Community Bank only wants to contact customers who have a 0.50 or
higher probability of signing up for direct payroll deposit. What is the average
monthly balance required to achieve this level of probability? (10 marks)
4. What is the estimated odds ratio? What is the interpretation? (15 marks)
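
The calculations in items 2 to 4 all follow from the two estimated coefficients, and the sketch below shows how. The function names and the coefficient values (-2.0 and 0.2) are hypothetical placeholders, not estimates from Bank.xlsx; substitute the intercept and slope that SAS reports. Because balance is recorded in hundreds of dollars, a $1000 balance corresponds to x = 10.

```python
import math


def signup_probability(balance_hundreds, b0, b1):
    """Estimated P(sign up) from the logistic model with logit b0 + b1*x."""
    logit = b0 + b1 * balance_hundreds
    return 1.0 / (1.0 + math.exp(-logit))


def balance_for_probability(p, b0, b1):
    """Balance (in hundreds of dollars) at which the estimated probability is p."""
    # Solve log(p / (1 - p)) = b0 + b1*x for x.
    return (math.log(p / (1.0 - p)) - b0) / b1


def odds_ratio(b1):
    """Multiplicative change in the odds for a one-unit ($100) increase in balance."""
    return math.exp(b1)


# Placeholder coefficients for illustration only (NOT from Bank.xlsx).
B0_DEMO, B1_DEMO = -2.0, 0.2

p_at_1000 = signup_probability(10, B0_DEMO, B1_DEMO)        # $1000 -> x = 10
x_for_half = balance_for_probability(0.50, B0_DEMO, B1_DEMO)
or_demo = odds_ratio(B1_DEMO)
print(p_at_1000, x_for_half, or_demo)
```

At p = 0.50 the log-odds term is zero, so the required balance reduces to -b0/b1 (in hundreds of dollars).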
Problem 3 (30%) – File: Lakeland.xlsx
Over the past few years, the percentage of students who leave Lakeland College at the
end of the first year has increased. Last year Lakeland started a voluntary one-week
orientation program to help first-year students adjust to campus life. If Lakeland can
show that the orientation program has a positive effect on retention, they will consider
making the program a requirement for all first-year students. Lakeland’s administration
also suspects that students with lower GPAs have a higher probability of leaving
Lakeland at the end of the first year. To investigate the relation of these variables to
retention, Lakeland selected a random sample of 100 students from last year’s entering
class. The data are contained in the data set named Lakeland.
1. Write the logistic regression equation relating x to y. (5 marks)
2. For the Lakeland data, use SAS to compute the estimated logistic regression
equation. (5 marks)
3. Use the estimated logit computed above to estimate the probability that students
with a 2.5 grade point average who did not attend the orientation program will
return to Lakeland for their sophomore year. What is the estimated probability for
students with a 2.5 grade point average who attended the orientation program?
(10 marks)
4. What is the estimated odds ratio for the orientation program? Interpret it. (5
marks)
5. Would you recommend making the orientation program a required activity? Why
or why not? (5 marks)
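
For reference, the general form of a two-predictor logistic regression model can be written as below. The variable coding is an assumption, since the text above does not define it: y = 1 if the student returns for the sophomore year, x_1 is the grade point average, and x_2 = 1 if the student attended the orientation program (0 otherwise). Check the coding in the Lakeland data set before using it.

```latex
% Assumed coding: y = 1 (returns), x_1 = GPA, x_2 = 1 (attended orientation)
\begin{align*}
E(y) &= \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}
             {1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}
  && \text{(logistic regression equation)} \\
\hat{g}(x_1, x_2) &= b_0 + b_1 x_1 + b_2 x_2
  && \text{(estimated logit)} \\
\hat{y} &= \frac{e^{\hat{g}(x_1, x_2)}}{1 + e^{\hat{g}(x_1, x_2)}}
  && \text{(estimated probability)} \\
\text{estimated odds ratio for } x_2 &= e^{b_2}
\end{align*}
```

Under this coding, the two probabilities in item 3 come from evaluating the estimated logit at (x_1, x_2) = (2.5, 0) and (2.5, 1).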