
Assignment 3, due 04 August, 11:59 pm

Refer to Kenneth Train's website for more information.

Please note:

a. The assignment should be submitted to Moodle as a single PDF file; ONLY one submission is accepted.

b. The assignment MUST be typed; scanned documents are not acceptable.

c. Late submissions are accepted, but 10% of the mark will be deducted for each day late.

d. This is an individual assignment, so group submissions are not accepted.

PART 1

The purpose of this problem set is for you to develop an understanding of how to specify and interpret multinomial logit models using BIOGEME or R. You must provide your code as an appendix to the assignment.

The problem set uses data on the choice of heating system in California houses. The observations consist of 750 single-family houses in California that were newly built and had central air-conditioning. The choice is among heating systems.

Five types of systems are considered to have been possible:

(1) gas central,

(2) gas room,

(3) electric central,

(4) electric room,

(5) heat pump.

Data

Each line of data contains the attributes of the decision-maker and the attributes of all the available alternatives. The file "data.xls" is an Excel file containing the data. Open the file and examine the data. If you cannot read Excel files, the data.

There are 750 lines of data with 19 variables on each line. The variables are:

1.  idcase: gives the observation number (1-750)

2.  depvar: identifies the chosen alternative (1-5)

3.  ic1: installation cost for a gas central system

4.  ic2: installation cost for a gas room system

5.  ic3: installation cost for an electric central system

6.  ic4: installation cost for an electric room system

7.  ic5: installation cost for a heat pump

8.  oc1: annual operating cost for a gas central system

9.  oc2: annual operating cost for a gas room system

10. oc3: annual operating cost for an electric central system

11. oc4: annual operating cost for an electric room system

12. oc5: annual operating cost for a heat pump

13. income: annual income of the household

14. agehead: age of the household head

15. rooms: number of rooms in the house

16. ncoast: identifies whether the house is in the northern coastal region

17. scoast: identifies whether the house is in the southern coastal region

18. mountn: identifies whether the house is in the mountain region

19. valley: identifies whether the house is in the central valley region

To run the code in BIOGEME, you must add availability variables to the data indicating that each alternative is available to every individual (refer to the examples on the BIOGEME webpage).
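One way to add the availability variables, sketched in plain Python under the assumption that each row of data.xls has already been read into a dict keyed by the variable names listed above. The column names `av1` to `av5` and the function `add_availability` are hypothetical; the BIOGEME examples show how such columns enter the model. Because any of the five systems could have been installed in any house, every flag is 1.

```python
def add_availability(rows, n_alts=5):
    """Return copies of the rows with availability flags av1..avN added.

    Every heating system could have been installed in every house, so each
    flag is set to 1. The column names av1..av5 are hypothetical.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for j in range(1, n_alts + 1):
            new_row[f"av{j}"] = 1
        out.append(new_row)
    return out


# Two made-up rows carrying only a few of the 19 variables, for illustration.
rows = [{"idcase": 1, "depvar": 1}, {"idcase": 2, "depvar": 5}]
rows_av = add_availability(rows)
print(rows_av[0])
```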

Note that the attributes of the alternatives, namely installation cost and operating cost, take a different value for each alternative. Therefore, there are 5 installation costs (one for each of the 5 systems) and 5 operating costs. To estimate the logit model, the researcher needs data on the attributes of all the alternatives, not just the attributes of the chosen alternative. For example, it is not sufficient for the researcher to determine how much was paid for the system that was actually installed (i.e., the bill for the installation). The researcher needs to determine how much it would have cost to install each of the systems if they had been installed. The importance of costs in the choice process (i.e., the coefficients of installation and operating costs) is determined by comparing the costs of the chosen system with the costs of the non-chosen systems.
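The comparison described above is what the logit formula performs: each house's probability of choosing system j depends on the costs of all five systems, P_j = exp(V_j) / sum over k of exp(V_k). The following plain-Python sketch assumes a utility that is linear in installation and operating cost. The function name, cost figures, and coefficient values are hypothetical and are not taken from data.xls.

```python
import math


def logit_probabilities(ic, oc, b_ic, b_oc):
    """Multinomial logit choice probabilities for one house.

    ic, oc: installation and operating costs, one entry per alternative
    (chosen and non-chosen alike).
    Representative utility: V_j = b_ic * ic_j + b_oc * oc_j.
    """
    v = [b_ic * i + b_oc * o for i, o in zip(ic, oc)]
    v_max = max(v)  # subtracting the max avoids overflow; probabilities are unchanged
    expv = [math.exp(x - v_max) for x in v]
    total = sum(expv)
    return [e / total for e in expv]


# Hypothetical costs for the five systems and hypothetical coefficients.
ic = [800.0, 900.0, 950.0, 1000.0, 1100.0]
oc = [200.0, 190.0, 550.0, 500.0, 230.0]
probs = logit_probabilities(ic, oc, b_ic=-0.006, b_oc=-0.005)
print([round(p, 3) for p in probs])
```

Raising the cost of any one system lowers that system's probability and raises the others'. The coefficients are identified through this comparison.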

For these data, the costs were calculated as the amount the system would cost if it were installed in the house, given the characteristics of the house (such as size), the price of gas and electricity in the house location, and the weather conditions in the area (which determine the necessary capacity of the system and the amount it will be run). These costs are conditional on the house having central air-conditioning. (That is why the installation cost of gas central is lower than that of gas room: the central system can use the air-conditioning ducts that have already been installed.)

Question 1

Run a logit model with installation cost and operating cost as the only explanatory variables.

Evaluate the estimation results:

(a) Do the estimated coefficients have the expected signs?

(b) Are both coefficients significantly different from zero?

(c) How closely do the predicted shares match the actual shares of houses with each heating system?

(d) The ratio of coefficients usually provides economically meaningful information. The willingness to pay (wtp) through higher installation cost for a one-dollar reduction in annual operating costs is the ratio of the operating cost coefficient to the installation cost coefficient. What is the estimated wtp from this model? Is it reasonable in magnitude?
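Parts (c) and (d) can be checked with a few lines of plain Python once the coefficients are estimated. In this sketch, the predicted share of an alternative is the average of its logit probability over all houses. The actual share is the fraction of houses whose `depvar` equals that alternative, and wtp is b_oc / b_ic as defined in (d). The record layout (`depvar`, `ic`, `oc` keys), the two houses, and the coefficient values are hypothetical placeholders, not estimation results.

```python
import math


def logit_probabilities(v):
    """Logit probabilities from a list of representative utilities."""
    v_max = max(v)
    expv = [math.exp(x - v_max) for x in v]
    total = sum(expv)
    return [e / total for e in expv]


def predicted_shares(houses, b_ic, b_oc):
    """Average predicted probability of each alternative over all houses."""
    n_alts = len(houses[0]["ic"])
    shares = [0.0] * n_alts
    for h in houses:
        v = [b_ic * i + b_oc * o for i, o in zip(h["ic"], h["oc"])]
        for j, p in enumerate(logit_probabilities(v)):
            shares[j] += p / len(houses)
    return shares


def actual_shares(houses, n_alts):
    """Observed share of houses choosing each alternative (depvar is 1-based)."""
    counts = [0] * n_alts
    for h in houses:
        counts[h["depvar"] - 1] += 1
    return [c / len(houses) for c in counts]


def wtp(b_oc, b_ic):
    """Extra installation cost accepted for a $1/year cut in operating cost."""
    return b_oc / b_ic


# Hypothetical data for two houses and hypothetical coefficients.
houses = [
    {"depvar": 1, "ic": [800.0, 900.0, 950.0, 1000.0, 1100.0],
     "oc": [200.0, 190.0, 550.0, 500.0, 230.0]},
    {"depvar": 5, "ic": [850.0, 950.0, 900.0, 980.0, 1050.0],
     "oc": [210.0, 180.0, 520.0, 480.0, 220.0]},
]
print(predicted_shares(houses, b_ic=-0.006, b_oc=-0.005))
print(actual_shares(houses, 5))  # [0.5, 0.0, 0.0, 0.0, 0.5]
print(wtp(b_oc=-0.005, b_ic=-0.006))
```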


Question 2

Add alternative-specific constants to the model for alternatives 1-4. Remember that you can only add 4 constants since there are 5 alternatives; by adding constants for alternatives 1-4, you are normalizing the constant for alternative 5 to zero.
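Only 4 of the 5 constants can be estimated because logit probabilities depend only on differences in utility. Adding the same amount to every constant shifts every utility equally and leaves every probability unchanged, so the data cannot determine the level of the constants. The sketch below checks this numerically. The cost parts of utility and the constants are hypothetical values.

```python
import math


def logit_probabilities(v):
    """Logit probabilities from a list of representative utilities."""
    v_max = max(v)
    expv = [math.exp(x - v_max) for x in v]
    total = sum(expv)
    return [e / total for e in expv]


# Hypothetical cost part of utility for one house, plus constants k1..k4
# with k5 normalized to zero.
cost_part = [-6.0, -6.5, -8.2, -7.9, -7.7]
constants = [1.7, 0.3, 1.6, 1.2, 0.0]
v = [c + k for c, k in zip(cost_part, constants)]

# Shift all five constants by the same amount: the probabilities do not move.
shift = 2.5
v_shifted = [c + k + shift for c, k in zip(cost_part, constants)]
p = logit_probabilities(v)
p_shifted = logit_probabilities(v_shifted)
print(max(abs(a - b) for a, b in zip(p, p_shifted)))
```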

(a) How well do the estimated probabilities match the shares of customers choosing each alternative? Note that they match exactly: alternative-specific constants in a logit model ensure that the average probabilities equal the observed shares.

(b) Suppose you had included constants for alternatives 1, 3, 4, and 5, with the constant for alternative 2 normalized to zero. What would be the estimated coefficient of the constant for alternative 1? Figure this out logically rather than actually estimating the model.
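The exact match in (a) follows from the maximum-likelihood first-order condition for each constant: the sum over houses of (chosen indicator minus P_nj) equals zero. The sketch below arrives at the same equality by a calibration iteration, not by BIOGEME's estimator. Holding the cost coefficients fixed, it repeats k_j ← k_j + ln(S_j / average P_j) until the average predicted probabilities equal the target shares S_j, then renormalizes the last constant to zero. Because it solves the same first-order conditions for the constants, it should agree with maximum-likelihood constants given the same cost coefficients. The utilities, the shares, and the function names are hypothetical.

```python
import math


def logit_probabilities(v):
    """Logit probabilities from a list of utilities (max-shifted for stability)."""
    v_max = max(v)
    expv = [math.exp(x - v_max) for x in v]
    total = sum(expv)
    return [e / total for e in expv]


def average_probabilities(base_utils, k):
    """Mean predicted probability of each alternative, given constants k."""
    n = len(base_utils)
    avg = [0.0] * len(k)
    for v in base_utils:
        probs = logit_probabilities([x + kj for x, kj in zip(v, k)])
        for j, p in enumerate(probs):
            avg[j] += p / n
    return avg


def calibrate_constants(base_utils, target_shares, n_iter=500):
    """Adjust constants until average probabilities match target_shares."""
    k = [0.0] * len(target_shares)
    for _ in range(n_iter):
        avg = average_probabilities(base_utils, k)
        k = [kj + math.log(s / p) for kj, s, p in zip(k, target_shares, avg)]
        k = [kj - k[-1] for kj in k]  # keep the alternative-5 constant at zero
    return k


# Hypothetical cost-only utilities for three houses and hypothetical shares.
base_utils = [
    [-5.8, -6.4, -8.5, -8.5, -7.8],
    [-6.0, -6.1, -8.2, -8.0, -7.5],
    [-5.5, -6.6, -8.9, -8.7, -8.0],
]
shares = [0.5, 0.2, 0.1, 0.1, 0.1]
k = calibrate_constants(base_utils, shares)
print([round(x, 3) for x in k])
```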

Question 3

Now try some models with sociodemographic variables entering.

(a) Enter installation cost divided by income instead of installation cost. With this specification, the magnitude of the installation cost coefficient is inversely related to income, such that high-income households are less concerned with installation costs than lower-income households. Does dividing installation cost by income seem to make the model better or worse?

(b) Instead of dividing installation cost by income, enter alternative-specific income effects: enter income in the utility of alternative 1 with its own coefficient. Do similarly for alternatives 2-4, with the coefficient for alternative 5 normalized to zero.

What do the estimates imply about the impact of income on the choice of central systems versus room systems? Do these income terms enter significantly?

(c) Try other models. Determine which model you think is best for these data.
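The two specifications in (a) and (b) differ only in how the utility is built. The plain-Python sketch below writes each one as a function of a house record and omits the alternative-specific constants for brevity. The record layout, the income value (its units are not stated above), and all coefficients are hypothetical. It also checks why one income coefficient must be normalized: adding the same amount to every income coefficient leaves the probabilities unchanged.

```python
import math


def logit_probabilities(v):
    """Logit probabilities from a list of representative utilities."""
    v_max = max(v)
    expv = [math.exp(x - v_max) for x in v]
    total = sum(expv)
    return [e / total for e in expv]


def utility_ic_over_income(house, b_ic_inc, b_oc):
    """(a): V_j = b_ic_inc * ic_j / income + b_oc * oc_j (constants omitted)."""
    return [b_ic_inc * ic / house["income"] + b_oc * oc
            for ic, oc in zip(house["ic"], house["oc"])]


def utility_income_effects(house, b_ic, b_oc, b_inc):
    """(b): V_j = b_ic * ic_j + b_oc * oc_j + b_inc[j] * income.

    b_inc has one coefficient per alternative; alternative 5's entry is 0.
    """
    return [b_ic * ic + b_oc * oc + g * house["income"]
            for ic, oc, g in zip(house["ic"], house["oc"], b_inc)]


# A hypothetical house and hypothetical coefficients.
house = {"income": 5.0,
         "ic": [800.0, 900.0, 950.0, 1000.0, 1100.0],
         "oc": [200.0, 190.0, 550.0, 500.0, 230.0]}
b_inc = [0.05, -0.1, 0.02, -0.03, 0.0]
p = logit_probabilities(utility_income_effects(house, -0.006, -0.005, b_inc))

# Shifting every income coefficient equally changes nothing, hence the normalization.
p_shift = logit_probabilities(
    utility_income_effects(house, -0.006, -0.005, [g + 0.3 for g in b_inc]))
print(max(abs(a - b) for a, b in zip(p, p_shift)))
```

In (a), the installation-cost term shrinks as income rises, which is how the specification makes higher-income households less sensitive to installation cost.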

Question 4

We now consider the use of the logit model for prediction. Specify a model with installation costs, operating costs, and alternative-specific constants. Run the model (or retrieve your previous output). You will use this model for prediction in Question 5.

Question 5

The California Energy Commission (CEC) is considering whether to offer rebates on heat pumps. The CEC wants to predict the effect of the rebates on the heating system choices of customers in California. The rebates will be set at 10% of the installation cost.

Using the estimated coefficients from the model in Question 4, calculate the predicted shares with the heat pump installation cost reduced by the rebate instead of its original value. How much do the rebates raise the share of houses with heat pumps?
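The rebate simulation involves no re-estimation. Keep the estimated coefficients, multiply the heat pump installation cost (ic5) by 0.9, recompute each house's probabilities, and average them. The plain-Python sketch below follows those steps. The two houses, the constants, and the cost coefficients are hypothetical stand-ins for the Question 4 estimates.

```python
import math


def logit_probabilities(v):
    """Logit probabilities from a list of representative utilities."""
    v_max = max(v)
    expv = [math.exp(x - v_max) for x in v]
    total = sum(expv)
    return [e / total for e in expv]


def average_shares(houses, asc, b_ic, b_oc, ic_factor):
    """Mean predicted probabilities with installation costs scaled by ic_factor.

    asc: constants for alternatives 1-5 (alternative 5 fixed at 0).
    ic_factor: one multiplier per alternative, e.g. 0.9 for a 10% rebate.
    """
    shares = [0.0] * len(asc)
    for h in houses:
        v = [a + b_ic * ic * f + b_oc * oc
             for a, ic, f, oc in zip(asc, h["ic"], ic_factor, h["oc"])]
        for j, p in enumerate(logit_probabilities(v)):
            shares[j] += p / len(houses)
    return shares


# Hypothetical data and coefficients standing in for the Question 4 model.
houses = [
    {"ic": [800.0, 900.0, 950.0, 1000.0, 1100.0],
     "oc": [200.0, 190.0, 550.0, 500.0, 230.0]},
    {"ic": [850.0, 950.0, 900.0, 980.0, 1050.0],
     "oc": [210.0, 180.0, 520.0, 480.0, 220.0]},
]
asc = [1.7, 0.3, 1.6, 1.2, 0.0]
base = average_shares(houses, asc, -0.006, -0.005, [1, 1, 1, 1, 1])
rebate = average_shares(houses, asc, -0.006, -0.005, [1, 1, 1, 1, 0.9])
print("heat pump share:", round(base[4], 4), "->", round(rebate[4], 4))
```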

PART 2


The data file contains data on the choice of heating and central cooling system for 250 single-family, newly built houses in California. The data are held in the "long" format, where each line of data represents an alternative.
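Unlike the wide file in Part 1, each house here spans 7 rows. A minimal sketch, assuming rows have been read into dicts keyed by `idcase`, `idalt`, and `depvar`, groups them back into one record per house and recovers the chosen alternative. The cost and income columns are left out because their variable names are not given above. The function names and example rows are hypothetical.

```python
from collections import defaultdict


def group_by_house(rows):
    """Collect long-format rows (one per alternative) into {idcase: {idalt: row}}."""
    houses = defaultdict(dict)
    for row in rows:
        houses[row["idcase"]][row["idalt"]] = row
    return dict(houses)


def chosen_alternative(alts):
    """Return the idalt whose depvar is 1; each house should have exactly one."""
    chosen = [idalt for idalt, row in alts.items() if row["depvar"] == 1]
    if len(chosen) != 1:
        raise ValueError(f"expected one chosen alternative, found {len(chosen)}")
    return chosen[0]


# Made-up rows for two houses with 7 alternatives each (cost columns omitted).
rows = [{"idcase": case, "idalt": alt, "depvar": int(alt == pick)}
        for case, pick in [(1, 1), (2, 4)] for alt in range(1, 8)]
houses = group_by_house(rows)
print({case: chosen_alternative(alts) for case, alts in houses.items()})
```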

The alternatives are:

1. Gas central heat with cooling

2. Electric central resistance heat with cooling

3. Electric room resistance heat with cooling

4. Electric heat pump, which provides cooling also

5. Gas central heat without cooling

6. Electric central resistance heat without cooling

7. Electric room resistance heat without cooling

The variables are:

1. idcase: identifies the house, 1-250.

2. idalt: identifies the alternative, 1-7

3. depvar: identifies whether the alternative was chosen, 1=chosen, 0=nonchosen

4. Installation cost, in dollars

5. Operating cost, in dollars per year

6. Income of household, in thousands of dollars per year (such that 20 means $20,000 income)

Start by estimating a nested logit model with the cooling alternatives (1-4) in one nest and the non-cooling alternatives (5-7) in another nest.
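The nested logit probability of alternative i in nest k is P(nest k) × P(i | nest k). Here P(i | nest k) = exp(V_i/λ_k) / sum over j in k of exp(V_j/λ_k), and P(nest k) = exp(λ_k I_k) / sum over l of exp(λ_l I_l), where I_k = ln sum over j in k of exp(V_j/λ_k) is the inclusive value and λ_k is the log-sum coefficient. The plain-Python sketch below only evaluates these probabilities for given values; BIOGEME does the estimation. The utilities, the nest names, and the λ values are hypothetical. With every λ equal to 1.0 the formula reduces to standard logit.

```python
import math


def nested_logit_probabilities(v, nests, lambdas):
    """Nested logit choice probabilities.

    v: dict alternative -> representative utility.
    nests: dict nest name -> list of alternatives in that nest.
    lambdas: dict nest name -> log-sum coefficient (1.0 gives standard logit).
    No overflow protection: intended for utilities of modest size.
    """
    inclusive = {}
    for name, alts in nests.items():
        lam = lambdas[name]
        inclusive[name] = math.log(sum(math.exp(v[j] / lam) for j in alts))
    denom = sum(math.exp(lambdas[n] * inclusive[n]) for n in nests)
    probs = {}
    for name, alts in nests.items():
        lam = lambdas[name]
        p_nest = math.exp(lam * inclusive[name]) / denom
        for j in alts:
            probs[j] = p_nest * math.exp(v[j] / lam - inclusive[name])
    return probs


# Hypothetical utilities for alternatives 1-7 and hypothetical log-sum values.
v = {1: -1.0, 2: -1.5, 3: -2.0, 4: -1.2, 5: -1.8, 6: -2.5, 7: -2.8}
nests = {"cooling": [1, 2, 3, 4], "no_cooling": [5, 6, 7]}
probs = nested_logit_probabilities(v, nests, {"cooling": 0.6, "no_cooling": 0.8})
print({j: round(p, 3) for j, p in probs.items()})
```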

Question 1:

Review the BIOGEME code and make sure you understand how it works.

(a) What does the estimated log-sum value tell you about the degree of correlation in unobserved factors over alternatives within each nest?

(b) Test the hypothesis that the log-sum coefficient is 1.0 (the value it takes in a standard logit model). Can the hypothesis that the true model is standard logit be rejected?

(c) Which nest is estimated to have the higher correlation in unobserved factors? Can you think of a real-world reason for this nest to have a higher correlation?

(d) What is the estimated willingness to pay for operating costs?
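Two routine calculations support parts (b) and (d). The first is a t-statistic against the null value 1.0, computed as (estimate − 1) / standard error rather than against zero. When both log-sum coefficients are restricted to 1, standard logit is nested in the two-nest model, so a likelihood-ratio test with 2 degrees of freedom offers another route. The second is willingness to pay, which uses the same coefficient ratio as in Part 1. The sketch uses hypothetical numbers, not estimates from these data. The chi-squared values are the standard 5% critical values.

```python
def t_statistic(estimate, std_err, null_value=1.0):
    """t-statistic for H0: parameter == null_value (1.0 for a log-sum test)."""
    return (estimate - null_value) / std_err


def lr_statistic(ll_restricted, ll_unrestricted):
    """Likelihood-ratio statistic: -2 * (LL_restricted - LL_unrestricted)."""
    return -2.0 * (ll_restricted - ll_unrestricted)


# 5% critical values of the chi-squared distribution, keyed by degrees of freedom.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991}


def wtp(b_oc, b_ic):
    """Extra installation cost accepted for a $1/year cut in operating cost."""
    return b_oc / b_ic


# Hypothetical numbers for illustration only, not estimates from the data.
t = t_statistic(estimate=0.6, std_err=0.15)
print("t =", round(t, 2), "| reject at 5% (two-sided):", abs(t) > 1.96)

lr = lr_statistic(ll_restricted=-200.0, ll_unrestricted=-195.0)
print("LR =", lr, "| reject at 5% with 2 df:", lr > CHI2_CRIT_5PCT[2])
```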

Question 2:

Re-estimate the model with the room alternatives (3 and 7) in one nest and the central alternatives (1 2 4 5 6) in another nest. (Note that a heat pump is a central system.)

(a) What does the estimate imply about the substitution patterns across alternatives? Do you think the estimate is plausible?

(b) Is the log-sum coefficient significantly different from 1?

(c) How does the value of the log-likelihood function for this model compare with that of the model in Question 1, where the cooling alternatives are in one nest and the non-cooling alternatives are in the other?

Question 3:

Rewrite the code to allow three nests. Estimate a model with alternatives 1, 2, 3 in one nest, alternative 4 alone in its own nest, and alternatives 5, 6, 7 in a third nest. Does this model seem better or worse than the model in Question 1, which puts alternative 4 in the same nest as alternatives 1, 2, 3?
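When the code is rewritten for three nests, the nest containing only alternative 4 behaves differently from the others. For a single-alternative nest, the log-sum coefficient drops out of the probabilities, because exp(λ · ln exp(V_4/λ)) = exp(V_4). That coefficient therefore cannot be estimated and can be fixed at 1.0. The plain-Python sketch below checks this numerically. The utilities, the nest names, and the λ values are hypothetical.

```python
import math


def nested_logit_probabilities(v, nests, lambdas):
    """Nested logit probabilities; nests maps a nest name to its alternatives."""
    inclusive = {n: math.log(sum(math.exp(v[j] / lambdas[n]) for j in alts))
                 for n, alts in nests.items()}
    denom = sum(math.exp(lambdas[n] * inclusive[n]) for n in nests)
    probs = {}
    for n, alts in nests.items():
        p_nest = math.exp(lambdas[n] * inclusive[n]) / denom
        for j in alts:
            probs[j] = p_nest * math.exp(v[j] / lambdas[n] - inclusive[n])
    return probs


# Hypothetical utilities; alternative 4 (heat pump) sits alone in its nest.
v = {1: -1.0, 2: -1.5, 3: -2.0, 4: -1.2, 5: -1.8, 6: -2.5, 7: -2.8}
three_nests = {"cool_resist": [1, 2, 3], "heat_pump": [4], "no_cooling": [5, 6, 7]}

base = nested_logit_probabilities(
    v, three_nests, {"cool_resist": 0.6, "heat_pump": 1.0, "no_cooling": 0.8})
other = nested_logit_probabilities(
    v, three_nests, {"cool_resist": 0.6, "heat_pump": 0.3, "no_cooling": 0.8})
# Changing the singleton nest's log-sum coefficient has no effect.
print(max(abs(base[j] - other[j]) for j in v))
```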