## 1. Salary differences by gender

A researcher is interested in whether there is gender discrimination in salaries in an industry and obtains a random sample of male and female wages in that industry. The data set is available on my website and assumes binary genders.

(a) Estimate the mean salaries for men and for women.

(b) Estimate the variance in salaries for men and for women.

(c) Does the data suggest that men earn more than women on average, assuming different variances?

(d) Examine whether the assumption of different variances is correct and make any appropriate adjustments to your test.

(e) Explain briefly why this might not answer the question of interest.
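As a starting point for parts (a) to (d), the calculations can be sketched in Python using only the standard library. This is a minimal sketch, not a prescribed method: the salary values are hypothetical placeholders (not the data set from the website), and `welch_t` is a helper name introduced here for illustration. It computes the unequal-variance (Welch) t-statistic with its approximate degrees of freedom, plus a simple ratio of sample variances that could be compared with F critical values.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, df) for H0: mean_a == mean_b without assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance uses the n - 1 denominator (sample variance).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical placeholder salaries (in thousands); NOT the course data set.
men = [52.0, 48.5, 61.0, 55.5, 58.0, 50.0]
women = [47.0, 51.5, 45.0, 49.5, 53.0, 46.0]

# Parts (a) and (b): sample means and sample variances by group.
print("means:", statistics.mean(men), statistics.mean(women))
print("variances:", statistics.variance(men), statistics.variance(women))

# Part (c): one-sided test of H0: mu_men <= mu_women uses this t-statistic.
t_stat, df = welch_t(men, women)
print("Welch t:", t_stat, "approx. df:", df)

# Part (d): ratio of sample variances, to compare with F(n_men - 1, n_women - 1).
f_ratio = statistics.variance(men) / statistics.variance(women)
print("variance ratio:", f_ratio)
```

If the variance ratio gives no reason to reject equal variances, the pooled-variance t-test would be the natural adjustment in part (d).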

## 2. Regression

In order to control for the possibility that wages might be affected by things other than gender, we might use regression analysis.

Estimate the following regression: $\text{Salary}_i = \beta_0 + \beta_1 \text{Education}_i + \beta_2 \text{Experience}_i + \beta_3 \text{Gender}_i + \varepsilon_i$.

(a) How do you interpret the coefficient estimate of $\beta_3$ and the $R^2$?

(b) Are the coefficient estimates statistically significant?

(c) What does the F-test tell us?

(d) Do the results agree with Question 1 and if not why might this be?

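Now estimate a new regression as follows: $\text{Salary}_i = \beta_0 + \beta_1 \text{Education}_i + \beta_2 \text{Experience}_i + \beta_3 \text{Gender}_i + \beta_4 (\text{Education}_i \times \text{Gender}_i) + \beta_5 (\text{Experience}_i \times \text{Gender}_i) + \varepsilon_i$.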

(e) Compare this regression with the one interpreted in part (a).

(f) Explain why this might not answer the question of whether there is gender discrimination in salaries in the industry.
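To see what the interaction terms add, it can help to write the second regression separately for each group. The sketch below assumes that Gender is a 0/1 dummy variable (the problem does not state which group is coded 1, so the coding is an assumption) and that the error term has conditional mean zero:

```latex
\begin{aligned}
\text{Gender}_i = 0:&\quad E[\text{Salary}_i] = \beta_0 + \beta_1\,\text{Education}_i + \beta_2\,\text{Experience}_i \\
\text{Gender}_i = 1:&\quad E[\text{Salary}_i] = (\beta_0 + \beta_3) + (\beta_1 + \beta_4)\,\text{Education}_i + (\beta_2 + \beta_5)\,\text{Experience}_i
\end{aligned}
```

Under this coding, $\beta_4$ and $\beta_5$ measure how the returns to education and experience differ between the two groups, and imposing $\beta_4 = \beta_5 = 0$ gives back the first regression.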

## 3. Principles of Regression

Consider the following multiple regression model: $y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \varepsilon_i$.

(a) Explain how the ordinary least squares estimator for $\beta = (\beta_0, \beta_1, \beta_2)$ is determined and how the expressions for the coefficient estimators $b = (b_0, b_1, b_2)$ are derived.

(b) Which assumptions are needed to make $b$ an unbiased estimator for $\beta$ and why? An unbiased estimator is one where the expected value of $b$ (not the actual value obtained from the sample) equals $\beta$, that is, $E[b] = \beta$.

(c) Explain how one can test the hypothesis that $\beta_1 = 1$. Which additional assumptions are needed and why?

(d) Suppose that $x_{i,1} = 2x_{i,2}$. What will happen if you try to estimate the above model?

(e) Again assuming $x_{i,1} = 2x_{i,2}$, how will estimating $y_i = \beta_0^* + \beta_1^* x_{i,2} + \varepsilon_i$ differ from estimating $y_i = \beta_0 + \beta_1 x_{i,1} + \varepsilon_i$, and what does this imply for the units in which the variables are measured?

(f) Now suppose that $x_{i,1} = x_{i,2} + u_i$, where $x_{i,2}$ and $u_i$ are uncorrelated. How will estimating $y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \varepsilon_i$ differ from estimating $y_i = \beta_0^* + \beta_1^* x_{i,1} + \beta_2^* u_i + \varepsilon_i$?
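Parts (a), (d) and (e) can be explored numerically. The sketch below is one possible illustration, not a required approach: it solves the normal equations $(X'X)b = X'y$ by Gaussian elimination in plain Python. The function name `ols_normal_equations` and all data values are hypothetical and chosen only for illustration. It shows that the system has no unique solution when $x_{i,1} = 2x_{i,2}$, and that rescaling a regressor rescales its slope.

```python
def ols_normal_equations(X, y, tol=1e-10):
    """Solve (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting.

    X is a list of rows, each already including a leading 1.0 for the intercept.
    Raises ValueError when X'X is (numerically) singular.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[r] * row[c] for row in X) for c in range(k)]
         + [sum(row[r] * yi for row, yi in zip(X, y))]
         for r in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < tol:
            raise ValueError("X'X is singular: regressors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[r][k] / A[r][r] for r in range(k)]


# Hypothetical illustrative data (not from any course data set).
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
x1 = [2.0 * v for v in x2]  # part (d): x1 is exactly twice x2

# Part (d): including both x1 and x2 makes X'X singular.
try:
    ols_normal_equations([[1.0, a, b] for a, b in zip(x1, x2)], y)
    collinear_failed = False
except ValueError as err:
    collinear_failed = True
    print("Estimation fails:", err)

# Part (e): regress y on x2 alone, then on x1 alone.
b_x2 = ols_normal_equations([[1.0, v] for v in x2], y)
b_x1 = ols_normal_equations([[1.0, v] for v in x1], y)
print("slope on x2:", b_x2[1], "slope on x1:", b_x1[1])
```

In this sketch the intercepts of the two simple regressions coincide and the slope on $x_{i,2}$ is twice the slope on $x_{i,1}$, which is the sense in which a change of units only rescales the coefficient. Statistical packages typically handle case (d) by dropping one regressor or reporting an error rather than returning estimates for both.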