Find a data set (no time series data) with:
1. A minimum of 50 observations. There is no precise limit on the maximum number of observations.
2. At least 5 independent variables, with a maximum of approximately 15. Ideally, at least 40% of your independent variables should be continuous.
3. One possibility is to use stepwise regression first to reduce the number of independent variables. You can also use principal components or other methods.
4. Ideally, I would like you to have at least 3 to 5 independent variables left after stepwise regression. Also, test for second-order (squared) terms and interactions.
5. At the beginning, give me some background on the data set. That is, tell me where you got the data (including a link if it came from the web). Describe the independent variables and the dependent variable (these can also be listed in a table). Tell me what you expected to happen before you analyzed the data: which variables you thought would be most important, and the signs (positive or negative) you expected for the coefficients.
6. Run the analysis.
7. For each possible model you consider, list R^2, adjusted R^2, PRESS, MSE, and Mallows' Cp.
8. Check for multicollinearity, outlier observations, influential observations, and heteroscedasticity, and check the rest of the regression assumptions. Do any plots, tests, and transformations that are necessary. You can cut and paste these results into Word or any other word-processing program, then describe what you see in the plots, what the tests are for, and what they tell you.
9. At the end, tell me the best model and why you picked it. Also, explain what each of the coefficients in the final model means in layperson's terms. Finally, give a conclusion with your overall thoughts on what the model means for predicting your dependent variable.
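
As an illustration of items 7 and 8, the sketch below computes R^2, adjusted R^2, PRESS, MSE, and Mallows' Cp for a few candidate models, plus variance inflation factors (VIFs) as one possible multicollinearity check. It is only a minimal example under stated assumptions: the data are synthetic, the function names `fit_metrics` and `vif` are hypothetical helpers written for this example, and the choice of Python with NumPy is an assumption, since the assignment does not require any particular software. Cp here uses the MSE of the full model as the estimate of the error variance.

```python
import numpy as np

def fit_metrics(X, y, sigma2_full=None):
    """Least-squares fit with an intercept; returns the item-7 statistics.

    X: (n, k) array of predictors WITHOUT an intercept column.
    sigma2_full: MSE of the full model, used for Mallows' Cp.
                 If None, this model's own MSE is used (so Cp equals p).
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])      # design matrix with intercept
    p = A.shape[1]                            # parameters, including intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    mse = sse / (n - p)
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    # Leverages h_ii = diagonal of the hat matrix, obtained from a thin QR.
    Q, _ = np.linalg.qr(A)
    h = (Q ** 2).sum(axis=1)
    # PRESS = sum of squared leave-one-out prediction errors e_i / (1 - h_ii).
    press = float(((resid / (1.0 - h)) ** 2).sum())
    if sigma2_full is None:
        sigma2_full = mse
    cp = sse / sigma2_full - (n - 2 * p)
    return {"p": p, "R2": r2, "R2_adj": r2_adj,
            "PRESS": press, "MSE": mse, "Cp": cp}

def vif(X):
    """VIF_j = 1 / (1 - R2_j), where R2_j regresses column j on the others."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2_j = fit_metrics(others, X[:, j])["R2"]
        out.append(1.0 / (1.0 - r2_j))
    return out

# Synthetic data (an assumption for illustration only): 60 observations,
# 5 continuous predictors, of which only the first two affect y.
rng = np.random.default_rng(0)
n = 60
X = rng.normal(size=(n, 5))
y = 2.0 + 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

full = fit_metrics(X, y)
sigma2_full = full["MSE"]

# Hypothetical candidate models, given as lists of column indices.
candidates = {"x0..x4 (full)": [0, 1, 2, 3, 4], "x0, x1": [0, 1], "x0": [0]}
results = {name: fit_metrics(X[:, cols], y, sigma2_full)
           for name, cols in candidates.items()}

for name, m in results.items():
    print(f"{name:15s} p={m['p']}  R2={m['R2']:.3f}  R2adj={m['R2_adj']:.3f}  "
          f"PRESS={m['PRESS']:.1f}  MSE={m['MSE']:.3f}  Cp={m['Cp']:.2f}")

print("VIFs:", [round(v, 2) for v in vif(X)])
```

When comparing models in a table like this, a common rule of thumb is to prefer models with small PRESS and MSE, high adjusted R^2, and Cp close to p. VIFs well above 1 suggest that a predictor is largely explained by the others. The same quantities are available in most statistical packages, so a hand-rolled version like this is mainly useful for checking what the reported numbers mean.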