
STP 429 – Applied Regression

Lab #1: Simple Linear Regression (100 points)

Website for data collection: https://www.transtats.bts.gov/

On the left side, under Data finder, click Aviation

Under Databases, click Airline On-Time Performance Data

Click: Reporting Carrier On-Time Performance

On the left side, under Data tools, click Download

Data to be collected:

Select your state, set Filter Year to 2022, and set Filter Period to June, as shown in the example below:

 

For the field names include the following (by checking the appropriate boxes):

DayofMonth, Airline (Reporting Airline), Destination (Dest, DestCityName, DestState), Arrival Performance (all fields), Cause of Delay (all fields), Departure Delay, Distance

Feel free to include any additional fields that you think might be useful for your analysis. Once you have selected all fields, click Download to get a Zip file. Extracting the Zip file produces a CSV file that you can import into SAS.
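
As a rough illustration of the import step, the sketch below reads the downloaded CSV into a SAS dataset. The file path and the dataset name flights are placeholders chosen for this example, not names given in the assignment. The column names in your file depend on the download, so check them with PROC CONTENTS before writing any further code.

```sas
/* Minimal sketch: the path and the dataset name "flights" are hypothetical. */
proc import datafile="/path/to/your_download.csv"
    out=flights
    dbms=csv
    replace;
    getnames=yes;       /* use the first row of the CSV as variable names */
    guessingrows=max;   /* scan all rows before choosing variable types   */
run;

/* List the variable names and types that SAS actually created */
proc contents data=flights;
run;
```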

For the analysis: Include only flights for the destination city in your state that has the largest number of observations, and only those with arrival delays greater than zero (use an IF statement in the DATA step).
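
One possible way to carry out this subsetting is sketched below. It assumes the imported data are in a dataset named flights and that the destination city and arrival delay columns are named DestCityName and ArrDelay; all of these names are assumptions, so substitute the names shown by PROC CONTENTS for your own file. The output dataset name delayed and the macro variables are likewise only illustrative.

```sas
/* Assumed names: flights, DestCityName, ArrDelay (verify against your data). */

/* Frequency table of destination cities, most frequent first */
proc freq data=flights order=freq;
    tables DestCityName / nocum;
run;

/* Store the most frequent destination city in the macro variable topcity.
   Without SEPARATED BY, INTO keeps only the first row of the sorted result. */
proc sql noprint;
    select DestCityName, count(*) as n
        into :topcity trimmed, :topn trimmed
    from flights
    group by DestCityName
    order by n desc;
quit;

/* Subsetting IF: keep that city's flights with a positive arrival delay.
   Missing ArrDelay values (e.g. cancelled flights) fail the > 0 test. */
data delayed;
    set flights;
    if DestCityName = "&topcity" and ArrDelay > 0;
run;
```

Compare the value stored in topcity with the top row of the PROC FREQ table to confirm that the filter uses the city you expect.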

Dependent variable:  Arrival Delay (min)

Independent variables:  Use at least all 5 delay variables.

Goal for the analysis:  Determine which of the independent variables can produce the best model to predict arrival delay time.

Your step-by-step analysis should include:

•    Research Question – Develop a research question to explain why you are doing this analysis. What are you trying to learn from the data?

•    Exploratory Data Analysis – All variables used at the beginning of the analysis should be summarized in tables or charts. Each table or chart should be examined for the shape of the distribution and for potential outliers. Using the cancelled-flight indicator in the database, determine the percentage of all June flights that were cancelled. Additionally, produce a histogram of “Departure Delay” and note whether any flights would be considered unusual values.

•    Correlation Analysis (PROC CORR) – All potential independent variables should be correlated with the dependent variable. This should include scatterplots and relevant statistics. You should also determine whether distance is correlated with either arrival or departure delay time, and whether arrival and departure delay times are correlated with each other.

•    Simple Linear Regression (PROC REG) – The dependent variable should be regressed on the final independent variable. Report the model in equation form and interpret each coefficient in the context of the data. Analyze the regression output with careful attention to the relevant parts of the SAS output.
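
The analysis steps listed above could be sketched in SAS roughly as follows. This is only an outline under stated assumptions: flights is assumed to hold all imported June flights and delayed the filtered subset, and the variable names (Cancelled, DepDelay, ArrDelay, Distance, and the five cause-of-delay fields) are assumed to match the BTS field names. None of these names are guaranteed to match your file. The predictor in the final step is a placeholder, and your own correlation results should determine which variable you use.

```sas
/* Assumed datasets: flights (all June flights), delayed (filtered subset). */
/* Assumed variable names follow the BTS field names; verify them first.    */
ods graphics on;

/* EDA: the Percent column gives the share of cancelled flights */
proc freq data=flights;
    tables Cancelled;
run;

/* EDA: histogram of departure delay; the default output also lists
   the extreme observations, which helps flag unusual values */
proc univariate data=flights;
    var DepDelay;
    histogram DepDelay;
run;

/* Correlation of each candidate predictor with arrival delay,
   with one scatterplot per predictor */
proc corr data=delayed plots(maxpoints=none)=scatter(nvar=all);
    var CarrierDelay WeatherDelay NASDelay SecurityDelay
        LateAircraftDelay DepDelay Distance;
    with ArrDelay;
run;

/* Pairwise correlations among arrival delay, departure delay, and distance */
proc corr data=delayed;
    var ArrDelay DepDelay Distance;
run;

/* Simple linear regression; xvar is a placeholder for the predictor
   you select from the correlation analysis */
%let xvar = LateAircraftDelay;

proc reg data=delayed;
    model ArrDelay = &xvar;
run;
quit;
```

The fitted model can then be written as predicted ArrDelay = b0 + b1 * (chosen predictor), where b0 is the Intercept row and b1 the predictor row of the Parameter Estimates table in the PROC REG output.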

After completing your analysis, you will need to create a statistical report in the following format:

Abstract/Executive Summary (20 points) – High-level summary of the purpose, data, and results.

Data (10 points) – Discuss the data, including the number of observations, the variables used, and why you chose these variables.

Methodology (10 points) – What statistical procedures did you use to analyze the data?

Results (50 points) – This should include the analysis and results of each step, the decisions you made to proceed to the next step, and why you made them. All graphs and tables should be included and referenced, preferably in an Appendix. All SAS code should also be included in the Appendix, not in the body of the report.

Final Conclusions and Next Steps (10 points) – What were your overall conclusions from this analysis? Are there different steps you would take or different data you would use if you were to complete this analysis again?

Submission Guidelines – Submit the report (including all SAS code and graphs) as a PDF uploaded to Canvas by Sunday, September 18 at 11:59 PM. No late submissions will be accepted.