Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Statistical Programming with SAS

(Fall 2022)

Mid-term project

1. This is an individual project. You are provided with an un-cleaned clinical baseline worksheet named BaselineRaw in one .xlsx file; Each student has a different dataset containing data from 20,000 patients;

2. You will need to upload the .xlsx file to your SAS OnDemand account and work on it with SAS Studio; After you finished your project, you will need to download your report and .sas program to submit to Canvas.

3. You need to use Libname xlsx engine OR Proc IMPORT to read the data file into SAS.

4. Do all the work (including reporting) within SAS, don’t edit the report with any non-SAS software, except that you can open the report file created from SAS in MS Word software to view your report before submission;

5. The Baseline data values (in BaselineRaw worksheet) have some problems and you need to clean it before doing calculations and analysis as instructed below:

6. Data cleaning (only for BaselineRaw worksheet):

a. For the BaselineRaw worksheet, correct some spelling errors and other problems in the values in these two variables ‘Sex’ and ‘Group’ so that:

- If it is na or n/a in any of these two variables, set it to missing;

- ‘Sex’ should have value of “F” or “M” only, missing values should remain missing;

Note:

1. ‘f’, ‘fema’, ‘female’, ‘F’ should be coded as ‘F’;

2. ‘m’, ‘Male’, ‘M’, ‘mal’ or ‘male’ should be coded as ‘M’;

- ‘Group’ should have values “Placebo” and (“Drug1” or “Drug2”) only, with no leading or trailing blanks. Missing values should remain missing;

Note:

1. Any value containing “placebo’ (case-insensitive) should be coded as “Placebo” (case-sensitive). Any value containing “Drug1” or “Drug2” (case-insensitive) should be coded as “Drug1” or “Drug2” (case-sensitive);

2. All other values should be set to missing;

b. If the values in variable Study_end_Date is not the date 26Sep2022 or if it is missing, set it to 26Sep2022;

c. After these, name the cleaned SAS dataset as Baseline;

7. Do these in the same data step:

a. In the clean Baseline dataset, calculate Age at enrollment as this:

Age = int((EnrollDate - DOB)/365.25);

b. Create a new variable named BMI and calculate the BMI and keep only one decimal place:

The original Weight is in kg and Height is in meter, so use this formula:

BMI = Weight/Height**2;

c. Create a new variable called cWeight and group patients into these 6 categories based on BMI as shown in this table:

cWeight

BMI

Underweight

BMI < 18.5

Normal weight

BMI >=18.5 to 25

Overweight

BMI >= 25 to 30

Obese

BMI >= 30 to 40

Morbidly Obese

BMI >= 40

.

Note: ‘.’ Is the numeric missing; blank is the character missing;

d. Calculate the average payment from three payments (Payment1, Payment2, Payment3) in the Baseline dataset and name it as AvgPay, keep two decimal places; Label this variable as ‘Average Payment’;

e. If there are missing in variable Covid_19, code it as ‘NA’. Label this variable Covid_19 as ‘Covid PCR test’.

8. Now, after above step 7, if there are any complete duplicates (exactly the same observation across all variables), remove all complete duplicates, and output all removed complete duplicates into a separate SAS dataset named Duplicated and report later;

9. Now this final dataset (already removed complete duplicates) is the Cleaned dataset. Please save this for your Final-term project. Please save it as a SAS dataset with the name: “your SIS_forfinal”. Example: zhc2006_forfinal.

10. Your mid-term report consists of two files: Report file and SAS program file

- Report file:

1. The report should be in a .rtf file.

2. The file name should be: “your Full name (your SIS ID) – Mid-term report.rtf”. Example: Zhengming Chen (zhc2006) – Mid-term report.rtf;

3. The first page should have a header “Mid-term report / Your full name / date finished” as shown in example report below, and this should appear in the first page only (see example below);

4. It contains a Table-1: Name this table with a title “List of the first 100 observations after cleaning and calculation”:

To list ONLY the first 100 observations after sorted by ID in ascending order from the Cleaned (the final dataset) dataset. Only list the data for these variables and in this order:

Obs, ID, Sex, Age, Group, EnrollDate, Study_end_date, BMI, cWeight, AvgPay, Covid_19;

Note:

1. ID should have a label and shown in report as ‘Unique Patient ID’;

2. cWeight should have a label and shown in report as ‘Weight Status’;

3. Covid_19 should have label and shown in report as ‘Covid PCR test’.

4. AvgPay should have label and shown in report as ‘Average Payment’.

5. It is your choice to use the variable name or label for other variables in your report;

5. It contains a Table-2: Name this table with a title: “List of the first 100 removed complete duplicates from baseline data”:

To list the first 100 removed complete duplicated baseline observations after sorted by ID in ascending order. If less than 100, list all of them. Only list the data for these variables and in this order:

Obs, ID, Sex, Age, Group, EnrollDate, Study_end_date, BMI, cWeight, AvgPay;

6. It contains a Table-3: Name this table with a title: “List of the first 100 observations that have data issues in either of these variables from BaselineRaw data: Sex, or Group, or Study_end_date”:

1. To list the first 100 observations that have data issues in the original BaselineRaw after sorted by ID in ascending order. If less then 100, list them all.

2. And only show the original data for these variable in this order: Obs, ID, Sex, Group, Study_end_date

7. Table-1, Table-2, and Table-3 should be sorted by ID in ascending order;

8. All the date values in the report should be in the form of mm/dd/yyyy in the report;

- SAS program file:

This is your only .sas program file that did all your work.

11. If there are any other requirements not mentioned in this instruction, please use common sense and make judgement yourself.

12. Please submit the Report file and SAS program file via Canvas before this time: 02Nov2022, 3:45pm

Example report: