Assignment 2 (APH415 Survival Analysis)

Background

Lung adenocarcinoma (LUAD) remains a leading cause of cancer-related mortality. The GSE72094 dataset, available from the Gene Expression Omnibus (GEO), comprises gene expression profiles from resected tumors of 442 LUAD patients, accompanied by detailed annotations of prevalent driver mutations (e.g., KRAS and EGFR), tumor suppressor mutations (e.g., STK11 and TP53), and key clinical characteristics including overall survival. Although the original study focused on how STK11 and TP53 mutations contribute to biological heterogeneity among KRAS-mutant tumors, it also provides a large-scale microarray-based expression dataset together with rich clinical and mutational information.

Tasks and instructions

You will use the data file “Ass2_surv_data.csv”, downloaded from the GSE72094 study. You will apply survival analysis methods to investigate whether certain clinical characteristics or genes are associated with patients’ overall survival time.

The dataset contains the following variables:

Variable          Label
ID                patient ID
Status            failure indicator (censored = 0, death = 1)
Age               age (in years)
Gender            gender (male = M, female = F)
Stage             sub-stages of cancer (1A, 1B, etc.)
Smoking_status    smoking (Ever, Never, etc.)
Stage_main        main stages of cancer (1, 2, 3, 4)
time_day          survival time (in days)
merck-xxx         gene expression represented by probe (feature) IDs from the Illumina HumanHT-12 V4.0 BeadChip platform (GPL15048), with their corresponding gene symbols provided below

One gene may have multiple probe IDs. You may select one or more probes for each gene, but be careful with multicollinearity, since probes for the same gene are likely to be highly correlated. RStudio may transform the symbol “-” into “_” in the column names.

Gene       Probe IDs
CCNA2      merck-NM_001237_a_at
AURKA      merck2-BE856617_at, merck-BC050630_at, merck-NM_017900_s_at, merck-NM_198436_s_at, merck-XM_933637_s_at
AURKB      merck-NM_004217_at
FEN1       merck2-XM_937756_a_at, merck-NM_004111_s_at
CD44       merck2-BM550721_at, merck2-BM792065_at, merck2-CR621045_at, merck-BC004372_a_at, merck-NM_001001389_at
CCND3      merck2-BQ669293_at, merck-NM_001760_at
NCALD      merck2-AF251061_at, merck-BC063428_s_at, merck-NM_001040630_at
MACF1      merck2-AB029290_at, merck2-BP284289_a_at, merck2-BQ651417_a_at, merck2-CX870374_a_at, merck2-NM_033024_at, merck-AF141968_a_at, merck-AK023406_a_at, merck-AK023821_at
LRC4       merck2-NM_007360_at, merck2-NM_013431_at, merck2-NM_013431_x_at, merck-NM_007360_at, merck-NM_013431_at, merck-NM_021209_s_at, merck-NM_176677_at
NLRC4      merck-NM_021209_s_at
PLEKHN1    merck-NM_032129_at
RASIP1     merck-NM_017805_at
SPP1       merck2-CA447290_at, merck2-DQ892544_at, merck-BG211014_a_at, merck-NM_000582_at, merck-NM_024790_s_at
GPT2       merck2-BX099266_at, merck-BC062555_s_at, merck-NM_001147_at, merck-NM_133443_s_at
SGPL1      merck-ENST00000299297_at, merck-NM_003901_a_at
PCOLCE2    merck2-AK223633_at, merck-NM_013363_at
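Because the probe IDs in the table contain “-” while the loaded column names may not, it can help to convert the IDs before looking them up. The following is a minimal Python sketch of that idea, not part of the brief. The function names and the two-gene dictionary are hypothetical, and the replacement character is an assumption: the brief mentions “_”, but the actual character depends on how the file is read, so check the real column names first.

```python
# Hypothetical two-gene excerpt of the gene-to-probe table (illustration only).
GENE_PROBES = {
    "CCNA2": ["merck-NM_001237_a_at"],
    "FEN1": ["merck2-XM_937756_a_at", "merck-NM_004111_s_at"],
}


def to_column_name(probe_id, replacement="_"):
    """Replace every '-' in a probe ID with `replacement`.

    `replacement` defaults to '_' as stated in the brief; this is an
    assumption and may differ depending on how the CSV was imported.
    """
    return probe_id.replace("-", replacement)


def columns_for_gene(gene, available_columns, replacement="_"):
    """Return the converted probe names of `gene` that exist in the data."""
    available = set(available_columns)
    converted = (to_column_name(p, replacement) for p in GENE_PROBES[gene])
    return [name for name in converted if name in available]


# Example with made-up column names standing in for the real data file.
example_columns = ["ID", "Age", "merck_NM_004111_s_at"]
print(columns_for_gene("FEN1", example_columns))
```

Listing the matched columns per gene also makes it easy to see when several probes of one gene are about to enter the same model, which is where the multicollinearity concern arises.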

You must apply appropriate statistical methods, ranging from descriptive statistics and simpler methods (e.g., univariate tests) to more complex multivariate models, to investigate this research question and report your findings. Statistical analysis must be carried out using a standard platform such as R, STATA, SPSS or SAS. You must first set the data as ‘survival data’. You need to perform the analysis and report the results using suitable tables and graphical summaries. You need to test the assumptions of the model and report other diagnostic statistics where necessary. You also need to produce a regression equation where applicable. Finally, you must present the command file of your statistical package (R, STATA, SPSS or SAS) as an appendix.
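To make concrete what setting the data as ‘survival data’ implies, the sketch below computes a Kaplan-Meier (product-limit) estimate from paired times and event indicators, using the Status coding given in the brief (1 = death, 0 = censored). It is written in plain Python purely as an illustration and is not a substitute for the required package or command file. The function name and the toy data are hypothetical and do not come from the dataset.

```python
from collections import Counter


def kaplan_meier(times, status):
    """Return [(t, S(t))] at each distinct death time.

    times  : survival times (e.g. the time_day column)
    status : 1 = death, 0 = censored (the Status coding in the brief)
    """
    deaths = Counter(t for t, s in zip(times, status) if s == 1)
    exits = Counter(times)  # deaths and censorings leaving the risk set
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Subjects censored at t are still counted as at risk at t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve


# Made-up toy data (not from Ass2_surv_data.csv).
toy_times = [5, 8, 8, 12, 15, 20]
toy_status = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(toy_times, toy_status):
    print(t, round(s, 3))
```

The key point is that each record carries two pieces of information, a time and an indicator of whether that time is an observed death or a censoring, and that censored patients still contribute to the risk set up to their censoring time.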

You are expected to participate in preparing the manuscript of this study for academic publication. As a biostatistician, you are responsible for writing the Method and Results sections (including tables for descriptive statistics and tables/figures for inferential statistics). You also need to provide your interpretation of the findings. You are particularly encouraged to produce tables/figures for data summary; however, the maximum allowed is six. This work should be no more than 1,200 words (excluding tables, figures, command code and references).

This task comprises 40% of your total grade for the module. Your answers will be assessed using the XJTLU Assessment Criteria released to you previously.

NOTE: Any form of academic offence, including but not limited to collusion with other students and the undeclared use of generative artificial intelligence, is strictly prohibited and will be investigated under the University’s Academic Misconduct Regulations.

Submission

You need to submit one PDF document containing the report, related tables/figures, and the command file. The document should be named .pdf. For example, APH415 assignment two, 123456789.pdf.

You must submit your work via Learning Mall Online. All penalties for late or incomplete submission apply.