Assignment 2 (APH415 Survival Analysis)
Background
Lung adenocarcinoma (LUAD) remains a leading cause of cancer-related mortality. The GSE72094 dataset, available from the Gene Expression Omnibus (GEO), comprises gene expression profiles from resected tumors of 442 LUAD patients, accompanied by detailed annotations of prevalent driver mutations (e.g., KRAS, EGFR), tumor suppressor mutations (e.g., STK11 and TP53), and key clinical characteristics including overall survival. Although the original study focused on how STK11 and TP53 mutations contribute to biological heterogeneity among KRAS-mutant tumors, it also provides a large-scale microarray-based expression dataset together with rich clinical and mutational information.
Tasks and instructions
You will use the data file “Ass2_surv_data.csv”, downloaded from the GSE72094 study. You will apply survival analysis methods to investigate whether certain clinical characteristics or genes are associated with patients’ overall survival time.
The dataset contains the following variables:
| Variable | Label |
|---|---|
| ID | patient ID |
| Status | failure indicator (censored = 0, death = 1) |
| Age | age (in years) |
| Gender | gender (male = M, female = F) |
| Stage | sub-stages of cancer (1A, 1B, etc.) |
| Smoking_status | smoking status (Ever, Never, etc.) |
| Stage_main | main stages of cancer (1, 2, 3, 4) |
| time_day | survival time (in days) |
| merck-xxx | gene expression represented by probe (feature) IDs from the Illumina HumanHT-12 V4.0 BeadChip platform (GPL15048), with their corresponding gene symbols provided below |
One gene may have multiple probe IDs. You may select one or more probes for each gene, but be careful with multicollinearity problems. RStudio may transform the symbol “-” into “_” in the column names.
| Gene | Probe IDs |
|---|---|
| CCNA2 | merck-NM_001237_a_at |
| AURKA | merck2-BE856617_at, merck-BC050630_at, merck-NM_017900_s_at, merck-NM_198436_s_at, merck-XM_933637_s_at |
| AURKB | merck-NM_004217_at |
| FEN1 | merck2-XM_937756_a_at, merck-NM_004111_s_at |
| CD44 | merck2-BM550721_at, merck2-BM792065_at, merck2-CR621045_at, merck-BC004372_a_at, merck-NM_001001389_at |
| CCND3 | merck2-BQ669293_at, merck-NM_001760_at |
| NCALD | merck2-AF251061_at, merck-BC063428_s_at, merck-NM_001040630_at |
| MACF1 | merck2-AB029290_at, merck2-BP284289_a_at, merck2-BQ651417_a_at, merck2-CX870374_a_at, merck2-NM_033024_at, merck-AF141968_a_at, merck-AK023406_a_at, merck-AK023821_at |
| LRC4 | merck2-NM_007360_at, merck2-NM_013431_at, merck2-NM_013431_x_at, merck-NM_007360_at, merck-NM_013431_at, merck-NM_021209_s_at, merck-NM_176677_at |
| NLRC4 | merck-NM_021209_s_at |
| PLEKHN1 | merck-NM_032129_at |
| RASIP1 | merck-NM_017805_at |
| SPP1 | merck2-CA447290_at, merck2-DQ892544_at, merck-BG211014_a_at, merck-NM_000582_at, merck-NM_024790_s_at |
| GPT2 | merck2-BX099266_at, merck-BC062555_s_at, merck-NM_001147_at, merck-NM_133443_s_at |
| SGPL1 | merck-ENST00000299297_at, merck-NM_003901_a_at |
| PCOLCE2 | merck2-AK223633_at, merck-NM_013363_at |
You must apply appropriate statistical methods, progressing from descriptive statistics and simpler methods (e.g., univariate tests) to more complex multivariate models, to investigate this research question and report your findings. Statistical analysis should be carried out using a standard platform such as R, STATA, SPSS or SAS. You must first set the data as ‘survival data’. Perform the analysis and report the results using suitable tables and graphical summaries. Test the assumptions of the model and report other diagnostic statistics where necessary. Produce a regression equation where applicable. Finally, present the command file of your statistical package (R, STATA, SPSS or SAS) as an appendix.
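To make concrete what setting the data as ‘survival data’ involves, the sketch below pairs each follow-up time with its event indicator and computes a Kaplan-Meier survival estimate by hand. It is a minimal illustration in Python, not a substitute for the survival routines of R, STATA, SPSS or SAS that the brief requires. The function name `kaplan_meier` and the six toy observations are assumptions invented for this example; only the variable names `time_day` and `Status` and the coding (censored = 0, death = 1) come from the brief.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times  -- follow-up time in days (the brief's `time_day`)
    events -- 1 = death, 0 = censored (the brief's `Status` coding)
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)      # deaths and censorings both leave the risk set
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at t still count as at risk at t.
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve

# Toy data invented for illustration (not taken from Ass2_surv_data.csv).
time_day = [120, 340, 340, 500, 610, 800]
status = [1, 1, 0, 1, 0, 0]

for t, s in kaplan_meier(time_day, status):
    print(f"day {t}: S(t) = {s:.3f}")
```

The estimate only drops at death times, and censored patients contribute to the risk set up to their censoring time. This is why the event indicator must be declared together with the time variable before any survival method is applied.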
You are expected to participate in the manuscript preparation for this study for an academic publication. As a biostatistician, you are responsible for writing the Methods and Results sections (including the tables for descriptive statistics and the tables/figures for inferential statistics, etc.). You also need to provide your interpretation of the findings. You are particularly encouraged to produce tables/figures for data summary; however, the maximum allowed is six. This work should be no more than 1,200 words (excluding tables, figures, command coding or references).
This task comprises 40% of your total grade for the module. Your answers will be assessed using the XJTLU Assessment Criteria released to you previously.
NOTE: Any form of academic offence, including but not limited to collusion with other students and the undeclared use of generative artificial intelligence, is strictly prohibited and will be investigated under the University’s Academic Misconduct Regulations.
Submission
You need to submit one document, containing the report, related tables/figures, and the command file, in PDF. The document should be named
You must submit your work via Learning Mall Online. All penalties for late or incomplete submission apply.
2025-11-05