- Email: email@example.com
- Thesis title: Statistical Learning in highly correlated high-dimensional data.
I attended the University of Leeds for my undergraduate degree between 2011 and 2015. I completed an integrated master's course and graduated with an MMath degree from the department of Mathematics in 2015. Throughout my degree I developed an interest in statistics, specifically statistical modelling, and specialised in statistics in my final year. I also completed a master's project titled "Modelling Species Abundance" under the supervision of Dr Andrew Baczkowski. In my final year I took a module called Statistics and DNA, which piqued my interest in how statistics could play a part in medicine. The idea that statistics could potentially help save lives was fascinating to me. This led me to apply for a PhD with a strong link to pharmaceuticals and medicine.
In September 2015, I started my PhD at the University of Leeds in the Department of Statistics under the primary supervision of Dr Arief Gusnanto and co-supervision of Professor Charles Taylor. My research involves data collected on copy number alterations. Copy number alterations (CNA) are duplications and deletions of chromosomal segments that occur along the genome, usually as a consequence of cancer. CNA profiles are collected using next-generation sequencing, a technology that can sequence an entire human genome. This is a crucial topic because CNA data have become a determinant for tumour subtyping, and are therefore important when considering which drug to prescribe to a patient suffering from cancer. The topic of my PhD research is statistical learning, or classification, when the data are high-dimensional and highly correlated. This entails identifying variables (in our example case, genomic regions) that have the discriminating power to distinguish tumour subtypes, and investigating sparse variance-covariance structure in the data. Lastly, for both of these problems, statistical inference is critical to understanding whether the patterns that we see are real with some probability. My work so far has investigated whether the pattern of CNA profiles in each genomic region is statistically different between patients in one subtype group and patients in the other. To do this I have developed a modified Cramér-von Mises type hypothesis test, which is more powerful at identifying regions of interest than other standard hypothesis tests.
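To give a flavour of the region-by-region comparison described above, here is a minimal sketch in Python. It is not the modified test from my research, whose details are not given here; it uses the standard two-sample Cramér-von Mises statistic with a permutation p-value instead. The function names, the simulated values and the group sizes are all assumptions made purely for illustration.

```python
import random
from bisect import bisect_right


def cvm_statistic(x, y):
    """Standard two-sample Cramer-von Mises statistic (not the modified test)."""
    n, m = len(x), len(y)
    xs, ys = sorted(x), sorted(y)
    total = 0.0
    # Sum the squared gap between the two empirical CDFs at every pooled point.
    for z in xs + ys:
        f = bisect_right(xs, z) / n
        g = bisect_right(ys, z) / m
        total += (f - g) ** 2
    return n * m / (n + m) ** 2 * total


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value: reshuffle the subtype labels and recompute the statistic."""
    rng = random.Random(seed)
    observed = cvm_statistic(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if cvm_statistic(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical simulated data: 20 patients per subtype, 2 genomic regions.
    # Region 0 has the same distribution in both subtypes; region 1 is shifted.
    rng = random.Random(1)
    subtype_a = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20)]
    subtype_b = [[rng.gauss(0.0, 1.0), rng.gauss(1.5, 1.0)] for _ in range(20)]
    for region in range(2):
        a = [patient[region] for patient in subtype_a]
        b = [patient[region] for patient in subtype_b]
        stat, p = permutation_pvalue(a, b)
        print(f"region {region}: statistic={stat:.3f}, p-value={p:.3f}")
```

In real CNA data there are many thousands of regions and neighbouring regions are highly correlated, so a per-region screen like this would also need a multiple-testing adjustment, which the sketch leaves out.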
Alongside my research, I am actively involved in teaching and in marking exams for undergraduate statistics modules. I have attended various seminars and presented a poster at the Big Data Analytics conference. I have also acted as PGR Representative since January 2016.
I expect to submit my thesis and complete my PhD in 2019.
- MMath Mathematics, University of Leeds (4 years)