📖Program Curriculum
Course modules
Compulsory modules
All the modules in the following list need to be taken as part of this course.
Introduction to Bioinformatics using Python
Module Leader
Dr Alexey Larionov
Aim
This module provides a general introduction to bioinformatics and the fundamentals of programming. It covers the basics you need to program in Python, now one of the most popular programming languages in the bioinformatics community, and its application in retrieving, parsing and visualising biological sequence data.
Syllabus
Fundamentals of Python programming.
Introduction to Object Oriented Programming (OOP).
Simple mathematical operations.
Modules in Python.
Various data types and Objects.
Control Statements.
Lists, Tuples and Dictionaries.
Functions.
File IO.
Programming for biology using BioPython
DNA sequence manipulation using Python
Reading protein files
Performing Multiple Sequence Alignment
BLAST
Data Visualisation
Biological data format.
Intended learning outcomes
On successful completion of this module you should be able to:
Identify the most important programming structures.
Retrieve relevant nucleotide, protein sequences and their corresponding metadata from online public data resources.
Develop custom Python scripts for sequence manipulation.
Develop Python scripts to automate data handling and curation tasks.
Develop advanced stand-alone Python programs for the acquisition and consolidation of data from remote databases.
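As a rough illustration of the kind of sequence-manipulation script the outcomes above refer to, here is a minimal sketch in plain Python. It is not taken from the module materials. The function names (`reverse_complement`, `gc_content`, `parse_fasta`) are hypothetical, and the sketch assumes simple DNA input containing only A, C, G and T.

```python
# Illustrative sketch only: plain Python, no external libraries.
# All function names here are hypothetical, not part of the module.

# Translation table mapping each base to its complement (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into a {record_id: sequence} dictionary."""
    records: dict[str, str] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first word after '>'.
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines are concatenated onto the current record.
            records[current_id] += line
    return records
```

For example, `parse_fasta(">seq1 demo\nATG\nCCA\n")` gives a dictionary with one record, `seq1`, whose sequence can then be passed to `reverse_complement` or `gc_content`. In practice a library such as BioPython, which the syllabus lists, would usually handle parsing.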
Exploratory Data Analysis and Essential Statistics using R
Module Leader
Dr Maria Anastasiadi
Aim
This module aims to provide you with an overview of important concepts in statistics and exploratory data analysis. The module introduces the main concepts in analysing biological datasets using the R environment, as well as developing bespoke scripts for multivariate analysis such as principal component analysis and hierarchical clustering.
Syllabus
Introductory statistics – averages, variance and significance testing.
Data pre-processing techniques.
Exploratory data analysis using unsupervised methods (PCA, HCA).
An introduction to R.
Intended learning outcomes
On successful completion of this module you should be able to:
Devise basic R programs to meet given specifications.
Critically assess the basic principles of different statistical techniques, implement them programmatically, and integrate statistical methods effectively into experimental protocol design.
Apply different data pre-processing techniques.
Describe the difference between univariate and multivariate analysis.
Apply exploratory data analysis using unsupervised multivariate analysis methods.
Next Generation Sequencing Informatics
Module Leader
Professor Fady Mohareb
Aim
To introduce you to the techniques that have given rise to the genomic data now available and develop skills and understanding in the bioinformatics approaches that facilitate evaluation and application of these data. Over the past decade, Next-generation DNA Sequencing (NGS) technology has been a huge stimulus for a lot of breakthrough discoveries in biology. This module provides an overview of many core types of NGS projects, including latest protocols in genomic and transcriptomic analyses, genotyping and variant calling as well as detailed hands-on practical sessions of our best practice data-analysis workflows.
Syllabus
Gene expression analysis using microarray.
Introduction to Next Generation Sequencing (NGS) Technology.
Overview of genome assembly and quality control.
Transcriptome informatics.
Sequence data analysis web platforms.
Genotyping and variant calling.
Intended learning outcomes
On successful completion of this module you should be able to:
Critically evaluate the operation of the most common analytical techniques used in the acquisition of genomic sequence and expression data.
Apply various techniques to overcome the challenges of dealing with sequence data and be able to identify and apply appropriate software tools to tackle these challenges.
Perform gene expression profiling using both first and next generation sequencing data.
Critically assess current practices and evaluate the relative strengths and weaknesses of the techniques covered and how these relate to the quality of the biological findings that result.
Critically contrast a range of NGS tools and related sequence software tools for NGS applications, and interpret the output from those tools.
Application of Bioinformatics in Epigenetics, Proteomics and Metagenomics
Module Leader
Dr Alexey Larionov
Aim
To provide you with knowledge of current trends in analysing epigenomic, proteomic, and metagenomic data and to demonstrate the principles, challenges, and complexities of these analyses in bioinformatics.
Syllabus
Introduction to DNA methylation data (Illumina Infinium HumanMethylation450 and EPIC BeadChips).
Quality control, pre-processing, and analysis of DNA methylation data through a standard pipeline.
Application of bioinformatics on DNA methylation data to assess phenotypic outcomes.
Introduction to practical proteomics (qualitative & quantitative).
Proteomics repositories (PRIDE, PeptideAtlas, etc.).
Protein/peptide identification algorithms.
Protein structures and molecular modelling.
Soil metagenomics: quality control, filtering and assembly to taxonomic classification, clustering, and functional assignment.
Analysis of microbial community composition and comparative metagenomics.
Intended learning outcomes
On successful completion of this module you should be able to:
Synthesise information to discuss the key technological developments in the acquisition of epigenomic, proteomic and metagenomic data.
Explain the mode of operation of the most common analytical techniques and how these relate to the quality of the data acquired.
Critically assess current practices and identify the relative strengths and weaknesses of the techniques covered.
Discover information using bioinformatics tools and effectively apply the information to biological problems.
Participate in scientific discussions regarding the omic technologies and evaluate scientific results.
Machine Learning for Metabolomics
Module Leader
Dr Maria Anastasiadi
Aim
During this module you will learn about the main aspects related to the analysis of the metabolic profile in living organisms and explore statistical and computational techniques that are central to the field of metabolomics, with particular emphasis on machine learning. Machine learning is a rapidly expanding form of artificial intelligence (AI) which has found many applications in the field of metabolomics. Examples include explanatory analysis of complex biological systems, novel biomarker discovery and prediction modelling.
Syllabus
Metabolomics: overview and workflow.
Multivariate classification and biomarker discovery.
Introduction to machine learning.
Applications of machine learning in metabolomics.
Advanced topics in machine learning.
Applications of machine learning in food metabolomics.
Introduction to image analysis.
Advanced topics in R.
Intended learning outcomes
On successful completion of this module you should be able to:
Critically assess various metabolomics analytical and spectral platforms.
Apply state-of-the-art best practices in machine learning to fit the purpose of the analysis.
Develop classification and regression models based on multivariate metabolic data.
Demonstrate an in-depth understanding of machine learning algorithms and their application, and provide examples of specific machine learning algorithms for each task.
Apply the statistical and machine learning procedures covered during the module to derive biologically relevant information from metabolic datasets using R.
Programming Using Java
Module Leader
Professor Fady Mohareb
Aim
To introduce you to the concepts of object-oriented programming using Java. Java is the pre-eminent programming language for serious application development on the Internet. The module covers Java data objects of primitive and reference data types and introduces you to the fundamentals of programming in Java, with hands-on practical sessions on implementing simple programs using calculations, variables, control statements and loops.
Syllabus
Fundamental principles of programming in Java.
Object-oriented programming using Java.
Variables and calculations.
Strings.
Arrays, ArrayLists and HashMaps.
GUI programming.
Intended learning outcomes
On successful completion of this module you should be able to:
Identify and apply the most important programming structures.
Develop Java programs to meet given specifications.
Implement custom Java classes, interfaces, and packages.
Implement standalone application interfaces using Java Swing Components.
Data Integration and Interaction Networks
Module Leader
Dr Tomasz Kurowski
Aim
Data integration represents a major challenge for bioinformatics research. This module covers the most popular data management, integration and visualisation tools within the bioinformatics community as well as the main concepts of database design and normalisation.
Syllabus
Database design and normalisation.
Development of database access interfaces.
Design and implementation of data repository Web front-ends.
Techniques to integrate, interpret, analyse and visualise biological data sets.
Introduction to interaction networks.
Data Integration and visualisation.
Intended learning outcomes
On successful completion of this module you should be able to:
Utilise systems software for the visualisation of systems and system interactions.
Critically apply available tools for data integration.
Design, normalise and implement databases for experimental datasets.
Critically assess the main data standards protocols for genomics, as well as the current approaches for modelling and warehousing of life science data.
Discover systems relationships between data using bioinformatics tools and approaches.
Advanced Sequencing Informatics and Genome Assembly
Module Leader
Professor Fady Mohareb
Aim
This module aims to develop a system-level view of biological systems and their response to various internal and external factors, through the integration of advanced NGS and 3GS sequencing data with functional annotation, using established concepts of graph theory widely applied in assemblers, such as de Bruijn graphs and overlap-layout-consensus. This module gives an insight into the details of omic-scale, big-data-driven life science, making use of core platform technologies.
Syllabus
How research is conducted in genome bioinformatics and within the broader context of interdisciplinary life sciences.
Advanced Java programming.
Application of graph-theory using Java.
Advanced Next-Generation Sequencing informatics.
De-novo genome assembly.
Gene prediction and functional annotation.
Intended learning outcomes
On successful completion of this module you should be able to:
Critically assess the technical limitations and the underlying biological and experimental assumptions that impact on data quality.
Apply and optimise various algorithms for short and long reads sequence assembly.
Successfully develop and optimise de-novo genome assemblies for various species.
Develop in-silico gene prediction models and functional annotation.
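The de Bruijn graph approach mentioned in the module aim can be sketched briefly. The module itself uses Java; the snippet below is written in Python purely for illustration and is not taken from the course materials. The function name `de_bruijn_graph` and the choice of `k` are assumptions. Real assemblers also handle sequencing errors, reverse complements and graph simplification, none of which are shown here.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from reads (hypothetical helper).

    Nodes are (k-1)-mers. Every k-mer found in a read adds one directed
    edge from its (k-1)-base prefix to its (k-1)-base suffix.
    """
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

With `k = 3`, the read `ACGT` contributes the k-mers `ACG` and `CGT`, giving the edges `AC -> CG` and `CG -> GT`. An assembler would then look for a path through the graph that uses the edges, which is what reconstructs the underlying sequence.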