
Structure

Programme Structure:

The programme requires full-time study for one year. It comprises five core modules, one 60-credit dissertation/project and two elective courses.

Students are required to take five core modules:

  • MT5756 Data Analysis: a 4-week intensive 20-credit course.
  • MT5753 Statistical Modelling: a 4-week intensive 20-credit course which currently exists as part of the MRes programme.
  • MT5757 Advanced Data Analysis: a 4-week intensive 20-credit course.
  • MT5758 Applied Multivariate Analysis: a standard full semester 15-credit course.
  • MT5759 Knowledge Discovery and Datamining: a standard full semester 15-credit course.



    [Figure: programme structure over three semesters]

The first semester follows a short intensive course structure: each course is a 4-week, 20-credit module.

The second semester follows the standard structure of 15-credit lecture modules, with some flexibility in the choice of modules. Several suitable modules are available within statistics, including MT3607 Computing in Statistics (recommended), as well as modules outwith the School, depending on the exact nature of the student's undergraduate degree and career intentions.

The third semester is dedicated to the production of an MSc dissertation. Dissertation topics will either be selected from a pre-determined set or be of the student's own devising, subject to staff approval. Collaborations with industry partners provide relevant and challenging topics.

 




Programme Features:

In general the course is defined by:

  • practical and pragmatic tools for modern data analysis;
  • a focus on market requirements;
  • breadth favoured over depth, while instilling recognition of knowledge limitations, with subsequent emphasis on competency in researching statistical methods as required;
  • skills beyond statistical analysis but integral to the position of a competent data analyst, i.e. communication, presentation, research, problem-solving, critical thinking and collaboration skills.

The core modules offered to this end are:

  • MT5756 Data Analysis
    This module covers: Types of data and their numerical and graphical treatment. Data entry/import/export. Basic probability theory and concepts of inference. Fundamental statistical concepts with particular emphasis on sampling issues. Basic statistical models and tests. Computer-intensive inference is introduced.


  • MT5753 Statistical Modelling
    This module covers: Regression modelling from standard linear models to Generalized Linear Models. Offsets, overdispersion, model selection and diagnostics are covered, as well as remedies for some violations of assumptions. A high-level treatment of Generalized Least Squares is also given.


  • MT5757 Advanced Data Analysis
    This module covers: Modelling methods relevant to situations where the data fail to meet standard model assumptions. Nonlinear models via nonlinear least squares, basic splines and Generalized Additive Models; Ridge Regression and Principal Components Regression for collinear data; models for non-independent errors. Pragmatic data imputation is covered with associated issues. Computer-intensive inference is further expanded for more complex scenarios.


  • MT5758 Applied Multivariate Analysis
    This module covers: Multivariate data in general, and the contrast with univariate analysis. Basic matrix theory. Concepts of similarity/distance (simple and generalized) and metrics. Dimension reduction: eigenanalysis, Principal Components Analysis, types of Multidimensional Scaling, Factor Analysis and reduced space plotting. Locating multivariate gradients/trends and potential causes via indirect or direct methods (e.g. Redundancy Analysis). Locating clusters and predicting group membership: cluster analyses, discriminant analyses, along with automated selection of complexity. Practical implementation using commercial and research software.


  • MT5759 Knowledge Discovery and Datamining
    This module covers: What defines datamining and its necessity in the contemporary data-rich world. Massive data - their storage, access and analysis. History of datamining and statistical versus computer science research threads. Prediction, description and the philosophy of models. Automation of model selection and measures of optimality including: training data, validation data, cross-validation, penalised fit statistics, the branch-and-bound algorithm. The classification problem in general. Theory and application of tree-type models/algorithms for classification and prediction. Bagging, boosting and the practical implementation of tree methods. Theory and application of Neural Nets. Practical implementation of methods using commercial datamining software. Case studies address applications of current note: segmentation of databases for targeting markets, fraud and fault detection.


Centre for Research into Ecological and Environmental Modelling - School of Mathematics & Statistics - University of St Andrews