Data Mining

Subject MAST90028 (2010)

Note: This is an archived Handbook entry from 2010.

Credit Points: 12.50
Level: 9 (Graduate/Postgraduate)
Dates & Locations:

This subject has the following teaching availabilities in 2010:

Semester 1, Parkville - Taught on campus.
Pre-teaching Period Start: not applicable
Teaching Period: not applicable
Assessment Period End: not applicable
Last date to Self-Enrol: not applicable
Census Date: not applicable
Last date to Withdraw without fail: not applicable

Time Commitment: Contact Hours: 36 hours comprising one two-hour lecture per week and one one-hour practical class per week.
Total Time Commitment: Not available
Prerequisites: None
Corequisites: None
Recommended Background Knowledge:

It is recommended that students complete a second-year statistics subject (such as 620-202 [2008] Statistics or its equivalent) and have had some exposure to computer packages.

Non Allowed Subjects: None
Core Participation Requirements:

For the purposes of considering requests for Reasonable Adjustments under the Disability Standards for Education (Cwth 2005) and the Students Experiencing Academic Disadvantage Policy, academic requirements for this subject are articulated in the Subject Description, Subject Objectives, Generic Skills and Assessment Requirements for this entry.

The University is dedicated to providing support to those with special requirements. Further details on the disability support scheme can be found at the Disability Liaison Unit website.


Dr Guoqi Qian


Subject Overview:

Data Mining refers to the management and analysis of large data sets.

Data Mining became possible with the advent of large-scale data collection and the computing power necessary to process it. It involves all of the following steps:

1. Data Warehousing

2. Data Cleaning

3. Data Description and Visualisation

4. Data Analysis and Interpretation

This course deals only with step 4 of the Data Mining process: data analysis and interpretation. It considers techniques for Rule Finding, Classification, Regression and Clustering. The themes that run through the course are:

1. Model fitting and selection and how to avoid overfitting

2. Scalable algorithms that can be used with very large data sets

3. How to accommodate high-dimensional data

4. Actionability and interpretability of models

Objectives: After completing this subject, students should:

  • understand many of the techniques used to analyse large data sets;
  • have acquired skills and techniques widely used in modern data mining; and
  • have gained the ability to pursue further studies in this and related areas.

Assessment: Up to 40 pages of written assignments (20%: two assignments worth 10% each, due mid-semester and late in semester) and a three-hour written examination (80%, in the examination period).

Prescribed Texts: None.
Recommended Texts: TBA.
Breadth Options:

This subject is not available as a breadth subject.

Fees Information: Subject EFTSL, Level, Discipline & Census Date
Generic Skills:

Upon completion of this subject, students should develop:

  • problem-solving skills (especially through tutorial exercises and assignments) including engaging with unfamiliar problems and identifying relevant strategies;
  • analytical skills including the ability to construct and express logical arguments and to work in abstract or general terms to increase the clarity and efficiency of the analysis; and
  • the ability to work in a team, through interactions with other students.

Related Course(s): Master of Science (Mathematics and Statistics)
