Web Search and Text Analysis

Subject COMP90042 (2012)

Note: This is an archived Handbook entry from 2012.

Credit Points: 12.50
Level: 9 (Graduate/Postgraduate)
Dates & Locations:

This subject has the following teaching availabilities in 2012:

Semester 1, Parkville - Taught on campus.
Pre-teaching Period Start not applicable
Teaching Period not applicable
Assessment Period End not applicable
Last date to Self-Enrol not applicable
Census Date not applicable
Last date to Withdraw without fail not applicable

Time Commitment: Contact Hours: 24 one-hour lectures (two per week) and 12 one-hour workshops (one per week).
Total Time Commitment:

120 hours.




Recommended Background Knowledge:


Non Allowed Subjects:

433-460 Human Language Technology
433-467 Text and Document Management
433-660 Human Language Technology
433-667 Text and Document Management
433-476 Text and Document Management

Core Participation Requirements:

For the purposes of considering requests for Reasonable Adjustments under the Disability Standards for Education (Cwth 2005) and the Students Experiencing Academic Disadvantage Policy, the academic requirements for this subject are articulated in the Subject Description, Subject Objectives, Generic Skills and Assessment Requirements of this entry. The University is dedicated to providing support to those with special requirements. Further details on the Disability support scheme can be found at the Disability Liaison Unit website: http://www.services.unimelb.edu.au/disability/


Associate Professor Steven Bird


Associate Professor Tim Baldwin

email: tbaldwin@unimelb.edu.au

Subject Overview:

The web is a vast and expanding storehouse of semi-structured textual information. Accessing and processing this information is one of the major challenges of the information age. In this subject, students study the technologies behind search engines, spam filtering, plagiarism detection, information extraction, question answering and newly emerging fields of information engineering. Topics include: web indexing, query evaluation, probabilistic language modelling, document classification and filtering, grammar and spelling correction, topic detection, cross-language information retrieval, machine translation and summarisation.

Objectives:

On completion of this subject students should be able to:

  • Articulate issues relevant to the efficient implementation of web search systems and information retrieval systems
  • Apply information retrieval methodologies as they relate to textual data
  • Apply symbolic and statistical natural language processing techniques in textual analysis tasks
  • Develop and evaluate computational models of language
  • Apply core information engineering technologies in the management and exploitation of online information

Assessment:

  • Two collaborative and/or individual projects, due around weeks 6 and 11 of semester and expected to take about 36 hours (20% each)
  • A research-oriented workshop presentation (10%)
  • An end-of-semester written examination not exceeding 3 hours (50%)
Prescribed Texts: None
Breadth Options:

This subject is not available as a breadth subject.

Generic Skills:

On completion of this subject, students should have the following skills:

  • Ability to undertake problem identification, formulation, and solution
  • Ability to utilise a systems approach to complex problems and to design for operational performance
  • Ability to manage information and documentation
  • Capacity for creativity and innovation
  • Ability to communicate effectively with the engineering team and with the community at large

Related Course(s): Bachelor of Computer Science (Honours)
Master of Engineering in Distributed Computing
Master of Science (Computer Science)
Master of Software Systems Engineering
Related Majors/Minors/Specialisations: B-ENG Software Engineering stream
Computer Science
Master of Engineering (Software)
