Queensland University of Technology, Brisbane, Australia. Faculty of Built Environment and Engineering

Speech Analysis and Audio Processing

Demand for speech analysis has grown dramatically in recent years, driven by the need for high-quality speech and audio systems in compact and mobile applications. These projects aim to develop improved systems using advanced signal processing.

Speech Coding at Very Low Bitrates

The objective of speech coding is to transform the analogue speech signal into a digital representation. A digital representation is easier to manipulate, can be combined with other data, offers improved quality and reliability, and can be made secure. Redundancies introduced into the speech signal by the human speech production process make it possible to encode speech at very low bit rates. Moreover, the human auditory system is not equally sensitive to distortion at all frequencies and has a limited dynamic range. Speech coding techniques exploit these properties to reduce the bit rate.
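Linear prediction is one common way a coder exploits the redundancy described above: each sample is predicted from the previous few, and only the smaller prediction error needs to be coded. The sketch below is a minimal pure-Python illustration, not this project's coder. The second-order autoregressive test signal (coefficients 1.6 and -0.8, standing in for a single formant-like resonance) and the function names are assumptions made for the example. It estimates predictor coefficients with the Levinson-Durbin recursion and reports the prediction gain.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for a linear predictor.

    Returns (a, err): coefficients a[0..order-1] of the predictor
    x_hat[n] = a[0]*x[n-1] + ... + a[order-1]*x[n-order],
    and the final prediction-error power err.
    """
    a = [0.0] * (order + 1)  # a[0] unused so indices match the maths
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Hypothetical test signal: a 2nd-order autoregressive process with one
# resonance, standing in for a single formant (assumption, not real speech).
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(4000):
    x.append(1.6 * x[-1] - 0.8 * x[-2] + rng.gauss(0.0, 1.0))
x = x[2:]

r = autocorrelation(x, 2)
a, err = levinson_durbin(r, 2)
gain_db = 10.0 * math.log10(r[0] / err)
print("predictor coefficients:", [round(c, 3) for c in a])
print(f"prediction gain: {gain_db:.1f} dB")
```

The estimated coefficients land close to 1.6 and -0.8, and the residual power is roughly an order of magnitude below the signal power. As a rule of thumb, each 6 dB of prediction gain saves about one bit per sample for the same quantisation noise level.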

Reducing the bit rate of speech signals is motivated by the demand for cost-effective implementations, the need to conserve bandwidth in both wired and wireless networks, and the need to conserve disk space in voice storage systems. Applications include rate-restricted voice communications, answering machines, pager messages, voice mail, and high-capacity archiving.
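The storage argument can be made concrete with a little arithmetic. This sketch assumes 64 kb/s telephone-band PCM (8 kHz sampling, 8 bits per sample) as the uncompressed reference; the helper name is hypothetical.

```python
# Reference rate: 8 kHz sampling x 8 bits/sample = 64 kb/s (telephone-band PCM).
PCM_RATE_BPS = 8000 * 8


def bytes_per_hour(rate_bps):
    """Storage needed for one hour of speech coded at rate_bps bits per second."""
    return rate_bps * 3600 // 8


for rate_bps in (PCM_RATE_BPS, 4800, 2400):
    size_mb = bytes_per_hour(rate_bps) / 1e6
    ratio = PCM_RATE_BPS / rate_bps
    print(f"{rate_bps / 1000:5.1f} kb/s: {size_mb:6.2f} MB per hour, "
          f"{ratio:5.1f}x smaller than PCM")
```

One hour of speech drops from 28.8 MB at 64 kb/s to 1.08 MB at 2.4 kb/s, a reduction of almost 27 times.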

Speech coding technology that achieves high voice quality is well developed for bit rates down to 4.8 kb/s. This project focuses on reducing the rate significantly further without seriously degrading speech quality. The primary analysis technique is temporal decomposition.
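Temporal decomposition describes a sequence of spectral parameter vectors y(n) as a weighted sum of a few target vectors a_k, each active over a short span defined by an event function phi_k(n): y(n) ≈ Σ_k a_k phi_k(n). Only the targets and the event information need to be transmitted rather than every frame. The sketch below assumes a deliberately simplified form in which only neighbouring event functions overlap, they are triangular, and they sum to one, so reconstruction reduces to linear interpolation between targets. Full temporal decomposition estimates the event functions and targets from the data; the event positions, target values and function names here are hypothetical.

```python
def triangular_event_functions(num_frames, event_frames):
    """phi[k][n]: weight of event k at frame n.

    Simplifying assumption: event k peaks (value 1) at frame event_frames[k],
    falls linearly to 0 at its neighbours' peaks, and adjacent functions sum
    to one. event_frames must be strictly increasing.
    """
    phi = [[0.0] * num_frames for _ in event_frames]
    for n in range(num_frames):
        if n <= event_frames[0]:
            phi[0][n] = 1.0
        elif n >= event_frames[-1]:
            phi[-1][n] = 1.0
        else:
            # k is the last event at or before frame n
            k = max(i for i, c in enumerate(event_frames) if c <= n)
            left, right = event_frames[k], event_frames[k + 1]
            w = (n - left) / (right - left)
            phi[k][n] = 1.0 - w
            phi[k + 1][n] = w
    return phi


def reconstruct(targets, phi):
    """Approximate each frame as y_hat(n) = sum_k targets[k] * phi[k][n]."""
    dim = len(targets[0])
    num_frames = len(phi[0])
    return [[sum(targets[k][d] * phi[k][n] for k in range(len(targets)))
             for d in range(dim)]
            for n in range(num_frames)]


# Hypothetical example: 11 frames of 2-D spectral parameters described by
# 3 events centred at frames 0, 4 and 10.
NUM_FRAMES = 11
EVENT_FRAMES = [0, 4, 10]
TARGETS = [[0.0, 1.0], [2.0, 3.0], [2.0, 0.0]]

phi = triangular_event_functions(NUM_FRAMES, EVENT_FRAMES)
y_hat = reconstruct(TARGETS, phi)
for n, frame in enumerate(y_hat):
    print(n, [round(v, 3) for v in frame])

sent = len(TARGETS) * len(TARGETS[0]) + len(EVENT_FRAMES)
print(f"values sent: {sent} instead of {NUM_FRAMES * len(TARGETS[0])}")
```

With real parameter tracks the reconstruction is only approximate, and the saving depends on how many events per second the decomposition needs.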

Low Bitrate Speech Coding

Speech coding aims to represent speech in a compact digital form while preserving the intelligibility, naturalness, and quality of the reconstructed speech. The need for narrow-band and secure transmission in cellular, satellite, and military communications is the main reason behind the development of coders operating at 2.4 kb/s. In addition, voice-related applications are increasingly being integrated into multimedia communications and Internet applications. These emerging technologies require a greater level of speech compression.

At the target bit rate of 2.4 kb/s or below, existing coders are either model-based or hybrid coders. These coders provide a high level of intelligibility; however, speech quality, naturalness, and speaker recognisability are all poor at such bit rates.
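At these rates every bit in a frame must be accounted for. A quick calculation of the per-frame budget makes the constraint concrete. The frame lengths and the split of bits among parameters below are hypothetical, chosen only for illustration and not taken from any particular coder.

```python
def bits_per_frame(rate_bps, frame_ms):
    """Whole bits available per frame at a given bit rate and frame length."""
    return int(rate_bps * frame_ms / 1000)


# Hypothetical allocation for one 20 ms frame of a parametric coder
# (illustrative numbers only).
ALLOCATION = {"spectral envelope": 34, "pitch": 7, "voicing": 2, "gain": 5}

for frame_ms in (20.0, 22.5, 30.0):
    print(f"{frame_ms:4.1f} ms frames: {bits_per_frame(2400, frame_ms)} bits at 2.4 kb/s")

budget = bits_per_frame(2400, 20.0)
used = sum(ALLOCATION.values())
print(f"hypothetical allocation uses {used} of {budget} bits")
```

With 48 bits per 20 ms frame, a few bits saved on one parameter must be spent carefully elsewhere, which is why quality and naturalness suffer at these rates.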

Our research is devoted to developing hybrid coders operating at 2.4 kb/s and below, and to enhancing their performance. Specifically, the Prototype Waveform Interpolation (PWI) coder and the Multi-Band Excitation (MBE) coder are the core coders used to achieve good naturalness at rates below 2.4 kb/s. Multiscale techniques, represented by the Wavelet Transform, are used together with Vector Quantisation (VQ) techniques to achieve an appropriate decomposition of the speech signal. Our objective is to develop techniques capable of achieving toll quality at 2.0-2.4 kb/s.
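One way to combine a multiscale decomposition with vector quantisation is to split each frame into wavelet sub-bands and then code a sub-band vector by the index of its nearest codeword. The sketch below uses the Haar wavelet, the simplest orthogonal wavelet, and a tiny hand-made codebook. The wavelet choice, frame values, codebook and function names are all assumptions for illustration, not the wavelets or codebooks used in the project's coders.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT.

    Returns [approx, detail_coarsest, ..., detail_finest];
    len(x) must be divisible by 2**levels.
    """
    approx = list(x)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("len(x) must be divisible by 2**levels")
        half = len(approx) // 2
        details.append([(approx[2 * i] - approx[2 * i + 1]) / SQRT2 for i in range(half)])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / SQRT2 for i in range(half)]
    return [approx] + details[::-1]


def haar_idwt(bands):
    """Invert haar_dwt, rebuilding the signal from the coarsest band upward."""
    approx = list(bands[0])
    for detail in bands[1:]:
        out = []
        for a_i, d_i in zip(approx, detail):
            out.extend([(a_i + d_i) / SQRT2, (a_i - d_i) / SQRT2])
        approx = out
    return approx


def vq_encode(vector, codebook):
    """Index of the nearest codeword under squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])))


def vq_bits(codebook):
    """Bits needed to transmit one codeword index."""
    return math.ceil(math.log2(len(codebook)))


# Hypothetical 8-sample frame and 4-entry codebook for the finest detail band.
frame = [0.2, 0.4, 0.1, -0.3, -0.5, -0.2, 0.3, 0.6]
codebook = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.2, -0.2, -0.2],
    [-0.1, -0.2, 0.2, 0.2],
    [0.2, 0.2, 0.2, 0.2],
]

bands = haar_dwt(frame, levels=3)
index = vq_encode(bands[-1], codebook)
print("band lengths:", [len(b) for b in bands])
print(f"finest detail band -> codeword {index} ({vq_bits(codebook)} bits)")

# Replace the finest band by its codeword and resynthesise the frame.
bands[-1] = codebook[index]
approx_frame = haar_idwt(bands)
print("reconstructed:", [round(v, 3) for v in approx_frame])
```

Here the four finest-band coefficients are sent as a single 2-bit index; a practical coder would train larger codebooks on real speech and allocate bits across bands according to their perceptual importance.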

Real-time Speech Coding

Real-time implementation of low bit-rate speech coders has previously been a difficult and costly task. Recent advances in device technology and the availability of fast programmable digital signal processors have made the task easier: complex speech processing algorithms can now be performed on a single chip.

An appreciation of the difficulties associated with developing real-time applications leads to better decisions about the coder implementation and a higher probability of developing a product that meets the given criteria.

In this project, the real-time considerations and their application to speech coders are investigated to develop practical real-time systems. The primary development system is the TMS320C30 digital signal processor.
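A basic real-time check is that the processing for each frame finishes before the next frame arrives. The sketch expresses this as a processor load fraction. The instruction rate, frame length and per-stage costs are placeholder assumptions, not measured or data-sheet TMS320C30 figures; substitute values from the device documentation and a profiled implementation.

```python
def instructions_per_frame(instr_per_sec, frame_ms):
    """Instruction cycles available within one frame period."""
    return instr_per_sec * frame_ms / 1000.0


def realtime_load(cost_per_frame, instr_per_sec, frame_ms):
    """Fraction of the frame period consumed; must stay below 1.0 for real time."""
    return cost_per_frame / instructions_per_frame(instr_per_sec, frame_ms)


# Placeholder figures (assumptions, not data-sheet or measured values).
INSTR_PER_SEC = 10e6  # hypothetical sustained instruction rate
FRAME_MS = 20.0       # hypothetical frame length
STAGE_COSTS = {       # hypothetical per-frame instruction counts
    "analysis": 60_000,
    "quantisation": 45_000,
    "synthesis": 35_000,
}

cost = sum(STAGE_COSTS.values())
load = realtime_load(cost, INSTR_PER_SEC, FRAME_MS)
print(f"{cost} of {instructions_per_frame(INSTR_PER_SEC, FRAME_MS):.0f} "
      f"instructions per frame, load {load:.0%}")
print("meets real time" if load < 1.0 else "too slow for real time")
```

In practice a margin is kept below 100% load to cover interrupts, I/O and worst-case frames.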

Selected Papers

S. Boland, M. Deriche, and S. Sridharan. An overview of ISO/MPEG audio codec. In Audio Engineering Society Convention (AES), April 1995.

S. Boland, S. Sridharan, and M. Deriche. Low bitrate speech and music coding using the wavelet transform. In International Conference on Speech Science and Technology, pp. 164-169, December 1994.

S. Ghaemmaghami, M. Deriche, and B. Boashash. Hierarchical approach to formant detection and tracking through instantaneous frequency estimation. Electronics Letters, vol. 33, no. 1, pp. 17-18, 1997.

S. Ghaemmaghami, M. Deriche, and B. Boashash. Comparative study of different parameters for temporal decomposition based speech coding. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1997.

S. Ghaemmaghami, M. Deriche, and B. Boashash. Efficient speech coding at very low rates using phonetic cues. In Workshop on Signal Processing and its Applications, 1997.