Queensland University of Technology, Brisbane, Australia. Faculty of Built Environment and Engineering

Speech Coding Using Phonetic Vocoding

This example demonstrates a speech coding method known as phonetic vocoding. The coder uses an HMM-based speech recogniser to quantise the speech into phonetic units, and the decoder uses the HMM statistics to reconstruct the spectral envelope. Forty-nine phonemes are modelled by left-right, single-mixture hidden Markov models (HMMs). The recognition/encoding stage uses models trained on Adaptive Mel-cepstral Coefficients (AMC) plus energy and their delta terms, while the synthesis/decoding stage uses models trained on Line Spectral Frequencies (LSF).
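As a rough illustration of the decoding stage, the sketch below assumes that each HMM state stores a single mean LSF vector (consistent with single-mixture models) and that the decoder simply holds that vector for the transmitted number of frames in each state. The function name, the data layout and the toy model values are hypothetical. The cited system may generate smoother trajectories (for example by also using the models' delta statistics) and would convert the LSFs to filter coefficients before synthesis.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: for each phoneme index, one mean LSF vector per HMM state.
# A left-right model visits its states in order, so an ordered list suffices.
LsfModel = List[List[float]]


def decode_spectral_envelope(
    units: Sequence[Tuple[int, Sequence[int]]],
    lsf_models: Dict[int, LsfModel],
) -> List[List[float]]:
    """Rebuild a frame-by-frame LSF trajectory from transmitted phonetic units.

    Each unit is (phoneme_index, state_durations), where state_durations[s] is
    the number of frames spent in state s of that phoneme's left-right HMM.
    Each state's mean vector is repeated for its duration; no smoothing is
    applied across state or phoneme boundaries.
    """
    frames: List[List[float]] = []
    for phoneme, durations in units:
        states = lsf_models[phoneme]
        if len(durations) != len(states):
            raise ValueError(f"phoneme {phoneme}: expected {len(states)} durations")
        for mean, n_frames in zip(states, durations):
            # Copy the mean so every frame is an independent list.
            frames.extend(list(mean) for _ in range(n_frames))
    return frames


# Toy example: two phonemes with 3-state models and 2-dimensional "LSF" vectors.
models = {
    0: [[0.10, 0.30], [0.12, 0.32], [0.14, 0.34]],
    7: [[0.20, 0.50], [0.22, 0.52], [0.24, 0.54]],
}
traj = decode_spectral_envelope([(0, [2, 1, 3]), (7, [1, 1, 2])], models)
print(len(traj))  # 10
print(traj[2])    # [0.12, 0.32]
```

Holding state means constant gives a stepwise envelope; it is the simplest choice that uses only the phoneme indices and state durations the coder transmits.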

The phoneme indices, state durations and speaker adaptation information are transmitted together with prosody information; the pitch contour is coded using Piecewise Linear Approximation (PLA). The coded example can be compared with the original speech and with speech produced by the same synthesiser without any quantisation of the spectral envelope or pitch parameters.
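Piecewise Linear Approximation of a pitch contour can be sketched as follows, assuming a per-frame pitch contour in Hz with no unvoiced gaps and a fixed maximum error tolerance. The greedy segmentation rule, the `tol` parameter and the function names are illustrative assumptions, not the procedure from the cited paper. Only the breakpoints (frame index and pitch value) would need to be transmitted, and the decoder rebuilds the contour by linear interpolation between them.

```python
from typing import List, Sequence, Tuple

Breakpoint = Tuple[int, float]  # (frame index, pitch value in Hz)


def _max_error(f0: Sequence[float], i: int, j: int) -> float:
    """Largest absolute deviation of f0[i..j] from the straight line joining i and j."""
    worst = 0.0
    for k in range(i + 1, j):
        approx = f0[i] + (f0[j] - f0[i]) * (k - i) / (j - i)
        worst = max(worst, abs(f0[k] - approx))
    return worst


def pla_encode(f0: Sequence[float], tol: float) -> List[Breakpoint]:
    """Greedy PLA: extend each segment while every frame stays within tol of the line."""
    if not f0:
        return []
    points: List[Breakpoint] = [(0, float(f0[0]))]
    start, n = 0, len(f0)
    while start < n - 1:
        end = start + 1
        while end + 1 < n and _max_error(f0, start, end + 1) <= tol:
            end += 1
        points.append((end, float(f0[end])))
        start = end
    return points


def pla_decode(points: Sequence[Breakpoint]) -> List[float]:
    """Rebuild one pitch value per frame by linear interpolation between breakpoints."""
    if not points:
        return []
    out = [points[0][1]]
    for (i, a), (j, b) in zip(points, points[1:]):
        for k in range(i + 1, j + 1):
            out.append(a + (b - a) * (k - i) / (j - i))
    return out


# Toy contour: a rise followed by a fall (values in Hz).
contour = [100, 110, 120, 130, 125, 120, 115]
bps = pla_encode(contour, tol=1.0)
print(bps)              # [(0, 100.0), (3, 130.0), (6, 115.0)]
print(pla_decode(bps))  # [100.0, 110.0, 120.0, 130.0, 125.0, 120.0, 115.0]
```

Seven pitch values are reduced to three breakpoints here. The greedy rule is simple but not optimal: for the same tolerance it can place more breakpoints than a dynamic-programming segmentation would.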

Coding Sample 1: Male Speaker (each file 134 KB)

(Ref: J. Dines and S. Sridharan, "A speaker independent phonetic vocoder for the English language", IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS-2000), pp. 696-701.)