pb425 - Modern Speech Technology (Complete module description)

Module label Modern Speech Technology
Module code pb425
Credit points 3.0 CP
Workload 90 h (28 h attendance + 62 h self-study)
Institute directory Institute of Physics
Applicability of the module
  • Area of Specialisation (Bachelor) > Area of Specialisation
  • Bachelor's Programme Biology (Bachelor) > Area of Specialisation
  • Bachelor's Programme Business Administration and Law (Bachelor) > Area of Specialisation
  • Bachelor's Programme Business Informatics (Bachelor) > Area of Specialisation
  • Bachelor's Programme Chemistry (Bachelor) > Area of Specialisation
  • Bachelor's Programme Comparative and European Law (Bachelor) > Area of Specialisation
  • Bachelor's Programme Computing Science (Bachelor) > Area of Specialisation
  • Bachelor's Programme Economics and Business Administration (Bachelor) > Area of Specialisation
  • Bachelor's Programme Education (Bachelor) > Area of Specialisation
  • Bachelor's Programme Engineering Physics (Bachelor) > Area of Specialisation
  • Bachelor's Programme Environmental Science (Bachelor) > Area of Specialisation
  • Bachelor's Programme Mathematics (Bachelor) > Area of Specialisation
  • Bachelor's Programme Physics, Engineering and Medicine (Bachelor) > Area of Specialisation
  • Bachelor's Programme Social Studies (Bachelor) > Area of Specialisation
  • Bachelor's Programme Sustainability Economics (Bachelor) > Area of Specialisation
  • Fach-Bachelor Pädagogisches Handeln in der Migrationsgesellschaft (Bachelor) > Area of Specialisation
Responsible persons
  • Enzner, Gerald (module responsibility)
  • Enzner, Gerald (authorised examiner)
  • Chinaev, Aleksej (authorised examiner)
Prerequisites

Basic familiarity with signal or system theory

Skills to be acquired in this module

The course provides the engineering tools for modern speech signal processing. It complements the physiological aspects of speech production and the psychoacoustic aspects of speech perception covered in the companion course "Einführung in die Sprachverarbeitung" (pb185) by teaching the technical representation of speech in algorithms, hardware, and software. Special attention is given to several facets of speech signal enhancement, such as noise filtering, reverberation reduction, and echo cancellation. The required tools are presented in an elementary and intuitive manner, and the mathematical requirements are low to moderate. Students work on the exercises in software or on the blackboard, individually or in teams, with support from the lecturers on individual components where necessary. They thus gain a first insight into the scientific working methods needed for student qualification projects, such as the bachelor's thesis, and into the engineering working methods used in the speech processing industry.

Module contents

1. Technical Representation of Speech Signals
- Bandwidth, sampling rate, sampling rate error, bit depth
- Single-channel, multi-channel, and binaural signals
- Acoustic sensor network
- Databases for speech and noise signals
- Acoustic room simulation
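
The effect of bit depth on a sampled signal can be illustrated with a minimal Python sketch. It is not part of the module material: the 16 kHz sampling rate, the 440 Hz test tone, and the helper names `quantize` and `sqnr_db` are assumptions chosen for illustration. The sketch quantizes a sine tone uniformly and measures the signal-to-quantization-noise ratio, which for a full-scale sine grows by roughly 6 dB per additional bit.

```python
import math

def quantize(x, bits):
    """Uniform quantizer for samples in [-1, 1) with the given bit depth."""
    levels = 2 ** (bits - 1)
    # Round to the nearest integer level, clip to the representable range,
    # then scale back to the [-1, 1) range.
    return [max(-levels, min(levels - 1, round(s * levels))) / levels for s in x]

def sqnr_db(x, xq):
    """Signal-to-quantization-noise ratio in dB."""
    signal_energy = sum(s * s for s in x)
    error_energy = sum((s - q) ** 2 for s, q in zip(x, xq))
    return 10.0 * math.log10(signal_energy / error_energy)

fs = 16000   # assumed sampling rate in Hz
f0 = 440.0   # assumed test-tone frequency in Hz
x = [0.99 * math.sin(2.0 * math.pi * f0 * k / fs) for k in range(fs)]  # 1 second

for bits in (8, 12, 16):
    print(bits, "bits:", round(sqnr_db(x, quantize(x, bits)), 1), "dB")
```

A real speech signal has a much lower average level relative to its peaks than a sine tone, so the ratio achieved in practice at a given bit depth is typically lower than this idealized figure.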
 

2. Speech Signal Enhancement Tasks
- Competitive speech enhancement challenges
- Tasks with ambient noise, reverberation, or interferences
- Evaluation metrics: PESQ, STOI, SegSNR, POLQA, ViSQOL
- Model-based processing paradigm: e.g., Wiener filter
- DNN-based processing paradigm: i.e., end-to-end learning from data
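
Two of the items above lend themselves to a short illustration: the Wiener filter gain and the segmental SNR (SegSNR) metric. The following Python sketch is a simplified illustration under stated assumptions, not the course's reference implementation. The function names, the frame length of 256 samples, and the clipping range of -10 dB to 35 dB are assumptions (the clipping range is a common convention, but conventions vary).

```python
import math
import random

def wiener_gain(snr_linear):
    """Wiener gain G = xi / (1 + xi) for a linear (not dB) SNR xi >= 0."""
    return snr_linear / (1.0 + snr_linear)

def seg_snr_db(clean, processed, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs in dB, each clipped to [lo, hi]."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        y = processed[start:start + frame_len]
        signal_energy = sum(v * v for v in s)
        error_energy = sum((a - b) ** 2 for a, b in zip(s, y))
        if signal_energy == 0.0:
            continue  # skip silent frames, for which the SNR is undefined
        snr = 10.0 * math.log10(signal_energy / error_energy) if error_energy > 0.0 else hi
        values.append(min(hi, max(lo, snr)))
    if not values:
        raise ValueError("no frame with signal energy found")
    return sum(values) / len(values)

# Hypothetical test data: a 200 Hz tone at 16 kHz with additive Gaussian noise.
rng = random.Random(0)
fs = 16000
clean = [0.5 * math.sin(2.0 * math.pi * 200.0 * k / fs) for k in range(4096)]
noisy = [s + rng.gauss(0.0, 0.05) for s in clean]

print("SegSNR of noisy signal:", round(seg_snr_db(clean, noisy), 1), "dB")
print("Wiener gain at 0 dB SNR:", wiener_gain(1.0))
```

In a practical enhancement system the Wiener gain is applied per frequency bin of a short-time spectrum, and the SNR has to be estimated from the noisy signal; here it is only evaluated for a given SNR value.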
 

3. Hardware and Software Tools for Speech Processing
- Integrated development environments: Matlab, Python, TensorFlow
- Commercial solutions: e.g., Nvidia, Krisp
- Devices for speech acquisition and processing: Personal Computer, Mobile, Raspberry Pi, Hearing Aid
 

4. Remote Speech Communication
- Systems for speech communication: fixed telephone network, mobile network, Voice-over-IP, conferencing systems, NFMI
- Fundamentals of transmission: source and channel coding
- Emerging services: WebRTC, Speex, EVS
- Principles and properties of codecs: sampling rate, bitrate, latency
- Packet loss: statistical modeling and concealment
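
Statistical modeling and concealment of packet loss can be sketched in a few lines of Python. The example below assumes a simplified two-state Gilbert model (every packet is lost in the "bad" state, none in the "good" state) and the simplest concealment strategy, repeating the last correctly received packet. The transition probabilities and all function names are hypothetical and chosen only for illustration.

```python
import random

def gilbert_loss_pattern(n_packets, p_good_to_bad=0.05, p_bad_to_good=0.4, seed=0):
    """Return a list of booleans, True where a packet is lost (bursty losses)."""
    rng = random.Random(seed)
    bad = False
    pattern = []
    for _ in range(n_packets):
        # Two-state Markov chain: switch state with the given probabilities.
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        elif rng.random() < p_good_to_bad:
            bad = True
        pattern.append(bad)
    return pattern

def conceal_by_repetition(packets, lost):
    """Replace each lost packet by the last good one (zeros if none exists yet)."""
    output = []
    last_good = None
    for packet, is_lost in zip(packets, lost):
        if is_lost:
            output.append(list(last_good) if last_good is not None else [0.0] * len(packet))
        else:
            output.append(packet)
            last_good = packet
    return output

# Hypothetical stream of 1000 packets with 4 samples each.
packets = [[float(i)] * 4 for i in range(1000)]
loss = gilbert_loss_pattern(len(packets))
received = conceal_by_repetition(packets, loss)

print("observed loss rate:", sum(loss) / len(loss))
```

For this model the long-run loss rate is p_good_to_bad / (p_good_to_bad + p_bad_to_good) and the mean burst length is 1 / p_bad_to_good, which is what distinguishes it from independent random losses at the same rate.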

Recommended reading

Vary, P., Martin, R.: Digital Speech Transmission: Enhancement, Coding and Error Concealment. Wiley, 2006

Links
Language of instruction English
Duration (semesters) 1 semester
Module frequency annually
Module capacity unrestricted
Type of module Supplementation/Professionalisation
Module level PB (Professionalisierungsbereich / Professionalization)
Teaching/Learning method Slide projection, blackboard, and Matlab
Examination Final exam of module
Examination times
Type of examination 1 written exam (30 to 60 minutes), or 1 oral exam (20 to 30 minutes), or 1 formal presentation (20 to 30 minutes)

Form of instruction Lecture
Semester hours per week (SWS) 2
Frequency Summer semester (SoSe)
Workload attendance time 28 h
Workload self-study 62 h