inf105 - Fault Tolerance in Distributed Systems (Complete module description)


Module label Fault Tolerance in Distributed Systems
Module code inf105
Credit points 6.0 CP
Workload 180 h
Institute directory Department of Computing Science
Applicability of the module
  • Master's Programme Computing Science (Master)
Responsible persons
  • Theel, Oliver (module responsibility)
  • Lecturers of the module (authorized to examine)
Prerequisites
Useful previous knowledge: Distributed operating systems
Skills to be acquired in this module
This module provides knowledge of fault-tolerant distributed systems. The terminology, structure, conception, core challenges and related implementation concepts will be covered in detail.

Professional competence
The students:
  • assess what a fault-tolerant distributed system is and develop awareness of its capabilities
  • name and discuss common implementations of fault-tolerant distributed systems

Methodological competence
The students:
  • reflect on the implementation challenges of a distributed system
  • are able to adapt and evolve implementation concepts of fault-tolerant distributed systems in new contexts

Social competence
The students:
  • solve problems in small teams
  • present their solutions to the members of the tutorial
  • discuss their different solutions with members of the tutorial


Self-competence
The students:
  • accept criticism
  • question their initially applied methods for problem solving
  • question their initial solutions in the light of newly learned methods
Module contents
  1. Fault, Error, Failure
  2. Failure semantics, Fault tolerance
  3. Byzantine agreement protocols
  4. Stable storage
  5. Fail-stop processors
  6. Atomic commit protocols
  7. Classification of replication control schemes: pessimistic vs. optimistic, semantic vs. syntactic, static vs. dynamic
  8. Consistency notions
  9. Quality criteria
  10. Survey of replication control schemes
  11. Design of replication control schemes
  12. Unifying frameworks
  13. Replication in practice
Recommended reading
  • P. Jalote (1994): Fault Tolerance in Distributed Systems. Prentice-Hall.
  • A. Helal et al. (1996): Replication Techniques in Distributed Systems. Kluwer Academic Publishers.
  • A. Schiper et al. (2010): Replication: Theory and Practice. Springer.
Links
Language of instruction German
Duration (semesters) 1 Semester
Module frequency annual
Module capacity unlimited
Reference text
Connected with: Betriebssysteme 1 und 2 (Operating Systems 1 and 2), Betriebssysteme-Praktikum (Operating Systems Lab), Verteilte Betriebssysteme (Distributed Operating Systems)
Teaching/Learning method 1 lecture (VL) + 1 seminar (S), or 1 lecture (VL) + 1 exercise (Ü)
Previous knowledge Distributed operating systems
Form of instruction | Comment | SWS (weekly hours) | Frequency | Workload of compulsory attendance
Lecture | | 2 | WiSe (winter semester) | 28 h
Seminar or exercise | | 2 | WiSe (winter semester) | 28 h
Total attendance time for the module: 56 h
Examination | Examination period | Type of examination
Final exam of module | End of lecture period | Written exam, oral exam, or practical work