inf105 - Fault Tolerance in Distributed Systems (Complete module description)

Module label Fault Tolerance in Distributed Systems
Module code inf105
Credit points 6.0 CP
Workload 180 h
Institute directory Department of Computing Science
Applicability of the module
  • Master's Programme Computing Science (Master) > Practical Computer Science
Responsible persons
  • Theel, Oliver (module responsibility)
  • The lecturers teaching in the module (authorised to conduct exams)
Prerequisites

Useful prior knowledge: distributed operating systems

Skills to be acquired in this module

This module provides knowledge of fault-tolerant distributed systems. It covers their terminology, structure, design, core challenges and related implementation concepts in detail.

Professional competence
The students:

  • assess what a fault-tolerant distributed system is and develop awareness of its capabilities
  • name and discuss common implementations of fault-tolerant distributed systems


Methodological competence
The students:

  • reflect on the implementation challenges of a distributed system
  • are able to adapt and evolve implementation concepts of fault-tolerant distributed systems in new contexts


Social competence
The students:

  • solve problems in small teams
  • present their solutions to the members of the tutorial
  • discuss their different solutions with members of the tutorial



Self-competence
The students:

  • accept criticism
  • question their initially applied methods for problem solving
  • question their initial solutions in the light of newly learned methods
Module contents
  1. Fault, Error, Failure
  2. Failure semantics, Fault tolerance
  3. Byzantine agreement protocols
  4. Stable storage
  5. Fail-stop processors
  6. Atomic commit protocols
  7. Classification of replication control schemes: pessimistic vs. optimistic, semantic vs. syntactic, static vs. dynamic
  8. Consistency notions
  9. Quality criteria
  10. Survey of replication control schemes
  11. Design of replication control schemes
  12. Unifying frameworks
  13. Replication in practice
Recommended reading
  • P. Jalote (1994): Fault Tolerance in Distributed Systems. Prentice-Hall.
  • A. Helal et al. (1996): Replication Techniques in Distributed Systems. Kluwer Academic Publishers.
  • A. Schiper et al. (2010): Replication: Theory and Practice.
Links
Language of instruction German
Duration (semesters) 1 Semester
Module frequency annual
Module capacity unlimited
Teaching/Learning method Lecture + seminar or lecture + exercise
Type of course | Comment | Weekly hours per semester (SWS) | Frequency | Workload of compulsory attendance
Lecture | | 2 | Winter semester | 28 h
Seminar or exercise | | 2 | Winter semester | 28 h
Total module attendance time | | | | 56 h
Examination | Examination times | Type of examination
Final exam of module | End of lecture period | Written exam or oral exam or practical work