Architecture.md

Timer

Watchdog Timer

A watchdog timer is an electronic or software timer used to detect and recover from computer malfunctions. During normal operation, the computer regularly restarts the timer to prevent it from “timing out”, that is, from counting down to zero. An elapsed watchdog therefore indicates a hardware fault or a software error that kept the computer from restarting the timer. The system is then typically placed into a safe state and restarted in order to restore normal operation. An elapsed watchdog can be handled in several ways, depending on the system state that is desired.
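The countdown-and-restart behavior described above can be sketched in software. This is a minimal illustration, not a production watchdog: the class name `Watchdog`, its methods, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class Watchdog:
    """Minimal software watchdog: expires unless restarted within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock  # injectable so the timer can be exercised without sleeping
        self._deadline = clock() + timeout

    def kick(self):
        """Restart the countdown from the full timeout."""
        self._deadline = self._clock() + self.timeout

    def remaining(self):
        """Seconds left before the watchdog times out (never negative)."""
        return max(0.0, self._deadline - self._clock())

    def expired(self):
        """True once the countdown has reached zero without a restart."""
        return self._clock() >= self._deadline
```

A supervising loop would call `kick()` while the system is healthy and poll `expired()` to decide when to start a corrective action.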

Watchdogs can also be configured to suit the needs of the system. The timeout interval may be fixed or programmable, and a watchdog may be arranged to log timeouts rather than restart the system when a timeout does not necessarily indicate a fatal error or hardware malfunction. Thus, both the corrective actions triggered and the reset conditions for a watchdog timer depend on the architecture and design.
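One way to picture these options is as a small configuration object that selects the timeout interval and the corrective action. The names below (`WatchdogConfig`, `TimeoutAction`, `handle_timeout`) and the 30-second default are hypothetical and exist only for this sketch; the `log` and `restart` callables stand in for whatever the real system provides.

```python
from dataclasses import dataclass
from enum import Enum


class TimeoutAction(Enum):
    LOG_ONLY = "log_only"  # record the timeout, keep running
    RESTART = "restart"    # record the timeout, then restart the system


@dataclass(frozen=True)
class WatchdogConfig:
    timeout_s: float = 30.0  # programmable timeout interval (assumed default)
    action: TimeoutAction = TimeoutAction.RESTART


def handle_timeout(config, log, restart):
    """Apply the configured corrective action and return the action taken."""
    log(f"watchdog timed out after {config.timeout_s} s")
    if config.action is TimeoutAction.RESTART:
        restart()
    return config.action
```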

A useful example is a two-stage fail-safe timer: a multistage configuration in which the first timeout triggers logging of the current state to a persistent storage device for debugging, and the second timeout initiates a reset.

Restarting

Kicking

The act of restarting a watchdog timer is commonly referred to as kicking the watchdog. Kicking is typically done by writing to a watchdog control port or by setting bits in a particular register. On Linux, a user-space program can kick the watchdog by writing a zero character to /dev/watchdog. Tightly coupled watchdogs are instead cleared by a special machine language instruction, such as CLRWDT on PIC microcontrollers. A watchdog timer should be kicked only if all fault detection tests have passed and the system is in the desired, functional state.
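The rule that a kick must be gated on passing fault detection tests can be sketched as follows. The function name `kick_if_healthy` and the shape of the `checks` argument are assumptions for illustration. The `device` argument is any writable binary file-like object; on Linux it could be an unbuffered handle on /dev/watchdog, but note that opening that device typically arms the hardware watchdog, so the sketch should be tried against a stand-in first.

```python
def kick_if_healthy(checks, device):
    """Kick the watchdog only when every health check passes.

    checks: iterable of zero-argument callables that return True when healthy
            (hypothetical placeholders for real fault detection tests).
    device: writable binary file-like object standing in for the watchdog
            device, for example an unbuffered handle on /dev/watchdog.
    Returns True if the watchdog was kicked, False otherwise.
    """
    if all(check() for check in checks):
        device.write(b"\0")  # write a single zero character as the kick
        device.flush()
        return True
    return False  # withhold the kick so the watchdog is allowed to time out
```

Withholding the kick, rather than reporting the failure some other way, is the point of the design: a failed check leads to the same timeout path as a hung program.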

TODO(tlc): Finish reading the reset section in Wikipedia.

  • 3 step restart
  • soft reset
  • hard reset

Architecture and Operation

Single-Stage Watchdog

A single-stage watchdog timer consists of a single timer. In basic designs, such as watchdogs built into microcontrollers, the timer shares a common clock signal with the CPU. This configuration is usually used to reset the computer system on timeout.

Multistage Watchdog

Two or more timers can be combined, often in sequence (“cascaded”), to form a multistage watchdog timer. Each timer is referred to as a timer stage, or just a stage. Each stage triggers its own corrective action when it times out, so a multistage watchdog performs a series of corrective actions. The final stage usually triggers a reset of the computer system.
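A cascaded arrangement can be sketched as a list of stages that fire in order as time passes without a kick. This is a simplified simulation under stated assumptions: the class `MultistageWatchdog`, its `tick` method, and the representation of a stage as a `(timeout_seconds, action)` pair are all invented for the example, and time is advanced manually rather than read from a clock.

```python
class MultistageWatchdog:
    """Cascaded watchdog: each stage begins counting when the previous one times out."""

    def __init__(self, stages):
        # stages: list of (timeout_seconds, action) pairs in firing order
        self._stages = list(stages)
        self._elapsed = 0.0  # time since the last kick
        self._fired = 0      # number of stages that have already fired

    def kick(self):
        """Restart the whole cascade from the first stage."""
        self._elapsed = 0.0
        self._fired = 0

    def tick(self, dt):
        """Advance time by dt seconds and run the action of each stage that elapses."""
        self._elapsed += dt
        threshold = 0.0
        for index, (timeout, action) in enumerate(self._stages):
            threshold += timeout  # cumulative time at which this stage elapses
            if index >= self._fired and self._elapsed >= threshold:
                action()
                self._fired = index + 1
```

A two-stage fail-safe timer would be built by passing two stages, the first with a logging action and the second with a reset action.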