Failure Management

Even in the most reliable computerized systems, there is always a real risk of hardware failure. For example, a high-energy particle of ionizing radiation from space that penetrates the Earth's atmosphere could strike the silicon of a memory chip and flip a stored bit, or hit the processor and disrupt an operation in progress. No system in the world is completely protected from such events.

The GMP monitoring system, like any equipment or computerized system, can also experience failures. To mitigate risks and ensure the timely identification and resolution of issues, there should be a mechanism in place to monitor the system's own functionality – akin to an internal security service.

An architecture based on independent modules allows for redundancy in system health monitoring: different modules can oversee each other's performance. Even after one module fails, a "live" module running on a separate CPU core remains in the system and can take over control and restore system functionality automatically.
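As an illustration of how modules might oversee each other, the following is a minimal sketch of heartbeat-based peer monitoring. It is not taken from any particular product: the class name HeartbeatTable, the module names, and the 5-second timeout are all assumptions chosen for the example. Each module periodically records a heartbeat, and any module can ask which of its peers have gone silent.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatTable:
    """Shared record of the last time each module reported that it was alive."""

    timeout_s: float = 5.0                        # hypothetical silence threshold
    clock: Callable[[], float] = time.monotonic   # injectable so the logic can be tested
    last_seen: Dict[str, float] = field(default_factory=dict)

    def beat(self, module: str) -> None:
        """Called periodically by a module to signal that it is still running."""
        self.last_seen[module] = self.clock()

    def stale_peers(self, observer: str) -> List[str]:
        """Return the peers of `observer` whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if name != observer and now - seen > self.timeout_s
        )


if __name__ == "__main__":
    fake_now = [0.0]                              # simulated clock for the demonstration
    table = HeartbeatTable(timeout_s=5.0, clock=lambda: fake_now[0])
    for name in ("acquisition", "alarms", "archive"):   # hypothetical module names
        table.beat(name)

    fake_now[0] = 4.0                             # "acquisition" stops reporting from here on
    table.beat("alarms")
    table.beat("archive")

    fake_now[0] = 7.0
    print(table.stale_peers("alarms"))            # ['acquisition']
```

Because every module can run the same check against its peers, no single watchdog becomes a point of failure. In a real deployment the table would have to live in storage that all modules can reach, which this sketch leaves out.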


Tarqvara GMP Monitoring System

The Tarqvara GMP monitoring system implements distributed monitoring of the system's modules. The modules track each other's performance, and the first "healthy" module that detects a failure will forcibly restart the failed module. The system automatically restores full functionality within 10–20 seconds, while also generating and logging the corresponding error message.
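The "first healthy module restarts the failed one" behavior can be sketched as follows. This is an illustrative assumption about how such coordination could work, not the Tarqvara implementation: RestartCoordinator, restart_fn, and the module names are hypothetical. A lock ensures that when several modules notice the same failure, only the first performs the restart, and the event is written to the log.

```python
import logging
import threading
from typing import Callable, Dict, Set

log = logging.getLogger("module_supervisor")


class RestartCoordinator:
    """Ensures that only the first module to report a failure performs the restart."""

    def __init__(self, restart_fn: Callable[[str], None]) -> None:
        # restart_fn is a hypothetical hook that terminates and relaunches a module.
        self._restart_fn = restart_fn
        self._lock = threading.Lock()
        self._restarting: Set[str] = set()
        self.restart_counts: Dict[str, int] = {}

    def report_failure(self, detector: str, failed: str) -> bool:
        """Return True if `detector` was first and carried out the restart."""
        with self._lock:
            if failed in self._restarting:
                return False                      # another module is already handling it
            self._restarting.add(failed)
        try:
            # Log the error before acting, so the failure is recorded even if the restart fails.
            log.error("Module %r unresponsive; restart forced by %r", failed, detector)
            self._restart_fn(failed)
            with self._lock:
                self.restart_counts[failed] = self.restart_counts.get(failed, 0) + 1
            return True
        finally:
            with self._lock:
                self._restarting.discard(failed)  # allow future restarts of this module


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    coordinator = RestartCoordinator(restart_fn=lambda name: print(f"restarting {name}"))
    coordinator.report_failure(detector="alarms", failed="acquisition")
```

The lock is released before the restart hook runs, so a slow restart does not block other modules from reporting failures elsewhere; they are simply told that the restart is already in progress.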

The terminated process instance of the failed module remains in the system's memory only temporarily; the operating system's garbage collector frees the occupied space shortly afterwards. Memory leak testing confirmed that no leak occurs, and the system has demonstrated continuous 24/7 operation for several years without requiring a restart.

See also:
GMP Monitoring Systems
Tarqvara GMP Monitoring System
IT Solutions / GAMP / Data Integrity (RDI)
Computerized Systems Validation (CSV)