Hello everyone,
the System i already has a built-in temperature sensor, and when it reaches 56 °C the system shuts down
to protect itself. Warnings are issued well before that, though, for example at 40 °C.
We see these cases fairly often, and thanks to the monitoring tool we run at our customers' sites, we can react early enough.
Here are the messages:
Document Title: System Shutdown and UPS Messages in History Log

Abstract



Any release of SLIC prior to V5R5M0 makes thermal events look like UPS
shutdown failures.

Document Description:

Sequence of Events as the Ambient Temperature Rises

Note: The following information was gathered from a 9406-520 at firmware
level SF235_185 (GA6-SP2 marker PTF MH00517) with system value QUPSDLYTIM
set to 900 seconds.


1 SRC 11007201 indicates that the ambient temperature measured by the
system is above 104°F (40°C).

a The system fans speed up to increase airflow in an attempt to
correct the high temperature condition.



2 SRC 11007203 indicates that the ambient temperature is above 118°F
(48°C).

a The system immediately posts a B6007201 in the PAL (product
activity log).
b QHST and the system operator message queue log a CPF1816 message.
c The system starts to count down 900 seconds (as set in
QUPSDLYTIM).



3 At this point, one of the following events could occur:

o The temperature drops back below the 118°F threshold before the
QUPSDLYTIM limit is reached.
a QHST and the system operator message queue log a CPF1817
message.
b SRC B6007202 is posted in the PAL.
c The system continues to run normally.

o The temperature remains above the 11007203 threshold to the end
of the 900 second QUPSDLYTIM limit.
a The system powers down.

o The temperature continues to rise past the 11007205 threshold;
the ambient temperature is above 132°F (56°C).

a QHST and the system operator message queue log a CPI0964
message.
b The system powers down immediately, regardless of the
QUPSDLYTIM value.
c CPI0979 is logged in QHST and the system operator message
queue after the system is IPLed again.
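
The sequence above can be sketched as a small Python model. This is only an illustration built from the thresholds and the 900 second QUPSDLYTIM value quoted in the document; the function names and the sampled (seconds, temperature) input are assumptions made for the example and are not part of any IBM interface.

```python
from typing import Iterable, Optional, Tuple

# Thresholds in degrees Celsius, taken from the sequence described above.
WARN_C = 40      # SRC 11007201: fans speed up
DELAY_C = 48     # SRC 11007203: QUPSDLYTIM countdown starts
CRITICAL_C = 56  # SRC 11007205: immediate power down


def src_for_temperature(temp_c: float) -> Optional[str]:
    """Return the highest ambient-temperature SRC that applies, or None."""
    if temp_c > CRITICAL_C:
        return "11007205"
    if temp_c > DELAY_C:
        return "11007203"
    if temp_c > WARN_C:
        return "11007201"
    return None


def outcome(samples: Iterable[Tuple[int, float]], delay_s: int = 900) -> str:
    """Walk time-ordered (seconds, temp_c) samples and report how the episode ends.

    Hypothetical helper: it only evaluates the state at the sample times,
    so it is a coarse approximation of the behaviour described in the text.
    """
    countdown_start = None
    for t, temp_c in samples:
        src = src_for_temperature(temp_c)
        if src == "11007205":
            # Above 56 C: power down regardless of QUPSDLYTIM.
            return "immediate power down"
        if src == "11007203":
            if countdown_start is None:
                countdown_start = t
            if t - countdown_start >= delay_s:
                return "delayed power down"
        else:
            # Back at or below 48 C: the countdown is cancelled (CPF1817 case).
            countdown_start = None
    return "running"
```

For example, `outcome([(0, 50), (600, 47)])` models the case where the temperature drops back before the delay expires, while `outcome([(0, 50), (900, 50)])` models the case where it stays above 48 °C for the full 900 seconds.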





In SLIC V5R5M0, a new MI event has been added to cause SLIC to correctly
handle overheating situations. The new event flow puts a new message into
the QSYSOPR message queue that tells the system operator about the
overheating condition. After it does this, the partition will shut down.

So, until then, if a customer system goes down and you see UPS messages in
the history log, examine PAL entries for SRCs 1100720x and B600720x. If
you see these ambient temperature SRCs in PAL (they are not logged in SAL),
then it is possible that the system overheated and shut down.
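
The PAL check described above could be automated roughly as follows. This is a minimal sketch, assuming the PAL entries have already been exported as plain text lines, that the SRCs appear as standalone tokens, and that the trailing "x" is a single hex digit; the function name and the sample lines are hypothetical.

```python
import re

# Matches the two SRC families named above: 1100720x and B600720x.
# Assumption: x is one hex digit and each code is a standalone token.
THERMAL_SRC = re.compile(r"\b(?:1100720|B600720)[0-9A-F]\b")


def find_thermal_srcs(pal_lines):
    """Return the thermal SRCs found in the given text lines, in order of appearance."""
    found = []
    for line in pal_lines:
        found.extend(THERMAL_SRC.findall(line))
    return found


# Hypothetical sample lines, not real PAL output.
sample = [
    "entry 1  11007203  ambient temperature",
    "entry 2  B6007201  posted",
    "entry 3  CPF1816",
]
print(find_thermal_srcs(sample))
```

If the returned list is non-empty for a system that went down with UPS messages in the history log, overheating is a plausible cause worth investigating.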