[collectd] Let IPMI plugin skip periods "unsupported state" periods

Bruno Prémont bonbons at linux-vserver.org
Wed Feb 4 10:11:24 CET 2009


On a server IPMI readings suddenly failed causing the plugin to stop
collecting data.
The following entries got logged:

ipmi plugin: sensor_read_handler: Removing sensor Temp 7 processor (3.6), because it failed with status 0x10000d5.
ipmi plugin: sensor_read_handler: Removing sensor Temp 6 processor (3.5), because it failed with status 0x10000d5.
ipmi plugin: sensor_read_handler: Removing sensor Temp 5 power_supply (10.5), because it failed with status 0x10000d5.
ipmi plugin: sensor_read_handler: Removing sensor Temp 4 processor (3.4), because it failed with status 0x10000d5.
ipmi plugin: sensor_read_handler: Removing sensor Temp 3 processor (3.3), because it failed with status 0x10000d5.
ipmi plugin: sensor_read_handler: Removing sensor Temp 2 external_environment (39.1), because it failed with status 0x10000d5.
ipmi plugin: sensor_read_handler: Removing sensor Temp 1 system_internal_expansion_board (16.1), because it failed with status 0x10000d5.

This patch attempts to provide slightly better error message and just
skip current reading iteration if the error
IPMI_NOT_SUPPORTED_IN_PRESENT_STATE_CC is indicated.

This error happens e.g. when iLo firmware is upgraded (or reboots?) on
HP servers.

---
--- collectd-4.5.2/src/ipmi.c	2009-02-02 13:14:13.898736713 +0100
+++ collectd-4.5.2/src/ipmi.c	2009-02-02 13:21:58.987738365 +0100
@@ -148,11 +148,33 @@ static void sensor_read_handler (ipmi_se
         }
       }
     }
+    else if (IPMI_IS_IPMI_ERR(err) && IPMI_GET_IPMI_ERR(err) == IPMI_NOT_SUPPORTED_IN_PRESENT_STATE_CC)
+    {
+      INFO ("ipmi plugin: sensor_read_handler: Sensor %s not ready",
+          list_item->sensor_name);
+    }
     else
     {
-      INFO ("ipmi plugin: sensor_read_handler: Removing sensor %s, "
-          "because it failed with status %#x.",
-          list_item->sensor_name, err);
+      if (IPMI_IS_IPMI_ERR(err))
+        INFO ("ipmi plugin: sensor_read_handler: Removing sensor %s, "
+            "because it failed with IPMI error %#x.",
+            list_item->sensor_name, IPMI_GET_IPMI_ERR(err));
+      else if (IPMI_IS_OS_ERR(err))
+        INFO ("ipmi plugin: sensor_read_handler: Removing sensor %s, "
+            "because it failed with OS error %#x.",
+            list_item->sensor_name, IPMI_GET_OS_ERR(err));
+      else if (IPMI_IS_RMCPP_ERR(err))
+        INFO ("ipmi plugin: sensor_read_handler: Removing sensor %s, "
+            "because it failed with RMCPP error %#x.",
+            list_item->sensor_name, IPMI_GET_RMCPP_ERR(err));
+      else if (IPMI_IS_SOL_ERR(err))
+        INFO ("ipmi plugin: sensor_read_handler: Removing sensor %s, "
+            "because it failed with RMCPP error %#x.",
+            list_item->sensor_name, IPMI_GET_SOL_ERR(err));
+      else
+        INFO ("ipmi plugin: sensor_read_handler: Removing sensor %s, "
+            "because it failed with error %#x. of class %#x",
+            list_item->sensor_name, err & 0xff, err & 0xffffff00);
       sensor_list_remove (sensor);
     }
     return;




More information about the collectd mailing list