IPMI Sensor & SEL
Updated: 2024/12/19

IPMI Specification

The Intelligent Platform Management Interface (IPMI) is a set of standardized specifications for server management and monitoring. It provides a unified method for remote management, fault detection, fault recovery, and other tasks related to server hardware management. The IPMI specification was jointly developed by Intel, Dell, HP, and NEC.

Overview of IPMI Specification

Standardized interfaces: IPMI provides a set of standardized interfaces, enabling server hardware from different vendors to be monitored and managed using the same management tools.

Remote management: IPMI supports remote management through the network. Management operations can be performed even if the operating system is not running.

Monitoring and alarm: IPMI can monitor the health status of servers and send alarms in case of faults.

Hardware control: IPMI can control server hardware, such as power-on, power-off, reset, and fan control.

Sensor data: IPMI can collect data from sensors of servers, such as temperature, voltage, and fan speed.

Firmware update: IPMI supports remote firmware update, including the Baseboard Management Controller (BMC) firmware.

Components of IPMI

IPMI message: a message format used to transmit data between the BMC and management software.

IPMI command: a command set used to control and monitor server hardware.

IPMI specification: defines the architecture, interfaces, and command set of IPMI.

For details about the IPMI specification, see the IPMI Specification Official Document.

Sensor

openUBMC has many sensors, all of which comply with the IPMI specification. Currently, openUBMC sensors are classified into the following types:

  • Threshold sensor
  • Discrete sensor

When customizing sensors, developers need to configure sensor properties and sensor data records so that the health status of servers can be monitored over the long term.

Threshold Sensor

A threshold sensor, also called a continuous sensor, is a sensor whose value changes continuously (compare continuous values or curves in mathematics), such as temperature, voltage, power consumption, or rotation speed. When the detected value crosses a preset threshold, the threshold sensor generates an alarm. For example, a temperature sensor is usually a threshold sensor and can be configured with high-temperature warning and high-temperature alarm thresholds.

The threshold sensor resources that openUBMC exposes on D-Bus fall into two categories:

  • IPMI specification resources: describe the current sensor and are configured in the CSR.
  • Descriptive resources: provide readable sensor parameters for northbound interfaces. These resources do not need to be manually configured. The sensor component parses and adds them automatically.

IPMI Specification Resources of the Threshold Sensor

For details, see the IPMI specification, 43.1 SDR Type 01h, Full Sensor Record (P521).

In openUBMC, threshold sensor resources that comply with the IPMI specification are represented by the ThresholdSensor class. This class is managed by the bmc.kepler.sensor service and mounted to the bmc.kepler.Systems.ThresholdSensor interface of the resource collaboration interface. For details about the basic properties of the class, see Sensor Customization and Development.

Descriptive Resources of the Threshold Sensor

Descriptive resources are represented by the ThresholdSensorDisplay class. This class does not need to be manually configured in the CSR. Instead, the sensor component parses and processes it and mounts it to the bmc.kepler.Systems.ThresholdSensorDisplay interface of the resource collaboration interface. The class is read-only. The following table lists its basic properties.

| Name | Type | Description |
| --- | --- | --- |
| Status | string | Current status of the sensor. Possible values: Enabled (the sensor is enabled), Disabled (the sensor is disabled), InTest (the sensor is being tested), Starting (the sensor is being updated). |
| Health | string | Health status of the sensor. Possible values: Critical (emergency), Major (severe), Minor (general), OK (normal). |
| AssertStatus | uint16 | SEL event status of the sensor, expressed as a hexadecimal number, for example, 0x0080. Bit[0:5] corresponds to six threshold event states: [5] lower non-recoverable threshold, going high; [4] lower non-recoverable threshold, going low; [3] lower critical threshold, going high; [2] lower critical threshold, going low; [1] lower non-critical threshold, going high; [0] lower non-critical threshold, going low. Each bit reads 1 for Assert and 0 for Deassert. |
| ReadingDisplay | string | Readable description of the sensor reading. The precision is three significant digits. |
| UnitDisplay | string | Readable description of the sensor unit. |
| UpperNonrecoverableDisplay | string | Readable description of the upper critical threshold of the sensor. The precision is three significant digits. |
| UpperCriticalDisplay | string | Readable description of the upper major threshold of the sensor. The precision is three significant digits. |
| UpperNoncriticalDisplay | string | Readable description of the upper minor threshold of the sensor. The precision is three significant digits. |
| LowerNonrecoverableDisplay | string | Readable description of the lower critical threshold of the sensor. The precision is three significant digits. |
| LowerNoncriticalDisplay | string | Readable description of the lower minor threshold of the sensor. The precision is three significant digits. |
| LowerCriticalDisplay | string | Readable description of the lower major threshold of the sensor. The precision is three significant digits. |
| PositiveHysteresisDisplay | string | Readable description of the positive hysteresis of the sensor. The precision is three significant digits. |
| NegativeHysteresisDisplay | string | Readable description of the negative hysteresis of the sensor. The precision is three significant digits. |
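
As an illustration of the AssertStatus bit layout in the table above, the following minimal Python sketch decodes bits 0 to 5 into readable labels. The function name, the dictionary, and the IPMI-style label wording are assumptions made for this example and are not part of openUBMC.

```python
# Bit positions follow the AssertStatus description in the table above.
# The label text is this sketch's own IPMI-style wording (an assumption).
THRESHOLD_EVENT_BITS = {
    0: "lower non-critical going low",
    1: "lower non-critical going high",
    2: "lower critical going low",
    3: "lower critical going high",
    4: "lower non-recoverable going low",
    5: "lower non-recoverable going high",
}


def decode_threshold_assert_status(assert_status: int) -> list:
    """Return the labels of the states asserted (bit == 1) among bits 0..5."""
    return [
        label
        for bit, label in THRESHOLD_EVENT_BITS.items()
        if assert_status & (1 << bit)
    ]


# Hypothetical value with bits 0 and 2 set.
print(decode_threshold_assert_status(0x0005))
```

Bits outside the documented range are ignored by this sketch.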

Discrete Sensor

A discrete sensor is a sensor whose value is discrete (compare discrete values or curves in mathematics), such as a running status or an isolation value. Examples include the power status sensor (on or off) and the fan status sensor (normal or faulty).

IPMI Specification Resources of the Discrete Sensor

For details, see the IPMI specification, 43.2 SDR Type 02h, Compact Sensor Record (P528).

In openUBMC, discrete sensor resources that comply with the IPMI specification are represented by the DiscreteSensor class. This class is managed by the bmc.kepler.sensor service and mounted to the bmc.kepler.Systems.DiscreteSensor interface of the resource collaboration interface. For details about the basic properties of this class, see Sensor Customization and Development.

Descriptive Resources of the Discrete Sensor

Descriptive resources are represented by the DiscreteSensorDisplay class. This class does not need to be manually configured in the CSR. Instead, the sensor component parses it and mounts it to the bmc.kepler.Systems.DiscreteSensorDisplay interface of the resource collaboration interface. The class is read-only. The following table lists its basic properties.

| Name | Type | Description |
| --- | --- | --- |
| Status | string | Current status of the sensor. Possible values: Enabled (the sensor is enabled), Disabled (the sensor is disabled), InTest (the sensor is being tested), Starting (the sensor is being updated). |
| Health | string | Health status of the sensor. Possible values: Critical (emergency), Major (severe), Minor (general), OK (normal). |
| AssertStatus | uint16 | SEL event status of the sensor, expressed as a hexadecimal number, for example, 0x8000. Bit[0:14] corresponds to 15 discrete event states in sequence. Each bit reads 1 for Assert and 0 for Deassert. |
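
A minimal Python sketch of how bits 0 to 14 of a discrete sensor's AssertStatus could be read. The function name is hypothetical and the code is only an illustration of the bit layout described above.

```python
def asserted_discrete_events(assert_status: int) -> list:
    """Return the indexes (0..14) of discrete event states whose bit is 1."""
    return [bit for bit in range(15) if assert_status & (1 << bit)]


# Hypothetical value with bits 0 and 2 set.
print(asserted_discrete_events(0x0005))
```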

Discrete Event Resource

A discrete event is the event source that a discrete sensor carries and is triggered by. Each discrete sensor can listen to **15** discrete events, for example, XXX. The status of each discrete event is reflected in the AssertStatus property of the discrete sensor. Discrete event resources are managed by the **sensor** component and mounted to the bmc.kepler.Systems.DiscreteEvent interface of the resource collaboration interface. For details about the basic properties of discrete events, see Sensor Customization and Development.

Sensor Entity Resource

A sensor entity resource describes the hardware entity that the current sensor depends on or belongs to. The presence status or power-on/power-off status of the entity affects the value and status of the sensor and whether the corresponding sensor events (IPMI SEL) are generated. For example:

If the CPU is powered off, the CPU core temperature sensor changes as follows:

  • The reading of the CPU core temperature sensor is na.
  • The status of the CPU core temperature sensor is Disabled.
  • The IPMI SEL of the CPU core high temperature alarm is cleared.

Sensor entity resources are managed by the sensor component and mounted to the bmc.kepler.Systems.Entity interface of the resource collaboration interface. For details about the basic properties of sensor entity resources, see Sensor Customization and Development.

Sensor Data Record

This section is based on the IPMI specification, 43. Sensor Data Record Formats (P520), and provides background knowledge related to sensors.

The sensor data record (SDR) stores sensor data in binary format. The data is mainly static. An SDR consists of three parts: record header, record key, and record body. IPMI commands obtain sensor information through the SDR. The following describes the data format of the SDR and how to obtain sensor information using an IPMI command.

Data Format

Record Header

The record header formats of all SDRs are the same, containing the basic information about a data record. The following table describes the fields in the record header.

| Field Name | Data Size | Field Description |
| --- | --- | --- |
| RecordId | 2 bytes | Record ID, which uniquely identifies a data record. |
| SDRVersion | 1 byte | SDR version, 0x51 |
| RecordType | 1 byte | Record type. IPMI has 12 record types, including: full sensor record (0x01), compact sensor record (0x02), device-relative entity association record (0x09), FRU device locator (0x11), management controller device locator (0x12), ... |
| RecordLength | 1 byte | Record length |
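
The header layout above can be illustrated with a short Python sketch that unpacks the five header bytes. The class and function names and the sample bytes are assumptions made for this example. Little-endian byte order is used for RecordId because IPMI transmits multi-byte fields least significant byte first.

```python
import struct
from typing import NamedTuple


class SdrHeader(NamedTuple):
    record_id: int
    sdr_version: int
    record_type: int
    record_length: int


# Only the record types named in the table above are listed here.
RECORD_TYPE_NAMES = {
    0x01: "full sensor record",
    0x02: "compact sensor record",
    0x09: "device-relative entity association record",
    0x11: "FRU device locator",
    0x12: "management controller device locator",
}


def parse_sdr_header(data: bytes) -> SdrHeader:
    """Unpack the 5-byte SDR record header (RecordId is little-endian)."""
    if len(data) < 5:
        raise ValueError("an SDR header needs at least 5 bytes")
    return SdrHeader(*struct.unpack_from("<HBBB", data))


# Hypothetical header: record 1, SDR version 0x51, full sensor record, 43 bytes.
header = parse_sdr_header(bytes([0x01, 0x00, 0x51, 0x01, 0x2B]))
print(header, RECORD_TYPE_NAMES.get(header.record_type, "other"))
```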

Record Key

The record key uniquely identifies a data record among SDRs of the same type. Different SDR types are identified in different ways. Depending on the record type, the record key takes one of the following four forms:

Full Sensor Record/Compact Sensor Record

| Field Name | Data Size | Field Description |
| --- | --- | --- |
| OwnerId | 1 byte | ID of the sensor owner |
| OwnerLun | 1 byte | LUN of the sensor owner |
| Number | 1 byte | Sensor ID |
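
A minimal sketch of reading this record key, assuming the three key bytes immediately follow the 5-byte record header. The function name and the sample record are hypothetical.

```python
import struct

SDR_HEADER_SIZE = 5  # RecordId (2) + SDRVersion (1) + RecordType (1) + RecordLength (1)


def parse_sensor_record_key(record: bytes) -> dict:
    """Read OwnerId, OwnerLun, and Number from a full or compact sensor record."""
    if len(record) < SDR_HEADER_SIZE + 3:
        raise ValueError("record too short to contain a sensor record key")
    owner_id, owner_lun, number = struct.unpack_from("<BBB", record, SDR_HEADER_SIZE)
    return {"OwnerId": owner_id, "OwnerLun": owner_lun, "Number": number}


# Hypothetical record: header bytes followed by the three key bytes.
sample = bytes([0x01, 0x00, 0x51, 0x01, 0x2B, 0x20, 0x00, 0x30])
print(parse_sensor_record_key(sample))
```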

Device-Relative Entity Association Record (DEA)

| Field Name | Data Size | Field Description |
| --- | --- | --- |
| EntityId | 1 byte | Entity ID of the managed device |
| EntityInstance | 1 byte | Entity instance of the managed device |
| DeviceAddress | 1 byte | Secondary address of the managed device |
| DeviceChannel | 1 byte | Channel of the managed device |
| Flags | 1 byte | Device tag associated with the managed device |
| Entity1Address | 1 byte | Address of the first device associated with the managed device |
| Entity1Channel | 1 byte | Channel of the first device associated with the managed device |
| Entity1Id | 1 byte | Entity ID of the first device associated with the managed device |
| Entity1Instance | 1 byte | Entity instance of the first device associated with the managed device |

FRU Device Locator

| Field Name | Data Size | Field Description |
| --- | --- | --- |
| AccessAddress | 1 byte | Access address of the managed device |
| FruId | 1 byte | FRU number |
| LogicalDevice | 1 byte | Specifies whether the FRU is a logical device or a physical device |
| Channel | 1 byte | Channel of the managed device |

Management Controller Device Locator (MCDL)

| Field Name | Data Size | Field Description |
| --- | --- | --- |
| SlaveAddress | 1 byte | Secondary address of the managed device |
| Channel | 1 byte | Channel of the managed device |

Record Body

The record body is the main content of the data record, including the following information:

| Record Type | Content |
| --- | --- |
| Full sensor record | EntityId, EntityInstance, Initialization, Capabilities, SensorType, ReadingType, AssertMask, DeassertMask, ReadingMask, Unit, BaseUnit, ModifierUnit, Linearization, M, MT, B, BA, Accuracy, RBExp, Analog, NominalReading, NormalMaximum, NormalMinimum, MaximumReading, MinimumReading, UpperNonrecoverable, UpperCritical, UpperNoncritical, LowerNonrecoverable, LowerNoncritical, LowerCritical, PositiveHysteresis, NegativeHysteresis, SensorName |
| Compact sensor record | EntityId, EntityInstance, Initialization, Capabilities, SensorType, ReadingType, AssertMask, DeassertMask, DiscreteMask, Unit, BaseUnit, ModifierUnit, RecordSharing, PositiveHysteresis, NegativeHysteresis, SensorName |
| DEA | Address2, Channel2, Entity2Id, Entity2Instance, Address3, Channel3, Entity3Id, Entity3Instance, Address4, Channel4, Entity4Id, Entity4Instance |
| FRU device locator | DeviceType, DeviceTypeModifier, FruEntityId, FruEntityInstance |
| MCDL | PowerStateInitialization, Capabilities, EntityId, EntityInstance, DeviceName |
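
The M, B, and RBExp fields of the full sensor record are the factors that the IPMI specification uses to convert a raw reading into a value in sensor units. The sketch below applies the linear form of that conversion, y = (M * x + B * 10^K1) * 10^K2, where K1 is the B exponent and K2 is the R exponent. It assumes that the signed factors have already been extracted from the record and that the sensor uses linear linearization. The function and parameter names are hypothetical.

```python
def convert_raw_reading(raw: int, m: int, b: int, b_exp: int, r_exp: int) -> float:
    """Apply the linear IPMI reading conversion.

    y = (M * x + B * 10**K1) * 10**K2, with K1 = b_exp and K2 = r_exp.
    Non-linear linearization functions are not handled in this sketch.
    """
    return (m * raw + b * 10.0 ** b_exp) * 10.0 ** r_exp


# Hypothetical factors: M = 2, B = 5, B exponent = 1, R exponent = -1.
print(convert_raw_reading(100, m=2, b=5, b_exp=1, r_exp=-1))
```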

Use IPMI to Obtain Sensor Information

The usual command for querying sensor information over IPMI is sensor list (for example, ipmitool sensor list). This command obtains static basic information from the SDR, obtains and processes dynamic information, and then combines and formats both before outputting the result. For an operation example, see Commissioning Methods.
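
As a rough illustration of consuming this output, the sketch below splits one line of ipmitool-style sensor list output into named fields. It assumes the ten pipe-separated columns that ipmitool prints (name, reading, unit, status, and the six thresholds from lower non-recoverable to upper non-recoverable). The function name and the sample line are invented for this example and are not taken from an openUBMC system.

```python
FIELDS = (
    "name", "reading", "unit", "status",
    "lnr", "lcr", "lnc",  # lower non-recoverable / critical / non-critical
    "unc", "ucr", "unr",  # upper non-critical / critical / non-recoverable
)


def parse_sensor_list_line(line: str) -> dict:
    """Split one 'sensor list' output line into a field dictionary."""
    cells = [cell.strip() for cell in line.split("|")]
    if len(cells) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(cells)}")
    return dict(zip(FIELDS, cells))


# Invented sample line for illustration only.
sample = "CPU1 Core Temp | 45.000 | degrees C | ok | na | na | na | 85.000 | 95.000 | na"
row = parse_sensor_list_line(sample)
print(row["name"], row["reading"], row["unc"])
```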

IPMI SEL

SEL is short for system event log, an important function of IPMI. It records hardware events in the system, such as overtemperature and power failure, and supports querying and clearing these events. When a sensor is configured with an associated IPMI SEL event, the sensor sends the related information to the IPMI SEL for recording whenever it detects an exception or a status change. The administrator can then view the IPMI SEL to learn about the running status and faults of the system.

SEL Event Triggering

Continuous Sensor

Triggering Scenarios

A triggering scenario is a situation in which the reading of a continuous sensor is compared with its thresholds. The scenarios are as follows:

  • The continuous sensor object is registered and initialized.
  • The disable_scanning_local status changes from disabled to enabled.
  • A monitored property changes: Reading or any of the six threshold properties.
  • Simulation of the sensor reading is enabled or disabled.

Event Generation Conditions

The event generation conditions of a continuous sensor determine whether an event is generated or cleared after the reading is compared with the thresholds. When any of the preceding triggering scenarios occurs, the system compares the reading with the six thresholds in sequence to check whether the generation and clearance conditions are met. A maximum of 12 comparisons are performed. The process of checking whether the event generation conditions are met is as follows:
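
The following Python sketch shows one way such a comparison could look. It is a simplified illustration, not the openUBMC implementation. It assumes that an upper threshold event is generated when the reading reaches the threshold and cleared when the reading falls below the threshold minus the positive hysteresis, and that lower thresholds mirror this with the negative hysteresis. Each of the six thresholds gets at most one generation check and one clearance check, which gives up to 12 comparisons. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Thresholds:
    upper_nonrecoverable: Optional[float] = None
    upper_critical: Optional[float] = None
    upper_noncritical: Optional[float] = None
    lower_noncritical: Optional[float] = None
    lower_critical: Optional[float] = None
    lower_nonrecoverable: Optional[float] = None
    positive_hysteresis: float = 0.0
    negative_hysteresis: float = 0.0


UPPER = ("upper_noncritical", "upper_critical", "upper_nonrecoverable")
LOWER = ("lower_noncritical", "lower_critical", "lower_nonrecoverable")


def evaluate(reading: float, t: Thresholds, asserted: Set[str]) -> Set[str]:
    """Return the new set of asserted threshold events for a reading."""
    result = set(asserted)
    for name in UPPER:
        limit = getattr(t, name)
        if limit is None:  # threshold not configured
            continue
        if reading >= limit:  # generation check
            result.add(name)
        elif reading < limit - t.positive_hysteresis:  # clearance check
            result.discard(name)
    for name in LOWER:
        limit = getattr(t, name)
        if limit is None:
            continue
        if reading <= limit:  # generation check
            result.add(name)
        elif reading > limit + t.negative_hysteresis:  # clearance check
            result.discard(name)
    return result


t = Thresholds(upper_noncritical=80.0, upper_critical=90.0, positive_hysteresis=2.0)
state = evaluate(85.0, t, set())   # upper_noncritical is generated
state = evaluate(79.0, t, state)   # still within the hysteresis band, stays asserted
state = evaluate(77.0, t, state)   # below 80 - 2, cleared
print(state)
```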

Discrete Sensor

Triggering Scenarios

The triggering scenarios of discrete sensors are as follows:

  • The discrete event object is registered and initialized.
  • A monitored property changes: Property or EventDir.
  • Simulation of the sensor reading is enabled or disabled.

Event Generation Conditions

The event generation condition of a discrete sensor is whether the direction of the associated discrete event changes. When any of the preceding triggering scenarios occurs, the system checks whether the generation and recovery conditions are met.

| Listening Mode | Data Composition |
| --- | --- |
| Combined listening | Property: event_dir, event_data3, event_data2, event_data1 |
| Independent listening | Properties of this object: EventDir, EventData1, EventData2, and EventData3 |
| Event toggle | The lower four bits of Conversion are toggle bits. Conversion & 0x0F == 1: assert => deassert / deassert => assert |
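
The event toggle rule can be sketched as follows. This is one reading of the rule under stated assumptions: the direction is modelled as a boolean (True for assert), and the direction is inverted only when the lower four bits of Conversion equal 1. The function name is hypothetical.

```python
def apply_event_toggle(asserted: bool, conversion: int) -> bool:
    """Invert the event direction when the toggle bits of Conversion equal 1."""
    toggle = (conversion & 0x0F) == 1
    return (not asserted) if toggle else asserted


print(apply_event_toggle(True, 0x01))  # toggle active: assert becomes deassert
print(apply_event_toggle(True, 0x10))  # lower four bits are 0: direction unchanged
```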

The process of checking whether the event generation conditions are met is as follows:

SEL Data Storage

The SEL data is stored in the non-volatile memory of the BMC. For details about the data format, see the IPMI specification, Section 32 SEL Record Formats (P431).
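
For orientation, the sketch below unpacks a 16-byte SEL system event record (record type 0x02) using the field order from the IPMI specification: record ID, record type, timestamp, generator ID, event message revision, sensor type, sensor number, event direction and type, and three event data bytes. The class name, function name, and sample values are assumptions for this example. OEM record types use a different layout and are not handled.

```python
import struct
from typing import NamedTuple


class SelRecord(NamedTuple):
    record_id: int
    record_type: int
    timestamp: int       # seconds, as stored in the record
    generator_id: int
    evm_rev: int
    sensor_type: int
    sensor_number: int
    deassert: bool       # bit 7 of the direction/type byte: 1 = deassertion
    event_type: int      # bits 6:0 of the direction/type byte
    event_data: bytes    # event data 1..3


def parse_sel_record(raw: bytes) -> SelRecord:
    """Unpack a 16-byte system event record (multi-byte fields little-endian)."""
    if len(raw) != 16:
        raise ValueError("a SEL record is 16 bytes long")
    rid, rtype, ts, gen, evm, stype, snum, dir_type = struct.unpack_from("<HBIHBBBB", raw)
    return SelRecord(rid, rtype, ts, gen, evm, stype, snum,
                     bool(dir_type & 0x80), dir_type & 0x7F, raw[13:16])


# Hypothetical record built only to exercise the parser.
sample = struct.pack("<HBIHBBBB", 1, 0x02, 1734566400, 0x0020, 0x04, 0x01, 0x30, 0x81)
sample += bytes([0x59, 0x5A, 0x55])
print(parse_sel_record(sample))
```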

Use IPMI to Obtain SEL Events

The usual command for obtaining SEL events over IPMI is sel list (for example, ipmitool sel list). It is used in the same way as the command for obtaining sensor information.