IPMI Specification
The Intelligent Platform Management Interface (IPMI) is a set of standardized specifications for server management and monitoring. It provides a unified method for remote management, fault detection, fault recovery, and other tasks related to server hardware management. The IPMI specification was jointly developed by Intel, Dell, HP, and NEC.
Overview of IPMI Specification
Standardized interfaces: IPMI provides a set of standardized interfaces, enabling server hardware from different vendors to be monitored and managed using the same management tools.
Remote management: IPMI supports remote management through the network. Management operations can be performed even if the operating system is not running.
Monitoring and alarm: IPMI can monitor the health status of servers and send alarms in case of faults.
Hardware control: IPMI can control server hardware, such as power-on, power-off, reset, and fan control.
Sensor data: IPMI can collect data from sensors of servers, such as temperature, voltage, and fan speed.
Firmware update: IPMI supports remote firmware update, including the Baseboard Management Controller (BMC) firmware.
Components of IPMI
IPMI message: a message format used to transmit data between the BMC and management software.
IPMI command: a command set used to control and monitor server hardware.
IPMI specification: defines the architecture, interfaces, and command set of IPMI.
For details about the IPMI specification, see the IPMI Specification Official Document.
Sensor
openUBMC has many sensors, all of which comply with the IPMI specification. Currently, openUBMC sensors are classified into the following types:
- Threshold sensor
- Discrete sensor
When customizing sensors, developers need to configure sensor properties and sensor data records so that server health can be monitored over the long term.
Threshold Sensor
A threshold sensor, also called a continuous sensor, reports a value that changes continuously (compare continuous values or curves in mathematics), such as temperature, voltage, power consumption, or rotation speed. When the detected value crosses a preset threshold, the threshold sensor generates an alarm. For example, a temperature sensor is usually a threshold sensor and can be configured with a high-temperature warning threshold and a high-temperature alarm threshold.
The threshold sensor resources in openUBMC on D-Bus are classified into two categories:
- IPMI specification resources: describe the current sensor and are configured in the CSR.
- Descriptive resources: provide readable sensor parameters for northbound interfaces. These resources do not need to be manually configured. They are automatically parsed and added to the sensor component.
IPMI Specification Resources of the Threshold Sensor
For details, see the IPMI specification, 43.1 SDR Type 01h, Full Sensor Record (P521).
In openUBMC, threshold sensor resources that comply with the IPMI specification are represented by the ThresholdSensor class. This class is managed by the bmc.kepler.sensor service and mounted to the bmc.kepler.Systems.ThresholdSensor interface of the resource collaboration interface. For details about the basic properties of the class, see Sensor Customization and Development.
Descriptive Resources of the Threshold Sensor
Descriptive resources are represented by the ThresholdSensorDisplay class. This class does not need to be manually configured in the CSR. Instead, it is parsed and processed by the sensor component and mounted to the bmc.kepler.Systems.ThresholdSensorDisplay interface of the resource collaboration interface. This class is read-only. The following table lists the basic properties.
| Name | Type | Description |
|---|---|---|
| Status | string | Current status of the sensor. Possible values: Enabled (the sensor is enabled), Disabled (the sensor is disabled), InTest (the sensor is being tested), Starting (the sensor is being updated). |
| Health | string | Health status of the sensor. Possible values: Critical (emergency), Major (severe), Minor (general), OK (normal). |
| AssertStatus | uint16 | SEL event status of the sensor, expressed as a hexadecimal number, for example, 0x0080. Bit[0:5] corresponds to six threshold event states: [5] lower non-recoverable going high; [4] lower non-recoverable going low; [3] lower critical going high; [2] lower critical going low; [1] lower non-critical going high; [0] lower non-critical going low. For each bit, 1 means Assert and 0 means Deassert. |
| ReadingDisplay | string | Readable description of the sensor reading value. The precision is three significant digits. |
| UnitDisplay | string | Readable description of the sensor unit. |
| UpperNonrecoverableDisplay | string | Readable description of the upper critical threshold of the sensor. The precision is three significant digits. |
| UpperCriticalDisplay | string | Readable description of the upper major threshold of the sensor. The precision is three significant digits. |
| UpperNoncriticalDisplay | string | Readable description of the upper minor threshold of the sensor. The precision is three significant digits. |
| LowerNonrecoverableDisplay | string | Readable description of the lower critical threshold of the sensor. The precision is three significant digits. |
| LowerNoncriticalDisplay | string | Readable description of the lower minor threshold of the sensor. The precision is three significant digits. |
| LowerCriticalDisplay | string | Readable description of the lower major threshold of the sensor. The precision is three significant digits. |
| PositiveHysteresisDisplay | string | Readable description of the positive hysteresis of the sensor. The precision is three significant digits. |
| NegativeHysteresisDisplay | string | Readable description of the negative hysteresis of the sensor. The precision is three significant digits. |
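As an illustration of how the AssertStatus bit field can be read, the following minimal sketch decodes bits 0 to 5 into event names. The names use standard IPMI threshold terminology, which is an assumption about how the six states in the table map to IPMI terms, and the function name `decode_assert_status` is hypothetical rather than part of openUBMC.

```python
# Hypothetical mapping of AssertStatus bits 0-5 to IPMI threshold event names.
# Only the six bits described in the table above are covered.
LOWER_THRESHOLD_EVENTS = {
    0: "lower non-critical going low",
    1: "lower non-critical going high",
    2: "lower critical going low",
    3: "lower critical going high",
    4: "lower non-recoverable going low",
    5: "lower non-recoverable going high",
}


def decode_assert_status(value: int) -> list[str]:
    """Return the names of the events whose bit is 1 (Assert) among bits 0-5."""
    return [
        name
        for bit, name in LOWER_THRESHOLD_EVENTS.items()
        if (value >> bit) & 1
    ]


# 0x0005 has bits 0 and 2 set.
print(decode_assert_status(0x0005))
```

Bits that the table does not describe are ignored by this sketch.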
Discrete Sensor
A discrete sensor reports a value that is discrete (compare discrete values or curves in mathematics), such as a running status or an isolated value. Examples include the power status sensor (on or off) and the fan status sensor (normal or faulty).
IPMI Specification Resources of the Discrete Sensor
For details, see the IPMI specification, 43.2 SDR Type 02h, Compact Sensor Record (P528).
In openUBMC, discrete sensor resources that comply with the IPMI specification are represented by the DiscreteSensor class. This class is managed by the bmc.kepler.sensor service and mounted to the bmc.kepler.Systems.DiscreteSensor interface of the resource collaboration interface. For details about the basic properties of this class, see Sensor Customization and Development.
Descriptive Resources of the Discrete Sensor
Descriptive resources are represented by the DiscreteSensorDisplay class. This class does not need to be manually configured in the CSR. Instead, the sensor component parses it and mounts it to the bmc.kepler.Systems.DiscreteSensorDisplay interface of the resource collaboration interface. This class is read-only. The following table lists the basic properties.
| Name | Type | Description |
|---|---|---|
| Status | string | Current status of the sensor. Possible values: Enabled (the sensor is enabled), Disabled (the sensor is disabled), InTest (the sensor is being tested), Starting (the sensor is being updated). |
| Health | string | Health status of the sensor. Possible values: Critical (emergency), Major (severe), Minor (general), OK (normal). |
| AssertStatus | uint16 | SEL event status of the sensor, expressed as a hexadecimal number, for example, 0x8000. Bit[0:14] corresponds to 15 discrete event states in sequence. For each bit, 1 means Assert and 0 means Deassert. |
Discrete Event Resource
A discrete event is the event source that a discrete sensor carries and is triggered by. Each discrete sensor can listen to **15** discrete events, for example, XXX. The status of each discrete event is reflected in the AssertStatus property of the discrete sensor. Discrete event resources are managed by the **sensor** component and mounted to the bmc.kepler.Systems.DiscreteEvent interface of the resource collaboration interface. For details about the basic properties of discrete events, see Sensor Customization and Development.
Sensor Entity Resource
A sensor entity resource describes the hardware entity that the current sensor depends on or belongs to. The presence status or power-on/power-off status of the entity affects the value and status of the current sensor and whether the corresponding sensor event (IPMI SEL) is generated. For example:
If the CPU is powered off, the CPU core temperature sensor changes as follows:
- The reading of the CPU core temperature sensor is `na`.
- The status of the CPU core temperature sensor is `Disabled`.
- The IPMI SEL of the CPU core high temperature alarm is cleared.
Sensor entity resources are managed by the sensor component and mounted to the bmc.kepler.Systems.Entity interface of the resource collaboration interface. For details about the basic properties of sensor entity resources, see Sensor Customization and Development.
Sensor Data Record
This section refers to the IPMI specification, 43. Sensor Data Record Formats (P520) and provides extended knowledge related to the sensor.
The sensor data record (SDR) stores sensor data, which is mainly static, in binary format. It consists of three parts: record header, record key, and record body. IPMI commands obtain sensor information through the SDR. The following describes the data format of the SDR and how to obtain sensor information using an IPMI command.
Data Format
Record Header
The record header formats of all SDRs are the same, containing the basic information about a data record. The following table describes the fields in the record header.
| Field Name | Data Size | Field Description |
|---|---|---|
| RecordId | 2 bytes | Record ID, which uniquely identifies a data record. |
| SDRVersion | 1 byte | SDR version. The value is 0x51. |
| RecordType | 1 byte | Record type. IPMI has 12 record types, including: full sensor record (0x01), compact sensor record (0x02), device-relative entity association record (0x09), FRU device locator (0x11), management controller device locator (0x12), ... |
| RecordLength | 1 byte | Record length |
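The header layout in the table can be unpacked with a few lines of code. The sketch below assumes the five header bytes are already available as a `bytes` object and that the two-byte RecordId is stored least significant byte first, as is usual for multi-byte IPMI fields. The names `SdrHeader` and `parse_sdr_header` and the sample bytes are illustrative only.

```python
import struct
from typing import NamedTuple


class SdrHeader(NamedTuple):
    record_id: int      # 2 bytes
    sdr_version: int    # 1 byte, 0x51 per the table above
    record_type: int    # 1 byte, for example 0x01 for a full sensor record
    record_length: int  # 1 byte


def parse_sdr_header(data: bytes) -> SdrHeader:
    """Unpack the 5-byte SDR record header from the start of `data`."""
    if len(data) < 5:
        raise ValueError("an SDR header needs at least 5 bytes")
    # "<HBBB": little-endian uint16 followed by three uint8 fields.
    return SdrHeader(*struct.unpack_from("<HBBB", data))


# Made-up sample bytes: record 0x002A, version 0x51, full sensor record, length 0x30.
header = parse_sdr_header(bytes([0x2A, 0x00, 0x51, 0x01, 0x30]))
print(header)
```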
Record Key
The record key uniquely identifies a data record among SDRs of the same type. Different SDR types are identified in different ways. Depending on the record type, the record key takes one of the following four forms:
Full Sensor Record/Compact Sensor Record
| Field Name | Data Size | Field Description |
|---|---|---|
| OwnerId | 1 byte | ID of the sensor owner |
| OwnerLun | 1 byte | LUN of the sensor owner |
| Number | 1 byte | Sensor ID |
Device-Relative Entity Association Record (DEA)
| Field Name | Data Size | Field Description |
|---|---|---|
| EntityId | 1 byte | Entity ID of the managed device |
| EntityInstance | 1 byte | Entity instance of the managed device |
| DeviceAddress | 1 byte | Secondary address of the managed device |
| DeviceChannel | 1 byte | Channel of the managed device |
| Flags | 1 byte | Device tag associated with the managed device |
| Entity1Address | 1 byte | Address of the first device associated with the managed device |
| Entity1Channel | 1 byte | Channel of the first device associated with the managed device |
| Entity1Id | 1 byte | Entity ID of the first device associated with the managed device |
| Entity1Instance | 1 byte | Entity instance of the first device associated with the managed device |
FRU Device Locator
| Field Name | Data Size | Field Description |
|---|---|---|
| AccessAddress | 1 byte | Access address of the managed device |
| FruId | 1 byte | FRU number |
| LogicalDevice | 1 byte | Specifies if the FRU is a logical device or a physical device |
| Channel | 1 byte | Channel of the managed device |
Management Controller Device Locator (MCDL)
| Field Name | Data Size | Field Description |
|---|---|---|
| SlaveAddress | 1 byte | Secondary address of the managed device |
| Channel | 1 byte | Channel of the managed device |
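To make the four key layouts concrete, the following sketch maps each record type to the field names listed in the tables above and splits a key into named bytes. It treats every field as one opaque byte, exactly as the tables do, and does not decode any bit-level sub-fields. The names `RECORD_KEY_FIELDS` and `parse_record_key` are hypothetical.

```python
# Field names per record type, taken from the record key tables above.
SENSOR_KEY = ("OwnerId", "OwnerLun", "Number")
RECORD_KEY_FIELDS = {
    0x01: SENSOR_KEY,  # full sensor record
    0x02: SENSOR_KEY,  # compact sensor record
    0x09: (            # device-relative entity association record
        "EntityId", "EntityInstance", "DeviceAddress", "DeviceChannel", "Flags",
        "Entity1Address", "Entity1Channel", "Entity1Id", "Entity1Instance",
    ),
    0x11: ("AccessAddress", "FruId", "LogicalDevice", "Channel"),  # FRU device locator
    0x12: ("SlaveAddress", "Channel"),  # management controller device locator
}


def parse_record_key(record_type: int, key_bytes: bytes) -> dict[str, int]:
    """Split the record key bytes into named one-byte fields."""
    fields = RECORD_KEY_FIELDS[record_type]  # KeyError for types not listed here
    if len(key_bytes) < len(fields):
        raise ValueError(f"record type {record_type:#04x} needs {len(fields)} key bytes")
    return dict(zip(fields, key_bytes))


print(parse_record_key(0x01, bytes([0x20, 0x00, 0x05])))
```

The record type passed in would normally come from the RecordType field of the record header.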
Record Body
The record body is the main content of the data record, including the following information:
| Record Type | Content |
|---|---|
| Full sensor record | EntityId, EntityInstance, Initialization, Capabilities, SensorType, ReadingType, AssertMask, DeassertMask, ReadingMask, Unit, BaseUnit, ModifierUnit, Linearization, M, MT, B, BA, Accuracy, RBExp, Analog, NominalReading, NormalMaximum, NormalMinimum, MaximumReading, MinimumReading, UpperNonrecoverable, UpperCritical, UpperNoncritical, LowerNonrecoverable, LowerNoncritical, LowerCritical, PositiveHysteresis, NegativeHysteresis, SensorName |
| Compact sensor record | EntityId, EntityInstance, Initialization, Capabilities, SensorType, ReadingType, AssertMask, DeassertMask, DiscreteMask, Unit, BaseUnit, ModifierUnit, RecordSharing, PositiveHysteresis, NegativeHysteresis, SensorName |
| DEA | Address2, Channel2, Entity2Id, Entity2Instance, Address3, Channel3, Entity3Id, Entity3Instance, Address4, Channel4, Entity4Id, Entity4Instance |
| FRU device locator | DeviceType, DeviceTypeModifier, FruEntityId, FruEntityInstance |
| MCDL | PowerStateInitialization, Capabilities, EntityId, EntityInstance, DeviceName |
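Several full sensor record fields (M, B, and RBExp) exist to convert a raw reading into a real-world value. For a linear sensor, the IPMI specification gives the conversion as y = (M * x + B * 10^K1) * 10^K2, where K1 is the B exponent and K2 is the result exponent. The sketch below applies that formula. It assumes that RBExp packs the result exponent in the upper four bits and the B exponent in the lower four bits, each as a 4-bit two's complement value, as in the IPMI full sensor record; whether openUBMC stores the field in exactly this way is an assumption. The function names and sample values are hypothetical.

```python
def signed4(nibble: int) -> int:
    """Interpret a 4-bit value as two's complement (-8..7)."""
    return nibble - 16 if nibble & 0x8 else nibble


def split_rbexp(rbexp: int) -> tuple[int, int]:
    """Return (result exponent, B exponent) from the packed RBExp byte."""
    return signed4((rbexp >> 4) & 0xF), signed4(rbexp & 0xF)


def convert_reading(raw: int, m: int, b: int, b_exp: int, r_exp: int) -> float:
    """Apply y = (M * x + B * 10^K1) * 10^K2 for a linear sensor."""
    return (m * raw + b * 10.0 ** b_exp) * 10.0 ** r_exp


# Made-up values: M = 2, B = 0, RBExp = 0xE0 (result exponent -2, B exponent 0).
r_exp, b_exp = split_rbexp(0xE0)
value = convert_reading(raw=100, m=2, b=0, b_exp=b_exp, r_exp=r_exp)
print(f"{value:.3g}")
```

Non-linear sensors apply an additional linearization function, selected by the Linearization field, to the result; that step is omitted here.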
Use IPMI to Obtain Sensor Information
The standard command for querying sensor information using IPMI is `sensor list`. This command reads static basic information from the SDR, obtains and processes dynamic information, and then combines and formats both before output. For an operation example, see Commissioning Methods.
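When the output of `sensor list` needs to be processed by a script, it can be split into fields. The sketch below assumes the pipe-separated, ten-column layout printed by the ipmitool utility (name, reading, unit, status, and six thresholds). The use of ipmitool, the column names, and the sample sensors are all assumptions for illustration, not output captured from openUBMC.

```python
# Assumed column order of `ipmitool sensor list` output.
COLUMNS = ("name", "reading", "unit", "status",
           "lnr", "lcr", "lnc", "unc", "ucr", "unr")

# Hypothetical sample output.
SAMPLE_OUTPUT = """\
CPU1 Core Temp   | 42.000     | degrees C  | ok    | na        | na        | na        | 95.000    | 100.000   | na
Fan1 Speed       | 5400.000   | RPM        | ok    | na        | 1000.000  | na        | na        | na        | na
"""


def parse_sensor_list(text: str) -> list[dict[str, str]]:
    """Turn each well-formed output line into a dict keyed by column name."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) == len(COLUMNS):  # skip blank or malformed lines
            rows.append(dict(zip(COLUMNS, cells)))
    return rows


for row in parse_sensor_list(SAMPLE_OUTPUT):
    print(row["name"], row["reading"], row["unit"], row["status"])
```

All values are kept as strings because unavailable readings and thresholds appear as `na`.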
IPMI SEL
SEL is short for system event log, an important IPMI function. It records hardware events in the system, such as overtemperature and power failure, and supports querying and clearing these events. When a sensor is configured with an associated IPMI SEL event, the sensor sends the related information to the IPMI SEL for recording whenever it detects an exception or a status change. The administrator can then view the IPMI SEL to learn about the running status and faults of the system.
SEL Event Triggering
Continuous Sensor
Triggering Scenarios
A triggering scenario refers to a scenario where the reading value of a continuous sensor is compared with the threshold value. The scenarios are as follows:
- The continuous sensor object is registered and initialized.
- The `disable_scanning_local` status is updated from `disabled` to `enabled`.
- A listened property changes: Reading or any of the six threshold properties.
- The simulated sensor reading value is enabled or disabled.
Event Generation Conditions
In short, the event generation conditions of a continuous sensor are the conditions for generating or clearing an event after the reading value is compared with the threshold value. When the preceding triggering scenarios are met, the system compares the reading value with the six threshold values in sequence to check whether the generation and clearance conditions are met. A maximum of 12 comparisons are supported. The process of checking whether the event generation conditions are met is as follows:
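As a rough illustration of this comparison, the sketch below checks one reading against six thresholds, with one generation check and one clearance check per threshold (12 comparisons in total). The way hysteresis is applied, the use of `>=` and `<=`, and all class and function names are assumptions made for the example. It is not the openUBMC implementation.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    upper_noncritical: float
    upper_critical: float
    upper_nonrecoverable: float
    lower_noncritical: float
    lower_critical: float
    lower_nonrecoverable: float
    positive_hysteresis: float = 0.0
    negative_hysteresis: float = 0.0


def evaluate(reading: float, t: Thresholds, asserted: set[str]) -> list[tuple[str, str]]:
    """Compare one reading with the six thresholds.

    `asserted` holds the names of thresholds whose event is currently active
    and is updated in place. Returns the (threshold, action) events produced.
    """
    events = []
    upper = {
        "UpperNoncritical": t.upper_noncritical,
        "UpperCritical": t.upper_critical,
        "UpperNonrecoverable": t.upper_nonrecoverable,
    }
    lower = {
        "LowerNoncritical": t.lower_noncritical,
        "LowerCritical": t.lower_critical,
        "LowerNonrecoverable": t.lower_nonrecoverable,
    }
    for name, limit in upper.items():
        if name not in asserted:
            if reading >= limit:  # generation check
                asserted.add(name)
                events.append((name, "assert"))
        elif reading < limit - t.positive_hysteresis:  # clearance check
            asserted.discard(name)
            events.append((name, "deassert"))
    for name, limit in lower.items():
        if name not in asserted:
            if reading <= limit:  # generation check
                asserted.add(name)
                events.append((name, "assert"))
        elif reading > limit + t.negative_hysteresis:  # clearance check
            asserted.discard(name)
            events.append((name, "deassert"))
    return events


# Made-up limits for a temperature sensor.
limits = Thresholds(80, 90, 100, 10, 5, 0, positive_hysteresis=2, negative_hysteresis=2)
active: set[str] = set()
for value in (92, 89, 87):
    print(value, evaluate(value, limits, active))
```

In this example, the reading 89 produces no event because it has not yet fallen below the critical threshold minus the hysteresis (88), while 87 clears the upper critical event.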
Discrete Sensor
Triggering Scenarios
The triggering scenarios of discrete sensors are as follows:
- The discrete event object is registered and initialized.
- A listened property changes: Property or EventDir.
- The simulated sensor reading value is enabled or disabled.
Event Generation Conditions
In short, the event generation condition of a discrete sensor is whether the direction of the associated discrete event changes. When the preceding trigger scenario is met, the system checks whether the generation and recovery conditions are met.
| Listening Mode | Data Composition |
|---|---|
| Combined listening | Property: event_dir, event_data3, event_data2, event_data1 |
| Independent listening | Properties of this object: EventDir, EventData1, EventData2, and EventData3 |
| Event toggle | The lower four bits of Conversion are toggle bits. When Conversion & 0x0F == 1, the event direction is inverted: assert => deassert, or deassert => assert. |
The process of checking whether the event generation conditions are met is as follows:
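A minimal sketch of this check, under stated assumptions: the event direction is modeled as a boolean (True for assert), an event is produced only when the effective direction differs from the previous one, and a Conversion value whose lower four bits equal 1 inverts the direction. The class and function names are hypothetical, and the real openUBMC logic may differ.

```python
from typing import Optional


def effective_direction(asserted: bool, conversion: int = 0) -> bool:
    """Invert the direction when the lower four bits of Conversion equal 1."""
    if conversion & 0x0F == 1:
        return not asserted
    return asserted


class DiscreteEventTracker:
    """Remembers the last direction and reports changes as events."""

    def __init__(self, asserted: bool = False) -> None:
        self._last = asserted

    def update(self, asserted: bool, conversion: int = 0) -> Optional[str]:
        """Return "assert" or "deassert" on a direction change, else None."""
        current = effective_direction(asserted, conversion)
        if current == self._last:
            return None
        self._last = current
        return "assert" if current else "deassert"


tracker = DiscreteEventTracker()
print(tracker.update(True))                    # direction changed
print(tracker.update(True))                    # no change, no event
print(tracker.update(True, conversion=0x01))   # toggled by Conversion
```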
SEL Data Storage
The SEL data is stored in the non-volatile memory of the BMC. For details about the data format, see the IPMI specification, Section 32 SEL Record Formats (P431).
Use IPMI to Obtain SEL Events
The standard command for obtaining SEL events using IPMI is `sel list`. It is used in the same way as the command for obtaining sensor information.