Event Customization
Updated: 2025/04/22

This section describes how to configure refined event alarms on openUBMC.

openUBMC provides an event management module that delivers refined alarm capabilities in addition to standard IPMI events. The module offers flexible, comprehensive event management and supports Redfish reporting, which simplifies maintenance. It is recommended for all event management tasks.

The following sections describe how to configure openUBMC events for sensors. Depending on whether hardware is required, events are configured either through CSR (CSR alarm configuration) or through RPC (RPC event alarm).

Static Configuration of Events

Configuration

Static information must be configured regardless of whether CSR or RPC is used. It represents the fixed information of alarms that cannot be modified, including the event definition and description.

  • Event definition, such as the event code, severity, and reporting channel.
  • Event description, which supports Chinese and English by default, including the event description template, suggestion template, impact, and cause.

The following table lists the fields involved in the two types of events.

| Field | Description |
| --- | --- |
| EventKeyId | Event definition key. |
| EventName | Event name. The name must be unique in the VPD repository and must not conflict with other alarms. Otherwise, event subscription is affected. |
| EventType | Event type. 0: system event; 1: maintenance event; 2: running event. |
| SeverityId | Severity. 0: normal; 1: minor; 2: major; 3: critical. |
| EventCode | Event code. |
| OldEventCode | Old event code. |
| ActionId | Event action. 0: no action; 1: power off the host; 2: restart the host; 3: power cycle the host. |
| LifeCycleId | Event lifecycle. |
| ReportChannel | Event reporting channel mask. Bit 6 indicates whether the event is recorded as an alarm (1: yes; 0: no). If bit 6 is 0, the event is displayed only in the historical record and does not appear on the current alarm page. Clearing bit 6 is intended for special scenarios. Generally, bit 6 is 1. |
| Description | Event description. For external display, consecutive spaces and unnecessary spaces before punctuation (commas, semicolons, and periods) are removed from the event description. |
| Suggestion | Suggestion. |
| Influence | Impact. |
| Cause | Cause. |
| DeassertFlag | Whether a Deassert event is required (0: no; 1: yes). This flag controls whether a historical record is generated when the alarm is cleared. It does not mean that the alarm itself cannot be cleared, which is a common misunderstanding. |
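
The bit test described for ReportChannel can be sketched in Lua as follows. This is only an illustration, not openUBMC code: the helper name is_recorded_as_alarm is hypothetical, and the sketch assumes that bits are numbered from 0 starting at the least significant bit.

```lua
-- Hypothetical helper: returns true when bit 6 of the ReportChannel mask is set.
-- Assumption: bits are numbered from 0, starting at the least significant bit.
local function is_recorded_as_alarm(report_channel)
    -- Dividing by 2^6 and flooring moves bit 6 into the lowest position.
    return math.floor(report_channel / 64) % 2 == 1
end

print(is_recorded_as_alarm(65535)) -- all 16 bits set, including bit 6
print(is_recorded_as_alarm(0))     -- no bits set
```

Plain arithmetic is used instead of bitwise operators so that the sketch does not depend on a particular Lua version.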

Configuration Example

The static configuration of an event must be configured in the VPD repository. The following is a configuration example.

Precautions

  • After an alarm is added, the version number needs to be updated, for example, 1.0.2.

  • EventDefinition and EventDescription must match each other by EventKeyId.

  • If the openUBMC name is required in the description, write it as {BMC}, which is automatically replaced with openUBMC.

    json
    "Version": "1.0.1",
    "EventDefinition": [
        {
            "EventKeyId": "Disk.DiskCEHardFailure",
            "EventCode": "0x0200001F",
            "ReportChannel": 65535,
            "OldEventCode": "",
            "EventType": 0,  
            "LifeCycleId": 1,
            "DeassertFlag": 1,
            "SeverityId": 0,
            "ActionId": 0,
            "EventName": "DiskCEHardFailure"
        }
    ],
    "EventDescription": [
        {
            "EventKeyId": "Disk.DiskCEHardFailure",
            "Suggestion": {
                "En": "1. Perform maintenance according to the maintenance plan as soon as possible. Power off the server...",
                "Zh": "1. Arrange planned maintenance as soon as possible. Power off the server and check whether there is damage or poor contact between the component and its slot. @#AB;2. Replace the component and observe the status."
            },
            "Description": {
                "En": "The %1 disk %2 health status degradation detected by PFAE.",
                "Zh": "%1硬盘 %2 健康状态降级。"
            },
            "Influence": {
                "En": "",
                "Zh": ""
            },
            "Cause": {
                "En": "",
                "Zh": ""
            }
        }
    ]
  • vendor/Huawei/Server/Kunpeng/openUBMC/event/eventDefList.txt (alarm list)

This list specifies the alarms to be loaded from event_def.json. You only need to add EventKeyId.

Disk.DiskCEHardFailure
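
The matching rules above (EventDefinition and EventDescription share an EventKeyId, EventName is unique, and every listed key has a definition) can be checked before packaging. The following is a minimal sketch that operates on already-decoded Lua tables. The function name check_event_config and the problem messages are hypothetical and not part of openUBMC.

```lua
-- Hypothetical consistency check over already-decoded configuration tables.
-- `definitions` and `descriptions` mirror the EventDefinition and
-- EventDescription arrays; `load_list` mirrors the lines of eventDefList.txt.
local function check_event_config(definitions, descriptions, load_list)
    local described = {}
    for _, desc in ipairs(descriptions) do
        described[desc.EventKeyId] = true
    end

    local defined, seen_names, problems = {}, {}, {}
    for _, def in ipairs(definitions) do
        defined[def.EventKeyId] = true
        if not described[def.EventKeyId] then
            problems[#problems + 1] = 'no description for ' .. def.EventKeyId
        end
        if seen_names[def.EventName] then
            problems[#problems + 1] = 'duplicate EventName ' .. def.EventName
        end
        seen_names[def.EventName] = true
    end

    for _, key in ipairs(load_list) do
        if not defined[key] then
            problems[#problems + 1] = 'listed but not defined: ' .. key
        end
    end
    return problems
end

local problems = check_event_config(
    {{EventKeyId = 'Disk.DiskCEHardFailure', EventName = 'DiskCEHardFailure'}},
    {{EventKeyId = 'Disk.DiskCEHardFailure'}},
    {'Disk.DiskCEHardFailure'}
)
print(#problems) -- number of inconsistencies found
```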

CSR Configuration of Events

CSR configurations hold the dynamic information of alarms, which can be configured and changed. They are used for events and alarms tied to specific hardware forms.

In openUBMC, the CSR configuration of power events differs from that of other events. The following describes the CSR configuration methods for both common events and power events. For details about the CSR configuration syntax, see CSR.

CSR Configurations of Common Events

The following table lists the fields of common events.

| Field | Description |
| --- | --- |
| EventKeyId | Event ID, used to match the static configuration of the event. |
| Reading | Alarm value, generally configured with the synchronization syntax so that it follows another value. For common events, the value can be used directly, for example, a temperature reading. For other types of events, such as certificate expiration, the value is represented as 0 or 1. |
| Condition | Alarm threshold. |
| OperatorId | Comparison operator. Eight modes are available: 1: <; 2: ≤; 3: >; 4: ≥; 5: =; 6: ≠; 7: 0 to 1 (rising edge) triggers, 1 to 0 recovers; 8: 1 to 0 (falling edge) triggers, 0 to 1 recovers. |
| Hysteresis | Hysteresis threshold applied when an alarm is cleared. It functions as a tolerance. If the value is 0, the alarm is cleared immediately. |
| Enabled | Whether the event is enabled, that is, not masked. |
| Component | Associated component object. For details about the component definition, see FruData. |
| DescArgx/SuggArgx | (Optional) Event description/suggestion parameter, used for message formatting. Only the string format is supported. A maximum of 10 items can be configured (via the SR format expression). |
| AdditionalInfo | (Optional) Additional information about an event, given as the index N of the Nth dynamic parameter. Multiple parameters can be included by listing their indices (for example, '1,2'). This parameter is used to distinguish between different events during FD reporting. For example, if the alarms are the same except for the slot, the slot is used for differentiation. Check whether this field needs to be configured for each new alarm. |
| LedFaultCode | (Optional) LED fault code, which can be a fixed value or a dynamic value. The value of x is the instance part in the component. |
| InvalidReadingIgnore | (Optional) Whether to ignore invalid values. 1: enabled; 0: disabled. If this function is enabled and the read value equals InvalidReading, the read value is ignored. |
| InvalidReading | (Optional) Invalid value to be ignored. |
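
To show how DescArgx values feed message formatting, the following sketch fills a description template. It is not the event module's implementation. It assumes that the placeholder %N refers to DescArgN, that {BMC} is replaced with openUBMC, and that whitespace is cleaned up as described for the Description field. The function name render_description and the sample value Disk1 are hypothetical.

```lua
-- Hypothetical formatter: fills a description template with DescArg values.
-- Assumptions: %N refers to DescArgN, and {BMC} is replaced with "openUBMC".
local function render_description(template, desc_args)
    -- Replace each %N placeholder; a missing argument becomes an empty string.
    local text = template:gsub('%%(%d+)', function(index)
        return desc_args[tonumber(index)] or ''
    end)
    text = text:gsub('{BMC}', 'openUBMC')
    -- Collapse consecutive spaces, then drop a space before , ; or .
    text = text:gsub(' +', ' '):gsub(' ([,;.])', '%1')
    return text
end

print(render_description(
    'The %1 disk %2 health status degradation detected by PFAE.',
    {'', 'Disk1'}
))
```

An empty first argument leaves two adjacent spaces in the raw text, which is why the cleanup step runs after substitution.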

CSR Configuration Example

Key configuration points:

  1. Location of the event object. The CSR is configured based on the hardware topology, so each event's configuration belongs to the CSR of the hardware component that generates it. For example, alarms generated by a PCIe card are defined in that card's SR file, which can be found in the VPD repository.

    bash
    CSR
    └── PCIECard
        ├── FRU object
        │   └── PCIEFru
        ├── Component object
        │   ├── Com1822
        │   ├── ComPort1
        │   └── ComPort2
        ├── ThresholdSensor object
        │   └── 1822 Core Temperature
        └── Event object
            └── 1822 Core Temperature Major
  2. When configuring an event, check whether the corresponding component object exists. If not, configure one. In normal cases, the component already exists and you only need to configure the event object. Component objects are located in the FruData directory and are usually configured through FruData.

    Notice
    Objects in the platform can be referenced across files. Therefore, you should reuse the unique objects, such as Component_BMC and Component_System, to avoid redundant code caused by repeated definition.

The following example registers a FanSpeedDeviation event on the fan board. (If a field is not configured, the default value is used. Configure fields as required.)

json
{
    "Objects": {
        "Event_Fan1FStatus": { // Event is the class name. All objects of the Event class are distributed to the event module for processing. Fan1FStatus is the object name, which must be unique within a single file. The complete resource name is assembled by the self-discovery mechanism based on the SR file, for example, Event_Fan1FStatus_00.
            "EventKeyId": "Fan.FanSpeedDeviation",
            "Reading": "<=/Fan_1.FrontStatus",
            "Condition": 0,
            "OperatorId": 6,
            "Enabled": true,
            "DescArg1": "#/Fan_1.FanId",
            "DescArg2": "front",
            "Component": "#/Component_Fan1",
            "AdditionalInfo" : "1,2",
            "LedFaultCode": "F01"
        },
        // In the Event object, only Reading uses the synchronization syntax. All other attributes use the reference syntax, because the synchronization syntax relies on a polling interval and may cause information loss or missed updates in the alarm description.
        // The Event_Fan1FStatus object is configured on top of the existing fan objects. The dependent objects are listed below.
        "Component_Fan1": {
            "FruId": 255,
            "Instance": "<=/Fan_1.FanId",
            "Type": 4,
            "Name": "Fan1",
            "Presence": "<=/Fan_1.FrontPresence",
            "Health": 0,
            "PowerState": 1,
            "UniqueId": "N/A",
            "Manufacturer": "",
            "GroupId": 1,
            "Location": "<=/Component_CLU.Name",
            "NodeId": "0"
        },
        "Fan_1": {
            "FanId": 1,
            "Slot": 1,
            "Coefficient": 1,
            "FrontPresence": "<=/Scanner_Fan1_Presence.Value",
            "RearPresence": "<=/Scanner_Fan1_Presence.Value",
            "FrontSpeed": "<=/Scanner_Fan1_FSpeed.Value",
            "RearSpeed": "<=/Scanner_Fan1_RSpeed.Value",
            "HardwarePWM": "#/Accessor_Fan1_PWM.Value",
            "SystemId": 1,
            "FrontStatus": 0,
            "RearStatus": 0,
            "MaxSupportedPWM": 255,
            "IdentifySpeedLevel": 35,
            "Position": "CLU",
            "PowerGood": "#/Scanner_PowerGood.Value"
        },
        "Component_CLU": {
            "FruId": 255,
            "Instance": 255,
            "Type": 196,
            "Name": "CLU${Slot}",
            "Presence": 1,
            "Health": 0,
            "PowerState": 1,
            "BoardId": 65535,
            "UniqueId": "N/A",
            "Manufacturer": "",
            "GroupId": 1,
            "Location": "chassis"
        },
        "Scanner_Fan1_FSpeed": {
            "Chip": "#/Smc_FanBoardSMC",
            "Offset": 402657025,
            "Size": 4,
            "Mask": 4294901760,
            "Type": 0,
            "Period": 1000,
            "Debounce": "None",
            "Value": 0
        },
        "Scanner_Fan1_RSpeed": {
            "Chip": "#/Smc_FanBoardSMC",
            "Offset": 402657025,
            "Size": 4,
            "Mask": 65535,
            "Type": 0,
            "Period": 1000,
            "Debounce": "None",
            "Value": 0
        },
        "Accessor_Fan1_PWM": {
            "Chip": "#/Smc_FanBoardSMC",
            "Offset": 402657281,
            "Size": 1,
            "Mask": 255,
            "Type": 0,
            "Value": 0
        }
    }
}

Notice
If DescArgx/SuggArgx is configured, the SN/BN is added to the event description (this information is not displayed on the OMRP). It is obtained from the SerialNumber/PartNumber of the associated component. If the value is empty, the SN/BN is not displayed.
The SN/BN is processed as follows: the data source varies according to the component type, and the SN/BN of a component is configured and displayed according to the SR syntax. The SN/BN values used in events are not restricted. They can be added, deleted, or modified, and adapted to different products based on the SR.

How to Generate/Clear an Alarm

  1. Generating an alarm

    In short, an alarm is evaluated as the expression Reading OperatorId Condition.

    In the preceding example, OperatorId is 6 (not equal to) and Condition is 0, so the expression is Reading != 0. When Reading changes to 1, the expression 1 != 0 returns true, the alarm threshold is reached, and an alarm is generated. If the expression returns false, Reading has been updated but the threshold is not reached.

  2. Clearing an alarm

    The following is a simple illustration that ignores the recovery policy. Suppose the expression is Reading + Hysteresis == Condition with the values 1 + 0 == 2. The expression returns false, so the alarm is cleared. In general, after the reading changes, the alarm is cleared automatically once the trigger threshold is no longer met.

The alarm condition is determined entirely by the configuration. You can use the scanned value or define a value.
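
The comparison described above can be sketched as a small lookup table of operators. This is a minimal illustration, not the event module's code. The names comparators and threshold_reached are hypothetical, Hysteresis is not modelled, and the edge-triggered modes 7 and 8 are omitted because they depend on the previous reading.

```lua
-- Hypothetical evaluator for OperatorId 1 to 6 (threshold comparisons).
-- Edge-triggered operators 7 and 8 depend on the previous reading and are
-- intentionally left out of this sketch.
local comparators = {
    [1] = function(reading, condition) return reading < condition end,
    [2] = function(reading, condition) return reading <= condition end,
    [3] = function(reading, condition) return reading > condition end,
    [4] = function(reading, condition) return reading >= condition end,
    [5] = function(reading, condition) return reading == condition end,
    [6] = function(reading, condition) return reading ~= condition end,
}

local function threshold_reached(operator_id, reading, condition)
    local compare = comparators[operator_id]
    if not compare then
        error('unsupported OperatorId: ' .. tostring(operator_id))
    end
    return compare(reading, condition)
end

print(threshold_reached(6, 1, 0)) -- Reading 1 compared with Condition 0
print(threshold_reached(6, 0, 0)) -- Reading 0 compared with Condition 0
```

With the values from the fan example (OperatorId 6, Condition 0), the first call corresponds to the alarm being generated and the second to the threshold not being reached.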

Anti-Jitter Configuration

If the monitored value changes frequently, alarms may be falsely reported multiple times. openUBMC supports the anti-jitter policy for frequently changing values. The anti-jitter policy is configured in the Debounce attribute of the Scanner object. You can configure five types of anti-jitter policies: MidAvg, Median, Cont, ContBin, and None.

| Anti-Jitter Type | Description | Parameters | Configuration Example |
| --- | --- | --- | --- |
| MidAvg | Mean average | WindowSize: window size; DefaultValue: default value; IsSigned: whether the number is signed | "MidAvg": {"WindowSize": 6, "DefaultValue": 11, "IsSigned": true} |
| Median | Median filter | WindowSize: window size; DefaultValue: default value | "Median": {"WindowSize": 6, "DefaultValue": 11} |
| Cont | Continuous consistency | Num: number of anti-jitter times; DefaultValue: default value | "Cont": {"Num": 6, "DefaultValue": 11} |
| ContBin | Binary continuous consistency | NumH: number of anti-jitter times with high-level inputs; NumL: number of anti-jitter times with low-level inputs; DefaultValue: default value | "ContBin": {"NumH": 6, "NumL": 6, "DefaultValue": 11} |
| None | No anti-jitter | DefaultValue: default value | "None": {"DefaultValue": 11} |
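
To make the idea of continuous consistency concrete, the following sketch shows one way a Cont-style filter could behave. The exact semantics of the openUBMC Scanner are not described here, so this is an assumption: a new value is reported only after it has been sampled Num times in a row, and the default value is reported until then. The name new_cont_debouncer is hypothetical.

```lua
-- Hypothetical "Cont"-style debouncer. Assumption: a new value is reported
-- only after it has been sampled `num` times in a row; until then the last
-- stable value (initially `default_value`) is returned.
local function new_cont_debouncer(num, default_value)
    local stable = default_value
    local candidate = nil
    local count = 0
    return function(sample)
        if sample == stable then
            -- Back at the stable value: discard any pending candidate.
            candidate, count = nil, 0
        elseif sample == candidate then
            count = count + 1
        else
            candidate, count = sample, 1
        end
        if candidate ~= nil and count >= num then
            stable, candidate, count = candidate, nil, 0
        end
        return stable
    end
end

local debounce = new_cont_debouncer(3, 0)
for _, sample in ipairs({1, 0, 1, 1, 1}) do
    print(sample, debounce(sample))
end
```

In this run, the single 1 followed by 0 is treated as jitter, and the output only switches to 1 after three consecutive samples of 1.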

Power Event Configuration

Power events are a special type of events provided by openUBMC. They are a set of multiple events of the same category, with different thresholds and LED display codes. In addition to the attributes of common events, power events also contain the Mappings fields:

  • Mappings.Reading: the trigger condition of an event. When Reading reaches this value, the event is generated.
  • Mappings.LedFaultCode: the LedFaultCode of an event, that is, the LED fault code to be displayed after the event is generated.
  • Mappings.DescArgs: the DescArgs of an event. It is a string list with up to 10 elements, used as dynamic parameters in the event description.

CSR Configuration Example

The following example registers a power event. For details, see vpd/vendor/Huawei/TianChi/BCU/PsEvent_BC83AMDA_0_soft.sr.

json
{
    "Objects": {
        "PowerEvent_BCUPwrFaultMntr": {
            "EventKeyId": "System.SystemPowerFailure",
            "Component": "#/Component_ComSystem",
            "Reading": "<=/Scanner_BCUPwrSigDrop.Value",
            "AdditionalInfo": "2",
            "Mappings": [
                {
                    "Reading": 136,
                    "LedFaultCode": "U10",
                    "DescArgs": [
                        "",
                        "BCU_V_VCC_12V0_1"
                    ]
                },
                {
                    "Reading": 137,
                    "LedFaultCode": "U10",
                    "DescArgs": [
                        "",
                        "BCU_V_VCC_12V0_2"
                    ]
                },
                {
                    "Reading": 138,
                    "LedFaultCode": "U10",
                    "DescArgs": [
                        "",
                        "BCU_V_VCC_12V0_3"
                    ]
                },
                ...
                {
                    "Reading": 182,
                    "LedFaultCode": "U00",
                    "DescArgs": [
                        "",
                        "BCU_V_STBY_1V8"
                    ]
                }
            ]
        }
    }
}

How to Generate/Clear an Alarm

The logic for generating and clearing a power event is similar to that of a common event. The difference is that the power event requires condition comparison between Reading and Mappings.Reading.
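
The comparison between Reading and Mappings.Reading can be sketched as a lookup over the Mappings list. This is a minimal illustration under the assumption that a match means equality. The function name find_mapping is hypothetical, and the sample data reuses two entries from the CSR example above.

```lua
-- Hypothetical lookup: find the Mappings entry whose Reading matches the
-- scanned value. Assumption: a match means equality with Mappings.Reading.
local function find_mapping(mappings, reading)
    for _, mapping in ipairs(mappings) do
        if mapping.Reading == reading then
            return mapping
        end
    end
    return nil -- no entry matched, so no power event is raised
end

local mappings = {
    {Reading = 136, LedFaultCode = 'U10', DescArgs = {'', 'BCU_V_VCC_12V0_1'}},
    {Reading = 182, LedFaultCode = 'U00', DescArgs = {'', 'BCU_V_STBY_1V8'}},
}

local hit = find_mapping(mappings, 182)
print(hit.LedFaultCode, hit.DescArgs[2])
```

The matched entry supplies both the LED fault code and the dynamic parameters for the event description.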

Event Configuration and Commissioning

You can verify the event configuration through package generation and board commissioning. After the component is built and the whole package is built and upgraded (for details, see Integrate a Device), check whether the event object is mounted in the environment. You can also manually construct an error value to trigger an event alarm. Use the following commands to list the object tree of the event service and inspect the Events object:

shell
> busctl --user tree bmc.kepler.event

> busctl --user introspect bmc.kepler.event /bmc/kepler/Systems/1/Events

RPC Event Alarm

Configuration

Software event configuration is used for system-level or software-level events that cannot be accurately described in CSRs. Such alarms or events are generally determined from the runtime status data of a component. Software event configuration is therefore not recommended for events or alarms tied to specific hardware.

Interface Usage Constraints

  1. The owner of a software alarm is responsible for its full lifecycle (generation and clearance). The clearance action is required regardless of whether a Deassert event or alarm clearance event code exists.
  2. For software alarms, ComponentName and SubjectType are used to look up the Component object, and the first match is used. The value of ComponentName must therefore be unique among components of the same type.
  3. An Assert or Deassert must not be reported repeatedly.

Other Features

  1. An event is uniquely identified by ComponentName, EventKeyId, and MessageArgs.
  2. Software alarms persist across resets. Therefore, the component must track whether an alarm has been added or cleared (for example, by persisting that information) to avoid repeated additions or incorrect reporting.
  3. The time at which the external interface becomes available is affected by SR distribution. Components that must run during service startup should add retry logic and pcall protection.
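
The retry logic and pcall protection mentioned in the last item can be sketched as follows. This is a minimal, framework-independent illustration. The names call_with_retry, action, and wait are hypothetical, and a real component would pass its own AddEvent call and a sleep function from its scheduler.

```lua
-- Hypothetical retry wrapper around a call that may fail during startup.
-- `action` stands in for the real AddEvent call; `wait` is an optional,
-- injected sleep function so the sketch does not depend on any scheduler.
local function call_with_retry(action, max_attempts, wait)
    local last_err
    for attempt = 1, max_attempts do
        local ok, res = pcall(action)
        if ok then
            return true, res
        end
        last_err = res
        if attempt < max_attempts and wait then
            wait(attempt)
        end
    end
    return false, last_err
end

-- Usage with a stand-in action that succeeds on the third attempt.
local calls = 0
local ok, res = call_with_retry(function()
    calls = calls + 1
    if calls < 3 then
        error('interface not ready')
    end
    return 'record-1'
end, 5)
print(ok, res, calls)
```

Bounding the number of attempts keeps a component from blocking startup indefinitely if the interface never appears.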

Configuration Example

The following example shows how the network_adapter component processes alarms for link exceptions. The check_oam_lost_link_state_alarm function illustrates the logic of an RPC event alarm. The code is extracted from network_adapter/src/lualib/event/event_mgmt.lua. For details, see the source code.

lua
function event_mgmt:add_event(params)
    local event_obj
    client:ForeachEventsObjects(function(o)
        event_obj = o -- This object is unique.
    end)
    if not event_obj then
        log:error('get events object failed')
        return
    end

    local ok, res = pcall(function ()
        return event_obj:AddEvent_PACKED(ctx.new(), params):unpack()
    end)
    if not ok then
        log:error('add events failed, %s', res)
        return false
    end

    log:notice('add event successfully, record id [%s]', res)
    return true
end

-- Link exception alarm
function event_mgmt:check_oam_lost_link_state_alarm(state, device_name, port_id)
    local args = json.encode({device_name, '', 'Port ' .. (port_id + 1)})
    local assert = state == 1 -- 0: no alarm 1: alarm
    local alarm_state = alarm_states[args]
    if not assert == not alarm_state then   -- The default value is nil. The value is negated here.
        return
    end

    local params = {
        {'ComponentName', 'Port'.. (port_id + 1)}, -- The port resource collaboration interface ID starts from 0, and the component ID starts from 1.
        {'State', assert and 'true' or 'false'},
        {'EventKeyId', 'Port.PortOAMLostLink'},
        {'MessageArgs', args},
        {'SystemId', ''},
        {'ManagerId', ''},
        {'ChassisId', ''},
        {'NodeId', ''}
    }
    local is_ok = self:add_event(params)
    if not is_ok then
        return false
    end
    -- Updating local alarm information
    alarm_states[args] = assert
    self:update_alarm_msg(assert, args, '') --Dynamic parameters are unique and can be used as keys. Therefore, values are not required.
    return true
end

Interface Calling Demonstration

Software alarms can be added by calling an interface. Invoke the AddEvent method of the bmc.kepler.Systems.Events interface at the /bmc/kepler/Systems/:SystemId/Events path of the resource collaboration interface. The interface parameters are as follows:

| Parameter | Description | Remarks (String Type) |
| --- | --- | --- |
| ComponentName | Event entity name | Mandatory. Name of the component associated with the event. |
| State | Event state | Mandatory (true/false). |
| EventKeyId | Event ID | Mandatory (same as the static configuration). |
| SubjectType | Event entity type | Optional. If no entity type is provided, the high-order bits of the event code are used for matching. |
| SuggestionArgs | Event suggestion parameter | Optional. The value must be converted using json.encode. |
| MessageArgs | Event description parameter | Mandatory. The value must be converted using json.encode. If no parameter is involved, pass an empty table. |
| SystemId | System ID of the event | See the notes below. |
| ManagerId | Manager ID of the event | See the notes below. |
| ChassisId | Chassis ID of the event | See the notes below. |
| NodeId | Node ID of the event | See the notes below. |
| LedFaultCode | LED fault code | Optional. |

The values of SystemId, ManagerId, ChassisId, and NodeId are described as follows:

  1. If the event source is a resource collaboration interface object, SystemId, ManagerId, ChassisId, and NodeId must be kept consistent with that object.
  2. If there is no event source, set one of SystemId, ManagerId, and ChassisId based on the resource category. NodeId can be empty.
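
Before calling AddEvent, a component can check that the mandatory parameters are present. The following is a minimal sketch, assuming that parameters are passed as name/value pairs and that the component name parameter is spelled ComponentName, as in the calling examples. The names MANDATORY and missing_params are hypothetical.

```lua
-- Hypothetical pre-check for AddEvent parameters given as {name, value} pairs.
-- Assumption: the mandatory names are the ones marked Mandatory in the
-- parameter table, with ComponentName spelled as in the calling examples.
local MANDATORY = {'ComponentName', 'State', 'EventKeyId', 'MessageArgs'}

local function missing_params(params)
    local present = {}
    for _, pair in ipairs(params) do
        present[pair[1]] = true
    end
    local missing = {}
    for _, name in ipairs(MANDATORY) do
        if not present[name] then
            missing[#missing + 1] = name
        end
    end
    return missing
end

local missing = missing_params({
    {'ComponentName', 'BMC'},
    {'State', 'true'},
    {'EventKeyId', 'BMC.InsecureCryptographicAlgorithm'},
})
print(table.concat(missing, ','))
```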

Calling example

  • The following uses busctl as an example to describe how to add an event.
shell
> busctl --user call bmc.kepler.event /bmc/kepler/Systems/1/Events bmc.kepler.Systems.Events AddEvent 'a{ss}a(ss)' 3 Interface cli UserName Administrator ClientAddr 127.0.0.1 8 ComponentName 'BMC' State 'true' EventKeyId 'BMC.InsecureCryptographicAlgorithm' 'MessageArgs' '["test"]' 'SystemId' '1' 'ManagerId' '1' 'ChassisId' '1' 'NodeId' '1'

Calling the RPC Event Interface

Service code adds or clears software alarms according to its own runtime conditions. The following describes how to invoke the event interface through RPC.

To invoke the event interface from service code, subscribe to the interface and call it through the client provided by the framework. The following is an example:

lua
local context = require 'mc.context'

-- Obtain the current alarm list
local amx, rsp = rpc_client.GetAlarmList_PACKED(1):unpack()

-- Add an alarm event
local event_obj = rpc_client.GetEventsObjects()
-- Parameters are passed as {name, value} string pairs
local params = {
    {'ComponentName', 'BMC'},
    {'State', 'true'},
    {'EventKeyId', 'BMC.InsecureCryptographicAlgorithm'},
    -- ... (remaining parameters)
}
local record = event_obj:AddEvent_PACKED(context.new(), params):unpack()

Summary

The preceding description outlines the event customization workflow, enabling you to configure a more refined event alarm.