RAID Card Adaptation Guide
Last updated: 2025/11/12

Adapting a RAID card involves three modules: vpd, pcie_device, and storage.

The sr files of the vpd module locate the PCIe card

The normal topology is root -> EXU -> BCU -> Riser -> PCIe. Configure the Connectors first.

Example: on an S920X20 device, start from root.sr and find Connector_EXU_1. Its "IdentifyMode": 3 means that the next-level node is obtained from Tianchi (天池). Run busctl --user tree bmc.kepler.hwdiscovery | cat to find Connector_EXU_1_01, then run busctl --user introspect bmc.kepler.hwdiscovery /bmc/kepler/Connector/Connector_EXU_1_01 | cat to get the expansion board information:

text
~ ~ $ busctl --user introspect bmc.kepler.hwdiscovery /bmc/kepler/Connector/Connector_EXU_1_01 | cat
NAME                                TYPE      SIGNATURE   RESULT/VALUE                             FLAGS
bmc.kepler.Connector                interface -           -                                        -
.Reload                             method    a{ss}sssy   -                                        -
.AuxId                              property  s           ""                                       emits-change writable
.Bom                                property  s           "14100513"                               emits-change
.Buses                              property  as          33 "I2c_1" "I2c_2" "I2c_3" "I2c_4" "I2c… emits-change
.ChassisId                          property  s           "1"                                      emits-change
.GroupId                            property  u           4                                        emits-change
.GroupPosition                      property  s           "0101"                                   emits-change
.Id                                 property  s           "00000001010302044492"                   emits-change writable
.IdentifyMode                       property  y           3                                        emits-change
.LoadStatus                         property  y           0                                        emits-change
.ManagerId                          property  s           "1"                                      emits-change
.Presence                           property  y           1                                        emits-change writable
.SilkText                           property  s           "J6023"                                  emits-change
.Slot                               property  y           1                                        emits-change writable
.SystemId                           property  y           1                                        emits-change
.Type                               property  s           "ExpandBoard"                            emits-change
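
The lookup from a Connector to its sr file can be sketched in Python. This is a minimal sketch, assuming the introspect output has already been captured as text. The <Bom>_<Id>.sr naming is inferred from the file names used in this guide, and parse_introspect and tianchi_sr_name are hypothetical helpers, not functions from the BMC code.

```python
import re

# Matches property rows of `busctl introspect` output, for example:
# .Bom    property  s    "14100513"    emits-change
PROPERTY_ROW = re.compile(
    r'^\.(?P<name>\S+)\s+property\s+(?P<sig>\S+)\s+'
    r'(?P<value>.+?)\s+(?P<flags>emits-change.*|const|-)\s*$'
)


def parse_introspect(text):
    """Return {property name: raw value string} for every property row."""
    props = {}
    for line in text.splitlines():
        match = PROPERTY_ROW.match(line.strip())
        if match:
            props[match.group("name")] = match.group("value").strip('"')
    return props


def tianchi_sr_name(props):
    # Assumption: for IdentifyMode 3 the file is named <Bom>_<Id>.sr.
    return f'{props["Bom"]}_{props["Id"]}.sr'


SAMPLE = """\
.Bom                                property  s           "14100513"                               emits-change
.Id                                 property  s           "00000001010302044492"                   emits-change writable
.IdentifyMode                       property  y           3                                        emits-change
"""

props = parse_introspect(SAMPLE)
print(tianchi_sr_name(props))
```

With the three sample rows taken from the output above, the sketch yields the file name analyzed in the next step.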

In 14100513_00000001010302044492.sr you will find Connector_BCU_1. It also has "IdentifyMode": 3, so the next-level node again comes from Tianchi. Querying it returns the base board information:

text
~ ~ $ busctl --user introspect bmc.kepler.hwdiscovery /bmc/kepler/Connector/Connector_BCU_1_0101 | cat
NAME                                TYPE      SIGNATURE   RESULT/VALUE                             FLAGS
bmc.kepler.Connector                interface -           -                                        -
.Reload                             method    a{ss}sssy   -                                        -
.AuxId                              property  s           ""                                       emits-change writable
.Bom                                property  s           "14060876"                               emits-change
.Buses                              property  as          26 "I2c_1" "I2c_2" "I2c_8" "JtagMux_Jta… emits-change
.ChassisId                          property  s           "1"                                      emits-change
.GroupId                            property  u           8                                        emits-change
.GroupPosition                      property  s           "010101"                                 emits-change
.Id                                 property  s           "00000001020302031825"                   emits-change writable
.IdentifyMode                       property  y           3                                        emits-change
.LoadStatus                         property  y           0                                        emits-change
.ManagerId                          property  s           "1"                                      emits-change
.Presence                           property  y           1                                        emits-change writable
.SilkText                           property  s           "BCU"                                    emits-change
.Slot                               property  y           1                                        emits-change writable
.SystemId                           property  y           1                                        emits-change
.Type                               property  s           "CPUBoard"                               emits-change

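14060876_00000001020302031825.sr contains many PCIeRiserCard Connectors with "IdentifyMode": 3, so these are still Tianchi components. This file describes the base board object, which contains multiple RiserCard objects. Take Connector_A2a as an example: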

json
    "Connector_A2a": {
      "Bom": "14100513",
      "Slot": 1,
      "Position": 3,
      "Presence": "<=/Scanner_A2a.Value",
      "Id": "",
      "AuxId": "",
      "Buses": [
        "Hisport_18"
      ],
      "SystemId": 1,
      "SilkText": "CpuBoard${Slot}",
      "IdentifyMode": 3,
      "Type": "PCIeRiserCard"
    },

Buses here is Hisport_18, which can be thought of simply as one I2C bus. Querying the Connector returns the PCIeRiserCard object:

text
~ ~ $ busctl --user introspect bmc.kepler.hwdiscovery /bmc/kepler/Connector/Connector_A2a_010101 | cat
NAME                                TYPE      SIGNATURE   RESULT/VALUE           FLAGS
bmc.kepler.Connector                interface -           -                      -
.Reload                             method    a{ss}sssy   -                      -
.AuxId                              property  s           ""                     emits-change writable
.Bom                                property  s           "14100513"             emits-change
.Buses                              property  as          1 "Hisport_18"         emits-change
.ChassisId                          property  s           ""                     emits-change
.GroupId                            property  u           49                     emits-change
.GroupPosition                      property  s           "01010103"             emits-change
.Id                                 property  s           "00000001040302044498" emits-change writable
.IdentifyMode                       property  y           3                      emits-change
.LoadStatus                         property  y           0                      emits-change
.ManagerId                          property  s           "1"                    emits-change
.Presence                           property  y           1                      emits-change writable
.SilkText                           property  s           "CpuBoard1"            emits-change
.Slot                               property  y           1                      emits-change writable
.SystemId                           property  y           1                      emits-change
.Type                               property  s           "PCIeRiserCard"        emits-change
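
The outputs so far suggest how object names and GroupPosition values are built level by level. The following Python sketch reproduces that pattern. It is an observation from this guide's outputs rather than a documented rule: the two-digit hexadecimal encoding of Position is an assumption, and both helper names are hypothetical.

```python
def child_group_position(parent_position: str, position: int) -> str:
    """Append this level's Position to the parent's GroupPosition.

    Assumption: Position is encoded as two hexadecimal digits.
    """
    return parent_position + f"{position:02x}"


def object_name(connector: str, parent_position: str) -> str:
    """The D-Bus object name appears to carry the parent's GroupPosition as a suffix."""
    return f"{connector}_{parent_position}"


bcu = "010101"                          # GroupPosition of Connector_BCU_1_0101
riser = child_group_position(bcu, 3)    # Connector_A2a has "Position": 3
slot3 = child_group_position(riser, 3)  # Connector_PCIE_SLOT3 has "Position": 3

print(object_name("Connector_A2a", bcu), riser)
print(object_name("Connector_PCIE_SLOT3", riser), slot3)
```

The same position string appears later in paths such as Chip_RaidChip_0101010303, so it is a convenient key for correlating logs with D-Bus objects.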

The pcie_device module updates PCIe devices and discovers the RAID card

1. Building the business topology

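The pcie_device component uses the csr files configured in the vpd module to obtain the PCIe device links dynamically and to provide PCIe slot information to the BIOS. The analysis above has reached the RiserCard. The code that parses and loads csr files is base code: it needs no modification, only understanding.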

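14100513_00000001040302044498.sr describes a single RiserCard object. Each RiserCard can host several PCIe cards, so the file contains Pca9545 objects: the bus passed in is Hisport_18, and the Pca9545 splits it into multiple I2C buses: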

json
      "I2cMux_Pca9545_PCA9545_1": {
        "Connectors": [
          "Connector_PCIE_SLOT1"
        ]
      },
      "I2cMux_Pca9545_PCA9545_2": {
        "Connectors": [
          "Connector_PCIE_SLOT2"
        ]
      },
      "I2cMux_Pca9545_PCA9545_3": {
        "Connectors": [
          "Connector_PCIE_SLOT3"
        ]
      }

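Connector_PCIE_SLOT3 uses "IdentifyMode": 2, so it is no longer a Tianchi component; the PCIe card sits below it. Which csr to load for the card is determined by the Id and AuxId properties, and for a RAID card both are obtained from the BIOS: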

json
      "Connector_PCIE_SLOT3": {
        "Bom": "14140130",
        "Slot": 3,
        "Position": 3,
        "Presence": 0,
        "Buses": [
          "I2cMux_Pca9545_PCA9545_3"
        ],
        "SystemId": 1,
        "SilkText": "RiserCard${Slot}",
        "IdentifyMode": 2,
        "Container": "Component_RiserCard",
        "Type": "PCIe"
      },
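
The SilkText values in these Connectors contain a ${Slot} placeholder. As a rough illustration of the substitution, Python's string.Template uses the same ${name} syntax. Which Slot value is substituted is an inference from this guide's outputs: Connector_PCIE_SLOT3 has "Slot": 3 but is reported with SilkText "RiserCard1", which suggests that the slot of the board declaring the Connector is used. render_silk_text is a hypothetical helper.

```python
from string import Template


def render_silk_text(template: str, board_slot: int) -> str:
    """Expand ${Slot} in a SilkText template.

    Assumption: Slot is the slot of the board whose sr file declares
    the Connector, not the Connector's own "Slot" value.
    """
    return Template(template).substitute(Slot=board_slot)


print(render_silk_text("CpuBoard${Slot}", 1))
print(render_silk_text("RiserCard${Slot}", 1))
```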

2. Loading the PCIe device: getting and setting Id and AuxId

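Once the business topology is built, the BIOS obtains the PCIe slot and CPU resource configuration from the BMC and reports the deviceBDF information of each PCIe device back to the BMC.

The BMC uses the deviceBDF to query the PMU for the PCIe device's four-tuple (vendor ID, device ID, subsystem vendor ID, subsystem device ID). The four-tuple identifies the CSR to load for the device.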

In the code, parse_pcie_card_bdf_data parses the BDF information and then calls task_load_unload_device, which loads the PCIe device, sets the Id and AuxId properties, and sets Presence to 1.

When loading, load_unload_device prints the following:

text
device_loader.lua(214): [BizTopoLoader] Load PCIeCard, Slot=3, path=/bmc/kepler/Connector/Connector_PCIE_SLOT3_01010103, Id-AuxId=100010e2-10004010

With this scheme, in which standard PCIe devices are managed from the BDF reported by the BIOS, the queried Connector properties now include Id, AuxId, and Presence:

text
~ ~ $ busctl --user introspect bmc.kepler.hwdiscovery /bmc/kepler/Connector/Connector_PCIE_SLOT3_01010103 | cat
NAME                                TYPE      SIGNATURE   RESULT/VALUE                          FLAGS
bmc.kepler.Connector                interface -           -                                     -
.Reload                             method    a{ss}sssy   -                                     -
.AuxId                              property  s           "10004010"                            emits-change writable
.Bom                                property  s           "14140130"                            emits-change
.Buses                              property  as          1 "I2cMux_Pca9545_PCA9545_3_01010103" emits-change
.ChassisId                          property  s           ""                                    emits-change
.GroupId                            property  u           57                                    emits-change
.GroupPosition                      property  s           "0101010303"                          emits-change
.Id                                 property  s           "100010e2"                            emits-change writable
.IdentifyMode                       property  y           2                                     emits-change
.LoadStatus                         property  y           0                                     emits-change
.ManagerId                          property  s           "1"                                   emits-change
.Presence                           property  y           1                                     emits-change writable
.SilkText                           property  s           "RiserCard1"                          emits-change
.Slot                               property  y           3                                     emits-change writable
.SystemId                           property  y           1                                     emits-change
.Type                               property  s           "PCIe"                                emits-change

If this step completes, the csr for the RAID card has been identified and loaded successfully. app.log contains the following:

text
[BizTopoLoader] Load PCIeCard, Slot=3, path=/bmc/kepler/Connector/Connector_PCIE_SLOT3_01010103, Id-AuxId=100010e2-10004010

framework.log contains the following:

text
hwdiscovery NOTICE: hwcomponent.lua(309): [self-discovery] name: Connector_PCIE_SLOT3_01010103, position: 0101010303, current: 1, previous: 0,uptime: 125 s
hwdiscovery NOTICE: init.lua(162): position: 0101010303, get csr data from /opt/bmc/sr/14140130_100010e2_10004010.sr, format version: 3.00, data version: 3.00
hwdiscovery NOTICE: hwcomponent.lua(205): position: 0101010303, load sr data successfully, uptime: 125 s, cost: 20ms
hwdiscovery NOTICE: hwcomponent.lua(226): position: 0101010303, start to process sr data, source: /opt/bmc/sr/14140130_100010e2_10004010.sr, format version: 3.00, data version: 3.00, uptime: 125 s
hwdiscovery NOTICE: hwcomponent.lua(309): [self-discovery] name: Connector_PCIE_SLOT1_01010103, position: 0101010301, current: 1, previous: 0,uptime: 125 s
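
When checking whether the expected csr was loaded, it can help to pull the position and file path out of framework.log. Below is a minimal Python sketch that does this with a regular expression. It assumes the log line format shown above stays stable, and loaded_sr_files is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath

# Matches: "position: <pos>, get csr data from <path>.sr,"
SR_LOAD = re.compile(
    r"position: (?P<position>[0-9a-fA-F]+), get csr data from (?P<path>\S+\.sr),"
)


def loaded_sr_files(log_text):
    """Map each position to the sr file that hwdiscovery reported loading."""
    return {m["position"]: m["path"] for m in SR_LOAD.finditer(log_text)}


LOG = """\
hwdiscovery NOTICE: init.lua(162): position: 0101010303, get csr data from /opt/bmc/sr/14140130_100010e2_10004010.sr, format version: 3.00, data version: 3.00
hwdiscovery NOTICE: hwcomponent.lua(205): position: 0101010303, load sr data successfully, uptime: 125 s, cost: 20ms
"""

for position, path in loaded_sr_files(LOG).items():
    # The file name splits back into Bom, Id and AuxId.
    bom, card_id, aux_id = PurePosixPath(path).stem.split("_")
    print(position, bom, card_id, aux_id)
```

If the position of the RAID card slot is missing from the result, the csr was not loaded and the Id and AuxId on the Connector are the first thing to check.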

The storage module initializes the RAID card and implements its functions

After pcie_device identifies the RAID card's csr, the storage module is invoked to obtain the RAID card information.

Source code: storage

1. Initializing the RAID card

When power-on is detected, the corresponding functions are triggered and print the following:

text
storage NOTICE: bus_monitor_service.lua(84): [monitor-power] set power state from ONING to ON
storage NOTICE: controller_object.lua(150): controller init obj.Id = 255, object_id = 1
storage NOTICE: controller_object.lua(698): Controller_0, RefChip.Path:/bmc/kepler/Chip/Complex/Chip_RaidChip_0101010303
storage NOTICE: controller_object.lua(334): ctrl0 add_controller_to_link_topo successfully
storage NOTICE: controller_object.lua(336): ctrl0 add_controller_to_sml successfully

If the configuration is wrong, the following appears instead:

text
storage NOTICE: init.lua(534): sml: set i2c chip, ctrl_idx=0, chip=/bmc/kepler/Chip/Complex/Chip_RaidChip_0101010303
storage NOTICE: controller_object.lua(698): Controller_0, RefChip.Path:/bmc/kepler/Chip/Complex/Chip_RaidChip_0101010303
storage NOTICE: controller_object.lua(334): ctrl0 add_controller_to_link_topo successfully
storage ERROR: tasks.lua(83): task [Controller.register_controller.0] error: ...bmc/apps/storage/lualib/controller/controller_object.lua:723: [Storage] Failed to add controller 0, ret: 4357

The corresponding errors also appear in framework.log:

text
framework NOTICE: l_sml_adapter.cpp(240): register_sml_adapter_function: controller type id is 14
framework ERROR: l_sml_adapter.cpp(159): register_sml_adapter_fun_table: g_sml_adapter is already generate
framework ERROR: l_sml_adapter.cpp(181): register_pd_log_parse_table: g_pd_log_adapter is already generate
framework ERROR: l_sml_adapter.cpp(71): register_pd_log_parse: g_pd_log_parse is already generate
framework ERROR: adapter.c(603): Failed to load lsi sml library /usr/lib64/libsml_lsi.so for MegaRAID SAS Controller. error : /usr/lib64/libsml_lsi.so: cannot open shared object file: No such file or directory
framework ERROR: adapter.c(918): smlib : Add controller management [Ctrl index 0, Ctrl ID 0] failed, return 0x11d8

The log shows that /usr/lib64/libsml_lsi.so cannot be found. To fix this, add the following option under libmgmt_protocol in manifest.yaml:

text
  - conan: libmgmt_protocol
    options:
      storelib_enable: true

After adding the option, rebuild and verify. The flow now completes, and the RAID card information can be queried over D-Bus:

text
~ ~ $ busctl --user introspect bmc.kepler.storage /bmc/kepler/Systems/1/Storage/Controllers/Controller_1_0101010303 | cat
NAME                                                   TYPE      SIGNATURE            RESULT/VALUE                             FLAGS
bmc.kepler.Inventory.Hardware                          interface -                    -                                        -
.AssetName                                             property  s                    "MegaRAID 9560-8i 4GB"                   emits-change
.AssetTag                                              property  s                    "N/A"                                    emits-change
.AssetType                                             property  s                    "PCIe RAID Card"                         emits-change
.FirmwareVersion                                       property  s                    "5.290.02-3997"                          emits-change
.ManufactureDate                                       property  s                    "N/A"                                    emits-change
.Manufacturer                                          property  s                    "Broadcom"                               emits-change
.Model                                                 property  s                    "SAS3908"                                emits-change
.PCBVersion                                            property  s                    "N/A"                                    emits-change
.PartNumber                                            property  s                    "06030622"                               emits-change
.SerialNumber                                          property  s                    "SPD3511231"                             emits-change
.Slot                                                  property  s                    "0"                                      emits-change
.UUID                                                  property  s                    "N/A"                                    emits-change

2. Getting RAID card information

After initialization, the RAID card information starts being updated. c_controller:start() makes the following calls:

text
self:start_update_task()
self:start_update_pd_list_task()
self:start_update_ld_list_task()
self:start_update_phy_err_task()

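These tasks periodically fetch RAID card information, such as the controller, logical drives, physical drives, and physical drive error information. For example, start_update_task calls get_ctrl_info, which goes through the Lua-to-C mapping and ends up in the C function of the corresponding card library.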

Once the information is obtained, the handler registered through on_update:on is triggered to refresh the data:

text
self:update_static_controller_info(info)
self:update_controller_info(info)
self:update_asset_data_info()

These update functions write the data into the properties of the object itself, where it can be read over D-Bus.
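
The fetch, notify, update flow described above can be illustrated with a small Python analogue. This is only a sketch of the pattern, not the storage module's Lua code: Signal, Controller, and poll_once are hypothetical names, and the get_ctrl_info argument stands in for the call that reaches the card library.

```python
class Signal:
    """Tiny stand-in for an on_update style signal."""

    def __init__(self):
        self._handlers = []

    def on(self, handler):
        # Register a handler, in the spirit of on_update:on(...).
        self._handlers.append(handler)

    def emit(self, info):
        for handler in self._handlers:
            handler(info)


class Controller:
    def __init__(self, get_ctrl_info):
        self.get_ctrl_info = get_ctrl_info  # hypothetical stand-in for the card library call
        self.properties = {}                # what would be exposed over D-Bus
        self.on_update = Signal()
        self.on_update.on(self.update_controller_info)

    def update_controller_info(self, info):
        # Copy the fetched fields into the object's own properties.
        self.properties.update(info)

    def poll_once(self):
        info = self.get_ctrl_info()
        if info:
            self.on_update.emit(info)


ctrl = Controller(lambda: {"FirmwareVersion": "5.290.02-3997"})
ctrl.poll_once()  # a real task would repeat this on a timer
print(ctrl.properties)
```

Separating the fetch from the update handlers keeps each periodic task small: a task only has to obtain fresh data and emit it.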

3. Setting RAID card information

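The entry point for RAID card setting commands is the rpc_service_controller file; pay attention to the interfaces it defines. For example, ctrl_task_operate is an entry point that runs as an asynchronous task to avoid blocking, whereas the ClearForeignConfig interface executes directly rather than asynchronously.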
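
The difference between the two styles of entry point can be sketched in Python with a thread pool. The function names below only mirror the interfaces mentioned above; their bodies, slow_set_property, and the CopybackEnabled and foreign_config_count keys are hypothetical and do not reflect the real implementation.

```python
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=1)


def ctrl_task_operate(operation, *args):
    """Queue a possibly slow operation and return a Future immediately."""
    return _executor.submit(operation, *args)


def clear_foreign_config(controller):
    """Run inline: the caller waits until the operation has finished."""
    controller["foreign_config_count"] = 0
    return True


def slow_set_property(controller, key, value):
    # Placeholder for an operation that talks to the RAID card.
    controller[key] = value
    return key


controller = {"foreign_config_count": 2}
future = ctrl_task_operate(slow_set_property, controller, "CopybackEnabled", True)
clear_foreign_config(controller)           # completes before returning
print(future.result(timeout=5), controller)
_executor.shutdown()
```

The asynchronous style suits operations that may take a long time on the card, because the RPC caller is not held up while they run.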

Points to note when configuring a RAID card

How to find the matching csr

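First check whether vpd already supports the RAID card. Get the card's four-tuple, for example vendor ID 0x1000, device ID 0x10e2, subsystem vendor ID 0x1000, and subsystem device ID 0x4010. The matching csr file is then named 14140130_100010e2_10004010.sr.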
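
The file name can be derived from the four-tuple as in the Python sketch below. It assumes the pattern visible in this guide: Id is the vendor ID followed by the device ID, AuxId is the subsystem vendor ID followed by the subsystem device ID, and the Bom prefix comes from the slot Connector (14140130 for Connector_PCIE_SLOT3). pcie_sr_name is a hypothetical helper.

```python
def pcie_sr_name(bom, vendor_id, device_id, subvendor_id, subdevice_id):
    """Build <Bom>_<Id>_<AuxId>.sr from a PCIe four-tuple."""
    card_id = f"{vendor_id:04x}{device_id:04x}"       # Id, e.g. 100010e2
    aux_id = f"{subvendor_id:04x}{subdevice_id:04x}"  # AuxId, e.g. 10004010
    return f"{bom}_{card_id}_{aux_id}.sr"


print(pcie_sr_name("14140130", 0x1000, 0x10E2, 0x1000, 0x4010))
```

If no file with the resulting name exists among the sr files, the card is not yet supported and a new csr has to be added.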

The underlying communication currently supports only I2C mode and MCTP over PCIe mode

Any other mode requires additional development.

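Broadcom RAID cards use I2C as the underlying transport, with I2C address 0x02; in i2cdetect the response appears at 0x1. The latest HBA card, the 9600, uses the MCTP protocol and has not been adapted yet.

PMC cards use MCTP over PCIe. The MCTP link must be established before data can be updated; check it with busctl --user tree bmc.kepler.mctpd | cat.

Huawei cards use MCTP over PCIe but also expose some I2C addresses that can be used to query the Lm75, EEPROM, and similar devices separately.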
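
The two Broadcom address values are consistent with one device if 0x02 is the 8-bit form of the address (the 7-bit address shifted left by one, with the read/write bit in bit 0), since i2cdetect displays 7-bit addresses. That reading of 0x02 is an assumption, and to_7bit is a hypothetical helper. The arithmetic in Python:

```python
def to_7bit(addr_8bit: int) -> int:
    """Drop the read/write bit of an 8-bit I2C address."""
    return addr_8bit >> 1


print(hex(to_7bit(0x02)))
```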

Reference documents