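RAID Card Adaptation Guide
Adapting a RAID card involves three modules: vpd, pcie_device, and storage.
The sr files of the vpd module are responsible for locating the PCIe card
The normal topology is root->EXU->BCU->Riser->PCIe, so the Connector objects have to be configured first.
Example: on an S920X20, start from root.sr and find Connector_EXU_1. It has "IdentifyMode": 3, which means the next-level node is obtained from Tianchi. Use busctl --user tree bmc.kepler.hwdiscovery | cat to find Connector_EXU_1_01, then run busctl --user introspect bmc.kepler.hwdiscovery /bmc/kepler/Connector/Connector_EXU_1_01 | cat to get the expansion board information: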
~ ~ $ busctl --user introspect bmc.kepler.hwdiscovery /bmc/kepler/Connector/Connector_EXU_1_01 | cat
NAME TYPE SIGNATURE RESULT/VALUE FLAGS
bmc.kepler.Connector interface - - -
.Reload method a{ss}sssy - -
.AuxId property s "" emits-change writable
.Bom property s "14100513" emits-change
.Buses property as 33 "I2c_1" "I2c_2" "I2c_3" "I2c_4" "I2c… emits-change
.ChassisId property s "1" emits-change
.GroupId property u 4 emits-change
.GroupPosition property s "0101" emits-change
.Id property s "00000001010302044492" emits-change writable
.IdentifyMode property y 3 emits-change
.LoadStatus property y 0 emits-change
.ManagerId property s "1" emits-change
.Presence property y 1 emits-change writable
.SilkText property s "J6023" emits-change
.Slot property y 1 emits-change writable
.SystemId property y 1 emits-change
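.Type property s "ExpandBoard" emits-change
Analyzing 14100513_00000001010302044492.sr (the file name combines the Bom and Id shown above) leads to Connector_BCU_1. It also has "IdentifyMode": 3, so the next-level node again comes from Tianchi; querying it returns the base board information: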
~ ~ $ busctl --user introspect bmc.kepler.hwdiscovery /bmc/kepler/Connector/Connector_BCU_1_0101 | cat
NAME TYPE SIGNATURE RESULT/VALUE FLAGS
bmc.kepler.Connector interface - - -
.Reload method a{ss}sssy - -
.AuxId property s "" emits-change writable
.Bom property s "14060876" emits-change
.Buses property as 26 "I2c_1" "I2c_2" "I2c_8" "JtagMux_Jta… emits-change
.ChassisId property s "1" emits-change
.GroupId property u 8 emits-change
.GroupPosition property s "010101" emits-change
.Id property s "00000001020302031825" emits-change writable
.IdentifyMode property y 3 emits-change
.LoadStatus property y 0 emits-change
.ManagerId property s "1" emits-change
.Presence property y 1 emits-change writable
.SilkText property s "BCU" emits-change
.Slot property y 1 emits-change writable
.SystemId property y 1 emits-change
.Type property s "CPUBoard" emits-change
14060876_00000001020302031825.sr contains many PCIeRiserCard connectors with "IdentifyMode": 3, so they are still loaded as Tianchi components. This file already describes the base board object, which holds several RiserCard connectors. Take Connector_A2a as an example:
"Connector_A2a": {
    "Bom": "14100513",
    "Slot": 1,
    "Position": 3,
    "Presence": "<=/Scanner_A2a.Value",
    "Id": "",
    "AuxId": "",
    "Buses": [
        "Hisport_18"
    ],
    "SystemId": 1,
    "SilkText": "CpuBoard${Slot}",
    "IdentifyMode": 3,
    "Type": "PCIeRiserCard"
},
Here Buses is Hisport_18, which can be thought of simply as one I2C bus. Querying the connector returns the PCIeRiserCard object:
~ ~ $ busctl --user introspect bmc.kepler.hwdiscovery /bmc/kepler/Connector/Connector_A2a_010101 | cat
NAME TYPE SIGNATURE RESULT/VALUE FLAGS
bmc.kepler.Connector interface - - -
.Reload method a{ss}sssy - -
.AuxId property s "" emits-change writable
.Bom property s "14100513" emits-change
.Buses property as 1 "Hisport_18" emits-change
.ChassisId property s "" emits-change
.GroupId property u 49 emits-change
.GroupPosition property s "01010103" emits-change
.Id property s "00000001040302044498" emits-change writable
.IdentifyMode property y 3 emits-change
.LoadStatus property y 0 emits-change
.ManagerId property s "1" emits-change
.Presence property y 1 emits-change writable
.SilkText property s "CpuBoard1" emits-change
.Slot property y 1 emits-change writable
.SystemId property y 1 emits-change
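.Type property s "PCIeRiserCard" emits-change
The pcie_device module updates PCIe devices and discovers the RAID card
1. Building the business topology
The pcie_device component uses the csr files configured in the vpd module to discover the PCIe device links dynamically and to provide PCIe slot information to the BIOS. The walk-through has now reached the RiserCard. The code that parses and loads csr files is base code: it does not need to be modified and only needs to be understood.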
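Each step of the walk-through so far reads Bom and Id from a busctl introspect dump and then opens the sr file named after them. The following is a minimal sketch of that step and is not part of openUBMC: parse_introspect and next_sr_file are hypothetical helper names, and the Bom_Id.sr naming rule is inferred from the file names used in this guide.

```python
# Hypothetical helpers for reading `busctl introspect` dumps; not openUBMC code.
FLAG_WORDS = {"emits-change", "emits-invalidation", "const", "writable", "-"}
INT_SIGNATURES = set("ynqiuxt")  # D-Bus integer type codes


def parse_introspect(text: str) -> dict:
    """Collect the property rows of a `busctl introspect` dump into a dict."""
    props = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4 or not parts[0].startswith(".") or parts[1] != "property":
            continue  # skip header, interface and method rows
        name, signature, rest = parts[0][1:], parts[2], parts[3:]
        while rest and rest[-1] in FLAG_WORDS:
            rest.pop()  # drop the trailing FLAGS column
        value = " ".join(rest)
        if signature == "s":
            value = value.strip('"')
        elif signature in INT_SIGNATURES and value.lstrip("-").isdigit():
            value = int(value)
        props[name] = value  # arrays and other types stay as raw text
    return props


def next_sr_file(props: dict) -> str:
    """Assumed naming rule for a connector with an empty AuxId: Bom_Id.sr."""
    return f"{props['Bom']}_{props['Id']}.sr"


# Rows copied from the Connector_EXU_1_01 dump in this guide.
SAMPLE = '''\
.AuxId property s "" emits-change writable
.Bom property s "14100513" emits-change
.Id property s "00000001010302044492" emits-change writable
.IdentifyMode property y 3 emits-change
'''

props = parse_introspect(SAMPLE)
print(next_sr_file(props))  # 14100513_00000001010302044492.sr
```

Array-typed properties such as Buses are kept as raw text here, because busctl truncates long arrays in its table output.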
14100513_00000001040302044498.sr describes a single RiserCard object. Each RiserCard can host several PCIe cards, so the file contains Pca9545 objects: the bus passed in is Hisport_18, and the Pca9545 splits it into several I2C buses:
"I2cMux_Pca9545_PCA9545_1": {
    "Connectors": [
        "Connector_PCIE_SLOT1"
    ]
},
"I2cMux_Pca9545_PCA9545_2": {
    "Connectors": [
        "Connector_PCIE_SLOT2"
    ]
},
"I2cMux_Pca9545_PCA9545_3": {
    "Connectors": [
        "Connector_PCIE_SLOT3"
    ]
}
The Connector_PCIE_SLOT3 object uses "IdentifyMode": 2, so it is no longer a Tianchi component; what sits below it is a PCIe card. Which csr is loaded for the card depends on the Id and AuxId properties, and for a RAID card these two values are obtained from the BIOS:
"Connector_PCIE_SLOT3": {
    "Bom": "14140130",
    "Slot": 3,
    "Position": 3,
    "Presence": 0,
    "Buses": [
        "I2cMux_Pca9545_PCA9545_3"
    ],
    "SystemId": 1,
    "SilkText": "RiserCard${Slot}",
    "IdentifyMode": 2,
    "Container": "Component_RiserCard",
    "Type": "PCIe"
},
2. Loading the PCIe device: obtaining and setting Id and AuxId
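Once the business topology is built, the BIOS obtains the PCIe slot and CPU resource configuration from the BMC and reports the deviceBDF of each PCIe device back to the BMC.
The BMC uses the deviceBDF to query the PMU for the device's PCIe four-tuple (vendor ID, device ID, subsystem vendor ID, subsystem device ID); the four-tuple is enough to load the matching CSR.
In the code, parse_pcie_card_bdf_data parses the BDF information and then calls task_load_unload_device, which loads the PCIe device, sets the Id and AuxId properties, and sets Presence to 1.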
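This guide does not show the byte layout of the BDF report that parse_pcie_card_bdf_data consumes. As a reminder of what a BDF carries, the sketch below decodes the conventional 16-bit PCI form (8-bit bus, 5-bit device, 3-bit function). The names Bdf, decode_bdf and format_bdf are hypothetical, and the sample value is made up for illustration.

```python
from typing import NamedTuple


class Bdf(NamedTuple):
    """Bus/device/function triple that identifies a PCIe function."""
    bus: int
    device: int
    function: int


def decode_bdf(value: int) -> Bdf:
    """Split a conventional 16-bit BDF value into its three fields."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"BDF out of range: {value:#x}")
    return Bdf(bus=(value >> 8) & 0xFF,
               device=(value >> 3) & 0x1F,
               function=value & 0x07)


def format_bdf(bdf: Bdf) -> str:
    """Render the triple in the usual bus:device.function notation."""
    return f"{bdf.bus:02x}:{bdf.device:02x}.{bdf.function:x}"


# Made-up sample value, not taken from a real BIOS report.
sample = decode_bdf(0x0308)
print(format_bdf(sample))  # 03:01.0
```

Whether the BIOS message uses exactly this packing is an assumption; the actual format is defined by parse_pcie_card_bdf_data.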
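load_unload_device prints the following when it loads a device:
device_loader.lua(214): [BizTopoLoader] Load PCIeCard, Slot=3, path=/bmc/kepler/Connector/Connector_PCIE_SLOT3_01010103, Id-AuxId=100010e2-10004010
With this scheme, in which standard PCIe devices are managed from the BDF reported by the BIOS, the Connector properties now include Id, AuxId, and Presence: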
~ ~ $ busctl --user introspect bmc.kepler.hwdiscovery /bmc/kepler/Connector/Connector_PCIE_SLOT3_01010103 | cat
NAME TYPE SIGNATURE RESULT/VALUE FLAGS
bmc.kepler.Connector interface - - -
.Reload method a{ss}sssy - -
.AuxId property s "10004010" emits-change writable
.Bom property s "14140130" emits-change
.Buses property as 1 "I2cMux_Pca9545_PCA9545_3_01010103" emits-change
.ChassisId property s "" emits-change
.GroupId property u 57 emits-change
.GroupPosition property s "0101010303" emits-change
.Id property s "100010e2" emits-change writable
.IdentifyMode property y 2 emits-change
.LoadStatus property y 0 emits-change
.ManagerId property s "1" emits-change
.Presence property y 1 emits-change writable
.SilkText property s "RiserCard1" emits-change
.Slot property y 3 emits-change writable
.SystemId property y 1 emits-change
.Type property s "PCIe" emits-change
If this step completes, the csr matching the RAID card has been identified and loaded successfully. app.log contains the following line:
[BizTopoLoader] Load PCIeCard, Slot=3, path=/bmc/kepler/Connector/Connector_PCIE_SLOT3_01010103, Id-AuxId=100010e2-10004010
framework.log contains the following lines:
hwdiscovery NOTICE: hwcomponent.lua(309): [self-discovery] name: Connector_PCIE_SLOT3_01010103, position: 0101010303, current: 1, previous: 0,uptime: 125 s
hwdiscovery NOTICE: init.lua(162): position: 0101010303, get csr data from /opt/bmc/sr/14140130_100010e2_10004010.sr, format version: 3.00, data version: 3.00
hwdiscovery NOTICE: hwcomponent.lua(205): position: 0101010303, load sr data successfully, uptime: 125 s, cost: 20ms
hwdiscovery NOTICE: hwcomponent.lua(226): position: 0101010303, start to process sr data, source: /opt/bmc/sr/14140130_100010e2_10004010.sr, format version: 3.00, data version: 3.00, uptime: 125 s
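hwdiscovery NOTICE: hwcomponent.lua(309): [self-discovery] name: Connector_PCIE_SLOT1_01010103, position: 0101010301, current: 1, previous: 0,uptime: 125 s
The storage module initializes the RAID card and implements its functions
After pcie_device identifies the csr for the RAID card, the storage module is invoked to obtain the RAID card information.
Source code: storage
1. Initializing the RAID card
When power-on is detected, the corresponding functions are triggered and print the following: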
storage NOTICE: bus_monitor_service.lua(84): [monitor-power] set power state from ONING to ON
storage NOTICE: controller_object.lua(150): controller init obj.Id = 255, object_id = 1
storage NOTICE: controller_object.lua(698): Controller_0, RefChip.Path:/bmc/kepler/Chip/Complex/Chip_RaidChip_0101010303
storage NOTICE: controller_object.lua(334): ctrl0 add_controller_to_link_topo successfully
storage NOTICE: controller_object.lua(336): ctrl0 add_controller_to_sml successfully
If the configuration is wrong, the following appears instead:
storage NOTICE: init.lua(534): sml: set i2c chip, ctrl_idx=0, chip=/bmc/kepler/Chip/Complex/Chip_RaidChip_0101010303
storage NOTICE: controller_object.lua(698): Controller_0, RefChip.Path:/bmc/kepler/Chip/Complex/Chip_RaidChip_0101010303
storage NOTICE: controller_object.lua(334): ctrl0 add_controller_to_link_topo successfully
storage ERROR: tasks.lua(83): task [Controller.register_controller.0] error: ...bmc/apps/storage/lualib/controller/controller_object.lua:723: [Storage] Failed to add controller 0, ret: 4357
The corresponding errors also appear in framework.log; the key line is the failure to load /usr/lib64/libsml_lsi.so:
framework NOTICE: l_sml_adapter.cpp(240): register_sml_adapter_function: controller type id is 14
framework ERROR: l_sml_adapter.cpp(159): register_sml_adapter_fun_table: g_sml_adapter is already generate
framework ERROR: l_sml_adapter.cpp(181): register_pd_log_parse_table: g_pd_log_adapter is already generate
framework ERROR: l_sml_adapter.cpp(71): register_pd_log_parse: g_pd_log_parse is already generate
framework ERROR: adapter.c(603): Failed to load lsi sml library /usr/lib64/libsml_lsi.so for MegaRAID SAS Controller. error : /usr/lib64/libsml_lsi.so: cannot open shared object file: No such file or directory
framework ERROR: adapter.c(918): smlib : Add controller management [Ctrl index 0, Ctrl ID 0] failed, return 0x11d8
The fix is to add an option under libmgmt_protocol in manifest.yaml:
- conan: libmgmt_protocol
  options:
    storelib_enable: true
After adding the option, rebuild and verify. The flow then completes correctly, and the RAID card information can be queried over D-Bus:
~ ~ $ busctl --user introspect bmc.kepler.storage /bmc/kepler/Systems/1/Storage/Controllers/Controller_1_0101010303 | cat
NAME TYPE SIGNATURE RESULT/VALUE FLAGS
bmc.kepler.Inventory.Hardware interface - - -
.AssetName property s "MegaRAID 9560-8i 4GB" emits-change
.AssetTag property s "N/A" emits-change
.AssetType property s "PCIe RAID Card" emits-change
.FirmwareVersion property s "5.290.02-3997" emits-change
.ManufactureDate property s "N/A" emits-change
.Manufacturer property s "Broadcom" emits-change
.Model property s "SAS3908" emits-change
.PCBVersion property s "N/A" emits-change
.PartNumber property s "06030622" emits-change
.SerialNumber property s "SPD3511231" emits-change
.Slot property s "0" emits-change
.UUID property s "N/A" emits-change
2. Obtaining RAID card information
After initialization, the RAID card information starts to be updated. c_controller:start() makes the following calls:
self:start_update_task()
self:start_update_pd_list_task()
self:start_update_ld_list_task()
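self:start_update_phy_err_task()
These tasks periodically fetch RAID card information such as the controller, logical drives, physical drives, and physical-drive error information. For example, start_update_task calls get_ctrl_info, which goes through the Lua-to-C binding and ends up in a C function in the library for that card.
Once the information is obtained, the on_update:on function triggers the data refresh: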
self:update_static_controller_info(info)
self:update_controller_info(info)
self:update_asset_data_info()
These update functions write the data into the object's own properties, from where it can be read over D-Bus.
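The fetch-then-refresh cycle described here can be pictured with the following sketch. It is Python rather than the Lua of the storage module, and PeriodicUpdater, run_once and the sample getter are hypothetical. Only the overall shape follows the text: a periodic getter, an on_update hook, and update functions that copy fields into object properties.

```python
import threading
from typing import Callable, Dict, List

Info = Dict[str, str]


class PeriodicUpdater:
    """Polls a getter and hands each result to the registered listeners."""

    def __init__(self, getter: Callable[[], Info], interval_s: float = 1.0):
        self._getter = getter
        self._interval_s = interval_s
        self._listeners: List[Callable[[Info], None]] = []
        self._stop = threading.Event()

    def on_update(self, listener: Callable[[Info], None]) -> None:
        """Register a callback that receives every fetched info dict."""
        self._listeners.append(listener)

    def run_once(self) -> None:
        """Fetch one snapshot and notify all listeners."""
        info = self._getter()
        for listener in self._listeners:
            listener(info)

    def run(self) -> None:
        """Repeat run_once until stop() is called (meant for a worker thread)."""
        while not self._stop.is_set():
            self.run_once()
            self._stop.wait(self._interval_s)

    def stop(self) -> None:
        self._stop.set()


# Stand-in for the call that reaches the card library; the value is taken
# from the D-Bus dump earlier in this guide.
def fake_get_ctrl_info() -> Info:
    return {"FirmwareVersion": "5.290.02-3997"}


controller_properties: Info = {}
updater = PeriodicUpdater(fake_get_ctrl_info)
updater.on_update(controller_properties.update)  # copy fields into the object
updater.run_once()
print(controller_properties)
```

Separating run_once from the loop keeps a single refresh easy to exercise without starting a thread.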
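3. Setting RAID card information
The entry point for RAID card setting commands is the rpc_service_controller file; the interfaces defined there are the ones to study. For example, ctrl_task_operate is an entry point that runs as an asynchronous task to avoid blocking, whereas the ClearForeignConfig interface is executed directly rather than asynchronously.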
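The difference between the two entry styles can be sketched as follows. This is an illustration in Python, not the rpc_service_controller code: ControllerCommands, run_as_task, run_directly and the two sample operations are hypothetical names chosen for the example.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ControllerCommands:
    """Two ways to dispatch a setting command."""

    def __init__(self) -> None:
        # A single worker runs queued tasks one at a time, in submission order.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def run_as_task(self, operation, *args) -> Future:
        """Queue the operation and return at once; the caller is not blocked."""
        return self._pool.submit(operation, *args)

    def run_directly(self, operation, *args):
        """Run the operation in the caller's context and return its result."""
        return operation(*args)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def slow_operation(ctrl_index: int) -> str:
    """Placeholder for a long-running setting command."""
    return f"controller {ctrl_index}: task finished"


def quick_operation(ctrl_index: int) -> str:
    """Placeholder for a short command that can run inline."""
    return f"controller {ctrl_index}: done"


commands = ControllerCommands()
pending = commands.run_as_task(slow_operation, 0)          # returns a Future
direct_result = commands.run_directly(quick_operation, 0)  # returns the value
task_result = pending.result()                             # wait for the task
commands.close()
print(direct_result)
print(task_result)
```

The asynchronous path suits commands that may take long on the card, while the direct path keeps short commands simple at the cost of blocking the caller.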
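Points to watch when configuring a RAID card
How to find the matching csr
First check whether vpd already supports the RAID card. Get the card's four-tuple, for example vendor ID 0x1000, device ID 0x10e2, subsystem vendor ID 0x1000, subsystem device ID 0x4010; the corresponding csr file is then named 14140130_100010e2_10004010.sr.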
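The lookup can be sketched as below. The helper names csr_file_name and find_csr are hypothetical. The naming rule (Bom, then vendor and device ID, then subsystem vendor and device ID, all lowercase hex) is read off the example file name, and the directory /opt/bmc/sr is the one that appears in the framework.log excerpt; both should be treated as assumptions for other products.

```python
from pathlib import Path
from typing import Optional


def csr_file_name(bom: str, vendor_id: int, device_id: int,
                  sub_vendor_id: int, sub_device_id: int) -> str:
    """Build the csr file name from the Connector Bom and the PCIe four-tuple."""
    connector_id = f"{vendor_id:04x}{device_id:04x}"      # becomes Id
    aux_id = f"{sub_vendor_id:04x}{sub_device_id:04x}"    # becomes AuxId
    return f"{bom}_{connector_id}_{aux_id}.sr"


def find_csr(sr_dir: Path, bom: str, vendor_id: int, device_id: int,
             sub_vendor_id: int, sub_device_id: int) -> Optional[Path]:
    """Return the csr path if the card is already supported, else None."""
    candidate = sr_dir / csr_file_name(bom, vendor_id, device_id,
                                       sub_vendor_id, sub_device_id)
    return candidate if candidate.is_file() else None


name = csr_file_name("14140130", 0x1000, 0x10e2, 0x1000, 0x4010)
print(name)  # 14140130_100010e2_10004010.sr

# On a BMC this would typically be called with Path("/opt/bmc/sr").
found = find_csr(Path("/opt/bmc/sr"), "14140130", 0x1000, 0x10e2, 0x1000, 0x4010)
print("supported" if found else "no matching csr in this directory")
```

A None result means a new csr has to be added for the card before pcie_device can load it.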
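The low-level communication currently supports only I2C and MCTP over PCIe
Any other mode requires additional development.
Broadcom RAID cards use I2C at address 0x02; i2cdetect shows a response at 0x1, which is consistent with 0x02 being the 8-bit form of the 7-bit address 0x01. The latest 9600 HBA cards use MCTP and have not been adapted yet.
PMC cards use MCTP over PCIe. The MCTP link must be established before any data can be updated; check it with busctl --user tree bmc.kepler.mctpd | cat.
Huawei cards use MCTP over PCIe but also expose some I2C addresses, which can be used to query the Lm75, EEPROM, and similar information separately.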
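The two I2C address values quoted for Broadcom cards differ only in notation. A minimal sketch of the conversion, with hypothetical helper names to_7bit and to_8bit:

```python
def to_7bit(addr8: int) -> int:
    """Drop the R/W bit of an 8-bit I2C address to get the 7-bit address."""
    if not 0 <= addr8 <= 0xFF:
        raise ValueError(f"not an 8-bit address: {addr8:#x}")
    return addr8 >> 1


def to_8bit(addr7: int, read: bool = False) -> int:
    """Append the R/W bit (0 = write, 1 = read) to a 7-bit I2C address."""
    if not 0 <= addr7 <= 0x7F:
        raise ValueError(f"not a 7-bit address: {addr7:#x}")
    return (addr7 << 1) | int(read)


print(hex(to_7bit(0x02)))  # 0x1, the position reported by i2cdetect
print(hex(to_8bit(0x01)))  # 0x2, the address as written in the configuration
```

i2cdetect lists 7-bit addresses, so a device configured with the 8-bit value 0x02 is expected at 0x01 in its output.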