Home Data Center Project: Persistent kernel panic and upgrading R710 BIOS

The following is part of a series of posts I wrote called "Building a data center at home".

Living in the SF Bay Area, you can acquire second-hand data center equipment relatively cheaply. This series of posts details my deep dive into building a cluster with data center equipment at home (often called a homelab), which consists of 48 CPU cores, 576GB RAM, and 33.6TB storage across 60 x 6Gb/s HDDs, with a combined weight of just over 380lb/170kg, within a budget of $3000.

I’ve been working on my home data center project for a number of weeks now and came across an interesting issue in the last week.

I accidentally pressed F10 during a boot of one of the R710s, which put the server in a bit of a spin. After exiting that boot cycle, the BIOS kept showing a message during the POST sequence saying that system services needed updating. Attempting to update the system services repeatedly gave an error along the lines of requiring some form of hardware licensing. Confused and somewhat worried that I’d bricked the R710, I started to search for answers to this issue.

I read that changing the boot mode back to BIOS instead of UEFI can help in this situation, but in my case it didn’t initially. The boot sequence couldn’t find the required isolinux.bin file. To solve this issue on the R710, I needed to switch the boot mode for the USB drive from Auto to Hard-drive so that the USB is not seen as removable storage.

Selecting a USB 'harddisk' on boot up

After making this change, the boot sequence worked, but I started getting issues when booting any OS, typically a kernel panic.

Kernel panic in CentOS during boot

No matter what I try, I cannot seem to get anything to boot, not even the Dell Support LiveCD image.

I check the Dell support site for the R710 and see that it has a bunch of 11th-generation downloads, but realize that the current version of the Support LiveCD image (version 3) no longer supports 11th-generation servers, which is the generation the R710 belongs to. After a bit of further digging, I realize I can get the older 2.2 version from here, which does support 11th generation.

After I flash the older version, I try booting again and still get a kernel panic when starting the Support LiveCD image. Weird! I start to think that maybe the issue could be hardware based. As booting the LiveCD image doesn’t work, I try running a memory test instead to see if that gives any hints as to the issue.

After a bit over an hour of running memtest86, I see an error come up with an address range. One issue with memtest86 is that it doesn’t tell you which stick of RAM was problematic, but luckily, this error also appears to be picked up by the motherboard and the error shows up on the LCD screen on the front of the server telling me which stick has bitten the dust.

I replace the stick in question with a known good one and then reboot. I try the LiveCD and finally, this time it boots up! Great.

Seems that I’ve solved the initial problem of the Kernel Panic, but I might as well update the BIOS while I’ve got everything ready and waiting to go.

From within the OS, I launch OMSA. The credentials are root:dell, as listed in the manual, which can be seen here.

OMSA gives a bunch of good information regarding the server, but doesn’t allow actions. To update the BIOS, I need to download the update binary directly while in the LiveCD OS and then run it.

The binary file can be found here

After downloading, it’s a case of chmod-ing and running the binary. The update takes a little while and, just for reference, I recorded it in case anything went sideways (which it luckily didn’t).
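
For anyone who wants the chmod-and-run step spelled out, here is a minimal sketch in Python. It is not what I actually typed (that was roughly `chmod +x` on the file followed by running it from a root shell). The function names are made up for illustration, the path you pass in is whatever you saved the download as, and it assumes a Python 3 interpreter is available on the machine, which I have not checked for the LiveCD.

```python
import stat
import subprocess
from pathlib import Path


def make_executable(path: Path) -> None:
    """Add the owner-execute bit, like `chmod u+x`, leaving other bits alone."""
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR)


def run_update(path: Path) -> int:
    """Mark the downloaded file executable, run it, and return its exit code.

    `path` is a placeholder for wherever the update binary was saved;
    no vendor-specific flags are passed because none are assumed here.
    """
    make_executable(path)
    # Use the absolute path so the file is found without relying on PATH.
    result = subprocess.run([str(path.resolve())])
    return result.returncode
```

Calling `run_update(Path("downloaded_update.bin"))` (a hypothetical file name) returns the binary's exit code, so a non-zero value is a hint that something went wrong before you reboot.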