The following is part of a series of posts called "Building a data center at home".
Living in the SF Bay Area, you can acquire second-hand data center equipment relatively cheaply. This series of posts details my deep dive into building a cluster with data center equipment at home (often called a homelab), which consists of 48 CPU cores, 576GB RAM, and 33.6TB of storage across 60 x 6Gb/s HDDs, with a combined weight of just over 380lb/170kg, within a budget of $3000.
I’ve been working on my home data center project for a number of weeks now and came across an interesting issue in the last week.
I accidentally pressed F10 during a boot of one of the R710s, which put the server in a bit of a spin. After exiting that boot cycle, the BIOS kept showing a message during the POST sequence saying that system services needed updating. Attempting to update the system services repeatedly gave an error along the lines of requiring some form of hardware licensing. Confused and somewhat worried that I’d bricked the R710, I started to search for answers to this issue.
I read that changing the boot mode back to Boot instead of UEFI can help in this situation, but in my case it didn’t initially. The boot sequence couldn’t find the required isolinux.bin file.
To solve this issue on the R710, I needed to switch the boot mode for the USB drive from Auto to Hard-drive so that the USB drive is not seen as removable storage.
After making this change, the boot sequence worked, but I started getting issues when booting any OS, typically a kernel panic.
No matter what I try, I cannot seem to get anything to boot, not even the Dell Support LiveCD image.
I check the Dell support site for the R710 and see that it has a bunch of 11th generation downloads, but realize that the current version of the Support LiveCD image (version 3) no longer supports 11th generation servers, which is the generation the 710 series belongs to. After a bit of further digging, I realize I can get the older 2.2 version from here, which does support 11th generation.
After I flash the older version, I try booting again and still get a kernel panic when starting the Support LiveCD image. Weird! I start to think that maybe the issue is hardware-based. As booting the LiveCD image doesn’t work, I try running a memory test instead to see if that gives any hints as to the issue.
After a bit over an hour of running memtest86, I see an error come up with an address range. One issue with memtest86 is that it doesn’t tell you which stick of RAM is problematic, but luckily this error also appears to be picked up by the motherboard, and it shows up on the LCD screen on the front of the server, telling me which stick has bitten the dust.
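To illustrate why a memory test reports an address rather than a stick, here is a minimal Python sketch of the write-then-verify idea behind pattern tests. It is only an illustration under stated assumptions: memtest86 itself runs on bare metal against physical memory, whereas this walks an ordinary buffer, and the names pattern_test and StuckBitBuffer (a simulated faulty cell) are hypothetical.

```python
def pattern_test(buf, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern across the buffer, read it back, and collect mismatches.

    Returns a list of (offset, expected, actual) tuples. Note that all the
    test can see is an offset into the buffer, never a physical DIMM slot.
    """
    errors = []
    for pattern in patterns:
        # Write pass: fill the whole buffer with the pattern.
        for i in range(len(buf)):
            buf[i] = pattern
        # Verify pass: every byte should still hold the pattern.
        for i in range(len(buf)):
            actual = buf[i]
            if actual != pattern:
                errors.append((i, pattern, actual))
    return errors


class StuckBitBuffer(bytearray):
    """Hypothetical faulty memory: bit 0 at one offset always reads back as 1."""

    bad_offset = 3

    def __getitem__(self, index):
        value = super().__getitem__(index)
        if index == self.bad_offset:
            value |= 0x01
        return value


healthy_errors = pattern_test(bytearray(64))
faulty_errors = pattern_test(StuckBitBuffer(16))
```

A healthy buffer produces no errors, while the simulated stuck bit only shows up for patterns that expect a 0 in that position. Mapping the failing offset back to a physical slot needs knowledge of the memory layout, which is why the server's own LCD message was the more useful hint here.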
I replace the stick in question with a known good one and then reboot. I try the LiveCD again and, finally, this time it boots up! Great.
It seems I’ve solved the initial kernel panic problem, but I might as well update the BIOS while I’ve got everything ready and waiting to go.
From within the OS, I launch OMSA. The credentials are root:dell, as listed in the manual, which can be seen here.
OMSA gives a bunch of good information regarding the server, but doesn’t allow actions. To update the BIOS, I need to download the update binary directly while in the LiveCD OS and then run it.
The binary file can be found here.
After downloading, it’s a case of chmod-ing and running the binary. The update takes a little while and, just for reference, I recorded it in case anything went sideways (which it luckily didn’t).
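The chmod-and-run step could be scripted roughly as follows. This is a minimal Python sketch rather than the exact commands used: the helper names make_executable and run_update are hypothetical, as is the filename in the usage comment, so substitute whatever the downloaded update binary is actually called.

```python
import os
import stat
import subprocess


def make_executable(path):
    """Add the execute bits to a file, roughly what `chmod +x path` does."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def run_update(path):
    """Mark the downloaded file executable, run it, and return its exit code."""
    make_executable(path)
    # Use an absolute path so the binary is found regardless of PATH.
    result = subprocess.run([os.path.abspath(path)], check=False)
    return result.returncode


# Usage (hypothetical filename, not the real name of the Dell update package):
#     exit_code = run_update("./bios_update.BIN")
```

Returning the exit code instead of raising keeps the decision about what to do on failure with the caller, which seems sensible for something as delicate as a BIOS update.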