Ubuntu 8.04.1 on Dell PowerEdge R300
This tech doc should apply to a variety of servers, as it focuses on the SAS 6/iR RAID controller, which is widely used even in bigger systems and can be managed with the tools described here. In Linux the controller is known as an LSI controller.
my system here:
- Dell PowerEdge R300.
- 12GB RAM
- SAS 6/iR with two 750GB SATA hotplug
- XEON quadcore @2.83GHz
- ubuntu 8.04.1 64bit server
raid
configuring the raid and installing ubuntu
I booted from the utility CD that comes with the server. There I found a tool to set up the raid, and I created a virtual disk out of the two SATA disks as RAID1 (mirroring). Then I rebooted and started installing ubuntu. The virtual harddisk was detected as scsi-disk /dev/sda without any problems. Installing was as easy as riding a bicycle. (which means that car drivers might face problems ;) )
installing tools to control the raid inside ubuntu
rkeller has put an amazing and incredibly helpful howto online at http://www.goingwip.de/.
With this howto I was able to learn about the available tools and install them within one hour !! Thanks a lot. I followed his howto and only modified 1 or 2 minor things. All credit goes to rkeller and none to me !!!!
Note that these are essential tools for your server - if you ever run into trouble with your raid you will need them !!
I installed:
- mpt-status, which comes as a package:
apt-get install mpt-status
I also removed exim4, which gets installed along with mpt-status, because I plan to use sendmail later. I guess exim4 is pulled in because the mpt-status daemon (mpt-statusd) needs to send mail ...
- modprobe mptctl
- add mptctl to /etc/modules
- run mpt-status to check if things work
- optional : you can add the mptctl-module to your initramdisk to have the module available when OpenManage-Server (see below) is started. See /etc/initramfs-tools/modules and update-initramfs for details.
- lsiutil, which has to be downloaded and installed manually:
wget ftp://ftp.lsil.com/HostAdapterDrivers/linux/lsiutil/lsiutil.tar.gz
tar xvfz lsiutil.tar.gz
cd lsiutil
rm lsiutil
make
cp lsiutil/lsiutil /usr/local/sbin/
- run lsiutil to check if things work
- DELL-OMSA
- add the following line to /etc/apt/sources.list:
deb ftp://ftp.sara.nl/pub/sara-omsa dell sara
- then run:
apt-get update
apt-get install openipmi ia32-libs lib32ncurses5 rpm procmail
apt-get install dellomsa
/etc/init.d/dataeng enablesnmp
/etc/init.d/snmpd restart
/etc/init.d/dataeng restart
/etc/init.d/dsm_om_connsvc start
- now check if you can log on at https://yourhost:1311 as root. If you cannot, which seems to be the case on all 64bit systems, follow these steps to make the OMSA PAM setup use the 32bit libs:
cd /tmp/
wget http://ftp.de.debian.org/debian/pool/main/p/pam/libpam-modules_0.79-5_i386.deb
dpkg -x libpam-modules_0.79-5_i386.deb ./
cp lib/security/pam_unix.so /lib32/security/
cp lib/security/pam_nologin.so /lib32/security
wget http://ftp.de.debian.org/debian/pool/main/libs/libsepol/libsepol1_1.14-2_i386.deb
dpkg -x libsepol1_1.14-2_i386.deb ./
cp lib/libsepol.so.1 /lib32/
wget http://ftp.de.debian.org/debian/pool/main/libs/libselinux/libselinux1_1.32-3_i386.deb
dpkg -x libselinux1_1.32-3_i386.deb ./
cp lib/libselinux.so.1 /lib32/
- replace every /lib/ in /etc/pam.d/omauth with /lib32/ and run ldconfig
- now you should be able to login at port 1311 as root
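The /lib/ to /lib32/ replacement in /etc/pam.d/omauth can also be done with sed instead of by hand. A minimal sketch, assuming GNU sed and that really every /lib/ in the file should change (as described in the step above). The function name to_lib32 is made up for this example, and a .bak copy of the file is kept in case something goes wrong:

```shell
# to_lib32 FILE: replace every /lib/ with /lib32/ in FILE, keeping FILE.bak.
# (hypothetical helper name; needs GNU sed for the -i option)
to_lib32() {
    sed -i.bak 's#/lib/#/lib32/#g' "$1"
}

# On the server, as root:
# to_lib32 /etc/pam.d/omauth && ldconfig
```

Running it twice is harmless, because /lib32/ no longer contains the string /lib/.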
Note that mptctl has to be loaded as a module before any of the Dell OMSA services are started if you want to access your raid status via the OMSA interface. You can do this manually after startup and then restart the services as described above, or you can add the module to your initramfs (as described under the mpt-status setup above). Putting it in /etc/modules alone will not be sufficient.
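Adding the module to the initramfs comes down to listing it in /etc/initramfs-tools/modules and rebuilding the image. A minimal sketch, assuming the stock initramfs-tools layout; the helper name ensure_module_line is made up for this example, and the real calls are commented out so nothing happens until you enable them:

```shell
# ensure_module_line FILE MODULE: append MODULE to FILE unless a line with
# exactly that name is already there (hypothetical helper name).
ensure_module_line() {
    file=$1
    module=$2
    # -q: quiet, -x: match the whole line, -F: treat the name literally
    if ! grep -qxF "$module" "$file" 2>/dev/null; then
        printf '%s\n' "$module" >> "$file"
    fi
}

# On the server, as root:
# ensure_module_line /etc/modules mptctl
# ensure_module_line /etc/initramfs-tools/modules mptctl
# update-initramfs -u
```

The check before appending keeps the files clean if you run this more than once.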
using these tools
Dell OpenManage Server Administrator
This is the web-based OpenManage Server Administrator that also exists on Windows, and if you have to call Dell support this is the tool they will be very happy about. You can query loads of status information about the system itself, like temperature, and you can query information about your raid controller. Unfortunately it will *not* let you perform any complex operations on the raid controller, like configuring a disk as a spare. You will have to use lsiutil for this (see below). At least OpenManage can show you the status of your raid and physical disks and the percentage of a resync.
For accessing Dell OpenManageServer go to https://YOURSERVER.COM:1311 and log in as root.
If this page is not available, make sure that mptctl is loaded as a module, then restart the related snmp daemons and the service itself:
/etc/init.d/dataeng enablesnmp
/etc/init.d/snmpd restart
/etc/init.d/dataeng restart
/etc/init.d/dsm_om_connsvc start
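The "check the module, then restart" routine can be wrapped in two small functions. This is only a sketch: the names module_loaded and restart_omsa are made up here, and it assumes the init scripts live at the paths used above. It looks up the module in /proc/modules, where the first field of each line is the module name:

```shell
# module_loaded NAME [FILE]: succeed if NAME is listed in FILE.
# FILE defaults to /proc/modules (first field = module name).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# restart_omsa: load mptctl if it is missing, then restart the services
# in the same order as the commands above. Run as root.
restart_omsa() {
    module_loaded mptctl || modprobe mptctl || return 1
    /etc/init.d/dataeng enablesnmp
    /etc/init.d/snmpd restart
    /etc/init.d/dataeng restart
    /etc/init.d/dsm_om_connsvc start
}
```

Matching on the first field only matters here, because mptctl can also show up in the dependency column of other modules.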
mpt-status
mpt-status gives you a brief summary of your raid controller status. Nothing less and nothing more. If your controller is not on scsi-id=0 you need to specify the id number using the -i flag, and you can use the -p flag to find out what id your controller is on.
- mpt-status -p
- mpt-status -ni 1 (or whatever the above command recommends) will show you what you want to see
There is a daemon, mpt-statusd, that will probe the controller for changes and email problems to you. Be sure to set up the daemon in /etc/default/mpt-statusd. The id in particular is of utmost importance. After you have changed these settings you can (re)start it with /etc/init.d/mpt-statusd restart
Note that mpt-status with the -n flag is the only tool that will also list old, missing physical disks. Neither the OpenManage interface nor lsiutil reveals this information !!
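If you want your own check on top of the daemon (for cron or a monitoring system), the mpt-status output can be tested with a few lines of shell. This is a rough sketch under one big assumption: that volume lines contain the word vol_id and read "state OPTIMAL" when the raid is healthy. Check this against the output on your own box before relying on it. The function name raid_is_optimal is made up:

```shell
# raid_is_optimal: read mpt-status output on stdin. Succeeds only if at
# least one volume line was seen and every volume line says "state OPTIMAL".
# ASSUMPTION: volume lines contain "vol_id"; verify on your system.
raid_is_optimal() {
    awk '
        /vol_id/ { seen = 1; if ($0 !~ /state OPTIMAL/) bad = 1 }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Example use (replace 1 with the id that mpt-status -p reports):
# mpt-status -i 1 | raid_is_optimal || echo "raid needs attention"
```

Empty output counts as a failure too, so a missing mptctl module or a wrong id does not pass as healthy.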
lsiutil
lsiutil is the tool you want - it can do much more than the web-based Dell OpenManage Server Administrator !! It's a text-based, menu-style tool that can perform the full range of raid operations, which means that it's mighty and dangerous. It can delete a raid, create a raid and, most important, create hot spares, which means: add replacement disks to your raid.
the menu is simple:
- choose the controller (most systems will have only one anyway)
- now you are in the mainmenu. for me the following are the most interesting:
- 8 ... scan for devices
- 21 .. RAID actions
- if you enter menu 21 = RAID actions you can choose (among others)
- 1,2,3 to read the status of your stuff
- 50 to add a new unused disk to the hot-spare-pool. Choose 0 as pool-number and watch in menu 1,3 how a degraded raid will use this new disk immediately
- pressing 0 will bring you up on menu-level (or exit on the top-level) and pressing return will show all available options
upgrade a degraded raid
Bad things happen: one disk fails and your raid is running degraded. You replace the failed disk and ... nothing happens. You need to tell the raid that it is allowed to use the new disk as a replacement for the failed disk !! This means that you need to create a hot spare, which is menu 21->50 in lsiutil. Only disks that are not already in use are listed, but to be sure, check the B T L values (bus, target, LUN) to confirm it's the right disk (I use OpenManage to get the proper number here). Then you choose the disk (usually there will be only one new disk) and the raid controller will use it immediately to replace the failed disk and start rebuilding the raid. This will take hours on large disks, so don't panic if the status stays at 0% in OpenManage for 15 minutes or longer. That's fine.
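Since the rebuild takes hours, you may not want to sit in front of the web interface the whole time. A minimal polling sketch: it runs a status command again and again until the output contains "state OPTIMAL". That string is an assumption about what mpt-status prints for a healthy volume, and the function name wait_until_optimal is made up:

```shell
# wait_until_optimal SECONDS COMMAND [ARGS...]
# Run COMMAND every SECONDS seconds until its output contains "state OPTIMAL".
# ASSUMPTION: a healthy volume is reported as "state OPTIMAL".
wait_until_optimal() {
    interval=$1
    shift
    until "$@" | grep -q 'state OPTIMAL'; do
        sleep "$interval"
    done
}

# Example: poll every 5 minutes, then send yourself a note
# (replace 1 with the id that mpt-status -p reports).
# wait_until_optimal 300 mpt-status -i 1 && echo "rebuild finished"
```

The status command is passed in as arguments, so you can swap in whatever reports the volume state on your system.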
debug strange errors
My server crashed and came up fine after a reboot, but only one physical disk was still available and the raid was degraded. Dell support was good but very methodical ....
- day1 : they send me a new disk to replace the one that was no longer available. It didn't change anything: the new disk was not visible either. I rebooted the server several times, checked every corner of the SAS controller BIOS and even swapped the disks around, which showed that the problem is not the disk but the bay the disk sits in. Any disk in this bay is not recognized by the system at all. The server is started again and runs fine in degraded mode.
- day2 : Dell sends a subcontracted technician with a new backplane. The friendly and competent man changes the backplane only to discover that the disk is still not recognized. Dell support advises us to reset the SAS controller config. We do it but nothing changes, so we agree with Dell support that the controller seems to be the problem. Unfortunately the technician does not have a new controller with him because this was not the problem Dell expected. Dell will send it the next day. So we start the server and run it in degraded mode.
- day3 : the technician brings the new controller and replaces it, and finally the disk is recognized, but it's not possible to mark it as a hot spare in the controller BIOS or to find another way to add it to the raid. Dell support seems to be unavailable, so we agree to meet again the next day, which would have been day4.
I finally manage to mark the disk as “hot-spare” using lsiutil and the raid starts resyncing.
The id of the SAS controller has changed, and the old (removed) disk is still visible with mpt-status -n but not in lsiutil, but the controller is smart enough to put it all together fine.
Conclusion:
Plus: Dell support is fine. They promise next-day service and they keep their promise. Telephone support is friendly and competent, with no long waiting. On-site technicians are not from Dell but subcontracted. They are friendly and competent.
Minus: I had one failed disk and it took 3 workdays, many many work hours in a remote server farm, many hours with Dell support and many reboots and hours of downtime to fix this. Actually I would have preferred one technician coming with a variety of replacement parts on the first day and fixing the server immediately.
Neutral: Dell does not actively support linux, but they offer their OpenManage Server Administrator for linux and they don't blame the problems on linux.



