Storage on Cluster DRBD and GFS2

From PiemonteWireless.

How to create good shared storage for use in a cluster (using DRBD and GFS2)

I tried different ways to create shared storage for my cluster.
My first and most important rule (after performance) was synchronization in both directions, so a simple rsync command was not an option. I tested GlusterFS, but its performance was very low; then I tried Unison, but it was pretty unstable and I had a lot of problems just compiling it.
So I decided to use block-level replication with a cluster file system on top, and I chose DRBD with GFS2, because I had read a lot of good reports about them.

The following text describes my experience with them, using my preferred Linux distribution: Ubuntu.

DRBD part

Install prerequisites (needed packages)

apt-get install ethstats dpatch patchutils cogito git-core sp docbook-utils docbook build-essential flex dpkg-dev fakeroot module-assistant

Compile and install DRBD

This is the Ubuntu (Debian-like) way to compile both the kernel module and the packages:

mkdir -p /opt/source/drbd
cd /opt/source/drbd
git clone git://git.drbd.org/drbd-8.3
cd drbd-8.3/
dpkg-buildpackage -rfakeroot -b -uc
cd ..
dpkg -i drbd8-module-source_8.3.0-0_all.deb drbd8-utils_8.3.0-0_amd64.deb
module-assistant auto-install drbd8
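
After the installation it can be useful to confirm that the loaded kernel module is really the version that was just built. The following is a minimal sketch, assuming the module has been loaded (for example with modprobe drbd) so that /proc/drbd exists; the function name drbd_version is only illustrative and is not part of DRBD.

```sh
# Print the DRBD version found in /proc/drbd-style text read from
# standard input, e.g. "version: 8.3.0 (api:88/proto:86-89)".
drbd_version() {
    sed -n 's/^version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Typical use on a node (not run here):
#   modprobe drbd
#   drbd_version < /proc/drbd
```

If the printed version differs from the one in the package file names, an older module is probably still loaded.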


Configure DRBD

This example shows a simple two-machine configuration file. Some definitions:

Server 1

name: machineA
ip: 192.168.50.10
disk: /dev/sda1

Server 2

name: machineB
ip: 192.168.50.11
disk: /dev/sda3

You need to edit the file /etc/drbd.conf with the following content:

# drbd.conf edited by Simone (2009)

global {
  usage-count no;
}
common {
  protocol C;
  syncer { rate 10M; }
}
resource r0 {
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
  }
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   detach;
  }
  net {
    allow-two-primaries;     # needed for the Primary/Primary state used later with GFS2
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  syncer {
    rate 10M;
  }
  on machineA {
    device     /dev/drbd0;
    disk       /dev/sda1;
    address    192.168.50.10:7788;
    flexible-meta-disk  internal;
  }
  on machineB {
    device    /dev/drbd0;
    disk      /dev/sda3;
    address   192.168.50.11:7788;
    meta-disk internal;
  }
}
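
DRBD matches the name in each "on" section against the output of uname -n on the node, so a typo in a host name makes the resource unusable on that machine. The following is a small sketch for checking this before starting DRBD; the function names are hypothetical helpers and only parse the text of the file.

```sh
# List the host names used in "on <host> {" sections of a drbd.conf
# read from standard input. Lines such as "on-io-error" are ignored
# because "on" must be followed by whitespace.
drbd_conf_hosts() {
    sed -n 's/^[[:space:]]*on[[:space:]][[:space:]]*\([^[:space:]{]*\).*/\1/p'
}

# Succeed if the given host name appears in the configuration file.
drbd_conf_has_host() {
    # $1 = configuration file, $2 = host name (normally "$(uname -n)")
    drbd_conf_hosts < "$1" | grep -qxF -- "$2"
}

# Typical use on a node (not run here):
#   drbd_conf_has_host /etc/drbd.conf "$(uname -n)" || echo "host not in drbd.conf"
```

The command drbdadm dump r0 can additionally be used to let DRBD itself parse the file and report syntax errors.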


Create one partition on each of the two servers, of (roughly) the same size.

On machineA:

mkfs.ext3 /dev/sda1

On machineB:

mkfs.ext3 /dev/sda3
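
Which of the two cases below applies depends on the exact sizes of the two partitions. As a rough sketch, the sizes in bytes can be read on each node with blockdev --getsize64 and then compared; smaller_node is a hypothetical helper name.

```sh
# Given the sizes in bytes of the two backing partitions, print which
# node has the smaller one: "A", "B" or "equal".
smaller_node() {
    # $1 = size on machineA, $2 = size on machineB
    if [ "$1" -lt "$2" ]; then
        echo A
    elif [ "$1" -gt "$2" ]; then
        echo B
    else
        echo equal
    fi
}

# The sizes can be read on each node with, for example (not run here):
#   blockdev --getsize64 /dev/sda1
```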


The partitions have exactly the same size

Run this command on both machines:

drbdadm create-md r0


The partitions differ slightly in size

If the partitions differ slightly in size, run the following command FIRST on the machine with the smaller disk:

drbdadm create-md r0

If it complains about the disk size, with output like this:

The server's response is:

you are the 2nd user to install this version
md_offset 146778664960
al_offset 146778632192
bm_offset 146774151168

Found ext3 filesystem which uses 143338544 kB
current configuration leaves usable 143334132 kB

Device size would be truncated, which
would corrupt data and result in
'access beyond end of device' errors.
You need to either
   * use external meta data (recommended)
   * shrink that filesystem first
   * zero out the device (destroy the filesystem)
Operation refused.

If (and only if) the previous command complained, run this command on both machines:

e2fsck -f /dev/cciss/c0d1p1 && resize2fs /dev/cciss/c0d1p1 143334132K

where /dev/cciss/c0d1p1 is the backing partition (replace it with your own device) and 143334132 is the suggested size in kB that you can find in the output of the previous command.
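
Copying the size by hand is error-prone, so the number can also be pulled out of the saved create-md output. This is only a sketch, assuming the output contains the line "current configuration leaves usable N kB" as shown above; usable_kb and resize_command are illustrative names, and the second function only prints the command so that it can be reviewed before it is run.

```sh
# Extract the usable size in kB from "drbdadm create-md" output read
# from standard input.
usable_kb() {
    sed -n 's/^current configuration leaves usable[^0-9]*\([0-9][0-9]*\).*/\1/p'
}

# Print (without running it) the resize command for a backing device.
resize_command() {
    # $1 = backing device, $2 = size in kB
    printf 'e2fsck -f %s && resize2fs %s %sK\n' "$1" "$1" "$2"
}

# Typical use (not run here), with the output saved in create-md.log:
#   resize_command /dev/cciss/c0d1p1 "$(usable_kb < create-md.log)"
```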


Start DRBD

Run this command on both machines:

/etc/init.d/drbd start

Then run ONLY ON machineA:

drbdsetup /dev/drbd0 primary -o

Wait for the synchronization to finish! You can check the status with:

cat /proc/drbd 

My output at this stage was:

version: 8.3.0 (api:88/proto:86-89)
GIT-hash: fb12a0c50f88409dab4779169698b82909e21eb0 build by root@crono, 2009-01-17 03:41:20
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---
    ns:3911264 nr:0 dw:0 dr:3911264 al:0 bm:238 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:139422868
	[>....................] sync'ed:  2.8% (136155/139974)M
	finish: 2:32:52 speed: 15,040 (4,016) K/sec
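
Since the initial synchronization can take hours, it may be convenient to poll /proc/drbd instead of watching it by hand. The following sketch assumes the 8.3-style progress line shown above (with the text "sync'ed:"); sync_percent and sync_running are hypothetical helper names.

```sh
# Print the resync progress in percent from /proc/drbd-style text on
# standard input; prints nothing when no resync is in progress.
sync_percent() {
    sed -n "s/.*sync'ed:[[:space:]]*\([0-9.]*\)%.*/\1/p" | head -n 1
}

# Succeed while a resync progress line is present in the input.
sync_running() {
    [ -n "$(sync_percent)" ]
}

# Polling loop for a node (not run here):
#   while sync_running < /proc/drbd; do sleep 60; done
```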

When the synchronization has finished, run on machineB:

drbdadm primary r0

At the end both nodes should be in the following state:

/etc/init.d/drbd status

drbd driver loaded OK; device status:
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil@fat-tyre, 2008-11-12 16:40:33
m:res  cs         st               ds                 p  mounted  fstype
0:r0   Connected  Primary/Primary  UpToDate/UpToDate  C
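
Before creating the GFS2 filesystem, this state can also be checked from a script. The sketch below assumes the column layout printed by /etc/init.d/drbd status above (resource, connection state, roles, disk states); dual_primary_ok is an illustrative name.

```sh
# Succeed if the status line for the given resource shows a connected
# dual-primary pair with up-to-date data on both sides.
dual_primary_ok() {
    # $1 = resource name; status text is read from standard input
    awk -v res="$1" '
        { split($1, id, ":") }
        id[2] == res && $2 == "Connected" &&
            $3 == "Primary/Primary" && $4 == "UpToDate/UpToDate" { ok = 1 }
        END { exit !ok }
    '
}

# Typical use on a node (not run here):
#   /etc/init.d/drbd status | dual_primary_ok r0 && echo "r0 is ready"
```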


Install GFS2

mkdir build
cd build

apt-get source gfs2-tools libopenais-dev libvolume-id-dev 

apt-get install quilt libselinux1-dev linux-libc-dev libvirt-dev libxml2-dev libncurses5-dev libnss3-dev libnspr4-dev libslang2-dev psmisc libnet-snmp-perl libnet-telnet-perl python-pexpect sg3-utils

cd openais-0.81
cp Makefile.inc Makefile.inc.old
sed s/CFLAGS\ +=\ -O3\ -Wall/CFLAGS\ +=\ -O0\ -Wall/ Makefile.inc.old > Makefile.inc
dpkg-buildpackage -rfakeroot -b -uc
cd ..
dpkg -i libopenais-dev_0.81-0ubuntu5_amd64.deb libopenais2_0.81-0ubuntu5_amd64.deb

cd udev-113
dpkg-buildpackage -rfakeroot -b -uc
cd ..
dpkg -i libvolume-id-dev_113-0ubuntu17.2_amd64.deb libvolume-id0_113-0ubuntu17.2_amd64.deb

cd redhat-cluster-suite-2.20070823.1
dpkg-buildpackage -rfakeroot -b -uc
cd ..
dpkg -i gfs2-tools_2.20070823.1-0ubuntu1_amd64.deb libcman2_2.20070823.1-0ubuntu1_amd64.deb libdlm2_2.20070823.1-0ubuntu1_amd64.deb cman_2.20070823.1-0ubuntu1_amd64.deb openais_0.81-0ubuntu5_amd64.deb

mkdir /etc/cluster
cp cluster.conf /etc/cluster/

vim /etc/cluster/cluster.conf

Change cluster.conf (listed at the end of this page) according to your hostnames and your disks.
Make sure your hostnames are listed in /etc/hosts.
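
The check of /etc/hosts can be scripted. This is a minimal sketch that reads the node names from the clusternode elements of cluster.conf and reports those without an entry in a hosts file; both function names are hypothetical and the parsing is a simple text match, not a real XML parser.

```sh
# Print the clusternode names found in a cluster.conf read from stdin.
cluster_node_names() {
    sed -n 's/.*<clusternode name="\([^"]*\)".*/\1/p'
}

# Print every cluster node name that has no entry in the hosts file.
missing_hosts() {
    # $1 = cluster.conf, $2 = hosts file (normally /etc/hosts)
    cluster_node_names < "$1" | while read -r name; do
        # Drop comments, then look for the name as a whole field
        # after the address column.
        sed 's/#.*//' "$2" | awk -v n="$name" '
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' || echo "$name"
    done
}

# Typical use (not run here):
#   missing_hosts /etc/cluster/cluster.conf /etc/hosts
```

No output means that every node name was found.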

GFS2 needs a running cman/openais cluster, so start it on both nodes:

/etc/init.d/cman start

You can check that the nodes are up:

cman_tool nodes

Node  Sts   Inc   Joined               Name
   1   M     32   2009-01-13 17:30:42  hostname1
   2   M     28   2009-01-13 17:30:42  hostname2
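
To use this check in a script, the members can be counted from the table. The sketch below assumes the column layout shown above, where the second column is "M" for a node that is a cluster member; member_count is an illustrative name.

```sh
# Count the nodes reported as members (status "M") in "cman_tool nodes"
# output read from standard input. The header line is skipped because
# its first field is not a number.
member_count() {
    awk '$1 ~ /^[0-9]+$/ && $2 == "M" { n++ } END { print n + 0 }'
}

# Typical use on a node (not run here):
#   [ "$(cman_tool nodes | member_count)" -eq 2 ] && echo "both nodes joined"
```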


Create and Start GFS2 on top of DRBD

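Create the GFS2 filesystem on the DRBD device using the dlm lock manager. CREATE THE FILESYSTEM ON ONE NODE ONLY. The part before the colon in the -t option must match the cluster name in cluster.conf (here "cluster"), and -j 2 creates one journal for each of the two nodes.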

mkfs.gfs2 -t cluster:gfs1 -p lock_dlm -j 2 /dev/drbd0
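
Because the lock table name given with -t has to start with the cluster name from cluster.conf, it can be safer to build the command from that file than to type the name again. This is a sketch only: the helper names are hypothetical, the cluster name is found with a simple text match, and the command is printed rather than executed.

```sh
# Print the cluster name from a cluster.conf read from standard input.
cluster_name() {
    sed -n 's/.*<cluster name="\([^"]*\)".*/\1/p' | head -n 1
}

# Print (without running it) the mkfs.gfs2 command for this cluster.
mkfs_gfs2_command() {
    # $1 = cluster.conf, $2 = filesystem name, $3 = journals, $4 = device
    printf 'mkfs.gfs2 -t %s:%s -p lock_dlm -j %s %s\n' \
        "$(cluster_name < "$1")" "$2" "$3" "$4"
}

# Typical use (not run here):
#   mkfs_gfs2_command /etc/cluster/cluster.conf gfs1 2 /dev/drbd0
```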

Finally mount the device on both servers:

mount -t gfs2 /dev/drbd0 /mnt

You can use the file mountgfs2.sh (at the end of this page) to mount GFS2 at boot:

cp mountgfs2.sh /etc/init.d/
update-rc.d mountgfs2.sh start 70 2 3 4 5 . stop 07 0 1 6 .
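
After a reboot it is worth verifying that the filesystem really came up. A small sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, ...); gfs2_mounted_at is a hypothetical helper name.

```sh
# Succeed if the mount table read from standard input contains a gfs2
# filesystem mounted on the given mount point.
gfs2_mounted_at() {
    # $1 = mount point, e.g. /synchronized
    awk -v mp="$1" '$2 == mp && $3 == "gfs2" { found = 1 } END { exit !found }'
}

# Typical use on a node (not run here):
#   gfs2_mounted_at /synchronized < /proc/mounts || echo "GFS2 is not mounted"
```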



Failures

If hostname2 fails for one reason or another, the following lines will appear in /var/log/syslog on hostname1:

Jan 13 22:04:10 hostname1 kernel: dlm: closing connection to node 2
Jan 13 22:04:10 hostname1 fenced[2543]: hostname2 not a cluster member after 0 sec post_fail_delay
Jan 13 22:04:10 hostname1 fenced[2543]: fencing node "hostname2"
Jan 13 22:04:10 hostname1 fenced[2543]: fence "hostname2" failed

During this time, access to the GFS2 filesystem is frozen.
Manual fencing (executed on the working machine) is needed to regain access to the shared partition:

fence_ack_manual -n hostname2

Once this is done, repair the failed node and reconnect it to the healthy one only when you are sure it is OK; this is necessary to avoid corruption!
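
Since the filesystem stays frozen until someone acknowledges the fence, it helps to notice the failure quickly. The following sketch looks for the fenced message format shown in the log excerpt above and prints the affected node names; failed_fence_nodes is an illustrative name, and the acknowledgement itself is deliberately left as a manual step.

```sh
# Print the names of nodes for which fenced logged a failed fence,
# one per line and without duplicates, from syslog text on stdin.
failed_fence_nodes() {
    sed -n 's/.*fenced\[[0-9]*]: fence "\([^"]*\)" failed.*/\1/p' | sort -u
}

# Typical use on the surviving node (not run here):
#   failed_fence_nodes < /var/log/syslog
# and then, after checking that the node is really down:
#   fence_ack_manual -n hostname2
```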


FILES

  • drbd.conf
========================== <cut here> ============================
# DRBD8 HA /etc/drbd.conf configuration file
resource r0 {
  protocol C;				# protocol between devices
  startup {
    wfc-timeout 120; 			# wait 2min for other peers
    degr-wfc-timeout 120; 		# wait 2min if peer was already 
    					# down before this node was rebooted
    become-primary-on both;
  }
  net {
    allow-two-primaries;
    cram-hmac-alg "sha1";		# algo to enable peer authentication
    shared-secret "123456";
    
    # handle split-brain situations
    after-sb-0pri discard-least-changes;# if no primary auto sync from the 
    					# node that touched more blocks during
                         		# the split brain situation.
    after-sb-1pri discard-secondary;	# if one primary
    after-sb-2pri disconnect;		# if two primaries
    
    # solve the cases when the outcome
    # of the resync decision is incompatible
    # with the current role assignment in
    # the cluster
    rr-conflict disconnect;		# no automatic resynchronization
    					# simply disconnect
  }
  disk {
    on-io-error detach;			# detach the device from its
    					# backing storage if the driver of 
					# the lower_device reports an error 
					# to DRBD
  }
  syncer {
    rate 100M;
  }
  
  on hostname1 {
    device    /dev/drbd0;
    disk      /dev/sdb5;
    address   192.168.9.xx:7789;
    meta-disk internal;
  }
  
  on hostname2 {
    device    /dev/drbd0;
    disk      /dev/sdb5;
    address   192.168.9.xx:7789;
    meta-disk internal;
  }
}
========================== </cut here> ============================
  • cluster.conf
========================== <cut here> ============================
<?xml version="1.0"?>
<cluster name="cluster" config_version="1">
  <!-- post_join_delay: number of seconds the daemon will wait before
                        fencing any victims after a node joins the domain
       post_fail_delay: number of seconds the daemon will wait before
            	        fencing any victims after a domain member fails
       clean_start    : prevent any startup fencing the daemon might do.
		        It indicates that the daemon should assume all nodes
		        are in a clean state to start. -->
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="hostname1" votes="1" nodeid="1">
      <fence>
        <!-- Handle fencing manually -->
        <method name="human">
          <device name="human" nodename="hostname1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="hostname2" votes="1" nodeid="2"> 
      <fence>
        <!-- Handle fencing manually -->
        <method name="human">
          <device name="human" nodename="hostname2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <!-- cman two nodes specification -->
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <!-- Define manual fencing -->
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
========================== </cut here> ============================
  • mountgfs2.sh
========================== <cut here> ============================
#! /bin/sh
# /etc/init.d/mountgfs2.sh
#
# Needs to be mounted after drbd start and
# unmounted before drbd stop
# update-rc.d mountgfs2.sh start 70 2 3 4 5 . stop 07 0 1 6 .
#

# Mount gfs2 partition on /synchronized
case "$1" in
  start)
    echo "Mounting gfs2 partition"
    mount -t gfs2 /dev/drbd0 /synchronized
    ;;
  stop)
    echo "Unmounting gfs2 partition"
    umount /dev/drbd0
    ;;
  *)
    echo "Usage: /etc/init.d/mountgfs2.sh {start|stop}"
    exit 1
    ;;
esac

exit 0
========================== </cut here> ============================


