
Saturday, 10 September 2016

How-to guide on creating a Replicated GlusterFS Volume on CentOS 7 using DigitalOcean Block Storage


The second part of this exercise is to create a replicated GlusterFS volume, which overcomes the data-loss problem faced with the distributed volume. For a detailed architectural overview, visit the GlusterFS page.

Click here to view the first part

Replicated GlusterFS Volume

Steps:

  1. Create trusted storage pool
  2. Create mounts using block storage created
  3. Create Replicated Volume
  4. Mount volume on a client
  5. Verification

To make it easy for you to follow, I have:
  • highlighted commands in BLUE
  • highlighted expected output in GREEN
  • made server names bold

Prerequisites:

  1. Basic familiarity with Linux commands and remote access to the servers
  2. This demo can also be followed on local VMs using VirtualBox, VMware Fusion, etc.
  3. Understanding of LVM

1. Replicated Volume Setup - create a replicated volume with two storage servers. Once again we'll be assigning block storage to these servers. Add repl1 and repl2 to the trusted storage pool.


[eedevs@gluster1 ~]$ sudo gluster peer probe repl1
peer probe: success. 

[eedevs@gluster1 ~]$ sudo gluster peer probe repl2
peer probe: success
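The two probes can be rolled into one guarded sketch that also verifies the pool afterwards (host names are the ones used in this guide; the gluster calls only run where the CLI is installed):

```shell
# Probe every replica node in one loop, then show the pool state.
PEERS="repl1 repl2"
if command -v gluster >/dev/null 2>&1; then
  for peer in $PEERS; do
    sudo gluster peer probe "$peer"   # re-probing an existing member is harmless
  done
  sudo gluster peer status            # each node should report "Peer in Cluster (Connected)"
else
  echo "gluster CLI not found; skipping"
fi
```

`gluster peer status` is the quickest sanity check before creating any volume: a peer stuck in "Peer Rejected" or "Disconnected" will make the volume create fail later.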

2. Create mounts

2.1 on repl1


[eedevs@gluster1-replica ~]$ sudo mkfs.ext4 -F /dev/disk/by-id/scsi-0DO_Volume_volume-nyc1-02
mke2fs 1.42.9 (28-Dec-2013)
Discarding device blocks: done                            
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
1048576 inodes, 4194304 blocks
209715 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2151677952
128 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
4096000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done   

[eedevs@gluster1-replica ~]$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda       8:0    0  16G  0 disk 
vda     253:0    0  20G  0 disk 
├─vda1  253:1    0  20G  0 part /
└─vda15 253:15   0   1M  0 part 

[eedevs@gluster1-replica ~]$ sudo pvcreate /dev/sda
WARNING: ext4 signature detected on /dev/sda at offset 1080. Wipe it? [y/n]: y
  Wiping ext4 signature on /dev/sda.
  Physical volume "/dev/sda" successfully created

[eedevs@gluster1-replica ~]$ sudo vgcreate vg_bricks /dev/sda
  Volume group "vg_bricks" successfully created

[eedevs@gluster1-replica ~]$ sudo lvcreate -L 14G -T vg_bricks/brickpool3
  Logical volume "brickpool3" created.

[eedevs@gluster1-replica ~]$ sudo lvcreate -V 3G -T vg_bricks/brickpool3 -n shadow_brick1
  Logical volume "shadow_brick1" created.

[eedevs@gluster1-replica ~]$ sudo mkfs.xfs -i size=512 /dev/vg_bricks/shadow_brick1
meta-data=/dev/vg_bricks/shadow_brick1 isize=512    agcount=8, agsize=98288 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=786304, imaxpct=25
         =                       sunit=16     swidth=16 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[eedevs@gluster1-replica ~]$ sudo mkdir -p /bricks/shadow_brick1

[eedevs@gluster1-replica ~]$ sudo mount /dev/vg_bricks/shadow_brick1 /bricks/shadow_brick1/

[eedevs@gluster1-replica ~]$ sudo mkdir /bricks/shadow_brick1/brick

[eedevs@gluster1-replica ~]$ lsblk
NAME                           MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda                              8:0    0  16G  0 disk 
├─vg_bricks-brickpool3_tmeta   252:0    0  16M  0 lvm  
│ └─vg_bricks-brickpool3-tpool 252:2    0  14G  0 lvm  
│   ├─vg_bricks-brickpool3     252:3    0  14G  0 lvm  
│   └─vg_bricks-shadow_brick1  252:4    0   3G  0 lvm  /bricks/shadow_brick1
└─vg_bricks-brickpool3_tdata   252:1    0  14G  0 lvm  
  └─vg_bricks-brickpool3-tpool 252:2    0  14G  0 lvm  
    ├─vg_bricks-brickpool3     252:3    0  14G  0 lvm  
    └─vg_bricks-shadow_brick1  252:4    0   3G  0 lvm  /bricks/shadow_brick1
vda                            253:0    0  20G  0 disk 
├─vda1                         253:1    0  20G  0 part /
└─vda15                        253:15   0   1M  0 part 
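The per-node brick preparation above can be collapsed into one parameterized sketch, handy for repl2 and any future nodes. DISK, POOL and BRICK are the repl1 values from this guide; the function is left uncalled so the sketch is safe to paste, and must be run as root on a node with the block device attached:

```shell
# Brick preparation, parameterized. Mirrors the commands above.
DISK=/dev/sda
POOL=brickpool3
BRICK=shadow_brick1

prep_brick() {
  pvcreate "$DISK"                                  # whole disk becomes an LVM PV
  vgcreate vg_bricks "$DISK"
  lvcreate -L 14G -T "vg_bricks/$POOL"              # thin pool
  lvcreate -V 3G -T "vg_bricks/$POOL" -n "$BRICK"   # 3G thin volume for the brick
  mkfs.xfs -i size=512 "/dev/vg_bricks/$BRICK"      # 512-byte inodes leave room for gluster xattrs
  mkdir -p "/bricks/$BRICK"
  mount "/dev/vg_bricks/$BRICK" "/bricks/$BRICK"
  mkdir -p "/bricks/$BRICK/brick"                   # brick directory lives on the new filesystem
}
# prep_brick   # uncomment to run for real (as root)
echo "would prepare /bricks/$BRICK/brick from $DISK"
```

Note the ordering: the brick subdirectory is created after the mount, so it lands on the new XFS filesystem rather than on the root disk underneath the mountpoint.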

2.1.1 Optional - add to /etc/fstab

[eedevs@gluster1-replica ~]$ sudo vim /etc/fstab 

[eedevs@gluster1-replica ~]$ cat /etc/fstab 

#
# /etc/fstab
# Created by anaconda on Tue Aug 30 23:46:07 2016
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=c4a662b3-efba-485b-84a6-26e1477a6825 /                       ext4    defaults        1 1

/dev/vg_bricks/shadow_brick1  /bricks/shadow_brick1/  xfs  rw,noatime,inode64,nouuid 1 2
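Instead of hand-editing with vim, the entry can be built and appended in one go — a sketch using the repl1 device and mountpoint from this guide:

```shell
# Build the fstab entry in a variable, then append and verify it.
BRICK_DEV=/dev/vg_bricks/shadow_brick1
BRICK_MNT=/bricks/shadow_brick1
FSTAB_LINE="$BRICK_DEV  $BRICK_MNT  xfs  rw,noatime,inode64,nouuid  1 2"
echo "$FSTAB_LINE"
# To apply (as root):
#   echo "$FSTAB_LINE" | sudo tee -a /etc/fstab
#   sudo mount -a    # an error here means the entry is wrong -- fix it before rebooting
```

Running `mount -a` immediately after editing fstab is cheap insurance: a typo discovered now is much easier to fix than one discovered when the server fails to boot.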

2.2 Repeat on repl2

[eedevs@gluster2-replica ~]$ sudo mkfs.ext4 -F /dev/disk/by-id/scsi-0DO_Volume_volume-nyc1-01
mke2fs 1.42.9 (28-Dec-2013)
Discarding device blocks: done                            
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
1048576 inodes, 4194304 blocks
209715 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2151677952
128 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
4096000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done   

[eedevs@gluster2-replica ~]$ sudo pvcreate /dev/sda
WARNING: ext4 signature detected on /dev/sda at offset 1080. Wipe it? [y/n]: y
  Wiping ext4 signature on /dev/sda.
  Physical volume "/dev/sda" successfully created

[eedevs@gluster2-replica ~]$ sudo vgcreate vg_bricks /dev/sda
  Volume group "vg_bricks" successfully created

[eedevs@gluster2-replica ~]$ sudo lvcreate -L 14G -T vg_bricks/brickpool4
  Logical volume "brickpool4" created.

[eedevs@gluster2-replica ~]$ sudo lvcreate -V 3G -T vg_bricks/brickpool4 -n shadow_brick2
  Logical volume "shadow_brick2" created.

[eedevs@gluster2-replica ~]$ sudo mkfs.xfs -i size=512 /dev/vg_bricks/shadow_brick2
meta-data=/dev/vg_bricks/shadow_brick2 isize=512    agcount=8, agsize=98288 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=786304, imaxpct=25
         =                       sunit=16     swidth=16 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[eedevs@gluster2-replica ~]$ sudo mkdir -p /bricks/shadow_brick2

[eedevs@gluster2-replica ~]$ sudo mount /dev/vg_bricks/shadow_brick2 /bricks/shadow_brick2/

[eedevs@gluster2-replica ~]$ sudo mkdir /bricks/shadow_brick2/brick

[eedevs@gluster2-replica ~]$ lsblk
NAME                           MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda                              8:0    0  16G  0 disk 
├─vg_bricks-brickpool4_tmeta   252:0    0  16M  0 lvm  
│ └─vg_bricks-brickpool4-tpool 252:2    0  14G  0 lvm  
│   ├─vg_bricks-brickpool4     252:3    0  14G  0 lvm  
│   └─vg_bricks-shadow_brick2  252:4    0   3G  0 lvm  /bricks/shadow_brick2
└─vg_bricks-brickpool4_tdata   252:1    0  14G  0 lvm  
  └─vg_bricks-brickpool4-tpool 252:2    0  14G  0 lvm  
    ├─vg_bricks-brickpool4     252:3    0  14G  0 lvm  
    └─vg_bricks-shadow_brick2  252:4    0   3G  0 lvm  /bricks/shadow_brick2
vda                            253:0    0  20G  0 disk 
├─vda1                         253:1    0  20G  0 part /
└─vda15                        253:15   0   1M  0 part 

2.2.1 Optional - add to /etc/fstab

[eedevs@gluster2-replica ~]$ sudo vim /etc/fstab 

[eedevs@gluster2-replica ~]$ cat /etc/fstab 

#
# /etc/fstab
# Created by anaconda on Tue Aug 30 23:46:07 2016
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=c4a662b3-efba-485b-84a6-26e1477a6825 /                       ext4    defaults        1 1

/dev/vg_bricks/shadow_brick2  /bricks/shadow_brick2/  xfs  rw,noatime,inode64,nouuid 1 2

3. Create the Replicated Volume using the gluster command below on repl1

[eedevs@gluster1-replica ~]$ sudo gluster volume create shadowvol replica 2 repl1:/bricks/shadow_brick1/brick repl2:/bricks/shadow_brick2/brick
[sudo] password for eedevs: 
volume create: shadowvol: success: please start the volume to access data

[eedevs@gluster1-replica ~]$ sudo gluster volume start shadowvol
volume start: shadowvol: success

[eedevs@gluster1-replica ~]$ sudo gluster volume status
Status of volume: distvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick server1:/bricks/dist_brick1/brick     49152     0          Y       3916 
Brick server2:/bricks/dist_brick2/brick     49152     0          Y       3872 

Task Status of Volume distvol
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: shadowvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick repl1:/bricks/shadow_brick1/brick     49152     0          Y       6561 
Brick repl2:/bricks/shadow_brick2/brick     49152     0          Y       7342 
Self-heal Daemon on localhost               N/A       N/A        Y       6581 
Self-heal Daemon on gluster1.nyc.eedevs     N/A       N/A        Y       5722 
Self-heal Daemon on server2                 N/A       N/A        Y       5634 
Self-heal Daemon on repl2                   N/A       N/A        Y       7364 

Task Status of Volume shadowvol
------------------------------------------------------------------------------
There are no active volume tasks

[eedevs@gluster1-replica ~]$ sudo gluster volume info shadowvol

Volume Name: shadowvol
Type: Replicate
Volume ID: 329ce3e4-8a9d-48dd-9dd7-e026dbbbd3ac
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: repl1:/bricks/shadow_brick1/brick
Brick2: repl2:/bricks/shadow_brick2/brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

3.1 On repl1, set nfs.disable to off so the volume can be mounted over NFS

[eedevs@gluster1-replica ~]$ sudo gluster volume set shadowvol nfs.disable off
volume set: success
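Before moving to the client it is worth confirming the option took effect — a guarded sketch; on this GlusterFS version the setting shows up under "Options Reconfigured" in `volume info`:

```shell
# Verify the NFS setting on the volume before touching the client.
WANT="nfs.disable: off"
if command -v gluster >/dev/null 2>&1; then
  sudo gluster volume info shadowvol | grep -F "$WANT" \
    && echo "NFS access is enabled"
else
  echo "gluster CLI not found; would look for: $WANT"
fi
```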

4. On the client, set Defaultvers=3 in /etc/nfsmount.conf (Gluster's built-in NFS server only supports NFSv3)

4.1 Set up NFS and the firewall

[eedevs@gluster-client ~]$ tee gfs.fw <<-'EOF'
sudo yum -y install vim
sudo yum -y install nfs-utils
sudo systemctl enable firewalld
sudo systemctl start firewalld
sudo firewall-cmd --permanent --zone=public --add-port=24007-24008/tcp
sudo firewall-cmd --permanent --zone=public --add-port=24009/tcp
sudo firewall-cmd --permanent --zone=public --add-service=nfs --add-service=samba --add-service=samba-client
sudo firewall-cmd --permanent --zone=public --add-port=111/tcp --add-port=139/tcp --add-port=445/tcp --add-port=965/tcp --add-port=2049/tcp --add-port=38465-38469/tcp --add-port=631/tcp --add-port=111/udp --add-port=963/udp --add-port=49152-49251/tcp
sudo systemctl reload firewalld
sudo systemctl status firewalld
sudo systemctl stop nfs-lock.service
sudo systemctl stop nfs.target
sudo systemctl disable nfs.target
sudo systemctl start rpcbind.service
EOF


[eedevs@gluster-client ~]$ sudo sh gfs.fw

4.2 Change the NFS default version

[eedevs@gluster-client ~]$ sudo sed -i 's/# Defaultvers=4/Defaultvers=3/' /etc/nfsmount.conf

[eedevs@gluster-client ~]$ grep Defaultver /etc/nfsmount.conf
Defaultvers=3
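To see exactly what that sed does, here is the same substitution run against a throwaway copy (CentOS ships /etc/nfsmount.conf with the option commented out; GNU sed's in-place `-i` is assumed):

```shell
# Demonstrate the Defaultvers edit on a temp file instead of the real config.
sample=$(mktemp)
printf '# Defaultvers=4\n' > "$sample"
sed -i 's/# Defaultvers=4/Defaultvers=3/' "$sample"
result=$(cat "$sample")
echo "$result"    # -> Defaultvers=3
rm -f "$sample"
```

The pattern matches the commented line verbatim, so the edit is a no-op if the file has already been changed — safe to re-run.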

4.3 Mount the volume

[eedevs@gluster-client ~]$ sudo mkdir /mnt/shadowvol

[eedevs@gluster-client ~]$ sudo mount -t nfs -o vers=3 repl2:/shadowvol /mnt/shadowvol/

[eedevs@gluster-client ~]$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda     253:0    0  20G  0 disk 
├─vda1  253:1    0  20G  0 part /
└─vda15 253:15   0   1M  0 part 

[eedevs@gluster-client ~]$ df -Th
Filesystem       Type            Size  Used Avail Use% Mounted on
/dev/vda1        ext4             20G  1.2G   18G   7% /
devtmpfs         devtmpfs        236M     0  236M   0% /dev
tmpfs            tmpfs           245M     0  245M   0% /dev/shm
tmpfs            tmpfs           245M   21M  225M   9% /run
tmpfs            tmpfs           245M     0  245M   0% /sys/fs/cgroup
tmpfs            tmpfs            49M     0   49M   0% /run/user/1000
server1:/distvol fuse.glusterfs  6.0G   66M  6.0G   2% /mnt/distvol
repl2:/shadowvol nfs             3.0G   33M  3.0G   2% /mnt/shadowvol
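Note that this NFS mount depends on repl2 alone: if repl2 goes down the client loses access even though repl1 holds a full copy of the data. The native FUSE client avoids that single point of failure — a guarded sketch (requires the glusterfs-fuse package on the client; the mountpoint is a hypothetical second one for comparison, and the backup-volfile-servers option name may differ on very old gluster versions):

```shell
# Alternative: mount via the GlusterFS native client for built-in failover.
MNT=/mnt/shadowvol-fuse    # hypothetical second mountpoint, for comparison only
if command -v glusterfs >/dev/null 2>&1; then
  sudo mkdir -p "$MNT"
  sudo mount -t glusterfs -o backup-volfile-servers=repl2 repl1:/shadowvol "$MNT"
else
  echo "glusterfs-fuse not installed; skipping"
fi
```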

4.4 Optional - add the mount to /etc/fstab

[eedevs@gluster-client ~]$ sudo vim /etc/fstab 

[eedevs@gluster-client ~]$ cat /etc/fstab 

#
# /etc/fstab
# Created by anaconda on Tue Aug 30 23:46:07 2016
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=c4a662b3-efba-485b-84a6-26e1477a6825 /                       ext4    defaults        1 1
server1:/distvol  /mnt/distvol    glusterfs     _netdev    0 0

repl2:/shadowvol  /mnt/shadowvol/  nfs vers=3  0 0

5. Verify by creating a file on the client and confirming that it appears on both the repl1 and repl2 servers

5.1 On the client

[eedevs@gluster-client ~]$ sudo touch /mnt/shadowvol/replicated

[eedevs@gluster-client ~]$ ls -lrt /mnt/shadowvol/replicated 
-rw-r--r-- 1 root root 0 Sep 10 08:53 /mnt/shadowvol/replicated

5.2 On repl1

[eedevs@gluster1-replica ~]$ ls -lrt /bricks/shadow_brick1/brick/
total 0
-rw-r--r-- 2 root root 0 Sep 10 08:53 replicated

[eedevs@gluster1-replica ~]$ sudo lvdisplay
[sudo] password for eedevs: 
  --- Logical volume ---
  LV Name                brickpool3
  VG Name                vg_bricks
  LV UUID                G9E8Wa-V0Kr-tY8m-fUiW-hotH-gs8P-4ZoV3E
  LV Write Access        read/write
  LV Creation host, time gluster1-replica.nyc.eedevs, 2016-09-10 08:30:27 +0000
  LV Pool metadata       brickpool3_tmeta
  LV Pool data           brickpool3_tdata
  LV Status              available
  # open                 2
  LV Size                14.00 GiB
  Allocated pool data    0.08%
  Allocated metadata     0.59%
  Current LE             3584
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           252:2
   
  --- Logical volume ---
  LV Path                /dev/vg_bricks/shadow_brick1
  LV Name                shadow_brick1
  VG Name                vg_bricks
  LV UUID                LKpKJ5-SeSW-xgts-QqMp-hpch-eW1G-Or2ElN
  LV Write Access        read/write
  LV Creation host, time gluster1-replica.nyc.eedevs, 2016-09-10 08:30:32 +0000
  LV Pool name           brickpool3
  LV Status              available
  # open                 1
  LV Size                3.00 GiB
  Mapped size            0.36%
  Current LE             768
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192

  Block device           252:4

5.3 On repl2

[eedevs@gluster2-replica ~]$ ls -lrt /bricks/shadow_brick2/brick/
total 0
-rw-r--r-- 2 root root 0 Sep 10 08:53 replicated

[eedevs@gluster2-replica ~]$ sudo lvdisplay 
[sudo] password for eedevs: 
  --- Logical volume ---
  LV Name                brickpool4
  VG Name                vg_bricks
  LV UUID                cpKfgV-EePo-U8HX-4cIe-om3L-Y243-r4Xm1Q
  LV Write Access        read/write
  LV Creation host, time gluster2-replica.nyc.eedevs, 2016-09-10 08:35:29 +0000
  LV Pool metadata       brickpool4_tmeta
  LV Pool data           brickpool4_tdata
  LV Status              available
  # open                 2
  LV Size                14.00 GiB
  Allocated pool data    0.08%
  Allocated metadata     0.59%
  Current LE             3584
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           252:2
   
  --- Logical volume ---
  LV Path                /dev/vg_bricks/shadow_brick2
  LV Name                shadow_brick2
  VG Name                vg_bricks
  LV UUID                PMP3tn-gIHv-4h5o-JMcq-7NWx-IlxF-PokaaL
  LV Write Access        read/write
  LV Creation host, time gluster2-replica.nyc.eedevs, 2016-09-10 08:35:36 +0000
  LV Pool name           brickpool4
  LV Status              available
  # open                 1
  LV Size                3.00 GiB
  Mapped size            0.36%
  Current LE             768
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192

  Block device           252:4
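The two manual brick checks above can be rolled into one loop runnable from any machine with SSH access to the nodes — a sketch using the brick paths from this guide (the ssh line is left commented so the sketch is safe to run anywhere):

```shell
# Check that the replicated file exists on every brick.
FILE=replicated
for pair in repl1:/bricks/shadow_brick1/brick repl2:/bricks/shadow_brick2/brick; do
  host=${pair%%:*}    # part before the first colon
  dir=${pair#*:}      # part after the first colon
  echo "would check $host for $dir/$FILE"
  # ssh "$host" "test -e '$dir/$FILE'" && echo "present on $host"
done
```

Seeing the same file (with link count 2, thanks to gluster's internal hard link) on both bricks confirms the replica is doing its job: either server can now be lost without losing the data.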


