AKAI TSUKI

System development or Technical something

study pacemaker (resource setting, failover)

cluster resource for pacemaker

There is a cluster built with Pacemaker/Corosync. The cluster consists of 3 nodes and doesn't have any cluster resources yet.

Let's try to set up Dummy resources.

Before loading the crm file, the initial status is shown below.

[root@vm01 ~]# crm_mon -1fA
Stack: corosync
Current DC: vm03.localdomain (version 1.1.21-1.el7-f14e36f) - partition with quorum
Last updated: Wed Apr 29 17:10:15 2020
Last change: Wed Apr 29 16:58:02 2020 by hacluster via crmd on vm03.localdomain

3 nodes configured
0 resources configured

Online: [ vm01.localdomain vm02.localdomain vm03.localdomain ]

No active resources


Node Attributes:
* Node vm01.localdomain:
* Node vm02.localdomain:
* Node vm03.localdomain:

Migration Summary:
* Node vm03.localdomain:
* Node vm02.localdomain:
* Node vm01.localdomain:
[root@vm01 ~]#
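
As a side note, the configuration below uses the ocf:heartbeat:Dummy resource agent. If you want to check the agent's parameters and operation defaults before writing the crm file, the crm shell can list and describe resource agents (commands only, run on any cluster node):

# list OCF agents from the heartbeat provider
crm ra list ocf heartbeat

# show the metadata (parameters, operation timeouts) of the Dummy agent
crm ra info ocf:heartbeat:Dummy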

This is the crm configuration file.

[root@vm01 ~]# cat dummy.crm
### Cluster Option ###
property stonith-enabled="false"

### Resource Defaults ###
rsc_defaults resource-stickiness="INFINITY" \
    migration-threshold="1"

### Group Configuration ###
group grp \
    resource1 \
    resource2

### Clone Configuration ###
clone clnResource \
    resource3

### Primitive Configuration ###
primitive resource1 ocf:heartbeat:Dummy \
    op start interval="0s" timeout="300s" on-fail="restart" \
    op monitor interval="10s" timeout="60s" on-fail="restart" \
    op stop interval="0s" timeout="300s" on-fail="block"

primitive resource2 ocf:heartbeat:Dummy \
    op start interval="0s" timeout="300s" on-fail="restart" \
    op monitor interval="10s" timeout="60s" on-fail="restart" \
    op stop interval="0s" timeout="300s" on-fail="block"

primitive resource3 ocf:heartbeat:Dummy \
    op start interval="0s" timeout="300s" on-fail="restart" \
    op monitor interval="10s" timeout="60s" on-fail="restart" \
    op stop interval="0s" timeout="300s" on-fail="block"

### Resource Location ###
location rsc_location-1 grp \
    rule 300: #uname eq vm01.localdomain \
    rule 200: #uname eq vm02.localdomain \
    rule 100: #uname eq vm03.localdomain

### Resource Colocation ###
colocation rsc_colocation-1 INFINITY: grp clnResource

### Resource Order ###
order rsc_order-1 0: clnResource grp symmetrical=false

[root@vm01 ~]#

Load the crm configuration file.

[root@vm01 ~]# crm configure load update dummy.crm
[root@vm01 ~]#
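
The load command prints nothing on success. To double-check that the new configuration is valid, you can run the verification commands below; crm_verify checks the live CIB and prints warnings or errors if something is wrong (a quick sanity check, not required for this procedure):

# verify the configuration through the crm shell
crm configure verify

# verify the live CIB with more verbose output
crm_verify -L -V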

After loading the crm file, let's check.

[root@vm01 ~]# crm_mon -1fA
Stack: corosync
Current DC: vm03.localdomain (version 1.1.21-1.el7-f14e36f) - partition with quorum
Last updated: Wed Apr 29 17:10:30 2020
Last change: Wed Apr 29 17:10:27 2020 by root via cibadmin on vm01.localdomain

3 nodes configured
5 resources configured

Online: [ vm01.localdomain vm02.localdomain vm03.localdomain ]

Active resources:

 Resource Group: grp
     resource1  (ocf::heartbeat:Dummy): Started vm01.localdomain
     resource2  (ocf::heartbeat:Dummy): Started vm01.localdomain
 Clone Set: clnResource [resource3]
     Started: [ vm01.localdomain vm02.localdomain vm03.localdomain ]

Node Attributes:
* Node vm01.localdomain:
* Node vm02.localdomain:
* Node vm03.localdomain:

Migration Summary:
* Node vm03.localdomain:
* Node vm02.localdomain:
* Node vm01.localdomain:
[root@vm01 ~]#
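
If you only want to know where a particular resource is running, crm_resource can report it directly instead of reading the full crm_mon output (an optional quick check):

# print the node where resource1 is currently active
crm_resource --resource resource1 --locate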

And confirm the configuration with the "crm configure show" command.

[root@vm01 ~]# crm configure show
node 1: vm01.localdomain
node 2: vm02.localdomain
node 3: vm03.localdomain
### Primitive Configuration ###
primitive resource1 Dummy \
        op start interval=0s timeout=300s on-fail=restart \
        op monitor interval=10s timeout=60s on-fail=restart \
        op stop interval=0s timeout=300s on-fail=block
primitive resource2 Dummy \
        op start interval=0s timeout=300s on-fail=restart \
        op monitor interval=10s timeout=60s on-fail=restart \
        op stop interval=0s timeout=300s on-fail=block
primitive resource3 Dummy \
        op start interval=0s timeout=300s on-fail=restart \
        op monitor interval=10s timeout=60s on-fail=restart \
        op stop interval=0s timeout=300s on-fail=block
### Group Configuration ###
group grp resource1 resource2
### Clone Configuration ###
clone clnResource resource3
### Resource Location ###
location rsc_location-1 grp \
        rule 300: #uname eq vm01.localdomain \
        rule 200: #uname eq vm02.localdomain \
        rule 100: #uname eq vm03.localdomain
### Resource Colocation ###
colocation rsc_colocation-1 inf: grp clnResource
### Resource Order ###
order rsc_order-1 0: clnResource grp symmetrical=false
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.21-1.el7-f14e36f \
        cluster-infrastructure=corosync \
        stonith-enabled=false
### Resource Defaults ###
rsc_defaults rsc-options: \
        resource-stickiness=INFINITY \
        migration-threshold=1
[root@vm01 ~]#
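
Before breaking anything, it can be useful to see how the location scores (300/200/100) and resource-stickiness=INFINITY translate into placement decisions. crm_simulate can show the allocation scores computed from the live cluster without changing anything (output omitted here):

# show allocation scores from the live CIB; no changes are made
crm_simulate --live-check --show-scores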

try failover

Delete the state file to cause a failure of the cluster resource, as follows.

[root@vm01 ~]# rm -f /var/run/resource-agents/Dummy-resource1.state
[root@vm01 ~]#
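
The Dummy agent's monitor checks for this state file, so the next monitor run (interval 10s) reports a failure and the resource is recovered. To watch the detection and recovery as they happen, you can run crm_mon interactively instead of the one-shot snapshot used below:

# watch cluster status interactively (Ctrl-C to exit); -f shows fail counts, -A shows node attributes
crm_mon -fA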

We can find the failed action.

[root@vm01 ~]# crm_mon -f1A
Stack: corosync
Current DC: vm03.localdomain (version 1.1.21-1.el7-f14e36f) - partition with quorum
Last updated: Wed Apr 29 17:27:35 2020
Last change: Wed Apr 29 17:10:27 2020 by root via cibadmin on vm01.localdomain

3 nodes configured
5 resources configured

Online: [ vm01.localdomain vm02.localdomain vm03.localdomain ]

Active resources:

 Resource Group: grp
     resource1  (ocf::heartbeat:Dummy): Started vm02.localdomain
     resource2  (ocf::heartbeat:Dummy): Started vm02.localdomain
 Clone Set: clnResource [resource3]
     Started: [ vm01.localdomain vm02.localdomain vm03.localdomain ]

Node Attributes:
* Node vm01.localdomain:
* Node vm02.localdomain:
* Node vm03.localdomain:

Migration Summary:
* Node vm03.localdomain:
* Node vm02.localdomain:
* Node vm01.localdomain:
   resource1: migration-threshold=1 fail-count=1 last-failure='Wed Apr 29 17:27:19 2020'

Failed Resource Actions:
* resource1_monitor_10000 on vm01.localdomain 'not running' (7): call=18, status=complete, exitreason='No process state file found',
    last-rc-change='Wed Apr 29 17:27:19 2020', queued=0ms, exec=0ms
[root@vm01 ~]#

The point to note is the "Failed Resource Actions" section.

Migration Summary:
* Node vm03.localdomain:
* Node vm02.localdomain:
* Node vm01.localdomain:
   resource1: migration-threshold=1 fail-count=1 last-failure='Wed Apr 29 17:27:19 2020'

Failed Resource Actions:
* resource1_monitor_10000 on vm01.localdomain 'not running' (7): call=18, status=complete, exitreason='No process state file found',
    last-rc-change='Wed Apr 29 17:27:19 2020', queued=0ms, exec=0ms

Check the fail-count for resource1 -> but the value is 0. (This is likely because Pacemaker 1.1.17 and later record fail counts per operation, e.g. fail-count-resource1#monitor_10000, so the legacy per-resource attribute reads 0.)

[root@vm01 ~]# crm resource failcount resource1 show vm01.localdomain
scope=status  name=fail-count-resource1 value=0
[root@vm01 ~]#
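
To see what is actually stored, you can dump the status section of the CIB and look for the fail-count attributes; with per-operation fail counts the attribute name includes the operation and its interval in milliseconds (e.g. fail-count-resource1#monitor_10000; this exact name is an assumption based on the 10s monitor interval):

# dump the status section of the live CIB and pick out fail-count attributes
cibadmin --query --scope status | grep -i fail-count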

Try a cleanup -> OK.

[root@vm01 ~]# crm resource cleanup resource1 vm01.localdomain
Cleaned up resource1 on vm01.localdomain
.Cleaned up resource2 on vm01.localdomain
Waiting for 1 reply from the CRMd. OK
[root@vm01 ~]#
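
The same cleanup can also be done with the lower-level tool, or only the fail count can be removed via the crm shell (alternatives, not needed after the command above):

# clean up failed actions for resource1 on vm01 with crm_resource directly
crm_resource --cleanup --resource resource1 --node vm01.localdomain

# or remove only the fail count for resource1 on vm01
crm resource failcount resource1 delete vm01.localdomain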

[root@vm01 ~]# crm_mon -f1A
Stack: corosync
Current DC: vm03.localdomain (version 1.1.21-1.el7-f14e36f) - partition with quorum
Last updated: Wed Apr 29 17:43:27 2020
Last change: Wed Apr 29 17:43:17 2020 by hacluster via crmd on vm01.localdomain

3 nodes configured
5 resources configured

Online: [ vm01.localdomain vm02.localdomain vm03.localdomain ]

Active resources:

 Resource Group: grp
     resource1  (ocf::heartbeat:Dummy): Started vm01.localdomain
     resource2  (ocf::heartbeat:Dummy): Started vm01.localdomain
 Clone Set: clnResource [resource3]
     Started: [ vm01.localdomain vm02.localdomain vm03.localdomain ]

Node Attributes:
* Node vm01.localdomain:
* Node vm02.localdomain:
* Node vm03.localdomain:

Migration Summary:
* Node vm03.localdomain:
* Node vm02.localdomain:
* Node vm01.localdomain:
[root@vm01 ~]#

That's all.