PDOA Startup and Shutdown

  • basic hardware
  • Linux System Admin

Powering on

This assumes that the PDOA is in a totally powered-off state.

  • Power to the rack
  • 3-phase supply
  • Built-in PDU (Power Distribution Unit) in the rack
  • Network switches will start automatically
  • Servers will get power, and the green light will start flashing
  • Storage V7000
  • Start in this order
    • Controller (2 supplies) - this is the middle unit of the 3 V7000 units
    • Expansion 1 (2 supplies) - this is the top of the 3 V7000 units
    • Expansion 2 (2 supplies) - this is the bottom of the 3 V7000 units
  • HMC Startup (both 1 & 2)
  • The steps are:
  • Press & hold the flashing green light
  • The light will change to constant
  • If you want to view the boot-up process then use the built-in console and select the HMC (press Ctrl twice to select which HMC).
  • The HMC boot process is a RHEL boot process
  • Server Start
  • It is recommended to power on the servers using the HMC control interface. The servers can also be powered on manually, but this is not the recommended way to start them.
  • Log into the HMC
    • In the left-hand list, select the Server
    • In the right-hand window the LPARs are displayed
    • At the end of the server name there is a small button
    • Left-click it
      • Operations
        • Power On
        • There are now some options
        • Choose Normal
        • As soon as you power on, you can see the reference codes
        • The reference codes get updated as the boot carries on
        • Reference codes can be used to diagnose problems (an HMC command-line way to power on and watch them is sketched after this list)
    • Select the next server
    • Repeat the same steps
    • This can happen in parallel with the 1st server being started
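
If you prefer the HMC command line to the GUI, something like the following should work when logged in to an HMC as hscroot. This is a sketch based on general HMC usage rather than on this build; the managed-system name is a placeholder and the exact field names can vary between HMC releases.

# Power on a managed system (server) in normal mode
chsysstate -r sys -m <managed-system-name> -o on

# List the LPARs and their states on that server
lssyscfg -r lpar -m <managed-system-name> -F name,state

# Watch the reference codes as the boot progresses
lsrefcode -r lpar -m <managed-system-name> -F lpar_name,refcode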

At this point

We should have

  • Network
  • Storage
  • 2 HMCs
  • Physical Frames
  • Logical Frames - NOT STARTED YET

We should be able to ping the IP addresses at this point, so if there is a connectivity problem now we should address it before carrying on.
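
A minimal connectivity sketch - the addresses below are placeholders, as the real HMC, switch and V7000 service addresses are site specific:

# Substitute the real management addresses for this installation
for ip in <hmc1-ip> <hmc2-ip> <switch1-ip> <v7000-ip>
do
    ping -c 2 $ip > /dev/null 2>&1 && echo "$ip reachable" || echo "$ip NOT reachable"
done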

Servers are ON - BUT NO OS Booted

The "Servers" need to be powered on - but this does not mean that the OS has been started. This is VERY different say than VM tye servers.

We can see the state of the server by looking at the HMC screen and selecting a Server - we want to see the server reporting

Server Status   Ref Code   Meaning
Standby         Standby    Physical frame has started, but the Logical Frame has not been activated

LPAR Power on Sequence

  • System Node
  • Look for sysnode
  • Select the node
    • Operations
    • Activate
      • Profile
      • Select the profile you want (at the moment there is just 1 profile)
      • Select "open terminal window" (so we can see the boot process)
      • Click OK
      • The reference code now gets updated, plus the output is shown in the terminal

When Booted?

The machine is booted when the login prompt is shown.

Boot Order

The machines need to be started in the following order.

Order  Node Name      Parallel Allowed  Type        Host
1      sysnode        NO                admin       aurora101
2      adminnode_2    NO                management  aurora102
3      standbynode_3  YES               admin       aurora103
4      standbynode_4  YES               management  aurora104

Status

At this point we have

  • network
  • storage
  • physical frame started
  • logical frames started
  • DB2 is NOT started

V7000 Notes

The V7000 does not show errors immediately upon hardware power-on. Errors are only displayed once the controller has fully booted and the power-on health checks have completed. This took approximately 3-5 minutes, and the drive with the error was then clearly displayed.

HMC Machine

Username: hscroot Password: abc1234

PDOA Testing

ssh to the server with:

ssh root@172.23.1.1
password: passw0rd

Distributed Command

Very nice command, similar to parallel in Linux.

For example, look at the date on all the machines in the cluster:

dsh -n $ALL date
aurora101: Tue Apr 19 11:02:00 GST 2016
aurora103: Tue Apr 19 11:02:01 GST 2016
aurora102: Tue Apr 19 11:02:01 GST 2016
aurora104: Tue Apr 19 11:02:01 GST 2016

(0) root @ aurora101: 7.1.0.0: /
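
The node lists such as $ALL, $BCUALL, $BCUMGMT and $BCUMGMTSTDBY used throughout these notes are assumed to be set in root's profile on the admin node; a quick sanity check before relying on them:

echo "ALL          = $ALL"
echo "BCUALL       = $BCUALL"
echo "BCUMGMT      = $BCUMGMT"
echo "BCUMGMTSTDBY = $BCUMGMTSTDBY"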

AIX for idiots

Simple command-line option to give the AIX shell vi-style command-line editing:

set -o vi
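
To avoid typing this at every login it can be added to root's ~/.profile - an assumption about how profiles are managed on this system, so check the site conventions first:

# Append only if not already present
grep 'set -o vi' ~/.profile > /dev/null 2>&1 || echo 'set -o vi' >> ~/.profile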

GPFS Service Check

These are the commands to check GPFS:

dsh -n ${BCUALL},${BCUMGMT},${BCUMGMTSTDBY} "/usr/lpp/mmfs/bin/mmgetstate"

The output is:

aurora101: 
aurora101:  Node number  Node name        GPFS state 
aurora101: ------------------------------------------
aurora101:        1      aurora101        active
aurora103: 
aurora103:  Node number  Node name        GPFS state 
aurora103: ------------------------------------------
aurora103:        2      aurora103        active
aurora104: 
aurora104:  Node number  Node name        GPFS state 
aurora104: ------------------------------------------
aurora104:        4      aurora104        active
aurora102: 
aurora102:  Node number  Node name        GPFS state 
aurora102: ------------------------------------------
aurora102:        2      aurora102        active

Or

dsh -n ${BCUALL},${BCUMGMT},${BCUMGMTSTDBY} "/usr/lpp/mmfs/bin/mmgetstate" | dshbak -c

HOSTS -------------------------------------------------------------------------
aurora101
-------------------------------------------------------------------------------

 Node number  Node name        GPFS state 
------------------------------------------
       1      aurora101        active

HOSTS -------------------------------------------------------------------------
aurora102
-------------------------------------------------------------------------------

 Node number  Node name        GPFS state 
------------------------------------------
       2      aurora102        active

HOSTS -------------------------------------------------------------------------
aurora103
-------------------------------------------------------------------------------

 Node number  Node name        GPFS state 
------------------------------------------
       2      aurora103        active

HOSTS -------------------------------------------------------------------------
aurora104
-------------------------------------------------------------------------------

 Node number  Node name        GPFS state 
------------------------------------------
       4      aurora104        active

As all nodes report active, this shows GPFS is up and running.
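
If any node had reported a state other than active (for example down), GPFS can normally be brought up on it with mmstartup. This is a general GPFS command shown here as a hint; it was not needed in this run:

dsh -n ${BCUALL},${BCUMGMT},${BCUMGMTSTDBY} "/usr/lpp/mmfs/bin/mmstartup"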

To mount GPFS on all 4 nodes (this one, plus the 3 others):

dsh -n ${BCUALL},${BCUMGMT},${BCUMGMTSTDBY} "/usr/lpp/mmfs/bin/mmmount all"
aurora104: Tue Apr 19 11:14:23 GST 2016: 6027-1623 mmmount: Mounting file systems ...
aurora102: Tue Apr 19 11:14:23 GST 2016: 6027-1623 mmmount: Mounting file systems ...
aurora103: Tue Apr 19 11:14:23 GST 2016: 6027-1623 mmmount: Mounting file systems ...
aurora101: Tue Apr 19 11:14:22 GST 2016: 6027-1623 mmmount: Mounting file systems ...

Again, this shows that GPFS has mounted successfully on all 4 nodes.

Check the number of GPFS mounts on all the servers

The correct mount counts should be 5 (on management and management standby) and 23 (on admin and admin standby).

dsh -n ${BCUALL},${BCUMGMT},${BCUMGMTSTDBY} "mount | grep  -c mmfs" | dshbak -c
HOSTS -------------------------------------------------------------------------
aurora101, aurora103
-------------------------------------------------------------------------------
5

HOSTS -------------------------------------------------------------------------
aurora102, aurora104
-------------------------------------------------------------------------------
23

Mount all the Flash Drives

dsh -n ${BCUALL} 'for i in 0 1 2 3 4 5 6 7; do mount -t db2ssd$i; done'

This produces no output, as mount only reports errors.

To View the mount state of the SSD Drives

$ dsh -n $BCUALL "df -g | grep ssd"
aurora104: /dev/lvssd0      135.38      2.85   98%        8     1% /db2ssd/bcuaix/ssd0
aurora104: /dev/lvssd1      135.38      2.85   98%        8     1% /db2ssd/bcuaix/ssd1
aurora104: /dev/lvssd2      135.38      2.85   98%        8     1% /db2ssd/bcuaix/ssd2
aurora104: /dev/lvssd3      135.38      2.85   98%        8     1% /db2ssd/bcuaix/ssd3
aurora104: /dev/lvssd4      135.38      2.85   98%        8     1% /db2ssd/bcuaix/ssd4
aurora104: /dev/lvssd5      135.38    134.85    1%        6     1% /db2ssd/bcuaix/ssd5
aurora104: /dev/lvssd6      135.38    134.85    1%        6     1% /db2ssd/bcuaix/ssd6
aurora104: /dev/lvssd7      135.38    134.85    1%        6     1% /db2ssd/bcuaix/ssd7
aurora102: /dev/lvssd0      135.38      2.85   98%        9     1% /db2ssd/bcuaix/ssd0
aurora102: /dev/lvssd1      135.38      2.85   98%        9     1% /db2ssd/bcuaix/ssd1
aurora102: /dev/lvssd2      135.38      2.85   98%        9     1% /db2ssd/bcuaix/ssd2
aurora102: /dev/lvssd3      135.38      2.85   98%        9     1% /db2ssd/bcuaix/ssd3
aurora102: /dev/lvssd4      135.38      2.85   98%        9     1% /db2ssd/bcuaix/ssd4
aurora102: /dev/lvssd5      135.38    134.85    1%        6     1% /db2ssd/bcuaix/ssd5
aurora102: /dev/lvssd6      135.38    134.85    1%        6     1% /db2ssd/bcuaix/ssd6
aurora102: /dev/lvssd7      135.38    134.85    1%        6     1% /db2ssd/bcuaix/ssd7

Starting Services

We now need to start some special services, which provide the high-availability functionality for this cluster.

TSA (Tivoli System Automation)

This needs to be done on the sysnode and standbynode_3.

We need to start TSA; this command is run on the Management Node (aurora102 or aurora104):

startrpdomain mgmtdomain

There is no output - so we need to use another command to check the TSA Service.

Check TSA

lsrpdomain
Name       OpState RSCTActiveVersion MixedVersions TSPort GSPort 
mgmtdomain Online  3.2.0.4           No            12347  12348  

As we can see from the Online state, this service is alive and working; otherwise we would have seen Offline.

Starting the System Console

This command takes 5 mins or so to execute.

mistart

To check the Web Console status use the command

mistatus

Typical output looks like this - note that mimon takes a little longer to start.

$ mistart && mistatus
CDTFS000065I The system console was started successfully.
CDTFS000060I The following modules have been started:
isas.server
isas.trap
isas.async
isas.console.system

CDTFS000062I The following modules are in the unknown state:
isas.mimon

Waiting 5 minutes and then:

$ mistatus
CDTFS000063I The system console is started.

Which indicates all is well.

To Stop this service

mistop

Db2 Core Warehouse

The Db2 core warehouse needs to be started on admin and standby_4.

Check the Status of the Core Warehouse

lsrpdomain
Name        OpState RSCTActiveVersion MixedVersions TSPort GSPort 
bcudomain01 Offline 3.2.0.4           No            12347  12348  

This is clearly Offline.

Starting the Core Warehouse

startrpdomain bcudomain01

There is no output - so we need to repeat the status command

lsrpdomain
Name        OpState RSCTActiveVersion MixedVersions TSPort GSPort 
bcudomain01 Online  3.2.0.4           No            12347  12348  

We can see it is now Online.

Check the Mount Points in the Db2 Resource Groups

lsrg -m
Displaying Member Resource information:
Class:Resource:Node[ManagedResource]                  Mandatory MemberOf                          OpState WinSource Location 
IBM.Application:db2_bcuaix_0_1_2_3_4-rs               True      db2_bcuaix_0_1_2_3_4-rg           Offline                    
IBM.ServiceIP:db2ip_172_23_1_42-rs                    True      db2_bcuaix_0_1_2_3_4-rg           Offline                    
IBM.Application:db2mnt_bkpfs_bcuaix_NODE0004-rs       True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2mlog_bcuaix_NODE0004-rs     True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2path_bcuaix_NODE0004-rs     True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2fs_bcuaix_NODE0004-rs       True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_bkpfs_bcuaix_NODE0003-rs       True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2mlog_bcuaix_NODE0003-rs     True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2path_bcuaix_NODE0003-rs     True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2fs_bcuaix_NODE0003-rs       True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_bkpfs_bcuaix_NODE0002-rs       True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2mlog_bcuaix_NODE0002-rs     True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2path_bcuaix_NODE0002-rs     True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2fs_bcuaix_NODE0002-rs       True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_bkpfs_bcuaix_NODE0001-rs       True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2mlog_bcuaix_NODE0001-rs     True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2path_bcuaix_NODE0001-rs     True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2fs_bcuaix_NODE0001-rs       True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_bkpfs_bcuaix_NODE0000-rs       True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2mlog_bcuaix_NODE0000-rs     True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2path_bcuaix_NODE0000-rs     True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2fs_bcuaix_NODE0000-rs       True      db2mnt_bcuaix_0_1_2_3_4-rg        Online                     
IBM.Application:db2mnt_db2ssd_bcuaix_ssd7_hagroup1-rs True      db2mnt_db2ssd_bcuaix_hagroup1-rg  Online                     
IBM.Application:db2mnt_db2ssd_bcuaix_ssd6_hagroup1-rs True      db2mnt_db2ssd_bcuaix_hagroup1-rg  Online                     
IBM.Application:db2mnt_db2ssd_bcuaix_ssd5_hagroup1-rs True      db2mnt_db2ssd_bcuaix_hagroup1-rg  Online                     
IBM.Application:db2mnt_db2ssd_bcuaix_ssd4_hagroup1-rs True      db2mnt_db2ssd_bcuaix_hagroup1-rg  Online                     
IBM.Application:db2mnt_db2ssd_bcuaix_ssd3_hagroup1-rs True      db2mnt_db2ssd_bcuaix_hagroup1-rg  Online                     
IBM.Application:db2mnt_db2ssd_bcuaix_ssd2_hagroup1-rs True      db2mnt_db2ssd_bcuaix_hagroup1-rg  Online                     
IBM.Application:db2mnt_db2ssd_bcuaix_ssd1_hagroup1-rs True      db2mnt_db2ssd_bcuaix_hagroup1-rg  Online                     
IBM.Application:db2mnt_db2ssd_bcuaix_ssd0_hagroup1-rs True      db2mnt_db2ssd_bcuaix_hagroup1-rg  Online                     
IBM.Application:db2mnt_db2home_hagroup1-rs            True      db2mnt_db2home_hagroup1-rg        Online   

Access via Web Interface

This can only be done once mistart has been run.

The web address is the Management Host; the username is admin and the password is also admin.

Another view

$ lssam

When viewed on an AIX terminal the entries are coloured Green (OK), Blue (not started) and Red (oh dear - there is an issue).

Offline IBM.ResourceGroup:db2_bcuaix_0_1_2_3_4-rg Nominal=Offline
        |- Offline IBM.Application:db2_bcuaix_0_1_2_3_4-rs
                |- Offline IBM.Application:db2_bcuaix_0_1_2_3_4-rs:aurora102
                '- Offline IBM.Application:db2_bcuaix_0_1_2_3_4-rs:aurora104
        '- Offline IBM.ServiceIP:db2ip_172_23_1_42-rs
                |- Offline IBM.ServiceIP:db2ip_172_23_1_42-rs:aurora102
                '- Offline IBM.ServiceIP:db2ip_172_23_1_42-rs:aurora104
Online IBM.ResourceGroup:db2mnt_bcuaix_0_1_2_3_4-rg Nominal=Online
        |- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0000-rs
                |- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0000-rs:aurora102
                '- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0000-rs:aurora104
        |- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0001-rs
                |- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0001-rs:aurora102
                '- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0001-rs:aurora104
        |- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0002-rs
                |- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0002-rs:aurora102
                '- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0002-rs:aurora104
        |- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0003-rs
                |- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0003-rs:aurora102
                '- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0003-rs:aurora104
        |- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0004-rs
                |- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0004-rs:aurora102
                '- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0004-rs:aurora104
        |- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0000-rs
                |- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0000-rs:aurora102
                '- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0000-rs:aurora104
        |- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0001-rs
                |- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0001-rs:aurora102
                '- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0001-rs:aurora104
        |- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0002-rs
                |- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0002-rs:aurora102
                '- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0002-rs:aurora104
        |- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0003-rs
                |- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0003-rs:aurora102
                '- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0003-rs:aurora104
        |- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0004-rs
                |- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0004-rs:aurora102
                '- Online IBM.Application:db2mnt_db2fs_bcuaix_NODE0004-rs:aurora104
        |- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0000-rs
                |- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0000-rs:aurora102
                '- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0000-rs:aurora104
        |- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0001-rs
                |- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0001-rs:aurora102
                '- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0001-rs:aurora104
        |- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0002-rs
                |- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0002-rs:aurora102
                '- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0002-rs:aurora104
        |- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0003-rs
                |- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0003-rs:aurora102
                '- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0003-rs:aurora104
        |- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0004-rs
                |- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0004-rs:aurora102
                '- Online IBM.Application:db2mnt_db2mlog_bcuaix_NODE0004-rs:aurora104
        |- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0000-rs
                |- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0000-rs:aurora102
                '- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0000-rs:aurora104
        |- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0001-rs
                |- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0001-rs:aurora102
                '- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0001-rs:aurora104
        |- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0002-rs
                |- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0002-rs:aurora102
                '- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0002-rs:aurora104
        |- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0003-rs
                |- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0003-rs:aurora102
                '- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0003-rs:aurora104
        '- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0004-rs
                |- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0004-rs:aurora102
                '- Online IBM.Application:db2mnt_db2path_bcuaix_NODE0004-rs:aurora104
Online IBM.ResourceGroup:db2mnt_db2home_hagroup1-rg Nominal=Online
        '- Online IBM.Application:db2mnt_db2home_hagroup1-rs
                |- Online IBM.Application:db2mnt_db2home_hagroup1-rs:aurora102
                '- Online IBM.Application:db2mnt_db2home_hagroup1-rs:aurora104
Online IBM.ResourceGroup:db2mnt_db2ssd_bcuaix_hagroup1-rg Nominal=Online
        |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd0_hagroup1-rs
                |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd0_hagroup1-rs:aurora102
                '- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd0_hagroup1-rs:aurora104
        |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd1_hagroup1-rs
                |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd1_hagroup1-rs:aurora102
                '- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd1_hagroup1-rs:aurora104
        |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd2_hagroup1-rs
                |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd2_hagroup1-rs:aurora102
                '- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd2_hagroup1-rs:aurora104
        |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd3_hagroup1-rs
                |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd3_hagroup1-rs:aurora102
                '- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd3_hagroup1-rs:aurora104
        |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd4_hagroup1-rs
                |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd4_hagroup1-rs:aurora102
                '- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd4_hagroup1-rs:aurora104
        |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd5_hagroup1-rs
                |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd5_hagroup1-rs:aurora102
                '- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd5_hagroup1-rs:aurora104
        |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd6_hagroup1-rs
                |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd6_hagroup1-rs:aurora102
                '- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd6_hagroup1-rs:aurora104
        '- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd7_hagroup1-rs
                |- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd7_hagroup1-rs:aurora102
                '- Online IBM.Application:db2mnt_db2ssd_bcuaix_ssd7_hagroup1-rs:aurora104
Online IBM.Equivalency:db2_FCM_network
        |- Online IBM.NetworkInterface:en11:aurora104
        '- Online IBM.NetworkInterface:en11:aurora102
Online IBM.Equivalency:db2_bcuaix_0_1_2_3_4-rg_group-equ
        |- Online IBM.PeerNode:aurora104:aurora104
        '- Online IBM.PeerNode:aurora102:aurora102

Start Db2

Wow - it has taken a long time to get here - but we can now finally start Db2 !!!

This can be started on either the management node or the admin node.

hastartdb2

Output looks like

Starting DB2.....DB2 resources online
Activating DB EIADB
DB20000I  The ACTIVATE DATABASE command completed successfully.
CORE DOMAIN
+============+===========+===========+=============+=================+=================+=============+
| PARTITIONS | CURRENT   | STANDBY   | DOMAIN      | OPSTATE         | HA STATUS       | RG REQUESTS |
+============+===========+===========+=============+=================+=================+=============+
| 0-4        | aurora104 | aurora102 | bcudomain01 | Online          | Normal          | -           |
+============+===========+===========+=============+=================+=================+=============+

Db2 remembers which server was last active and will try to make that server active again when the cluster is restarted.

Recheck with lssam

The output of lssam, now that Db2 has started, is:

$ lssam
Online IBM.ResourceGroup:db2_bcuaix_0_1_2_3_4-rg Nominal=Online
        |- Online IBM.Application:db2_bcuaix_0_1_2_3_4-rs
                |- Offline IBM.Application:db2_bcuaix_0_1_2_3_4-rs:aurora102
                '- Online IBM.Application:db2_bcuaix_0_1_2_3_4-rs:aurora104
        '- Online IBM.ServiceIP:db2ip_172_23_1_42-rs
                |- Offline IBM.ServiceIP:db2ip_172_23_1_42-rs:aurora102
                '- Online IBM.ServiceIP:db2ip_172_23_1_42-rs:aurora104
Online IBM.ResourceGroup:db2mnt_bcuaix_0_1_2_3_4-rg Nominal=Online
        |- Online IBM.Application:db2mnt_bkpfs_bcuaix_NODE0000-rs

Note: There are Online and Offline parts, as the database is online using admin_standby and standby_4; the same data was shown in the output of hastartdb2.

High Availability LS - hals

$ hals
CORE DOMAIN
+============+===========+===========+=============+=================+=================+=============+
| PARTITIONS | CURRENT   | STANDBY   | DOMAIN      | OPSTATE         | HA STATUS       | RG REQUESTS |
+============+===========+===========+=============+=================+=================+=============+
| 0-4        | aurora104 | aurora102 | bcudomain01 | Online          | Normal          | -           |
+============+===========+===========+=============+=================+=================+=============+

Starting the Service on the Management Node

We need to switch to the management node and issue the command:

hastartapp

Output looks like

$ hastartapp
Starting APP and APP instance.........................APP resources online
MANAGEMENT DOMAIN
+============+===========+===========+===========+=================+=================+=============+
| COMPONENT  | PRIMARY   | STANDBY   | CURRENT   | OPSTATE         | HA STATUS       | RG REQUESTS |
+============+===========+===========+===========+=================+=================+=============+
| WASAPP     | aurora101 | aurora103 | aurora101 | Online          | Normal          | -           |
| DB2APP     | aurora101 | aurora103 | aurora101 | Online          | Normal          | -           |
| DPM        | aurora101 | N/A       | N/A       | Offline         | Offline         | -           |
| DB2DPM     | aurora101 | aurora103 | aurora101 | Online          | Normal          | -           |
+============+===========+===========+===========+=================+=================+=============+

Note: DPM has not started - that is because we are about to start it with the next command !!

DPM Start

hastartdpm
Starting DPM and DB2 instance............................Resources online
MANAGEMENT DOMAIN
+============+===========+===========+===========+=================+=================+=============+
| COMPONENT  | PRIMARY   | STANDBY   | CURRENT   | OPSTATE         | HA STATUS       | RG REQUESTS |
+============+===========+===========+===========+=================+=================+=============+
| WASAPP     | aurora101 | aurora103 | aurora101 | Online          | Normal          | -           |
| DB2APP     | aurora101 | aurora103 | aurora101 | Online          | Normal          | -           |
| DPM        | aurora101 | aurora103 | aurora101 | Online          | Normal          | -           |
| DB2DPM     | aurora101 | aurora103 | aurora101 | Online          | Normal          | -           |
+============+===========+===========+===========+=================+=================+=============+

Manual Failover

In case an LPAR fails, the system will automatically fail over to the backup server set.

To test this, we need to force a failover....

Before we start, let's have a look at the cluster:

hals
CORE DOMAIN
+============+===========+===========+=============+=================+=================+=============+
| PARTITIONS | CURRENT   | STANDBY   | DOMAIN      | OPSTATE         | HA STATUS       | RG REQUESTS |
+============+===========+===========+=============+=================+=================+=============+
| 0-4        | aurora104 | aurora102 | bcudomain01 | Online          | Normal          | -           |
+============+===========+===========+=============+=================+=================+=============+

Here I can see that aurora104 is the active machine.

We want to force a failover to the standby machine, so we issue a command like:

$ hafailover aurora104
Moving resources from aurora104 to aurora102
..............Done
CORE DOMAIN
+============+===========+===========+=============+=================+=================+=============+
| PARTITIONS | CURRENT   | STANDBY   | DOMAIN      | OPSTATE         | HA STATUS       | RG REQUESTS |
+============+===========+===========+=============+=================+=================+=============+
| 0-4        | aurora102 | aurora104 | bcudomain01 | Online          | Normal          | -           |
+============+===========+===========+=============+=================+=================+=============+

This took less than 2 minutes !!

You can confirm this with the lssam command.
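
Rather than reading the whole lssam tree, a quick filter on the Db2 resource (using the resource name from the earlier output) shows which node is now Online:

lssam | grep "db2_bcuaix_0_1_2_3_4-rs:"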

Shutting Down

The overall process for shutting down the cluster is as follows.

  • DB2 stopped (both machines)
  • Aurora1 & 2 Shutdown
  • Svr1 & 2 Shutdown
  • HMC1 shutdown
  • HMC2 Shutdown
  • V7000 Storage powered off

    • Manually powered off at the switch at the back of each unit
    • Expansions powered off
    • Controller powered off

    Note: The storage is arranged (top to bottom) as Expansion, Controller, Expansion

There are dual power supplies on both the controller and the expansion units.

Stopping the PDOA

Basically the reverse of the start-up sequence.

On Management Node:

hastopdpm
hastopapp

On Admin Node:

hastopdb2
stoprpdomain -f bcudomain01

Check rpdomain has stopped

lsrpdomain

On Management Node

stoprpdomain -f mgmtdomain

Check mgmtdomain has stopped with

lsrpdomain

On Management Node

mistop

Now, to shut down the nodes we can use the HMCs or the command line - I prefer the command line, so:

dsh -n $ALL shutdown -Fh

At this point the operating systems are shut down, but the physical partitions are still active. We can shut down the frame either by using the HMC console or by using the HMC Web interface (which needs a configured IP address on the corporate network).

Then, on the HMC Web interface, select the Server and Operations -> Shutdown.
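
If the HMCs are reachable over ssh, the frames can also be powered off from the HMC command line. Again this is a sketch from general HMC usage rather than from this build, with the managed-system name as a placeholder:

# Confirm all LPARs on the server are in the Not Activated state first
lssyscfg -r lpar -m <managed-system-name> -F name,state

# Then power the managed system (frame) off
chsysstate -r sys -m <managed-system-name> -o off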

Stopping Just DB2

We needed to update the transaction log file, and instead of restarting the whole system we stopped and restarted DB2 using the following steps.

On Management Node:

hastopdpm
hastopapp

On Admin Node:

hastopdb2

WARNING: this terminates the IP address on the 172.23.109.x interface!! If you were connected via that address you will lose your session and need to go to the computer room.
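
Before running hastopdb2 over ssh it is worth checking which address your session is actually using, so you know whether you are about to cut yourself off. SSH_CONNECTION is set by sshd and shows client IP, client port, server IP, server port:

echo $SSH_CONNECTION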

DB2 Info

Account which owns the Db2 instance

bcuaix is the privileged account name.

$ su - bcuaix
$ pwd
/db2home/bcuaix
$ ls -l  
total 32
drwxr-xr-x    3 bcuaix   bcuigrp        8192 Jul 17 2015  ha_setup
drwxrwsr-t   24 bcuaix   bcuigrp        8192 Apr 18 10:02 sqllib
$ db2 list db directory

Output yields

 System Database Directory

 Number of entries in the directory = 1

Database 1 entry:

 Database alias                       = EIADB
 Database name                        = EIADB
 Local database directory             = /db2path
 Database release level               = f.00
 Comment                              =
 Directory entry type                 = Indirect
 Catalog database partition number    = 0
 Alternate server hostname            =
 Alternate server port number         =
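
Once the instance and the EIADB database are up, a quick sanity check as the bcuaix user might look like this - standard Db2 CLP commands, not part of the original run:

db2 connect to EIADB
db2 "select current timestamp from sysibm.sysdummy1"
db2 terminate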

Server Power Status

The servers have a green light. When it is flashing, this indicates that the server is not booted. When it is constant, this indicates that the server is booted or booting.