Dec 26, 2017

New approach for data migration using Redirected Recovery


As mentioned in many of my previous blog posts, we use Redirected Recovery of Recovery Expert for migrating data from one Db2 sub-system to another.
We face some hurdles whenever we do a data migration using Redirected Recovery. The major hurdles are:

  •    TAPE handling: All our Image copy datasets are written to TAPE with the STACK YES option. Currently, Redirected Recovery is not able to generate the Recovery JCLs appropriately when the Image Copy datasets are on TAPE. I created RFE 114350 asking for the product to do an upfront analysis of the objects included for data migration, to ensure all the objects on the same TAPE are included in the same Recovery job.
  •    Data verification after migration: There is no simple approach for verifying that the data was copied properly from the source database to the target database.


I devised a new approach to overcome these hurdles. The new approach is lengthy in execution, but it makes the data migration process less painful.
Whenever a data migration is requested, the process is executed in the following sequence of steps.

1.   Create DISK backups
As Redirected Recovery is unable to handle TAPE image copies properly (for now), we decided to create DISK backups.
We may continue with DISK backups even after the TAPE-related problems are resolved, as data migration is faster with DISK backups.

IBM DB2 Automation Tool for z/OS doesn't have the capability (as of now) to generate COPYTOCOPY…TOCOPY JCLs (I created RFE 114086 for including this capability in the Tool). So, I coded a REXX routine that builds the COPYTOCOPY JCLs for us. The REXX routine builds JCLs to create LB copies if there are no LB copies yet for the source database, and builds JCLs to create RP copies if LB copies already exist on TAPE for the source database.
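For illustration only, the check behind that LB-versus-RP decision could be expressed as a catalog query along these lines (a minimal sketch, not the actual routine; the <source db> placeholder follows the convention used in the other queries in this post).

-- Sketch: flag, per Tablespace, whether an 'LB' (local backup) copy already
-- exists in SYSCOPY; the routine builds COPYTOCOPY for LB copies where
-- HAS_LB = 0 and for RP copies where HAS_LB = 1.
SELECT TSNAME
      ,MAX(CASE WHEN ICBACKUP = 'LB' THEN 1 ELSE 0 END) AS HAS_LB
FROM SYSIBM.SYSCOPY
WHERE
   DBNAME = <source db> AND
   ICTYPE = 'F'
GROUP BY TSNAME
WITH UR;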

2.   Execute Checklist
The Checklist ensures that all the Tablespaces included in the data migration have image copies. More details about the Checklist can be found in this blog post.

3.   Execute Object structure comparison
We usually have object structure mismatches between the source database and the target database. So, we execute an object structure comparison to find the objects (Tablespaces, Tables & Indexes) that have different structures. Such objects are excluded from the data migration process.
More details about the structure comparison can be found in this blog post.

4.   Create Application profiles
Application profiles with the objects needed for data migration are created in the Recovery Expert panels (objects with structure mismatches will be excluded).

5.   Build Recovery JCLs
A JCL with Application profile details is submitted, which will build Recovery JCLs for us.

6.   Execute Recovery JCLs
The Recovery JCLs built in the above step are executed. These JCLs use the DISK backups (created in step 1) as the source.
Even after excluding the objects with structure mismatches, there is a chance that data migration jobs fail due to version mismatches. The REPAIR CATALOG utility is executed on the objects that failed due to version mismatches.

7.   Verify data in target database
This is a major concern for us, as it's really difficult, after the data migration is complete, to verify that all the data was migrated from the source database to the target database. After discussions with the Recovery Expert developers and IBM tech expert Robert Catterall, I came up with the following ideas for post-migration verification.
  • Comparing source’s COPYPAGESF (from SYSCOPY) with target’s NACTIVE (from SYSTABLESPACESTATS)
          I coded a SQL query (below) that compares the source database with the target database.
          But the results are not 100% accurate, as the COPY utility does not copy empty pages.

WITH
   TSSTAT(NAME, NACTIVE, TYPE) AS
     (SELECT 
         ST.NAME
        ,SUM(ST.NACTIVE)
        ,TS.TYPE
      FROM SYSIBM.SYSTABLESPACESTATS ST
             INNER JOIN
           SYSIBM.SYSTABLESPACE TS ON
               ST.DBNAME = TS.DBNAME AND
               ST.NAME   = TS.NAME 
      WHERE 
         ST.DBNAME = <target db> AND
         TS.NTABLES > 0
      GROUP BY ST.NAME, TS.TYPE)
  ,TSCOPY(NAME, COPYPAGES) AS
     (SELECT 
         TSNAME
        ,SUM(COPYPAGESF) 
      FROM SYSIBM.SYSCOPY
      WHERE 
         DBNAME = <source db> AND
         ICDATE = '171226' AND
         ICTYPE = 'F' AND
         ICBACKUP <> 'LB'
      GROUP BY TSNAME)
SELECT 
   TSSTAT.NAME
  ,TSSTAT.TYPE
  ,TSCOPY.COPYPAGES AS SRC_ICPAGES
  ,TSSTAT.NACTIVE AS TGT_TSPAGES
FROM 
    TSSTAT
     INNER JOIN
    TSCOPY ON 
       TSSTAT.NAME=TSCOPY.NAME
WHERE TSSTAT.NACTIVE <> TSCOPY.COPYPAGES;

  • Comparing source’s CARDF (from SYSIBM.SYSTABLES_HIST) with target’s CARDF (from SYSIBM.SYSTABLES after RUNSTATS are done)
          This requires RUNSTATS to be executed on all databases every day to update the stats in the History Tables.

  • Reading the source's SYSTABLESPACESTATS data into a personal Table on a daily basis, and comparing the data in the personal Table with the target's real-time stats data (when needed)

  • Taking backups of the target after the data migration is done, and comparing the source's COPYPAGESF (from SYSCOPY) with the target's COPYPAGESF (from SYSCOPY)
We decided to go with the third idea, as it is easy to implement; a rough sketch of it is below.
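The personal table name (DBA.TSSTATS_SNAPSHOT), its columns, and the way the snapshot rows are made visible to the target sub-system are illustrative assumptions in this sketch, not the actual implementation.

-- Personal table holding a daily snapshot of SYSTABLESPACESTATS
-- (illustrative DDL; the real names may differ).
CREATE TABLE DBA.TSSTATS_SNAPSHOT
  (DBNAME    VARCHAR(24) NOT NULL
  ,NAME      VARCHAR(24) NOT NULL
  ,PARTITION SMALLINT    NOT NULL
  ,NACTIVE   BIGINT
  ,TOTALROWS BIGINT
  ,SNAP_DATE DATE        NOT NULL);

-- Daily insert, run against the source sub-system.
INSERT INTO DBA.TSSTATS_SNAPSHOT
SELECT DBNAME, NAME, PARTITION, NACTIVE, TOTALROWS, CURRENT DATE
FROM SYSIBM.SYSTABLESPACESTATS;

-- After the migration, compare the snapshot with the target's real-time stats
-- (assumes the snapshot rows are accessible from the target sub-system).
SELECT SNAP.NAME
      ,SUM(SNAP.TOTALROWS) AS SRC_ROWS
      ,SUM(RTS.TOTALROWS)  AS TGT_ROWS
FROM DBA.TSSTATS_SNAPSHOT SNAP
     INNER JOIN
     SYSIBM.SYSTABLESPACESTATS RTS ON
        SNAP.NAME = RTS.NAME
WHERE
   SNAP.DBNAME    = <source db> AND
   RTS.DBNAME     = <target db> AND
   SNAP.SNAP_DATE = CURRENT DATE
GROUP BY SNAP.NAME
HAVING SUM(SNAP.TOTALROWS) <> SUM(RTS.TOTALROWS);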

8.   Execute RUNSTATS
We just execute regular RUNSTATS JCLs.

9.   Create Image copies for target database
We just execute regular image copies (LP).

10. Delete DISK backups
Once verification is done and the application team is happy with the data in the target database, the DISK backups created in step 1 are deleted.



Jun 29, 2017

Caché License Key monitoring


One of our Caché database servers stopped working last week because its License key had expired. There was an oversight in checking the License key expiration date.

To avoid any such issues in the future, I coded a shell script that runs every day on the Caché database servers and does the following:
       1. Compare License key expiration date with future date (current date + 1 week)
       2. Trigger an email to DBA if the License key expiration date is less than the future date

I'm sure that the date handling (in the script) could be done in a better way; I'm still working on improving my scripting skills.
The script is below.

#!/bin/ksh
#coder: Bharath Nunepalli

LICKEY_EXP=/home//lickey_exp
LICKEY_EXP1=/home/lickey_exp1
LICKEY_EXP2=/home/lickey_exp2
LICKEY_EXP3=/home/lickey_exp3
LICKEY_EXP_DATE=/home/lickey_date

timezone=$(date +%Z)
NEXT_WEEK=$(TZ=$timezone-168 date +"%Y%m%d")

# Pull the ExpirationDate line from the license key display (MM/DD/YYYY assumed)
csession cache "^CKEY" | grep -i ExpirationDate > $LICKEY_EXP
 cat $LICKEY_EXP | read LINE
  LIC_EXP_DT=$(echo $LINE | awk '{print $3}')

# Month, zero-padded
echo $LIC_EXP_DT | awk -F'/' '{ print $1 }' > $LICKEY_EXP1
 cat $LICKEY_EXP1 | read LINE
  LIC_EXP_DT1=$(echo $LINE | awk '{print $1}')
   if (( $LIC_EXP_DT1 < 10 )); then
     LIC_EXP_MON=$(echo "0"$LIC_EXP_DT1)
   else
     LIC_EXP_MON=$LIC_EXP_DT1
   fi

# Day, zero-padded
echo $LIC_EXP_DT | awk -F'/' '{ print $2 }' > $LICKEY_EXP2
 cat $LICKEY_EXP2 | read LINE
  LIC_EXP_DT2=$(echo $LINE | awk '{print $1}')
   if (( $LIC_EXP_DT2 < 10 )); then
     LIC_EXP_DAY=$(echo "0"$LIC_EXP_DT2)
   else
     LIC_EXP_DAY=$LIC_EXP_DT2
   fi

# Year
echo $LIC_EXP_DT | awk -F'/' '{ print $3 }' > $LICKEY_EXP3
 cat $LICKEY_EXP3 | read LINE
  LIC_EXP_DT3=$(echo $LINE | awk '{print $1}')

# Build YYYYMMDD and compare with the date one week from now
echo $LIC_EXP_DT3$LIC_EXP_MON$LIC_EXP_DAY > $LICKEY_EXP_DATE
 cat $LICKEY_EXP_DATE | read LINE
  LIC_EXP_DATE=$(echo $LINE | awk '{print $1}')
  if (( $LIC_EXP_DATE < $NEXT_WEEK )); then
    mail -s "LICENSE KEY ABOUT TO EXPIRE ON $HOSTNAME "  <email ID>
  fi
exit

The above script can be coded as below.

#!/bin/ksh
#Coder: Bharath Nunepalli

DB_STATUS=/home/db_status
# proceed only if a Cache instance is running on this server
ccontrol list | grep -i running > $DB_STATUS
if [[ -s $DB_STATUS ]]; then
  #collect expiration date from cache as YYYYMMDD (MM/DD/YYYY assumed in the ^CKEY output)
  EXPR_DATE=$(csession cache "^CKEY"|grep -i ExpirationDate|awk '{print $3}'|awk -F'/' '{printf("%d%02d%02d\n",$3,$1,$2)}')
  #get future date (one week out)
  NEXT_WEEK=$(TZ="$(date +%Z)-168" date "+%Y%m%d")
  #compare dates
  if [[ $EXPR_DATE -lt $NEXT_WEEK ]]; then
    mail -s "LICENSE KEY ABOUT TO EXPIRE ON $HOSTNAME "  <email ID>
  fi
fi
exit 0



Jun 23, 2017

Dropped-object recovery using Recovery Expert for z/OS


Dropped-object recovery is one of those activities that no DBA ever wants to have to perform. Unfortunately, my team recently faced the dreadful situation of recovering 2 dropped databases (277 Tablespaces in total).
We used the Recovery Expert for z/OS tool for this activity. The Tool has its own repository/database called the Schema-Level Repository (SLR), which holds a replica of all DB2 catalog Tables (with some additional columns). The smart thing we are doing is updating the SLR on a daily basis (data from all DB2 catalog Tables gets copied into the SLR), and this helped us a lot in recovering the dropped databases.

There are 2 approaches to perform dropped-object recovery using Recovery Expert for z/OS.
                     1. Log-based recovery
                     2. Standard recovery

1. Log-based recovery: This approach can be used when the SLR is not getting updated on a regular basis.
This approach involves these steps.
a. Select Log based as the Recovery Type in Create Application Profile panel.
b. Select the time window during which the objects got dropped. 
    This will submit a Log analysis job that searches the logs for the dropped objects.
         
               
c. Once the Log analysis job is completed, dropped objects can be selected into the Application Profile from the LBDR Scanned Log Range created by the Log analysis job.
d. Recovery PIT timestamp will be automatically populated by the Tool. Using that, Recovery Plans will be generated.
e. Appropriate Recovery Plan should be selected for building the Recovery JCLs.
            Recovery JCLs have job steps for
            i.   Re-creation of dropped objects (DDLs execution)
            ii.  Recovery
            iii. REBUILD INDEX

2. Standard recovery: This approach can be used when the SLR is getting updated on a regular basis.
This approach involves these steps.
a. Select Standard as the Recovery Type in Create Application Profile panel.
b. Create an Application Profile and add the dropped objects to it (the Tool reads the objects' details from the SLR, as no entries are present in the DB2 catalog Tables).
c. Update the Recovery options, select PIT timestamp and generate the Recovery Plans.
d. Then, build the Recovery JCLs.
            Recovery JCLs have job steps for
            i.   Re-creation of dropped objects (DDLs execution)
            ii.  Recovery
            iii. REBUILD INDEX


So, updating the SLR regularly makes the dropped-object recovery activity easy.

May 8, 2017

IBM DB2 Automation Tool for z/OS for TS REORG


We were going with a basic REORG approach for all our Prod Tablespaces: REORG all Tablespaces every month.
We recently changed this basic approach to a more sophisticated one using DB2 Automation Tool for z/OS.

As you can see in the below screenshot, DB2 Automation Tool can be used for setting up various DB2 utility JCLs (besides REORG).




For details about how to set up REORG utility JCLs using the Tool, refer to this link for my presentation (published at IDUG 2017).


Apr 27, 2017

Checklist for data migration using Redirected Recovery


In many of my previous blogs I mentioned that we (DBA team) use Redirected Recovery of Recovery Expert for z/OS v3.2 for data migrations from one DB2 sub-system to another DB2 sub-system. The data migration (using Redirected Recovery) is done in 3 phases.



      Phase 1: Create Profiles and update Recovery options in Recovery Expert
This is a pretty straightforward process. We add all the Tablespaces in a database needed for data migration to an Application profile. Usually, nothing can go wrong in this phase.


      Phase 2: Execute Build JCL (that builds/generates the data migration JCLs)
In the Build JCL, we provide the following details to build/generate the data migration JCLs.
       a)      Application profile name (having the objects to be migrated)
       b)     Timestamp for PIT recovery 
             (Redirected Recovery can do only PIT recovery to another sub-system)
       c)      PDS name (data migration JCLs will be created in this PDS)
       d)     Plan number for the Recovery (it's 29 for Redirected Recovery)

This job can fail for many reasons, and the frustrating part is that it's not easy to find out the reason for the failure. Most of the time we see just the ARYZ018E error message, with a list of Objects that can't be recovered. The reason why the Objects are not recoverable is not provided in the job output.

After discussions with the Rocket Software tool support team, I got a checklist to go through for the Build job to be successful. The checklist identifies the Objects that can lead to failure of the Build job, so these Objects are excluded from the data migration process. With the help of JCL and REXX, I automated the checklist. The JCL has multiple steps to verify the following.

1. Check whether all TS' in the Source database have ICTYPE='F' entries in SYSCOPY for a given ICDATE. I use this SQL query for verification.
SELECT
   DBNAME
  ,NAME
FROM SYSIBM.SYSTABLESPACE
WHERE
   DBNAME=<dbname> AND
   NAME NOT IN
              (SELECT
                 TSNAME
               FROM SYSIBM.SYSCOPY
               WHERE
                   DBNAME=<dbname> AND
                   ICDATE=<date>   AND
                   ICTYPE='F')
WITH UR;

2. Check whether all Image copy datasets (of the TS' from the above SQL query) are present in the ICF catalog. I coded the below REXX routine for this verification.
LIST_PARA:
 SAY '*****SYSCOPY ENTRIES NOT IN ICF CATALOG*****'
 DO I = 1 TO INP.0                 /* INP. stem holds the image copy DSNs   */
   IC_DSN = STRIP(INP.I,"B")
   X = OUTTRAP(LSTC.)              /* trap LISTCAT output into the LSTC. stem */
    ADDRESS TSO "LISTCAT ENT('"IC_DSN"')"
   X = OUTTRAP('OFF')
   Y = STRIP(WORD(LSTC.1,1),"B")
   IF Y <> 'NONVSAM' THEN          /* cataloged image copies list as NONVSAM */
     SAY IC_DSN 'NOT FOUND IN ICF CATALOG'
 END
 SAY '********************************************'
RETURN

3. Check whether any Objects (of the TS' included in the Application profile) have been altered.
The CURRENT_VERSION of a TS can change for many reasons, and Redirected Recovery can't migrate data from a versioned TS to a non-versioned TS. I coded the Objects' structure comparison process to verify this.
Versioning can also be found using the below query.
SELECT
   TS.DBNAME
  ,TS.NAME
  ,TB.NAME
  ,TS.OLDEST_VERSION
  ,TS.CURRENT_VERSION
  ,TB.VERSION
FROM SYSIBM.SYSTABLESPACE TS
     INNER JOIN
SYSIBM.SYSTABLES TB ON
   TS.DBNAME = TB.CREATOR AND
   TS.DBNAME = TB.DBNAME  AND
   TS.NAME = TB.TSNAME
WHERE
   TS.DBNAME='dbname'  AND
   TB.TYPE = 'T'       AND
   TS.CURRENT_VERSION <> TB.VERSION;

4. The PIT timestamp used in the Build job must be greater than the timestamp corresponding to the highest START_RBA for a given DBNAME and ICDATE. I use this SQL query to get that value.
SELECT TIMESTAMP((MAX(START_RBA))||X'0001') + CURRENT TIMEZONE
FROM SYSIBM.SYSCOPY
WHERE
   DBNAME=<DBNAME> AND
   ICDATE=<DATE>
WITH UR;


         Phase 3: Execution of the data migration JCLs
This phase goes smoothly if we follow the checklist.

Improvements to DB2 objects' structure comparison


In my previous blog I mentioned how and why I developed an alternate process to the DB2 Object Comparison Tool (part of the DB2 Admin Tool). After some discussion and based on feedback from my team, I made some improvements to this process.


               1. Source and Target comparison
The old version of the process compared Source with Target, but not vice versa, so the job had to be submitted again with the Source and Target names swapped. There can be some objects present in Target but not in Source, and these were getting missed.
          
              Improvement: The new version of the process compares Source with Target and vice-versa.


               2. Tablespaces and Columns comparison
The old version of the process just checked whether the Tablespaces and Columns existed in Target or not (the same is done for Tables and Indexes). This may not be enough in our ERP environment, as we can have versioned Tablespaces and Columns in a different sequence (due to something like patches applied in DEV but not yet in QA and Prod).

            Improvement: The new version of the process compares the following for Tablespaces and Columns.
                               For Tablespaces:
                               a. Name
                               b. OLDEST_VERSION and CURRENT_VERSION
                               c. Type (UTS or non-UTS)

                               For Columns
                               a. Name
                               b. Sequence number
                               c. Data type
                               d. Length and Scale
                               e. Nulls


                 3. Storing the structures comparison report for historical purposes
The old version of the process wrote the comparison report to a Sequential dataset. Referring to the comparison report is helpful (and needed) in our ERP world whenever a new upgrade or patch is applied, but it is cumbersome to go back to the datasets for historical purposes.

            Improvement: The new version of the process loads the comparison report into a DB2 table. It is easy to query a DB2 Table to get the comparison details, and this is a better approach for storing the reports for historical purposes. The DB2 table has the following columns (a possible DDL sketch follows the list).
                                                     i.          Source DB
                                                     ii.         Target DB
                                                     iii.        Object type (TS/TB/IX/CL)
                                                     iv.        Object name
                                                     v.         Table name (only for columns)
                                                     vi.        Present in Source (Y/N) – for TS/TB/IX/CL
                                                     vii.       Present in Target (Y/N) – for TS/TB/IX/CL
                                                     viii.      Type mismatch (Y/N) – only for TS
                                                     ix.        Version mismatch (Y/N) – only for TS
                                                     x.         Sequence mismatch (Y/N) – only for cols
                                                     xi.        Data type mismatch (Y/N) – only for cols
                                                     xii.       Length and Scale mismatch (Y/N) – only for cols
                                                     xiii.      Nulls mismatch (Y/N) – only for cols
                                                     xiv.      Row entry time stamp
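A possible DDL for that report table is sketched below; the table and column names here are illustrative only and may not match the actual names used in our process.

-- Illustrative DDL for the comparison-report table.
CREATE TABLE DBA.OBJ_COMPARE_RPT
  (SOURCE_DB        CHAR(8)      NOT NULL
  ,TARGET_DB        CHAR(8)      NOT NULL
  ,OBJECT_TYPE      CHAR(2)      NOT NULL               -- TS/TB/IX/CL
  ,OBJECT_NAME      VARCHAR(128) NOT NULL
  ,TABLE_NAME       VARCHAR(128)                        -- only for columns
  ,IN_SOURCE        CHAR(1)      NOT NULL               -- Y/N
  ,IN_TARGET        CHAR(1)      NOT NULL               -- Y/N
  ,TYPE_MISMATCH    CHAR(1)                             -- only for TS
  ,VERSION_MISMATCH CHAR(1)                             -- only for TS
  ,SEQ_MISMATCH     CHAR(1)                             -- only for columns
  ,DTYPE_MISMATCH   CHAR(1)                             -- only for columns
  ,LENGTH_MISMATCH  CHAR(1)                             -- only for columns
  ,NULLS_MISMATCH   CHAR(1)                             -- only for columns
  ,ENTRY_TS         TIMESTAMP    NOT NULL WITH DEFAULT);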

Apr 8, 2017

DB2 objects' structure comparison using ISRSUPC


My team does frequent data migrations across DB2 sub-systems for our ERP application. We use the Redirected Recovery component of Recovery Expert for z/OS for these data migrations. It's an accepted situation that the objects' structures in the source sub-system will not be in sync with those in the target sub-system. But data migration jobs will fail if we try to migrate data for not-in-sync objects.

Recovery Expert doesn't have a built-in object structure comparison capability. So, we use the DB2 Object Comparison Tool (present in the DB2 Administration Tool menu) for comparing the objects' structures before data migration. The activities involved in generating the report of mismatches using the Tool are:
                - Specify compare source (new)
                - Specify compare target (old)
                - Specify compare masks       
                - Specify ignores             
                - Generate compare job           

The Tool performs fine. But I didn't like going through this whole process in the DB2 Object Comparison Tool and felt that it's complicated and time consuming. I wanted to get the same results in less time and with less prep work. So, I built a new process for Objects' structure comparison using REXX.

The valid question is: do we really need an alternate process when there is a sophisticated Tool available already? My answer is: to keep it simple *GRIN*



The first hurdle I faced in developing the new process was handling cross sub-system connections.
As we do cross sub-system data migrations, the Source and Target databases can be on different sub-systems. Location transparency using three-part names is not enabled in our Mainframe setup, and REXX can't handle connections to multiple DB2 sub-systems at the same time. So, I need to open a connection to one DB2 sub-system, do the processing, close the connection, and then open a new connection to the other DB2 sub-system.
I coded separate PARAs in the REXX routine to handle the cross connections.


The second hurdle I faced in developing the new process was the performance of the batch job that executes the REXX routine.
The first version of my REXX routine worked as below (in this exact order).
          1. REXX routine reads input (Source and Target database names)
          2. Connect to Source DB2 sub-system
          3. Get TS, TB, IX, Columns details into arrays for the Source Database (the kind of catalog query used for the Columns details is sketched after this list)
          4. Close connection to Source DB2 sub-system
          5. Connect to Target DB2 sub-system
          6. Get TS, TB, IX, Columns details into arrays for the Target Database
          7. Close connection to Target DB2 sub-system
          8. Compare TS' details and write mismatches to output
          9. Compare TB' details and write mismatches to output
          10. Compare IX' details and write mismatches to output
          11. Compare Columns' details and write mismatches to output
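For illustration, the kind of catalog query used in steps 3 and 6 to pull the Columns details for one database could look like the sketch below (the exact columns fetched are an assumption; the routine's actual cursors and host variables are not shown).

-- Illustrative catalog query: Column details for all Tables in one database.
SELECT
   C.TBNAME
  ,C.NAME
  ,C.COLNO
  ,C.COLTYPE
  ,C.LENGTH
  ,C.SCALE
  ,C.NULLS
FROM SYSIBM.SYSCOLUMNS C
     INNER JOIN
     SYSIBM.SYSTABLES T ON
        C.TBCREATOR = T.CREATOR AND
        C.TBNAME    = T.NAME
WHERE
   T.DBNAME = <dbname> AND
   T.TYPE   = 'T'
ORDER BY C.TBNAME, C.COLNO
WITH UR;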

I coded simple DO loops in the COMPARE_PARA, which worked fast for the TS, TB & IX comparisons, but performance took the worst blow for the Columns comparison (there are 50,000+ Columns in every database). We have regular REXX and System REXX (which come by default with the z/OS installation), but we don't have a REXX compiler, which could have helped make REXX execution faster. The job started taking hours to complete, which was not at all acceptable.


After going through the great suggestions I got for my posts on LinkedIn, I re-coded my REXX routine, replacing the DO loops in COMPARE_PARA with a call to ISRSUPC. The second version of my REXX routine has the below new steps included in it.
         1. Allocate NEWDD, OLDDD, OUTDD datasets
         2. Read the TS' details of Source database into NEWDD file
         3. Read the TS' details of Target database into OLDDD file
         4. Set the PARM values to DELTAL,LINECMP
         5. Execute the below statement
             ADDRESS ISPEXEC "SELECT PGM(ISRSUPC) PARM("PARMS")"
         6. Process OUTDD dataset to get details of mismatches. 
         7. Free NEWDD, OLDDD, OUTDD datasets.

The same order is followed for the TB, IX and Columns details comparison.
ISRSUPC did the magic: the job that took hours to complete started completing in minutes.


The JCL used to execute the REXX routine is below.
The JOBLIB datasets, input Database names, and SYSUT2 dataset name change based on the requirement.
//PDB00000 JOB (0000,00000000),'OBJECT COMPARE',             
//         CLASS=E,MSGCLASS=X,NOTIFY=&SYSUID,REGION=0M       
//JOBLIB   DD  DSN=EGP1X.SDSNEXIT,DISP=SHR
//         DD  DSN=DSNEGP1.SDSNLOAD,DISP=SHR
//         DD  DSN=EGQ1X.SDSNEXIT,DISP=SHR
//         DD  DSN=DSNEGQ1.SDSNLOAD,DISP=SHR
//STEP010  EXEC PGM=IKJEFT01,DYNAMNBR=10
//ISPMLIB  DD DISP=SHR,DSN=ISP.SISPMENU
//         DD DISP=SHR,DSN=ISF.SISFMLIB
//ISPSLIB  DD DISP=SHR,DSN=ISP.SISPSENU
//         DD DISP=SHR,DSN=ISP.SISPSLIB
//ISPTLIB  DD DISP=SHR,DSN=ISP.SISPTENU
//         DD DISP=SHR,DSN=ISF.SISFTLIB
//ISPPLIB  DD DISP=SHR,DSN=ISP.SISPPENU
//         DD DISP=SHR,DSN=ISF.SISFPLIB
//ISPLOG   DD DCB=(RECFM=VA,LRECL=125,BLKSIZE=129),SYSOUT=*
//ISPLIST  DD DCB=(RECFM=FBA,LRECL=121,BLKSIZE=1210),SYSOUT=*
//ISPPROF  DD RECFM=FB,LRECL=80,SPACE=(TRK,(2,2,2))
//SYSPROC  DD  DISP=SHR,DSN=RFO7936.PROCAUTO.OBJCOMP         
//SYSTSPRT DD  SYSOUT=*                                      
//SYSOUT   DD  SYSOUT=*                                      
//SYSTSIN  DD  *                                   
 ISPSTART CMD(%RXOBJCMP LPALIF LQALTAX)            
//SYSUT1   DD DUMMY                               
//SYSUT2   DD DSN=RFO7936.SUPC.OUTP,               
//            DISP=(NEW,CATLG,DELETE),UNIT=DISK,   
//            DCB=(RECFM=FB,LRECL=80,BLKSIZE=3600),
//            SPACE=(CYL,(5,5),RLSE)               
//*

Finally, I'm getting the desired output within minutes by submitting just a JCL *WINK*