Feb 26, 2018

Personal experience with the behavior of MODIFY RECOVERY on non-GDS LP copies


In the first week of February this year, the Application team asked us to migrate data between two databases on different Db2 subsystems (the source database is on Prod and the target database is on QA), using the backups of the source database created on 27-Dec-2017.

These are the details about our backup strategy in Prod.
  1. All Tablespaces in Prod database are backed up daily (ICTYPE='F' only).
  2. LP (local primary) copies are on DISK and LB (local backup) copies are on TAPE
  3. Previous day's LP datasets are deleted before creating current day's LP backups.
  4. LB datasets are GDS and GDGLIMIT is 45 
These are the details about our MODIFY RECOVERY job in Prod
  1. Job is executed on a quarterly basis.
  2. This is the RETAIN option used in the MODIFY RECOVERY job: RETAIN GDGLIMIT LAST (3)
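For illustration, this is roughly what the quarterly cleanup statement looks like (the tablespace name here is a placeholder, not our real object):

```
MODIFY RECOVERY TABLESPACE DBPROD01.TSORDER01
       RETAIN GDGLIMIT LAST (3)
```

With this form, Db2 uses the GDG limit to decide how many SYSCOPY records to keep when the copies are GDS data sets, and falls back to LAST (3) when they are not.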

I found that all the entries for 27-Dec-2017 were gone from SYSCOPY for the source database in Prod. The MODIFY RECOVERY job had been executed on 6-Jan-2018.
My initial thought was that 27-Dec-2017 falls within the last 45 days, so the entries for the LB backups should still be present in SYSCOPY, as GDGLIMIT is 45.

After some digging, I realized that I had misunderstood the syntax used in our MODIFY RECOVERY job.
In the daily backup setup, LP copies are written to DISK and are non-GDS. The GDGLIMIT keyword only applies when the copies are GDS data sets; because our LP copies are non-GDS, Db2 fell back to LAST (3) and retained only the 3 most recent records in SYSCOPY for the LP and LB copies.

To avoid such issues in the future, we decided to change the syntax in the MODIFY RECOVERY job to DELETE AGE (45). With this new syntax, all SYSCOPY records older than 45 days are deleted, regardless of whether the copies are GDS or non-GDS.
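The revised statement, again with a placeholder tablespace name, looks like this:

```
MODIFY RECOVERY TABLESPACE DBPROD01.TSORDER01
       DELETE AGE (45)
```

AGE-based deletion works off the age of the SYSCOPY record itself, so it is not affected by whether the underlying copy data sets are GDS.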

More details about the MODIFY RECOVERY utility can be found in the IBM Db2 for z/OS documentation.



Jan 16, 2018

Post-migration verification, Redirected Recovery


In my previous blog, I mentioned our concern about whether the data actually got copied from source to target, and also listed the 4 ideas/approaches we came up with for the verification.

After internal team discussion, we decided to implement the below approach.
  • Take backups of the target after the data migration is done, and compare the source’s COPYPAGESF (from SYSCOPY) with the target’s COPYPAGESF (from SYSCOPY) 

This is how I implemented the approach.
This process will be executed after data migration.
  1. Create image copies for the Target database.
  2. Execute a JCL job that has 3 steps
  • First step executes a SELECT query on the source's SYSIBM.SYSCOPY to extract data (based on the ICDATE we used for the data migration) from these columns into a PS file.
DBNAME
TSNAME
ICDATE
ICTIME
DSNAME
DSNUM
COPYPAGESF
  • Second step executes a SELECT query on the target's SYSIBM.SYSCOPY to extract data (for the image copies created just before starting this process) from the same columns into a PS file.
  • Third step executes a REXX routine (which I coded) that compares the PS files generated in the above two steps.
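The first step's query is roughly sketched below; the database name and ICDATE value are illustrative placeholders, not our real ones (ICDATE in SYSCOPY is a CHAR(6) value in YYMMDD form):

```sql
SELECT DBNAME, TSNAME, ICDATE, ICTIME,
       DSNAME, DSNUM, COPYPAGESF
  FROM SYSIBM.SYSCOPY
 WHERE DBNAME = 'DBPROD01'        -- placeholder database name
   AND ICTYPE = 'F'
   AND ICDATE = '171227'          -- YYMMDD for 27-Dec-2017
 ORDER BY TSNAME, DSNUM;
```

The second step runs the same query against the target subsystem's SYSCOPY, filtering on the ICDATE of the image copies created in step 1.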


The approach below was chosen as our backup (fallback) option.
  • Reading the source’s SYSTABLESPACESTATS data into a personal Table on a daily basis, and comparing the data in the personal Table with the target’s real-time stats data (when needed)
This approach is a bit complicated to implement, as we need to capture the RTS details at the time the image copies are taken for the databases.
This approach can be implemented in 2 ways.

  1. Schedule a job that extracts the details from SYSTABLESPACESTATS and loads them into a user-defined Table. This job will be triggered by the image copy jobs. 
          The main concerns with this way of implementation are
  • Elaborate planning is needed, as the schedule of the image copy jobs has to be changed.
  • RTS details may not be of much use for verification if the image copies are not taken with SHRLEVEL REFERENCE

    2. In our Mainframe setup, RTS details are externalized every 30 minutes. So, a job can be executed every 25 minutes that extracts the details from SYSTABLESPACESTATS and loads them into a user-defined Table. 
          The main concerns with this way of implementation are
  • Time constraint. If the extract job runs long due to low priority in the system, we may miss the RTS details.
  • We may collect more details than needed, as RTS details get loaded every 30 minutes.
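Either way, the extract step amounts to an INSERT from the RTS table into the user-defined Table. A sketch, with a hypothetical target table and a placeholder database name:

```sql
INSERT INTO MYUSER.TS_STATS_HIST          -- hypothetical user-defined Table
SELECT DBNAME, NAME, PARTITION,
       TOTALROWS, NACTIVE, UPDATESTATSTIME,
       CURRENT TIMESTAMP                  -- when this extract ran
  FROM SYSIBM.SYSTABLESPACESTATS
 WHERE DBNAME = 'DBPROD01';               -- placeholder database name
```

The UPDATESTATSTIME column is what lets you later line up the extracted rows against the times the image copies were taken.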