Course Details - ilabs Technology Solutions

Application packaging and Sequencing

Course Fee: 2500 INR      Course Duration: Regular - 45 days, Fast Track - 30 days      Lab Timings: 1pm to 2pm


Course Introduction

Application packaging and sequencing


Description

Module I: Introduction to DataStage
• Brief history of DataStage
• Where does DataStage fit within the data warehousing context?
• What are IBM Information Server 8.0 and WebSphere DataStage?
• Where WebSphere DataStage packs fit within the IBM Information Server architecture
• Various versions available

• Introduction to DataStage server components
o Repository
o DataStage server
o DataStage package installer

• Introduction to DataStage client components
o DataStage administrator
o DataStage designer
o DataStage director
o DataStage manager (removed in latest version)

Module II: IBM WebSphere DataStage and QualityStage Designer overview
• How to connect to a project
• The Designer: a quick tour
• IBM Information Server repository
• Developing a job
• Introduction to job properties
• Introduction to job parameters
• Introduction to table definitions
• Importing and exporting from the repository
• Report assistance and documentation tools
• Configuration file editor

Module III: WebSphere DataStage Server Jobs
• Introduction to server jobs
• Handling databases in server jobs
• Handling special characters (# and $)
• Loading tables
• Data type conversion - writing to Oracle
• Data type conversion - reading from Oracle
• Looking up an Oracle table
• Updating an Oracle table
• ODBC stage
• UniVerse stage
• Handling files in server jobs

• Sequential file stage
o How to use sequential file stage
o Defining sequential file input data
o Defining sequential file output data
o How the sequential stage behaves
o Folder stages

• Handling processing stages in server jobs

• Transformer stage
o How to use transformer stage
o Transformer editor components
o The DataStage expression editor
o Transformer stage properties
o Overview of transformer function
o Using transformer as a look up stage

• Aggregator stage
o How to use aggregator stage
o Defining the input column sort order
o Aggregating data

• Merge stage
• Sort stage

Parallel processing in DataStage

Infrastructure as a foundation for data warehousing
• Various hardware and operating systems available
• What are the various platform options?
• Client server architecture for data warehouse
o Various server hardware available
o SMP (symmetric multiprocessing)
o Clusters
o MPP (massively parallel processing)
o ccNUMA or NUMA (cache-coherent non-uniform memory access)

Types of parallel processing in DataStage
• Pipeline parallelism
• Partition parallelism
• Combining pipeline and partition parallelism
• Repartitioning data
• Parallel processing environments
• The configuration file

Types of partitioning in DataStage
• Round robin
• Random
• Same
• Entire
• Hash by field
• Modulus
• Range
• DB2
• Auto
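
To make three of the partitioning methods listed above concrete, here is a minimal sketch in plain Python. It is not DataStage code: the function names (`round_robin`, `hash_by_field`, `modulus`) and the sample rows are hypothetical, and the hash function (CRC32) is only a stand-in for whatever hash the engine actually uses.

```python
from zlib import crc32

def round_robin(rows, n):
    """Deal rows out in turn: row i goes to partition i % n."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_by_field(rows, n, key):
    """Rows with the same key value always land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        h = crc32(str(row[key]).encode("utf-8"))  # stand-in hash (assumption)
        parts[h % n].append(row)
    return parts

def modulus(rows, n, key):
    """Partition on an integer key column: partition = key value % n."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts

# Hypothetical sample data: six rows with an integer id and a region code.
rows = [{"id": i, "region": r} for i, r in enumerate(["N", "S", "N", "E", "S", "N"])]
rr = round_robin(rows, 3)
hp = hash_by_field(rows, 3, "region")
mp = modulus(rows, 3, "id")
```

Round robin gives evenly sized partitions but scatters related rows, while hash and modulus keep rows with equal keys together, which is what key-based stages such as joins and aggregations need.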

Types of collecting in DataStage
• Round robin
• Ordered
• Sorted merge
• Auto
• The mechanics of partitioning and collecting
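
The collecting methods above can be illustrated the same way. The following is a plain Python sketch, not DataStage code; the function names and the sample partitions are hypothetical, and each partition is assumed to be already sorted for the sorted merge case.

```python
import heapq
from itertools import zip_longest

def ordered_collect(parts):
    """Read all rows of partition 0, then all of partition 1, and so on."""
    return [row for part in parts for row in part]

def round_robin_collect(parts):
    """Take one row from each partition in turn until all are empty."""
    missing = object()  # marks exhausted partitions
    out = []
    for group in zip_longest(*parts, fillvalue=missing):
        out.extend(row for row in group if row is not missing)
    return out

def sorted_merge_collect(parts, key=None):
    """Merge partitions that are each already sorted into one sorted stream."""
    return list(heapq.merge(*parts, key=key))

# Hypothetical partitions, each individually sorted.
parts = [[1, 8], [2, 3, 9], [4, 5]]
ordered = ordered_collect(parts)
rr = round_robin_collect(parts)
merged = sorted_merge_collect(parts)
```

Only the sorted merge preserves a global sort order, and only when every input partition is itself sorted on the same key.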

WebSphere DataStage parallel jobs
• Introduction to DataStage parallel jobs
• Difference between a passive stage and an active stage

• Handling metadata in DataStage
o Runtime column propagation (RCP)
o Table definitions
o Schema files and partial schemas
o Data types
o Date and time formats
o Complex data types

• Handling Oracle enterprise stage in parallel jobs
• Handling special characters (# and $)
• Loading tables
• Type conversions - writing to Oracle
• Updating an Oracle database
• Deleting rows from an Oracle database
• Loading an Oracle database
• Reading an Oracle database
• Performing a direct lookup on an Oracle database table
• Using SQL builder
• Handling transformer stage in parallel jobs
• How it is different from server transformer stage
• Creating and deleting columns
• Handling null values
• Defining constraints and handling otherwise links
• Specifying link order
• Defining local stage variables
• What is a BASIC transformer stage
• Transformer functions
• Combining data in DataStage parallel jobs
• horizontal and vertical combining

• Join stage
o Inner
o Left outer
o Right outer
o Full outer

• Lookup stage
• Merge stage
• Comparison between Join, Merge, and Lookup stages
• Partitioning in reference links
• Aggregator stage

• Funnel stage
o Funnel mode
o Sort funnel mode
o Sequence

Some more useful stages in DataStage parallel jobs
o Sort stage
- Sequential sort
- Parallel sort
- Total sort
- Partitioning requirement
o Remove duplicates stage
o Modify stage
- Dropping and keeping columns
- Changing data type
- Null handling

o Pivot stage
- Limitations in pivot stage
o Copy stage
o Filter stage
o External filter stage
o Switch stage
o Compress stage
o Expand stage
o Encode stage
o Decode stage
o FTP enterprise stage
o Generic stage
o Surrogate key generator stage
o SAS stage

Capturing changes in DataStage parallel jobs
o Change capture stage
o Change apply stage
o Difference stage
o Compare stage
o Slowly changing dimension stage
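
As a rough illustration of the idea behind change capture, the sketch below compares a "before" and an "after" data set on a key column and labels each difference. It is plain Python, not DataStage code: the function name `capture_changes`, the text labels, and the sample rows are assumptions made for this example, and unchanged rows are simply skipped.

```python
def capture_changes(before, after, key):
    """Compare two row sets on a key column and label each difference."""
    before_by_key = {row[key]: row for row in before}
    after_by_key = {row[key]: row for row in after}
    changes = []
    # Rows in the after set are either new, edited, or unchanged.
    for k, new in after_by_key.items():
        old = before_by_key.get(k)
        if old is None:
            changes.append(("insert", new))
        elif old != new:
            changes.append(("edit", new))
    # Rows only in the before set have been deleted.
    for k, old in before_by_key.items():
        if k not in after_by_key:
            changes.append(("delete", old))
    return changes

# Hypothetical sample data.
before = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
after = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}, {"id": 4, "name": "d"}]
changes = capture_changes(before, after, "id")
```

A change apply step would then replay these labeled rows against the before set to reproduce the after set.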

Handling develop / debug stages in DataStage parallel jobs
o Head stage
- Head stage default behavior
- Skipping data
o Tail stage
o Sample stage
o Peek stage
o Row generator stage
- How to specify data to be generated
- Generating data in parallel

o Column generator stage
o Write range map stage
- How to perform range lookup in DataStage

Handling restructure stages in DataStage parallel jobs
o Column import stage
o Column export stage
o Make subrecord stage
o Split subrecord stage
o Combine records stage
o Promote subrecord stage
o Make vector stage
o Split vector stage

Handling XML file in DataStage parallel jobs
- Introduction to XML files
- Using the XML metadata importer
- Using XML input stage
- Validating documents and schemas
- Processing namespaces
- Supported XPath expressions

Using XML output stage
- Processing namespaces
- Supported XPath expressions
- Aggregating input rows on output
- Writing output to your file system
- Processing NULLS and empty values
- How repetition paths work

Using XML transformer stage

Optimizing performance in server and parallel jobs
WebSphere DataStage jobs and processes
Interpreting performance statistics in server jobs
Improving performance in server jobs
- CPU limited jobs single processor systems
- CPU limited jobs multiprocessor systems
- I/O limited jobs
- Hashed file stages
- Hashed file design

Inter-process stages in server jobs
Link collector stages in server jobs
Link partitioner stages in server jobs

Job design tips in parallel jobs
- Processing large volumes of data
- Modular development
- Designing for good performance
- Database sparse lookup vs. join

Improving performance in parallel jobs
- Understanding a flow
- Performance monitoring
- Resolving bottlenecks
- Ensuring data is evenly partitioned

Programming in DataStage
Introduction to programming components
Routines

- Transform functions
- Before/after subroutines
- Custom UniVerse functions
- ActiveX (OLE) functions
- Subroutines
- Creating a routine
- Defining custom transforms
Transforms
Macros
Precedence rules
BASIC programming

Built-in transforms and routines

Handling web services in DataStage
Introduction to web services technologies
Encoding requests and responses
Using the SOAP framework
Publishing web service operations
Accessing web services
What is the web service pack
Using the web service metadata importer
Using the web services transformer stage
Using the web services client stage
Creating web service routines
How to expose a DataStage job as a web service
Using IBM information console

Job scheduling using job sequences in DataStage

Creating a job sequence
- Overview of activity stages
- Triggers
- Expressions
- Job activity properties
- Routine activity properties
- Email notification activity properties
- Wait for file activity properties
- Exception activity properties
- Nested condition activity properties
- Start loop activity properties
- End loop activity properties
- User variables activity properties
- Compiling and restarting the job sequence

Some advanced concepts in DataStage
- Achieving reusability in DataStage using containers
- Types of containers
- Local containers
- Server shared containers
- Parallel shared containers
- Creating shared containers
- Using shared containers in DataStage jobs
- Converting shared containers to local containers
- Deconstruction of shared containers
- Specifying our own parallel stage
- Defining custom stage
- Defining build stage
- Build stage macros
- Defining wrapper stage
- Usage of administrator client in DataStage
- Adding environment variables
- Setting job parameters default values
- Changing license details
- Handling projects
- Buffer settings in DataStage
- Multiple instances of jobs in DataStage
- DataStage job control utility
- Jobs: compilation, execution, and checking of logs using DataStage tools
- Handling multilingual data in DataStage
- How to enable NLS on DataStage
- Orchestrate architecture and commands
- Orchestrate parallel processing framework in DataStage
- Orchestrate utility in DataStage
- Surrogate key generation using DataStage
- Version control in DataStage