Learn how to create ETL transformations to populate a star schema in a short span of time
Create a fully-functional ETL process using a practical approach
Follow the step-by-step instructions for creating an ETL based on a fictional company – get your hands dirty and learn fast
Create a star schema
Populate and maintain slowly changing dimensions type 1 and type 2
Load fact and dimension tables in an efficient manner
Use a columnar database to store the data for the star schema
Analyze the quality of the data in an agile manner
Implement logging and scheduling for the ETL process
Get an overview of the whole process: from source data to the end user analyzing the data
Learn how to auto-generate data for a date dimension (see the sketch after this list)
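To give a flavour of what auto-generating a date dimension involves, here is a minimal sketch in plain Python. The table layout and column names (date_key, full_date, day_of_week, and so on) are illustrative assumptions only; in the course the same rows are produced with Pentaho Data Integration steps rather than hand-written code.

```python
from datetime import date, timedelta

def generate_date_dimension(start, end):
    """Yield one dictionary per calendar day between start and end (inclusive)."""
    current = start
    while current <= end:
        yield {
            "date_key": int(current.strftime("%Y%m%d")),   # surrogate key such as 20240131
            "full_date": current.isoformat(),
            "year": current.year,
            "month": current.month,
            "day": current.day,
            "day_of_week": current.strftime("%A"),
        }
        current += timedelta(days=1)

if __name__ == "__main__":
    for row in generate_date_dimension(date(2024, 1, 1), date(2024, 1, 3)):
        print(row)
```

Each generated row becomes one record in the date dimension, and the numeric date_key is what the fact table later references.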
Companies store a lot of data, but in most cases it is not available in a format that analysis and reporting tools can easily consume. Ralph Kimball realized this a long time ago, so he paved the way for the star schema.
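At its core, a star schema is a central fact table surrounded by dimension tables, joined through surrogate keys. The following minimal sketch uses Python with an in-memory SQLite database and invented table names (dim_customer, fact_sales) purely for illustration; the course builds a richer schema for the fictional company and stores it in a column-oriented database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,   -- surrogate key
        customer_name TEXT,
        city          TEXT
    );
    CREATE TABLE fact_sales (
        sales_key    INTEGER PRIMARY KEY,
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        quantity     INTEGER,
        amount       REAL
    );
""")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Alice', 'Berlin')")
conn.execute("INSERT INTO fact_sales VALUES (1, 1, 2, 19.90)")

# Reporting queries join the central fact table to its dimensions.
for city, total in conn.execute("""
    SELECT d.city, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer d ON d.customer_key = f.customer_key
    GROUP BY d.city
"""):
    print(city, total)
```

Keeping descriptive attributes in the dimensions and measures in the fact table is what keeps reporting queries like the one above simple and fast.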
Building a Data Mart with Pentaho Data Integration walks you through the creation of an ETL process that builds a data mart for a fictional company. The course shows you, step by step, how to source the raw data and prepare it for the star schema. Its practical approach gets you up and running quickly and explains the key concepts in an easy-to-understand manner.
Building a Data Mart with Pentaho Data Integration teaches you how to source raw data with Pentaho Kettle and transform it into a Kimball-style star schema. After sourcing the raw data with our ETL process, you will quality-check the data using an agile approach. Next, you will learn how to load the slowly changing dimensions and the fact table. Because the star schema resides in a column-oriented database, you will learn about bulk-loading the data whenever possible. You will also learn how to create an OLAP schema and easily analyze the output of your ETL process.
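As a taste of what loading a slowly changing dimension of type 2 means, the sketch below shows the core idea in plain Python: the current version of a changed record is expired and a new version is appended with fresh validity dates. Field names such as natural_key, valid_from, and valid_to are assumptions made for this example; in the course the equivalent work is done with Pentaho Data Integration's dimension-handling steps rather than hand-written code.

```python
from datetime import date

FAR_FUTURE = date(9999, 12, 31)   # "open" end date marking the current version

def apply_scd2(dimension, incoming, load_date, next_key):
    """Expire the current version of a changed record and append a new one."""
    current = next(
        (row for row in dimension
         if row["natural_key"] == incoming["natural_key"]
         and row["valid_to"] == FAR_FUTURE),
        None,
    )
    if current is not None and current["city"] == incoming["city"]:
        return next_key                      # nothing changed, keep the current version
    if current is not None:
        current["valid_to"] = load_date      # close the old version
    dimension.append({
        "surrogate_key": next_key,
        "natural_key": incoming["natural_key"],
        "city": incoming["city"],
        "valid_from": load_date,
        "valid_to": FAR_FUTURE,
    })
    return next_key + 1

dim = [{"surrogate_key": 1, "natural_key": "C001", "city": "Berlin",
        "valid_from": date(2020, 1, 1), "valid_to": FAR_FUTURE}]
apply_scd2(dim, {"natural_key": "C001", "city": "Hamburg"}, date(2024, 6, 1), next_key=2)
print(dim)   # the Berlin row is now closed; a Hamburg row is valid from 2024-06-01
```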
The course covers all the essential topics in a hands-on approach, putting you in a position to create your own ETL processes within a short span of time.
Packt video courses are designed to cover the breadth of the topic in short, hands-on, task-based videos. Each course is divided into short manageable sections, so you can watch the whole thing or jump to the bit you need. The focus is on practical instructions and screencasts showing you how to get the job done.
Follow carefully organized sequences of instructions that show how to leverage the power of Pentaho Data Integration in a simple and practical way.
Getting Started
The Second-hand Lens Store
The Derived Star Schema
Setting up Our Development Environment
Agile BI – Creating ETLs to Prepare Joined Data Set
Importing Raw Data
Exporting Data Using the Standard Table Output
Exporting Data Using a Dedicated Bulk Loader
Agile BI – Building OLAP Schema, Analyzing Data, and Implementing Required ETL Improvements
Creating a Pentaho Analysis Model
Analyzing Data Using Pentaho Analyzer
Improving Your ETL for Better Data Quality
Slowly Changing Dimensions
Creating a Slowly Changing Dimension of Type 1 Using Insert/Update
Creating a Slowly Changing Dimension of Type 1 Using Dimension Lookup/Update
Creating a Slowly Changing Dimension Type 2
Populating the Date Dimension
Defining Start and End Date Parameters
Auto-generating Daily Rows for a Given Period
Auto-generating Year, Month, and Day
Creating the Fact Transformation
Sourcing Raw Data for Fact Table
Looking Up the Slowly Changing Dimension Type 1 Key
Looking Up the Slowly Changing Dimension Type 2 Key
Orchestration
Loading Dimensions in Parallel
Creating Master Jobs
ID-based Change Data Capture
Implementing Change Data Capture (CDC)
Creating a CDC Job Flow
Final Touches: Logging and Scheduling
Setting up a Dedicated DB Schema
Setting up Built-in Logging
Scheduling on the Command Line