Tech Talk #10
Jaspersoft ETL Architecture
Alan Kavanagh & Ernesto Ongaro
Agenda
- ETL Basics
- Jaspersoft ETL Editions
- Architecture
- Installation
- The Job Designer (Talend Studio)
- The Admin Center (TAC)
- Q&A
ETL Basics
[E]xtract
[T]ransform
[L]oad
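The three steps can be sketched in plain Python. This is only an illustration, not Jaspersoft ETL code: the order rows, the field names, and the in-memory SQLite target are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from an OLTP system.
SOURCE_CSV = """order_id,qty,unit_price
1,2,9.50
2,1,20.00
3,5,1.20
"""

def extract(text):
    """Extract: read raw rows from the source (every value is still a string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast types and derive sale_amt = qty * unit_price."""
    out = []
    for row in rows:
        qty = int(row["qty"])
        unit_price = float(row["unit_price"])
        sale_amt = round(qty * unit_price, 2)
        out.append((int(row["order_id"]), qty, unit_price, sale_amt))
    return out

def load(conn, records):
    """Load: write the transformed rows into a reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER, qty INTEGER, unit_price REAL, sale_amt REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # assumed target; any database would do
load(conn, transform(extract(SOURCE_CSV)))
total = conn.execute("SELECT SUM(sale_amt) FROM sales").fetchone()[0]
```

Keeping each step as its own function mirrors how a job is usually laid out: input components, transformation components, output components.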
Variations
- EL: [E]xtract, [L]oad (replica of source systems)
- ELT: [E]xtract, [L]oad, [T]ransform (transformations occur in database)
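A rough sketch of the ELT variation, assuming an in-memory SQLite database as the target and made-up order rows: the data is first copied unchanged into a staging table (E + L), and the transformation then runs inside the database as SQL (T). The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# E + L: copy source rows unchanged into a staging table (hypothetical data).
conn.execute(
    "CREATE TABLE staging_orders (order_day TEXT, qty INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO staging_orders VALUES (?, ?, ?)",
    [("day-1", 2, 10.0), ("day-1", 1, 5.0), ("day-2", 3, 4.0)],
)

# T: the transformation is executed by the database engine itself.
conn.execute(
    "CREATE TABLE sales_per_day AS "
    "SELECT order_day, SUM(qty * unit_price) AS sale_amt "
    "FROM staging_orders GROUP BY order_day"
)

sales_per_day = conn.execute(
    "SELECT order_day, sale_amt FROM sales_per_day ORDER BY order_day"
).fetchall()
```

Stopping after the staging load gives the plain EL case: a replica of the source system with no transformation applied.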
Why ETL?
- Speed - Source systems are usually optimized for writing small bits of data at once (OLTP), not for the large reads that reporting needs
- Complexity - Reporting and analytics directly against source data are usually too complex
- Join multiple systems - Correlate sales and marketing data, for example
- Data Quality - Take machine data and turn it into business data
Types of Transforms
- Selecting only columns you want
- Translating values (1 = male, 2 = female)
- Deriving calculated values (qty * unit_price = sale_amt)
- Joining multiple systems
- Aggregations (sales per day)
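The transform types listed above can be illustrated with a small plain-Python sketch. The rows, field names, and the `customers` lookup (standing in for a second system) are assumptions for the example, not anything generated by the Job Designer.

```python
from collections import defaultdict

# Hypothetical rows from an order system; internal_flag is a column we do not want.
orders = [
    {"order_id": 1, "customer_id": 10, "day": "Mon", "qty": 2, "unit_price": 5.0, "internal_flag": "x"},
    {"order_id": 2, "customer_id": 11, "day": "Mon", "qty": 1, "unit_price": 8.0, "internal_flag": "y"},
    {"order_id": 3, "customer_id": 10, "day": "Tue", "qty": 4, "unit_price": 2.5, "internal_flag": "x"},
]
# Hypothetical second system, keyed by customer_id.
customers = {10: {"gender_code": 1}, 11: {"gender_code": 2}}
GENDER = {1: "male", 2: "female"}

def transform(order):
    customer = customers[order["customer_id"]]           # joining multiple systems
    return {
        "order_id": order["order_id"],                    # selecting only wanted columns
        "day": order["day"],
        "gender": GENDER[customer["gender_code"]],        # translating values
        "sale_amt": order["qty"] * order["unit_price"],   # deriving a calculated value
    }

rows = [transform(order) for order in orders]

# Aggregation: sales per day.
sales_per_day = defaultdict(float)
for row in rows:
    sales_per_day[row["day"]] += row["sale_amt"]
```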
Data Quality:
- Duplicates
- Missing fields
- Standardization
- Linking - fuzzy logic
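A minimal sketch of the four data quality steps, assuming made-up company records. For linking, it uses string similarity from Python's standard `difflib` as a simple stand-in for fuzzy matching; the 0.85 threshold and the "UNKNOWN" default are arbitrary choices for the example.

```python
from difflib import SequenceMatcher

# Hypothetical input records.
records = [
    {"name": "Acme Corp", "country": "us"},
    {"name": "Acme Corp", "country": "us"},       # exact duplicate
    {"name": "ACME  Corp.", "country": " US "},   # same company, written differently
    {"name": "Globex", "country": None},          # missing field
]

def standardize(rec):
    """Fill missing fields with a default and normalize case and whitespace."""
    country = (rec["country"] or "UNKNOWN").strip().upper()
    name = " ".join(rec["name"].split())
    return {"name": name, "country": country}

def dedupe(recs):
    """Drop exact duplicates, keeping the first occurrence."""
    seen, out = set(), []
    for rec in recs:
        key = (rec["name"], rec["country"])
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out

def similar(a, b, threshold=0.85):
    """Link two names when their case-insensitive similarity passes the threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

clean = dedupe([standardize(r) for r in records])
linked = similar(clean[0]["name"], clean[1]["name"])
```

Exact deduplication still leaves "Acme Corp" and "ACME Corp." as separate rows; the similarity check is what flags them as likely the same entity.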
Community vs Commercial
Biggest difference:
Once you design a job in the community edition, you have to export it as a JAR, and from there you're on your own: scheduling, handling failures, and so on are all up to you.
The commercial edition comes with a web app to manage all this.
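To give a feel for what "on your own" means, here is a rough sketch of the kind of wrapper you might end up writing by hand around an exported job: run a command and retry when it exits with an error. The `run_with_retries` helper is hypothetical, and `job_command` is a placeholder that simply starts a Python process; a real setup would put the exported job's actual launch command there.

```python
import subprocess
import sys
import time

def run_with_retries(command, attempts=3, delay_seconds=0.0):
    """Run a command, retrying on a non-zero exit code; return the attempts used."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(command)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay_seconds)  # wait before the next try
    raise RuntimeError(f"job failed after {attempts} attempts: {command}")

# Placeholder command (assumption): replace with the exported job's launch command.
job_command = [sys.executable, "-c", "pass"]
attempts_used = run_with_retries(job_command)
```

Scheduling would still need something external, such as cron, and nothing here covers logging or notifications.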
There are also some differences in the designer (Studio):
- CDC (Change Data Capture)
- Data Viewer
- Versioning/Shared Repo
- Metadata wizards
Architecture: lots of moving pieces!
The pieces:
- Studio: Desktop application for designing jobs (analogous to iReport)
- Admin Center: J2EE Web App for managing jobs, users (analogous to JasperReports Server)
- CmdLine: generates and deploys processes to a JobServer
- Database: Stores internal data, like our repo database
- SVN Server: Code is checked in and out here automatically by Studio and Admin Center
- JobServer(s): Run the actual ETL jobs
Installation
Like JasperReports Server
- Bundled install with Tomcat + H2, JobServer and CommandLine
OR
Jaspersoft ETL Demo
Job Designer
Start Job Server + Command Line
Job workflows (publish)
Q&A
thank you!
Upcoming topics:
- March 19 (GMT-7) OLAP vs Domains
- March 26 (GMT) Linux Installation Tips
http://www.jaspersoft.com/external/jaspersoft-tech-talks