Data Governance
Data Circles Foundation (مؤسسة دوائر البيانات)
Data Governance (DG)
- Data governance (DG) is the overall management of the availability, usability, integrity, and security of data used in an enterprise
- Businesses benefit from data governance because it ensures data is consistent and trustworthy
Related Subjects
- Data governance (DG) is a general term that covers multiple subjects such as:
- Data Catalog
- ETL procedures: extracting, transforming, and loading data
- Data Analysis and Processing
- Stream Processing for real time applications
- Batch Processing for offline applications
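As a rough illustration of the ETL item above, here is a minimal batch sketch using only the Python standard library. The sample data, the `customers` table, and the function names are all assumptions made for this example; they do not come from any particular DG tool.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """id,name,country
1, Alice ,sa
2,Bob,US
3,,de
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, upper-case country codes, drop rows with no name."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue  # reject incomplete records
        cleaned.append((int(row["id"]), name, row["country"].strip().upper()))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run the three steps against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT id, name, country FROM customers ORDER BY id"
    ).fetchall()
```

Calling `run_etl()` on the sample input keeps the two complete rows and drops the one with a missing name. A streaming variant would apply the same transform to each record as it arrives instead of to a whole file at once.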
DG Implementation
- The owners or custodians of the data assets in the enterprise must be defined (data stewardship).
- Processes must then be defined to effectively cover how the data will be stored, archived, backed up and protected from theft or attacks.
- A set of standards and procedures must be developed that defines how the data is to be used by authorized personnel
- A set of controls and audit procedures must be put into place that ensures ongoing compliance with internal data policies and external government regulations, and that guarantees data is used in a consistent manner across multiple enterprise applications
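The steps above (named stewards, usage rules for authorized personnel, and audit controls) can be sketched in a few lines. This is only an illustration under simple assumptions: the `DataAsset`, `AuditLog`, and `request_access` names are hypothetical, and a real deployment would rely on the organization's own identity and policy systems.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataAsset:
    name: str
    steward: str  # owner or custodian of the asset
    allowed_roles: set = field(default_factory=set)  # who may use the data


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, user, asset, action, allowed):
        """Store every access decision so compliance can be reviewed later."""
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "asset": asset,
            "action": action,
            "allowed": allowed,
        })


def request_access(user, role, asset, action, log):
    """Allow the action only for authorized roles, and audit the decision."""
    allowed = role in asset.allowed_roles
    log.record(user, asset.name, action, allowed)
    return allowed
```

Both granted and denied requests are logged, since an audit trail that records only successes cannot show attempted misuse.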
DG Teams
- Teams of data stewards act as a communication channel between the IT department and the business side of an organization.
- They typically include:
- Database administrators
- Business analysts
- Data architects
- Business intelligence developers
- Extract, transform and load (ETL) designers
- Business data owners
DG implementation
- Data Quality
- Data scrubbing, also known as data cleansing
- Master data management (MDM)
- Metadata repositories, which hold data about data
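To make data scrubbing and metadata more concrete, here is a small sketch in plain Python. The record layout (`name`, `email`) and the `scrub` and `describe` helpers are assumptions for illustration only, not part of any specific product.

```python
def scrub(records):
    """Data scrubbing: normalize emails, drop rows with no email, remove duplicates."""
    seen = set()
    clean = []
    for rec in records:
        email = (rec.get("email") or "").strip().lower()
        if not email or email in seen:
            continue  # skip incomplete or duplicate records
        seen.add(email)
        clean.append({"name": (rec.get("name") or "").strip(), "email": email})
    return clean


def describe(name, records, owner):
    """Build a metadata entry (data about the data) for a metadata repository."""
    columns = sorted({key for rec in records for key in rec})
    return {
        "dataset": name,
        "owner": owner,
        "row_count": len(records),
        "columns": columns,
    }
```

In this sketch the first occurrence of each email is treated as the record to keep. Master data management tools usually apply richer matching and merge rules than this.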
Who Needs DG
- Governmental institutions
- Example:
- The European Union's (EU's) General Data Protection Regulation (GDPR) is an example of a use case for data governance.
- Enterprises with huge amounts of data
DG Open Source Solutions
- Kylo
- https://kylo.io/
- Apache Atlas
Kylo Solution
- Kylo is an open source, enterprise-ready data lake management software platform for self-service data ingest and data preparation, with integrated metadata management, governance, security, and best practices.
- Ingest: Self-service data ingest with data cleansing, validation, and automatic profiling.
- Prepare: Wrangle data with visual SQL and an interactive transform through a simple user interface.
- Discover: Search and explore data and metadata, view lineage, and profile statistics.
- Monitor: Monitor health of feeds and services in the data lake. Track SLAs and troubleshoot performance.
- Design: Design batch or streaming pipeline templates in Apache NiFi and register with Kylo to enable user self-service.
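To show what "profiling" means in the Ingest and Discover steps, here is a generic sketch that computes per-column statistics in plain Python. It does not use Kylo's API and is not how Kylo itself is implemented. The `profile` function and its output layout are assumptions for illustration.

```python
def profile(rows):
    """Return per-column statistics: total values, nulls, and distinct non-null values."""
    stats = {}
    for row in rows:
        for column, value in row.items():
            col = stats.setdefault(column, {"count": 0, "nulls": 0, "values": set()})
            col["count"] += 1
            if value is None or value == "":
                col["nulls"] += 1  # treat empty strings as missing
            else:
                col["values"].add(value)
    return {
        column: {"count": c["count"], "nulls": c["nulls"], "distinct": len(c["values"])}
        for column, c in stats.items()
    }
```

Statistics like these help a data steward spot columns with many missing values or unexpected cardinality before a feed is made available to users.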
Kylo Architecture
Why Kylo
- Speed to market: Kylo can accelerate your big data efforts, helping your program stay ahead of the competition.
- Growth through innovation: Using Kylo, the prioritized use cases you select will help deliver business value and new opportunities across your company.
- Improved quality, security and governance
- Cost reduction: Kylo can help your organization build custom engineered data lakes at a fraction of the typical cost.
Who uses Kylo?
- Airline: 2 companies of top 15 global brands
- Telecommunications: 2 companies of top 10 European brands
- Banking: 2 companies of top 5 global brands
- Insurance: 2 companies of top 10 US brands
- Financial Services: 1 company of top 5 global brands
- Retail and Consumer Goods: 2 companies of top 10 global brands
Teradata Company
- US-based company
- Started developing the product years ago
- Now it's open source.
- Provides support and training for Kylo's customers
Thanks
Data Governance
By abshammeri