Data Governance

Data Circles Organization

Data Governance (DG)

  • Data governance (DG) is the overall management of the availability, usability, integrity, and security of the data used in an enterprise.
  • Businesses benefit from data governance because it ensures that data is consistent and trustworthy.

Related Subjects

  • Data governance (DG) is a general term that covers multiple subjects, such as:
    • Data Catalog
    • ETL procedures: extracting, transforming and loading data
    • Data Analysis and Processing
      • Stream processing for real-time applications
      • Batch processing for offline applications
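
As an illustration of the ETL item above, here is a minimal batch-style sketch using only the Python standard library. The sample data, the `payments` table, and the function names are assumptions made for this example; a real pipeline would read from actual source systems and load into a governed store.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file or an API.
RAW_CSV = """id,name,amount
1, Alice ,10.50
2,Bob,n/a
3, Carol ,7.25
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim names, convert amounts, skip rows that fail conversion."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows with a non-numeric amount
        clean.append((int(row["id"]), row["name"].strip(), amount))
    return clean

def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()

def run_etl(text=RAW_CSV):
    """Run the three steps against an in-memory database and return the rows."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT id, name, amount FROM payments ORDER BY id"
    ).fetchall()
```

This processes the whole input in one pass, which corresponds to the batch (offline) case; a streaming variant would apply the same transform to each record as it arrives.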

DG Implementation

  • The owners or custodians of the data assets in the enterprise must be defined (data stewardship).
  • Processes must then be defined to cover how the data will be stored, archived, backed up, and protected from theft or attacks.
  • A set of standards and procedures must be developed that defines how the data is to be used by authorized personnel.
  • A set of controls and audit procedures must be put in place to ensure ongoing compliance with internal data policies and external government regulations, and to guarantee that data is used consistently across multiple enterprise applications.
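
The steps above can be sketched in code as a small registry that records who owns each data asset, which roles may use it, and an audit trail of every access attempt. This is only an illustrative sketch: the class names, the `retention_days` field, and the role names are hypothetical and not taken from any specific governance product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DataAsset:
    name: str
    steward: str                       # owner or custodian of the asset
    allowed_roles: set = field(default_factory=set)
    retention_days: int = 365          # illustrative storage/archival rule

@dataclass
class GovernanceRegistry:
    assets: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def register(self, asset):
        """Record an asset together with its steward and usage rules."""
        self.assets[asset.name] = asset

    def request_access(self, user, role, asset_name):
        """Return True if the role may use the asset; log every attempt."""
        asset = self.assets.get(asset_name)
        granted = asset is not None and role in asset.allowed_roles
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "asset": asset_name,
            "granted": granted,
        })
        return granted

    def denied_attempts(self):
        """Audit helper: list the access attempts that were refused."""
        return [entry for entry in self.audit_log if not entry["granted"]]
```

For example, after `registry.register(DataAsset("customers", steward="sales-ops", allowed_roles={"analyst"}))`, a call to `registry.request_access("omar", "intern", "customers")` is refused and shows up in `registry.denied_attempts()`, which is the kind of record an audit procedure would review.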

DG Teams

  • Teams of data stewards act as a communication channel between the IT department and the business side of an organization.
  • Such a team includes:
    • Database administrators
    • Business analysts
    • Data architects
    • Business intelligence developers
    • Extract, transform and load (ETL) designers
    • Business data owners

DG Implementation (continued)

  • Data Quality
    • Data scrubbing, also known as data cleansing
  • Master data management (MDM)
    • Metadata repositories, which hold data about data
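
To make data scrubbing concrete, the sketch below normalizes, validates, and de-duplicates a list of customer records. The record fields (`name`, `email`) and the deliberately simple email pattern are assumptions for illustration; real cleansing rules depend on the data and on the organization's quality standards.

```python
import re

# Deliberately simple pattern for illustration; not a full email validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def scrub(records):
    """Normalize, validate, and de-duplicate customer records.

    Returns a (clean, rejected) pair of lists.
    """
    clean, rejected, seen = [], [], set()
    for rec in records:
        # Collapse repeated whitespace and normalize capitalization.
        name = " ".join(rec.get("name", "").split()).title()
        email = rec.get("email", "").strip().lower()
        if not name or not EMAIL_RE.match(email):
            rejected.append(rec)   # keep invalid rows for later review
            continue
        if email in seen:
            continue               # duplicate of a record already kept
        seen.add(email)
        clean.append({"name": name, "email": email})
    return clean, rejected
```

Rejected rows are returned rather than silently dropped, so that a data steward can inspect and correct them at the source.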

Who Needs DG

  • Governmental institutions
    • Example:
      • The European Union's (EU's) General Data Protection Regulation (GDPR) is an example of a use case for data governance.
  • Enterprises with huge amounts of data

DG Open Source Solutions

  • Kylo
    • https://kylo.io/
  • Apache Atlas

Kylo Solution

  • Kylo is an open source enterprise-ready data lake management software platform for self-service data ingest and data preparation with integrated metadata management, governance, security and best practices
    • Ingest: Self-service data ingest with data cleansing, validation, and automatic profiling.
    • Prepare: Wrangle data with visual SQL and an interactive transform through a simple user interface.
    • Discover: Search and explore data and metadata, view lineage, and profile statistics.
    • Monitor: Monitor health of feeds and services in the data lake. Track SLAs and troubleshoot performance.
    • Design: Design batch or streaming pipeline templates in Apache NiFi and register with Kylo to enable user self-service.
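
To give a sense of what "profiling" in the list above means, here is a generic sketch that computes a few per-column statistics over a small set of rows. It is not Kylo's API or implementation; the `profile` function and the chosen statistics are assumptions made purely for illustration.

```python
from collections import Counter

def profile(rows):
    """Compute simple per-column statistics for a list of dict rows."""
    # Gather the values of each column.
    columns = {}
    for row in rows:
        for col, value in row.items():
            columns.setdefault(col, []).append(value)

    stats = {}
    for col, values in columns.items():
        # Treat None and empty strings as missing values.
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        stats[col] = {
            "count": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return stats
```

Statistics like these help users judge whether a newly ingested feed is complete and plausible before they build on it.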

Kylo Architecture

Why Kylo

  • Speed to market
    • Kylo can accelerate your big data efforts, helping your program stay ahead of the competition.
  • Growth through innovation
    • Using Kylo, the prioritized use cases you select will help deliver business value and new opportunities across your company.
  • Improved quality, security and governance
  • Cost reduction
    • Kylo can help your organization build custom engineered data lakes at a fraction of the typical cost.

Who uses Kylo?

  • Airlines: 2 of the top 15 global brands
  • Telecommunications: 2 of the top 10 European brands
  • Banking: 2 of the top 5 global brands
  • Insurance: 2 of the top 10 US brands
  • Financial Services: 1 of the top 5 global brands
  • Retail and Consumer Goods: 2 of the top 10 global brands

Teradata Company

  • US-based company
  • Started developing Kylo years ago
  • Kylo is now open source.
  • Provides support and training for Kylo's customers

Thanks

Data Governance

By abshammeri
