Cyberak

Analyse en vue d'une refactorisation

Contexte

  • 1: Le scope VAULT_EVD1 ( par la suite VAULT_API1)
  • 2: Entrées et sorties doivent dans la mesure du possible rester identiques
  • 3: Limiter le nombre de schema (ii_collect, ii_custom, ii_protec) --> (ii_collect, ii_protec)
  • 4: Limiter l'usage des vues (revoir les vue de ii_custom)
  • 5: Refactoriser le process de post process -> amelioration de la performance

Stats de la pipeline obtenues à partir du collecteur de statistiques dans un contexte de prod.

--1st step à faire qu'une fois:
SET track_functions = 'all';

--Boucle sur chaque fonction/ a chaque fois lancer une instruction à la fois:
--1st step:
SELECT pg_stat_reset();
--2nd step:
SELECT ii_protec.sp_ipc_evd_ref_import_cyberark_data('212af752-0ee2-11eb-b47a-0242ac120004'::uuid);
--3rd step:
SELECT * from pg_stat_user_functions;

Voir comment automatiser la collecte de statistiques par étapes afin d'obtenir des métriques et repérer des anomalies.

Création d'une table contenant les logs des métriques ?

Premier problème soulevé

L'accumulation des vues

  • Le schéma ii_collect contient les vues liant tous les vaults par type de table importée.
  • Différentes vues sont construites dans le schéma ii_protec à partir de ces dernières :
  1. ii_protec.v_ipc_collect_evd_safe
  2. ii_protec.v_ipc_collect_evd_safe_modified

 

--
-- Type: VIEW ; Name: v_ipc_collect_evd_safe; Owner: postgres
--

CREATE OR REPLACE VIEW ii_protec.v_ipc_collect_evd_safe AS
 SELECT collect_safe.fk_ref_vault AS fk_ref_vault_id,
    collect_safe."CASSafeID" AS safe_id,
    collect_safe."CASSafeName" AS safe_name,
    collect_safe."CASLocationID" AS location_id,
    collect_safe."CASLocationName" AS location_name,
    collect_safe."CASCreationDate" AS creation_datetime,
    collect_safe."CASCreatedBy" AS created_by,
    collect_safe."CASLastUsed" AS last_used,
    collect_safe."CASSize" AS safe_size,
    collect_safe."CASMaxSize" AS max_size,
    collect_safe."CASUsedSize" AS used_size,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASVirusFree"::text) AS is_virus_free,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASTextOnly"::text) AS is_text_only,
    collect_safe."CASAccessLocation" AS access_location,
    collect_safe."CASSecurityLevel" AS security_level,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASRequireReasonToRetrieve"::text) AS require_reason_to_retrieve,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASEnforceExclusivePasswords"::text) AS enforce_exclusive_passwords,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASRequireContentValidation"::text) AS require_content_validation,
    collect_safe."CASLogRetentionPeriod" AS log_retention_period,
    collect_safe."CASObjectsRetentionPeriod" AS objects_retention_period,
    collect_safe."CASRequestsRetentionPeriod" AS requests_retention_period,
    collect_safe."CASDelay" AS safe_delay,
    collect_safe."CASFromHour" AS from_hour,
    collect_safe."CASToHour" AS to_hour,
    collect_safe."CASDailyVersions" AS daily_versions,
    collect_safe."CASMonthlyVersions" AS monthly_versions,
    collect_safe."CASYearlyVersions" AS yearly_versions,
    collect_safe."CASNumberOfPasswordVersions" AS number_of_password_versions,
    collect_safe."CASShareOptions" AS share_options,
    collect_safe."CASConfirmersCount" AS confirmers_count,
    collect_safe."CASConfirmType" AS confirm_type,
    collect_safe."CASDefaultAccessMarks" AS default_access_marks,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASDefaultFileCompression"::text) AS has_default_file_compression,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASDefaultReadOnly"::text) AS is_default_readonly,
    collect_safe."CASQuotaOwner" AS quota_owner,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASUseFileCategories"::text) AS use_file_categories
   FROM ii_custom.v_collect_evd_safe collect_safe;


ALTER VIEW ii_protec.v_ipc_collect_evd_safe OWNER TO postgres;

--
-- Type: VIEW ; Name: v_ipc_collect_evd_safe_modified; Owner: postgres
--

CREATE OR REPLACE VIEW ii_protec.v_ipc_collect_evd_safe_modified AS
 SELECT collect_safe.fk_ref_vault AS fk_ref_vault_id,
    collect_safe."CASSafeID" AS safe_id,
    collect_safe."CASSafeName" AS safe_name,
    collect_safe."CASLocationID" AS location_id,
    collect_safe."CASLocationName" AS location_name,
    collect_safe."CASCreationDate" AS creation_datetime,
    collect_safe."CASCreatedBy" AS created_by,
    collect_safe."CASLastUsed" AS last_used,
    collect_safe."CASSize" AS safe_size,
    collect_safe."CASMaxSize" AS max_size,
    collect_safe."CASUsedSize" AS used_size,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASVirusFree"::text) AS is_virus_free,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASTextOnly"::text) AS is_text_only,
    collect_safe."CASAccessLocation" AS access_location,
    collect_safe."CASSecurityLevel" AS security_level,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASRequireReasonToRetrieve"::text) AS require_reason_to_retrieve,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASEnforceExclusivePasswords"::text) AS enforce_exclusive_passwords,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASRequireContentValidation"::text) AS require_content_validation,
    collect_safe."CASLogRetentionPeriod" AS log_retention_period,
    collect_safe."CASObjectsRetentionPeriod" AS objects_retention_period,
    collect_safe."CASRequestsRetentionPeriod" AS requests_retention_period,
    collect_safe."CASDelay" AS safe_delay,
    collect_safe."CASFromHour" AS from_hour,
    collect_safe."CASToHour" AS to_hour,
    collect_safe."CASDailyVersions" AS daily_versions,
    collect_safe."CASMonthlyVersions" AS monthly_versions,
    collect_safe."CASYearlyVersions" AS yearly_versions,
    collect_safe."CASNumberOfPasswordVersions" AS number_of_password_versions,
    collect_safe."CASShareOptions" AS share_options,
    collect_safe."CASConfirmersCount" AS confirmers_count,
    collect_safe."CASConfirmType" AS confirm_type,
    collect_safe."CASDefaultAccessMarks" AS default_access_marks,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASDefaultFileCompression"::text) AS has_default_file_compression,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASDefaultReadOnly"::text) AS is_default_readonly,
    collect_safe."CASQuotaOwner" AS quota_owner,
    ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASUseFileCategories"::text) AS use_file_categories,
    false AS is_deleted_from_source
   FROM ii_custom.v_collect_evd_safe collect_safe
EXCEPT
 SELECT ref_safe.fk_ref_vault_id,
    ref_safe.safe_id,
    ref_safe.safe_name,
    ref_safe.location_id,
    ref_safe.location_name,
    ref_safe.creation_datetime,
    ref_safe.created_by,
    ref_safe.last_used,
    ref_safe.safe_size,
    ref_safe.max_size,
    ref_safe.used_size,
    ref_safe.is_virus_free,
    ref_safe.is_text_only,
    ref_safe.access_location,
    ref_safe.security_level,
    ref_safe.require_reason_to_retrieve,
    ref_safe.enforce_exclusive_passwords,
    ref_safe.require_content_validation,
    ref_safe.log_retention_period,
    ref_safe.objects_retention_period,
    ref_safe.requests_retention_period,
    ref_safe.safe_delay,
    ref_safe.from_hour,
    ref_safe.to_hour,
    ref_safe.daily_versions,
    ref_safe.monthly_versions,
    ref_safe.yearly_versions,
    ref_safe.number_of_password_versions,
    ref_safe.share_options,
    ref_safe.confirmers_count,
    ref_safe.confirm_type,
    ref_safe.default_access_marks,
    ref_safe.has_default_file_compression,
    ref_safe.is_default_readonly,
    ref_safe.quota_owner,
    ref_safe.use_file_categories,
    ref_safe.is_deleted_from_source
   FROM ii_protec.ipc_evd_ref_safe ref_safe;


ALTER VIEW ii_protec.v_ipc_collect_evd_safe_modified OWNER TO postgres;

Problème avec les vues : elles sont recalculés à chaque requête.

Ces trois vues sont utilisées dans la step_1.ii_protec.sp_ipc_evd_ref_import_cyberark_data(uuid)

On utilise donc trois vues et à chaque SELECT on recalcule les différents castings (notemment boolean, timestamp, ect)

Solution proposée

  1. Commenter toutes les tables des différents vaults avec leur uuid associé.
  2. Créer une table temporaire à partir de la table du vault souhaité.
  3. La table n'est pas journalisée et sera supprimée à la fin de la transaction.
  4. Tout est calculé une seule fois, pas besoin de caster les valeurs à chaque SELECT.
  5. Le défaut principal : usage des requêtes dynamic SQL pour la création des tables.
COMMENT ON TABLE ii_collect.vault1_evd_casafes IS '212af752-0ee2-11eb-b47a-0242ac120004';

do $$
DECLARE
var_relation regclass;
BEGIN

SELECT ('ii_collect.'||relname)::regclass FROM pg_class WHERE obj_description(oid) = '212af752-0ee2-11eb-b47a-0242ac120004' and relname ~ 'casafes'
into var_relation;
EXECUTE format('CREATE TEMP UNLOGGED TABLE  ii_custom.test_safe_table as SELECT
    -- II local values
    now() as "__insert_datetime", now() as "__lastmodif_datetime", ''Import Process'' as "__modif_user_login", FALSE as is_deleted_from_source,
    -- Safe references
    ''212af752-0ee2-11eb-b47a-0242ac120004''::uuid as fk_ref_vault,
    vault."CASSafeID" as safe_id,
    vault."CASSafeName" as safe_name,
    vault."CASLocationID" as location_id,
    vault."CASLocationName" as location_name,
    ii_custom.f_ipc_cast_text_as_timestamp(((vault."CASCreationDate")::text)::character varying, ''212af752-0ee2-11eb-b47a-0242ac120004''::uuid) AS creation_datetime,
    vault."CASCreatedBy" as created_by,
    ii_custom.f_ipc_cast_text_as_timestamp(((vault."CASLastUsed")::text)::character varying, ''212af752-0ee2-11eb-b47a-0242ac120004''::uuid) AS last_used,
    vault."CASSize" as safe_size,
    vault."CASMaxSize" as max_size,
    vault."CASUsedSize" as used_size,
    public.cast_text_as_boolean(vault."CASVirusFree"::text) as is_virus_free,
    public.cast_text_as_boolean(vault."CASTextOnly"::text) as is_text_only,
    vault."CASAccessLocation" as access_location,
    vault."CASSecurityLevel" as security_level,
    public.cast_text_as_boolean(vault."CASRequireReasonToRetrieve"::text) as require_reason_to_retrieve,
    public.cast_text_as_boolean(vault."CASEnforceExclusivePasswords"::text) as enforce_exclusive_passwords,
    public.cast_text_as_boolean(vault."CASRequireContentValidation"::text) as require_content_validation,
    vault."CASLogRetentionPeriod" as log_retention_period,
    vault."CASObjectsRetentionPeriod" as objects_retention_period,
    vault."CASRequestsRetentionPeriod" as requests_retention_period,
    vault."CASDelay" as safe_delay,
    vault."CASFromHour" as from_hour,
    vault."CASToHour" as to_hour,
    vault."CASDailyVersions" as daily_versions,
    vault."CASMonthlyVersions" as monthly_versions,
    vault."CASYearlyVersions" as yearly_versions,
    vault."CASNumberOfPasswordVersions" as number_of_password_versions,
    vault."CASShareOptions" as share_options,
    vault."CASConfirmersCount" as confirmers_count,
    vault."CASConfirmType" as confirm_type,
    vault."CASDefaultAccessMarks" as default_access_marks,
    public.cast_text_as_boolean(vault."CASDefaultFileCompression"::text) as has_default_file_compression,
    public.cast_text_as_boolean(vault."CASDefaultReadOnly"::text) as is_default_readonly,
    vault."CASQuotaOwner" as quota_owner,
    public.cast_text_as_boolean(vault."CASUseFileCategories"::text) as use_file_categories
  FROM
    %s vault',var_relation);

end;
$$

Deuxième problème soulevé

Fonction de casting non optimisées

Function de base cast as boolean

Function optimisée cast as boolean

6,4 plus rapide pour le même jeu de données

CREATE OR REPLACE FUNCTION public.cast_text_as_boolean(p_text_input text)
 RETURNS boolean
 LANGUAGE plpgsql
 IMMUTABLE
AS $function$
DECLARE
	-- -- -- -- MANDATORY -- -- -- -- 
	var_current_object_name text;
	var_error_handler text;
	var_error_details text;
	-- -- -- -- -- -- -- -- -- -- -- 
	
	v_result boolean := false;
BEGIN

	v_result = COALESCE(NULLIF(p_text_input,''),'f')::boolean;
		
	RETURN v_result;
	
EXCEPTION 
WHEN OTHERS THEN
    IF p_text_input ~* '^[OV].*' 
        THEN 
        v_result = true;

    ELSE
        GET STACKED DIAGNOSTICS var_error_handler = PG_EXCEPTION_CONTEXT;
        var_current_object_name := 'ii_dwh.f_ii_core_cast_text_as_boolean()';
    	var_error_details := 
    		'Error Name: ' || SQLERRM
    		|| ' | ----------- | '
    		|| 'Error State: ' || SQLSTATE
    		|| ' | ----------- | '
    		|| 'Error Context: ' || var_error_handler;
    		
    	PERFORM ii_dwh.sp_ii_core_log('', 10, var_current_object_name, null, var_error_details);	
    	
    	
    END IF;
    RETURN v_result;    

END;
$function$
CREATE OR REPLACE FUNCTION ii_dwh.f_ii_core_cast_text_as_boolean(p_text_input text)
 RETURNS boolean
 LANGUAGE plpgsql
 IMMUTABLE
AS $function$
DECLARE
	-- -- -- -- MANDATORY -- -- -- -- 
	var_current_object_name text;
	var_error_handler text;
	var_error_details text;
	-- -- -- -- -- -- -- -- -- -- -- 
	
	v_result boolean;
BEGIN
	var_current_object_name := 'ii_dwh.f_ii_core_cast_text_as_boolean()';
	
--	PERFORM ii_dwh.sp_ii_core_log('', 30, var_current_object_name, null, 'Start Execution');

	-- CORE FUNCTION / 
	-- v1 / 2020.01.10 / Fabien / Initial version
	-- This function casts text to boolean.
	--	INPUT: value to cast as text
	--	OUTPUT: value as boolean
	
	v_result := false;
	
--	PERFORM ii_dwh.sp_ii_core_log('', 40, var_current_object_name, null, 'p_text_input => ' || p_text_input);
	
	SELECT
		CASE UPPER(trim(p_text_input))
			WHEN '1' THEN true
			WHEN 'VRAI' THEN true
			WHEN 'TRUE' THEN true
			WHEN 'OUI' THEN true
			WHEN 'YES' THEN true
			WHEN 'T' THEN true
			WHEN 'Y' THEN true
			WHEN 'O' THEN true			
			WHEN '0' THEN false
			WHEN 'FAUX' THEN false
			WHEN 'FALSE' THEN false
			WHEN 'NO' THEN false
			WHEN 'NON' THEN false
			WHEN 'N' THEN false
			WHEN 'F' THEN false
			ELSE false
		END 	
	INTO
		v_result;
		
	RETURN v_result;

EXCEPTION WHEN OTHERS THEN
	GET STACKED DIAGNOSTICS var_error_handler = PG_EXCEPTION_CONTEXT;
    
	var_error_details := 
		'Error Name: ' || SQLERRM
		|| ' | ----------- | '
		|| 'Error State: ' || SQLSTATE
		|| ' | ----------- | '
		|| 'Error Context: ' || var_error_handler;
		
	PERFORM ii_dwh.sp_ii_core_log('', 10, var_current_object_name, null, var_error_details);	
	
	RETURN v_result;
END;
$function$;

ALTER FUNCTION ii_dwh.f_ii_core_cast_text_as_boolean(text) OWNER TO postgres;

GRANT EXECUTE ON FUNCTION ii_dwh.f_ii_core_cast_text_as_boolean(text) TO PUBLIC;
GRANT EXECUTE ON FUNCTION ii_dwh.f_ii_core_cast_text_as_boolean(text) TO postgres WITH GRANT OPTION;

Troisième problème soulevé

Multiplication des requêtes

Sur chaque table :

  1. UPDATE tag as deleted
  2. UPDATE
  3. INSERT

Problème à chaque fois on relance une requête SELECT pour la clause FROM.

Associé au premier problème, on comprend qu'à chaque fois on SELECT sur une vue qui se recalcule à chaque SELECT.

UPDATE
	ii_protec.ipc_evd_ref_safe
SET
	is_deleted_from_source = TRUE
	, "__lastmodif_datetime" = now()
	, "__modif_user_login" = 'Import Process'
WHERE
	id IN (
		SELECT
			ref_safe.id
		FROM
			ii_protec.ipc_evd_ref_safe ref_safe
			LEFT JOIN ii_custom.test_safe_mat evd_safe ON
				evd_safe.safe_id = ref_safe.safe_id
				AND evd_safe.fk_ref_vault = ref_safe.fk_ref_vault_id
		WHERE
			evd_safe.safe_id IS NULL
			AND ref_safe.is_deleted_from_source IS FALSE
			AND ref_safe.fk_ref_vault_id = '212af752-0ee2-11eb-b47a-0242ac120004'::uuid
	);

	INSERT INTO	ii_protec.ipc_evd_ref_safe
	(
		-- II local values
		"__insert_datetime", "__lastmodif_datetime", "__modif_user_login", is_deleted_from_source
		-- Safe references
		, fk_ref_vault_id, safe_id, safe_name, location_id, location_name, creation_datetime, created_by, last_used
		-- Sizing
		, safe_size, max_size, used_size
		-- Security contraints
		, is_virus_free, is_text_only, access_location, security_level, require_reason_to_retrieve, enforce_exclusive_passwords, require_content_validation
		-- Retention + usage contraints
		, log_retention_period, objects_retention_period, requests_retention_period, safe_delay, from_hour, to_hour, daily_versions, monthly_versions, yearly_versions, number_of_password_versions
		-- Options
		, share_options, confirmers_count, confirm_type, default_access_marks, has_default_file_compression, is_default_readonly, quota_owner, use_file_categories
	)
	(
		SELECT
			-- II local values
			now(), now(), 'Import Process', FALSE
			-- Safe references
			, collected_data.fk_ref_vault, collected_data.safe_id, collected_data.safe_name, collected_data.location_id, collected_data.location_name, collected_data.creation_datetime, collected_data.created_by, collected_data.last_used
			-- Sizing
			, collected_data.safe_size, collected_data.max_size, collected_data.used_size
			-- Security contraints
			, collected_data.is_virus_free, collected_data.is_text_only, collected_data.access_location, collected_data.security_level, collected_data.require_reason_to_retrieve, collected_data.enforce_exclusive_passwords, collected_data.require_content_validation
			-- Retention + usage contraints
			, collected_data.log_retention_period, collected_data.objects_retention_period, collected_data.requests_retention_period, collected_data.safe_delay, collected_data.from_hour, collected_data.to_hour, collected_data.daily_versions, collected_data.monthly_versions, collected_data.yearly_versions, collected_data.number_of_password_versions
			-- Options
			, collected_data.share_options, collected_data.confirmers_count, collected_data.confirm_type, collected_data.default_access_marks, collected_data.has_default_file_compression, collected_data.is_default_readonly, collected_data.quota_owner, collected_data.use_file_categories
		FROM
			ii_custom.test_safe_mat collected_data
			LEFT JOIN ii_protec.ipc_evd_ref_safe ref_safe
				ON collected_data.safe_id = ref_safe.safe_id
				AND ref_safe.fk_ref_vault_id = collected_data.fk_ref_vault
		WHERE
			ref_safe.id IS NULL
			AND collected_data.fk_ref_vault = '212af752-0ee2-11eb-b47a-0242ac120004'::uuid
	);

	UPDATE
		ii_protec.ipc_evd_ref_safe
	SET
		-- II local values
		"__lastmodif_datetime" = now()
		, "__modif_user_login" = 'Import Process'
		, is_deleted_from_source = FALSE
		-- Safe references
		, safe_name = collected_data.safe_name
		, location_id = collected_data.location_id
		, location_name = collected_data.location_name
		, creation_datetime = collected_data.creation_datetime
		, created_by = collected_data.created_by
		, last_used = collected_data.last_used
		-- Sizing
		, safe_size = collected_data.safe_size
		, max_size = collected_data.max_size
		, used_size = collected_data.used_size
		-- Security contraints
		, is_virus_free = collected_data.is_virus_free
		, is_text_only = collected_data.is_text_only
		, access_location = collected_data.access_location
		, security_level = collected_data.security_level
		, require_reason_to_retrieve = collected_data.require_reason_to_retrieve
		, enforce_exclusive_passwords = collected_data.enforce_exclusive_passwords
		, require_content_validation = collected_data.require_content_validation
		-- Retention + usage contraints
		, log_retention_period = collected_data.log_retention_period
		, objects_retention_period = collected_data.objects_retention_period
		, requests_retention_period = collected_data.requests_retention_period
		, safe_delay = collected_data.safe_delay
		, from_hour = collected_data.from_hour
		, to_hour = collected_data.to_hour
		, daily_versions = collected_data.daily_versions
		, monthly_versions = collected_data.monthly_versions
		, yearly_versions = collected_data.yearly_versions
		, number_of_password_versions = collected_data.number_of_password_versions
		-- Options
		, share_options = collected_data.share_options
		, confirmers_count = collected_data.confirmers_count
		, confirm_type = collected_data.confirm_type
		, default_access_marks = collected_data.default_access_marks
		, has_default_file_compression = collected_data.has_default_file_compression
		, is_default_readonly = collected_data.is_default_readonly
		, quota_owner = collected_data.quota_owner
		, use_file_categories = collected_data.use_file_categories
	FROM
	(
		SELECT
			fk_ref_vault, safe_id, safe_name, location_id, location_name, creation_datetime, created_by, last_used, safe_size, max_size, used_size, is_virus_free, is_text_only, access_location, security_level, require_reason_to_retrieve, enforce_exclusive_passwords, require_content_validation, log_retention_period, objects_retention_period, requests_retention_period, safe_delay, from_hour, to_hour, daily_versions, monthly_versions, yearly_versions, number_of_password_versions, share_options, confirmers_count, confirm_type, default_access_marks, has_default_file_compression, is_default_readonly, quota_owner, use_file_categories
		FROM
			ii_custom.
		WHERE
			fk_ref_vault = '212af752-0ee2-11eb-b47a-0242ac120004'::uuid
	) collected_data
	WHERE
		ipc_evd_ref_safe.safe_id = collected_data.safe_id
		AND ipc_evd_ref_safe.fk_ref_vault_id = '212af752-0ee2-11eb-b47a-0242ac120004'::uuid;

Solution proposée

  1. Utilisation du WITH pour certaines opérations.
  2. Utilisation du UPSERT ou UPDATE  ON CONFLICT lors d'un INSERT

 

Avantage premier : limiter les SELECT

  UPDATE
  		ii_protec.ipc_evd_ref_safe
  	SET
  		is_deleted_from_source = TRUE
  		, "__lastmodif_datetime" = now()
  		, "__modif_user_login" = 'Import Process'
    WHERE safe_id IN (
      SELECT
        ref_data.safe_id
      FROM
        ii_protec.ipc_evd_ref_safe ref_data LEFT JOIN ii_custom.test_safe_mat collected_data
        ON ref_data.safe_id = collected_data.safe_id AND ref_data.fk_ref_vault_id = collected_data.fk_ref_vault
       WHERE collected_data.safe_id is NULL
    );

INSERT INTO	ii_protec.ipc_evd_ref_safe
(
  -- II local values
  "__insert_datetime", "__lastmodif_datetime", "__modif_user_login", is_deleted_from_source
  -- Safe references
  , fk_ref_vault_id, safe_id, safe_name, location_id, location_name, creation_datetime, created_by, last_used
  -- Sizing
  , safe_size, max_size, used_size
  -- Security contraints
  , is_virus_free, is_text_only, access_location, security_level, require_reason_to_retrieve, enforce_exclusive_passwords, require_content_validation
  -- Retention + usage contraints
  , log_retention_period, objects_retention_period, requests_retention_period, safe_delay, from_hour, to_hour, daily_versions, monthly_versions, yearly_versions, number_of_password_versions
  -- Options
  , share_options, confirmers_count, confirm_type, default_access_marks, has_default_file_compression, is_default_readonly, quota_owner, use_file_categories
)
SELECT * FROM ii_custom.test_safe collected_data
ON CONFLICT (safe_id)
DO
  UPDATE SET
   -- II local values
   "__lastmodif_datetime" = now()
   , "__modif_user_login" = 'Import Process'
   , is_deleted_from_source = FALSE
   -- Safe references
   , safe_name = EXCLUDED.safe_name
   , location_id = EXCLUDED.location_id
   , location_name = EXCLUDED.location_name
   , creation_datetime = EXCLUDED.creation_datetime
   , created_by = EXCLUDED.created_by
   , last_used = EXCLUDED.last_used
   -- Sizing
   , safe_size = EXCLUDED.safe_size
   , max_size = EXCLUDED.max_size
   , used_size = EXCLUDED.used_size
   -- Security contraints
   , is_virus_free = EXCLUDED.is_virus_free
   , is_text_only = EXCLUDED.is_text_only
   , access_location = EXCLUDED.access_location
   , security_level = EXCLUDED.security_level
   , require_reason_to_retrieve = EXCLUDED.require_reason_to_retrieve
   , enforce_exclusive_passwords = EXCLUDED.enforce_exclusive_passwords
   , require_content_validation = EXCLUDED.require_content_validation
   -- Retention + usage contraints
   , log_retention_period = EXCLUDED.log_retention_period
   , objects_retention_period = EXCLUDED.objects_retention_period
   , requests_retention_period = EXCLUDED.requests_retention_period
   , safe_delay = EXCLUDED.safe_delay
   , from_hour = EXCLUDED.from_hour
   , to_hour = EXCLUDED.to_hour
   , daily_versions = EXCLUDED.daily_versions
   , monthly_versions = EXCLUDED.monthly_versions
   , yearly_versions = EXCLUDED.yearly_versions
   , number_of_password_versions = EXCLUDED.number_of_password_versions
   -- Options
   , share_options = EXCLUDED.share_options
   , confirmers_count = EXCLUDED.confirmers_count
   , confirm_type = EXCLUDED.confirm_type
   , default_access_marks = EXCLUDED.default_access_marks
   , has_default_file_compression = EXCLUDED.has_default_file_compression
   , is_default_readonly = EXCLUDED.is_default_readonly
   , quota_owner = EXCLUDED.quota_owner
   , use_file_categories = EXCLUDED.use_file_categories;

Quatrième problème soulevé

Présence de champs text dans les tables _history contenant beaucoup de charactères : length(json_content) > 1000

Les requêtes sur ces tables sont très lentes.

 

Solution proposée

Utilisation du TOAST sur certaines tables avec un paramètre équilibré :

toast_tuple_target=128

 

Postgresql gerera lui même les champs et les données associées qui devront être stockées sur une TOAST table.

Elle seront compressées.

Les requêtes qui n'utilisent par directement ces champs seront plus rapides.

Cinquième problème soulevé

Après analyse du schéma ii_protec avec l'outil schemaspy, on retrouve plusieurs tables orphelines

Sixième problème soulevé

Après analyse de certaines fonctions comme

ii_protec.sp_ipc_evd_apply_exclusion_rules

On observe des incohérences, pas sûr que toutes les fonctions s'exécutent comme attendu.

CREATE OR REPLACE FUNCTION ii_protec.sp_ipc_evd_apply_exclusion_rules(p_current_user_name character varying, p_schema_name character varying, p_table_name character varying, p_effdate date)
 RETURNS integer
 LANGUAGE plpgsql
AS $function$

DECLARE

	-- -- -- -- MANDATORY -- -- -- -- 

	var_current_object_name text;

	var_error_handler text;

	var_error_details text;

	-- -- -- -- -- -- -- -- -- -- -- 

	

	var_result_status integer; 

	var_update_request text;

	var_where_clause text;

	var_rules record;

	var_associated_rules record;

BEGIN

	var_current_object_name := 'ii_protec.sp_ipc_evd_apply_exclusion_rules()';

	var_result_status := 0;

	

	PERFORM ii_dwh.sp_ii_core_log(p_current_user_name, 20, var_current_object_name, null, 'Start Execution');



	-- PROTEC EVD STORED PROC / 

	-- v1 / 2020.04.24 / Fabien / Initial version
	-- v3 / 2021.01.28 / Valentin / Change apply from with insert datetime (before eff_date)

	-- This stored procedure applies dynamic exclusion rules to table given in parameter (or all tables if empty).

	

	

	-- RESET EXCLUDE FLAG

	var_update_request := 

		'UPDATE '

			|| p_schema_name || '.' || p_table_name 

		|| ' SET '

			|| 'is_excluded_from_reports = FALSE'

		|| ' WHERE '
			|| ' is_deleted_from_source IS FALSE ';

			

	IF p_effdate != NULL THEN	

		var_update_request := var_update_request || ' AND eff_date = ''' || p_effdate || '''';	

	END IF;

			

	EXECUTE var_update_request;

	

	-- APPLY RULES

	FOR var_rules IN (

		SELECT

			* 

		FROM

			ii_protec.ipc_evd_ref_dynamic_exclusion_rule

		WHERE

			valid_to is null

			AND input_table_schema = p_schema_name

			AND input_table_name = p_table_name

			AND fk_and_linked_rule_id IS null

		ORDER BY

			rule_order

	) LOOP

	
		var_where_clause := 'is_deleted_from_source IS FALSE';
			

		var_where_clause := '1=1';

		

		IF var_rules.apply_from != NULL THEN		

			var_where_clause := var_where_clause || ' AND __insert_datetime >= ''' || var_rules.apply_from || '''';		

		END IF;

		

		IF p_effdate != NULL THEN		

			var_where_clause := var_where_clause || ' AND eff_date = ''' || p_effdate || '''';		

		END IF;

		

		var_where_clause := var_where_clause || ' AND ' || ii_dwh.f_ii_core_generate_where_clause(var_rules.input_field_type, p_table_name, var_rules.input_field, var_rules.fk_comparison_type, var_rules.tested_value);

		

		FOR var_associated_rules IN (

			SELECT

				* 

			FROM

				ii_protec.ipc_evd_ref_dynamic_exclusion_rule

			WHERE

				valid_to is null

				AND input_table_schema = p_schema_name

				AND input_table_name = p_table_name

				AND fk_and_linked_rule_id = var_rules.id

			ORDER BY

				rule_order

		) LOOP

		

			var_where_clause := var_where_clause || ' AND ' || ii_dwh.f_ii_core_generate_where_clause(var_associated_rules.input_field_type, p_table_name, var_associated_rules.input_field, var_associated_rules.fk_comparison_type, var_associated_rules.tested_value);

		

		END LOOP;

	

		var_update_request := 

			'UPDATE '

				|| p_schema_name || '.' || p_table_name 

			|| ' SET '

				|| 'is_excluded_from_reports = ' || var_rules.output_value

			|| ' WHERE '

				|| var_where_clause;

		

		EXECUTE var_update_request;

	

	END LOOP;

		

	var_result_status = 1;

	

    RETURN var_result_status;

	

EXCEPTION WHEN OTHERS THEN

	GET STACKED DIAGNOSTICS var_error_handler = PG_EXCEPTION_CONTEXT;

    

	var_error_details := 

		'Error Name: ' || SQLERRM

		|| ' | ----------- | '

		|| 'Error State: ' || SQLSTATE

		|| ' | ----------- | '

		|| 'Error Context: ' || var_error_handler;

		

	PERFORM ii_dwh.sp_ii_core_log(p_current_user_name, 10, var_current_object_name, null, var_error_details);	

	

	RETURN var_result_status;

END;

$function$

Autres pistes d'amélioration

  1. Colonnes calculées plutôt que requêtes update.
  2. Optimisation des index.

Optimisation du process

Possibilité de regrouper les différentes steps (1 à 10) en une seule action.

Une seule action à enclencher par l'utilisateur.

Nécessité d'avoir un retour sur les performances de l'exécution des différentes fonctions (voir début du slide sur les métriques)

On pourrait imaginer un rendu récapitulant l'exécution, le nombre de calls des différentes fonctions, le nombre d'erreurs, le temps d'exécution cumulé de chaque fonctions par step.

Limitations

Après analyse, je ne sais pas si toutes les différentes tables sont nécessaires, il y a peut etre des répétitions de données entre les tables.

Je n'ai pas de vision du process métier.

Cyberark

By Simon Ducournau