Cyberak
Analyse en vue d'une refactorisation
Contexte
- 1: Le scope VAULT_EVD1 ( par la suite VAULT_API1)
- 2: Entrées et sorties doivent dans la mesure du possible rester identiques
- 3: Limiter le nombre de schema (ii_collect, ii_custom, ii_protec) --> (ii_collect, ii_protec)
- 4: Limiter l'usage des vues (revoir les vue de ii_custom)
- 5: Refactoriser le process de post process -> amelioration de la performance
Stats de la pipeline obtenues à partir du collecteur de statistiques dans un contexte de prod.
--1st step à faire qu'une fois:
SET track_functions = 'all';
--Boucle sur chaque fonction/ a chaque fois lancer une instruction à la fois:
--1st step:
SELECT pg_stat_reset();
--2nd step:
SELECT ii_protec.sp_ipc_evd_ref_import_cyberark_data('212af752-0ee2-11eb-b47a-0242ac120004'::uuid);
--3rd step:
SELECT * from pg_stat_user_functions;
Voir comment automatiser la collecte de statistiques par étapes afin d'obtenir des métriques et repérer des anomalies.
Création d'une table contenant les logs des métriques ?
Premier problème soulevé
L'accumulation des vues
- Le schéma ii_collect contient les vues liant tous les vaults par type de table importée.
- Différentes vues sont construites dans le schéma ii_protec à partir de ces dernières :
- ii_protec.v_ipc_collect_evd_safe
- ii_protec.v_ipc_collect_evd_safe_modified
--
-- Type: VIEW ; Name: v_ipc_collect_evd_safe; Owner: postgres
--
CREATE OR REPLACE VIEW ii_protec.v_ipc_collect_evd_safe AS
SELECT collect_safe.fk_ref_vault AS fk_ref_vault_id,
collect_safe."CASSafeID" AS safe_id,
collect_safe."CASSafeName" AS safe_name,
collect_safe."CASLocationID" AS location_id,
collect_safe."CASLocationName" AS location_name,
collect_safe."CASCreationDate" AS creation_datetime,
collect_safe."CASCreatedBy" AS created_by,
collect_safe."CASLastUsed" AS last_used,
collect_safe."CASSize" AS safe_size,
collect_safe."CASMaxSize" AS max_size,
collect_safe."CASUsedSize" AS used_size,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASVirusFree"::text) AS is_virus_free,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASTextOnly"::text) AS is_text_only,
collect_safe."CASAccessLocation" AS access_location,
collect_safe."CASSecurityLevel" AS security_level,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASRequireReasonToRetrieve"::text) AS require_reason_to_retrieve,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASEnforceExclusivePasswords"::text) AS enforce_exclusive_passwords,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASRequireContentValidation"::text) AS require_content_validation,
collect_safe."CASLogRetentionPeriod" AS log_retention_period,
collect_safe."CASObjectsRetentionPeriod" AS objects_retention_period,
collect_safe."CASRequestsRetentionPeriod" AS requests_retention_period,
collect_safe."CASDelay" AS safe_delay,
collect_safe."CASFromHour" AS from_hour,
collect_safe."CASToHour" AS to_hour,
collect_safe."CASDailyVersions" AS daily_versions,
collect_safe."CASMonthlyVersions" AS monthly_versions,
collect_safe."CASYearlyVersions" AS yearly_versions,
collect_safe."CASNumberOfPasswordVersions" AS number_of_password_versions,
collect_safe."CASShareOptions" AS share_options,
collect_safe."CASConfirmersCount" AS confirmers_count,
collect_safe."CASConfirmType" AS confirm_type,
collect_safe."CASDefaultAccessMarks" AS default_access_marks,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASDefaultFileCompression"::text) AS has_default_file_compression,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASDefaultReadOnly"::text) AS is_default_readonly,
collect_safe."CASQuotaOwner" AS quota_owner,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASUseFileCategories"::text) AS use_file_categories
FROM ii_custom.v_collect_evd_safe collect_safe;
ALTER VIEW ii_protec.v_ipc_collect_evd_safe OWNER TO postgres;
--
-- Type: VIEW ; Name: v_ipc_collect_evd_safe_modified; Owner: postgres
--
CREATE OR REPLACE VIEW ii_protec.v_ipc_collect_evd_safe_modified AS
SELECT collect_safe.fk_ref_vault AS fk_ref_vault_id,
collect_safe."CASSafeID" AS safe_id,
collect_safe."CASSafeName" AS safe_name,
collect_safe."CASLocationID" AS location_id,
collect_safe."CASLocationName" AS location_name,
collect_safe."CASCreationDate" AS creation_datetime,
collect_safe."CASCreatedBy" AS created_by,
collect_safe."CASLastUsed" AS last_used,
collect_safe."CASSize" AS safe_size,
collect_safe."CASMaxSize" AS max_size,
collect_safe."CASUsedSize" AS used_size,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASVirusFree"::text) AS is_virus_free,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASTextOnly"::text) AS is_text_only,
collect_safe."CASAccessLocation" AS access_location,
collect_safe."CASSecurityLevel" AS security_level,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASRequireReasonToRetrieve"::text) AS require_reason_to_retrieve,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASEnforceExclusivePasswords"::text) AS enforce_exclusive_passwords,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASRequireContentValidation"::text) AS require_content_validation,
collect_safe."CASLogRetentionPeriod" AS log_retention_period,
collect_safe."CASObjectsRetentionPeriod" AS objects_retention_period,
collect_safe."CASRequestsRetentionPeriod" AS requests_retention_period,
collect_safe."CASDelay" AS safe_delay,
collect_safe."CASFromHour" AS from_hour,
collect_safe."CASToHour" AS to_hour,
collect_safe."CASDailyVersions" AS daily_versions,
collect_safe."CASMonthlyVersions" AS monthly_versions,
collect_safe."CASYearlyVersions" AS yearly_versions,
collect_safe."CASNumberOfPasswordVersions" AS number_of_password_versions,
collect_safe."CASShareOptions" AS share_options,
collect_safe."CASConfirmersCount" AS confirmers_count,
collect_safe."CASConfirmType" AS confirm_type,
collect_safe."CASDefaultAccessMarks" AS default_access_marks,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASDefaultFileCompression"::text) AS has_default_file_compression,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASDefaultReadOnly"::text) AS is_default_readonly,
collect_safe."CASQuotaOwner" AS quota_owner,
ii_dwh.f_ii_core_cast_text_as_boolean(collect_safe."CASUseFileCategories"::text) AS use_file_categories,
false AS is_deleted_from_source
FROM ii_custom.v_collect_evd_safe collect_safe
EXCEPT
SELECT ref_safe.fk_ref_vault_id,
ref_safe.safe_id,
ref_safe.safe_name,
ref_safe.location_id,
ref_safe.location_name,
ref_safe.creation_datetime,
ref_safe.created_by,
ref_safe.last_used,
ref_safe.safe_size,
ref_safe.max_size,
ref_safe.used_size,
ref_safe.is_virus_free,
ref_safe.is_text_only,
ref_safe.access_location,
ref_safe.security_level,
ref_safe.require_reason_to_retrieve,
ref_safe.enforce_exclusive_passwords,
ref_safe.require_content_validation,
ref_safe.log_retention_period,
ref_safe.objects_retention_period,
ref_safe.requests_retention_period,
ref_safe.safe_delay,
ref_safe.from_hour,
ref_safe.to_hour,
ref_safe.daily_versions,
ref_safe.monthly_versions,
ref_safe.yearly_versions,
ref_safe.number_of_password_versions,
ref_safe.share_options,
ref_safe.confirmers_count,
ref_safe.confirm_type,
ref_safe.default_access_marks,
ref_safe.has_default_file_compression,
ref_safe.is_default_readonly,
ref_safe.quota_owner,
ref_safe.use_file_categories,
ref_safe.is_deleted_from_source
FROM ii_protec.ipc_evd_ref_safe ref_safe;
ALTER VIEW ii_protec.v_ipc_collect_evd_safe_modified OWNER TO postgres;
Problème avec les vues : elles sont recalculés à chaque requête.
Ces trois vues sont utilisées dans la step_1.ii_protec.sp_ipc_evd_ref_import_cyberark_data(uuid)
On utilise donc trois vues et à chaque SELECT on recalcule les différents castings (notemment boolean, timestamp, ect)
Solution proposée
- Commenter toutes les tables des différents vaults avec leur uuid associé.
- Créer une table temporaire à partir de la table du vault souhaité.
- La table n'est pas journalisée et sera supprimée à la fin de la transaction.
- Tout est calculé une seule fois, pas besoin de caster les valeurs à chaque SELECT.
- Le défaut principal : usage des requêtes dynamic SQL pour la création des tables.
COMMENT ON TABLE ii_collect.vault1_evd_casafes IS '212af752-0ee2-11eb-b47a-0242ac120004';
do $$
DECLARE
var_relation regclass;
BEGIN
SELECT ('ii_collect.'||relname)::regclass FROM pg_class WHERE obj_description(oid) = '212af752-0ee2-11eb-b47a-0242ac120004' and relname ~ 'casafes'
into var_relation;
EXECUTE format('CREATE TEMP UNLOGGED TABLE ii_custom.test_safe_table as SELECT
-- II local values
now() as "__insert_datetime", now() as "__lastmodif_datetime", ''Import Process'' as "__modif_user_login", FALSE as is_deleted_from_source,
-- Safe references
''212af752-0ee2-11eb-b47a-0242ac120004''::uuid as fk_ref_vault,
vault."CASSafeID" as safe_id,
vault."CASSafeName" as safe_name,
vault."CASLocationID" as location_id,
vault."CASLocationName" as location_name,
ii_custom.f_ipc_cast_text_as_timestamp(((vault."CASCreationDate")::text)::character varying, ''212af752-0ee2-11eb-b47a-0242ac120004''::uuid) AS creation_datetime,
vault."CASCreatedBy" as created_by,
ii_custom.f_ipc_cast_text_as_timestamp(((vault."CASLastUsed")::text)::character varying, ''212af752-0ee2-11eb-b47a-0242ac120004''::uuid) AS last_used,
vault."CASSize" as safe_size,
vault."CASMaxSize" as max_size,
vault."CASUsedSize" as used_size,
public.cast_text_as_boolean(vault."CASVirusFree"::text) as is_virus_free,
public.cast_text_as_boolean(vault."CASTextOnly"::text) as is_text_only,
vault."CASAccessLocation" as access_location,
vault."CASSecurityLevel" as security_level,
public.cast_text_as_boolean(vault."CASRequireReasonToRetrieve"::text) as require_reason_to_retrieve,
public.cast_text_as_boolean(vault."CASEnforceExclusivePasswords"::text) as enforce_exclusive_passwords,
public.cast_text_as_boolean(vault."CASRequireContentValidation"::text) as require_content_validation,
vault."CASLogRetentionPeriod" as log_retention_period,
vault."CASObjectsRetentionPeriod" as objects_retention_period,
vault."CASRequestsRetentionPeriod" as requests_retention_period,
vault."CASDelay" as safe_delay,
vault."CASFromHour" as from_hour,
vault."CASToHour" as to_hour,
vault."CASDailyVersions" as daily_versions,
vault."CASMonthlyVersions" as monthly_versions,
vault."CASYearlyVersions" as yearly_versions,
vault."CASNumberOfPasswordVersions" as number_of_password_versions,
vault."CASShareOptions" as share_options,
vault."CASConfirmersCount" as confirmers_count,
vault."CASConfirmType" as confirm_type,
vault."CASDefaultAccessMarks" as default_access_marks,
public.cast_text_as_boolean(vault."CASDefaultFileCompression"::text) as has_default_file_compression,
public.cast_text_as_boolean(vault."CASDefaultReadOnly"::text) as is_default_readonly,
vault."CASQuotaOwner" as quota_owner,
public.cast_text_as_boolean(vault."CASUseFileCategories"::text) as use_file_categories
FROM
%s vault',var_relation);
end;
$$
Deuxième problème soulevé
Fonction de casting non optimisées
Function de base cast as boolean
Function optimisée cast as boolean
6,4 plus rapide pour le même jeu de données
CREATE OR REPLACE FUNCTION public.cast_text_as_boolean(p_text_input text)
RETURNS boolean
LANGUAGE plpgsql
IMMUTABLE
AS $function$
DECLARE
-- -- -- -- MANDATORY -- -- -- --
var_current_object_name text;
var_error_handler text;
var_error_details text;
-- -- -- -- -- -- -- -- -- -- --
v_result boolean := false;
BEGIN
v_result = COALESCE(NULLIF(p_text_input,''),'f')::boolean;
RETURN v_result;
EXCEPTION
WHEN OTHERS THEN
IF p_text_input ~* '^[OV].*'
THEN
v_result = true;
ELSE
GET STACKED DIAGNOSTICS var_error_handler = PG_EXCEPTION_CONTEXT;
var_current_object_name := 'ii_dwh.f_ii_core_cast_text_as_boolean()';
var_error_details :=
'Error Name: ' || SQLERRM
|| ' | ----------- | '
|| 'Error State: ' || SQLSTATE
|| ' | ----------- | '
|| 'Error Context: ' || var_error_handler;
PERFORM ii_dwh.sp_ii_core_log('', 10, var_current_object_name, null, var_error_details);
END IF;
RETURN v_result;
END;
$function$
CREATE OR REPLACE FUNCTION ii_dwh.f_ii_core_cast_text_as_boolean(p_text_input text)
RETURNS boolean
LANGUAGE plpgsql
IMMUTABLE
AS $function$
DECLARE
-- -- -- -- MANDATORY -- -- -- --
var_current_object_name text;
var_error_handler text;
var_error_details text;
-- -- -- -- -- -- -- -- -- -- --
v_result boolean;
BEGIN
var_current_object_name := 'ii_dwh.f_ii_core_cast_text_as_boolean()';
-- PERFORM ii_dwh.sp_ii_core_log('', 30, var_current_object_name, null, 'Start Execution');
-- CORE FUNCTION /
-- v1 / 2020.01.10 / Fabien / Initial version
-- This function casts text to boolean.
-- INPUT: value to cast as text
-- OUTPUT: value as boolean
v_result := false;
-- PERFORM ii_dwh.sp_ii_core_log('', 40, var_current_object_name, null, 'p_text_input => ' || p_text_input);
SELECT
CASE UPPER(trim(p_text_input))
WHEN '1' THEN true
WHEN 'VRAI' THEN true
WHEN 'TRUE' THEN true
WHEN 'OUI' THEN true
WHEN 'YES' THEN true
WHEN 'T' THEN true
WHEN 'Y' THEN true
WHEN 'O' THEN true
WHEN '0' THEN false
WHEN 'FAUX' THEN false
WHEN 'FALSE' THEN false
WHEN 'NO' THEN false
WHEN 'NON' THEN false
WHEN 'N' THEN false
WHEN 'F' THEN false
ELSE false
END
INTO
v_result;
RETURN v_result;
EXCEPTION WHEN OTHERS THEN
GET STACKED DIAGNOSTICS var_error_handler = PG_EXCEPTION_CONTEXT;
var_error_details :=
'Error Name: ' || SQLERRM
|| ' | ----------- | '
|| 'Error State: ' || SQLSTATE
|| ' | ----------- | '
|| 'Error Context: ' || var_error_handler;
PERFORM ii_dwh.sp_ii_core_log('', 10, var_current_object_name, null, var_error_details);
RETURN v_result;
END;
$function$;
ALTER FUNCTION ii_dwh.f_ii_core_cast_text_as_boolean(text) OWNER TO postgres;
GRANT EXECUTE ON FUNCTION ii_dwh.f_ii_core_cast_text_as_boolean(text) TO PUBLIC;
GRANT EXECUTE ON FUNCTION ii_dwh.f_ii_core_cast_text_as_boolean(text) TO postgres WITH GRANT OPTION;
Troisième problème soulevé
Multiplication des requêtes
Sur chaque table :
- UPDATE tag as deleted
- UPDATE
- INSERT
Problème à chaque fois on relance une requête SELECT pour la clause FROM.
Associé au premier problème, on comprend qu'à chaque fois on SELECT sur une vue qui se recalcule à chaque SELECT.
UPDATE
ii_protec.ipc_evd_ref_safe
SET
is_deleted_from_source = TRUE
, "__lastmodif_datetime" = now()
, "__modif_user_login" = 'Import Process'
WHERE
id IN (
SELECT
ref_safe.id
FROM
ii_protec.ipc_evd_ref_safe ref_safe
LEFT JOIN ii_custom.test_safe_mat evd_safe ON
evd_safe.safe_id = ref_safe.safe_id
AND evd_safe.fk_ref_vault = ref_safe.fk_ref_vault_id
WHERE
evd_safe.safe_id IS NULL
AND ref_safe.is_deleted_from_source IS FALSE
AND ref_safe.fk_ref_vault_id = '212af752-0ee2-11eb-b47a-0242ac120004'::uuid
);
INSERT INTO ii_protec.ipc_evd_ref_safe
(
-- II local values
"__insert_datetime", "__lastmodif_datetime", "__modif_user_login", is_deleted_from_source
-- Safe references
, fk_ref_vault_id, safe_id, safe_name, location_id, location_name, creation_datetime, created_by, last_used
-- Sizing
, safe_size, max_size, used_size
-- Security contraints
, is_virus_free, is_text_only, access_location, security_level, require_reason_to_retrieve, enforce_exclusive_passwords, require_content_validation
-- Retention + usage contraints
, log_retention_period, objects_retention_period, requests_retention_period, safe_delay, from_hour, to_hour, daily_versions, monthly_versions, yearly_versions, number_of_password_versions
-- Options
, share_options, confirmers_count, confirm_type, default_access_marks, has_default_file_compression, is_default_readonly, quota_owner, use_file_categories
)
(
SELECT
-- II local values
now(), now(), 'Import Process', FALSE
-- Safe references
, collected_data.fk_ref_vault, collected_data.safe_id, collected_data.safe_name, collected_data.location_id, collected_data.location_name, collected_data.creation_datetime, collected_data.created_by, collected_data.last_used
-- Sizing
, collected_data.safe_size, collected_data.max_size, collected_data.used_size
-- Security contraints
, collected_data.is_virus_free, collected_data.is_text_only, collected_data.access_location, collected_data.security_level, collected_data.require_reason_to_retrieve, collected_data.enforce_exclusive_passwords, collected_data.require_content_validation
-- Retention + usage contraints
, collected_data.log_retention_period, collected_data.objects_retention_period, collected_data.requests_retention_period, collected_data.safe_delay, collected_data.from_hour, collected_data.to_hour, collected_data.daily_versions, collected_data.monthly_versions, collected_data.yearly_versions, collected_data.number_of_password_versions
-- Options
, collected_data.share_options, collected_data.confirmers_count, collected_data.confirm_type, collected_data.default_access_marks, collected_data.has_default_file_compression, collected_data.is_default_readonly, collected_data.quota_owner, collected_data.use_file_categories
FROM
ii_custom.test_safe_mat collected_data
LEFT JOIN ii_protec.ipc_evd_ref_safe ref_safe
ON collected_data.safe_id = ref_safe.safe_id
AND ref_safe.fk_ref_vault_id = collected_data.fk_ref_vault
WHERE
ref_safe.id IS NULL
AND collected_data.fk_ref_vault = '212af752-0ee2-11eb-b47a-0242ac120004'::uuid
);
UPDATE
ii_protec.ipc_evd_ref_safe
SET
-- II local values
"__lastmodif_datetime" = now()
, "__modif_user_login" = 'Import Process'
, is_deleted_from_source = FALSE
-- Safe references
, safe_name = collected_data.safe_name
, location_id = collected_data.location_id
, location_name = collected_data.location_name
, creation_datetime = collected_data.creation_datetime
, created_by = collected_data.created_by
, last_used = collected_data.last_used
-- Sizing
, safe_size = collected_data.safe_size
, max_size = collected_data.max_size
, used_size = collected_data.used_size
-- Security contraints
, is_virus_free = collected_data.is_virus_free
, is_text_only = collected_data.is_text_only
, access_location = collected_data.access_location
, security_level = collected_data.security_level
, require_reason_to_retrieve = collected_data.require_reason_to_retrieve
, enforce_exclusive_passwords = collected_data.enforce_exclusive_passwords
, require_content_validation = collected_data.require_content_validation
-- Retention + usage contraints
, log_retention_period = collected_data.log_retention_period
, objects_retention_period = collected_data.objects_retention_period
, requests_retention_period = collected_data.requests_retention_period
, safe_delay = collected_data.safe_delay
, from_hour = collected_data.from_hour
, to_hour = collected_data.to_hour
, daily_versions = collected_data.daily_versions
, monthly_versions = collected_data.monthly_versions
, yearly_versions = collected_data.yearly_versions
, number_of_password_versions = collected_data.number_of_password_versions
-- Options
, share_options = collected_data.share_options
, confirmers_count = collected_data.confirmers_count
, confirm_type = collected_data.confirm_type
, default_access_marks = collected_data.default_access_marks
, has_default_file_compression = collected_data.has_default_file_compression
, is_default_readonly = collected_data.is_default_readonly
, quota_owner = collected_data.quota_owner
, use_file_categories = collected_data.use_file_categories
FROM
(
SELECT
fk_ref_vault, safe_id, safe_name, location_id, location_name, creation_datetime, created_by, last_used, safe_size, max_size, used_size, is_virus_free, is_text_only, access_location, security_level, require_reason_to_retrieve, enforce_exclusive_passwords, require_content_validation, log_retention_period, objects_retention_period, requests_retention_period, safe_delay, from_hour, to_hour, daily_versions, monthly_versions, yearly_versions, number_of_password_versions, share_options, confirmers_count, confirm_type, default_access_marks, has_default_file_compression, is_default_readonly, quota_owner, use_file_categories
FROM
ii_custom.
WHERE
fk_ref_vault = '212af752-0ee2-11eb-b47a-0242ac120004'::uuid
) collected_data
WHERE
ipc_evd_ref_safe.safe_id = collected_data.safe_id
AND ipc_evd_ref_safe.fk_ref_vault_id = '212af752-0ee2-11eb-b47a-0242ac120004'::uuid;
Solution proposée
- Utilisation du WITH pour certaines opérations.
- Utilisation du UPSERT ou UPDATE ON CONFLICT lors d'un INSERT
Avantage premier : limiter les SELECT
UPDATE
ii_protec.ipc_evd_ref_safe
SET
is_deleted_from_source = TRUE
, "__lastmodif_datetime" = now()
, "__modif_user_login" = 'Import Process'
WHERE safe_id IN (
SELECT
ref_data.safe_id
FROM
ii_protec.ipc_evd_ref_safe ref_data LEFT JOIN ii_custom.test_safe_mat collected_data
ON ref_data.safe_id = collected_data.safe_id AND ref_data.fk_ref_vault_id = collected_data.fk_ref_vault
WHERE collected_data.safe_id is NULL
);
INSERT INTO ii_protec.ipc_evd_ref_safe
(
-- II local values
"__insert_datetime", "__lastmodif_datetime", "__modif_user_login", is_deleted_from_source
-- Safe references
, fk_ref_vault_id, safe_id, safe_name, location_id, location_name, creation_datetime, created_by, last_used
-- Sizing
, safe_size, max_size, used_size
-- Security contraints
, is_virus_free, is_text_only, access_location, security_level, require_reason_to_retrieve, enforce_exclusive_passwords, require_content_validation
-- Retention + usage contraints
, log_retention_period, objects_retention_period, requests_retention_period, safe_delay, from_hour, to_hour, daily_versions, monthly_versions, yearly_versions, number_of_password_versions
-- Options
, share_options, confirmers_count, confirm_type, default_access_marks, has_default_file_compression, is_default_readonly, quota_owner, use_file_categories
)
SELECT * FROM ii_custom.test_safe collected_data
ON CONFLICT (safe_id)
DO
UPDATE SET
-- II local values
"__lastmodif_datetime" = now()
, "__modif_user_login" = 'Import Process'
, is_deleted_from_source = FALSE
-- Safe references
, safe_name = EXCLUDED.safe_name
, location_id = EXCLUDED.location_id
, location_name = EXCLUDED.location_name
, creation_datetime = EXCLUDED.creation_datetime
, created_by = EXCLUDED.created_by
, last_used = EXCLUDED.last_used
-- Sizing
, safe_size = EXCLUDED.safe_size
, max_size = EXCLUDED.max_size
, used_size = EXCLUDED.used_size
-- Security contraints
, is_virus_free = EXCLUDED.is_virus_free
, is_text_only = EXCLUDED.is_text_only
, access_location = EXCLUDED.access_location
, security_level = EXCLUDED.security_level
, require_reason_to_retrieve = EXCLUDED.require_reason_to_retrieve
, enforce_exclusive_passwords = EXCLUDED.enforce_exclusive_passwords
, require_content_validation = EXCLUDED.require_content_validation
-- Retention + usage contraints
, log_retention_period = EXCLUDED.log_retention_period
, objects_retention_period = EXCLUDED.objects_retention_period
, requests_retention_period = EXCLUDED.requests_retention_period
, safe_delay = EXCLUDED.safe_delay
, from_hour = EXCLUDED.from_hour
, to_hour = EXCLUDED.to_hour
, daily_versions = EXCLUDED.daily_versions
, monthly_versions = EXCLUDED.monthly_versions
, yearly_versions = EXCLUDED.yearly_versions
, number_of_password_versions = EXCLUDED.number_of_password_versions
-- Options
, share_options = EXCLUDED.share_options
, confirmers_count = EXCLUDED.confirmers_count
, confirm_type = EXCLUDED.confirm_type
, default_access_marks = EXCLUDED.default_access_marks
, has_default_file_compression = EXCLUDED.has_default_file_compression
, is_default_readonly = EXCLUDED.is_default_readonly
, quota_owner = EXCLUDED.quota_owner
, use_file_categories = EXCLUDED.use_file_categories;
Quatrième problème soulevé
Présence de champs text dans les tables _history contenant beaucoup de charactères : length(json_content) > 1000
Les requêtes sur ces tables sont très lentes.
Solution proposée
Utilisation du TOAST sur certaines tables avec un paramètre équilibré :
toast_tuple_target=128
Postgresql gerera lui même les champs et les données associées qui devront être stockées sur une TOAST table.
Elle seront compressées.
Les requêtes qui n'utilisent par directement ces champs seront plus rapides.
Cinquième problème soulevé
Après analyse du schéma ii_protec avec l'outil schemaspy, on retrouve plusieurs tables orphelines
Sixième problème soulevé
Après analyse de certaines fonctions comme
ii_protec.sp_ipc_evd_apply_exclusion_rules
On observe des incohérences, pas sûr que toutes les fonctions s'exécutent comme attendu.
CREATE OR REPLACE FUNCTION ii_protec.sp_ipc_evd_apply_exclusion_rules(p_current_user_name character varying, p_schema_name character varying, p_table_name character varying, p_effdate date)
RETURNS integer
LANGUAGE plpgsql
AS $function$
DECLARE
-- -- -- -- MANDATORY -- -- -- --
var_current_object_name text;
var_error_handler text;
var_error_details text;
-- -- -- -- -- -- -- -- -- -- --
var_result_status integer;
var_update_request text;
var_where_clause text;
var_rules record;
var_associated_rules record;
BEGIN
var_current_object_name := 'ii_protec.sp_ipc_evd_apply_exclusion_rules()';
var_result_status := 0;
PERFORM ii_dwh.sp_ii_core_log(p_current_user_name, 20, var_current_object_name, null, 'Start Execution');
-- PROTEC EVD STORED PROC /
-- v1 / 2020.04.24 / Fabien / Initial version
-- v3 / 2021.01.28 / Valentin / Change apply from with insert datetime (before eff_date)
-- This stored procedure applies dynamic exclusion rules to table given in parameter (or all tables if empty).
-- RESET EXCLUDE FLAG
var_update_request :=
'UPDATE '
|| p_schema_name || '.' || p_table_name
|| ' SET '
|| 'is_excluded_from_reports = FALSE'
|| ' WHERE '
|| ' is_deleted_from_source IS FALSE ';
IF p_effdate != NULL THEN
var_update_request := var_update_request || ' AND eff_date = ''' || p_effdate || '''';
END IF;
EXECUTE var_update_request;
-- APPLY RULES
FOR var_rules IN (
SELECT
*
FROM
ii_protec.ipc_evd_ref_dynamic_exclusion_rule
WHERE
valid_to is null
AND input_table_schema = p_schema_name
AND input_table_name = p_table_name
AND fk_and_linked_rule_id IS null
ORDER BY
rule_order
) LOOP
var_where_clause := 'is_deleted_from_source IS FALSE';
var_where_clause := '1=1';
IF var_rules.apply_from != NULL THEN
var_where_clause := var_where_clause || ' AND __insert_datetime >= ''' || var_rules.apply_from || '''';
END IF;
IF p_effdate != NULL THEN
var_where_clause := var_where_clause || ' AND eff_date = ''' || p_effdate || '''';
END IF;
var_where_clause := var_where_clause || ' AND ' || ii_dwh.f_ii_core_generate_where_clause(var_rules.input_field_type, p_table_name, var_rules.input_field, var_rules.fk_comparison_type, var_rules.tested_value);
FOR var_associated_rules IN (
SELECT
*
FROM
ii_protec.ipc_evd_ref_dynamic_exclusion_rule
WHERE
valid_to is null
AND input_table_schema = p_schema_name
AND input_table_name = p_table_name
AND fk_and_linked_rule_id = var_rules.id
ORDER BY
rule_order
) LOOP
var_where_clause := var_where_clause || ' AND ' || ii_dwh.f_ii_core_generate_where_clause(var_associated_rules.input_field_type, p_table_name, var_associated_rules.input_field, var_associated_rules.fk_comparison_type, var_associated_rules.tested_value);
END LOOP;
var_update_request :=
'UPDATE '
|| p_schema_name || '.' || p_table_name
|| ' SET '
|| 'is_excluded_from_reports = ' || var_rules.output_value
|| ' WHERE '
|| var_where_clause;
EXECUTE var_update_request;
END LOOP;
var_result_status = 1;
RETURN var_result_status;
EXCEPTION WHEN OTHERS THEN
GET STACKED DIAGNOSTICS var_error_handler = PG_EXCEPTION_CONTEXT;
var_error_details :=
'Error Name: ' || SQLERRM
|| ' | ----------- | '
|| 'Error State: ' || SQLSTATE
|| ' | ----------- | '
|| 'Error Context: ' || var_error_handler;
PERFORM ii_dwh.sp_ii_core_log(p_current_user_name, 10, var_current_object_name, null, var_error_details);
RETURN var_result_status;
END;
$function$
Autres pistes d'amélioration
- Colonnes calculées plutôt que requêtes update.
- Optimisation des index.
Optimisation du process
Possibilité de regrouper les différentes steps (1 à 10) en une seule action.
Une seule action à enclencher par l'utilisateur.
Nécessité d'avoir un retour sur les performances de l'exécution des différentes fonctions (voir début du slide sur les métriques)
On pourrait imaginer un rendu récapitulant l'exécution, le nombre de calls des différentes fonctions, le nombre d'erreurs, le temps d'exécution cumulé de chaque fonctions par step.
Limitations
Après analyse, je ne sais pas si toutes les différentes tables sont nécessaires, il y a peut etre des répétitions de données entre les tables.
Je n'ai pas de vision du process métier.
Cyberark
By Simon Ducournau
Cyberark
- 12