Building your code

Problem

Template renderer

Given a Jinja template, loaded from GitHub or from a local file, render it with a set of parameters (passed on the command line or read from a local file)

SELECT
    coluna1,
    coluna2,
    coluna3 as '{{nome}}'
FROM
    my_table
LIMIT {{max_result | int}}

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"sparkoperator.k8s.io/v1beta2","kind":"SparkApplication","metadata":{"annotations":{},"name":"trajetorias-saopaulo","namespace":"mobilidade"},"spec":{"arguments":["--source-file-type","parquet","--partition-columns","unidade_cabana,event_date","--source-paths","s3a://kognita-data-lake-sp/mobilidade/by_municipio-v10/creation_date=*/uf=RJ/municipio=São Paulo/|s3a://kognita-data-lake/mobilidade/cabana/uf=RJ/municipio=São Paulo","--base-paths","s3a://kognita-data-lake-sp/mobilidade/by_municipio-v10/|s3a://kognita-data-lake/mobilidade/cabana/","--output-path","s3a://kognita-data-lake/mobilidade/trajetorias/cabana/uf=RJ/municipio=São Paulo","--sql-transformer-path","s3a://kognita-temporary-lake/kfp/b47b9e60-1c92-4fa3-af45-db0536988b67.sql","--paralelism","1"],"driver":{"annotations":{"prometheus.io/path":"/metrics","prometheus.io/port":"8090","prometheus.io/scrape":"true"},"cores":4,"envSecretKeyRefs":{"AWS_ACCESS_KEY_ID":{"key":"AWS_ACCESS_KEY_ID","name":"s3-storage-test"},"AWS_DEFAULT_REGION":{"key":"AWS_DEFAULT_REGION","name":"s3-storage-test"},"AWS_SECRET_ACCESS_KEY":{"key":"AWS_SECRET_ACCESS_KEY","name":"s3-storage-test"}},"labels":{"kognita.ai/group":"mobilidade","kognita.ai/job":"partition-builder"},"serviceAccount":"sparkjob"},"dynamicAllocation":{"enabled":true,"initialExecutors":1,"maxExecutors":15,"minExecutors":0},"executor":{"annotations":{"prometheus.io/path":"/metrics","prometheus.io/port":"8090","prometheus.io/scrape":"true"},"coreLimit":"8","cores":8,"deleteOnTermination":true,"envSecretKeyRefs":{"AWS_ACCESS_KEY_ID":{"key":"AWS_ACCESS_KEY_ID","name":"s3-storage-test"},"AWS_DEFAULT_REGION":{"key":"AWS_DEFAULT_REGION","name":"s3-storage-test"},"AWS_SECRET_ACCESS_KEY":{"key":"AWS_SECRET_ACCESS_KEY","name":"s3-storage-test"}},"instances":3,"labels":{"kognita.ai/group":"mobilidade","kognita.ai/job":"partition-builder"},"memory":"24Gb"},"image":"018457460817.dkr.ecr.sa-east-1.amazonaws.com/partition-builder:1.1.4-scala2.12","imagePullPolicy"
:"IfNotPresent","mainApplicationFile":"local:///home/runner/partition-builder.jar","mainClass":"ai.kognita.de.partitionbuilder.Main","mode":"cluster","monitoring":{"exposeDriverMetrics":true,"exposeExecutorMetrics":true,"prometheus":{"jmxExporterJar":"/home/runner/jmx_prometheus_javaagent.jar","port":8090}},"restartPolicy":{"onFailureRetries":1,"onFailureRetryInterval":5,"onSubmissionFailureRetries":1,"onSubmissionFailureRetryInterval":1,"type":"Never"},"sparkConf":{"spark.hadoop.fs.s3a.committer.abort.pending.uploads":"false","spark.hadoop.fs.s3a.committer.magic.enabled":"true","spark.hadoop.fs.s3a.committer.staging.conflict-mode":"append","spark.hadoop.fs.s3a.fast.upload":"true","spark.hadoop.fs.s3a.impl":"org.apache.hadoop.fs.s3a.S3AFileSystem","spark.hadoop.fs.s3a.multiobjectdelete.enable":"false","spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version":"2","spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored":"true","spark.hadoop.mapreduce.fileoutputcommitter.cleanup.skipped":"true","spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory","spark.hadoop.orc.overwrite.output.file":"true","spark.hadoop.parquet.enable.summary-metadata":"false","spark.hadoop.parquet.overwrite.output.file":"true","spark.serializer":"org.apache.spark.serializer.KryoSerializer","spark.sql.hive.metastorePartitionPruning":"true","spark.sql.parquet.filterPushdown":"true","spark.sql.parquet.int96RebaseMode":"CORRECTED","spark.sql.parquet.mergeSchema":"false","spark.ui.showConsoleProgress":"true"},"sparkVersion":"3.3.0","type":"Scala"}}
  creationTimestamp: "2023-04-05T20:49:52Z"
  generation: 2
  name: trajetorias-{{bandeira}}-{{municipio | lower | replace("ó","o") | replace("á","a") | replace("é","e") | replace("í", "i") | replace("ú","u") | replace("ü","u") | replace("ã","a") | replace("ẽ","e")| replace("õ","o")| replace("ç","c")| replace(" ","")| replace("â","a")| replace("ê","e")| replace("ô","o")| replace("'","")}}
  namespace: mobilidade
  resourceVersion: "63440696"
  uid: 9f25c25e-5d8e-4de4-87b0-53883702345a
spec:
  arguments:
  - --source-file-type
  - parquet
  - --partition-columns
  - unidade_{{bandeira}},event_date
  - --source-paths
  - s3a://kognita-data-lake-sp/mobilidade/by_municipio-v10/creation_date=*/uf={{uf}}/municipio={{municipio}}/|s3a://kognita-data-lake/mobilidade/{{bandeira}}/uf={{uf}}/municipio={{municipio}}
  - --base-paths
  - s3a://kognita-data-lake-sp/mobilidade/by_municipio-v10/|s3a://kognita-data-lake/mobilidade/{{bandeira}}/
  - --output-path
  - s3a://kognita-data-lake/mobilidade/trajetorias/{{bandeira}}/uf={{uf}}/municipio={{municipio}}
  - --sql-transformer-path
  - {{sql_transformer_path}}
  - --paralelism
  - "1"
  .
  .
  .
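
The long chain of replace filters in the name field above lowercases the municipality name, strips accents, and removes spaces and apostrophes. As a rough sketch of the same idea in plain Python, the helper below does this with Unicode normalization. The function name slugify_municipio is hypothetical and not part of the original project; a function like this could be registered as a custom Jinja filter through the environment's filters dictionary, which would shorten the template.

```python
import unicodedata


def slugify_municipio(name: str) -> str:
    """Lowercase a name, strip its accents, and drop spaces and apostrophes."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name.lower())
    # Keep only the base characters, discarding the combining marks.
    without_accents = "".join(char for char in decomposed if not unicodedata.combining(char))
    return without_accents.replace(" ", "").replace("'", "")


print(slugify_municipio("São Paulo"))
```

Unlike the explicit replace chain, this handles accented letters that were not listed by hand, which may or may not be desirable depending on how strict the resource names need to be.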

Module composition

Every module consists of one or more entrypoints (functions or structures) plus helpers that are visible only to the levels below it

Modules

  • Installable application (a single main module)
  • A single entrypoint per executable module
  • Separation of responsibilities
  • Correct scopes (a problem in Python)

The main module

It contains only the main function, which orchestrates the others

@click.command()
@click.option("--source-template-path", "-s", type=click.STRING, required=True, help=SOURCE_TEMPLATE_PATH_HELP)
@click.option("--template-source-type", "-t", type=click.Choice(TEMPLATE_SOURCE_TYPES, case_sensitive=False),
              default=FILE_SOURCE_TYPE, help=TEMPLATE_SOURCE_TYPE_HELP)
@click.option("--parameters", "-p", type=click.STRING, help=PARAMETERS_HELP)
@click.option("--parameters-type", type=click.Choice(choices=[FILE_TEMPLATE_TYPE, RAW_TEMPLATE_TYPE]),
              default=RAW_TEMPLATE_TYPE, help=PARAMETERS_TYPE_HELP)
@click.option("--parameters-separator", default=PARAMETERS_SEPARATOR_DEFAULT, help=PARAMETERS_SEPARATOR_HELP)
@click.option("--output-path", "-o", type=click.STRING, help=OUTPUT_PATH_HELP)
@click.option("--github-token", envvar=GITHUB_TOKEN_ENV, help=GITHUB_TOKEN_HELP)
@click.option("--github-organization", envvar=GITHUB_ORGANIZATION_ENV, help=GITHUB_ORGANIZATION_HELP)
@click.option("--github-repository", help=GITHUB_REPOSITORY_HELP)
@click.option("--git-reference", default=DEFAULT_GIT_REFERENCE, help=GIT_REFERENCE_HELP)
def main(source_template_path: str, template_source_type: str, parameters: str, parameters_type: str,
         parameters_separator: str, output_path: str, github_token: Optional[str], github_organization: Optional[str],
         github_repository: Optional[str], git_reference: Optional[str]):

    logging.basicConfig(level=logging.INFO)

    template = load_template(source_template_path, template_source_type, github_token, github_organization,
                             github_repository, git_reference)

    parameters_dict_list = make_parameter_dict_list(parameters, parameters_separator, parameters_type)
    rendered_templates = process_parameter_dict_list(parameters_dict_list, template)
    save_rendered_by_template_type(output_path, parameters_type, rendered_templates)


if __name__ == "__main__":
    main()

Parameters/Arguments

All argument handling in main above is done with the click library, but other approaches work as well
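
As one example of another approach, here is a rough sketch of part of the same interface using argparse from the standard library. The choice values ("file", "github", "raw") and the ";" separator are assumptions, since the deck does not show the values of the constants used in the click decorators.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors a subset of the click options."""
    parser = argparse.ArgumentParser(prog="render")
    parser.add_argument("--source-template-path", "-s", required=True)
    # Assumed choices: the real values live in constants not shown in the deck.
    parser.add_argument("--template-source-type", "-t", choices=["file", "github"], default="file")
    parser.add_argument("--parameters", "-p")
    parser.add_argument("--parameters-type", choices=["file", "raw"], default="raw")
    parser.add_argument("--parameters-separator", default=";")
    parser.add_argument("--output-path", "-o")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the example self-contained.
args = build_parser().parse_args(["-s", "query.sql.j2", "-p", "nome=total;max_result=10"])
print(args.source_template_path, args.parameters_type)
```

With argparse the parsed values arrive in a namespace object, so main would unpack them itself rather than receive them as keyword arguments from decorators.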

Execution setup

In this case, only logging is configured, with a fixed level
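
If the level ever needed to vary, the setup step could be isolated in a small function so that main still contains only calls. The sketch below is one possible shape; configure_logging is a hypothetical name and is not part of the original code.

```python
import logging


def configure_logging(level_name: str = "INFO") -> int:
    """Configure the root logger once, before any other work starts."""
    # getLevelName maps a known level name to its numeric value.
    level = logging.getLevelName(level_name.upper())
    if not isinstance(level, int):
        raise ValueError(f"Unknown log level: {level_name}")
    # force=True (Python 3.8+) replaces handlers installed by earlier calls.
    logging.basicConfig(level=level, force=True)
    return level


configure_logging("debug")
logging.debug("Logging is configured")
```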

Dependencies

Delegated to other modules

persistence/loader.py

def load_template(template_source_path: str, template_source_type: str, github_token, github_organization,
                  github_repository, git_reference) -> str:
    # Dispatch on the source type; both helpers return the raw template text.
    if template_source_type == GITHUB_SOURCE_TYPE:
        template = _load_github_template(template_source_path, github_token, github_organization,
                                         github_repository, git_reference)
    else:
        template = _from_other_source(template_source_path)

    return template


def load_parameters_file(source_path: str) -> List[Dict[str, str]]:
    # Each CSV row becomes one parameter dict, keyed by the header line.
    with open(source_path, newline="") as source:
        return list(csv.DictReader(source))
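
To make the parameters file format concrete, here is a small sketch of what csv.DictReader produces for a file with a header line. It reads from an in-memory string instead of a real file, and the column names and values are invented for illustration.

```python
import csv
import io

# Hypothetical parameters file: one header line, then one row per rendering.
PARAMETERS_CSV = "bandeira,uf,municipio\ncabana,SP,Campinas\ncabana,RJ,Niteroi\n"

with io.StringIO(PARAMETERS_CSV) as source:
    # Each row becomes a dict whose keys come from the header line.
    parameters_dict_list = list(csv.DictReader(source))

for parameters in parameters_dict_list:
    print(parameters)
```

Every value arrives as a string, which is why the SQL template at the start of the deck applies the int filter to max_result.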

def _load_github_template(source: str, github_token: str, organization: str, github_repository,
                          reference: str) -> str:
    github_client = Github(login_or_token=github_token)
    org = github_client.get_organization(organization)
    repository = org.get_repo(github_repository)
    contents = repository.get_contents(source, reference).decoded_content.decode("UTF-8")

    return contents

def _from_other_source(source: str) -> str:
    with open(source, "r") as source_file:
        source_data = source_file.read()
        return source_data

parser/parameters.py

def make_parameter_dict_list(parameters, parameters_separator, parameters_type):
    # Raw parameters come straight from the command line; otherwise they are read
    # from a CSV file by load_parameters_file, which lives in persistence/loader.py.
    if parameters_type == RAW_TEMPLATE_TYPE:
        parameters_dict_list = _make_parameters_dict(parameters, parameters_separator)
    else:
        parameters_dict_list = load_parameters_file(parameters)
    return parameters_dict_list


def _make_parameters_dict(parameters: str, parameters_separator: str) -> List[Dict[str, str]]:
    parameters_list = parameters.split(parameters_separator) if parameters else []
    # Split on the first "=" only, so values may themselves contain "=".
    parameters_key_value = [key_value.split("=", 1) for key_value in parameters_list if key_value]
    parameters_dict = {name: value for name, value in parameters_key_value}

    return [parameters_dict]
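
To show what the raw parameter parsing yields, here is a standalone sketch of the same logic with a sample input. The function name parse_raw_parameters and the ";" default separator are assumptions; the deck does not show the value of PARAMETERS_SEPARATOR_DEFAULT.

```python
from typing import Dict, List


def parse_raw_parameters(parameters: str, separator: str = ";") -> List[Dict[str, str]]:
    """Turn 'a=1;b=2' into a one-element list holding {'a': '1', 'b': '2'}."""
    items = parameters.split(separator) if parameters else []
    # maxsplit=1 keeps any further "=" characters inside the value.
    pairs = [item.split("=", 1) for item in items if item]
    return [dict(pairs)]


print(parse_raw_parameters("nome=total;max_result=10;filtro=a=b"))
```

The result is wrapped in a list so that raw parameters and file parameters share one shape, which lets the renderer loop over both without a special case.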

"Modules of impure functions can be reused in processing modules to improve the main module, but this is still a form of coupling that should be evaluated."

Man, He

Core action of the job

Delegated to other modules

processors/renderer.py

def process_parameter_dict_list(parameters_dict_list: List[Dict[str, str]], template: str) -> List[str]:
    rendered_templates = []

    for parameters_dict in parameters_dict_list:
        logging.info(f"Received parameters: {parameters_dict}")
        logging.info(f"Raw template: \n {template}")

        rendered_template = _render_template(template, parameters_dict)
        rendered_templates.append(rendered_template)
        logging.info(f"Rendered template \n {rendered_template}")

    return rendered_templates


def _render_template(template: str, parameters_dict: Dict[str, str]) -> str:
    # Compile the template text with Jinja, then render it with one parameter set.
    jinja_template = Environment(loader=BaseLoader()).from_string(template)
    rendered = jinja_template.render(**parameters_dict)
    return rendered.strip()

"Simplify your complex functions (large ones, or ones with loops or conditionals) by creating private helper functions, each with a single responsibility."

Man, He

"There is nothing wrong with writing a function that will be used only once. If it makes the code easier to read, it is valid and very useful. Always keep that in mind!"

Output (saving)

Delegated to other modules

persistence/saver.py

def save_rendered_by_template_type(output_path, parameters_type, rendered_templates):
    # Raw parameters yield a single rendering (one file); a parameters file yields one file per row.
    if parameters_type == RAW_TEMPLATE_TYPE:
        _save_rendered_template(rendered_templates[0], output_path)
    else:
        _save_rendered_templates(rendered_templates, output_path)


def _save_rendered_template(rendered_template, output_path):
    # The with block closes the file on exit, so no explicit close() is needed.
    with open(output_path, "w") as template_file_object:
        template_file_object.write(rendered_template)
        template_file_object.write("\n---")

"Simplify your code by reusing existing functions, but do not create functions just so they can be reused; that undermines single responsibility."

Man, He

def _save_rendered_templates(rendered_templates, output_path):
    os.makedirs(output_path, exist_ok=True)
    # Each rendering is written to a file named after its position in the list.
    for index, rendered_template in enumerate(rendered_templates):
        _save_rendered_template(rendered_template, os.path.join(output_path, str(index)))
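
The multi-file branch can be tried without touching a real output location. The sketch below is a self-contained variant that writes into a temporary directory; save_all is a hypothetical name, and returning the written paths is an addition made only so the result can be inspected.

```python
import os
import tempfile
from typing import List


def save_all(rendered_templates: List[str], output_dir: str) -> List[str]:
    """Write each rendered template to its own file and return the paths."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for index, rendered in enumerate(rendered_templates):
        # Files are named after the position of the rendering in the list.
        path = os.path.join(output_dir, str(index))
        with open(path, "w", encoding="utf-8") as output_file:
            output_file.write(rendered)
            output_file.write("\n---")
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp_dir:
    written = save_all(["SELECT 1", "SELECT 2"], os.path.join(tmp_dir, "out"))
    with open(written[1], encoding="utf-8") as second_file:
        content = second_file.read()

print(content)
```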

../setup.py

from setuptools import setup, find_packages

setup(
    name='template-render',
    version='2.0.0',
    packages=find_packages(),
    url='',
    license='',
    author='de-team',
    author_email='',
    description='',
    setup_requires=['wheel'],
    install_requires=[
        "Jinja2>=3.1.2",
        "click>=8.1.3",
        "smart-open[all]>=6.0.0",
        "pyarrow>=7.0.0",
        "PyGithub>=1.55",
        "objsize>=0.3.3"
    ],
    entry_points={
        'console_scripts': [
            'render=template_render.main:main'
        ]
    }
)

  • Keep versions and dependencies in one place: setup.py
  • If the package is an executable, make sure the entrypoint is exposed.
  • Use semantic versioning, and never leave versions open-ended.
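
One possible reading of "never leave versions open-ended" is that each requirement should carry an upper bound as well as a lower one. The sketch below illustrates that reading; the upper bounds (the next major version) are assumptions, not values taken from the original project, and has_upper_bound is a hypothetical helper that only inspects the requirement string.

```python
# Hypothetical bounded ranges: lower bounds come from the setup.py above,
# upper bounds are assumed to be the next major version.
INSTALL_REQUIRES = [
    "Jinja2>=3.1.2,<4.0.0",
    "click>=8.1.3,<9.0.0",
    "PyGithub>=1.55,<2.0.0",
]


def has_upper_bound(requirement: str) -> bool:
    """Return True when the requirement string pins or caps the version."""
    return any(operator in requirement for operator in ("<", "==", "~="))


unbounded = [requirement for requirement in INSTALL_REQUIRES if not has_upper_bound(requirement)]
print(unbounded)
```

A check like this is only a string heuristic; it is meant to show the intent of the rule, not to replace a real requirement parser.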

Recap

  • The main module has no logic, only calls
  • Parameters -> Setup -> Dependencies -> Core -> Output
  • The only IO allowed outside of the output step is logging
  • Each dependency produces inputs for the core module
  • Only the core module knows all the dependencies
  • Input IO can be used outside of main, but should be avoided
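
The order in the recap can be sketched end to end in a few lines. To stay within the standard library, this sketch swaps Jinja for string.Template, so the placeholder syntax is $name rather than {{name}}. All function names are hypothetical, and the loaders return fixed values where a real job would read files or call GitHub.

```python
import logging
from string import Template
from typing import Dict, List


def load_template_text() -> str:
    # Dependency: a real job would read this from a file or from GitHub.
    return "SELECT * FROM $table LIMIT $max_result"


def load_parameter_sets() -> List[Dict[str, str]]:
    # Dependency: a real job would parse CLI arguments or a CSV file.
    return [{"table": "my_table", "max_result": "10"}]


def render_all(template_text: str, parameter_sets: List[Dict[str, str]]) -> List[str]:
    # Core: receives the inputs produced by the dependencies and performs no IO.
    template = Template(template_text)
    return [template.substitute(parameters) for parameters in parameter_sets]


def emit(rendered: List[str]) -> None:
    # Output: the only step that writes results (here, to stdout).
    for text in rendered:
        print(text)


def main() -> List[str]:
    logging.basicConfig(level=logging.INFO)  # Setup
    template_text = load_template_text()  # Dependencies
    parameter_sets = load_parameter_sets()
    rendered = render_all(template_text, parameter_sets)  # Core
    emit(rendered)  # Output
    return rendered


if __name__ == "__main__":
    main()
```

Because render_all takes plain values and returns plain values, it can be tested without files, network access, or a command line.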

Next sessions

  • Build the code in a live-coding session
  • Build the same code in different technologies
  • Start adding tests

Code organization and modularization - Part 1

By André Claudino

In this deck, we discuss how to organize code that renders a template as a step of a pipeline. It can be used to render SQL queries, Kubernetes objects, or anything else that can be represented as text.