Oracle + Django

Why?

Why Not?

Why Not

  • Its only used by a few companies e.g. Salesforce, SAP, Amazon, most banks, 98% F500 Companies, etc etc
  • Its its own ecosystem:
    • DB: Oracle12g DB / Postgres
    • ETL: Oracle Data Integrator / Talend / OpenRefine
    • Realtime: Golden Gate / Splunk / Pub Sub
    • Not alien

Different Ways of Doing Things

SELECT XMLELEMENT("Emp", 
   XMLFOREST(e.employee_id, e.last_name, e.salary))
   "Emp Element"
   FROM employees e WHERE employee_id = 204;

Emp Element
----------------------------------------------------------------
<Emp>
  <EMPLOYEE_ID>204</EMPLOYEE_ID>
  <LAST_NAME>Baer</LAST_NAME>
  <SALARY>10000</SALARY>
</Emp>

Text

  • XML Serialisation / Deserialisation From Query
  • + Table Trailing for Updates
  • + Golden Gate Dumping XML -> MQ + Read Receipts
  • = Data Bus
SELECT xmlforest(table_name, column_name)
FROM information_schema.columns
WHERE table_schema = 'pg_catalog';

                                         xmlforest
-------------------------------------------------------------------------------------------
 <table_name>pg_authid</table_name><column_name>rolname</column_name>
 <table_name>pg_authid</table_name><column_name>rolsuper</column_name>

Cross Pollination of Ideas + Learning new stuff

Postgres Too?

Some Oracle + Django Quirks

These are mainly just Oracle Quirks

Table/Column Name Limits

class ThisIsAReallyReallyReallyLongTableName(models.Model):
    thisisareallyreallyreallylongfieldname = models.BooleanField(null=False)
  • Reporting / BI / Other teams will have nothing but lots of praise for you
create table DJAAPP_THISISAREALLYREALLY097E
(
    ID                             NUMBER(11)
generated by default on null as identity primary key,


    THISISAREALLYREALLYREALLYL6BA8 NUMBER(1) not null
check ("THISISAREALLYREALLYREALLYL6BA8" IN (0, 1))
)
/

Table/Column Name Limits

  • In Oracle 12.2 and above the maximum object name length is 128 bytes.

    In Oracle 12.1 and below the maximum object name length is 30 bytes.

  • Technology only seems to be getting better

  • Did I mention that tables / fields are case insensitive

class ThisIsAReallyReallyReallyLongTableName(models.Model):
    normal_sized_field_name = models.BooleanField(null=False)

    class Meta:
        db_table = "dj_long_table_name"
Oracle Postgres
Fast Refresh MV

(Primary Key based +
No grouping, distinct, aggregation, set operations)
Updateable View
Proposed patch submitted for inclusion in PostgreSQL 8.2. Never Made it
Complete Refresh MV Materialized View
Trigger Based Update of Materialized View

Materialized Views

  • Multiple Joins that need to be evaluated resulting in faster queries

  • Search Strategy, Reporting, etc

Materialized Views

class MaterialisedViewName(models.Model):
    field1 = models.IntegerField()
    field2 = models.IntegerField()
    field3 = models.IntegerField()

    class Meta:
        db_table = "vw_my_materialized_view"
        managed = False

    @classmethod
    def refresh_view(cls):
        """
        :return:
        """
        with connection.cursor() as cursor:
            stmt = "BEGIN DBMS_SNAPSHOT.REFRESH('{}','?',
                atomic_refresh=>FALSE,out_of_place=>TRUE);END;;".format(
                cls._meta.db_table)
            cursor.execute(stmt)

Sequences (Oracle 11)

CREATE SEQUENCE emp_sequence
      INCREMENT BY 1
      START WITH 1
      NOMAXVALUE
      NOCYCLE
      CACHE 10;
INSERT INTO Orders_tab (Orderno, Custno)
    VALUES (Order_seq.NEXTVAL, 1032);
CREATE SEQUENCE t1_seq;
CREATE TABLE t1 (
  id          NUMBER DEFAULT t1_seq.NEXTVAL,
  description VARCHAR2(30)
);

Sequences (Oracle 12)

CREATE SEQUENCE t1_seq;
CREATE TABLE t1 (
  id          NUMBER(11) GENERATED BY DEFAULT ON NULL AS IDENTITY NOT NULL PRIMARY KEY,
  description VARCHAR2(30)
);
  • Whenever we copied data between enviornments, sequences would go out of sync. (Datagrip, etc) esp when you're specifying values into the id column

Sequences

"DJANGO"."ISEQ$$_94778".nextval,AUTH_GROUP,ID
"DJANGO"."ISEQ$$_94782".nextval,AUTH_GROUP_PERMISSIONS,ID
"DJANGO"."ISEQ$$_94775".nextval,AUTH_PERMISSION,ID
"DJANGO"."ISEQ$$_94857".nextval,AUTH_ROLE,ID
"DJANGO"."ISEQ$$_94851".nextval,AUTH_SAMLSETTING,ID
"DJANGO"."ISEQ$$_94785".nextval,AUTH_USER,ID
"DJANGO"."ISEQ$$_94860".nextval,AUTH_USER_ACCESS,ID
"DJANGO"."ISEQ$$_94863".nextval,AUTH_USER_ACCESS_FORCS,ID
"DJANGO"."ISEQ$$_94789".nextval,AUTH_USER_GROUPS,ID
"DJANGO"."ISEQ$$_94792".nextval,AUTH_USER_USER_PERMISSIONS,ID
"DJANGO"."ISEQ$$_94806".nextval,DJANGO_ADMIN_LOG,ID
"DJANGO"."ISEQ$$_94771".nextval,DJANGO_CONTENT_TYPE,ID
"DJANGO"."ISEQ$$_94728".nextval,DJANGO_MIGRATIONS,ID
SELECT data_default AS sequence_val
    ,table_name
    ,column_name
FROM all_tab_columns
WHERE OWNER = 'DJANGO'
    AND identity_column = 'YES';

My Management Command

import csv
import os

from django.core.management.base import BaseCommand
from django.db import connection


sequence_update_query = """DECLARE
    maxid NUMBER := {};
    maxseq NUMBER := {};
    temp NUMBER;  -- without this variable Oracle would skip to query the sequence {}
BEGIN
    FOR i IN maxseq .. maxid LOOP
        SELECT RADAR.{}.NEXTVAL INTO temp FROM DUAL;
    END LOOP;
END;
/"""


class Command(BaseCommand):
    help = """Reset sequences. Take the output of this script and paste it into your oracle utility"""

    def handle(self, *args, **options):
        sequence_tables_query = """SELECT data_default AS sequence_val ,table_name ,column_name
        FROM all_tab_columns
        WHERE OWNER = 'DJANGO'
        AND identity_column = 'YES';"""

        results = []
        with connection.cursor() as cursor:
            cursor.execute(sequence_tables_query)
            results = cursor.fetchall()

        sequence_dict = {}
        for (sequence, table, id_column) in results:
            max_value_query = "select max({}) from {};".format(id_column, table)
            max_value = 0
            with connection.cursor() as cursor:
                cursor.execute(max_value_query)
                results = cursor.fetchall()
                if results:
                    if results[0]:
                        max_value = results[0][0]
                        if max_value is None:
                            max_value = 0

            (sequence_owner, sequence_name, _) = tuple([x.strip('"') for x in sequence.split('.')])


            seq_value_query = "SELECT LAST_NUMBER FROM all_sequences where SEQUENCE_OWNER='{}' and SEQUENCE_NAME='{}';".format(
                sequence_owner, sequence_name)

            last_value = 0
            with connection.cursor() as cursor:
                cursor.execute(seq_value_query)
                results = cursor.fetchall()
                if results:
                    if results[0] is not None:
                        last_value = results[0][0]

            pass  # print((sequence_name, table, id_column, max_value, last_value))

            if last_value < (max_value+1): # Sequence needs syncing
                print(sequence_update_query.format(max_value, last_value, table, sequence_name))

Django's Management Command

sqlsequencereset¶

django-admin sqlsequencereset app_label [app_label ...]¶

Prints the SQL statements for resetting sequences for the given app name(s).

Sequences are indexes used by some database engines to track the next available number 
for automatically incremented fields.

Use this command to generate SQL which will fix cases where a sequence is out of sync 
with its automatically incremented field data.

--database DATABASE¶

Specifies the database for which to print the SQL. Defaults to default.

DRY Anyone?

Missing Text Indexes

Lets have a look at how this might be implemented

Pandas & Django

        from django.db import connection
        query = """
        SELECT "ERC"."CA",
               "ERC"."CB",
               "ERC"."CC",
               "ERC"."CD",
               "RCBF"."CTPP",
               "MRO"."PUB_YEAR",
               CASE
                 WHEN "CAGG"."CT" IS NOT NULL THEN "CAGG"."CT"
                 WHEN "CAGG"."CT" IS NULL THEN 0
               END AS "CT",
               CASE
                 WHEN "CT"/"CTPP" IS NULL THEN 0
                 WHEN "CT"/"CTPP" IS NOT NULL THEN "CT"/"CTPP"
               END AS "RCI"
        FROM "ERC"
        left outer join MRO
        on ("ERC"."CC" = "MRO"."ID"
              and "ERC"."ERA_SUBMISSION_ID" = {era_id}
              AND "ERC"."IS_ACTIVE" = 1
              AND "MRO".RESEARCH_OUTPUT_TYPE = 'C1'
        )
        INNER JOIN RCBF
        on ("ERC"."CA" = "RCBF"."CA" and "MRO"."PUB_YEAR" = "RCBF"."YEAR")
        INNER JOIN RDCB
        ON RDCB.ID = RCBF.BENCHMARK_ID AND RDCB.SR='X' AND RDCB.META_ID={meta_id}
        LEFT OUTER JOIN (select CC, sum(VALUE) AS CT from MRO_CT where SR='X' and year BETWEEN {a} and {b} group by CC) CAGG
        ON ("ERC"."CC"="CAGG"."CC")
        """

        query = query.format(**dict)

Fast Number Crunching

        from django.db import connection
        df = pandas.read_sql_query(query, connection)  # 95% of the cost

Fast Number Crunching

        df["CLASS0"] = df["RCI"] < 0.001
        df["CLASS1"] = df["RCI"].between(0.001, 0.8)
        df["CLASS2"] = df["RCI"].between(0.8, 1.2)
        df["CLASS3"] = df["RCI"].between(1.2, 2)
        df["CLASS4"] = df["RCI"].between(2, 4)
        df["CLASS5"] = df["RCI"].between(4, 8)
        df["CLASS6"] = df["RCI"] > 8

        series_dict = dict()

        for rci_class in range(0, 7):
            series = df.query("CLASS{}==True".format(rci_class)).groupby("CA").sum()["CB"]/100
            series_dict[rci_class] = series

        results = pandas.DataFrame(dict(
            CLASS0=series_dict[0],
            CLASS1=series_dict[1],
            CLASS2=series_dict[2],
            CLASS3=series_dict[3],
            CLASS4=series_dict[4],
            CLASS5=series_dict[5],
            CLASS6=series_dict[6]
        )).reset_index()

        results.fillna(0, inplace=True)

        d = {}
        for (CA, class0, class1, class2, class3, class4, class5, class6) in zip(
            results["CA"], results["CLASS0"], results["CLASS1"], results["CLASS2"], results["CLASS3"],
                results["CLASS4"], results["CLASS5"], results["CLASS6"]):
            d[CA] = {
                "CLASS0": class0,
                "CLASS1": class1,
                "CLASS2": class2,
                "CLASS3": class3,
                "CLASS4": class4,
                "CLASS5": class5,
                "CLASS6": class6
            }

Fast Number Crunching

Some Django Stuff

Result set with a lot of data

from django.core.paginator import Paginator


def chunked_iterator(queryset, chunk_size=1000):
    paginator = Paginator(queryset, chunk_size)
    for page in range(1, paginator.num_pages + 1):
        for obj in paginator.page(page).object_list:
            yield obj
all_the_objects = chunked_iterator(MyModel.objects.all())

Customizing Prefetch

from django.db.models import Prefetch


questions = Question.objects.filter(**kwargs).prefetch_related('choice_set')

q = questions.objects.first()
q_choices = [x for x in q.choice_set.all() if x.votes > 0]
q_choices = sorted(ro_claims, key=attrgetter('votes'))



############################################

voted_choices = Choice.objects.filter(votes__gt=0)
prefetch = Prefetch('choice_set', queryset=voted_choices)
questions = Question.objects.filter(**kwargs).prefetch_related(prefetch)

https://youtu.be/eUM3b2q27pI

Recommended Link

Made with Slides.com