Oracle + Django
Why?
Why Not?
Why Not
- Its only used by a few companies e.g. Salesforce, SAP, Amazon, most banks, 98% F500 Companies, etc etc
- Its its own ecosystem:
- DB: Oracle12g DB / Postgres
- ETL: Oracle Data Integrator / Talend / OpenRefine
- Realtime: Golden Gate / Splunk / Pub Sub
- Not alien
Different Ways of Doing Things
SELECT XMLELEMENT("Emp",
XMLFOREST(e.employee_id, e.last_name, e.salary))
"Emp Element"
FROM employees e WHERE employee_id = 204;
Emp Element
----------------------------------------------------------------
<Emp>
<EMPLOYEE_ID>204</EMPLOYEE_ID>
<LAST_NAME>Baer</LAST_NAME>
<SALARY>10000</SALARY>
</Emp>
Text
- XML Serialisation / Deserialisation From Query
- + Table Trailing for Updates
- + Golden Gate Dumping XML -> MQ + Read Receipts
- = Data Bus
SELECT xmlforest(table_name, column_name)
FROM information_schema.columns
WHERE table_schema = 'pg_catalog';
xmlforest
-------------------------------------------------------------------------------------------
<table_name>pg_authid</table_name><column_name>rolname</column_name>
<table_name>pg_authid</table_name><column_name>rolsuper</column_name>
Cross Pollination of Ideas + Learning new stuff
Postgres Too?
Some Oracle + Django Quirks
These are mainly just Oracle Quirks
Table/Column Name Limits
class ThisIsAReallyReallyReallyLongTableName(models.Model):
thisisareallyreallyreallylongfieldname = models.BooleanField(null=False)
- Reporting / BI / Other teams will have nothing but lots of praise for you
create table DJAAPP_THISISAREALLYREALLY097E
(
ID NUMBER(11)
generated by default on null as identity primary key,
THISISAREALLYREALLYREALLYL6BA8 NUMBER(1) not null
check ("THISISAREALLYREALLYREALLYL6BA8" IN (0, 1))
)
/
Table/Column Name Limits
-
In Oracle 12.2 and above the maximum object name length is 128 bytes.
In Oracle 12.1 and below the maximum object name length is 30 bytes.
-
Technology only seems to be getting better
-
Did I mention that tables / fields are case insensitive
class ThisIsAReallyReallyReallyLongTableName(models.Model):
normal_sized_field_name = models.BooleanField(null=False)
class Meta:
db_table = "dj_long_table_name"
Oracle | Postgres |
---|---|
Fast Refresh MV (Primary Key based + No grouping, distinct, aggregation, set operations) |
Updateable View Proposed patch submitted for inclusion in PostgreSQL 8.2. Never Made it |
Complete Refresh MV | Materialized View |
Trigger Based Update of Materialized View |
Materialized Views
-
Multiple Joins that need to be evaluated resulting in faster queries
-
Search Strategy, Reporting, etc
Materialized Views
class MaterialisedViewName(models.Model):
field1 = models.IntegerField()
field2 = models.IntegerField()
field3 = models.IntegerField()
class Meta:
db_table = "vw_my_materialized_view"
managed = False
@classmethod
def refresh_view(cls):
"""
:return:
"""
with connection.cursor() as cursor:
stmt = "BEGIN DBMS_SNAPSHOT.REFRESH('{}','?',
atomic_refresh=>FALSE,out_of_place=>TRUE);END;;".format(
cls._meta.db_table)
cursor.execute(stmt)
Sequences (Oracle 11)
CREATE SEQUENCE emp_sequence
INCREMENT BY 1
START WITH 1
NOMAXVALUE
NOCYCLE
CACHE 10;
INSERT INTO Orders_tab (Orderno, Custno)
VALUES (Order_seq.NEXTVAL, 1032);
CREATE SEQUENCE t1_seq;
CREATE TABLE t1 (
id NUMBER DEFAULT t1_seq.NEXTVAL,
description VARCHAR2(30)
);
Sequences (Oracle 12)
CREATE SEQUENCE t1_seq;
CREATE TABLE t1 (
id NUMBER(11) GENERATED BY DEFAULT ON NULL AS IDENTITY NOT NULL PRIMARY KEY,
description VARCHAR2(30)
);
-
Whenever we copied data between enviornments, sequences would go out of sync. (Datagrip, etc) esp when you're specifying values into the id column
Sequences
"DJANGO"."ISEQ$$_94778".nextval,AUTH_GROUP,ID
"DJANGO"."ISEQ$$_94782".nextval,AUTH_GROUP_PERMISSIONS,ID
"DJANGO"."ISEQ$$_94775".nextval,AUTH_PERMISSION,ID
"DJANGO"."ISEQ$$_94857".nextval,AUTH_ROLE,ID
"DJANGO"."ISEQ$$_94851".nextval,AUTH_SAMLSETTING,ID
"DJANGO"."ISEQ$$_94785".nextval,AUTH_USER,ID
"DJANGO"."ISEQ$$_94860".nextval,AUTH_USER_ACCESS,ID
"DJANGO"."ISEQ$$_94863".nextval,AUTH_USER_ACCESS_FORCS,ID
"DJANGO"."ISEQ$$_94789".nextval,AUTH_USER_GROUPS,ID
"DJANGO"."ISEQ$$_94792".nextval,AUTH_USER_USER_PERMISSIONS,ID
"DJANGO"."ISEQ$$_94806".nextval,DJANGO_ADMIN_LOG,ID
"DJANGO"."ISEQ$$_94771".nextval,DJANGO_CONTENT_TYPE,ID
"DJANGO"."ISEQ$$_94728".nextval,DJANGO_MIGRATIONS,ID
SELECT data_default AS sequence_val
,table_name
,column_name
FROM all_tab_columns
WHERE OWNER = 'DJANGO'
AND identity_column = 'YES';
My Management Command
import csv
import os
from django.core.management.base import BaseCommand
from django.db import connection
sequence_update_query = """DECLARE
maxid NUMBER := {};
maxseq NUMBER := {};
temp NUMBER; -- without this variable Oracle would skip to query the sequence {}
BEGIN
FOR i IN maxseq .. maxid LOOP
SELECT RADAR.{}.NEXTVAL INTO temp FROM DUAL;
END LOOP;
END;
/"""
class Command(BaseCommand):
help = """Reset sequences. Take the output of this script and paste it into your oracle utility"""
def handle(self, *args, **options):
sequence_tables_query = """SELECT data_default AS sequence_val ,table_name ,column_name
FROM all_tab_columns
WHERE OWNER = 'DJANGO'
AND identity_column = 'YES';"""
results = []
with connection.cursor() as cursor:
cursor.execute(sequence_tables_query)
results = cursor.fetchall()
sequence_dict = {}
for (sequence, table, id_column) in results:
max_value_query = "select max({}) from {};".format(id_column, table)
max_value = 0
with connection.cursor() as cursor:
cursor.execute(max_value_query)
results = cursor.fetchall()
if results:
if results[0]:
max_value = results[0][0]
if max_value is None:
max_value = 0
(sequence_owner, sequence_name, _) = tuple([x.strip('"') for x in sequence.split('.')])
seq_value_query = "SELECT LAST_NUMBER FROM all_sequences where SEQUENCE_OWNER='{}' and SEQUENCE_NAME='{}';".format(
sequence_owner, sequence_name)
last_value = 0
with connection.cursor() as cursor:
cursor.execute(seq_value_query)
results = cursor.fetchall()
if results:
if results[0] is not None:
last_value = results[0][0]
pass # print((sequence_name, table, id_column, max_value, last_value))
if last_value < (max_value+1): # Sequence needs syncing
print(sequence_update_query.format(max_value, last_value, table, sequence_name))
Django's Management Command
sqlsequencereset¶
django-admin sqlsequencereset app_label [app_label ...]¶
Prints the SQL statements for resetting sequences for the given app name(s).
Sequences are indexes used by some database engines to track the next available number
for automatically incremented fields.
Use this command to generate SQL which will fix cases where a sequence is out of sync
with its automatically incremented field data.
--database DATABASE¶
Specifies the database for which to print the SQL. Defaults to default.
DRY Anyone?
Missing Text Indexes
Lets have a look at how this might be implemented
Pandas & Django
from django.db import connection
query = """
SELECT "ERC"."CA",
"ERC"."CB",
"ERC"."CC",
"ERC"."CD",
"RCBF"."CTPP",
"MRO"."PUB_YEAR",
CASE
WHEN "CAGG"."CT" IS NOT NULL THEN "CAGG"."CT"
WHEN "CAGG"."CT" IS NULL THEN 0
END AS "CT",
CASE
WHEN "CT"/"CTPP" IS NULL THEN 0
WHEN "CT"/"CTPP" IS NOT NULL THEN "CT"/"CTPP"
END AS "RCI"
FROM "ERC"
left outer join MRO
on ("ERC"."CC" = "MRO"."ID"
and "ERC"."ERA_SUBMISSION_ID" = {era_id}
AND "ERC"."IS_ACTIVE" = 1
AND "MRO".RESEARCH_OUTPUT_TYPE = 'C1'
)
INNER JOIN RCBF
on ("ERC"."CA" = "RCBF"."CA" and "MRO"."PUB_YEAR" = "RCBF"."YEAR")
INNER JOIN RDCB
ON RDCB.ID = RCBF.BENCHMARK_ID AND RDCB.SR='X' AND RDCB.META_ID={meta_id}
LEFT OUTER JOIN (select CC, sum(VALUE) AS CT from MRO_CT where SR='X' and year BETWEEN {a} and {b} group by CC) CAGG
ON ("ERC"."CC"="CAGG"."CC")
"""
query = query.format(**dict)
Fast Number Crunching
from django.db import connection
df = pandas.read_sql_query(query, connection) # 95% of the cost
Fast Number Crunching
df["CLASS0"] = df["RCI"] < 0.001
df["CLASS1"] = df["RCI"].between(0.001, 0.8)
df["CLASS2"] = df["RCI"].between(0.8, 1.2)
df["CLASS3"] = df["RCI"].between(1.2, 2)
df["CLASS4"] = df["RCI"].between(2, 4)
df["CLASS5"] = df["RCI"].between(4, 8)
df["CLASS6"] = df["RCI"] > 8
series_dict = dict()
for rci_class in range(0, 7):
series = df.query("CLASS{}==True".format(rci_class)).groupby("CA").sum()["CB"]/100
series_dict[rci_class] = series
results = pandas.DataFrame(dict(
CLASS0=series_dict[0],
CLASS1=series_dict[1],
CLASS2=series_dict[2],
CLASS3=series_dict[3],
CLASS4=series_dict[4],
CLASS5=series_dict[5],
CLASS6=series_dict[6]
)).reset_index()
results.fillna(0, inplace=True)
d = {}
for (CA, class0, class1, class2, class3, class4, class5, class6) in zip(
results["CA"], results["CLASS0"], results["CLASS1"], results["CLASS2"], results["CLASS3"],
results["CLASS4"], results["CLASS5"], results["CLASS6"]):
d[CA] = {
"CLASS0": class0,
"CLASS1": class1,
"CLASS2": class2,
"CLASS3": class3,
"CLASS4": class4,
"CLASS5": class5,
"CLASS6": class6
}
Fast Number Crunching
Some Django Stuff
Result set with a lot of data
from django.core.paginator import Paginator
def chunked_iterator(queryset, chunk_size=1000):
paginator = Paginator(queryset, chunk_size)
for page in range(1, paginator.num_pages + 1):
for obj in paginator.page(page).object_list:
yield obj
all_the_objects = chunked_iterator(MyModel.objects.all())
Customizing Prefetch
from django.db.models import Prefetch
questions = Question.objects.filter(**kwargs).prefetch_related('choice_set')
q = questions.objects.first()
q_choices = [x for x in q.choice_set.all() if x.votes > 0]
q_choices = sorted(ro_claims, key=attrgetter('votes'))
############################################
voted_choices = Choice.objects.filter(votes__gt=0)
prefetch = Prefetch('choice_set', queryset=voted_choices)
questions = Question.objects.filter(**kwargs).prefetch_related(prefetch)
https://youtu.be/eUM3b2q27pI
Recommended Link
Oracle + Django
By Iqbal Talaat Bhatti
Oracle + Django
- 516