@Tarun_Garg2
Let's build Y.A.M.A. (Yet Another Messaging Application)
coz why not?
Let's see a high-level database design for Y.A.M.A.
class User(models.Model):
    username = models.CharField(max_length=150, unique=True)
    profile_picture = models.URLField()
    ...  # user-related fields
PM: Hey, can you tell me why this user exists in our system?
Developer: Sure, give me some time to debug.
Developer: I need to know when this user was created to debug this, and I don't have that info in my schema.
**googles**
Audit: an official examination of the present state of something.
Have some metadata fields which
will help you audit your models
class CreateUpdateAbstractModel(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        abstract = True

    def __str__(self):
        return str(self.id)
The most basic form of auditing can be achieved with this snippet.
For a more advanced version of auditing, you can add other fields such as -
updated_by, created_by, deleted_at, deleted_by, raw_json
Using default=timezone.now has benefits: for example, you can specify the value of created_at in tests without having to mock timezone.now().
Both are applied by Django rather than by the database, but auto_now_add always overwrites the value on insert, whereas default=timezone.now is only used when no value is supplied, so it can be overridden.
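The testing benefit can be sketched in plain Python (no Django here): a `default_factory` behaves like `default=timezone.now`, firing only when no value is supplied, so a test can pin the timestamp without mocking the clock. The `AuditedRecord` class is purely illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditedRecord:
    # Fires only when no value is supplied, like default=timezone.now.
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# In a test we can pin the timestamp explicitly:
fixed = datetime(2020, 1, 1, tzinfo=timezone.utc)
record = AuditedRecord(created_at=fixed)
assert record.created_at == fixed

# auto_now_add, by contrast, ignores any supplied value on insert,
# which is why it has to be mocked in tests.
```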
Performance implications of having raw_json in models
Alternatives?
• Evaluate for your use case; after all, this is a good-to-have, not a must-have.
• Store the JSON as a 1:1 field on a separate model
• Override the Django manager to defer the JSON field
class User(models.Model):
    created_at = models.DateTimeField(default=timezone.now)
    updated_at = models.DateTimeField(auto_now=True)
    ...  # other auditing fields
    username = models.CharField(max_length=150, unique=True)
    profile_picture = models.URLField()
    ...
We can alter our Y.A.M.A. User model with:
3 weeks later....
No no, nothing dramatic happens!
It's just that the boolean field can be swapped with something better here.
Timestamp: a digital record of the time of occurrence of a particular event.
**googles**
The essence remains almost the same, but timestamped fields provide extra benefits down the line.
It's not always a timestamp-versus-boolean debate; sometimes you need both.
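The timestamp-over-boolean idea can be sketched in plain Python (the class below is illustrative, not the Django model): a nullable `bookmarked_at` answers both "is it bookmarked?" and "since when?", so the boolean never has to be stored separately.

```python
from datetime import datetime, timezone
from typing import Optional


class Message:
    """Illustrative sketch: one timestamp field replaces a boolean flag."""

    def __init__(self) -> None:
        self.bookmarked_at: Optional[datetime] = None

    @property
    def is_bookmarked(self) -> bool:
        # The boolean is derived from the timestamp, never stored separately.
        return self.bookmarked_at is not None


msg = Message()
assert not msg.is_bookmarked
msg.bookmarked_at = datetime.now(timezone.utc)
assert msg.is_bookmarked
```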
class Message(models.Model):
    created_at = models.DateTimeField(default=timezone.now)
    updated_at = models.DateTimeField(auto_now=True)
    ...  # other auditing fields
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    message = models.TextField()
    bookmarked_at = models.DateTimeField(null=True)
    ...
We can now change our Message model schema with:
class Message(models.Model):
    ...  # auditing fields
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    deleted_at = models.DateTimeField(null=True)
    ...
(In future)
For every query involving messages, we had to remember to add the
deleted_at IS NULL filter, and it became a pain.
For organizations where deletion was frequent, it also became a performance overhead to maintain.
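The pain can be reproduced with a minimal sqlite3 sketch (table and column names mirror the models above, but this is not the Y.A.M.A. codebase): forgetting the predicate silently includes deleted rows, and the correct filter has to be repeated on every query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT, deleted_at TEXT)"
)
conn.execute("INSERT INTO message (body, deleted_at) VALUES ('hi', NULL)")
conn.execute("INSERT INTO message (body, deleted_at) VALUES ('bye', '2021-01-01')")

# Forgetting the predicate silently includes soft-deleted rows:
naive = conn.execute("SELECT count(*) FROM message").fetchone()[0]
assert naive == 2

# Correct, but must be repeated on every single read path:
live = conn.execute(
    "SELECT count(*) FROM message WHERE deleted_at IS NULL"
).fetchone()[0]
assert live == 1
```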
**duck-duck-go**
"Soft delete" in database lingo means that you set a flag on an existing table which indicates that a record has been deleted, instead of actually deleting the record.
In the case of Y.A.M.A., we rarely needed to fetch deleted messages (except for compliance audits).
class DeletedMessage(models.Model):
    ...  # auditing fields
    body = models.TextField()
    deleted_at = models.DateTimeField()
    original_message_id = models.PositiveIntegerField()
    ...
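The hard-delete-plus-archive idea can be sketched with sqlite3 (the schema follows the models above; the `delete_message` helper is hypothetical): deleting a message moves it into `deleted_message` in a single transaction, so queries on the live table need no deleted_at predicate at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE deleted_message (
        id INTEGER PRIMARY KEY,
        body TEXT,
        deleted_at TEXT,
        original_message_id INTEGER
    );
    INSERT INTO message (body) VALUES ('hello'), ('world');
""")


def delete_message(conn: sqlite3.Connection, message_id: int) -> None:
    # One transaction: archive the row, then hard-delete it.
    with conn:
        conn.execute(
            "INSERT INTO deleted_message (body, deleted_at, original_message_id) "
            "SELECT body, datetime('now'), id FROM message WHERE id = ?",
            (message_id,),
        )
        conn.execute("DELETE FROM message WHERE id = ?", (message_id,))


delete_message(conn, 1)
assert conn.execute("SELECT count(*) FROM message").fetchone()[0] == 1
assert conn.execute("SELECT count(*) FROM deleted_message").fetchone()[0] == 1
```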
USER_ROLE_CHOICES = [
    (0, 'Normal'),
    (1, 'Admin'),
    (2, 'Archived'),
]

class User(models.Model):
    ...  # other fields
    role = models.PositiveIntegerField(choices=USER_ROLE_CHOICES, default=0)
| pk | username | role |
|----|----------|------|
| 4  | Test 4   | 0    |
| 5  | Test 5   | 2    |
select * from user where role=1
It Was Good Until It Wasn't
Django choices only validate at the ModelForm level, i.e. they do not enforce a constraint or quality check at the lowest level, the database.
Remembering all the numerical mappings for roles became a pain.
user.role = 3  # not a valid choice
user.save()    # this will still work
Keep strings instead of numbers for the value; the benefits outweigh the disadvantages.
USER_ROLE_CHOICES = [
    ('normal', 'Normal'),
    ('admin', 'Admin'),
    ('archived', 'Archived'),
]

class User(models.Model):
    ...  # other fields
    role = models.CharField(choices=USER_ROLE_CHOICES, default='normal', max_length=10)
    ...
select * from user where role='normal'
Keep choices as namespaced constants.
class UserRole(models.TextChoices):
    NORMAL = 'normal', 'Normal'
    ADMIN = 'admin', 'Admin'
    ARCHIVED = 'archived', 'Archived'

class User(models.Model):
    ...
    role = models.CharField(choices=UserRole.choices, default=UserRole.NORMAL, max_length=10)
    ...
Since Django does not enforce constraints for choices at the database level, we can force it to do so via:
class UserRole(models.TextChoices):
    NORMAL = 'normal', 'Normal'
    ADMIN = 'admin', 'Admin'
    ARCHIVED = 'archived', 'Archived'

class User(models.Model):
    ...
    role = models.CharField(choices=UserRole.choices, default=UserRole.NORMAL, max_length=10)
    ...

    class Meta:
        constraints = [
            models.CheckConstraint(
                name="%(app_label)s_%(class)s_role_valid",
                check=models.Q(role__in=UserRole.values),
            )
        ]
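What such a check constraint buys you can be demonstrated in raw SQL via sqlite3 (sketched here with string roles; the table is illustrative, not the Y.A.M.A. schema): unlike plain choices, the database itself rejects roles outside the allowed set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user (
        id INTEGER PRIMARY KEY,
        role TEXT NOT NULL DEFAULT 'normal'
            CHECK (role IN ('normal', 'admin', 'archived'))
    )
""")
conn.execute("INSERT INTO user (role) VALUES ('admin')")  # fine

rejected = False
try:
    conn.execute("INSERT INTO user (role) VALUES ('superuser')")
except sqlite3.IntegrityError:
    # The invalid role never reaches the table.
    rejected = True

assert rejected
assert conn.execute("SELECT count(*) FROM user").fetchone()[0] == 1
```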
SELECT org.name, count(msg.id) FROM message msg
JOIN user u ON u.id = msg.user_id
JOIN organization org ON org.id = u.organization_id
GROUP BY 1
Run this SQL every hour.
Things were fine in the citadel until, one Friday, the database server crashed due to this query.
Reason? The expensive join cost between user & message.
**bings**
It is OK to denormalize; in fact, denormalization might be one of the top reasons why new-age databases are able to scale up.
Before:
Option #1:
Option #2:
Kill two birds with one stone
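One way the denormalization could look, sketched with sqlite3 (the `message_count` column and the trigger are assumptions for illustration, not the talk's actual option): keep a counter on organization, maintained on insert, so the hourly report never joins user and message at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organization (id INTEGER PRIMARY KEY, name TEXT,
                               message_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE user (id INTEGER PRIMARY KEY, organization_id INTEGER);
    CREATE TABLE message (id INTEGER PRIMARY KEY, user_id INTEGER);

    -- Denormalization: bump the per-organization counter on every insert.
    CREATE TRIGGER bump_message_count AFTER INSERT ON message
    BEGIN
        UPDATE organization
        SET message_count = message_count + 1
        WHERE id = (SELECT organization_id FROM user WHERE id = NEW.user_id);
    END;

    INSERT INTO organization (name) VALUES ('acme');
    INSERT INTO user (organization_id) VALUES (1);
    INSERT INTO message (user_id) VALUES (1), (1);
""")

# The report becomes a single-table read instead of a two-join aggregate:
name, count = conn.execute(
    "SELECT name, message_count FROM organization"
).fetchone()
assert (name, count) == ('acme', 2)
```

The tradeoff is the usual one: writes get slightly slower and the counter can drift if rows are changed outside the trigger, in exchange for a cheap read path.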
Soft delete is good, but it can also be the root of problems; so evaluate.
Use Django choices wisely and maintain consistency.
Denormalization is OK; it's a tradeoff.
Don't blindly follow any advice from here (or anyone else) without evaluating your own use case.
@Tarun_Garg2
tarungarg546