docs
  • Overview
  • 🐍 PYTHON
    • Type Hints
    • PEP8 Style Guide for Python Code
    • 🏡Pipenv
    • Pathlib
  • 🕸Django
    • 🗄models
      • 🎯Best Practices
      • 🚦Django Signals
    • ⚙️ settings
    • DRF
      • Serializer
      • Authentication
      • Permissions
      • Viewsets
    • Testing
      • Faker and Factory Boy
    • 🧪Test Coverage
    • 💦Python-Decouple
    • Django Tips:
    • 💾Django ORM Queryset
    • Custom Exceptions
    • Celery
    • Resources
  • Deploy
    • 🚀Django Deployment
    • 🔒Setup SSL Certificate
  • 💾Database
    • MongoDB
  • 🛠️DevOps
    • 🖥Scripting
      • A First Script
      • Loops
      • Test
      • Variables
      • External programs
      • Functions
    • Command Line Shortcuts
    • Basic Linux Commands
    • 🎛Microservices
    • 🐳Docker
      • Docker Commands
      • Docker Compose
      • Django project
    • Kubernates
  • 📝Software IDE
    • EditorConfig
    • Linters
    • VsCode
Powered by GitBook
On this page
  • Working With ORM
  • Basic Filters
  • Annotations
  • Group Querysets with Django ORM
  • ExpressionWrapper
  • Aggregation with Filter
  • Queryset results as namedtuples
  • Statement Timeout

Was this helpful?

  1. 🕸Django

Django ORM Queryset

Django ORM querysets

Working With ORM

Django ORM is a powerful way to query the database without writing SQL. QuerySets doesn’t actually involve any activity with the database. Querysets are lazy, Django will only read data from the database if the QuerySet is evaluated.

Basic Filters

# Filters
Race.objects.filter(year=2021)

#  Filter on related fields (foreign keys)
Race.objects.filter(year=2021, circuit__country="Spain")

# Select fields using .values() or .values_list()
# .values() will return a QuerySet of dictionaries and 
# .values_list() will return a QuerySet of tuples.

# .values()
Race.objects.filter(year=2021).values("circuit__country", "date")
<QuerySet [{'circuit__country': 'Italy', 'date': datetime.date(2021, 4, 18)}, {'circuit__country': 'Bahrain', 'date': datetime.date(2021, 3, 28)}]>

# .values_list()
Race.objects.filter(year=2021).values_list("circuit__country", "date")
<QuerySet [('Italy', datetime.date(2021, 4, 18)), ('Bahrain', datetime.date(2021, 3, 28))]>

# Convert QuerySets into a List
races = Race.objects.filter(year=2021).values_list("name", flat=True)
<QuerySet ['Emilia Romagna Grand Prix', 'Bahrain Grand Prix']>


# Sort Querysets: using .order_by() to sort the results.

Driver.objects.all().order_by("-date_of_birth").values_list("first_name", "last_name", "date_of_birth")

# Limiting (slicing) results
Driver.objects.all().order_by("date_of_birth").values_list("first_name", "last_name", "date_of_birth")[:5]

# Count results
Race.objects.filter(year=2021).count()

# Check if results exist
Race.objects.filter(year=2022).exists()

# Greater than / less than
# We can use __gt, __gte, __lt & __lte to perform ‘greater than’ and ‘less than’ queries.

drivers = (
    Driver.objects.filter(date_of_birth__gte="1998-01-01")
    .order_by("date_of_birth")
    .values_list("first_name", "last_name", "date_of_birth")
)

# To add OR conditions to your QuerySets
# We can use Q objects to create OR conditions in your QuerySets.

from django.db.models import Q

i = Race.objects.filter(
    Q(circuit__country="France") | Q(circuit__country="Spain"), year=2021
)

# Filter with lists
countries = ["France", "Spain", "Portugal", "Belgium", "Netherlands"]

races = Race.objects.filter(year=2021, circuit__country__in=countries)

# Get distinct values
# The .distinct() method can be used to make sure no value of the supplied field appears in the QuerySet more than once.

countries = (
    Race.objects.filter(year=2020)
    .distinct("circuit__country")
    .order_by("circuit__country")
    .values_list("circuit__country", flat=True)
)
<QuerySet ['Austria', 'Bahrain', 'Belgium', 'Germany']>

# Exclude data
# The .exclude() method can be used to remove results from a QuerySet.

nationalities = (
    Driver.objects.all()
    .exclude(nationality__in=["British", "American"])
    .distinct("nationality")
    .order_by("nationality")
    .values_list("nationality", flat=True)
)

# Earliest and Latest
# We can use .earliest and .latest on date fields. Instead of returning a QuerySet, they return a single result.

oldest = Driver.objects.earliest("date_of_birth")
youngest = Driver.objects.latest("date_of_birth")

# First and Last
# .first() and .last() can be used to return the first and last results of a QuerySet.

first = Race.objects.all().order_by("date").first()
last = Race.objects.all().order_by("date").last()

Annotations

Annotations are useful for generating statistics from your data.

What do annotations do?

  • Add additional fields to each result of the queryset, calculated from other fields

  • Group results, similar to SQLs GROUP BY expression.

Applying .annotate() on an unevaluated queryset will apply the annotation to all results. If you evaluate the queryset using .values() or values_list() first and apply .annotate(), then this will have a grouping effect.

Annotations will apply the expression to all results in the queryset, whereas aggregates will condense the data into one result.

# Find the most common reasons for drivers to drop out of rage

from django.db.models import Count, Q

from .models import Status

# Top 10 reasons 
top_10_reasons = (
    Status.objects.annotate(occurences=Count("results"))
    .order_by("-occurrences")
    .values_list("status", "occurrences")[:10]
)

# Top 10 reasons that don't finish the race

top_10_dnf_outcomes = (
    Status.objects.exclude(
        Q(status="Finished") | Q(status__icontains="lap")
    ).annotate(occurences=Count("results"))
    .order_by("-occurrences")
    .values_list("status", "occurrences")[:10]
)

# Concat
Driver.objects.annotate(
    full_name=Concat("first_name", Value(" "), "last_name")
).order_by("full_name")


# Adding conditions (Case and When example)
drivers = (
    Driver.objects.annotate(
        full_name=Case(
            When(
                nationality="Chinese",
                then=Concat("last_name", Value(" "), "first_name"),
            ),
            default=Concat("first_name", Value(" "), "last_name"),
        )
    ).order_by("-full_name")
    .values_list("full_name", "nationality")[4:10]
)

Group Querysets with Django ORM

With .annotate() it is possible to group data, end up with fewer results than the original queryset. However, we will have to evaluate it using .values() first to get the grouping effect.

# which driver in the 2021 season had the most DNFs (they didn’t finish a race)

driver_dnfs = (
    Result.objects.filter(
        race__year=2021,
        finishing_position__isnull=True)
        .values("driver",)
        .annotate(dnfs=Count("driver"))
        .order_by("-dnfs")
        .values_list("driver__first_name", "driver__last_name", "dnfs")
)

# Which driver scored the most points in 2021?

from django.db.models import Sum

driver_championship_2021 = (
    Result.objects.filter(race__year=2021)
    .values("driver")
    .annotate(total_points=Sum("points_scored"))
    .order_by("-total_points")
    .values_list("driver__first_name", "driver__last_name", "total_points")
)
# The Sum expression adds all values in the points_scored for all 2021 race results for each driver.


# get the driver’s full name with annotation

driver_championship_2021 = (
    Result.objects.filter(race__year=2021)
    .values("driver")
    .annotate(
        total_points=Sum("points_scored"),
        full_name=Concat ("driver__first_name", Value(" "), "driver__last_name"),
    )
    .order_by("-total_points")
    .values_list("full_name", "total_points")
)

# How many point-scoring races did each 2021 driver have?

from django.db.models import Count, Q, Sum, Value
from django.db.models.functions import Concat


points_scoring_races = (
    Result.objects.filter(race__year=2021)
    .values("driver")
    .annotate(
        full_name=Concat("driver__first_name", Value(" "), "driver__last_name"),
        points_scoring_races=Count("driver", filter=Q(points_scored__gt=0)),
        total_points=Sum("points_scored"),
    )
    .order_by("-points_scoring_races", "-total_points")
    .values("full_name", "points_scoring_races")
)

# we can see that Lewis Hamilton, Carlos Sainz, and Lando Norris each had 20 races where they scored points.

ExpressionWrapper

# to calculate a person’s age from their date of birth.

import datetime
from django.db.models import ExpressionWrapper, DateField, IntergerField, F
from django.db.models.functions import ExtractDay

driver_ages = (
    Driver.objects.annotate(
        full_name=full_name_expression,
        age=ExpressionWrapper(
            ExtractDay(
                ExpressionWrapper(
                    Value(datetime.datetime.now()) - F("date_of_birth"),
                    output_field=IntergerField(),
                )
            )
            / 365.25,
            output_field=IntergerField(),
        ),
    )
    .order_by("-date_of_birth")
    .values_list("full_name", "age")[:10]
)

# In this ExpressionWrapper, we find the difference between today’s date and the driver’s date of birth
ExpressionWrapper(
    Value(datetime.datetime.now()) - F("date_of_birth"),
    output_field=DateField(),
)

# We can extract the number of days out of the delta using ExtractDay
ExtractDay(
    ExpressionWrapper(
        Value(datetime.datetime.now()) - F("date_of_birth"),
        output_field=DateField(),
    )
)

# Finally, we can divide the days by 365.25 to get the years passed:
age=ExpressionWrapper(
    ExtractDay(
        ExpressionWrapper(
            Value(datetime.datetime.now()) - F("date_of_birth"),
            output_field=DateField(),
        )
    )
    / 365.25,
    output_field=IntegerField(),
)

Aggregation with Filter

Prior to Django 2.0 if we wanted to get something like the total number of users and the total number of active users we had to resort to conditional expressions:

from django.contrib.auth.models import User
from django.db.models. import (
    Count,
    Sum,
    Case,
    When,
    Value,
    IntegerField,
)

User.objects.aggregate(
    total_users=Count('id'),
    total_active_users=Sum(Case(
        When(is_avtive=True, then=Value(1)),
        default=Value(0),
        output_field=IntegerField(),
    ))
)

In Django 2.0 a filter argument_to_aggregate_functions was added to make it a lot easier:

from django.contrib.auth.models import User
from django.db.models import Count, F

User.objects.aggregate(
    total_users=Count('id'),
    total_active_users=Count('id', filter=F('is_active')),
)

Queryset results as namedtuples

User.objects.values_list(
    'first_name',
    'last_name',
    named=True,
)

user[0].fist_name

Statement Timeout

We are all humans and we make mistakes, We can't possibly handle each and every edge case so we mush set boundaries.

Unlike other not-blocking app servers such as Torando, asyncio or even Node, Django usually uses synchronous worker processes. This means that When a user executes a long running operations, the worker process is blocked and no onse else can use it untill it is done.

None one should run worker process but we will still want to make sure a single query is not hogging too much resources for too long.

In most Django apps the majority of time is spent waiting for datbase queries. So, setting a timeout on SQL queries is a good place to start.

We like setting a global timeout in our wsgi.py file like this:

# wsgi.py

from django.db.backends.signals import connection_created
from django.dispatch import receiver 

@reveivers(connection_created)
def setup_postgres(connection, **kwargs):
    if connection.vendor != 'postgresql':
        return
    # Timeout statements after 30 seconds.
    with connection.cursor() as cursor:
        cursor.execute("""
            SET statement_timeout TO 30000;
            """)

Side Note: The other common place we spent a lot of time at is networking. So make sure when you call a remote service to always set a timeout.

PreviousDjango Tips:NextCustom Exceptions

Last updated 2 years ago

Was this helpful?

💾