Sql Alchemy Foreign Keys and Relationships

by John | January 16, 2024

 

 

 

Contents

 

- In this post we will show through examples how foreign keys work in sqlalchemy.

 

- Discuss the ondelete argument to handle event of object being deleted.

 

- Lazy loading for performance.

 

 

What is a Foreign Key?

 

A foreign key is used in a database when we want to reference some other related table row. A foreign key will point to a primary key in the table of interest. This can be quite a confusing topic when first seeing it, so in this post we will just give many examples.

 

 

Example 1 Book Store

 

The foreign keys example we are about to show as a project directory as shown below.

 

/BookStore
│
├── __init__.py
├── foreign_key_books.py
├── foreign_key_books.ipynb
├── for_keys_book.db

 

 

 

In the example below the relevant lines that declare the foreign key are shown below. Note this is similar to the user table example we went through in the previous post

 

# In Author class
 books: Mapped[list["Book"]] = relationship(
        "Book",
        back_populates="author",
        cascade="all, delete",
    )



#In Book class


 author_id: Mapped[int] = mapped_column(
        Integer,
        ForeignKey("authors.id"),
    )
    author: Mapped[Author] = relationship(
        "Author",
        back_populates="books",
    )

 

 

This is pretty easy to understand that a book has an associated author with it, the way to declare this cross-table relationship is through a foreign key. Also note that the ondelete="all, delete" means we will delete all books in the event of an author being deleted. This makes sense in the context of our example, as if a book store removes an author, it is likely they will want to remove all related books by that same author.

 

 

from sqlalchemy import create_engine, Integer, String, ForeignKey
from sqlalchemy.orm import DeclarativeBase, relationship, Mapped, mapped_column, Session
from typing_extensions import Annotated


class Base(DeclarativeBase):
    pass


int_pk = Annotated[int, mapped_column(Integer, primary_key=True)]
str_100 = Annotated[str, mapped_column(String(length=100))]


class Author(Base):
    __tablename__ = "authors"

    id: Mapped[int_pk]
    name: Mapped[str_100]
    books: Mapped[list["Book"]] = relationship(
        "Book",
        back_populates="author",
        cascade="all, delete",
    )

    def __repr__(self) -> str:
        return f"Author(name={self.name})"


class Book(Base):
    __tablename__ = "books"

    id: Mapped[int_pk]
    title: Mapped[str_100]
    author_id: Mapped[int] = mapped_column(
        Integer,
        ForeignKey("authors.id"),
    )
    author: Mapped[Author] = relationship(
        "Author",
        back_populates="books",
    )

    def __repr__(self) -> str:
        return f"Book(title={self.title}, author={self.author})"


# Example usage
if __name__ == "__main__":
    engine = create_engine("sqlite:///for_keys_books_example.db", echo=True)
    Base.metadata.drop_all(engine)
    Base.metadata.create_all(engine)

    with Session(bind=engine) as session:
        george_orwell = Author(name="George Orwell")
        book1 = Book(title="1984", author=george_orwell)
        book2 = Book(title="Animal Farm", author=george_orwell)

        session.add(george_orwell)
        session.add(book1)
        session.add(book2)

        thomas_sowell = Author(name="Stephen King")
        book1 = Book(title="The Shining", author=thomas_sowell)

        book2 = Book(title="The Stand", author=thomas_sowell)

        session.add(thomas_sowell)
        session.add(book1)
        session.add(book2)

        session.commit()



'''

CREATE TABLE authors (
        id INTEGER NOT NULL, 
        name VARCHAR(100) NOT NULL, 
        PRIMARY KEY (id)
)



CREATE TABLE books (
        id INTEGER NOT NULL, 
        title VARCHAR(100) NOT NULL, 
        author_id INTEGER NOT NULL, 
        PRIMARY KEY (id), 
        FOREIGN KEY(author_id) REFERENCES authors (id) ON DELETE CASCADE
)


'''

 

In summary, we have created two related tables, one for authors and another for books created by that author.

 

 

 

Lazy Loading

 

Let's see if we can retrieve the authors in foreign_keys_books.ipynb , take note of the sql queries being echoed in the notebook. The order in which they are executed is of interest , as you will see the retrieval of books from the database doesn't occur until we explicitly call them. It is worth noting that this is the default behaviour in sqlalchemy.

 

 

from sqlalchemy.orm import Session
from sqlalchemy import create_engine 
from foreign_key_books import Author, Book


engine = create_engine("sqlite:///for_keys_books_example.db", echo=True)


with Session(bind=engine) as session:
    
    orwell = session.query(Author).filter(Author.name=="George Orwell").first()
    
    
    print(orwell.name)
    
    
    print('##'*30)
    
    for book in orwell.books:
        print(book)
    

'''
INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-01-16 20:35:19,927 INFO sqlalchemy.engine.Engine SELECT authors.id AS authors_id, authors.name AS authors_name 
FROM authors 
WHERE authors.name = ?
 LIMIT ? OFFSET ?
2024-01-16 20:35:19,927 INFO sqlalchemy.engine.Engine [generated in 0.00040s] ('George Orwell', 1, 0)
George Orwell
############################################################
2024-01-16 20:35:19,930 INFO sqlalchemy.engine.Engine SELECT books.id AS books_id, books.title AS books_title, books.author_id AS books_author_id 
FROM books 
WHERE ? = books.author_id
2024-01-16 20:35:19,930 INFO sqlalchemy.engine.Engine [generated in 0.00064s] (1,)
Book(title=1984, author=Author(name=George Orwell))
Book(title=Animal Farm, author=Author(name=George Orwell))



'''

 

This behavior of only loading the books when they are called in the program is known as lazy loading. Why might you want to use lazy loading? Well, let's take a sub-example, say you have an online shop that sells books by many authors, on one page you want to list all the authors,  you don't really want to be calling the database more than you need to for user experience reasons, so it makes no sense to retrieve all the books of the authors on the authors page.

 

What about if we want to prevent lazy loading? Well, this is quite easy also, we can simply follow the steps shown below.

 

 

from sqlalchemy.orm import subqueryload

# Usage
with Session(bind=engine) as session:
    authors = session.query(Author).options(subqueryload(Author.books)).all()
    
    print(orwell.name)
    
    
    print('##'*30)
    for author in authors:
        books = author.books  


'''

2024-01-17 01:25:36,052 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-01-17 01:25:36,054 INFO sqlalchemy.engine.Engine SELECT authors.id AS authors_id, authors.name AS authors_name 
FROM authors
2024-01-17 01:25:36,055 INFO sqlalchemy.engine.Engine [generated in 0.00066s] ()
2024-01-17 01:25:36,059 INFO sqlalchemy.engine.Engine SELECT books.id AS books_id, books.title AS books_title, books.author_id AS books_author_id, anon_1.authors_id AS anon_1_authors_id 
FROM (SELECT authors.id AS authors_id 
FROM authors) AS anon_1 JOIN books ON anon_1.authors_id = books.author_id
2024-01-17 01:25:36,060 INFO sqlalchemy.engine.Engine [generated in 0.00086s] ()
George Orwell
############################################################
2024-01-17 01:25:36,061 INFO sqlalchemy.engine.Engine ROLLBACK

'''

 

Notice now the retrieval of the books from the database happens before they are accessed by the program. This is known as eager loading, we could also configure this directly in our model declaration. But the default behavior is to lazily load.

 

 

Ondelete

 

Let's see what happens if we delete an author, as mentioned previously we intend to delete all books by an author if we delete the actual author.

 

 

with Session(bind=engine) as session:
    orwell = session.query(Author).filter(Author.name == "George Orwell").first()
    
    if orwell:
        session.delete(orwell)
        session.commit()

 

 

As expected on delete of the author the books associated with that author through the relationship we defined in the model also get deleted.

 

with Session(bind=engine) as session:
    all_books = session.query(Book).all()
    
    for book in all_books: 
        print(book)


'''


Book(title=The Shining, author=Author(name=Stephen King))
Book(title=The Stand, author=Author(name=Stephen King))

'''

 

 

 

Example 2 Google Analytics Replica

 

The following example has a project directory as shown below.

 

/GoogleAnalytics
│
├── __init__.py
├── foreign_key_analytics.py
├── foreign_key_analytics.ipynb
├── for_keys_analytics.db

 

 

I personally use Google analytics for this website, this is a service that allows me to view page views, user information and events on a nice dashboard. Once starting to learn about SQL I took more notice of the fact that this is almost surely saving  data to a database in some form. So for those who have never seen the dashboard before, it looks something similar to the image below.

 

 

 

 

Of course I can not exactly see how Google has designed their table schema, but we can make a decent guess. Take note of the following

 

- A user is defined as someone who visits the site.

 

- A session is defined as a vistit to the site from a user.

 

- An event is defined as something that happens during a session e.g. page view, ad click scroll or something like that.

 

Before we move in to the code, it may be worth thinking of these 'relationships' above, I find it hard to go straight in to code and try to define relationships, but if we write it out, then it becomes pretty clear. So what foreign keys do we assume Google keeps in their database?

 

User- Session relationship

Session - Event relationship

 

No doubt it is actually more complicated than this behind the scenes, but this is enough for us to start with.

 

 

 

from sqlalchemy import create_engine, Integer, String, ForeignKey, TIMESTAMP
from sqlalchemy.orm import DeclarativeBase, relationship, Mapped, mapped_column, Session
from sqlalchemy import func
from typing_extensions import Annotated
from datetime import datetime


# Base class for all models, provided by SQLAlchemy's declarative system.
class Base(DeclarativeBase):
    pass


# Helper for defining an integer primary key column.
int_primary_key = Annotated[int, mapped_column(Integer, primary_key=True)]
# Helper for defining a string column with a maximum length of 100 characters.
str_100 = Annotated[str, mapped_column(String(length=100))]


# Mixin class to add a 'start' timestamp to any table.
class Timestamp:
    # The 'start' column is automatically populated with the current timestamp.
    start: Mapped[datetime] = mapped_column(TIMESTAMP, default=func.now)


# User model representing a user of the website.
class User(Base, Timestamp):
    __tablename__ = "users"  # Name of the table in the database.

    id: Mapped[int_primary_key]  # Primary key for the user table.
    country: Mapped[str_100]  # String column to store the user's country.

    # Establishes a one-to-many relationship with the 'Session' model.
    sessions: Mapped[list["Session"]] = relationship("Session", back_populates="user")


# Session model representing a user's session on the website.
class Session(Base, Timestamp):
    __tablename__ = "sessions"  # Name of the table in the database.

    id: Mapped[int_primary_key]  # Primary key for the session table.
    end: Mapped[datetime] = mapped_column(
        TIMESTAMP
    )  # Timestamp for when the session ended.
    # Foreign key linking to the 'User' table with 'SET NULL' ondelete behavior.
    user_id: Mapped[int] = mapped_column(
        Integer, ForeignKey("users.id", ondelete="SET NULL")
    )

    # Establishes a many-to-one relationship with the 'User' model.
    user: Mapped[User] = relationship("User", back_populates="sessions")

    events: Mapped[list["Event"]] = relationship("Event", back_populates="session")


class Event(Base, Timestamp):
    __tablename__ = "events"
    id: Mapped[int_primary_key]
    # This could be for example a page view, or a scroll
    event_type: Mapped[str_100]
    # Define relationship between event and session
    session_id: Mapped[int] = mapped_column(Integer, ForeignKey("sessions.id"))

    # backpopulate the events column in the sessions table
    session: Mapped[Session] = relationship("Session", back_populates="events")


if __name__ == "__main__":
    engine = create_engine("sqlite:///for_keys_analytics.db", echo=True)

    Base.metadata.create_all(engine)







'''


CREATE TABLE users (
        id INTEGER NOT NULL, 
        country VARCHAR(100) NOT NULL, 
        start TIMESTAMP NOT NULL, 
        PRIMARY KEY (id)
)





CREATE TABLE sessions (
        id INTEGER NOT NULL, 
        "end" TIMESTAMP NOT NULL, 
        user_id INTEGER NOT NULL, 
        start TIMESTAMP NOT NULL, 
        PRIMARY KEY (id), 
        FOREIGN KEY(user_id) REFERENCES users (id) ON DELETE SET NULL
)




CREATE TABLE events (
        id INTEGER NOT NULL, 
        event_type VARCHAR(100) NOT NULL, 
        session_id INTEGER NOT NULL, 
        start TIMESTAMP NOT NULL, 
        PRIMARY KEY (id), 
        FOREIGN KEY(session_id) REFERENCES sessions (id)
)


'''

 

 

 

 

 

 

 

 

Let's simulate a user session

 

 

#foreign_key_analytics.ipynb
from foreign_key_analytics import User, Session, Event
from sqlalchemy.orm import Session as db_Session
from sqlalchemy import create_engine
from datetime import datetime
import time

engine = create_engine("sqlite:///for_keys_analytics.db", echo=True)


with db_Session(bind=engine) as session:
    # Create a new user
    new_user = User(country='Canada', start=datetime.utcnow())
    session.add(new_user)
    session.flush()  # Flush to assign an ID to new_user

    # Create a new session with a temporary end time
    temp_end_time = datetime.utcnow()
    new_session = Session(user=new_user, start=datetime.utcnow(), end=temp_end_time)
    session.add(new_session)
    session.flush()  # Flush to assign an ID to new_session

    events = ["page view", "scroll", "ad click", "page view"]

    for event in events:
        new_event = Event(session=new_session, event_type=event)  # Directly assign session object
        session.add(new_event)
        time.sleep(5)

    # Update the end time after the simulated period
    new_session.end = datetime.utcnow()

    session.commit()
    

 

 

Now we have access to the tables when we query a user

 

 

with db_Session(bind=engine) as session: 
    user = session.query(User).first() 
    
    print(user)
    
    for s in users.sessions:
        print(s)
        
        for e in s.events:
            print(e)



'''
User(id=1, country=Canada)



Session(start=2024-01-21 07:52:07.871233, end=2024-01-21 07:52:27.879055, user_id=1)


Event(event_type=ad click)
Event(event_type=page view)
Event(event_type=page view)
Event(event_type=scroll)

'''

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Join the discussion

Share this post with your friends!