SQL Alchemy Hello World

What is Sqlalchemy?

Sqlalchemy is a python library for managing SQL databases, it provides a very easy to use interface for managing object-relationship-mapping (ORM) databases. Using SQL alchemy we don't have to write raw SQL queries, this is due to the fact that Sqlalchemy provides an abstraction over the queries and means the code will essentially all look like normal Python code.

Benefits of using Sqlalchemy

- It works more or less the same for all databases, apart from maybe postgres , but even in the case of postgres there is a special dialectics module that handles this sort of thing.

- Ability to map Python classes to database tables in an object-orientated way.

Installation

To follow along with this post you will need to pip install the libraries below. The faker lib will be used to generate toy data.

pip install sqlalchemy

pip install faker

Also if you wish to follow along, take note of the following file structure that I am using as I write this post.

/Intro
│
├── intro.py
└── intro.ipynb

The intro.py file will be used to denote declaring the database and writing reusable parts of the code and the notebook is used for interacting and testing with it, this is because notebooks make it much easier to run things line by line and play with the output.

Create First Database

In intro.py let's write some of the code required to set up our first database with sqlalchemy. Understanding of the __repr__ method is not required, it basically is a way to print out some meaningful representation of the object, see here if interested to learn more.

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base

# Create an engine this will create a new sqlite database in current workingdir
engine = create_engine("sqlite:///hello_world.db", echo=True)

# Define a base class using declarative_base
Base = declarative_base()

# Define the User model
class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    email = Column(String)
    age = Column(Integer, nullable=True)

    def __repr__(self):
        return f"User(id={self.id}, name={self.name}, fullname={self.fullname}, email={self.email}, age={self.age})"




metadata = Base.metadata

if __name__ == "__main__":
    # Create the table in the database
    metadata.create_all(engine)

Ok before we get in to what some of the methods and objects are doing below, first view the table below, which is intended to visualize what exactly the output of the script above it. Note you should see an hello_world.db file in your folder. You won't be able to open it though and view it as you might a normal file, but internally in that file there will be a structure that looks like the table below i.e. empty!

Database Table Structure

id (Integer, Primary Key)	name (String)	fullname (String)	email (String)	age (Integer, Nullable)

What is a Declarative Base Class?

It may be a little confusing as to how sqlalchemy actually creates the table above. It is probably best to understand what this does on a simple example, as if you see it in a large code block with many tables and relationships, it will seem like magic how sqlalchemy knows what tables to create.

Think of the Base class we defined in the code snippet above, which the User class inherits from, as like the black entity in the center of the image below, anything that inherits from this entity is connected in some way. The base class then registers all associated entities (or more accurately 'models') and then when it comes time to create the models (read tables) this registry automatically adds everything , by taking the registry and creating all the models.

metadata = Base.metadata

# Create the table in the database
metadata.create_all(engine)

Also you will see on your terminal an output that looks like the commands below.

2024-01-14 21:44:41,658 INFO sqlalchemy.engine.Engine 
CREATE TABLE users (
        id INTEGER NOT NULL, 
        name VARCHAR, 
        fullname VARCHAR, 
        email VARCHAR, 
        age INTEGER, 
        PRIMARY KEY (id)
)

This is a demonstration of the 'abstraction' we mentioned at the beginning of the article. Sqlalchemy is able to create SQL commands based on the objects we provide i.e. it applys a mapping from our models to SQL. The command above is for creating a table and is visible due to the fact that we use the echo=True in the create_engine method. This is a very nice way to learn about what is actually going on under the hood!

Open Jupyter Notebook to Play

Now that we have created our hello world database, we can open it, I am using Jupyter because this is a nice way to run things line by line and see the output interactively, but you can use anything you like, the code should still run.

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Create an engine for a SQLite database
engine = create_engine("sqlite:///hello_world.db", echo=True)

# Create a sessionmaker
session_pool = sessionmaker(bind=engine)

Now we can add a User , to do this we should import the User model from the intro file we just created.

from intro import User

Create Row in Database

Let's create an instance of User to see how it works.

user1 = User(id=1, 
             name="John",
             fullname="John Codearmo", 
             email="admin@codearmo.com",
             age=32)

print(user1)

'''
User(id=1, name=John, fullname=John Codearmo, email=admin@codearmo.com, age=32)

'''

Now that we have created the first user for our database, we can add to the database, we will be using a context manager to establish and close the connection to the database. The insert method, does exactly what you might expect in inserting the new object in to the database.

# Using the session to insert data
with session_pool() as conn:
    conn.add(user1)  # Add the new user to the session
    conn.commit()    # Commit the transaction


'''

2024-01-14 21:45:07,800 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-01-14 21:45:07,802 INFO sqlalchemy.engine.Engine INSERT INTO users (id, name, fullname, email, age) VALUES (?, ?, ?, ?, ?)
2024-01-14 21:45:07,803 INFO sqlalchemy.engine.Engine [generated in 0.00059s] (1, 'John', 'John Codearmo', 'admin@codearmo.com', 32)
2024-01-14 21:45:07,804 INFO sqlalchemy.engine.Engine COMMIT
'''

Again since we have echo=True, we see the raw SQL queries that are executed.

Let's add some more users and tie things back in with the way we created the initial Users model.

# Using the session to insert data
with session_pool() as conn:
    user2 = User(id=2, 
             name="Bill",
             fullname="Bill Codearmo", 
             email="bill@codearmo.com",
             age=32)
    
    conn.add(user2)
    conn.commit()
    
    user3 = User(id=3, 
             name="Sally",
             fullname="Sally Codearmo", 
             email="sally@codearmo.com",
             )
    # note we dont have sally age, cause impolite to 
    # ask lady her age
    conn.add(user3)
    
    
    user4 = User(
             name="Peter",
             fullname="Peter Codearmo", 
             email="sally@codearmo.com",
             age = 40
             )
    # here we don't add id we just autoincrement
    
    
    conn.add(user4)
    
    conn.commit()



'''


2024-01-14 22:03:33,115 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-01-14 22:03:33,116 INFO sqlalchemy.engine.Engine INSERT INTO users (id, name, fullname, email, age) VALUES (?, ?, ?, ?, ?)
2024-01-14 22:03:33,117 INFO sqlalchemy.engine.Engine [cached since 76.36s ago] (2, 'Bill', 'Bill Codearmo', 'bill@codearmo.com', 32)
2024-01-14 22:03:33,118 INFO sqlalchemy.engine.Engine COMMIT
2024-01-14 22:03:33,127 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-01-14 22:03:33,128 INFO sqlalchemy.engine.Engine INSERT INTO users (id, name, fullname, email, age) VALUES (?, ?, ?, ?, ?)
2024-01-14 22:03:33,129 INFO sqlalchemy.engine.Engine [cached since 76.37s ago] (3, 'Sally', 'Sally Codearmo', 'sally@codearmo.com', None)
2024-01-14 22:03:33,129 INFO sqlalchemy.engine.Engine INSERT INTO users (name, fullname, email, age) VALUES (?, ?, ?, ?)
2024-01-14 22:03:33,130 INFO sqlalchemy.engine.Engine [generated in 0.00043s] ('Peter', 'Peter Codearmo', 'sally@codearmo.com', 40)
2024-01-14 22:03:33,130 INFO sqlalchemy.engine.Engine COMMIT

'''

Some notes on above

Notice that we don't have to input the id, this is because it is the primary key and by default it will auto-increment i.e. add one to last observed value. Recall the line below where we defined the primary key of the table

#This line
id = Column(Integer, primary_key=True)

We didn't have to input Sally's age, this is because when we created the table/model , we set nullable to True, this essentially means that in the event of not passing an age argument to the model, Sqlalchemy will simply set it to None (null).

age = Column(Integer, nullable=True) # optional parameter

Read Rows from DataBase

Let's see if we can retrieve all the users we just put in the database.

with session_pool() as conn:
    # Query to get all users
    all_users = conn.query(User).all()

    # Iterate through the list of users and print them
    for user in all_users:
        print(user)



'''
User(id=1, name=John, fullname=John Codearmo, email=admin@codearmo.com, age=32)
User(id=2, name=Bill, fullname=Bill Codearmo, email=bill@codearmo.com, age=32)
User(id=3, name=Sally, fullname=Sally Codearmo, email=sally@codearmo.com, age=None)
User(id=4, name=Peter, fullname=Peter Codearmo, email=sally@codearmo.com, age=40)

'''

And as expected the rows are now populated in the database.

Update a Row

Notice that Peter's email from above is the same as Sally's this isn't correct and we want to update Peter's email. Let's see how we can do that


peter_fullname = "Peter Codearmo"
peters_correct_email = "peter@codearmo.com"

# make connection 
with session_pool() as conn:
    # Step 1: get peter object from database
    user_to_update = conn.query(User).filter(User.fullname == peter_fullname).first()

    
    # Step 2: Update the desired field
    user_to_update.email = peters_correct_email

    # Step 3: Commit the changes
    conn.commit() 
    
    
    
    ## lets see if Peter's email is correct now 
    
    peter = conn.query(User).filter(User.fullname == peter_fullname).first()
    print('##################################################')
    print(peter)
    print('##################################################')
    
   


'''
##################################################
User(id=4, name=Peter, fullname=Peter Codearmo, email=peter@codearmo.com, age=40)
##################################################

'''

Delete a row

Peter being very upset regarding the wrong email fiasco has decided to leave us and wants his data deleted from the database. Let's see how we can do that.

peters_correct_email = "peter@codearmo.com"

# make connection 
with session_pool() as conn:
    # Step 1: get peter object from database by his email
    peter = conn.query(User).filter(User.email == peters_correct_email).first()

    
    # Step 2: Delete Peter
    conn.delete(peter)

    # Step 3: Commit the changes
    conn.commit() 
    
    
    
    
    ## see if he is gone 
    
    all_users = session.query(User).all()

    print('##################################################')
    for user in all_users:
        print(user)
    print('##################################################')





'''
##################################################
User(id=1, name=John, fullname=John Codearmo, email=admin@codearmo.com, age=32)
User(id=2, name=Bill, fullname=Bill Codearmo, email=bill@codearmo.com, age=32)
User(id=3, name=Sally, fullname=Sally Codearmo, email=sally@codearmo.com, age=None)
##################################################
'''

CRUD

CRUD stands for create, read, update and delete which we gave simple examples of in this toy example, these are the 4 fundamental operations that are performed on a database.

Break Things Errors

Let's intentionally try to break some of the model we have created, this is useful because later we can recognise these errors in more complex structures. It should be noted here that Sqlite is much more lax in regards to typing and missing data than other databases. This is useful for our purposes of playing around with things, but also has drawbacks which we will discuss after a few examples.

with session_pool() as conn:
    duplicate_id_user = User(id=2, 
             name="Simon",
             fullname="Simon Codearmo", 
             email="Simon@codearmo.com",
             age=32)
    
    conn.add(duplicate_id_user)
    conn.commit()


'''

IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: users.id
[SQL: INSERT INTO users (id, name, fullname, email, age) VALUES (?, ?, ?, ?, ?)]
[parameters: (2, 'Simon', 'Simon Codearmo', 'Simon@codearmo.com', 32)]
'''

An IntegrityError is raised by databases when we try to duplicate rows that either can't be duplicated e.g. the primary key, or rows we have explictly told it not to duplicate. I think we have already established that it is better to let the database handle the primary key, but this is a useful example as to what the IntegrityError actually means.

Let's take a look at the next error, although this won't raise an exception.

with session_pool() as conn:
    new_user = User(
             name="Simon",
             age=10)
    
    conn.add(new_user)
    conn.commit()

It looks like Simon is back and his is playing silly buggers with our database. It appears he has joined again, but hasn't supplied all the necessary data, but sqlite is allowing him to input data without the proper requirements. See below!

all_users = conn.query(User).all()

print('##################################################')
for user in all_users:
    print(user)
print('##################################################')

'''
##################################################
User(id=1, name=John, fullname=John Codearmo, email=admin@codearmo.com, age=32)
User(id=2, name=Bill, fullname=Bill Codearmo, email=bill@codearmo.com, age=32)
User(id=3, name=Sally, fullname=Sally Codearmo, email=sally@codearmo.com, age=None)
User(id=4, name=Simon, fullname=None, age=10)
##################################################

'''

Delete Table and Redesign

So there are certain problems with the way we instantiated this database. Below we are going to delete all the data and start again, but take more care about the way we design the table. Note that this will be in the intro.py file rather than the notebook.

As I intended this post to be for absolute beginners, don't worry too much if you don't undertstand anything that has @property directly above it, this is simply for illustrative purposes for ensuring that we get the right data type, but if interested read my post on properties

The objectives of this redesign are as follows:

Make sure that all variables are provided (except age) by setting the nullable=False

Make sure that the email is unique i.e. no two users can have the same email.

For illustrative purposes we want to make sure the age is either None or an integer , else raise an exception.

#intro.py
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base

# Create an engine this will create a new sqlite database in current workingdir
engine = create_engine("sqlite:///hello_world.db", echo=True)

# Define a base class using declarative_base
Base = declarative_base()


# THIS IS TO DELETE ALL THE SCHEMA AND START AGAIN


class DataError(Exception):
    """
    This is for making the sqlite database replicate
    behaviour of other databases with typing
    only for illustrative purposes, not too sure
    if this is something that is advisable
    """

    pass


# Define the User model
class User(Base):
    __tablename__ = "users"

    # there are no changes we need to make here , this is as we expect it to be
    id = Column(Integer, primary_key=True)

    # make this a required field, if not passed then error
    name = Column(String, nullable=False)

    # make this a required field, if not passed then error
    fullname = Column(String, nullable=False)

    # this is a required field AND it must not be used by other user
    email = Column(String, nullable=False, unique=True)

    # here we try enforce some typing, replicate behaviour of more production type dbs
    # if we are given an age it must be an integer.
    _age = Column("age", Integer, nullable=True)

    def __repr__(self):
        return f"User(id={self.id}, name={self.name}, fullname={self.fullname}, email={self.email}, age={self.age})"

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, value):
        if value is not None and not isinstance(value, int):
            raise DataError(f"DataError occurred: input syntax for integer: {value}")
        self._age = value


if __name__ == "__main__":
    # Drop all the old data
    Base.metadata.drop_all(engine)

    # make it again
    Base.metadata.create_all(engine)



'''
CREATE TABLE users (
        id INTEGER NOT NULL, 
        name VARCHAR NOT NULL, 
        fullname VARCHAR NOT NULL, 
        email VARCHAR NOT NULL, 
        age INTEGER, 
        PRIMARY KEY (id), 
        UNIQUE (email)
)


'''

Back to Jupyter Notebook to play around

Now let's test out our new and improved database.

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# make connection again to database
engine = create_engine("sqlite:///hello_world.db", echo=False)

# Create a sessionmaker
session_pool = sessionmaker(bind=engine)

from intro import User


with session_pool() as conn:
    
    
    ## add a user
    user1 = User(
             name="John",
             fullname="John Codearmo", 
             email="admin@codearmo.com",
             age=32)
    
    conn.add(user1)
    conn.commit()
    
    
    # get all users (expect only 1)
    all_users = conn.query(User).all()

    # Iterate through the list of users and print them
    for user in all_users:
        print(user)


'''
User(id=1, name=John, fullname=John Codearmo, email=admin@codearmo.com, age=32)

'''

Looks like we can still add users. Let's see if some of the errors get raised that we expect

Let's see if someone can join using an email that is already present in the database.

with session_pool() as conn:
    user = User(
             name="Peter",
             fullname="Peter Codearmo", 
             email="admin@codearmo.com",
             age=40)
    
    conn.add(user)
    conn.commit()



'''
IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: users.email
[SQL: INSERT INTO users (name, fullname, email, age) VALUES (?, ?, ?, ?)]
[parameters: ('Peter', 'Peter Codearmo', 'admin@codearmo.com', 40)]

'''

Looks like someone tried to join our app with an email that is already taken and we get the expected behaviour with an error raised.

Let's see if someone can join the app by passing in an age that is in string format.

with session_pool() as conn:
    
    ## add a user
    user1 = User(
             name="Peter",
             fullname="Peter Codearmo", 
             email="Peter@codearmo.com",
             age="forty")
    
    conn.add(user1)
    conn.commit()


'''
DataError: DataError occurred: input syntax for integer: forty

'''

OK great, if someone is trying to join our app and they give an age, then it must be in a integer format else we can an error.

What about if someone tries to join without passing in an email?

with session_pool() as conn:
    
    ## add a user
    user1 = User(
             name="Peter",
             fullname="Peter Codearmo", 
             age=40)
    
    conn.add(user1)
    conn.commit()



'''

IntegrityError: (sqlite3.IntegrityError) NOT NULL constraint failed: users.email
[SQL: INSERT INTO users (name, fullname, email, age) VALUES (?, ?, ?, ?)]
[parameters: ('Peter', 'Peter Codearmo', None, 40)]

'''

This NOT NULL constraint failed relates to the nullable=False definition we created when we redesigned the database.

'''
email = Column(String, nullable=False, unique=True)

'''

What is the Point of Using a Database?

In this example, the reader may rightly believe that this would all be much more simple if we just stored the data in an Excel sheet for example. So let's simulate a real use case and explain.

Let's assume we are the IT guy at some international company, we have users across the world, and we also have employees across the world. Some employees will be writing to the database and some will be reading from it (think adding new customers and performing analytics on existing customers respectively).

Well, in this case we need some sort of shared resource. Let's create an example of a bigger database than we have been using so far. Note that I created this in Jupyter notebook.

Again for those not so familiar with either Python or SQL it is not necessary to understand the code below, it is simply a function to create random data in the database we created.

Make sure you have faker installed with 'pip install faker'

#Jupyter
import random
from faker import Faker

def get_random_age(n):
    ages = [age for age in range(18, 100)]
    ages += [None, None, None]
    return random.choices(ages, k=n)


def create_fake_users(n_users):
    locales = [
        "en_US",
        "en_GB",
        "en_AU",
        "en_CA",
        "de_DE",
        "fr_FR",
        "es_ES",
        "it_IT",
        "ja_JP",
        "ru_RU",
    ]
    rand_ages = get_random_age(n_users)

    users = []
    
    Faker.seed(0)
    
    faker = Faker(locale=locales)

    for i in range(n_users):
        fake_name = faker.name()
        name = fake_name.split()[0]
        email = faker.unique.email()
        age = rand_ages[i]

        user = User(name=name, fullname=fake_name, email=email, age=age)
        users.append(user)

    return users


import time


with session_pool() as conn:
    t = time.time()
    fake_users = create_fake_users(30000)
    print(f" creating fake data time was {time.time() - t} seconds")

    t = time.time()
    conn.bulk_save_objects(fake_users)

    conn.commit()

    print(f" SQL stuff was {time.time() - t} seconds")



'''
creating fake data time was 4.46756911277771 seconds
SQL stuff was 0.2603566646575928 seconds

'''

Now we have a sizeable database to work with let's take a look at the last 10 entries we inserted into the database.

## view 20 of our fake users

with session_pool() as conn:
    all_users = conn.query(User).all()
    
    for user in all_users[-10:]:
        print(user)


'''
User(id=29992, name=Elizabeth, fullname=Elizabeth Smith, email=edward40@example.net, age=92)
User(id=29993, name=Carol, fullname=Carol Myers, email=vfranke@example.com, age=40)
User(id=29994, name=Kim, fullname=Kim Fry, email=nanaminakamura@example.com, age=54)
User(id=29995, name=Elsbeth, fullname=Elsbeth Ruppersberger, email=louiseebert@example.com, age=95)
User(id=29996, name=Cirillo, fullname=Cirillo Gelli, email=qleopardi@example.org, age=92)
User(id=29997, name=Daniel, fullname=Daniel Traore du Huet, email=tloewer@example.com, age=36)
User(id=29998, name=Caitlin, fullname=Caitlin Hanson, email=david00@example.net, age=65)
User(id=29999, name=Ing., fullname=Ing. Jolanthe Weller, email=fujiwarasayuri@example.net, age=94)
User(id=30000, name=Uta, fullname=Uta Preiß, email=nancyochoa@example.org, age=64)
User(id=30001, name=Jérôme, fullname=Jérôme Lemoine, email=marcher@example.org, age=89)
'''

Use case 1

Assume we are in the analytics department at the Japanese Branch and we want to know the average age of our users. Well, we can do this very easily.

with session_pool() as conn:
    # think of this as pulling an excel sheet of all users from the cloud
    all_users = conn.query(User).all()

    valid_ages = []
    
    for user in all_users:
        if user.age:
            valid_ages.append(user.age)
    
    
    print(f"Average age of user is {sum(valid_ages)/len(valid_ages)}")


'''

Average age of user is 58.529297820823246
'''

Use case 2

Another employee is working in customer service in Germany, and a customer has called in to update his email.

fullname='Jérôme Lemoine'
email='marcher@example.org'

new_email = 'jerome_new@example.com'

with session_pool() as conn:
    # This is the same sheet as being pulled in from Japan
    all_users = conn.query(User).all() 
    
    
    for user in all_users:
        if user.fullname == fullname and user.email == email:
            user.email = new_email 
    
    conn.commit()

I think you get the idea. There are of course much more efficient ways to make the queries above, which we will link to here once they are completed. If you have made it this far, now you have a database to play around with!

Suggested Exercises

Answer each of the following questions by performing queries on the database.

How many of our users did not give an 'age' when they signed up to our company?

How many of how users use a .net email and how many use a .org ?

What % of our users are aged between 18-25 excluding the users that did not provide an age.

Look in to why the conn.query(User).all() method is not the best way to go about things and see if you can come up with an alternative.

Now young Padawan, you can put "Performed complex SQL queries on a big data-set" on your csv!

I don't think it is too controversial to assert "The most difficult part of working with databases, is designing them" , interacting with the database is quite easy, so in the next few posts we will go over some more details regarding database design.