What is Sqlalchemy?
Sqlalchemy is a python library for managing SQL databases, it provides a very easy to use interface for managing object-relationship-mapping (ORM) databases. Using SQL alchemy we don't have to write raw SQL queries, this is due to the fact that Sqlalchemy provides an abstraction over the queries and means the code will essentially all look like normal Python code.
Benefits of using Sqlalchemy
- It works more or less the same for all databases, apart from maybe postgres , but even in the case of postgres there is a special dialectics module that handles this sort of thing.
- Ability to map Python classes to database tables in an object-orientated way.
Installation
To follow along with this post you will need to pip install the libraries below. The faker lib will be used to generate toy data.
pip install sqlalchemy
pip install faker
Also if you wish to follow along, take note of the following file structure that I am using as I write this post.
/Intro
│
├── intro.py
└── intro.ipynb
The intro.py file will be used to denote declaring the database and writing reusable parts of the code and the notebook is used for interacting and testing with it, this is because notebooks make it much easier to run things line by line and play with the output.
Create First Database
In intro.py let's write some of the code required to set up our first database with sqlalchemy. Understanding of the __repr__ method is not required, it basically is a way to print out some meaningful representation of the object, see here if interested to learn more.
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base
# Create an engine this will create a new sqlite database in current workingdir
engine = create_engine("sqlite:///hello_world.db", echo=True)
# Define a base class using declarative_base
Base = declarative_base()
# Define the User model
class User(Base):
__tablename__ = "users"
id = Column(Integer, primary_key=True)
name = Column(String)
fullname = Column(String)
email = Column(String)
age = Column(Integer, nullable=True)
def __repr__(self):
return f"User(id={self.id}, name={self.name}, fullname={self.fullname}, email={self.email}, age={self.age})"
metadata = Base.metadata
if __name__ == "__main__":
# Create the table in the database
metadata.create_all(engine)
Ok before we get in to what some of the methods and objects are doing below, first view the table below, which is intended to visualize what exactly the output of the script above it. Note you should see an hello_world.db file in your folder. You won't be able to open it though and view it as you might a normal file, but internally in that file there will be a structure that looks like the table below i.e. empty!
Database Table Structure
id (Integer, Primary Key) | name (String) | fullname (String) | email (String) | age (Integer, Nullable) |
---|---|---|---|---|
What is a Declarative Base Class?
It may be a little confusing as to how sqlalchemy actually creates the table above. It is probably best to understand what this does on a simple example, as if you see it in a large code block with many tables and relationships, it will seem like magic how sqlalchemy knows what tables to create.
Think of the Base class we defined in the code snippet above, which the User class inherits from, as like the black entity in the center of the image below, anything that inherits from this entity is connected in some way. The base class then registers all associated entities (or more accurately 'models') and then when it comes time to create the models (read tables) this registry automatically adds everything , by taking the registry and creating all the models.
metadata = Base.metadata
# Create the table in the database
metadata.create_all(engine)
Also you will see on your terminal an output that looks like the commands below.
2024-01-14 21:44:41,658 INFO sqlalchemy.engine.Engine
CREATE TABLE users (
id INTEGER NOT NULL,
name VARCHAR,
fullname VARCHAR,
email VARCHAR,
age INTEGER,
PRIMARY KEY (id)
)
This is a demonstration of the 'abstraction' we mentioned at the beginning of the article. Sqlalchemy is able to create SQL commands based on the objects we provide i.e. it applys a mapping from our models to SQL. The command above is for creating a table and is visible due to the fact that we use the echo=True in the create_engine method. This is a very nice way to learn about what is actually going on under the hood!
Open Jupyter Notebook to Play
Now that we have created our hello world database, we can open it, I am using Jupyter because this is a nice way to run things line by line and see the output interactively, but you can use anything you like, the code should still run.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
# Create an engine for a SQLite database
engine = create_engine("sqlite:///hello_world.db", echo=True)
# Create a sessionmaker
session_pool = sessionmaker(bind=engine)
Now we can add a User , to do this we should import the User model from the intro file we just created.
from intro import User
Create Row in Database
Let's create an instance of User to see how it works.
user1 = User(id=1,
name="John",
fullname="John Codearmo",
email="admin@codearmo.com",
age=32)
print(user1)
'''
User(id=1, name=John, fullname=John Codearmo, email=admin@codearmo.com, age=32)
'''
Now that we have created the first user for our database, we can add to the database, we will be using a context manager to establish and close the connection to the database. The insert method, does exactly what you might expect in inserting the new object in to the database.
# Using the session to insert data
with session_pool() as conn:
conn.add(user1) # Add the new user to the session
conn.commit() # Commit the transaction
'''
2024-01-14 21:45:07,800 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-01-14 21:45:07,802 INFO sqlalchemy.engine.Engine INSERT INTO users (id, name, fullname, email, age) VALUES (?, ?, ?, ?, ?)
2024-01-14 21:45:07,803 INFO sqlalchemy.engine.Engine [generated in 0.00059s] (1, 'John', 'John Codearmo', 'admin@codearmo.com', 32)
2024-01-14 21:45:07,804 INFO sqlalchemy.engine.Engine COMMIT
'''
Again since we have echo=True, we see the raw SQL queries that are executed.
Let's add some more users and tie things back in with the way we created the initial Users model.
# Using the session to insert data
with session_pool() as conn:
user2 = User(id=2,
name="Bill",
fullname="Bill Codearmo",
email="bill@codearmo.com",
age=32)
conn.add(user2)
conn.commit()
user3 = User(id=3,
name="Sally",
fullname="Sally Codearmo",
email="sally@codearmo.com",
)
# note we dont have sally age, cause impolite to
# ask lady her age
conn.add(user3)
user4 = User(
name="Peter",
fullname="Peter Codearmo",
email="sally@codearmo.com",
age = 40
)
# here we don't add id we just autoincrement
conn.add(user4)
conn.commit()
'''
2024-01-14 22:03:33,115 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-01-14 22:03:33,116 INFO sqlalchemy.engine.Engine INSERT INTO users (id, name, fullname, email, age) VALUES (?, ?, ?, ?, ?)
2024-01-14 22:03:33,117 INFO sqlalchemy.engine.Engine [cached since 76.36s ago] (2, 'Bill', 'Bill Codearmo', 'bill@codearmo.com', 32)
2024-01-14 22:03:33,118 INFO sqlalchemy.engine.Engine COMMIT
2024-01-14 22:03:33,127 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-01-14 22:03:33,128 INFO sqlalchemy.engine.Engine INSERT INTO users (id, name, fullname, email, age) VALUES (?, ?, ?, ?, ?)
2024-01-14 22:03:33,129 INFO sqlalchemy.engine.Engine [cached since 76.37s ago] (3, 'Sally', 'Sally Codearmo', 'sally@codearmo.com', None)
2024-01-14 22:03:33,129 INFO sqlalchemy.engine.Engine INSERT INTO users (name, fullname, email, age) VALUES (?, ?, ?, ?)
2024-01-14 22:03:33,130 INFO sqlalchemy.engine.Engine [generated in 0.00043s] ('Peter', 'Peter Codearmo', 'sally@codearmo.com', 40)
2024-01-14 22:03:33,130 INFO sqlalchemy.engine.Engine COMMIT
'''
Some notes on above
- Notice that we don't have to input the id, this is because it is the primary key and by default it will auto-increment i.e. add one to last observed value. Recall the line below where we defined the primary key of the table
#This line
id = Column(Integer, primary_key=True)
- We didn't have to input Sally's age, this is because when we created the table/model , we set nullable to True, this essentially means that in the event of not passing an age argument to the model, Sqlalchemy will simply set it to None (null).
age = Column(Integer, nullable=True) # optional parameter
Read Rows from DataBase
Let's see if we can retrieve all the users we just put in the database.
with session_pool() as conn:
# Query to get all users
all_users = conn.query(User).all()
# Iterate through the list of users and print them
for user in all_users:
print(user)
'''
User(id=1, name=John, fullname=John Codearmo, email=admin@codearmo.com, age=32)
User(id=2, name=Bill, fullname=Bill Codearmo, email=bill@codearmo.com, age=32)
User(id=3, name=Sally, fullname=Sally Codearmo, email=sally@codearmo.com, age=None)
User(id=4, name=Peter, fullname=Peter Codearmo, email=sally@codearmo.com, age=40)
'''
And as expected the rows are now populated in the database.
Update a Row
Notice that Peter's email from above is the same as Sally's this isn't correct and we want to update Peter's email. Let's see how we can do that
peter_fullname = "Peter Codearmo"
peters_correct_email = "peter@codearmo.com"
# make connection
with session_pool() as conn:
# Step 1: get peter object from database
user_to_update = conn.query(User).filter(User.fullname == peter_fullname).first()
# Step 2: Update the desired field
user_to_update.email = peters_correct_email
# Step 3: Commit the changes
conn.commit()
## lets see if Peter's email is correct now
peter = conn.query(User).filter(User.fullname == peter_fullname).first()
print('##################################################')
print(peter)
print('##################################################')
'''
##################################################
User(id=4, name=Peter, fullname=Peter Codearmo, email=peter@codearmo.com, age=40)
##################################################
'''
Delete a row
Peter being very upset regarding the wrong email fiasco has decided to leave us and wants his data deleted from the database. Let's see how we can do that.
peters_correct_email = "peter@codearmo.com"
# make connection
with session_pool() as conn:
# Step 1: get peter object from database by his email
peter = conn.query(User).filter(User.email == peters_correct_email).first()
# Step 2: Delete Peter
conn.delete(peter)
# Step 3: Commit the changes
conn.commit()
## see if he is gone
all_users = session.query(User).all()
print('##################################################')
for user in all_users:
print(user)
print('##################################################')
'''
##################################################
User(id=1, name=John, fullname=John Codearmo, email=admin@codearmo.com, age=32)
User(id=2, name=Bill, fullname=Bill Codearmo, email=bill@codearmo.com, age=32)
User(id=3, name=Sally, fullname=Sally Codearmo, email=sally@codearmo.com, age=None)
##################################################
'''
CRUD
CRUD stands for create, read, update and delete which we gave simple examples of in this toy example, these are the 4 fundamental operations that are performed on a database.
Break Things Errors
Let's intentionally try to break some of the model we have created, this is useful because later we can recognise these errors in more complex structures. It should be noted here that Sqlite is much more lax in regards to typing and missing data than other databases. This is useful for our purposes of playing around with things, but also has drawbacks which we will discuss after a few examples.
with session_pool() as conn:
duplicate_id_user = User(id=2,
name="Simon",
fullname="Simon Codearmo",
email="Simon@codearmo.com",
age=32)
conn.add(duplicate_id_user)
conn.commit()
'''
IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: users.id
[SQL: INSERT INTO users (id, name, fullname, email, age) VALUES (?, ?, ?, ?, ?)]
[parameters: (2, 'Simon', 'Simon Codearmo', 'Simon@codearmo.com', 32)]
'''
An IntegrityError is raised by databases when we try to duplicate rows that either can't be duplicated e.g. the primary key, or rows we have explictly told it not to duplicate. I think we have already established that it is better to let the database handle the primary key, but this is a useful example as to what the IntegrityError actually means.
Let's take a look at the next error, although this won't raise an exception.
with session_pool() as conn:
new_user = User(
name="Simon",
age=10)
conn.add(new_user)
conn.commit()
It looks like Simon is back and his is playing silly buggers with our database. It appears he has joined again, but hasn't supplied all the necessary data, but sqlite is allowing him to input data without the proper requirements. See below!
all_users = conn.query(User).all()
print('##################################################')
for user in all_users:
print(user)
print('##################################################')
'''
##################################################
User(id=1, name=John, fullname=John Codearmo, email=admin@codearmo.com, age=32)
User(id=2, name=Bill, fullname=Bill Codearmo, email=bill@codearmo.com, age=32)
User(id=3, name=Sally, fullname=Sally Codearmo, email=sally@codearmo.com, age=None)
User(id=4, name=Simon, fullname=None, age=10)
##################################################
'''
Delete Table and Redesign
So there are certain problems with the way we instantiated this database. Below we are going to delete all the data and start again, but take more care about the way we design the table. Note that this will be in the intro.py file rather than the notebook.
As I intended this post to be for absolute beginners, don't worry too much if you don't undertstand anything that has @property directly above it, this is simply for illustrative purposes for ensuring that we get the right data type, but if interested read my post on properties
The objectives of this redesign are as follows:
- Make sure that all variables are provided (except age) by setting the nullable=False
- Make sure that the email is unique i.e. no two users can have the same email.
- For illustrative purposes we want to make sure the age is either None or an integer , else raise an exception.
#intro.py
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base
# Create an engine this will create a new sqlite database in current workingdir
engine = create_engine("sqlite:///hello_world.db", echo=True)
# Define a base class using declarative_base
Base = declarative_base()
# THIS IS TO DELETE ALL THE SCHEMA AND START AGAIN
class DataError(Exception):
"""
This is for making the sqlite database replicate
behaviour of other databases with typing
only for illustrative purposes, not too sure
if this is something that is advisable
"""
pass
# Define the User model
class User(Base):
__tablename__ = "users"
# there are no changes we need to make here , this is as we expect it to be
id = Column(Integer, primary_key=True)
# make this a required field, if not passed then error
name = Column(String, nullable=False)
# make this a required field, if not passed then error
fullname = Column(String, nullable=False)
# this is a required field AND it must not be used by other user
email = Column(String, nullable=False, unique=True)
# here we try enforce some typing, replicate behaviour of more production type dbs
# if we are given an age it must be an integer.
_age = Column("age", Integer, nullable=True)
def __repr__(self):
return f"User(id={self.id}, name={self.name}, fullname={self.fullname}, email={self.email}, age={self.age})"
@property
def age(self):
return self._age
@age.setter
def age(self, value):
if value is not None and not isinstance(value, int):
raise DataError(f"DataError occurred: input syntax for integer: {value}")
self._age = value
if __name__ == "__main__":
# Drop all the old data
Base.metadata.drop_all(engine)
# make it again
Base.metadata.create_all(engine)
'''
CREATE TABLE users (
id INTEGER NOT NULL,
name VARCHAR NOT NULL,
fullname VARCHAR NOT NULL,
email VARCHAR NOT NULL,
age INTEGER,
PRIMARY KEY (id),
UNIQUE (email)
)
'''
Back to Jupyter Notebook to play around
Now let's test out our new and improved database.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
# make connection again to database
engine = create_engine("sqlite:///hello_world.db", echo=False)
# Create a sessionmaker
session_pool = sessionmaker(bind=engine)
from intro import User
with session_pool() as conn:
## add a user
user1 = User(
name="John",
fullname="John Codearmo",
email="admin@codearmo.com",
age=32)
conn.add(user1)
conn.commit()
# get all users (expect only 1)
all_users = conn.query(User).all()
# Iterate through the list of users and print them
for user in all_users:
print(user)
'''
User(id=1, name=John, fullname=John Codearmo, email=admin@codearmo.com, age=32)
'''
Looks like we can still add users. Let's see if some of the errors get raised that we expect
Let's see if someone can join using an email that is already present in the database.
with session_pool() as conn:
user = User(
name="Peter",
fullname="Peter Codearmo",
email="admin@codearmo.com",
age=40)
conn.add(user)
conn.commit()
'''
IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: users.email
[SQL: INSERT INTO users (name, fullname, email, age) VALUES (?, ?, ?, ?)]
[parameters: ('Peter', 'Peter Codearmo', 'admin@codearmo.com', 40)]
'''
Looks like someone tried to join our app with an email that is already taken and we get the expected behaviour with an error raised.
Let's see if someone can join the app by passing in an age that is in string format.
with session_pool() as conn:
## add a user
user1 = User(
name="Peter",
fullname="Peter Codearmo",
email="Peter@codearmo.com",
age="forty")
conn.add(user1)
conn.commit()
'''
DataError: DataError occurred: input syntax for integer: forty
'''
OK great, if someone is trying to join our app and they give an age, then it must be in a integer format else we can an error.
What about if someone tries to join without passing in an email?
with session_pool() as conn:
## add a user
user1 = User(
name="Peter",
fullname="Peter Codearmo",
age=40)
conn.add(user1)
conn.commit()
'''
IntegrityError: (sqlite3.IntegrityError) NOT NULL constraint failed: users.email
[SQL: INSERT INTO users (name, fullname, email, age) VALUES (?, ?, ?, ?)]
[parameters: ('Peter', 'Peter Codearmo', None, 40)]
'''
This NOT NULL constraint failed relates to the nullable=False definition we created when we redesigned the database.
'''
email = Column(String, nullable=False, unique=True)
'''
What is the Point of Using a Database?
In this example, the reader may rightly believe that this would all be much more simple if we just stored the data in an Excel sheet for example. So let's simulate a real use case and explain.
Let's assume we are the IT guy at some international company, we have users across the world, and we also have employees across the world. Some employees will be writing to the database and some will be reading from it (think adding new customers and performing analytics on existing customers respectively).
Well, in this case we need some sort of shared resource. Let's create an example of a bigger database than we have been using so far. Note that I created this in Jupyter notebook.
Again for those not so familiar with either Python or SQL it is not necessary to understand the code below, it is simply a function to create random data in the database we created.
Make sure you have faker installed with 'pip install faker'
#Jupyter
import random
from faker import Faker
def get_random_age(n):
ages = [age for age in range(18, 100)]
ages += [None, None, None]
return random.choices(ages, k=n)
def create_fake_users(n_users):
locales = [
"en_US",
"en_GB",
"en_AU",
"en_CA",
"de_DE",
"fr_FR",
"es_ES",
"it_IT",
"ja_JP",
"ru_RU",
]
rand_ages = get_random_age(n_users)
users = []
Faker.seed(0)
faker = Faker(locale=locales)
for i in range(n_users):
fake_name = faker.name()
name = fake_name.split()[0]
email = faker.unique.email()
age = rand_ages[i]
user = User(name=name, fullname=fake_name, email=email, age=age)
users.append(user)
return users
import time
with session_pool() as conn:
t = time.time()
fake_users = create_fake_users(30000)
print(f" creating fake data time was {time.time() - t} seconds")
t = time.time()
conn.bulk_save_objects(fake_users)
conn.commit()
print(f" SQL stuff was {time.time() - t} seconds")
'''
creating fake data time was 4.46756911277771 seconds
SQL stuff was 0.2603566646575928 seconds
'''
Now we have a sizeable database to work with let's take a look at the last 10 entries we inserted into the database.
## view 20 of our fake users
with session_pool() as conn:
all_users = conn.query(User).all()
for user in all_users[-10:]:
print(user)
'''
User(id=29992, name=Elizabeth, fullname=Elizabeth Smith, email=edward40@example.net, age=92)
User(id=29993, name=Carol, fullname=Carol Myers, email=vfranke@example.com, age=40)
User(id=29994, name=Kim, fullname=Kim Fry, email=nanaminakamura@example.com, age=54)
User(id=29995, name=Elsbeth, fullname=Elsbeth Ruppersberger, email=louiseebert@example.com, age=95)
User(id=29996, name=Cirillo, fullname=Cirillo Gelli, email=qleopardi@example.org, age=92)
User(id=29997, name=Daniel, fullname=Daniel Traore du Huet, email=tloewer@example.com, age=36)
User(id=29998, name=Caitlin, fullname=Caitlin Hanson, email=david00@example.net, age=65)
User(id=29999, name=Ing., fullname=Ing. Jolanthe Weller, email=fujiwarasayuri@example.net, age=94)
User(id=30000, name=Uta, fullname=Uta Preiß, email=nancyochoa@example.org, age=64)
User(id=30001, name=Jérôme, fullname=Jérôme Lemoine, email=marcher@example.org, age=89)
'''
Use case 1
Assume we are in the analytics department at the Japanese Branch and we want to know the average age of our users. Well, we can do this very easily.
with session_pool() as conn:
# think of this as pulling an excel sheet of all users from the cloud
all_users = conn.query(User).all()
valid_ages = []
for user in all_users:
if user.age:
valid_ages.append(user.age)
print(f"Average age of user is {sum(valid_ages)/len(valid_ages)}")
'''
Average age of user is 58.529297820823246
'''
Use case 2
Another employee is working in customer service in Germany, and a customer has called in to update his email.
fullname='Jérôme Lemoine'
email='marcher@example.org'
new_email = 'jerome_new@example.com'
with session_pool() as conn:
# This is the same sheet as being pulled in from Japan
all_users = conn.query(User).all()
for user in all_users:
if user.fullname == fullname and user.email == email:
user.email = new_email
conn.commit()
I think you get the idea. There are of course much more efficient ways to make the queries above, which we will link to here once they are completed. If you have made it this far, now you have a database to play around with!
Suggested Exercises
Answer each of the following questions by performing queries on the database.
- How many of our users did not give an 'age' when they signed up to our company?
- How many of how users use a .net email and how many use a .org ?
- What % of our users are aged between 18-25 excluding the users that did not provide an age.
- Look in to why the conn.query(User).all() method is not the best way to go about things and see if you can come up with an alternative.
Now young Padawan, you can put "Performed complex SQL queries on a big data-set" on your csv!
I don't think it is too controversial to assert "The most difficult part of working with databases, is designing them" , interacting with the database is quite easy, so in the next few posts we will go over some more details regarding database design.