SQLAlchemy (Flask) count model instances by unique values

One comes to a task that has to do with counting items in a database. We will describe the right approach here. Despite being so obvious I did not find much of the docs for junior developers to watch and learn. Here is a sample task and solution:
Let's assume we have a model like so:
class Cycle(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    object_id = db.Column(db.String, nullable=False)
Sample date populated into it will be:
{
   id: 1,
   object_id: 'unique1'
}, {
   id: 2,
   object_id: 'unique1'
}, {
   id: 3,
   object_id: 'unique2'
}, {
   id: 4,
   object_id: 'unique2'
}, {
   id: 5,
   object_id: 'unique2'
}, {
   id: 6,
   object_id: 'unique3'
}
We need to count unique model instances with same object_id. To achieve this relatively simple task one would go straightforward. E.g. Fetch all the Cycle instances with a simple query and then iterate through them via for cycle. Looks like so:
objects = Cycle.query.all()
cycles_unique = {}
for object in objects:
    if object.object_id in cycles_unique.items():
        cycles_unique[object.object_id] += 1
    else:
        cycles_unique[object.object_id] = 1

print(cycles_unique)
The output would be like so:
{
    'unique1': 2, 
    'unique2': 3, 
    'unique3': 1
}
Job done. All ok. HOWEVER! Imagine this database column would grow to big sizes. E.g. Those Cycles are 1 000 000 records. It is not a rare situation in coding world. Called - BigData ;)

We have a problem with cycle like so. Flask or whatever gear i used behind those models would need to load all those 1 million records into memory and then process them one by one. Even with fast server it will be not an easy task in the scope of rendering a webpage. User might wait quite long for the answer.

Making it faster means doing this count on a DB side. SQL databases are built specially for tasks like so and will do much much better at this. Incomparable to code written above.

Proper implementation would be by using SQL syntax and handling on a DB side. Code will use a query that counts unique records with that one particular field. Looks like so:
from sqlalchemy import func
from sqlalchemy.sql import label

session.query(
        label('object_id', Crop.object_id),  func.count(Cycle.object_id)
).group_by(Cycle.object_id).all()
This will output the data similar to the above. However output is with numbers of Long type (Python Long).
[('unique1', 2L), ('unique2', 3L), ('unique3', 1L)]
This is basically how to solve this task properly with SQL Alchemy and Flask models.
Hope it helps someone to write more reliable and better code.

Comments

Popular posts from this blog

Django: Resetting Passwords (with internal tools)

Time Capsule for $25

Vagrant error: * Unknown configuration section 'hostmanager'.