Widget actions and SQL FU for Capacity Management

sqlfu-mini

Say you have an object, widget, that you want to keep track of all of the actions taken on it, including CREATE, TASTE, and DELETE. Each widget will only have 1 CREATE and 1 DELETE (when deleted) action, but may include many TASTE actions. These widgets come in three flavors: VANILLA, CHOCOLATE and STRAWBERRY.

This article will demonstrate a DB structure to support this and then some very interesting SQL queries you can run later for statistics to help with capacity management. There are many real world applications for this pattern.

Let’s say you need different types of servers to support VANILLA, CHOCOLATE and STRAWBERRY widgets, and that each server will support 1000 LIVE widgets. If a widget is deleted, that’s space for another widget to live. After a month of running, how many LIVE widgets of each flavor did you increase by? How many servers do you need to buy so you don’t run out of capacity?

Here is a DB structure to support this scenario:

widget
  id
  name
  status ENUM['LIVE', 'DELETED']
  flavor ENUM['VANILLA', 'CHOCOLATE', 'STRAWBERRY']
widget_action
  id
  widget_id
  type ENUM['CREATE', 'TASTE', 'DELETE']
  created_at DATETIME

In order to run stats on things like run rate of the different flavors of widgets, you need to adopt the notion that you won’t delete rows from the widget table. Instead of deleting the row, mark the row status=’DELETED’ and log a widget_action with a type=’DELETE’.

Create widget Database

CREATE DATABASE widget;
use widget;
DROP TABLE IF EXISTS `widget`;
CREATE TABLE `widget` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL DEFAULT '',
  `status` enum('LIVE','DELETED') DEFAULT NULL,
  `flavor` enum('VANILLA','CHOCOLATE','STRAWBERRY') DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=18001 DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `widget_action`;
CREATE TABLE `widget_action` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `widget_id` bigint(20) NOT NULL,
  `type` enum('CREATE','TASTE','DELETE') DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `widget_id_fk` (`widget_id`),
  CONSTRAINT `widget_id_fk` FOREIGN KEY (`widget_id`) REFERENCES `widget` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=46583 DEFAULT CHARSET=utf8;

grant all privileges on widget.* to widget@localhost identified by 'widgetpass';

Install the Python MySQL Connector

pip install --allow-external mysql-connector-python mysql-connector-python

Fill the database with dummy data
This program fills the database with dummy data, because we don’t have anything else to do that right now. If this was a real program, the data would get filled through some other more normal means. This is just a quick and dirty script to get some data into our database. Note: Purposely inserting 2x the amount of VANILLA than CHOCOLATE and STRAWBERRY and adding a little bit of randomization.

from datetime import datetime, timedelta
import mysql.connector
import random


def reset_database(cursor):
    """reset the database"""
    query = "TRUNCATE TABLE widget_action"
    cursor.execute(query)
    query = "DELETE FROM widget"
    cursor.execute(query)
    query = "ALTER TABLE widget AUTO_INCREMENT=1"
    cursor.execute(query)


def insert_widget_lifecycle(cursor, flavor, create_at, taste_at,
                            delete_at=None):
    """Inserts a single widget and a set of widget_actions:
          CREATE, TASTE, DELETE
       at the given created_at timestamps"""
    # setup the queries
    query1 = ("INSERT INTO widget SET name=%(name)s, status='LIVE', "
              "flavor=%(flavor)s")
    query2 = ("INSERT INTO widget_action SET widget_id=%(widget_id)s, "
              "type='CREATE', created_at=%(created_at)s")
    query3 = ("INSERT INTO widget_action SET widget_id=%(widget_id)s, "
              "type='TASTE', created_at=%(created_at)s")
    query4 = ("INSERT INTO widget_action SET widget_id=%(widget_id)s, "
              "type='DELETE', created_at=%(created_at)s")
    query5 = ("UPDATE widget SET status='DELETED' WHERE id=%(widget_id)s")

    data = {'name': 'something', 'flavor': flavor}
    cursor.execute(query1, data)
    id = cursor.lastrowid
    data = {'widget_id': id, 'created_at': create_at}
    cursor.execute(query2, data)

    data = {'widget_id': id, 'created_at': taste_at}
    cursor.execute(query3, data)

    if delete_at is not None:
        data = {'widget_id': id, 'created_at': delete_at}
        cursor.execute(query4, data)
        data = {'widget_id': id}
        cursor.execute(query5, data)


def insert_all(cursor, now, num, spread):
    # insert widgets and a set of actions
    for i in range(num):
        ago = timedelta(seconds=(num+spread-i)*spread)
        until = timedelta(seconds=random.randint(0,
                                                 ago.total_seconds()-spread))
        flavor = random.choice(['VANILLA', 'VANILLA', 'CHOCOLATE',
                                'STRAWBERRY'])
        create_at = now - ago
        taste_at = now - timedelta(seconds=random.randint(
            until.total_seconds(), ago.total_seconds()))
        delete_at = None
        # delete approx 1 of 3 rows:
        if random.choice([1, 1, 2]) == 2:
            delete_at = now - until
        insert_widget_lifecycle(cursor, flavor, create_at, taste_at, delete_at)


connection = mysql.connector.connect(user='widget', password='widgetpass',
                                     host='localhost', database='widget')
cursor = connection.cursor()
reset_database(cursor)
now = datetime(2015, 3, 1, 23, 0, 0)
insert_all(cursor, now, 9000, 300)


connection.commit()
cursor.close()
connection.close()

Query the data

from datetime import datetime
import mysql.connector
from pprint import pprint


def count_matrix_at(cursor, at):
    answer = {}
    query = ("SELECT w.flavor AS flavor, COUNT(*) AS cnt "
             "FROM widget_action wa INNER JOIN widget w "
             "  ON (wa.widget_id=w.id and wa.type='CREATE') "
             "LEFT JOIN widget_action wad "
             "  ON (wad.widget_id=w.id AND wad.type='DELETE') "
             "WHERE wa.created_at <= %(at)s "
             "  AND (wad.id IS NULL OR wad.created_at > %(at)s) "
             "GROUP BY flavor")
    data = {'at': at}
    cursor.execute(query, data)
    for (flavor, cnt) in cursor:
        answer[flavor] = cnt
    return answer


def get_runrate(cursor, start, end):
    before = count_matrix_at(cursor, start)
    after = count_matrix_at(cursor, end)
    answer = []
    keys = {}
    for key in before.keys():
        keys[key] = 1
    for key in after.keys():
        keys[key] = 1
    for key in keys.keys():
        answer.append({'flavor': key,
                       'start_count': before.get(key, 0),
                       'end_count': after.get(key, 0),
                       'run_rate': after.get(key, 0) - before.get(key, 0)})
    return answer


connection = mysql.connector.connect(user='widget', password='widgetpass',
                                     host='localhost', database='widget')
cursor = connection.cursor()

start = datetime(2015, 2, 1, 0, 0, 0)
end = datetime(2015, 3, 1, 0, 0, 0)

run_rate = get_runrate(cursor, start, end)
print("Run rate from {} to {}".format(start, end))
for result in run_rate:
    print(result['flavor'])
    pprint(result)

cursor.close()
connection.close()

Output:

Run rate from 2015-02-01 00:00:00 to 2015-03-01 00:00:00
CHOCOLATE
{'end_count': 1661,
 'flavor': 'CHOCOLATE',
 'run_rate': 1421,
 'start_count': 240}
VANILLA
{'end_count': 3110, 'flavor': 'VANILLA', 'run_rate': 2647, 'start_count': 463}
STRAWBERRY
{'end_count': 1587,
 'flavor': 'STRAWBERRY',
 'run_rate': 1345,
 'start_count': 242}

We can see that VANILLA has a run rate of 2647, approximately double that of CHOCOLATE and STRAWBERRY.

I know I’m misusing the term run rate. Rate would imply a percentage and all we’re printing is the delta from end to start over the month of February. This is most interesting if you need to purchase supply to stay in front of the growth of widgets. Recall you purchase servers ahead of time to support the number of active widgets, and the type of server you buy is different for VANILLA, CHOCOLATE and STRAWBERRY, and you need 1 server for every 1000 flavors. Based on the output above, you’d need to purchase 3 VANILLA, 2 CHOCOLATE and 2 STRAWBERRY servers. If you plot the monthly run_rate over the course of many months, you’ll see if it’s constant or trending and can make adjustments for the curve. The goal of your capacity management is to purchase servers just in time. You want to never run out of capacity, but don’t want to over purchase either. That’s the basic idea anyway. Obviously this just the tip of the iceberg on concepts needed for full capacity management.

I hope this illustration helped you in some way.

Next up Write unit tests for insert and query programs. I am a big proponent of TDD. Normally, I’d write the unit tests first. I didn’t this time only because the example was about the pattern and I was rushing to make the point. When I have time I’ll come back and add unit tests for the programs above.