Tag Archives: History

Tracking database records history

If you Google for this you’ll find that the easiest way to provide a full audit of everything that happened in your database is to create a duplicate of each table you need history and insert a copy of the record you are changing in there for every update.

For example if you have a table called client

CREATE TABLE client (
  id INTEGER,
  firstname VARCHAR(255),
  lastname VARCHAR(255),
  datecreated TIMESTAMP
);

You will also create a table called client_history

CREATE TABLE client_history (
  id INTEGER,
  firstname VARCHAR(255),
  lastname VARCHAR(255),
  datecreated TIMESTAMP,
  history_creation TIMESTAMP,
  history_user INTEGER -- A foreign key to the user who updated this record if required
);

You’ll then attach an onUpdate trigger to the client table inserting all the data in your history table. This solution gives you the ability to keep track of everything that happens in your database, when it happened and who-done-it.
What this solution doesn’t deal with is schema changes. Also querying the state of a record at a point in time to join it with other tables might be a bit tricky.

Let’s start from the first point. Schema changes.
The most obvious and simple solution is to put in place a proper process for all database schema changes. i.e. whoever touches the schema is also in charge of updating the history table structure and the trigger on the main table. Doable but a bit too prone to human error if you ask me.

What I generally do is use a small function/stored procedure to create the history on a table. This procedure is in charge of both creating the history table the first time it’s called and updating the structure if the main table has changed.
In this example I’m going to use PostgreSQL. First because it’s a database I’m familiar with and more importantly being open-srouce you can just download it and try this yourself.

My example here is a bit verbose but it serves its purpose. There are many database-specific instructions you could use to extract a table structure or perform some of the simple tasks of this function. However, what I’m trying to demonstrate is the concept and not PostgreSQL trickery.
Most relational databases store the schema information in internal tables (pg_attribute, pg_class and pg_namespace in this case) so that’s what we are going to use to read the structure of our original table.

-- This function expects as a parameter the name of the table you want to
-- create history for and a unique identifier for the record.
-- in my case all tables have an ID column. Alternatively you could hand
-- as a parameter also the name of the unique id column
CREATE OR REPLACE FUNCTION create_history(tablename VARCHAR(255), recordid INTEGER) RETURNS BOOLEAN AS $$
DECLARE
  vhisttablename NAME; -- history table name
  vhistfieldcount INTEGER;  -- number of fields in history table
  vtablefieldcount INTEGER; -- number of fields in main table
  vtmprowcount INTEGER; -- temporary variable to store query results
  vcurfield RECORD; -- variable to loop over fields of main table to create history
  vhisttablesql TEXT; -- sql to create history table
  vfieldlist TEXT; -- list fo fields in main table
  vhisttablefields TEXT; -- list of fields in history table
  vhistinsertsql TEXT; -- SQL for insert statement
  vtmptablename VARCHAR(255); -- temp variable to check if history exists
BEGIN
  vhisttablename := tablename||'_hist';

  -- Check if the history table exists, if not we need to create it
  SELECT INTO vtmptablename relname
  FROM pg_class
  WHERE relname = vhisttablename;

  vfieldlist := '';
  vhisttablefields := '';

  -- count the fields in the history/current table if history exists
  IF vtmptablename IS NOT NULL
  THEN
    SELECT INTO vhistfieldcount count(*)
    FROM pg_attribute AS a
    INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
    INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
    WHERE a.attnum > 0
    AND c.relname = vhisttablename -- history table name
    AND NOT a.attisdropped
    AND pg_table_is_visible(c.oid);

    SELECT INTO vtablefieldcount count(*)
    FROM pg_attribute AS a
    INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
    INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
    WHERE a.attnum > 0
    AND c.relname = tablename -- main table
    AND NOT a.attisdropped
    AND pg_table_is_visible(c.oid);
  END IF;

  -- Get all the attributes and their type from the original table
  FOR vcurfield IN
  SELECT a.attname AS column, format_type(a.atttypid, a.atttypmod) as datatype
  FROM pg_attribute AS a
  INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
  INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
  WHERE a.attnum > 0
  AND c.relname = tablename
  AND NOT a.attisdropped
  AND pg_table_is_visible(c.oid)
  LOOP
    -- populate lists of fields both for history creation and select from main table
    vhisttablefields := vhisttablefields||vcurfield.column||' '||vcurfield.datatype||' NULL, ';
    vfieldlist := vfieldlist||vcurfield.column||', ';

    -- If the history table exists and the number of fields is different
    -- from the main table (+3 as we add a timestamp, a user field and an history unique id)
    IF vtmptablename IS NOT NULL AND vtablefieldcount+3  vhistfieldcount
    THEN
      -- make sure that this is the missing field in the history table
       PERFORM a.attname
       FROM pg_attribute AS a
       INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
       INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
       WHERE a.attnum > 0
       AND c.relname = vhisttablename
       AND NOT a.attisdropped
       AND a.attname = vcurfield.column
       AND pg_table_is_visible(c.oid);

       GET DIAGNOSTICS vtmprowcount = ROW_COUNT;

      -- If it is then generate the SQL and execute the alter table
       IF vtmprowcount = 0
       THEN
         vhisttablesql := 'ALTER TABLE '||vhisttablename||' ADD COLUMN '||vcurfield.column||' '||vcurfield.datatype||' NULL;';
         EXECUTE vhisttablesql;
       END IF;
     END IF;
  END LOOP;

  -- The history table doesn't exist. Create it
  IF vtmptablename IS NULL OR vtmptablename = ''
  THEN
    vhisttablesql := 'CREATE TABLE '||vhisttablename||' ( histid SERIAL PRIMARY KEY, ';

    vhisttablesql := vhisttablesql||vhisttablefields||' history_creation TIMESTAMP NOT NULL DEFAULT now() );';

    EXECUTE vhisttablesql;

    -- create indexes
    vhisttablesql := ' CREATE INDEX idx_'||vhisttablename||'_1 ON '||vhisttablename||' (id); ';
    EXECUTE vhisttablesql;

    vhisttablesql := ' CREATE INDEX idx_'||vhisttablename||'_2 ON '||vhisttablename||' (history_creation); ';
    EXECUTE vhisttablesql;
  END IF;

  -- Proceed with the history creation
  vhistinsertsql := 'INSERT INTO '||vhisttablename||' ('||vfieldlist||' history_creation) SELECT '||vfieldlist||' now() FROM '||tablename||' WHERE id='||recordid||';';

  RAISE NOTICE 'Executing %', vhistinsertsql;
  EXECUTE vhistinsertsql;

  GET DIAGNOSTICS vtmprowcount = ROW_COUNT;
  IF vtmprowcount > 0
  THEN
    RETURN TRUE;
  ELSE
    RETURN FALSE;
  END IF;
END;
$$
LANGUAGE plpgsql;

You’ll notice this function only adds fields to the history table. That’s because even if we remove a field from the original table we still want to keep track of it in our history.

Now on to querying the history to retrieve the state of a record at a specific date.

This is a bit tricky. At the moment as far as I know only Oracle has the ability return a “virtual table” from a function. PostgreSQL can return a RECORD variable – which is great to use inside functions but once outside it loses its structure and turns into a comma separated list of values – or a %ROWTYPE you can define.
This unfortunately means that you’d have to create a custom function for each history table to returns a SET OF client – in our case.

What we are trying to achieve is something like what this query does.

SELECT h.id, firstname, lastname, datecreated, hist_creation
FROM client_hist AS h
INNER JOIN (
  SELECT min(history_creation) AS mintstamp, id AS id
  FROM client_hist
  WHERE history_creation > THE-TIME-YOU-NEED
  AND id= YOUR-RECORD-ID
  GROUP BY id
) AS m ON (m.id = h.id AND m.mintstamp = h.history_creation);

Basically the next record in the history after the time specified.

At this point so far as I can see we have two options. Either create a function that returns a specific type of ROW, or write a more generic history function to return only the id of the history record we need and not the whole row (If you look at the history creation function we are putting a histid column in there).

You could use this function in your queries to get the history record id out like this.

-- This function returns the histid you need to look at in your history table
CREATE OR REPLACE FUNCTION select_hist_id(tablename VARCHAR(255), recordid INTEGER, tstamp TIMESTAMP) RETURNS INTEGER AS $$
DECLARE
  curs REFCURSOR;
  vid INTEGER;
BEGIN
  OPEN curs FOR EXECUTE 'SELECT h.histid
  FROM '||tablename||'_hist AS h 
  INNER JOIN (
    SELECT min(history_creation) AS mintstamp, id 
    FROM '||tablename||'_hist
    WHERE history_creation > '''||tstamp||'''
    AND id='||recid||'
    GROUP BY id
  ) AS m ON (m.id = h.id AND m.mintstamp = h.history_creation);';

  FETCH curs INTO vid;

  RETURN vid;
END;
$$
LANGUAGE plpgsql;

-- At this point you can just get ids like this
-- records in client as of this time 2009-11-02 21:13:41.552601
SELECT select_hist_id('client', id, '2009-11-02 21:13:41.552601'::TIMESTAMP) FROM client;

I’m sure you can figure out the rest.

Be careful this is very slow on large data sets. If you are planning to work on millions of records then you should consider building a history lookup function for each table either defining data types for your RECORD or using OUT variables.

CREATE FUNCTION foo(recordid int, firstname OUT VARCHAR(10)...)
Tagged , , , ,
Follow

Get every new post delivered to your Inbox.