If you Google for this you’ll find that the easiest way to provide a full audit of everything that happened in your database is to create a duplicate of each table you need history and insert a copy of the record you are changing in there for every update.
For example if you have a table called client
id INTEGER,
firstname VARCHAR(255),
lastname VARCHAR(255),
datecreated TIMESTAMP
);
You will also create a table called client_history
id INTEGER,
firstname VARCHAR(255),
lastname VARCHAR(255),
datecreated TIMESTAMP,
history_creation TIMESTAMP,
history_user INTEGER – A foreign key to the user who updated this record if required
);
You’ll then attach an onUpdate trigger to the client table inserting all the data in your history table. This solution gives you the ability to keep track of everything that happens in your database, when it happened and who-done-it.
What this solution doesn’t deal with is schema changes. Also querying the state of a record at a point in time to join it with other tables might be a bit tricky.
Let’s start from the first point. Schema changes.
The most obvious and simple solution is to put in place a proper process for all database schema changes. i.e. whoever touches the schema is also in charge of updating the history table structure and the trigger on the main table. Doable but a bit too prone to human error if you ask me.
What I generally do is use a small function/stored procedure to create the history on a table. This procedure is in charge of both creating the history table the first time it’s called and updating the structure if the main table has changed.
In this example I’m going to use PostgreSQL. First because it’s a database I’m familiar with and more importantly being open-srouce you can just download it and try this yourself.
My example here is a bit verbose but it serves its purpose. There are many database-specific instructions you could use to extract a table structure or perform some of the simple tasks of this function. However, what I’m trying to demonstrate is the concept and not PostgreSQL trickery.
Most relational databases store the schema information in internal tables (pg_attribute, pg_class and pg_namespace in this case) so that’s what we are going to use to read the structure of our original table.
– create history for and a unique identifier for the record.
– in my case all tables have an ID column. Alternatively you could hand
– as a parameter also the name of the unique id column
CREATE OR REPLACE FUNCTION create_history(tablename VARCHAR(255), recordid INTEGER) RETURNS BOOLEAN AS $$
DECLARE
vhisttablename NAME; – history table name
vhistfieldcount INTEGER; – number of fields in history table
vtablefieldcount INTEGER; – number of fields in main table
vtmprowcount INTEGER; – temporary variable to store query results
vcurfield RECORD; – variable to loop over fields of main table to create history
vhisttablesql TEXT; – sql to create history table
vfieldlist TEXT; – list fo fields in main table
vhisttablefields TEXT; – list of fields in history table
vhistinsertsql TEXT; – SQL for insert statement
vtmptablename VARCHAR(255); – temp variable to check if history exists
BEGIN
vhisttablename := tablename||‘_hist’;
– Check if the history table exists, if not we need to create it
SELECT INTO vtmptablename relname
FROM pg_class
WHERE relname = vhisttablename;
vfieldlist := ”;
vhisttablefields := ”;
– count the fields in the history/current table if history exists
IF vtmptablename IS NOT NULL
THEN
SELECT INTO vhistfieldcount count(*)
FROM pg_attribute AS a
INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
WHERE a.attnum > 0
AND c.relname = vhisttablename – history table name
AND NOT a.attisdropped
AND pg_table_is_visible(c.oid);
SELECT INTO vtablefieldcount count(*)
FROM pg_attribute AS a
INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
WHERE a.attnum > 0
AND c.relname = tablename – main table
AND NOT a.attisdropped
AND pg_table_is_visible(c.oid);
END IF;
– Get all the attributes and their type from the original table
FOR vcurfield IN
SELECT a.attname AS COLUMN, format_type(a.atttypid, a.atttypmod) AS datatype
FROM pg_attribute AS a
INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
WHERE a.attnum > 0
AND c.relname = tablename
AND NOT a.attisdropped
AND pg_table_is_visible(c.oid)
LOOP
– populate lists of fields both for history creation and select from main table
vhisttablefields := vhisttablefields||vcurfield.COLUMN||‘ ‘||vcurfield.datatype||‘ NULL, ‘;
vfieldlist := vfieldlist||vcurfield.COLUMN||‘, ‘;
– If the history table exists and the number of fields is different
– from the main table (+3 as we add a timestamp, a user field and an history unique id)
IF vtmptablename IS NOT NULL AND vtablefieldcount+3 <> vhistfieldcount
THEN
– make sure that this is the missing field in the history table
PERFORM a.attname
FROM pg_attribute AS a
INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
WHERE a.attnum > 0
AND c.relname = vhisttablename
AND NOT a.attisdropped
AND a.attname = vcurfield.COLUMN
AND pg_table_is_visible(c.oid);
GET DIAGNOSTICS vtmprowcount = ROW_COUNT;
– If it is then generate the SQL and execute the alter table
IF vtmprowcount = 0
THEN
vhisttablesql := ‘ALTER TABLE ‘||vhisttablename||‘ ADD COLUMN ‘||vcurfield.COLUMN||‘ ‘||vcurfield.datatype||‘ NULL;’;
EXECUTE vhisttablesql;
END IF;
END IF;
END LOOP;
– The history table doesn’t exist. Create it
IF vtmptablename IS NULL OR vtmptablename = ”
THEN
vhisttablesql := ‘CREATE TABLE ‘||vhisttablename||‘ ( histid SERIAL PRIMARY KEY, ‘;
vhisttablesql := vhisttablesql||vhisttablefields||‘ history_creation TIMESTAMP NOT NULL DEFAULT now() );’;
EXECUTE vhisttablesql;
– create indexes
vhisttablesql := ‘ CREATE INDEX idx_’||vhisttablename||‘_1 ON ‘||vhisttablename||‘ (id); ‘;
EXECUTE vhisttablesql;
vhisttablesql := ‘ CREATE INDEX idx_’||vhisttablename||‘_2 ON ‘||vhisttablename||‘ (history_creation); ‘;
EXECUTE vhisttablesql;
END IF;
– Proceed with the history creation
vhistinsertsql := ‘INSERT INTO ‘||vhisttablename||‘ (‘||vfieldlist||‘ history_creation) SELECT ‘||vfieldlist||‘ now() FROM ‘||tablename||‘ WHERE id=’||recordid||‘;’;
RAISE NOTICE ‘Executing %’, vhistinsertsql;
EXECUTE vhistinsertsql;
GET DIAGNOSTICS vtmprowcount = ROW_COUNT;
IF vtmprowcount > 0
THEN
RETURN TRUE;
ELSE
RETURN FALSE;
END IF;
END;
$$
LANGUAGE plpgsql;
You’ll notice this function only adds fields to the history table. That’s because even if we remove a field from the original table we still want to keep track of it in our history.
Now on to querying the history to retrieve the state of a record at a specific date.
This is a bit tricky. At the moment as far as I know only Oracle has the ability return a “virtual table” from a function. PostgreSQL can return a RECORD variable – which is great to use inside functions but once outside it loses its structure and turns into a comma separated list of values – or a %ROWTYPE you can define.
This unfortunately means that you’d have to create a custom function for each history table to returns a SET OF client – in our case.
What we are trying to achieve is something like what this query does.
FROM client_hist AS h
INNER JOIN (
SELECT min(history_creation) AS mintstamp, id AS id
FROM client_hist
WHERE history_creation > THE-TIME-YOU-NEED
AND id= YOUR-RECORD-ID
GROUP BY id
) AS m ON (m.id = h.id AND m.mintstamp = h.history_creation);
Basically the next record in the history after the time specified.
At this point so far as I can see we have two options. Either create a function that returns a specific type of ROW, or write a more generic history function to return only the id of the history record we need and not the whole row (If you look at the history creation function we are putting a histid column in there).
You could use this function in your queries to get the history record id out like this.
CREATE OR REPLACE FUNCTION select_hist_id(tablename VARCHAR(255), recordid INTEGER, tstamp TIMESTAMP) RETURNS INTEGER AS $$
DECLARE
curs REFCURSOR;
vid INTEGER;
BEGIN
OPEN curs FOR EXECUTE ‘SELECT h.histid
FROM ‘||tablename||‘_hist AS h
INNER JOIN (
SELECT min(history_creation) AS mintstamp, id
FROM ‘||tablename||‘_hist
WHERE history_creation > ‘”||tstamp||”‘
AND id=’||recid||‘
GROUP BY id
) AS m ON (m.id = h.id AND m.mintstamp = h.history_creation);’;
FETCH curs INTO vid;
RETURN vid;
END;
$$
LANGUAGE plpgsql;
– At this point you can just get ids like this
– records in client as of this time 2009-11-02 21:13:41.552601
SELECT select_hist_id(‘client’, id, ’2009-11-02 21:13:41.552601′::TIMESTAMP) FROM client;
I’m sure you can figure out the rest.
Be careful this is very slow on large data sets. If you are planning to work on millions of records then you should consider building a history lookup function for each table either defining data types for your RECORD or using OUT variables.














