<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SAPessi &#187; Database</title>
	<atom:link href="http://sapessi.com/tag/database/feed/" rel="self" type="application/rss+xml" />
	<link>http://sapessi.com</link>
	<description>Perfection of means and confusion of aims...</description>
	<lastBuildDate>Fri, 02 Mar 2012 09:26:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Tracking database records history</title>
		<link>http://sapessi.com/2009/11/tracking-database-records-history/</link>
		<comments>http://sapessi.com/2009/11/tracking-database-records-history/#comments</comments>
		<pubDate>Tue, 03 Nov 2009 12:40:14 +0000</pubDate>
		<dc:creator>Stefano Buliani</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[History]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://sapessi.com/?p=288</guid>
		<description><![CDATA[If you Google for this you&#8217;ll find that the easiest way to provide a full audit of everything that happened in your database is to create a duplicate of each table you need history and insert a copy of the record you are changing in there for every update. For example if you have a [...]
]]></description>
			<content:encoded><![CDATA[
<p>If you Google for this, you&#8217;ll find that the easiest way to provide a full audit of everything that happened in your database is to duplicate each table whose history you need and, on every update, insert a copy of the record you are changing into the duplicate.</p>
<p>For example, if you have a table called client:</p>
<div class="codesnip-container" >
<div class="sql codesnip" style="font-family:monospace;"><span class="kw1">CREATE</span> <span class="kw1">TABLE</span> client <span class="br0">&#40;</span><br />
&nbsp; id INTEGER<span class="sy0">,</span><br />
&nbsp; firstname VARCHAR<span class="br0">&#40;</span>255<span class="br0">&#41;</span><span class="sy0">,</span><br />
&nbsp; lastname VARCHAR<span class="br0">&#40;</span>255<span class="br0">&#41;</span><span class="sy0">,</span><br />
&nbsp; datecreated TIMESTAMP<br />
<span class="br0">&#41;</span>;</div>
</div>
<p>You will also create a table called client_history</p>
<div class="codesnip-container" >
<div class="sql codesnip" style="font-family:monospace;"><span class="kw1">CREATE</span> <span class="kw1">TABLE</span> client_history <span class="br0">&#40;</span><br />
&nbsp; id INTEGER<span class="sy0">,</span><br />
&nbsp; firstname VARCHAR<span class="br0">&#40;</span><span class="nu0">255</span><span class="br0">&#41;</span><span class="sy0">,</span><br />
&nbsp; lastname VARCHAR<span class="br0">&#40;</span><span class="nu0">255</span><span class="br0">&#41;</span><span class="sy0">,</span><br />
&nbsp; datecreated TIMESTAMP<span class="sy0">,</span><br />
&nbsp; history_creation TIMESTAMP<span class="sy0">,</span><br />
&nbsp; history_user INTEGER <span class="co1">-- A foreign key to the user who updated this record, if required</span><br />
<span class="br0">&#41;</span>;</div>
</div>
<p>You&#8217;ll then attach an update trigger to the client table that inserts all the data into your history table. This solution gives you the ability to keep track of everything that happens in your database, when it happened and who-done-it.<br />
What this solution doesn&#8217;t deal with is schema changes. Also, querying the state of a record at a point in time to join it with other tables can be a bit tricky.</p>
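<p>The trigger described above could look roughly like this. It is a minimal sketch that assumes the client and client_history tables defined earlier; the names client_history_trigger and client_history_update are made up for illustration, and history_user is left NULL because how you identify the current user depends on your application.</p>
<div class="codesnip-container" >
<div class="sql codesnip" style="font-family:monospace;">-- Hypothetical trigger function: copies the row as it was before the update<br />
CREATE OR REPLACE FUNCTION client_history_trigger() RETURNS TRIGGER AS $$<br />
BEGIN<br />
&nbsp; INSERT INTO client_history (id, firstname, lastname, datecreated, history_creation, history_user)<br />
&nbsp; VALUES (OLD.id, OLD.firstname, OLD.lastname, OLD.datecreated, now(), NULL);<br />
&nbsp; RETURN NEW; -- let the update go ahead unchanged<br />
END;<br />
$$ LANGUAGE plpgsql;<br />
<br />
-- Fire it once for every updated row<br />
CREATE TRIGGER client_history_update<br />
BEFORE UPDATE ON client<br />
FOR EACH ROW EXECUTE PROCEDURE client_history_trigger();</div>
</div>
<p>Storing OLD rather than NEW means each history row holds the values that were valid up to its history_creation time.</p>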
<p>Let&#8217;s start from the first point. <strong>Schema changes</strong>.<br />
The most obvious and simple solution is to put in place a proper process for all database schema changes. i.e. whoever touches the schema is also in charge of updating the history table structure and the trigger on the main table. Doable but a bit too prone to human error if you ask me.</p>
<p>What I generally do is use a small function/stored procedure to create the history on a table. This procedure is in charge of both creating the history table the first time it&#8217;s called and updating the structure if the main table has changed.<br />
In this example I&#8217;m going to use <a href="http://www.postgresql.org/" target="_blank">PostgreSQL</a>: first because it&#8217;s a database I&#8217;m familiar with and, more importantly, because it is open source, so you can just download it and try this yourself.</p>
<p>My example here is a bit verbose but it serves its purpose. There are many database-specific instructions you could use to extract a table structure or perform some of the simple tasks of this function. However, what I&#8217;m trying to demonstrate is the concept and not PostgreSQL trickery.<br />
Most relational databases store the schema information in internal tables (pg_attribute, pg_class and pg_namespace in this case) so that&#8217;s what we are going to use to read the structure of our original table.</p>
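<p>For comparison only, the SQL-standard information_schema exposes similar details without touching the pg_ catalogs. This is a small sketch that assumes the client table from the earlier example lives in the default public schema; it is not what the function below uses.</p>
<div class="codesnip-container" >
<div class="sql codesnip" style="font-family:monospace;">-- List the columns of client in their declared order<br />
SELECT column_name, data_type, character_maximum_length<br />
FROM information_schema.columns<br />
WHERE table_schema = 'public' -- assumption: default schema<br />
AND table_name = 'client'<br />
ORDER BY ordinal_position;</div>
</div>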
<div class="codesnip-container" >
<div class="sql codesnip" style="font-family:monospace;"><span class="co1">-- This function expects as parameters the name of the table you want to</span><br />
<span class="co1">-- create history for and a unique identifier for the record.</span><br />
<span class="co1">-- In my case all tables have an ID column. Alternatively you could also pass</span><br />
<span class="co1">-- the name of the unique id column as a parameter</span><br />
<span class="kw1">CREATE</span> <span class="kw1">OR</span> <span class="kw1">REPLACE</span> <span class="kw1">FUNCTION</span> create_history<span class="br0">&#40;</span>tablename VARCHAR<span class="br0">&#40;</span>255<span class="br0">&#41;</span><span class="sy0">,</span> recordid INTEGER<span class="br0">&#41;</span> RETURNS <span class="kw1">BOOLEAN</span> <span class="kw1">AS</span> $$<br />
DECLARE<br />
&nbsp; vhisttablename NAME; <span class="co1">-- history table name</span><br />
&nbsp; vhistfieldcount INTEGER; <span class="co1">-- number of fields in history table</span><br />
&nbsp; vtablefieldcount INTEGER; <span class="co1">-- number of fields in main table</span><br />
&nbsp; vtmprowcount INTEGER; <span class="co1">-- temporary variable to store query results</span><br />
&nbsp; vcurfield RECORD; <span class="co1">-- variable to loop over fields of main table to create history</span><br />
&nbsp; vhisttablesql TEXT; <span class="co1">-- sql to create history table</span><br />
&nbsp; vfieldlist TEXT; <span class="co1">-- list of fields in main table</span><br />
&nbsp; vhisttablefields TEXT; <span class="co1">-- list of fields in history table</span><br />
&nbsp; vhistinsertsql TEXT; <span class="co1">-- SQL for insert statement</span><br />
&nbsp; vtmptablename VARCHAR<span class="br0">&#40;</span>255<span class="br0">&#41;</span>; <span class="co1">-- temp variable to check if history exists</span><br />
BEGIN<br />
&nbsp; vhisttablename :<span class="sy0">=</span> tablename<span class="sy0">||</span><span class="st0">'_hist'</span>;</p>
<p>&nbsp; <span class="co1">-- Check if the history table exists, if not we need to create it</span><br />
&nbsp; <span class="kw1">SELECT</span> <span class="kw1">INTO</span> vtmptablename relname<br />
&nbsp; <span class="kw1">FROM</span> pg_class<br />
&nbsp; <span class="kw1">WHERE</span> relname <span class="sy0">=</span> vhisttablename;</p>
<p>&nbsp; vfieldlist :<span class="sy0">=</span> <span class="st0">''</span>;<br />
&nbsp; vhisttablefields :<span class="sy0">=</span> <span class="st0">''</span>;</p>
<p>&nbsp; <span class="co1">-- count the fields in the history/current table if history exists</span><br />
&nbsp; <span class="kw1">IF</span> vtmptablename <span class="kw1">IS</span> <span class="kw1">NOT</span> <span class="kw1">NULL</span><br />
&nbsp; THEN<br />
&nbsp; &nbsp; <span class="kw1">SELECT</span> <span class="kw1">INTO</span> vhistfieldcount count<span class="br0">&#40;</span><span class="sy0">*</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">FROM</span> pg_attribute <span class="kw1">AS</span> a<br />
&nbsp; &nbsp; <span class="kw1">INNER</span> <span class="kw1">JOIN</span> pg_class <span class="kw1">AS</span> c <span class="kw1">ON</span> <span class="br0">&#40;</span>c<span class="sy0">.</span>oid <span class="sy0">=</span> a<span class="sy0">.</span>attrelid<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">INNER</span> <span class="kw1">JOIN</span> pg_namespace <span class="kw1">AS</span> n <span class="kw1">ON</span> <span class="br0">&#40;</span>n<span class="sy0">.</span>oid <span class="sy0">=</span> c<span class="sy0">.</span>relnamespace<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">WHERE</span> a<span class="sy0">.</span>attnum <span class="sy0">&gt;</span> 0<br />
&nbsp; &nbsp; <span class="kw1">AND</span> c<span class="sy0">.</span>relname <span class="sy0">=</span> vhisttablename <span class="co1">-- history table name</span><br />
&nbsp; &nbsp; <span class="kw1">AND</span> <span class="kw1">NOT</span> a<span class="sy0">.</span>attisdropped<br />
&nbsp; &nbsp; <span class="kw1">AND</span> pg_table_is_visible<span class="br0">&#40;</span>c<span class="sy0">.</span>oid<span class="br0">&#41;</span>;</p>
<p>&nbsp; &nbsp; <span class="kw1">SELECT</span> <span class="kw1">INTO</span> vtablefieldcount count<span class="br0">&#40;</span><span class="sy0">*</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">FROM</span> pg_attribute <span class="kw1">AS</span> a<br />
&nbsp; &nbsp; <span class="kw1">INNER</span> <span class="kw1">JOIN</span> pg_class <span class="kw1">AS</span> c <span class="kw1">ON</span> <span class="br0">&#40;</span>c<span class="sy0">.</span>oid <span class="sy0">=</span> a<span class="sy0">.</span>attrelid<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">INNER</span> <span class="kw1">JOIN</span> pg_namespace <span class="kw1">AS</span> n <span class="kw1">ON</span> <span class="br0">&#40;</span>n<span class="sy0">.</span>oid <span class="sy0">=</span> c<span class="sy0">.</span>relnamespace<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">WHERE</span> a<span class="sy0">.</span>attnum <span class="sy0">&gt;</span> 0<br />
&nbsp; &nbsp; <span class="kw1">AND</span> c<span class="sy0">.</span>relname <span class="sy0">=</span> tablename <span class="co1">-- main table</span><br />
&nbsp; &nbsp; <span class="kw1">AND</span> <span class="kw1">NOT</span> a<span class="sy0">.</span>attisdropped<br />
&nbsp; &nbsp; <span class="kw1">AND</span> pg_table_is_visible<span class="br0">&#40;</span>c<span class="sy0">.</span>oid<span class="br0">&#41;</span>;<br />
&nbsp; END <span class="kw1">IF</span>;</p>
<p>&nbsp; <span class="co1">-- Get all the attributes and their type from the original table</span><br />
&nbsp; <span class="kw1">FOR</span> vcurfield <span class="kw1">IN</span><br />
&nbsp; <span class="kw1">SELECT</span> a<span class="sy0">.</span>attname <span class="kw1">AS</span> <span class="kw1">COLUMN</span><span class="sy0">,</span> format_type<span class="br0">&#40;</span>a<span class="sy0">.</span>atttypid<span class="sy0">,</span> a<span class="sy0">.</span>atttypmod<span class="br0">&#41;</span> <span class="kw1">AS</span> datatype<br />
&nbsp; <span class="kw1">FROM</span> pg_attribute <span class="kw1">AS</span> a<br />
&nbsp; <span class="kw1">INNER</span> <span class="kw1">JOIN</span> pg_class <span class="kw1">AS</span> c <span class="kw1">ON</span> <span class="br0">&#40;</span>c<span class="sy0">.</span>oid <span class="sy0">=</span> a<span class="sy0">.</span>attrelid<span class="br0">&#41;</span><br />
&nbsp; <span class="kw1">INNER</span> <span class="kw1">JOIN</span> pg_namespace <span class="kw1">AS</span> n <span class="kw1">ON</span> <span class="br0">&#40;</span>n<span class="sy0">.</span>oid <span class="sy0">=</span> c<span class="sy0">.</span>relnamespace<span class="br0">&#41;</span><br />
&nbsp; <span class="kw1">WHERE</span> a<span class="sy0">.</span>attnum <span class="sy0">&gt;</span> 0<br />
&nbsp; <span class="kw1">AND</span> c<span class="sy0">.</span>relname <span class="sy0">=</span> tablename<br />
&nbsp; <span class="kw1">AND</span> <span class="kw1">NOT</span> a<span class="sy0">.</span>attisdropped<br />
&nbsp; <span class="kw1">AND</span> pg_table_is_visible<span class="br0">&#40;</span>c<span class="sy0">.</span>oid<span class="br0">&#41;</span><br />
&nbsp; LOOP<br />
&nbsp; &nbsp; <span class="co1">-- populate lists of fields both for history creation and select from main table</span><br />
&nbsp; &nbsp; vhisttablefields :<span class="sy0">=</span> vhisttablefields<span class="sy0">||</span>vcurfield<span class="sy0">.</span><span class="kw1">COLUMN</span><span class="sy0">||</span><span class="st0">' '</span><span class="sy0">||</span>vcurfield<span class="sy0">.</span>datatype<span class="sy0">||</span><span class="st0">' NULL, '</span>;<br />
&nbsp; &nbsp; vfieldlist :<span class="sy0">=</span> vfieldlist<span class="sy0">||</span>vcurfield<span class="sy0">.</span><span class="kw1">COLUMN</span><span class="sy0">||</span><span class="st0">', '</span>;</p>
<p>&nbsp; &nbsp; <span class="co1">-- If the history table exists and the number of fields is different</span><br />
&nbsp; &nbsp; <span class="co1">-- from the main table (+2 as we add a history unique id and a timestamp)</span><br />
&nbsp; &nbsp; <span class="kw1">IF</span> vtmptablename <span class="kw1">IS</span> <span class="kw1">NOT</span> <span class="kw1">NULL</span> <span class="kw1">AND</span> vtablefieldcount<span class="sy0">+</span><span class="nu0">2</span> <span class="sy0">&lt;&gt;</span> vhistfieldcount<br />
&nbsp; &nbsp; THEN<br />
&nbsp; &nbsp; &nbsp; <span class="co1">-- make sure that this is the missing field in the history table</span><br />
&nbsp; &nbsp; &nbsp; &nbsp;PERFORM a<span class="sy0">.</span>attname<br />
&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">FROM</span> pg_attribute <span class="kw1">AS</span> a<br />
&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">INNER</span> <span class="kw1">JOIN</span> pg_class <span class="kw1">AS</span> c <span class="kw1">ON</span> <span class="br0">&#40;</span>c<span class="sy0">.</span>oid <span class="sy0">=</span> a<span class="sy0">.</span>attrelid<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">INNER</span> <span class="kw1">JOIN</span> pg_namespace <span class="kw1">AS</span> n <span class="kw1">ON</span> <span class="br0">&#40;</span>n<span class="sy0">.</span>oid <span class="sy0">=</span> c<span class="sy0">.</span>relnamespace<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">WHERE</span> a<span class="sy0">.</span>attnum <span class="sy0">&gt;</span> 0<br />
&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">AND</span> c<span class="sy0">.</span>relname <span class="sy0">=</span> vhisttablename<br />
&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">AND</span> <span class="kw1">NOT</span> a<span class="sy0">.</span>attisdropped<br />
&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">AND</span> a<span class="sy0">.</span>attname <span class="sy0">=</span> vcurfield<span class="sy0">.</span><span class="kw1">COLUMN</span><br />
&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">AND</span> pg_table_is_visible<span class="br0">&#40;</span>c<span class="sy0">.</span>oid<span class="br0">&#41;</span>;</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp;GET DIAGNOSTICS vtmprowcount <span class="sy0">=</span> ROW_COUNT;</p>
<p>&nbsp; &nbsp; &nbsp; <span class="co1">-- If it is then generate the SQL and execute the alter table</span><br />
&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">IF</span> vtmprowcount <span class="sy0">=</span> <span class="nu0">0</span><br />
&nbsp; &nbsp; &nbsp; &nbsp;THEN<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;vhisttablesql :<span class="sy0">=</span> <span class="st0">'ALTER TABLE '</span><span class="sy0">||</span>vhisttablename<span class="sy0">||</span><span class="st0">' ADD COLUMN '</span><span class="sy0">||</span>vcurfield<span class="sy0">.</span><span class="kw1">COLUMN</span><span class="sy0">||</span><span class="st0">' '</span><span class="sy0">||</span>vcurfield<span class="sy0">.</span>datatype<span class="sy0">||</span><span class="st0">' NULL;'</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;EXECUTE vhisttablesql;<br />
&nbsp; &nbsp; &nbsp; &nbsp;END <span class="kw1">IF</span>;<br />
&nbsp; &nbsp; &nbsp;END <span class="kw1">IF</span>;<br />
&nbsp; END LOOP;</p>
<p>&nbsp; <span class="co1">-- The history table doesn't exist. Create it</span><br />
&nbsp; <span class="kw1">IF</span> vtmptablename <span class="kw1">IS</span> <span class="kw1">NULL</span> <span class="kw1">OR</span> vtmptablename <span class="sy0">=</span> <span class="st0">''</span><br />
&nbsp; THEN<br />
&nbsp; &nbsp; vhisttablesql :<span class="sy0">=</span> <span class="st0">'CREATE TABLE '</span><span class="sy0">||</span>vhisttablename<span class="sy0">||</span><span class="st0">' ( histid SERIAL PRIMARY KEY, '</span>;</p>
<p>&nbsp; &nbsp; vhisttablesql :<span class="sy0">=</span> vhisttablesql<span class="sy0">||</span>vhisttablefields<span class="sy0">||</span><span class="st0">' history_creation TIMESTAMP NOT NULL DEFAULT now() );'</span>;</p>
<p>&nbsp; &nbsp; EXECUTE vhisttablesql;</p>
<p>&nbsp; &nbsp; <span class="co1">-- create indexes</span><br />
&nbsp; &nbsp; vhisttablesql :<span class="sy0">=</span> <span class="st0">' CREATE INDEX idx_'</span><span class="sy0">||</span>vhisttablename<span class="sy0">||</span><span class="st0">'_1 ON '</span><span class="sy0">||</span>vhisttablename<span class="sy0">||</span><span class="st0">' (id); '</span>;<br />
&nbsp; &nbsp; EXECUTE vhisttablesql;</p>
<p>&nbsp; &nbsp; vhisttablesql :<span class="sy0">=</span> <span class="st0">' CREATE INDEX idx_'</span><span class="sy0">||</span>vhisttablename<span class="sy0">||</span><span class="st0">'_2 ON '</span><span class="sy0">||</span>vhisttablename<span class="sy0">||</span><span class="st0">' (history_creation); '</span>;<br />
&nbsp; &nbsp; EXECUTE vhisttablesql;<br />
&nbsp; END <span class="kw1">IF</span>;</p>
<p>&nbsp; <span class="co1">-- Proceed with the history creation</span><br />
&nbsp; vhistinsertsql :<span class="sy0">=</span> <span class="st0">'INSERT INTO '</span><span class="sy0">||</span>vhisttablename<span class="sy0">||</span><span class="st0">' ('</span><span class="sy0">||</span>vfieldlist<span class="sy0">||</span><span class="st0">' history_creation) SELECT '</span><span class="sy0">||</span>vfieldlist<span class="sy0">||</span><span class="st0">' now() FROM '</span><span class="sy0">||</span>tablename<span class="sy0">||</span><span class="st0">' WHERE id='</span><span class="sy0">||</span>recordid<span class="sy0">||</span><span class="st0">';'</span>;</p>
<p>&nbsp; RAISE NOTICE <span class="st0">'Executing %'</span><span class="sy0">,</span> vhistinsertsql;<br />
&nbsp; EXECUTE vhistinsertsql;</p>
<p>&nbsp; GET DIAGNOSTICS vtmprowcount <span class="sy0">=</span> ROW_COUNT;<br />
&nbsp; <span class="kw1">IF</span> vtmprowcount <span class="sy0">&gt;</span> 0<br />
&nbsp; THEN<br />
&nbsp; &nbsp; <span class="kw1">RETURN</span> TRUE;<br />
&nbsp; ELSE<br />
&nbsp; &nbsp; <span class="kw1">RETURN</span> FALSE;<br />
&nbsp; END <span class="kw1">IF</span>;<br />
END;<br />
$$<br />
<span class="kw1">LANGUAGE</span> plpgsql;</div>
</div>
<p>You&#8217;ll notice this function only adds fields to the history table. That&#8217;s because even if we remove a field from the original table we still want to keep track of it in our history.</p>
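<p>To call create_history automatically, one option is a small generic trigger function. This is a sketch rather than part of the setup above: the names hist_on_update and client_hist_update are hypothetical, and it assumes the table has an integer id column, as create_history itself does.</p>
<div class="codesnip-container" >
<div class="sql codesnip" style="font-family:monospace;">-- Hypothetical wrapper so any table can share the same trigger function<br />
CREATE OR REPLACE FUNCTION hist_on_update() RETURNS TRIGGER AS $$<br />
BEGIN<br />
&nbsp; -- TG_TABLE_NAME is the table that fired the trigger. In a BEFORE UPDATE<br />
&nbsp; -- trigger the table still holds the old row, so that is what gets copied<br />
&nbsp; PERFORM create_history(TG_TABLE_NAME::VARCHAR, OLD.id);<br />
&nbsp; RETURN NEW;<br />
END;<br />
$$ LANGUAGE plpgsql;<br />
<br />
CREATE TRIGGER client_hist_update<br />
BEFORE UPDATE ON client<br />
FOR EACH ROW EXECUTE PROCEDURE hist_on_update();</div>
</div>
<p>Because the trigger fires before the update, the row copied into client_hist is the state that was valid up to that moment.</p>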
<p>Now on to <strong>querying the history to retrieve the state of a record at a specific date</strong>.</p>
<p>This is a bit tricky. At the moment, as far as I know, only Oracle has the ability to return a &#8220;virtual table&#8221; from a function. PostgreSQL can return a RECORD variable (which is great to use inside functions, but once outside it loses its structure and turns into a comma-separated list of values) or a %ROWTYPE you define.<br />
This unfortunately means that you&#8217;d have to create a custom function for each history table that returns a SETOF client, in our case.</p>
<p>What we are trying to achieve is something like what this query does.</p>
<div class="codesnip-container" >
<div class="sql codesnip" style="font-family:monospace;"><span class="kw1">SELECT</span> h<span class="sy0">.</span>id<span class="sy0">,</span> firstname<span class="sy0">,</span> lastname<span class="sy0">,</span> datecreated<span class="sy0">,</span> history_creation<br />
<span class="kw1">FROM</span> client_hist <span class="kw1">AS</span> h<br />
<span class="kw1">INNER</span> <span class="kw1">JOIN</span> <span class="br0">&#40;</span><br />
&nbsp; <span class="kw1">SELECT</span> min<span class="br0">&#40;</span>history_creation<span class="br0">&#41;</span> <span class="kw1">AS</span> mintstamp<span class="sy0">,</span> id <span class="kw1">AS</span> id<br />
&nbsp; <span class="kw1">FROM</span> client_hist<br />
&nbsp; <span class="kw1">WHERE</span> history_creation <span class="sy0">&gt;</span> THE<span class="sy0">-</span>TIME<span class="sy0">-</span>YOU<span class="sy0">-</span>NEED<br />
&nbsp; <span class="kw1">AND</span> id<span class="sy0">=</span> YOUR<span class="sy0">-</span>RECORD<span class="sy0">-</span>ID<br />
&nbsp; <span class="kw1">GROUP</span> <span class="kw1">BY</span> id<br />
<span class="br0">&#41;</span> <span class="kw1">AS</span> m <span class="kw1">ON</span> <span class="br0">&#40;</span>m<span class="sy0">.</span>id <span class="sy0">=</span> h<span class="sy0">.</span>id <span class="kw1">AND</span> m<span class="sy0">.</span>mintstamp <span class="sy0">=</span> h<span class="sy0">.</span>history_creation<span class="br0">&#41;</span>;</div>
</div>
<p>Basically the next record in the history after the time specified.</p>
<p>At this point, so far as I can see, we have two options: either create a function that returns a specific type of ROW, or write a more generic history function that returns only the id of the history record we need and not the whole row (if you look at the history creation function, we are putting a histid column in there).</p>
<p>You could use this function in your queries to get the history record id out like this.</p>
<div class="codesnip-container" >
<div class="sql codesnip" style="font-family:monospace;"><span class="co1">-- This function returns the histid you need to look at in your history table</span><br />
<span class="kw1">CREATE</span> <span class="kw1">OR</span> <span class="kw1">REPLACE</span> <span class="kw1">FUNCTION</span> select_hist_id<span class="br0">&#40;</span>tablename VARCHAR<span class="br0">&#40;</span>255<span class="br0">&#41;</span><span class="sy0">,</span> recordid INTEGER<span class="sy0">,</span> tstamp TIMESTAMP<span class="br0">&#41;</span> RETURNS INTEGER <span class="kw1">AS</span> $$<br />
DECLARE<br />
&nbsp; curs REFCURSOR;<br />
&nbsp; vid INTEGER;<br />
BEGIN<br />
&nbsp; OPEN curs <span class="kw1">FOR</span> EXECUTE <span class="st0">'SELECT h.histid<br />
&nbsp; FROM '</span><span class="sy0">||</span>tablename<span class="sy0">||</span><span class="st0">'_hist AS h <br />
&nbsp; INNER JOIN (<br />
&nbsp; &nbsp; SELECT min(history_creation) AS mintstamp, id <br />
&nbsp; &nbsp; FROM '</span><span class="sy0">||</span>tablename<span class="sy0">||</span><span class="st0">'_hist<br />
&nbsp; &nbsp; WHERE history_creation &gt; '''</span><span class="sy0">||</span>tstamp<span class="sy0">||</span><span class="st0">'''<br />
&nbsp; &nbsp; AND id='</span><span class="sy0">||</span>recordid<span class="sy0">||</span><span class="st0">'<br />
&nbsp; &nbsp; GROUP BY id<br />
&nbsp; ) AS m ON (m.id = h.id AND m.mintstamp = h.history_creation);'</span>;</p>
<p>&nbsp; FETCH curs <span class="kw1">INTO</span> vid;</p>
<p>&nbsp; <span class="kw1">RETURN</span> vid;<br />
END;<br />
$$<br />
<span class="kw1">LANGUAGE</span> plpgsql;</p>
<p><span class="co1">-- At this point you can just get ids like this</span><br />
<span class="co1">-- records in client as of this time 2009-11-02 21:13:41.552601</span><br />
<span class="kw1">SELECT</span> select_hist_id<span class="br0">&#40;</span><span class="st0">'client'</span><span class="sy0">,</span> id<span class="sy0">,</span> <span class="st0">'2009-11-02 21:13:41.552601'</span>::TIMESTAMP<span class="br0">&#41;</span> <span class="kw1">FROM</span> client;</div>
</div>
<p>I&#8217;m sure you can figure out the rest.</p>
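<p>In case it helps, here is one way the rest might look: a sketch that joins the returned ids back to the history table to get the full rows. It assumes the client_hist table and the select_hist_id function from above; the alias ids is just an illustrative name.</p>
<div class="codesnip-container" >
<div class="sql codesnip" style="font-family:monospace;">-- Full history rows for every client as of the given time<br />
SELECT h.*<br />
FROM (<br />
&nbsp; SELECT select_hist_id('client', id, '2009-11-02 21:13:41.552601'::TIMESTAMP) AS histid<br />
&nbsp; FROM client<br />
) AS ids<br />
INNER JOIN client_hist AS h ON (h.histid = ids.histid);</div>
</div>
<p>Records with no history row after that time get a NULL histid and drop out of the join. For those, the current row in client is still the state you are after.</p>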
<p>Be careful: this is very slow on large data sets. If you are planning to work on millions of records, you should consider building a history lookup function for each table, either defining data types for your RECORD or using OUT variables.</p>
<div class="codesnip-container" >
<div class="sql codesnip" style="font-family:monospace;"><span class="kw1">CREATE</span> <span class="kw1">FUNCTION</span> foo<span class="br0">&#40;</span>recordid int<span class="sy0">,</span> firstname OUT VARCHAR<span class="br0">&#40;</span><span class="nu0">10</span><span class="br0">&#41;</span><span class="sy0">&#8230;</span><span class="br0">&#41;</span></div>
</div>
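<p>As an illustration of the OUT variables approach, here is a sketch of a lookup function for the client table only. The name client_at is hypothetical, and it uses ORDER BY with LIMIT 1 instead of the min() join shown earlier to pick the first history row after the requested time.</p>
<div class="codesnip-container" >
<div class="sql codesnip" style="font-family:monospace;">-- Hypothetical per-table lookup: state of one client record at a given time<br />
CREATE OR REPLACE FUNCTION client_at(<br />
&nbsp; recordid INTEGER,<br />
&nbsp; tstamp TIMESTAMP,<br />
&nbsp; OUT firstname VARCHAR(255),<br />
&nbsp; OUT lastname VARCHAR(255),<br />
&nbsp; OUT datecreated TIMESTAMP<br />
) AS $$<br />
&nbsp; SELECT h.firstname, h.lastname, h.datecreated<br />
&nbsp; FROM client_hist AS h<br />
&nbsp; WHERE h.id = $1 -- recordid<br />
&nbsp; AND h.history_creation &gt; $2 -- tstamp<br />
&nbsp; ORDER BY h.history_creation<br />
&nbsp; LIMIT 1;<br />
$$ LANGUAGE sql STABLE;<br />
<br />
-- Usage<br />
SELECT * FROM client_at(1, '2009-11-02 21:13:41.552601'::TIMESTAMP);</div>
</div>
<p>The two existing indexes on id and history_creation are what this query filters and sorts on. If no history row exists after the given time, the function returns NULLs in every column.</p>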

<!-- Social Bookmarks BEGIN -->
<div class="social_bookmark">
<div class="d">
<a onclick="window.open(this.href, '_blank', 'scrollbars=yes,menubar=no,height=600,width=750,resizable=yes,toolbar=no,location=no,status=no'); return false;" href="http://myweb2.search.yahoo.com/myresults/bookmarklet?u=http%3A%2F%2Fsapessi.com%2F2009%2F11%2Ftracking-database-records-history%2F&amp;t=Tracking+database+records+history" rel="nofollow" title="Add to&nbsp;Yahoo My Web"><img class="social_img" src="http://sapessi.com/wp-content/plugins/social-bookmarks/images/yahoo.png" title="Add to&nbsp;Yahoo My Web" alt="Add to&nbsp;Yahoo My Web" /></a>
<br />
</div>
</div>
<!-- Social Bookmarks END -->
]]></content:encoded>
			<wfw:commentRss>http://sapessi.com/2009/11/tracking-database-records-history/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Keeping on top of your data</title>
		<link>http://sapessi.com/2009/11/keeping-on-top-of-your-data/</link>
		<comments>http://sapessi.com/2009/11/keeping-on-top-of-your-data/#comments</comments>
		<pubDate>Sun, 01 Nov 2009 14:58:20 +0000</pubDate>
		<dc:creator>Stefano Buliani</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Analyze]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Explain]]></category>
		<category><![CDATA[Optimisation]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Query]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Vacuum]]></category>

		<guid isPermaLink="false">http://sapessi.com/?p=276</guid>
		<description><![CDATA[Not just because it&#8217;s vital for your business. For most production systems the speed bottleneck lies in accessing your data. Databases are excellent for storing all your data and keeping it organised. However, when it comes to getting it out quickly, especially if you have lots of it, they are not the sharpest of tools. That&#8217;s [...]]]></description>
			<content:encoded><![CDATA[
<p>Not just because it&#8217;s vital for your business.<br />
For most production systems the speed bottleneck lies in accessing your data. Databases are excellent for storing all your data and keeping it organised. However, when it comes to getting it out quickly, especially if you have lots of it, they are not the sharpest of tools.</p>
<p>That&#8217;s why I always try to press home the importance of keeping on top of your data.</p>
<p>Even though you think all your data is neatly organised in your perfectly structured database, it keeps changing shape, or rather the understanding your database has of your data keeps changing.<br />
For example in <a href="http://www.postgresql.org" target="_blank">PostgreSQL</a> the &#8220;shape&#8221; of your data is stored in a system catalog called pg_statistic. Data is collected and stored there by <a href="http://www.postgresql.org/docs/8.1/static/sql-analyze.html" target="_blank">ANALYZE</a>.</p>
<p>The query planner uses the data collected in pg_statistic to pick the most efficient way to run your queries. Unfortunately no system is perfect, and even the best planner makes mistakes. You have to strike a balance between letting ANALYZE collect as much data as possible, to give your database a better understanding of your data, and keeping it slim enough for it to be quick.</p>
<p>So as much as you can trust machines I suggest you try to keep on top of your data yourself.</p>
<p>There are a couple of very simple ways to do that.</p>
<p>First, keep an eye on your database logs, exactly like you do with your webserver logs. There are a few open-source applications that can help you do that, like <a href="http://epqa.sourceforge.net/" target="_blank">Enterprise Postgres Query Analyser</a>.<br />
This will give you a basic understanding of which queries you will have to focus on.</p>
<p>Once you have an idea of which ones are the slowest queries you should keep an eye on the explain plan at regular intervals. In PostgreSQL I have a scheduled job that every day runs &#8220;EXPLAIN ANALYZE&#8221; on the heaviest queries in my system and compares the output with the previous day&#8217;s.</p>
<p>It&#8217;s a lot of work, and you are probably better off confronting these problems as they come up rather than wasting valuable development time creating the most optimised database ever.<br />
Setting up <a href="http://www.postgresql.org/docs/current/static/routine-vacuuming.html#AUTOVACUUM" target="_blank">auto-vacuuming</a> properly will keep you safe for a long while.</p>
]]></content:encoded>
			<wfw:commentRss>http://sapessi.com/2009/11/keeping-on-top-of-your-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Recursive queries are evil</title>
		<link>http://sapessi.com/2009/10/recursive-queries-are-evil/</link>
		<comments>http://sapessi.com/2009/10/recursive-queries-are-evil/#comments</comments>
		<pubDate>Fri, 30 Oct 2009 14:17:44 +0000</pubDate>
		<dc:creator>Stefano Buliani</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Interview]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Query]]></category>
		<category><![CDATA[Recursive]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Structure]]></category>
		<category><![CDATA[Tree]]></category>

		<guid isPermaLink="false">http://sapessi.com/?p=247</guid>
		<description><![CDATA[Maybe that&#8217;s a bit too harsh, maybe recursive queries are not evil, it&#8217;s just the people who use them. I spend quite a lot of time working with PostgreSQL users helping them optimise their queries. When I read that PostgreSQL 8.4 added support for recursive queries I knew that a whole new hellish chapter in [...]]]></description>
			<content:encoded><![CDATA[
<p>Maybe that&#8217;s a bit too harsh, maybe recursive queries are not evil, it&#8217;s just the people who use them.</p>
<p><span style="background-color: #ffffff;"><a href="http://www.postgresql.org"><img class="alignleft size-full wp-image-250" title="PostgreSQL Logo" src="http://sapessi.com/wp-content/uploads/2009/10/postgresql.png" alt="PostgreSQL Logo" width="150" height="150" /></a>I spend quite a lot of time working with <a href="http://www.postgresql.org" target="_blank">PostgreSQL</a> users helping them optimise their queries. When I read that <a href="http://www.postgresql.org/docs/8.4/static/release-8-4-1.html" target="_blank">PostgreSQL 8.4 added support for recursive queries</a> I knew that a whole new hellish chapter in my life would begin.</span></p>
<p><span style="background-color: #ffffff;">First off, what are recursive queries? From the <a href="http://www.postgresql.org/docs/8.4/static/queries-with.html" target="_blank">PostgreSQL manual</a>:</span></p>
<blockquote><p>Recursive queries are typically used to deal with hierarchical or tree-structured data. A useful example is this query to find all the direct and indirect sub-parts of a product, given only a table that shows immediate inclusions:</p>
<p><span style="background-color: #ffffff;">WITH RECURSIVE included_parts(sub_part, part, quantity) AS (<br />
SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'<br />
UNION ALL<br />
SELECT p.sub_part, p.part, p.quantity<br />
FROM included_parts pr, parts p<br />
WHERE p.part = pr.sub_part<br />
)<br />
SELECT sub_part, SUM(quantity) as total_quantity<br />
FROM included_parts<br />
GROUP BY sub_part</span></p></blockquote>
<p><span style="background-color: #ffffff;">These structures are commonly used in relational databases. Just think about a threaded comment system for a blog or an industry classification for securities on multiple levels (financial data is what I&#8217;m most familiar with).</span></p>
<p><span style="background-color: #ffffff;">In this latter case you can imagine that you&#8217;ll hardly ever extract industry classification information by itself. It&#8217;s generally used as a sub-query to provide additional information about a security, a trade or what have you.</span></p>
<p>As I said earlier I have nothing against recursive queries per se. However, I can already see people out there creating monster-queries in production systems. The sort of monster query that needs to be executed 50 times a second, the one that just doesn&#8217;t work.</p>
<p><span style="background-color: #ffffff;">Storing and retrieving tree-structured data in SQL is one of my favourite questions in interviews. I always make a point of asking it. Not because it&#8217;s particularly challenging technically but because it will tell me a lot about the way the person I&#8217;m interviewing thinks about data.</span></p>
<p><span style="background-color: #ffffff;">The first part of the question is obviously to design a structure to hold threaded blog comments.</span></p>
<p><span style="background-color: #ffffff;">Whether you use a separate table to hold the relationship between nodes or a self-referencing parent id column in the same table I don&#8217;t really care. So long as you come up with an answer we can move on with the interview, because the answer to the next part of the question is what interests me.</span></p>
<p><span style="background-color: #ffffff;">I will now call your blog page with the ID from an element in your structure, any element. I want you to return instantly the ID of the root element for that branch of the tree.</span></p>
<pre>- Root comment 1
   - Child 1.1
   - Child 1.2
      - Child 1.2.1
   - Child 1.3
- Root comment 2
   - Child 2.1
      - Child 2.1.1
         - Child 2.1.1.1
   - Child 2.2</pre>
<p><span style="background-color: #ffffff;">I will call you with 2.1.1.1 and I want you to tell me 2, instantly. Feel free to change your database structure.</span></p>
<p><span style="background-color: #ffffff;">Their answer to this will tell me how they feel about de-normalisation and if they can think in those terms. We are talking about the daft requirements written by a product person who&#8217;s clearly gone quite mad. All he cares about is getting the data out quickly, nothing else.</span></p>
<p><span style="background-color: #ffffff;">The easiest de-normalised way out is to add a root id column to each comment row. It will make inserting new comments slower but it won&#8217;t require any recursion to go back to the top when selecting data.</span></p>
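<p>As a rough sketch of that trade-off, here is what it could look like. To keep it self-contained it uses Python with an in-memory SQLite database as a stand-in for PostgreSQL, and the table, column and function names (<code>comments</code>, <code>root_id</code>, <code>add_comment</code>, <code>root_of</code>) are assumptions made up for the example. The insert pays for one extra read of the parent row; the lookup becomes a single primary-key read. The recursive version is included only to show that both give the same answer.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE comments (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES comments(id),
        root_id   INTEGER,  -- de-normalised: the top-level ancestor of this row
        body      TEXT NOT NULL
    )
""")


def add_comment(body, parent_id=None):
    """Insert a comment and copy root_id from its parent (one extra read)."""
    cur = conn.execute(
        "INSERT INTO comments (parent_id, body) VALUES (?, ?)", (parent_id, body))
    new_id = cur.lastrowid
    if parent_id is None:
        root_id = new_id  # a top-level comment is its own root
    else:
        root_id = conn.execute(
            "SELECT root_id FROM comments WHERE id = ?", (parent_id,)).fetchone()[0]
    conn.execute("UPDATE comments SET root_id = ? WHERE id = ?", (root_id, new_id))
    return new_id


def root_of(comment_id):
    """Single primary-key lookup, no recursion."""
    return conn.execute(
        "SELECT root_id FROM comments WHERE id = ?", (comment_id,)).fetchone()[0]


def root_of_recursive(comment_id):
    """The same answer, found by walking parent_id upwards with a recursive query."""
    return conn.execute("""
        WITH RECURSIVE ancestors(id, parent_id) AS (
            SELECT id, parent_id FROM comments WHERE id = ?
            UNION ALL
            SELECT c.id, c.parent_id
            FROM comments c JOIN ancestors a ON c.id = a.parent_id
        )
        SELECT id FROM ancestors WHERE parent_id IS NULL
    """, (comment_id,)).fetchone()[0]


# Part of the tree from the example above.
r1 = add_comment("Root comment 1")
c12 = add_comment("Child 1.2", r1)
c121 = add_comment("Child 1.2.1", c12)
r2 = add_comment("Root comment 2")
c21 = add_comment("Child 2.1", r2)
c211 = add_comment("Child 2.1.1", c21)
c2111 = add_comment("Child 2.1.1.1", c211)

print(root_of(c2111) == r2, root_of_recursive(c2111) == r2)
```

<p>The cost of the recursive version grows with the depth of the thread, while <code>root_of</code> stays a single indexed read however deep the comment sits.</p>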
<p>If all you can come up with is a recursive query I&#8217;ll be sorely disappointed. It&#8217;s cool and elegant but not nearly efficient enough for a high-availability production system.</p>
<p><span style="background-color: #ffffff;">Feel free to talk about recursive queries when I ask you this question, just remember to put the magic words &#8220;materialized view&#8221; in front of them. Then we can talk.</span></p>
<p><span style="background-color: #ffffff;"><strong>Let this be a warning to you. If I find a non-materialized/cached recursive query in your production code I will recursively kick you in the head.</strong></span></p>
]]></content:encoded>
			<wfw:commentRss>http://sapessi.com/2009/10/recursive-queries-are-evil/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>caching time series &#8211; continued</title>
		<link>http://sapessi.com/2009/10/caching-time-series-continued/</link>
		<comments>http://sapessi.com/2009/10/caching-time-series-continued/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 10:52:50 +0000</pubDate>
		<dc:creator>Stefano Buliani</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[Index]]></category>
		<category><![CDATA[Memory]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Series]]></category>
		<category><![CDATA[Time]]></category>

		<guid isPermaLink="false">http://sapessi.com/?p=211</guid>
		<description><![CDATA[Last week I wrote a rather long post asking for suggestions on how to cache and access a time series stored in an array in memory. After some more research I have decided to go with the solution proposed by Uzair in his comment to the original post, with some minor tweaks. My original idea involved creating a map [...]]]></description>
			<content:encoded><![CDATA[
<p>Last week I wrote a <a href="http://sapessi.com/2009/10/caching-time-series/" target="_blank">rather long post</a> asking for suggestions on how to cache and access a time series stored in an array in memory.</p>
<p>After some more research I have decided to go with the solution proposed by <a href="http://sapessi.com/2009/10/caching-time-series/#comments" target="_blank">Uzair in his comment</a> to the original post, with some minor tweaks.</p>
<p>My original idea involved creating a map in memory with one entry for each point in my series and then looking up the requested item in there.<br />
This is very efficient when requests come in for a time that is in my index. If the point is not in there I&#8217;d have to run a binary search on the whole index, which, as Anon pointed out in his comment, is still very fast.</p>
<blockquote><p>if your ordered index is in-memory.</p>
<p>even if you had 1TB of memory (quite a lot), and each memory access took you 1 microsecond (slow),</p>
<p>each search would only take worse case of 40 accesses (2^40 = 1 trillion)<br />
which would be 40 microseconds. this is not very long at all.</p>
<p>assuming some more reasonable ball-park figures of 4GB of memory, and<br />
200 nanosecond access times, gives you</p>
<p>32 accesses * 200 nanos = 6 microseconds</p>
<p>this is over 150,000 searches per second.</p></blockquote>
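<p>Anon&#8217;s figures are easy to check. The snippet below is just that arithmetic in Python, under the same simplifying assumption the comment makes (every byte of memory counted as one index entry); the function names are mine. A binary search halves the range at every step, so the worst case is log2 of the number of entries.</p>

```python
import math


def worst_case_accesses(entries):
    """Binary search halves the range each step: ceil(log2(n)) probes."""
    return math.ceil(math.log2(entries))


def searches_per_second(entries, seconds_per_access):
    """Searches per second if every probe costs one memory access."""
    return 1.0 / (worst_case_accesses(entries) * seconds_per_access)


ONE_TB = 2 ** 40   # one entry per byte, as in the comment
FOUR_GB = 2 ** 32

print(worst_case_accesses(ONE_TB))                   # 40
print(worst_case_accesses(FOUR_GB))                  # 32
print(round(searches_per_second(FOUR_GB, 200e-9)))   # 156250
```

<p>So 32 accesses at 200 nanoseconds each come to 6.4 microseconds per search, which is where the &#8220;over 150,000 searches per second&#8221; comes from.</p>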
<p>The index will be a map of maps. Basically I&#8217;ll distribute a number of entries in my index at regular intervals. Each regular entry will contain a map of the points that precede it in my time series.<br />
Since I will know the first and last points in the series and the spacing <em>z</em> of my regular entries, when a request <em>x</em> comes in I will be able to work out where my next regular index entry <em>y</em> is:</p>
<p>y = x &#8211; mod(x, z) + z</p>
<p>Once I have retrieved the map contained in my regular entry in the index I can proceed to look up the requested value in there. If it&#8217;s not a point in my map I will still have to do a binary search, but on a much smaller dataset &#8211; which I should be able to work on locally without having to rely on memcached over the network. The only &#8220;downside&#8221; of this solution is that I&#8217;ll have to make at least 2 lookups in my index.</p>
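<p>To make the idea concrete, here is a minimal in-memory sketch in Python. It is not the final implementation: plain dictionaries stand in for memcached, times are assumed to be non-negative integers, and I assume that a request falling between two points should return the latest earlier point in the same bucket. The names <code>build_index</code> and <code>lookup</code> are made up for the example.</p>

```python
import bisect


def build_index(series, z):
    """Group (time, value) points under the regular entry y that follows them."""
    index = {}
    for t, value in sorted(series):
        y = t - t % z + z  # next regular entry after t
        bucket = index.setdefault(y, {"times": [], "values": {}})
        bucket["times"].append(t)  # stays sorted because the series is sorted
        bucket["values"][t] = value
    return index


def lookup(index, z, x):
    """Return the value at time x, or at the latest earlier time in x's bucket."""
    y = x - x % z + z  # first lookup: find the regular entry
    bucket = index.get(y)
    if bucket is None:
        return None
    if x in bucket["values"]:  # second lookup: exact hit inside the bucket
        return bucket["values"][x]
    # No exact hit: binary search, but only over this bucket's few points.
    pos = bisect.bisect_right(bucket["times"], x)
    if pos == 0:
        return None  # nothing at or before x inside this bucket
    return bucket["values"][bucket["times"][pos - 1]]


series = [(0, 1.0), (7, 1.5), (12, 2.0), (25, 3.0)]
index = build_index(series, 10)
print(lookup(index, 10, 7))   # exact hit
print(lookup(index, 10, 9))   # falls back to the point at time 7
```

<p>A real version would also have to decide what to do when the bucket holds nothing at or before the requested time, for example by falling back to the previous regular entry.</p>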
]]></content:encoded>
			<wfw:commentRss>http://sapessi.com/2009/10/caching-time-series-continued/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Caching time series</title>
		<link>http://sapessi.com/2009/10/caching-time-series/</link>
		<comments>http://sapessi.com/2009/10/caching-time-series/#comments</comments>
		<pubDate>Sat, 24 Oct 2009 12:31:49 +0000</pubDate>
		<dc:creator>Stefano Buliani</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Array]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Series]]></category>
		<category><![CDATA[Time]]></category>

		<guid isPermaLink="false">http://sapessi.com/?p=198</guid>
		<description><![CDATA[This is a problem I&#8217;ve been trying to deal with for a while now. In my case I&#8217;m talking about financial data. Most of it is &#8211; one way or another &#8211; a time series. Take for example transactions. A position management system needs to be able to store millions of transactions for a [...]]]></description>
			<content:encoded><![CDATA[
<p>This is something I&#8217;ve been thinking about for a while. How can I cache a time series and have nearly instant access to any data point within it without having to splash out on an expensive time series database? I have a half-baked solution and decided to write this post to see if anybody out there has worked on a similar problem and has a better solution to share.</p>
<p>Specifically, I&#8217;m trying to work with financial data. Take transactions, for example. A position management system needs to be able to store millions of transactions for a number of portfolios. Storing the data is not a problem for any relational database &#8211; the problem is querying it.</p>
<p>Financial data is, by its very nature, relational. This means, as far as I&#8217;m concerned, that you&#8217;d always want to store your raw data in a relational model without de-normalizing it.</p>
<p>Now this is all well and good, but when we need to access the transactions for a specific portfolio between the 1st of January 1999 and the 31st of December 2008, with all the related security information and prices, relational databases start to struggle. Of course it is doable, but if you can get the data out first, the CPU cycles used by your software to do the computation are less expensive than your database&#8217;s.</p>
<p>What I want to be able to do is build a cache of the flattened (de-normalized) data and make it available to n clients at the same time.</p>
<p>The obvious solution is to build a massive time-ordered array of all the transactions and store it in memory. Considering the potential size of the array, something like <a title="memcached" href="http://www.danga.com/memcached/" target="_blank">memcached</a> would be more appropriate.<br />
Here comes the second problem. Assume a client calls your software with a couple of dates: you will be forced to loop over the whole array to find the chunk you need. Again, doable, but inefficient.</p>
<p>An array is definitely the way to go to store the data. What I need is an &#8220;index&#8221; which will tell me exactly where a specific transaction is stored in the array.</p>
<p>Here&#8217;s a possible solution I&#8217;ve been toying with today.</p>
<p>The idea comes from a technique used in videogames called voxel-based rendering with ray casting. I&#8217;ll try and briefly explain what that is (at least as far as my idea is concerned):</p>
<p>Say you are trying to render a landscape. Voxel rendering makes this fairly simple and is achieved column by column. For each column, a maximum Y value is stored (in a Y-Buffer). The pixels of the column in question are filled in starting at the bottom of the screen, and only if their Y value is greater than the one stored in the Y-Buffer. This rendering method makes the process of removing non-visible parts of the scene very efficient and very easy.</p>
<p><img class="aligncenter size-full wp-image-199" title="rendering ray casting voxels" src="http://sapessi.com/wp-content/uploads/2009/10/rendering-ray-casting-voxels5-Q-224414-13.jpg" alt="rendering ray casting voxels" width="248" height="338" /></p>
<p>While creating the array we should also create a map (our Y-Buffer). The map will simply contain each transaction&#8217;s unique id as key and its position in the array as value.</p>
<p>This way when the client calls our APIs with the last unique identifier it has received we&#8217;ll just have to make a very quick lookup in our Y-Buffer for the array position and this will point us exactly to what we need to return.</p>
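<p>As a quick illustration, the id-to-position map could look something like the Python sketch below. The dictionary layout of a transaction and the function names are assumptions made up for this example, not part of any real API.</p>

```python
def build_position_index(transactions):
    """Map each transaction's unique id to its position in the array."""
    return {tx["id"]: pos for pos, tx in enumerate(transactions)}


def newer_than(transactions, position_index, last_seen_id):
    """Everything that comes after the transaction the client already has."""
    start = position_index[last_seen_id] + 1   # one dictionary lookup, no loop
    # Note: slicing builds a new list, but it only copies references to the
    # transaction objects, not the transactions themselves.
    return transactions[start:]


# Made-up sample data, already ordered by time.
transactions = [
    {"id": "t1", "amount": 100},
    {"id": "t2", "amount": -40},
    {"id": "t3", "amount": 250},
]
position_index = build_position_index(transactions)
latest = newer_than(transactions, position_index, "t1")
```

<p>A client that last saw <code>t1</code> gets back <code>t2</code> and <code>t3</code>; a client that is already up to date gets an empty list.</p>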
<p>This could work, but it&#8217;s only a partial solution, or rather a solution for a very specific problem. What we really want is for the client to be able to hand in 2 date-time values (from and to).<br />
We could create our index using time information. The problem is that transactions do not occur at constant time intervals, which means that if we are called with a time value that is not in our index we&#8217;d still end up looping over the values around the closest point to find the position of the first element in the series.</p>
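<p>For comparison, one way to resolve a from/to pair against irregular timestamps without a map at all is a plain binary search on the sorted times. This is a different technique from the map-based index described above, shown here only as a baseline; it uses Python&#8217;s standard <code>bisect</code> module, and the function name and sample values are made up.</p>

```python
from bisect import bisect_left, bisect_right


def slice_bounds(timestamps, start, end):
    """Array bounds [lo, hi) of the points with start <= timestamp <= end."""
    lo = bisect_left(timestamps, start)    # first point at or after start
    hi = bisect_right(timestamps, end)     # one past the last point at or before end
    return lo, hi


# Made-up, sorted timestamps with irregular gaps and one repeated value.
timestamps = [100, 250, 250, 400, 900]
lo, hi = slice_bounds(timestamps, 200, 400)
```

<p>Neither 200 nor any other value has to be present in the list for this to work; an empty range simply comes back with <code>lo == hi</code>.</p>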
<p>An alternative would be to build our index using the maximum precision allowed, say an index of milliseconds where the last element&#8217;s position in the array is repeated in the index until a new one appears. Doable, but very memory consuming.</p>
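<p>To get a feel for how memory hungry the dense index would be, here is a small Python sketch. The function name is hypothetical, and the figure of 8 bytes per index slot is purely an assumption for the estimate.</p>

```python
from datetime import date


def build_dense_index(timestamps, start, end):
    """One slot per time unit in [start, end].

    Slot t - start holds the position of the last point at or before t,
    or -1 if no point has appeared yet.
    """
    index, pos, nxt = [], -1, 0
    for t in range(start, end + 1):
        while nxt < len(timestamps) and timestamps[nxt] <= t:
            pos = nxt              # a new point appears: carry its position forward
            nxt += 1
        index.append(pos)
    return index


# Tiny made-up example: three points on a 0..8 time axis.
dense = build_dense_index([2, 5, 6], 0, 8)

# Rough size of a millisecond index covering 1 Jan 1999 to 31 Dec 2008.
days = (date(2009, 1, 1) - date(1999, 1, 1)).days
slots = days * 24 * 60 * 60 * 1000
estimated_bytes = slots * 8        # assumption: 8 bytes per slot
```

<p>Under those assumptions the ten years of milliseconds come to roughly 315 billion slots, or about 2.5 TB, before a single transaction has been stored.</p>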
<p>This is as far as I&#8217;ve gone. Any other ideas out there?</p>
<p><strong>UPDATE</strong>: <a href="http://sapessi.com/2009/10/caching-time-series-continued/" target="_self">follow-up article</a></p>
]]></content:encoded>
			<wfw:commentRss>http://sapessi.com/2009/10/caching-time-series/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
	</channel>
</rss>

