SAPessi » Technology http://sapessi.com Perfection of means and confusion of aims... Wed, 10 Aug 2011 07:36:04 +0000
Interviewing tips http://sapessi.com/2010/08/interviewing-tips/ http://sapessi.com/2010/08/interviewing-tips/#comments Mon, 02 Aug 2010 19:26:50 +0000 Stefano Buliani http://sapessi.com/?p=385


Over the past couple of weeks I’ve been interviewing candidates for a Java developer position.
I can’t find words to describe my feelings now so I’ll recycle something a friend of mine, who’s also hiring, said:

It’s next to impossible to hire anyone useful now.

My complaint with most of the applicants is that they behave – there’s no other way to say this – like machines.

I did what I did because I was told to do it. Never really thought about it.

This infuriates me beyond belief.

So here are a few tips for your (and my) next interview:

  1. Don’t make up stuff on your CV. Sounds obvious but you’ll be surprised by how many people list technologies they have barely used on their CV and when asked about it don’t have a clue. You have no idea how bad this looks. (The red bullshitter light starts flashing in my head instantly)
  2. Show that you have passion for what you do. Being a Sun certified Java developer is all well and good but when I ask you what you have been looking into recently, or if you’ve worked on something on your own make sure to have an answer; or, if you don’t, make sure you have a good excuse.
  3. If you have 1000 years of experience with something I expect you to have some thoughts about it. Once again, Sun certified Java developer, if I ask you where you see Java going in the next 5 years, have an answer! There must be something you think needs improving in Java or that you’d do differently. “I’ll go wherever Sun takes me” is just not an acceptable answer.
  4. Have an opinion! For God’s sake have one. I’ll even settle for half an opinion. If not about a technology you have used for 10 years at least about the coffee I just bought you during the interview. There’s no point in talking for an hour with somebody who has nothing to tell me other than “I’ll do everything you want, I always did.”
  5. Know what the company you have applied to does! This sounds stupid but I have called people a week after they applied and, 10 minutes into the phone call, found out that they had no idea what the company they had applied to does. That’s an instant goodbye!
  6. Fill the gaps in your CV, and proof-read it before sending it out, for goodness’ sake. It’s fine that you have taken one year off to travel the world. If your CV does not list anything for the whole of 2009 I’m going to ask you about it. Give me a sincere instant answer and there’ll be no problem. Give me random excuses and it’s goodbye. As for proof-reading, I had somebody applying who had listed “quick apprehension” as a skill. I’m not into hiring failed super-heroes. (I assume they meant quick comprehension)
  7. Apply for jobs you are genuinely interested in. You are not likely to get a job if you don’t give a toss about what you do and who you do it for, because it shows and your interviewer will notice. So unless you are applying for a position as a nut-packer make sure you’re going for a job you’d actually enjoy doing and are interested in.

Needless to say my search continues.

After reviewing more than 60 CVs I have almost completely given up on websites such as Monster and CWJobs and I’m going to go entirely through connections now.

Decode a GPolyline in Objective-C http://sapessi.com/2010/06/decode-a-gpolyline-in-objective-c/ http://sapessi.com/2010/06/decode-a-gpolyline-in-objective-c/#comments Mon, 14 Jun 2010 12:54:03 +0000 Stefano Buliani http://sapessi.com/?p=378

The new version of MapKit in iOS 4 supports polylines, finally. MKPolyline to be precise.

I have been working with the Google Maps API for a while and now I’m learning Objective-C and writing an iPhone app for Route.ly.

Since the best way to send the data for a polyline around the net is in encoded format, I decided to write a small Objective-C function to decode the encoded string. I couldn’t find one anywhere on the net, so I decided to publish the code here.

You’ll notice that at the beginning of the function I replace “\\” with “\”. This is because I use the encoded data both in web pages with JavaScript and in my iPhone app. In JavaScript a lone “\” would be considered an escape and would therefore break my array of points, so the data carries it doubled. This is not the case in my Objective-C app, so I switch back to the normal “\”.

- (NSMutableArray *)decodePolyline:(NSString *)encodedPoints {
        // Undo the doubling used for JavaScript: each pair of backslashes becomes one
        NSString *escapedEncodedPoints = [encodedPoints stringByReplacingOccurrencesOfString:@"\\\\" withString:@"\\"];
        NSUInteger len = [escapedEncodedPoints length];
        // Autoreleased, as expected of a plain method under manual reference counting
        NSMutableArray *waypoints = [[[NSMutableArray alloc] init] autorelease];
        NSUInteger index = 0;
        // Running totals in units of 1e-5 degrees, kept as integers so no precision is lost
        int lat = 0;
        int lng = 0;

        while (index < len) {
                int b;
                int shift = 0;
                int result = 0;
                // Latitude delta: 5 bits per character, 0x20 flags that another chunk follows
                do {
                        b = [escapedEncodedPoints characterAtIndex:index++] - 63;
                        result |= (b & 0x1f) << shift;
                        shift += 5;
                } while (b >= 0x20);

                // The lowest bit carries the sign
                lat += (result & 1) ? ~(result >> 1) : (result >> 1);

                // Longitude delta, encoded the same way
                shift = 0;
                result = 0;
                do {
                        b = [escapedEncodedPoints characterAtIndex:index++] - 63;
                        result |= (b & 0x1f) << shift;
                        shift += 5;
                } while (b >= 0x20);

                lng += (result & 1) ? ~(result >> 1) : (result >> 1);

                double finalLat = lat * 1e-5;
                double finalLong = lng * 1e-5;

                Waypoint *newPoint = [[Waypoint alloc] init];
                // Assumes lat and lng are retain or copy properties, so autoreleased strings are safe
                newPoint.lat = [NSString stringWithFormat:@"%f", finalLat];
                newPoint.lng = [NSString stringWithFormat:@"%f", finalLong];
                [waypoints addObject:newPoint];
                [newPoint release];
        }
        return waypoints;
}

Waypoint is a simple model class I’ve written with two NSString properties: lat and lng.
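
For the web side, the same decoding steps can be sketched in JavaScript. This is only a minimal sketch: the function name decodePolyline and the plain {lat, lng} objects are assumptions made for illustration, not part of the Google Maps API, and the input is assumed to be already unescaped.

```javascript
// Decode an encoded polyline string into an array of {lat, lng} objects.
// Assumes the string is already unescaped (single backslashes).
function decodePolyline(encoded) {
  const points = [];
  let index = 0;
  let lat = 0; // running totals in units of 1e-5 degrees
  let lng = 0;

  // Read one variable-length signed value starting at `index`.
  function nextDelta() {
    let result = 0;
    let shift = 0;
    let b;
    do {
      b = encoded.charCodeAt(index++) - 63;
      result |= (b & 0x1f) << shift;
      shift += 5;
    } while (b >= 0x20); // 0x20 flags that another 5-bit chunk follows
    // The lowest bit is the sign flag.
    return (result & 1) ? ~(result >> 1) : (result >> 1);
  }

  while (index < encoded.length) {
    lat += nextDelta();
    lng += nextDelta();
    points.push({ lat: lat / 1e5, lng: lng / 1e5 });
  }
  return points;
}

// Sample string from Google's encoded polyline documentation.
console.log(decodePolyline('_p~iF~ps|U_ulLnnqC_mqNvxq`@'));
```

Each point is stored as a difference from the previous one, which is why the two running totals live outside the loop.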

The problem with software patents http://sapessi.com/2010/05/the-problem-with-software-patents/ http://sapessi.com/2010/05/the-problem-with-software-patents/#comments Sun, 09 May 2010 00:09:38 +0000 Stefano Buliani http://sapessi.com/?p=370

Brilliant video posted by Fred Wilson on his blog.

If you can spare half an hour definitely watch it!

Draggable directions with Google Maps APIs http://sapessi.com/2010/03/draggable-directions-with-google-maps-apis/ http://sapessi.com/2010/03/draggable-directions-with-google-maps-apis/#comments Tue, 09 Mar 2010 11:51:20 +0000 Stefano Buliani http://sapessi.com/?p=358

I love calculating driving directions on Google Maps and then dragging the blue line marking my directions to change the route. Everything is updated automatically on the page and the directions are re-calculated to go through the new point I defined.

I’ve been playing around with the Google Maps APIs recently and imagine my disappointment when I found out that Google does not allow directions calculated through the APIs to be dragged around.

Not put off by this I decided to try and replicate the directions-dragging myself. How hard can it be?
As it turns out, very. At least if you want to make it look as smooth as Google’s own solution.

When generating directions Google Maps adds a GPolyline element overlay to your map. You can set the returned line to be editable but this makes an awful lot of vertices appear on it, which makes reading your directions quite hard. Even then, once you have dragged one of these vertices around you are not editing the route, just changing the shape of the line.

Mine is not a complete solution and I’m interested in feedback and ideas on how to improve it.

First off calculate your directions:

map = new google.maps.Map2(document.getElementById('map_canvas'));
var wayPoints = [];
wayPoints.push(startPoint.getLatLng());
wayPoints.push(endPoint.getLatLng());

var myDir = new google.maps.Directions(map);
myDir.loadFromWaypoints(wayPoints, { travelMode: G_TRAVEL_MODE_DRIVING });

This will calculate your driving directions and plot a GPolyline on your map. The line is easily accessible in the GDirections object once the directions are calculated. To intercept this I have decided to use the addoverlay event, listened for here on the directions object. This event is triggered once the route has been plotted over the map (it does what it says on the tin).

google.maps.Event.addListener(myDir, "addoverlay", function() {
  var dirLine = myDir.getPolyline(); // Get the polyline from the directions object
});

At this point we can ask the APIs to make the GPolyline editable. This will make the vertices appear on the line and make them draggable. As I said before this only changes the shape of the line and doesn’t actually affect your directions object. Luckily the GPolyline comes with a nifty event called lineupdated.
This is triggered once the user has finished dragging a vertex. By intercepting this we can look through the vertices and know what’s been changed on the line and where the vertex has been moved to.
In order to do this we must also know the previous position of the vertices (latitude and longitude) to be able to compare the old and the new “edited” line.
Another challenge is the fact that the GDirections object can accept only so many waypoints (25 if I’m not wrong), which means we’ll have to add only the vertex that has changed to the directions and not all of them.

// In the addoverlay event also save the original vertices of the line
var origLine = [];
for (var i = 0; i < dirLine.getVertexCount(); i++) {
  origLine.push(dirLine.getVertex(i));
}
// DONE saving vertices

// Now intercept the lineupdated event and add the new waypoints
google.maps.Event.addListener(dirLine, "lineupdated", function() {
  routePoints = [];
  for (var i = 0; i < dirLine.getVertexCount(); i++) {
    var savedPoint = origLine[i];
    var vertex = dirLine.getVertex(i);
    // A vertex counts as changed when it is new or when either coordinate differs
    if (!savedPoint || savedPoint.lat() != vertex.lat() || savedPoint.lng() != vertex.lng()) {
      routePoints.push(vertex);
    }
  }

  // Now we remove the previous directions and recalculate the route
  map.removeOverlay(dirLine);
  calcRoute();
});
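
The comparison inside the lineupdated handler can be isolated as a plain function, which makes it easier to reason about. The sketch below is an illustration only: changedVertices is a hypothetical name, and points are plain {lat, lng} objects rather than the GLatLng objects the Maps API returns.

```javascript
// Return the vertices of `edited` that are new or differ from `original`
// at the same index. Points are plain {lat, lng} objects here (an assumption;
// the Maps API hands back objects with lat() and lng() methods).
function changedVertices(original, edited) {
  return edited.filter(function (point, i) {
    const saved = original[i];
    return !saved || saved.lat !== point.lat || saved.lng !== point.lng;
  });
}

const before = [{ lat: 51.5, lng: -0.12 }, { lat: 51.6, lng: -0.2 }, { lat: 51.7, lng: -0.3 }];
const after = [{ lat: 51.5, lng: -0.12 }, { lat: 51.6, lng: -0.25 }, { lat: 51.7, lng: -0.3 }];
console.log(changedVertices(before, after)); // only the middle vertex, whose longitude moved
```

The test is an OR on purpose: a vertex dragged due east keeps its latitude, so requiring both coordinates to differ would miss it. Because the comparison is by index, an edit that inserts a new vertex shifts everything after it, which is one of the rough edges of this approach.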

This works quite well but does not look as smooth as Google’s solution.
The problem is that while you are dragging a vertex only that bit of the GPolyline moves and the rest stays in its original position, which makes the shape of your directions quite awkward while you are dragging. Unfortunately the GPolyline does not come with a “startdragging” event, otherwise we could just recalculate the route every few seconds while the vertex is being dragged.
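
If a drag event of that kind were available, the periodic recalculation could be rate-limited with a small throttle helper. Everything in this sketch is hypothetical: throttle, the 2000 ms interval and the recalculate callback are illustrative names, and the clock is passed in only so the behaviour is easy to check.

```javascript
// Wrap `fn` so it runs at most once every `intervalMs` milliseconds.
// `now` returns the current time in ms; it defaults to Date.now.
function throttle(fn, intervalMs, now = Date.now) {
  let lastRun = -Infinity;
  return function (...args) {
    const t = now();
    if (t - lastRun >= intervalMs) {
      lastRun = t;
      fn.apply(this, args);
    }
  };
}

// Usage with a fake clock: only calls at least 2000 ms apart get through.
let fakeTime = 0;
const calls = [];
const recalculate = throttle(
  function (vertex) { calls.push(vertex); },
  2000,
  function () { return fakeTime; }
);

recalculate('a');                  // runs (first call)
fakeTime = 500;  recalculate('b'); // skipped, too soon
fakeTime = 2100; recalculate('c'); // runs
console.log(calls);
```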

This is not the most elegant of solutions but it does the job.

Want to be a programmer in an investment bank? http://sapessi.com/2009/11/want-to-be-a-programmer-in-an-investment-bank/ http://sapessi.com/2009/11/want-to-be-a-programmer-in-an-investment-bank/#comments Thu, 26 Nov 2009 10:45:20 +0000 Stefano Buliani http://sapessi.com/?p=346

What do you need to know? Java Enterprise Edition. That’s the simple answer.
If you read between the lines what I’m really saying is: “You better make sure you really don’t give a flying toss about technology and don’t care about how frustrating what you spend most of your day doing is.”

A few days ago I was talking with a friend who’s in the process of interviewing with a few big financial institutions in London. The most common questions, according to him: Struts and JEE.
Spring was also mentioned but I don’t have much against Spring. If used properly it really does keep your code tidy and avoids duplication.

You see, a bank’s idea of a high performance reliable system is a monolithic Java application running a million-billion threads at the same time on a huge machine. Maybe that’s a bit too harsh. Sometimes they do distribute. They just add 15 layers shooting messages at each other in between the 2 they actually need.
Then they wonder why the best talent winds up working for Google, and their budget to deliver 1 application is twice what Facebook spent to build its entire social network.

What I have a problem with is the architects who design these systems. And the clearly-drunken bank managers who hired them.

Dumb and Dumber

The most common profile for architects in investment banks (and I have met quite a few) is 50-something, stopped following/reading about technology in 1995 and prior to being employed by Mr Mighty WankBank (Thanks City Boy for the name) was VP of over-engineering at the Apache Software Foundation (focused on Struts).
First thing on their TODO list is to hire some bright spark straight out of university who has a keen interest in over-engineering and never had to deliver anything in less than a geological era.
That’s when an overpaid team of consultants comes in to “sort-it-out”, at which point the bank loses control over its source-code and none of their developers can understand it any more.

Oh, and then there’s the process, oh the bureaucracy.

Fortunately I also know that in London there are a few sunny spots (technology-wise) such as hedge funds that have started using Erlang (for example) to distribute their heavy computations over a network.

Now don’t get me wrong, I have nothing against JEE per se. Some of the libraries are actually very useful and improve a developer’s quality of life quite a lot. Just think about JTA.
What is wrong is the way they are used by people who never left the nineties.

Server side JavaScript with node.js http://sapessi.com/2009/11/server-side-javascript-with-node-js/ http://sapessi.com/2009/11/server-side-javascript-with-node-js/#comments Sat, 21 Nov 2009 17:01:12 +0000 Stefano Buliani http://sapessi.com/?p=337

While skimming through my RSS feed I stumbled upon an interesting article on Ajaxian about server side JavaScript programming.

I really, really enjoy writing JavaScript code so I decided to read through the article and take a look at the node.js library linked there.

node.js logo

Node is a framework to build server-side event-driven JavaScript applications. Developed in C++ on top of Google’s V8 JavaScript engine and accompanied by a set of JavaScript libraries, Node seems to make building fast applications distributed over a network a piece of cake, even for inexperienced developers.
Event-driven here really is the keyword because it represents a big change in the way applications are built (and architected in your brain).

Node is similar in design to and influenced by systems like Ruby’s Event Machine or Python’s Twisted. Node takes the event model a bit further – it presents the event loop as a language construct instead of as a library. In other systems there is always a blocking call to start the event-loop. Typically one defines behavior through callbacks at the beginning of a script and at the end starts a server through a blocking call like EventMachine::run(). In Node there is no such start-the-event-loop call. Node simply enters the event loop after executing the input script.
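
A tiny sketch of what that means in practice, assuming any Node runtime: the callback is registered first, the script runs to its last line, and only then does the event loop pick the timer up. The order array is just an illustrative name.

```javascript
const order = [];

// Register a callback; nothing here blocks or starts a loop explicitly.
setTimeout(function () {
  order.push('timer callback');
  console.log(order.join(' -> '));
}, 0);

order.push('end of script');
// No run() call: once this last line has executed, Node enters the event
// loop on its own and fires the timer callback.
```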

The example below (taken from Node’s website) will make everything clear… hopefully.

var sys = require('sys'),
     http = require('http');

http.createServer(function (req, res) {
  setTimeout(function () {
    res.sendHeader(200, {'Content-Type': 'text/plain'});
    res.sendBody('Hello World');
    res.finish();
  }, 2000);
}).listen(8000);
sys.puts('Server running at http://127.0.0.1:8000/');
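
The method names in that example (sendHeader, sendBody, finish) come from the 2009 version of Node and were later renamed. A rough equivalent against the http module as it exists in current Node releases might look like the sketch below; delayedHello and the delayMs parameter are illustrative names, and the listen call is left commented out so the snippet exits on its own.

```javascript
const http = require('http');

// Same behaviour as the 2009 example: wait, then answer with plain text.
function delayedHello(req, res, delayMs = 2000) {
  setTimeout(function () {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('Hello World');
  }, delayMs);
}

// Passing extra arguments is optional, so the handler can be used directly.
const server = http.createServer(delayedHello);

// Uncomment to actually accept connections on port 8000:
// server.listen(8000, function () {
//   console.log('Server running at http://127.0.0.1:8000/');
// });
```

While the two-second timer is pending the process is free to accept other requests, which is the whole point of the event-driven model.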

Can you see the beauty?!

I have only one worry. This is an open-source effort. The community behind it on Google groups is just 181 members strong (so far). What if node.js suddenly stops being the cool thing and the community disappears?

As much as I love writing JavaScript and I can really see the value in what they are building I’ll still wait until there are 100 Google Groups for node.js and a trillion members in each before using it in anything close to a production system.

Having said that I’m going back to writing silly node.js apps now. Well done to all the developers involved in the project and keep it up!

SVG graphics with JavaScript http://sapessi.com/2009/11/svg-graphics-with-javascript/ http://sapessi.com/2009/11/svg-graphics-with-javascript/#comments Wed, 18 Nov 2009 13:47:03 +0000 Stefano Buliani http://sapessi.com/?p=327

When I started developing TweetSentiment I decided that the interface should have as little text as possible. Most of the information I was interested in could be displayed graphically, with a chart.

So I looked at all the options available for chart generation.

  1. Backend code to generate a static image (JFreeChart or PHP)
  2. Flash object to draw a chart retrieving the data from a URL
  3. Draw charts in JavaScript directly on the client’s browser

I’m not a huge fan of option one, primarily because TweetSentiment is hosted on a tiny Linux box which would not be able to handle the load for the traffic the site gets, but also because a static image is just not very funky – no interaction possible.

Option two would certainly create spectacular looking charts but I have almost no experience with Flash and I wasn’t about to start learning a new language/technology. Plus I’m not into browser plugins if I can avoid them.

JavaScript is a language I’m familiar with and I remember seeing some cool-looking charts generated with Dojo. Unfortunately for TweetSentiment I have used jQuery since the most important thing for me there was DOM manipulation (and jQuery is just better for that).
I then started shopping around for jQuery plugins to generate charts. There are a few around but none of them impressed me. They just weren’t as good looking as I’d hoped, nor were they interactive.

By coincidence I stumbled on RaphaelJS, a JavaScript library to draw Scalable Vector Graphics directly from JavaScript. It does not depend on jQuery but sits happily next to it. I tested the samples on the website with a few browsers and I was happy to discover it worked just fine with all of them.

Scalable Vector Graphics (SVG) is a family of specifications of an XML-based file format for describing two-dimensional vector graphics, both static and dynamic (i.e. interactive or animated).

The SVG specification is an open standard that has been under development by the World Wide Web Consortium (W3C) since 1999.

I also discovered that there is a charting library built on top of RaphaelJS, which is exactly what I was looking for. However, being a geek, I decided to go ahead and try to develop something on my own. You know, just for kicks.

As I delved deeper into RaphaelJS I found the library to be incredibly powerful. It’s a shame that the documentation provided on the website lets it down a bit.
The most powerful bit is the ability to extend objects and attach new functions to them. Something scarcely mentioned in the available documentation.

For example, if you need to use curved lines (paths, as SVG calls them) you can define a function once and then call it from your code, simply by adding it to the el “object” in RaphaelJS:

Raphael.el.curveTo = function () {
  var args = Array.prototype.splice.call(arguments, 0, arguments.length),
  d = [0, 0, 0, 0, "s", 0, "c"][args.length] || "";
  this.isAbsolute && (d = d.toUpperCase());
  this._last = {x: args[args.length - 2], y: args[args.length - 1]};
  return this.attr({path: this.attrs.path + d + args});
};

Another very useful function I found in one of their samples is andClose(). This is used to close a polygon you have started drawing with paths. No matter where you got to, it will reconnect to the initial point.

Raphael.el.andClose = function () {
  return this.attr({path: this.attrs.path + "z"});
};

This can then be used this way using chaining.

RaphaelJSElement.lineTo(x, opts.height - bottomgutter).andClose();
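
Chaining works because every method returns the object it was called on. The standalone sketch below shows the idea without RaphaelJS; PathBuilder and its methods are hypothetical names used only to illustrate how an SVG path string grows one command at a time.

```javascript
// Minimal chainable builder for SVG path data.
class PathBuilder {
  constructor() {
    this.path = '';
  }
  moveTo(x, y) {
    this.path += 'M' + x + ',' + y;
    return this; // returning `this` is what makes chaining possible
  }
  lineTo(x, y) {
    this.path += 'L' + x + ',' + y;
    return this;
  }
  andClose() {
    this.path += 'z'; // "z" draws a line back to the starting point
    return this;
  }
}

const triangle = new PathBuilder().moveTo(0, 0).lineTo(10, 0).lineTo(10, 10).andClose();
console.log(triangle.path); // M0,0L10,0L10,10z
```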

I’m still developing the chart library I used in TweetSentiment and I’m planning to publish it here with some documentation under MIT licence.

Google Go http://sapessi.com/2009/11/google-go/ http://sapessi.com/2009/11/google-go/#comments Wed, 11 Nov 2009 15:10:10 +0000 Stefano Buliani http://sapessi.com/?p=315

I have just stumbled on an article on Slashdot about Google Go, a programming language which was initially developed internally and has now been open-sourced.

Over the past few months I have been working with Erlang on multi-process distributed systems. Concurrency in Go is based on the same model Erlang uses: Hoare’s Communicating Sequential Processes, or CSP.

Go’s concurrency primitives derive from a different part of the family tree whose main contribution is the powerful notion of channels as first class objects.

The thing I appreciate the most about Erlang is how easy it is to distribute – even over a network – and monitor processes. After skimming through Go’s documentation I couldn’t help but notice the lack of integrated network-distribution of processes or goroutines, as they are called.

This for me means that while Go, exactly like Erlang and Occam, simplifies the approach to multi-threaded programming (at least from a developer’s point of view), it doesn’t go all the way and leaves developers to deal with distribution of processes over a network on their own.

This is a perfectly acceptable design choice in the name of flexibility, however it creates a whole new bunch of use cases for the developers to deal with (such as sending channels or channel data over the network).

What I don’t like about Erlang is its handling of strings, which makes it unsuitable for a number of applications that could benefit greatly from its multi-processing capabilities. Very few people would recommend Erlang as a high performance string manipulation language. To quote Sendmail’s case study in implementing their Sendmail load balancing “Client Daemon” in Erlang:

…But Erlang’s treatment of strings as lists of bytes is as elegant as it is impractical. The factor-of-eight storage expansion of text, as well as the copying that occurs during message-passing, cripples Erlang for all but the most performance-insensitive text-processing applications.

Go totally wins on this.

If you asked me to pick one to use now, I’d still go for Erlang.
Functional languages and CSP are changing the way big companies approach the development of business critical applications. In the last year alone I have seen the number of positions advertised for Erlang developers in the financial sector multiply.
Go, if it continues on this path, definitely has a chance of making it there. Open-sourcing the language is the right decision to give it the exposure to real-life scenarios that tends to accelerate growth and adoption.

TweetSentiment – Twitter activity VS market activity http://sapessi.com/2009/11/tweetsentiment-twitter-activity-vs-market-activity/ http://sapessi.com/2009/11/tweetsentiment-twitter-activity-vs-market-activity/#comments Thu, 05 Nov 2009 18:30:47 +0000 Stefano Buliani http://sapessi.com/?p=307

Perhaps I’m bored. Maybe I’m just too single.
Anyway. A couple of days ago I was looking at StockTwits again and saw a considerable amount of activity. There and then I decided that it would be pretty cool to check whether there was any correlation between the activity about a stock on Twitter and the actual trading volume on the market.

I set off to build a small system to do just that. A couple of Groovy scripts to collect the data and save it plus some JavaScript and HTML to display it.

So here I am a couple of days later talking about TweetSentiment.

Collecting the data I needed was pretty trivial thanks to the fact that StockTwits asks all its members to tweet the symbols they are trading preceded by a dollar sign ($).
My first task was to build a list of securities I was interested in. I decided to go to Covestor.com and pick the most traded securities list.  Once that list was ready all I had to do was call the Twitter Search APIs for each stock and store the results.
Groovy made these tasks incredibly simple.

def output = new URL("http://search.twitter.com/search.atom?q=${"\$"+curSecurity.symbol}&rpp=100").getText()

def parsedXml = new XmlParser().parseText(output)

parsedXml.entry.each() { curEntry ->

I’m not doing much with the tweets. At the moment I just count them and look for “buy” and “sell”. I’m looking into smarter ways of analysing the text and looking for positive/negative opinions. If you have any suggestions on this please do leave a comment.
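
The counting step is simple enough to sketch. The version below is in JavaScript rather than Groovy and is only an illustration: countSignals is a hypothetical name, the sample tweets are invented, and matching whole words case-insensitively is an assumption about what “look for” means here.

```javascript
// Count tweets and how many mention "buy" or "sell" as whole words.
function countSignals(tweets) {
  const counts = { total: tweets.length, buy: 0, sell: 0 };
  for (const text of tweets) {
    if (/\bbuy\b/i.test(text)) counts.buy += 1;
    if (/\bsell\b/i.test(text)) counts.sell += 1;
  }
  return counts;
}

// Invented sample data, not real tweets.
const sample = [
  'Time to BUY $AAPL before earnings',
  'Going to sell $GOOG today',
  '$MSFT looks flat, buyers are waiting'
];
console.log(countSignals(sample));
```

The word boundaries keep “buyers” from counting as a “buy”, which hints at how quickly plain keyword matching runs out of steam.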

Next in my TODO list was getting some market data. For that I’m using Yahoo finance asking for previous closing price and average volume for each security. Once again, piece of cake with Groovy. Same thing as before just different URL.
Yahoo finance actually has quite a simple interface to let you grab the data. Check out this page on Cliff’s Notes.

Now I had all the data I needed. Only thing left was to build some sort of interface to look at it. I wanted to display all of the data on a chart to be able to make sense of it as quickly as possible.
My server (i.e. the machine running this blog) with its puny power could never have handled the traffic I get while generating charts. Therefore what I decided to do is to export all the data in text files (JSON format) and do all of the chart generation with JavaScript.
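
One way such an export might be shaped, purely as an assumption since the actual file format is not shown: one JSON document per security holding a list of daily records, which the browser turns into the parallel arrays a line chart needs. The field names (symbol, days, date, tweets, volume) and the helper toSeries are hypothetical.

```javascript
// Hypothetical export for one security; all figures are made up.
const exported = JSON.stringify({
  symbol: 'XYZ',
  days: [
    { date: '2009-11-03', tweets: 12, volume: 150000 },
    { date: '2009-11-04', tweets: 30, volume: 310000 }
  ]
});

// Turn the parsed document into x-axis labels plus one array per series.
function toSeries(jsonText) {
  const data = JSON.parse(jsonText);
  return {
    labels: data.days.map(function (d) { return d.date; }),
    tweets: data.days.map(function (d) { return d.tweets; }),
    volumes: data.days.map(function (d) { return d.volume; })
  };
}

console.log(toSeries(exported));
```

Serving static files like this keeps the server’s job down to handing out text, with all the drawing done on the visitor’s machine.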

I browsed the web for a bit looking for charting components for jQuery. Couldn’t find anything I was happy with. Not functionality-wise but aesthetically. Or maybe I just felt like playing around with JS for some time – unfortunately as it often happens with JS I spent the best part of the last 2 days working on it.

I decided to use a library called Raphaël. This library makes working with vector graphics incredibly simple and the results just look amazing. I know that they are in the process of building a charting library on top of it however… well I have no excuse now other than I wanted to work with JS.

What I did is build a very simple jQuery component called SAPchart. It only draws line charts but it’s quite simple and I think the results look pretty good.

You can download the source here. It requires jQuery, Raphaël, and raphael path methods.

// Constructor. Possible options are:
// showGrid: true
// width: 800
// height: 250
// legendWidth: 200
// showLabels: true
// showDots: true
$("#holder").SAPchart({legendWidth: 800});

// Give the library a list of labels for the x axis
$("#holder").setLabels(theLabels);

// Add to the chart as many series as you want – provided the length of the series is
// the same as that of the labels (quite unsophisticated but serves its purpose)
// (array of values, unique id of the series, legend name, additional options)
$("#holder").addSeries(theTweets, "tweet", "Twitter Activity", { "color": [.6, 1, .75] });
$("#holder").addSeries(theVolumes, "volume", "Average Daily Volume", { "color": [.2, 1, .75] });

// Draws the chart in your html element
$("#holder").draw();

// This will return an HTML table containing the legend of all the series you specified
// the boolean parameter tells me whether you want your legend to be horizontal (default vertical)
$("#holder").getLegend(true).appendTo($("#legend"));

I’m now wondering whether I should keep working on it and improve it.

Anyway TweetSentiment is now available here. I’m planning to keep it running and collecting data for a while. I’d say you need at least 3 or 4 months’ worth of data before you can make any useful observations.
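
Once enough data has been collected, one simple way to put a number on the relationship is the Pearson correlation coefficient between the two daily series. This is a generic textbook formula, not something TweetSentiment is stated to compute; the function name pearson and the sample figures are illustrative.

```javascript
// Pearson correlation of two equally long numeric arrays.
// Returns a value between -1 and 1, or NaN if either series is constant.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length === 0) {
    throw new Error('series must be non-empty and of equal length');
  }
  const mean = function (a) {
    return a.reduce(function (sum, v) { return sum + v; }, 0) / a.length;
  };
  const mx = mean(xs);
  const my = mean(ys);
  let cov = 0, vx = 0, vy = 0;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    cov += dx * dy;
    vx += dx * dx;
    vy += dy * dy;
  }
  return cov / Math.sqrt(vx * vy);
}

// Made-up figures: tweet counts and trading volumes over five days.
console.log(pearson([12, 30, 18, 44, 25], [150000, 310000, 200000, 405000, 260000]));
```

A value close to 1 would mean busy days on Twitter tend to be busy days on the market; it says nothing about which one drives the other.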

Tracking database records history http://sapessi.com/2009/11/tracking-database-records-history/ http://sapessi.com/2009/11/tracking-database-records-history/#comments Tue, 03 Nov 2009 12:40:14 +0000 Stefano Buliani http://sapessi.com/?p=288

If you Google for this you’ll find that the easiest way to provide a full audit of everything that happened in your database is to create a duplicate of each table you need history for and insert a copy of the record you are changing in there on every update.

For example if you have a table called client

CREATE TABLE client (
  id INTEGER,
  firstname VARCHAR(255),
  lastname VARCHAR(255),
  datecreated TIMESTAMP
);

You will also create a table called client_history

CREATE TABLE client_history (
  id INTEGER,
  firstname VARCHAR(255),
  lastname VARCHAR(255),
  datecreated TIMESTAMP,
  history_creation TIMESTAMP,
  history_user INTEGER -- A foreign key to the user who updated this record if required
);

You’ll then attach an onUpdate trigger to the client table inserting all the data in your history table. This solution gives you the ability to keep track of everything that happens in your database, when it happened and who-done-it.
What this solution doesn’t deal with is schema changes. Also querying the state of a record at a point in time to join it with other tables might be a bit tricky.
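
The point-in-time lookup can be pictured outside the database as follows. This sketch assumes each history row stores the values a record had from history_creation onwards (the text does not say whether the trigger copies the old or the new values), and stateAt is a hypothetical helper; timestamps are ISO strings in the same format so plain string comparison orders them.

```javascript
// Return the latest history row for `id` created at or before `when`,
// or null if the record had no history yet at that time.
function stateAt(historyRows, id, when) {
  let best = null;
  for (const row of historyRows) {
    if (row.id !== id || row.history_creation > when) continue;
    if (best === null || row.history_creation > best.history_creation) best = row;
  }
  return best;
}

// Invented rows shaped like the client_history table above.
const clientHistory = [
  { id: 1, firstname: 'Ann', lastname: 'Smith', history_creation: '2009-01-10T09:00:00Z' },
  { id: 1, firstname: 'Ann', lastname: 'Jones', history_creation: '2009-06-01T12:00:00Z' },
  { id: 2, firstname: 'Bob', lastname: 'Brown', history_creation: '2009-03-15T08:30:00Z' }
];
console.log(stateAt(clientHistory, 1, '2009-03-01T00:00:00Z').lastname); // Smith
```

In SQL the same idea becomes a join against the most recent history row per id at the chosen moment, which is where the trickiness comes from.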

Let’s start from the first point. Schema changes.
The most obvious and simple solution is to put in place a proper process for all database schema changes. i.e. whoever touches the schema is also in charge of updating the history table structure and the trigger on the main table. Doable but a bit too prone to human error if you ask me.

What I generally do is use a small function/stored procedure to create the history on a table. This procedure is in charge of both creating the history table the first time it’s called and updating the structure if the main table has changed.
In this example I’m going to use PostgreSQL. First because it’s a database I’m familiar with and, more importantly, because being open-source you can just download it and try this yourself.
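
The “update the structure” half of such a procedure boils down to comparing two column lists. As a language-neutral illustration (sketched in JavaScript rather than PL/pgSQL, with hypothetical names missingColumnSql, mainCols and histCols), each column present in the main table but absent from the history table yields one ALTER TABLE statement. Adding columns this way is an assumption about the approach, not a transcription of the stored procedure.

```javascript
// mainCols / histCols: arrays of { name, type } describing each table.
// Returns the ALTER TABLE statements needed to bring the history table up to date.
function missingColumnSql(histTable, mainCols, histCols) {
  const existing = new Set(histCols.map(function (c) { return c.name; }));
  return mainCols
    .filter(function (c) { return !existing.has(c.name); })
    .map(function (c) {
      return 'ALTER TABLE ' + histTable + ' ADD COLUMN ' + c.name + ' ' + c.type + ' NULL';
    });
}

// Example: an "email" column was added to the main table after the history was created.
const mainCols = [
  { name: 'id', type: 'integer' },
  { name: 'firstname', type: 'character varying(255)' },
  { name: 'email', type: 'character varying(255)' }
];
const histCols = [
  { name: 'id', type: 'integer' },
  { name: 'firstname', type: 'character varying(255)' },
  { name: 'history_creation', type: 'timestamp without time zone' }
];
console.log(missingColumnSql('client_hist', mainCols, histCols));
```

History columns are left nullable because rows written before the schema change have no value for the new field.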

My example here is a bit verbose but it serves its purpose. There are many database-specific instructions you could use to extract a table structure or perform some of the simple tasks of this function. However, what I’m trying to demonstrate is the concept and not PostgreSQL trickery.
Most relational databases store the schema information in internal tables (pg_attribute, pg_class and pg_namespace in this case) so that’s what we are going to use to read the structure of our original table.

-- This function expects as parameters the name of the table you want to
-- create history for and the unique identifier of the record.
-- In my case all tables have an id column. Alternatively you could also pass
-- the name of the unique id column as a parameter.
CREATE OR REPLACE FUNCTION create_history(tablename VARCHAR(255), recordid INTEGER) RETURNS BOOLEAN AS $$
DECLARE
  vhisttablename NAME; -- history table name
  vhistfieldcount INTEGER;  -- number of fields in history table
  vtablefieldcount INTEGER; -- number of fields in main table
  vtmprowcount INTEGER; -- temporary variable to store query results
  vcurfield RECORD; -- variable to loop over fields of main table to create history
  vhisttablesql TEXT; -- SQL to create history table
  vfieldlist TEXT; -- list of fields in main table
  vhisttablefields TEXT; -- list of fields in history table
  vhistinsertsql TEXT; -- SQL for insert statement
  vtmptablename VARCHAR(255); -- temp variable to check if history exists
BEGIN
  vhisttablename := tablename||'_hist';

  -- Check if the history table exists; if not we need to create it
  SELECT INTO vtmptablename relname
  FROM pg_class
  WHERE relname = vhisttablename;

  -- start from empty strings so the concatenations below never yield NULL
  vfieldlist := '';
  vhisttablefields := '';

  -- count the fields in the history/current table if history exists
  IF vtmptablename IS NOT NULL
  THEN
    SELECT INTO vhistfieldcount count(*)
    FROM pg_attribute AS a
    INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
    INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
    WHERE a.attnum > 0
    AND c.relname = vhisttablename -- history table name
    AND NOT a.attisdropped
    AND pg_table_is_visible(c.oid);

    SELECT INTO vtablefieldcount count(*)
    FROM pg_attribute AS a
    INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
    INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
    WHERE a.attnum > 0
    AND c.relname = tablename -- main table
    AND NOT a.attisdropped
    AND pg_table_is_visible(c.oid);
  END IF;

  -- Get all the attributes and their type from the original table
  FOR vcurfield IN
  SELECT a.attname AS COLUMN, format_type(a.atttypid, a.atttypmod) AS datatype
  FROM pg_attribute AS a
  INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
  INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
  WHERE a.attnum > 0
  AND c.relname = tablename
  AND NOT a.attisdropped
  AND pg_table_is_visible(c.oid)
  LOOP
    -- populate lists of fields both for history creation and select from main table
    vhisttablefields := vhisttablefields||vcurfield.COLUMN||' '||vcurfield.datatype||' NULL, ';
    vfieldlist := vfieldlist||vcurfield.COLUMN||', ';

    -- If the history table exists and the number of fields is different
    -- from the main table (+2, as the history adds a unique histid and the history_creation timestamp)
    IF vtmptablename IS NOT NULL AND vtablefieldcount+2 <> vhistfieldcount
    THEN
      -- make sure that this is the missing field in the history table
       PERFORM a.attname
       FROM pg_attribute AS a
       INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
       INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
       WHERE a.attnum > 0
       AND c.relname = vhisttablename
       AND NOT a.attisdropped
       AND a.attname = vcurfield.COLUMN
       AND pg_table_is_visible(c.oid);

       GET DIAGNOSTICS vtmprowcount = ROW_COUNT;

      -- If it is, generate the SQL and execute the ALTER TABLE
       IF vtmprowcount = 0
       THEN
         vhisttablesql := 'ALTER TABLE '||vhisttablename||' ADD COLUMN '||vcurfield.COLUMN||' '||vcurfield.datatype||' NULL;';
         EXECUTE vhisttablesql;
       END IF;
     END IF;
  END LOOP;

  -- The history table doesn't exist. Create it
  IF vtmptablename IS NULL OR vtmptablename = ''
  THEN
    vhisttablesql := 'CREATE TABLE '||vhisttablename||' ( histid SERIAL PRIMARY KEY, ';

    vhisttablesql := vhisttablesql||vhisttablefields||' history_creation TIMESTAMP NOT NULL DEFAULT now() );';

    EXECUTE vhisttablesql;

    -- create indexes
    vhisttablesql := ' CREATE INDEX idx_'||vhisttablename||'_1 ON '||vhisttablename||' (id); ';
    EXECUTE vhisttablesql;

    vhisttablesql := ' CREATE INDEX idx_'||vhisttablename||'_2 ON '||vhisttablename||' (history_creation); ';
    EXECUTE vhisttablesql;
  END IF;

  -- Proceed with the history creation
  vhistinsertsql := 'INSERT INTO '||vhisttablename||' ('||vfieldlist||' history_creation) SELECT '||vfieldlist||' now() FROM '||tablename||' WHERE id='||recordid||';';

  RAISE NOTICE 'Executing %', vhistinsertsql;
  EXECUTE vhistinsertsql;

  GET DIAGNOSTICS vtmprowcount = ROW_COUNT;
  IF vtmprowcount > 0
  THEN
    RETURN TRUE;
  ELSE
    RETURN FALSE;
  END IF;
END;
$$
LANGUAGE plpgsql;

You’ll notice this function only adds fields to the history table. That’s because even if we remove a field from the original table we still want to keep track of it in our history.
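
To wire this into the main table, a trigger can call create_history before each change. What follows is only a sketch under a few assumptions: the names record_history and client_history are made up for illustration, the table has the integer id column used above, and the trigger fires BEFORE the change so that the row copied into the history is the one about to be overwritten or deleted.

```sql
-- Hypothetical trigger function: the name record_history is an assumption
CREATE OR REPLACE FUNCTION record_history() RETURNS TRIGGER AS $$
BEGIN
  -- The row in the table is still unchanged at this point,
  -- so create_history copies the old values into the history
  PERFORM create_history(TG_TABLE_NAME::VARCHAR, OLD.id);

  IF TG_OP = 'DELETE'
  THEN
    RETURN OLD; -- let the delete go ahead
  END IF;

  RETURN NEW; -- let the update go ahead
END;
$$
LANGUAGE plpgsql;

-- Hypothetical trigger name; attach it to the table you want history for
CREATE TRIGGER client_history
BEFORE UPDATE OR DELETE ON client
FOR EACH ROW EXECUTE PROCEDURE record_history();
```

Since the table name comes from TG_TABLE_NAME, the same trigger function could be attached to any other table that follows the id convention.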

Now on to querying the history to retrieve the state of a record at a specific date.

This is a bit tricky. At the moment, as far as I know, only Oracle has the ability to return a “virtual table” from a function. PostgreSQL can return a RECORD variable (which is great to use inside functions, but once outside it loses its structure and turns into a comma-separated list of values) or a %ROWTYPE you can define.
This unfortunately means that you’d have to create a custom function for each history table that returns a SETOF the table’s row type (client, in our case).

What we are trying to achieve is something like what this query does.

SELECT h.id, firstname, lastname, datecreated, history_creation
FROM client_hist AS h
INNER JOIN (
  SELECT min(history_creation) AS mintstamp, id AS id
  FROM client_hist
  WHERE history_creation > THE-TIME-YOU-NEED
  AND id= YOUR-RECORD-ID
  GROUP BY id
) AS m ON (m.id = h.id AND m.mintstamp = h.history_creation);

Basically the next record in the history after the time specified.

At this point, as far as I can see, we have two options: either create a function that returns a specific type of row, or write a more generic history function that returns only the id of the history record we need and not the whole row (if you look at the history creation function, we are putting a histid column in there).

You could use this function in your queries to get the history record id out like this.

-- This function returns the histid you need to look at in your history table
CREATE OR REPLACE FUNCTION select_hist_id(tablename VARCHAR(255), recordid INTEGER, tstamp TIMESTAMP) RETURNS INTEGER AS $$
DECLARE
  curs REFCURSOR;
  vid INTEGER;
BEGIN
  -- Build the lookup dynamically: the timestamp is quoted as a literal
  -- and the record id is appended as a plain integer
  OPEN curs FOR EXECUTE 'SELECT h.histid
  FROM '||tablename||'_hist AS h
  INNER JOIN (
    SELECT min(history_creation) AS mintstamp, id
    FROM '||tablename||'_hist
    WHERE history_creation > '||quote_literal(tstamp::TEXT)||'
    AND id='||recordid||'
    GROUP BY id
  ) AS m ON (m.id = h.id AND m.mintstamp = h.history_creation);';

  FETCH curs INTO vid;

  RETURN vid;
END;
$$
LANGUAGE plpgsql;

-- At this point you can just get ids like this:
-- records in client as of 2009-11-02 21:13:41.552601
SELECT select_hist_id('client', id, '2009-11-02 21:13:41.552601'::TIMESTAMP) FROM client;

I’m sure you can figure out the rest.
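
In case it helps, here is one way the rest might look: a sketch that joins the ids returned by select_hist_id back to the history table to get the full rows. It assumes the client and client_hist tables from the examples above.

```sql
-- Full history rows for every client as of 2009-11-02 21:13:41.552601
SELECT h.*
FROM client AS c
INNER JOIN client_hist AS h
  ON h.histid = select_hist_id('client', c.id, '2009-11-02 21:13:41.552601'::TIMESTAMP);
```

Note that select_hist_id returns NULL for a record with no history entry after the given time, so the inner join leaves those records out. For them the current row in client is already the state you are looking for.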

Be careful: this is very slow on large data sets. If you are planning to work on millions of records, you should consider building a history lookup function for each table, either defining data types for your RECORD or using OUT variables.

CREATE FUNCTION foo(recordid int, firstname OUT VARCHAR(10))
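
As a rough sketch of the per-table approach, a lookup function for client could return the row type PostgreSQL creates automatically for the client_hist table, rather than listing OUT variables one by one. The function name client_as_of is made up for illustration, and the sketch assumes client_hist has already been created by create_history.

```sql
-- Hypothetical per-table lookup: returns the first history row recorded
-- after the given time for one record, i.e. its state at that time
CREATE OR REPLACE FUNCTION client_as_of(INTEGER, TIMESTAMP)
RETURNS SETOF client_hist AS $$
  SELECT h.*
  FROM client_hist AS h
  WHERE h.id = $1                 -- record id
  AND h.history_creation > $2     -- time you are interested in
  ORDER BY h.history_creation
  LIMIT 1;
$$
LANGUAGE sql STABLE;

-- The result keeps its column structure when used in FROM
SELECT * FROM client_as_of(1, '2009-11-02 21:13:41.552601');
```

The ORDER BY with LIMIT 1 plays the same role as the min(history_creation) join in the query above, but because the table name is fixed the query doesn't have to be built and executed as a string.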