Nov 26

What do you need to know? Java Enterprise Edition. That’s the simple answer.
If you read between the lines what I’m really saying is: “You better make sure you really don’t give a flying toss about technology and don’t care about how frustrating what you spend most of your day doing is.”

A few days ago I was talking with a friend who’s in the process of interviewing with a few big financial institutions in London. The most common interview topics, according to him: Struts and JEE.
Spring was also mentioned, but I don’t have much against Spring. If used properly it really does keep your code tidy and avoids duplication.

You see a bank’s idea of a high performance reliable system is a monolithic Java application running a million-billion threads at the same time on a huge machine. Maybe that’s a bit too harsh. Sometimes they do distribute. They just add 15 layers shooting messages at each other in between the 2 they actually need.
Then they wonder why the best talent winds up working for Google, and their budget to deliver 1 application is twice what Facebook spent to build its entire social network.

What I have a problem with is the architects who design these systems. And the clearly-drunken bank managers who hired them.

Dumb and Dumber

The most common profile for architects in investment banks (and I’ve met quite a few) is 50-something, stopped following/reading about technology in 1995 and, prior to being employed by Mr Mighty WankBank (thanks City Boy for the name), was VP of over-engineering at the Apache Software Foundation (focused on Struts).
The first thing on their TODO list is to hire some bright spark straight out of university who has a keen interest in over-engineering and has never had to deliver anything in less than a geological era.
That’s when an overpaid team of consultants comes in to “sort-it-out”, at which point the bank loses control over its source-code and none of their developers can understand it any more.

Oh, and then there’s the process, oh the bureaucracy.

Fortunately I also know that in London there are a few sunny spots (technology-wise), such as hedge funds that have started using Erlang (for example) to distribute their heavy computations over a network.

Now don’t get me wrong, I have nothing against JEE per se. Some of the libraries are actually very useful and improve a developer’s quality of life quite a lot. Just think about JTA.
What is wrong is the way they are used by people who never left the nineties.

Nov 21

While skimming through my RSS feed I stumbled upon an interesting article on Ajaxian about server side JavaScript programming.

I really, really enjoy writing JavaScript code so I decided to read through the article and take a look at the node.js library linked there.

node.js logo

Node is a framework to build server-side event-driven JavaScript applications. Developed in C++ on top of Google’s V8 JavaScript engine and accompanied by a set of JavaScript libraries Node seems to make building distributed (over a network), fast applications a piece of cake even for inexperienced developers.
Event-driven here really is the keyword because it represents a big change in the way applications are built (and architected in your brain).

Node is similar in design to and influenced by systems like Ruby’s Event Machine or Python’s Twisted. Node takes the event model a bit further: it presents the event loop as a language construct instead of as a library. In other systems there is always a blocking call to start the event-loop. Typically one defines behavior through callbacks at the beginning of a script and at the end starts a server through a blocking call like EventMachine::run(). In Node there is no such start-the-event-loop call. Node simply enters the event loop after executing the input script.
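A minimal sketch of what “entering the event loop after executing the input script” means in practice. It only uses the standard setTimeout timer; the `order` array is just an illustrative name and is not taken from Node’s documentation.

```javascript
// Records the order in which things happen.
const order = [];

// Register a callback. Nothing blocks here: the timer is only scheduled.
setTimeout(function () {
  order.push('timer callback');
  console.log(order.join(' -> '));
}, 0);

// The script carries on to its last line before any callback can run.
order.push('end of script');
```

There is no explicit “run” call anywhere: the callback fires only once the script itself has finished, because that is when the event loop takes over.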

The example below (taken from Node’s website) will make everything clear… hopefully.

var sys = require('sys'),
    http = require('http');

http.createServer(function (req, res) {
  setTimeout(function () {
    res.sendHeader(200, {'Content-Type': 'text/plain'});
    res.sendBody('Hello World');
    res.finish();
  }, 2000);
}).listen(8000);
sys.puts('Server running at http://127.0.0.1:8000/');

Can you see the beauty?!

I have only one worry. This is an open-source effort, and the community behind it on Google Groups is just 181 members strong (so far). What if node.js suddenly stops being the cool thing and the community disappears?

As much as I love writing JavaScript and I can really see the value in what they are building I’ll still wait until there are 100 Google Groups for node.js and a trillion members in each before using it in anything close to a production system.

Having said that I’m going back to writing silly node.js apps now. Well done to all the developers involved in the project and keep it up!

Nov 18

When I started developing TweetSentiment I decided that the interface should have as little text as possible. Most of the information I was interested in could be displayed graphically, with a chart.

So I looked at all the options available for chart generation.

  1. Backend code to generate a static image (JFreeChart or PHP)
  2. Flash object to draw a chart retrieving the data from a URL
  3. Draw charts in JavaScript directly on the client’s browser

I’m not a huge fan of option one, primarily because TweetSentiment is hosted on a tiny Linux box which would not be able to handle the load for the traffic the site gets. A static image is also just not very funky: no interaction is possible.

Option two would certainly create spectacular looking charts but I have almost no experience with flash and I wasn’t about to start learning a new language/technology. Plus I’m not into browser plugins if I can avoid them.

JavaScript is a language I’m familiar with and I remember seeing some cool-looking charts generated with Dojo. Unfortunately for TweetSentiment I had used jQuery, since the most important thing for me there was DOM manipulation (and jQuery is just better for that).
I then started shopping around for jQuery plugins to generate charts. There are a few around but none of them impressed me. They just weren’t as good-looking as I’d hoped, nor were they interactive.

By coincidence I stumbled on RaphaelJS, a JavaScript library for drawing Scalable Vector Graphics directly from JavaScript. It is a standalone library rather than a jQuery plugin, but the two work fine side by side. I tested the samples on the website with a few browsers and I was happy to discover it worked just fine with all of them.

Scalable Vector Graphics (SVG) is a family of specifications of an XML-based file format for describing two-dimensional vector graphics, both static and dynamic (i.e. interactive or animated).

The SVG specification is an open standard that has been under development by the World Wide Web Consortium (W3C) since 1999.

I also discovered that there is a charting library built on top of RaphaelJS, which is exactly what I was looking for. However, being a geek, I decided to go ahead and try to develop something on my own. You know, just for kicks.

As I delved deeper into RaphaelJS I found the library to be incredibly powerful. It’s a shame that the documentation provided on the website lets it down a bit.
The most powerful bit is the ability to extend objects and attach new functions to them. Something scarcely mentioned in the available documentation.

For example, if you need to use curved lines (paths, as SVG calls them) you can define a function once and then call it from your code on any element, simply by adding it to the el “object” in RaphaelJS:

Raphael.el.curveTo = function () {
  var args = Array.prototype.splice.call(arguments, 0, arguments.length),
  d = [0, 0, 0, 0, "s", 0, "c"][args.length] || "";
  this.isAbsolute && (d = d.toUpperCase());
  this._last = {x: args[args.length - 2], y: args[args.length - 1]};
  return this.attr({path: this.attrs.path + d + args});
};

Another very useful function I found in one of their samples is andClose(). It closes a polygon you have started drawing with paths: no matter where you got to, it will reconnect to the initial point.

Raphael.el.andClose = function () {
  return this.attr({path: this.attrs.path + "z"});
};

It can then be chained onto other calls like this:

RaphaelJSElement.lineTo(x, opts.height - bottomgutter).andClose();
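Chaining works because each helper returns the element it was called on. Here is a minimal sketch of that pattern using a plain stand-in object instead of a real RaphaelJS element. `PathSketch` and its methods are hypothetical names made up for this illustration, not part of the library.

```javascript
// Hypothetical stand-in for a drawable element; not part of RaphaelJS.
function PathSketch() {
  this.path = '';
}

// Each method appends an SVG path command and returns `this` so calls can be chained.
PathSketch.prototype.moveTo = function (x, y) {
  this.path += 'M' + x + ',' + y;
  return this;
};
PathSketch.prototype.lineTo = function (x, y) {
  this.path += 'L' + x + ',' + y;
  return this;
};
// "z" is the SVG command that closes the current sub-path back to its start.
PathSketch.prototype.andClose = function () {
  this.path += 'z';
  return this;
};

const triangle = new PathSketch().moveTo(10, 10).lineTo(90, 10).lineTo(50, 80).andClose();
console.log(triangle.path);
```

The real helpers get the same effect by returning the result of this.attr(...), which hands back the element itself.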

I’m still developing the chart library I used in TweetSentiment and I’m planning to publish it here with some documentation under MIT licence.

Nov 15

Every single journalist on the interwebs has already chanted praises for this game. And I intend to do exactly the same.

Last week I optimistically bought a copy of the PC version of Dragon Age. Optimistically because I realised only after the purchase that my PC at home was an old AMD 2.2GHz with 1GB of RAM and a puny GeForce 7900 GTX (or something like that).

It took some compromising but the game runs. Rather than my computer being able to deal with it, I suppose it’s the game caring for my PC as you would for a senile pensioner.

Nevertheless I played for a few hours yesterday and I have this to say. They got it just right.

Dragon Age Battle Screenshot

I was a huge fan of the original Baldur’s Gate series and played it over and over. Mostly because I was at school at the time and spare time was plentiful and cheap.
Why was I a fan? Because of the storyline. It was stupendously enthralling.
After BG I stuck with BioWare and bought Neverwinter Nights. Everybody raved about it yet I didn’t like it. Very little control over the whole party. And the little you had was spoiled by a slightly awkward control system.

The toolset they released with Neverwinter Nights was also mightily impressive. You could see that was the start of a revolution in the way RPGs were played and content was created for them. Unfortunately, like most first attempts, it lacked functionality and flexibility.
This, as far as I’m concerned, also put a cap on the game designers’ creativity when they were writing the main campaign story.

I’m glad to see they have worked all these things out. Everything I didn’t like in Neverwinter Nights is now fixed in Dragon Age.
The game just flows.
This is thanks to a beautifully written and very compelling story, a more intuitive and usable control system and a fantastic toolset, which I believe put the power back in the game designers’ hands and allowed them to really go to town with the storyline.

It’s a fantastic game. If you don’t know it watch the trailer video, buy it, lock yourself in your room and start playing.

Nov 11

I have just stumbled on an article on Slashdot about Google Go, a programming language which was initially developed internally and has now been open-sourced.

Over the past few months I have been working with Erlang on multi-process distributed systems. Concurrency in Go is based on the same model Erlang uses: Hoare’s Communicating Sequential Processes, or CSP.

Go’s concurrency primitives derive from a different part of the family tree whose main contribution is the powerful notion of channels as first class objects.

The thing I appreciate the most about Erlang is how easy it is to distribute (even over a network) and monitor processes. After skimming through Go’s documentation I couldn’t help but notice the lack of integrated network distribution of processes, or goroutines as they are called.

This for me means that while Go, exactly like Erlang and Occam, simplifies the approach to multi-threaded programming (at least from a developer’s point of view), it doesn’t go all the way and leaves developers to deal with distribution of processes over a network on their own.

This is a perfectly acceptable design choice in the name of flexibility. However, it creates a whole new bunch of use cases for developers to deal with, such as sending channels (or channel data) over the network.

What I don’t like about Erlang is its handling of strings, which makes it unsuitable for a number of applications that could benefit greatly from its multi-processing capabilities. Very few people would recommend Erlang as a high performance string manipulation language. To quote Sendmail’s case study in implementing their Sendmail load balancing “Client Daemon” in Erlang:

…But Erlang’s treatment of strings as lists of bytes is as elegant as it is impractical. The factor-of-eight storage expansion of text, as well as the copying that occurs during message-passing, cripples Erlang for all but the most performance-insensitive text-processing applications.

Go totally wins on this.

If you now asked me to pick one to use, I’d still go for Erlang.
Functional languages and CSP are changing the way big companies approach the development of business critical applications. In the last year alone I have seen the number of positions advertised for Erlang developers in the financial sector multiply.
Go, if it continues on this path, definitely has a chance of making it there. Open-sourcing the language is the right decision: it gives it the exposure to real-life scenarios that tends to accelerate growth and adoption.

Nov 05

Perhaps I’m bored. Maybe I’m just too single.
Anyway. A couple of days ago I was looking at StockTwits again and saw a considerable amount of activity. There and then I decided that it would be pretty cool to check whether there was any correlation between the activity about a stock on Twitter and the actual trading volume on the market.

I set off to build a small system to do just that. A couple of Groovy scripts to collect the data and save it, plus some JavaScript and HTML to display it.

So here I am a couple of days later talking about TweetSentiment.

Collecting the data I needed was pretty trivial thanks to the fact that StockTwits asks all its members to tweet the symbols they are trading preceded by a dollar sign ($).
My first task was to build a list of securities I was interested in. I decided to go to Covestor.com and pick the most traded securities list. Once that list was ready, all I had to do was call the Twitter Search API for each stock and store the results.
Groovy made these tasks incredibly simple.

def output = new URL("http://search.twitter.com/search.atom?q=${"\$"+curSecurity.symbol}&rpp=100").getText()

def parsedXml = new XmlParser().parseText(output)

parsedXml.entry.each() { curEntry ->
  // work on each tweet (curEntry) here, e.g. count it and look for "buy"/"sell"
}

I’m not doing much with the tweets. At the moment I just count them and look for “buy” and “sell”. I’m looking into smarter ways of analysing the text to detect positive/negative opinions. If you have any suggestions on this please do leave a comment.
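To make the counting concrete, here is a rough sketch of that kind of keyword tally. It is written in JavaScript rather than the Groovy used by the real scripts, and the function name, the whole-word matching and the sample tweets are all assumptions made for illustration.

```javascript
// Counts tweets and how many of them mention "buy" or "sell" as whole words.
function countSignals(tweets) {
  const counts = { total: 0, buy: 0, sell: 0 };
  for (const text of tweets) {
    counts.total += 1;
    if (/\bbuy\b/i.test(text)) counts.buy += 1;
    if (/\bsell\b/i.test(text)) counts.sell += 1;
  }
  return counts;
}

// Made-up sample tweets.
const sample = [
  'Time to buy $AAPL?',
  'Going to SELL $GOOG today',
  'Watching $MSFT, no position',
  'Buy low, sell high on $AAPL'
];
console.log(countSignals(sample));
```

A tally like this cannot tell “don’t buy” from “buy”, which is exactly why something smarter is needed for real sentiment.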

Next in my TODO list was getting some market data. For that I’m using Yahoo finance asking for previous closing price and average volume for each security. Once again, piece of cake with Groovy. Same thing as before just different URL.
Yahoo finance actually has quite a simple interface to let you grab the data. Check out this page on Cliff’s Notes.

Now I had all the data I needed. Only thing left was to build some sort of interface to look at it. I wanted to display all of the data on a chart to be able to make sense of it as quickly as possible.
My server (i.e. the machine running this blog), with its puny power, could never have handled the traffic I get while generating charts. Therefore what I decided to do is to export all the data in text files (JSON format) and do all of the chart generation with JavaScript.
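A sketch of how an exported file might be consumed on the client. The field names (`labels`, `tweets`) and the scaling helper are assumptions; the post does not show the actual file format or drawing code.

```javascript
// Assumed shape of one exported file; the real TweetSentiment format is not shown in the post.
const exported = JSON.parse('{"labels":["Mon","Tue","Wed"],"tweets":[10,40,25]}');

// Maps a series onto pixel coordinates for a chart of the given size.
// The largest value touches the top edge (y = 0); zero sits on the bottom edge.
function toPoints(values, width, height) {
  const max = Math.max.apply(null, values);
  const step = values.length > 1 ? width / (values.length - 1) : 0;
  return values.map(function (v, i) {
    return { x: i * step, y: max === 0 ? height : height - (v / max) * height };
  });
}

const points = toPoints(exported.tweets, 800, 250);
console.log(points);
```

All the server has to do is write static files. The scaling and drawing cost is paid by each visitor’s browser.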

I browsed the web for a bit looking for charting components for jQuery. Couldn’t find anything I was happy with. Not functionality-wise but aesthetically. Or maybe I just felt like playing around with JS for some time – unfortunately as it often happens with JS I spent the best part of the last 2 days working on it.

I decided to use a library called Raphaël. This library makes working with vector graphics incredibly simple and the results just look amazing. I know that they are in the process of building a charting library on top of it however… well I have no excuse now other than I wanted to work with JS.

What I did is build a very simple jQuery component called SAPchart. It only draws line charts but it’s quite simple and I think the results look pretty good.

You can download the source here. It requires jQuery, Raphaël, and raphael path methods.

// Constructor. Possible options are:
// showGrid: true
// width: 800
// height: 250
// legendWidth: 200
// showLabels: true
// showDots: true
$("#holder").SAPchart({legendWidth: 800});

// Give the library a list of labels for the x axis
$("#holder").setLabels(theLabels);

// Add to the chart as many series as you want, provided each series has
// the same length as the labels (quite unsophisticated but serves its purpose).
// Arguments: array of values, unique id of the series, legend name, additional options
$("#holder").addSeries(theTweets, "tweet", "Twitter Activity", { "color": [.6, 1, .75] });
$("#holder").addSeries(theVolumes, "volume", "Average Daily Volume", { "color": [.2, 1, .75] });

// Draws the chart in your html element
$("#holder").draw();

// This will return an HTML table containing the legend of all the series you specified
// the boolean parameter tells me whether you want your legend to be horizontal (default vertical)
$("#holder").getLegend(true).appendTo($("#legend"));

I’m now wondering whether I should keep working on it and improve it.

Anyway TweetSentiment is now available here. I’m planning to keep it running and collecting data for a while. I’d say you need at least three or four months’ worth of data before you can make any useful observations.
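Once enough data is in, the correlation I set out to check could be computed along these lines. This is only a sketch: the Pearson coefficient is my assumption about a reasonable first measure, and the figures are made up.

```javascript
// Pearson correlation coefficient between two equally long numeric series.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error('series must have the same length and at least two points');
  }
  const n = xs.length;
  const meanX = xs.reduce(function (a, b) { return a + b; }, 0) / n;
  const meanY = ys.reduce(function (a, b) { return a + b; }, 0) / n;
  let cov = 0, varX = 0, varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  if (varX === 0 || varY === 0) return NaN; // a flat series has no defined correlation
  return cov / Math.sqrt(varX * varY);
}

// Made-up daily figures for one symbol.
const tweetCounts = [10, 20, 30, 40];
const volumes = [1000, 2000, 3000, 4000];
console.log(pearson(tweetCounts, volumes));
```

A value close to 1 or -1 would hint at a relationship, while a value near 0 would suggest the chatter and the volume move independently.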

Nov 03

If you Google for this you’ll find that the easiest way to provide a full audit of everything that happened in your database is to create a duplicate of each table you need history for, and insert a copy of the record you are changing in there on every update.

For example if you have a table called client

CREATE TABLE client (
  id INTEGER,
  firstname VARCHAR(255),
  lastname VARCHAR(255),
  datecreated TIMESTAMP
);

You will also create a table called client_history

CREATE TABLE client_history (
  id INTEGER,
  firstname VARCHAR(255),
  lastname VARCHAR(255),
  datecreated TIMESTAMP,
  history_creation TIMESTAMP,
  history_user INTEGER -- A foreign key to the user who updated this record if required
);

You’ll then attach an onUpdate trigger to the client table inserting all the data in your history table. This solution gives you the ability to keep track of everything that happens in your database, when it happened and who-done-it.
What this solution doesn’t deal with is schema changes. Also querying the state of a record at a point in time to join it with other tables might be a bit tricky.

Let’s start from the first point. Schema changes.
The most obvious and simple solution is to put in place a proper process for all database schema changes. i.e. whoever touches the schema is also in charge of updating the history table structure and the trigger on the main table. Doable but a bit too prone to human error if you ask me.

What I generally do is use a small function/stored procedure to create the history on a table. This procedure is in charge of both creating the history table the first time it’s called and updating the structure if the main table has changed.
In this example I’m going to use PostgreSQL. First because it’s a database I’m familiar with and, more importantly, because being open-source you can just download it and try this yourself.

My example here is a bit verbose but it serves its purpose. There are many database-specific instructions you could use to extract a table structure or perform some of the simple tasks of this function. However, what I’m trying to demonstrate is the concept and not PostgreSQL trickery.
Most relational databases store the schema information in internal tables (pg_attribute, pg_class and pg_namespace in this case) so that’s what we are going to use to read the structure of our original table.

-- This function expects as a parameter the name of the table you want to
-- create history for and a unique identifier for the record.
-- In my case all tables have an ID column. Alternatively you could also pass
-- the name of the unique id column as a parameter
CREATE OR REPLACE FUNCTION create_history(tablename VARCHAR(255), recordid INTEGER) RETURNS BOOLEAN AS $$
DECLARE
  vhisttablename NAME; -- history table name
  vhistfieldcount INTEGER;  -- number of fields in history table
  vtablefieldcount INTEGER; -- number of fields in main table
  vtmprowcount INTEGER; -- temporary variable to store query results
  vcurfield RECORD; -- variable to loop over fields of main table to create history
  vhisttablesql TEXT; -- sql to create history table
  vfieldlist TEXT; -- list of fields in main table
  vhisttablefields TEXT; -- list of fields in history table
  vhistinsertsql TEXT; -- SQL for insert statement
  vtmptablename VARCHAR(255); -- temp variable to check if history exists
BEGIN
  vhisttablename := tablename||'_hist';

  -- Check if the history table exists, if not we need to create it
  SELECT INTO vtmptablename relname
  FROM pg_class
  WHERE relname = vhisttablename;

  vfieldlist := '';
  vhisttablefields := '';

  -- count the fields in the history/current table if history exists
  IF vtmptablename IS NOT NULL
  THEN
    SELECT INTO vhistfieldcount count(*)
    FROM pg_attribute AS a
    INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
    INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
    WHERE a.attnum > 0
    AND c.relname = vhisttablename -- history table name
    AND NOT a.attisdropped
    AND pg_table_is_visible(c.oid);

    SELECT INTO vtablefieldcount count(*)
    FROM pg_attribute AS a
    INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
    INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
    WHERE a.attnum > 0
    AND c.relname = tablename -- main table
    AND NOT a.attisdropped
    AND pg_table_is_visible(c.oid);
  END IF;

  -- Get all the attributes and their type from the original table
  FOR vcurfield IN
  SELECT a.attname AS colname, format_type(a.atttypid, a.atttypmod) AS datatype
  FROM pg_attribute AS a
  INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
  INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
  WHERE a.attnum > 0
  AND c.relname = tablename
  AND NOT a.attisdropped
  AND pg_table_is_visible(c.oid)
  LOOP
    -- populate lists of fields both for history creation and select from main table
    vhisttablefields := vhisttablefields||vcurfield.colname||' '||vcurfield.datatype||' NULL, ';
    vfieldlist := vfieldlist||vcurfield.colname||', ';

    -- If the history table exists and the number of fields is different
    -- from the main table (+3 as we add a timestamp, a user field and an history unique id)
    IF vtmptablename IS NOT NULL AND vtablefieldcount+3 <> vhistfieldcount
    THEN
      -- make sure that this is the missing field in the history table
      PERFORM a.attname
      FROM pg_attribute AS a
      INNER JOIN pg_class AS c ON (c.oid = a.attrelid)
      INNER JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
      WHERE a.attnum > 0
      AND c.relname = vhisttablename
      AND NOT a.attisdropped
      AND a.attname = vcurfield.colname
      AND pg_table_is_visible(c.oid);

      GET DIAGNOSTICS vtmprowcount = ROW_COUNT;

      -- If it is then generate the SQL and execute the alter table
      IF vtmprowcount = 0
      THEN
        vhisttablesql := 'ALTER TABLE '||vhisttablename||' ADD COLUMN '||vcurfield.colname||' '||vcurfield.datatype||' NULL;';
        EXECUTE vhisttablesql;
      END IF;
    END IF;
  END LOOP;

  -- The history table doesn't exist. Create it
  IF vtmptablename IS NULL OR vtmptablename = ''
  THEN
    vhisttablesql := 'CREATE TABLE '||vhisttablename||' ( histid SERIAL PRIMARY KEY, ';

    vhisttablesql := vhisttablesql||vhisttablefields||' history_creation TIMESTAMP NOT NULL DEFAULT now() );';

    EXECUTE vhisttablesql;

    -- create indexes
    vhisttablesql := ' CREATE INDEX idx_'||vhisttablename||'_1 ON '||vhisttablename||' (id); ';
    EXECUTE vhisttablesql;

    vhisttablesql := ' CREATE INDEX idx_'||vhisttablename||'_2 ON '||vhisttablename||' (history_creation); ';
    EXECUTE vhisttablesql;
  END IF;

  -- Proceed with the history creation
  vhistinsertsql := 'INSERT INTO '||vhisttablename||' ('||vfieldlist||' history_creation) SELECT '||vfieldlist||' now() FROM '||tablename||' WHERE id='||recordid||';';

  RAISE NOTICE 'Executing %', vhistinsertsql;
  EXECUTE vhistinsertsql;

  GET DIAGNOSTICS vtmprowcount = ROW_COUNT;
  IF vtmprowcount > 0
  THEN
    RETURN TRUE;
  ELSE
    RETURN FALSE;
  END IF;
END;
$$
LANGUAGE plpgsql;

You’ll notice this function only adds fields to the history table. That’s because even if we remove a field from the original table we still want to keep track of it in our history.

Now on to querying the history to retrieve the state of a record at a specific date.

This is a bit tricky. As far as I know, at the moment only Oracle has the ability to return a “virtual table” from a function. PostgreSQL can return a RECORD variable (which is great to use inside functions, but once outside it loses its structure and turns into a comma-separated list of values) or a %ROWTYPE you can define.
This unfortunately means that you’d have to create a custom function for each history table, returning SETOF client in our case.

What we are trying to achieve is something like what this query does.

SELECT h.id, firstname, lastname, datecreated, history_creation
FROM client_hist AS h
INNER JOIN (
  SELECT min(history_creation) AS mintstamp, id AS id
  FROM client_hist
  WHERE history_creation > THE-TIME-YOU-NEED
  AND id= YOUR-RECORD-ID
  GROUP BY id
) AS m ON (m.id = h.id AND m.mintstamp = h.history_creation);

Basically the next record in the history after the time specified.
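The same selection rule, spelled out as a small in-memory JavaScript sketch for readers who find the nested query hard to follow. The row shape mirrors the history table, but the function name and the sample rows are made up for illustration.

```javascript
// For one record id, pick the earliest history row created after the requested time.
function stateAfter(historyRows, recordId, time) {
  let best = null;
  for (const row of historyRows) {
    if (row.id !== recordId) continue;
    if (row.history_creation <= time) continue;
    if (best === null || row.history_creation < best.history_creation) best = row;
  }
  return best; // null when nothing was recorded after `time`
}

// Made-up rows; timestamps are ISO strings, which compare correctly as text.
const rows = [
  { histid: 1, id: 7, firstname: 'Ann', history_creation: '2009-11-01T10:00:00' },
  { histid: 2, id: 7, firstname: 'Anne', history_creation: '2009-11-02T09:00:00' },
  { histid: 3, id: 8, firstname: 'Bob', history_creation: '2009-11-01T12:00:00' }
];
console.log(stateAfter(rows, 7, '2009-11-01T18:00:00'));
```

The inner SELECT with min(history_creation) plays the role of the loop, and the join back onto the history table fetches the full row for that timestamp.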

At this point so far as I can see we have two options. Either create a function that returns a specific type of ROW, or write a more generic history function to return only the id of the history record we need and not the whole row (If you look at the history creation function we are putting a histid column in there).

You could use this function in your queries to get the history record id out like this.

-- This function returns the histid you need to look at in your history table
CREATE OR REPLACE FUNCTION select_hist_id(tablename VARCHAR(255), recordid INTEGER, tstamp TIMESTAMP) RETURNS INTEGER AS $$
DECLARE
  curs REFCURSOR;
  vid INTEGER;
BEGIN
  -- Build the query dynamically so it works for any <tablename>_hist table
  OPEN curs FOR EXECUTE 'SELECT h.histid
  FROM '||tablename||'_hist AS h
  INNER JOIN (
    SELECT min(history_creation) AS mintstamp, id
    FROM '||tablename||'_hist
    WHERE history_creation > '''||tstamp||'''
    AND id='||recordid||'
    GROUP BY id
  ) AS m ON (m.id = h.id AND m.mintstamp = h.history_creation);';

  FETCH curs INTO vid;
  -- release the cursor, this function may be called once per row
  CLOSE curs;

  RETURN vid;
END;
$$
LANGUAGE plpgsql;

-- At this point you can just get ids like this
-- records in client as of this time 2009-11-02 21:13:41.552601
SELECT select_hist_id('client', id, '2009-11-02 21:13:41.552601'::TIMESTAMP) FROM client;

I’m sure you can figure out the rest.

Be careful: this is very slow on large data sets. If you are planning to work on millions of records then you should consider building a history lookup function for each table, either defining data types for your RECORD or using OUT variables.

CREATE FUNCTION foo(recordid int, firstname OUT VARCHAR(10))
Nov 02

I have been interviewing quite a few people recently for a position as Java developer.
Every single CV came with either a Struts or Spring mention. If not both. Oh yes, I have even discussed systems where both frameworks were used at the same time.

Of course it’s your system, you are free to use whichever technology you prefer. However, you must also be able to explain why you picked that particular framework. And “It wasn’t me who picked it” is not a good answer. If you had a better idea why didn’t you question the decision?

Perhaps my problem is not exactly with MVC frameworks but with the people out there who use them without thinking things through. These are the sort of people who go around telling you that they used both Spring and Struts because it’s an “Architectural pattern”. The truth is that they haven’t thought about their architecture at all. They just thought that by using an MVC framework the problem would solve itself.

Yesterday I was reading a post by Albert Wenger at Union Square Ventures about Loose Coupling. I’m not going to repeat here what Albert already explained quite thoroughly. I will just say I agree with everything he said. Especially with this paragraph:

So how does loose coupling enable scaling and innovation?  Scaling bottlenecks tend to move around as a site or service grows.  With loose coupling it is often possible to isolate a bottleneck and prevent it from slowing down everything else.

This also points to how loose coupling enables innovation.  Different teams can more easily be in charge of their own part and make changes to it more quickly.  Entirely new parts can be added more easily too (a great example here is Facebook adding chat).

MVC frameworks themselves do not prevent loose coupling as an architectural decision. The problem is that most of the people who use them are the sort of over-engineering maniacs who try to build the perfect system in one big blob of code and, inevitably, fail.
So here’s my point. Using an MVC framework should not be considered an architectural decision. Architecture is at a much higher level.

Most architecture diagrams nowadays show systems split horizontally.

TWING MVC structure

This may allow your application to support higher load, but doesn’t give you the ability to isolate bottlenecks in your application. I believe what we’ll see in the future is architecture diagrams that are split vertically with every single component (or task your app has to perform) being completely independent from the rest, and feel free to use the MVC of your choice in each component. Just make sure you know why you decided to use it.

Nov 01

Not just because it’s vital for your business.
For most production systems the speed bottleneck lies with accessing your data. Databases are excellent for storing all your data and keeping it organised. However, when it comes to getting it out quickly, especially if you have lots of it, they are not the sharpest of tools.

That’s why I always try to press home the importance of keeping on top of your data.

Even though you think all your data is neatly organised in your perfectly structured database it keeps changing shape, or rather the understanding your database has of your data keeps changing.
For example in PostgreSQL the “shape” of your data is stored in a system catalog called pg_statistic. Data is collected and stored there by ANALYZE.

The query planner uses the data collected in pg_statistic to pick the most efficient way to run your queries. Unfortunately no system is perfect, and even the best planner makes mistakes. You have to strike a balance between letting ANALYZE collect as much data as possible to give your database a better understanding of your data and keeping it slim enough for it to be quick.

So as much as you can trust machines I suggest you try to keep on top of your data yourself.

There’s a couple of very simple ways to do that.

First. Keep an eye on your database logs, exactly like you do with your webserver logs. There’s a few open-source applications that can help you do that like Enterprise Postgres Query Analyser.
This will give you a basic understanding of which query/ies you will have to focus on.

Once you have an idea of which ones are the slowest queries you should keep an eye on the explain plan at regular intervals. In PostgreSQL I have a scheduled job that every day runs “EXPLAIN ANALYZE” on the heaviest queries in my system and compares the output with the previous day’s.
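The comparison step of such a job could look roughly like the sketch below. It is not my actual job: the function names, the “more than `factor` times slower” rule and the plan snippets are assumptions made for illustration, and it only looks at the total runtime line that EXPLAIN ANALYZE prints at the end of its text output.

```javascript
// Pulls the total execution time (in ms) out of EXPLAIN ANALYZE text output.
// Older PostgreSQL releases print "Total runtime:", newer ones "Execution Time:".
function runtimeMs(explainOutput) {
  const match = /(?:Total runtime|Execution time):\s*([\d.]+)\s*ms/i.exec(explainOutput);
  return match ? parseFloat(match[1]) : null;
}

// Flags a query whose runtime grew by more than `factor` since the previous run.
function hasRegressed(previousOutput, currentOutput, factor) {
  const before = runtimeMs(previousOutput);
  const after = runtimeMs(currentOutput);
  if (before === null || after === null) return false;
  return after > before * factor;
}

// Made-up plan snippets for illustration.
const yesterday = 'Seq Scan on client  (cost=0.00..35.50 rows=2550 width=4)\nTotal runtime: 12.5 ms';
const today = 'Seq Scan on client  (cost=0.00..35.50 rows=2550 width=4)\nTotal runtime: 40.0 ms';
console.log(hasRegressed(yesterday, today, 2));
```

Comparing the plan nodes themselves (for example an index scan turning into a sequential scan) is usually more telling than timings alone, but the runtime is the easiest thing to alert on.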

It’s a lot of work, and you are probably better off confronting these problems as they come up rather than wasting valuable development time creating the most optimised database ever.
Setting up auto-vacuuming properly will keep you safe for a long while.
