I’ve recently been investigating what it would take to host a hugely successful website. For example, something like MySpace.com.
It is a good example because of its 27.4 billion page views in April 2006 alone, and because of how well it performs despite that amount of traffic. It is a bad example because of its recent problems and downtime.
MySpace is one of the most popular websites on the net. In just a few years it has become so famous that it is frankly difficult to imagine how a company can possibly keep up with growth happening at such a fast pace.
Not long ago we all saw the site fall miserably to the ground, for what was alleged to be a simple power failure in its server farm. Let me repeat this, because it is a point that ought to be stressed: its server farm, singular.
I can’t believe that.
And it is in fact not true. As of the sixth of April, this was MySpace’s extensive IT architecture: 2,682 Web servers, 90 Cache servers with 16GB RAM, 450 Dart Servers, 60 database servers, 150 media processing servers, 1,000 disks in a SAN (storage area network) deployment, three data centers and 17,000MB per second of bandwidth throughput.
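To put those numbers in perspective, here is a rough back-of-envelope calculation in Python. It assumes that the April page views were spread perfectly evenly over the month and across the web servers, which is certainly not true in practice, so treat the results as averages and nothing more.

```python
# Back-of-envelope averages from the figures quoted above.
PAGE_VIEWS_APRIL = 27_400_000_000     # page views during April 2006
WEB_SERVERS = 2_682                   # web servers as of the sixth of April
SECONDS_IN_APRIL = 30 * 24 * 60 * 60  # April has 30 days


def average_load(page_views, seconds, servers):
    """Return (page views per second, page views per second per server)."""
    per_second = page_views / seconds
    return per_second, per_second / servers


per_second, per_server = average_load(PAGE_VIEWS_APRIL, SECONDS_IN_APRIL, WEB_SERVERS)
print(f"{per_second:,.0f} page views per second on average")
print(f"{per_server:.1f} page views per second per web server on average")
```

That works out to roughly 10,600 page views per second overall, or about 4 per second for each web server. Peak hours would of course be well above the average.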
Other impressive numbers about MySpace.com are the amounts of space it sets aside: 100TB for MP3s and videos, and an additional 200TB for dynamic content.
Three data centers. So what was alleged to be a simple power failure was clearly something more. Unless those three data centers all lost power at the same time. For a long time.
Nevertheless, MySpace.com deserves our admiration. It is, after all, a network that is, and has been, growing beyond any possible forecast. Roughly 250,000 new users per day.
Those failures are not, however, what you’d expect from a giant, a monster of the net.
Geographical mirroring is one of the fundamentals of hosting a successful website. If you really want to guarantee all your customers the constant availability of your website, you had better give it some serious thought.
Of course this feature does not come free, nor is it easy to implement: the content of your website needs to be replicated in real time (or as often as possible, as it happens) between all the mirrors you have scattered across the earth.
Even this would-be bullet-proof solution, however, has its weaknesses. With the current DNS it is, in fact, impossible to send a client directly to the relevant mirror. All the connections have to reach (“strike”) the same location (please correct me if I’m wrong here), ideally just a load balancer or router configured to forward the requests to the designated mirror. Even this solution (involving more than one load balancer or router, ideally an indefinite number of backup machines) can sometimes prove unreliable.
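The forwarding decision itself can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real load balancer is configured: the region codes, the host names and the `pick_mirror` function are all made up, and how the client's region or the health of a mirror gets determined is left out entirely.

```python
# Hypothetical mirror table: region codes and host names are invented.
MIRRORS = {
    "us": "us.mirror.example.com",
    "eu": "eu.mirror.example.com",
    "asia": "asia.mirror.example.com",
}
DEFAULT_REGION = "us"


def pick_mirror(client_region, healthy):
    """Return the host to forward a request to, or None if no mirror is up.

    client_region: region code of the client (how it is derived is out of scope).
    healthy: set of region codes whose mirror currently passes health checks.
    """
    # Prefer the client's own mirror, then the default, then any healthy mirror.
    candidates = [client_region, DEFAULT_REGION] + sorted(MIRRORS)
    for region in candidates:
        if region in MIRRORS and region in healthy:
            return MIRRORS[region]
    return None


print(pick_mirror("eu", {"us", "eu", "asia"}))  # designated mirror is up
print(pick_mirror("eu", {"us", "asia"}))        # falls back to the default
```

Note that the machine running this logic is itself a single point of failure, which is exactly the weakness described above.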
Another factor to consider when creating a geographically separated mirror of your website is that you’ll need (and you’ll pay for it, to be sure) more bandwidth than you’d normally use. A large part goes to serving the website to your users, the rest to replicating it across all the mirrors.
You don’t only need your content replicated all over the world to host flourishing websites; you also need massive “fire power” in each and every one of these locations.
There are two possible ways to go about this. The first is “develop, as fast as possible, an application that is not scalable and then, if your business is successful, bank on massive computational power”, and the second is “create scalable software so that when the time comes you’ll rise to the occasion without incurring excessive hardware costs”.
Both creeds work because, ideally, even if your application is not scalable, you’ll have enough success and enough money saved from the development to fix the pieces that need mending, thus effectively switching to plan B. I’ve talked about this in one of my previous posts (Critical Code Mass) and I’m certainly not going to repeat myself here.
The former approach saves you some costs at the beginning but might prove more expensive than the latter at an advanced stage. What most entrepreneurs would say is, as might be expected, let’s pick the first one and hope to sell the entire enterprise before it actually gets to the critical point.
Unfortunately, few buyers feel that way, and they have the annoying habit of putting your code through a due diligence process.
Enough already, let’s get back to the main subject.
We covered GM (geographical mirroring) but, as I said a couple of mouthfuls ago, GM is not enough. You also need a great deal of power behind each node.
You would, naturally, put a load balancer and an indefinite number of web servers in each mirror. And you’d discover at your expense that it is still not enough. You’d soon run into storage performance problems.
Once you’ve reached that stage, the solution is to switch to a SAN (Storage Area Network) architecture.
And now, after you’ve distributed your data over multiple servers in a SAN, you feel safe, you finally see the light at the end of the tunnel… Wishful thinking… You are wrong.
What you are hosting is a website, and, as such, it is bound to create troublesome “hot spots” on the storage system associated with the most frequented areas of the website.
The next natural step in this upward spiral is the virtualization of your storage space. This means that more than one server will be providing the same content and they will appear as a single storage device to their clients.
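One way to picture how spreading content over several servers relieves a hot spot is a toy striping scheme, sketched below in Python. It is only an illustration under my own assumptions and not a description of any real product: the server names, the file name and both functions are invented, and a file is simply cut into chunks that rotate across the servers.

```python
import hashlib
from collections import Counter

SERVERS = ["storage-0", "storage-1", "storage-2", "storage-3"]  # made-up names


def chunk_server(file_name, chunk_index, servers=SERVERS):
    """Map one chunk of a file to a storage server.

    The starting server depends on the file name; consecutive chunks then
    rotate through the servers, so a single file is striped over all of them.
    """
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[(start + chunk_index) % len(servers)]


def read_load(file_name, chunks, reads):
    """Count chunk reads per server when a file of `chunks` chunks is read `reads` times."""
    load = Counter()
    for _ in range(reads):
        for index in range(chunks):
            load[chunk_server(file_name, index)] += 1
    return load


# A very popular file: 8 chunks, read 1,000 times.
print(read_load("popular-profile.jpg", chunks=8, reads=1000))
```

With 8 chunks over 4 servers, every server ends up handling the same share of the reads, instead of one server taking all the traffic for the popular file.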
As with all the previously mentioned technologies, this can be achieved through hardware or software means. The most famous of these is the InForm® Operating System, owned by 3PAR, the firm which indeed serves MySpace.com (with both hardware and software solutions).
At this point you might argue that I’ve only discussed storage solutions and haven’t said a word about web servers and their limits. I decided not to do so because, regrettably, or luckily if you’re not the kind of geek who appreciates the implementation of big and complicated networks, there’s no final or more comfortable solution. You just need to buy more bandwidth, optimize your code and queries, and, of course, buy more servers to put behind the load balancers in each and every one of your mirrors.
Now then, you might think we’ve covered a lot of possible solutions. In fact, we’ve considered almost everything about MySpace’s architecture. Even so, this is just the beginning if you think about it.
MySpace is, as it happens, not yet providing its services to China, but it is looking at the feasibility of the venture.
Now that is a challenge, a challenge that I don’t feel like undertaking (even just in the form of a blog post, and it’s also too late). I’ll let MySpace.com do it and I’ll be following their progress and their architecture adjustments closely.
I’m sorry if I made any mistakes or if this article is in any way inexact. I’m not much of a technician and I don’t pretend to be an expert on net/hosting matters.
Oh, I’m also sorry for writing such long articles, but I can’t help myself: I like writing and I still find this sort of thing fascinating.
Take it easy,
Steph