Tag Archives: Memory

caching time series – continued

Last week I wrote a rather long post asking for suggestions on how to cache and access a time series stored in an array in memory.

After some more research I have decided that the solution proposed by Uzair in his comment to the original post with some minor tweaks.

My originial idea involved creating a map in memory with one entry for each point in my series and then lookup the item requested in there.
This is very efficient when requests come in for a time which is in my index, if the point is not in there I’d have to run a binary search on the whole index. Which as Anon pointed out in his comment is still very fast.

if your ordered index is in-memory.

even if you had 1TB of memory (quite a lot), and each memory access took you 1 microsecond (slow),

each search would only take worse case of 40 accesses (2^40 = 1 trillion)
which would be 40 microseconds. this is not very long at all.

assuming some more reasonable ball-park figures of 4GB of memory, and
200 nanosecond access times, gives you

32 accesses * 200 nanos = 6 microseconds

this is over 150,000 searches per second.

The index will be a map of maps. Basically I’ll distribute a number of entries in my index at regular intervals. Each entry will contain a map of entries that preceded it in my time series.
Since I will know the first and last point in the series and the distribution of my regular entries *z* when a request *x* comes in I will be able to work out where is my next regular index entry *y*)

y = x – mod(x/z) + z

Once I have retrieved the map contained in my regular entry in the index I can proceed and lookup the value requested in there. If it’s not a point in my map I will still have to do a binary search but on a much smaller dataset – which I should be able to work on locally without having to rely on memcached over the network. The only “downside” of this solution is that I’ll have to make at least 2 lookups in my index.

Tagged , , , , , ,

Get every new post delivered to your Inbox.