Archive for the ‘Miscellaneous’ Category

In-memory databases

Thursday, May 27th, 2010

There’s been a recent rise in interest in “in-memory databases”. The reasoning given is that the cost of syncing commits to disk is high, and that this is the bottleneck in write operations: ACID databases require that a commit is confirmed written to disk, which often actually requires two or more disk writes, each with a seek penalty of a few milliseconds. Therefore, on-disk databases struggle to commit more than a few hundred updates per second, unless you invest in large and very expensive RAID stripe sets.
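To put rough numbers on that, here is a back-of-the-envelope sketch. The seek times and the two-writes-per-commit figure are illustrative assumptions rather than measurements, and the function name is made up for this example. It also assumes commits are written strictly one after another on a single disk, with no batching of several commits into one write.

```python
def max_commits_per_second(seek_ms, writes_per_commit):
    """Upper bound on serial commits per second when every write costs one seek."""
    return 1000.0 / (seek_ms * writes_per_commit)


# Assumed figures: two disk writes per commit, seeks of 4 ms and 2 ms.
print(max_commits_per_second(seek_ms=4.0, writes_per_commit=2))  # 125.0
print(max_commits_per_second(seek_ms=2.0, writes_per_commit=2))  # 250.0
```

Either way, the ceiling sits in the low hundreds of commits per second for one disk, which is where the “few hundred updates per second” figure comes from.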

Reads aren’t an issue, as every disk-based database caches data in memory. If your database is small enough to fit in memory, or access to it is mainly concentrated on a subset that’s small enough to fit in memory, reads are just as fast as with any in-memory database. It’s writes that are the issue, and an in-memory database can update records very quickly indeed.

However, in-memory databases suffer a downside: if you reboot the server for any reason, everything is lost. Therefore they often offer the facility to snapshot the state to disk periodically, and to restart from a saved disk snapshot; in the event of failure, only recent updates are lost. Some go further and offer the ability to log updates since the last snapshot to a file, too, so that they can be replayed on top of the snapshot. People who cannot afford to lose an update, ever, can even request that the log file is synchronised after each update is logged to it, producing the reliability levels of a disk-based database – and the same performance, because you now have a disk-based database, after all.
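As a minimal sketch of the snapshot-plus-log scheme just described, here is a toy key-value store in Python. It is not how any particular product does it: the class name TinyStore, the file names, and the JSON format are all assumptions made for illustration. It also ignores real-world concerns such as a half-written log line after a crash, and it only handles string keys with JSON-friendly values.

```python
import json
import os


class TinyStore:
    """Toy in-memory key-value store with a snapshot and an update log."""

    def __init__(self, directory, sync_every_update=False):
        # File names are arbitrary choices for this sketch.
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "updates.log")
        self.sync_every_update = sync_every_update
        self.data = {}
        self._recover()
        self.log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self):
        # Start from the last snapshot, if there is one...
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        # ...then replay the updates logged since that snapshot.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def set(self, key, value):
        # Log the update first, then apply it in memory.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        if self.sync_every_update:
            # Durable but slow: wait for the disk on every single update.
            os.fsync(self.log.fileno())
        self.data[key] = value

    def snapshot(self):
        # Write the whole state to a temporary file, then swap it into place.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snapshot_path)
        # The snapshot now covers everything in the log, so start a fresh log.
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")

    def close(self):
        self.log.close()
```

With sync_every_update left off, updates run at memory speed and a crash can lose whatever the operating system had not yet written out. Turn it on and every set() waits for the disk, which is exactly the cost the disk-based databases were paying in the first place.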

At the other end of the spectrum, many disk-based databases offer in-memory tables for non-critical data.

In other words, both sides have come full circle and ended up being practically indistinguishable; disk-based databases sometimes offer in-memory tables, while in-memory databases sometimes offer fully durable updates. The technologies aren’t so inherently different, despite what some of the recent hype might suggest. It’s not quite right to think of database products as being “in-memory” versus “disk-based”; it’s more a distinction that applies to individual tables. It’s not quite accurate to call Redis an in-memory database and MySQL a disk-based database…

GenieDB developer mentioned in The Times Online

Wednesday, March 24th, 2010

From Times Online, March 24, 2010: No techs please, we’re British

Priya is a member of our development team. She’s professional, skilled, great to work with, and as far as we can tell, shares our enthusiasm for our work; and her gender has no bearing on her professional life whatsoever. The fact that she’s good with people may or may not be an attribute of her gender – some say women are more sociable, although whether that’s really true or not, and if so, whether it’s genetic or just social conditioning, is not something I’m sure has been settled – but that’s irrelevant; we took the fact that she’s good with people into account when deciding who to hire out of our selection of candidates – regardless of where this attribute comes from.

The UK technology community is undeniably male-dominated, which is a shame – because people are missing out on excellent careers in technology (and it is a very rewarding field to work in) just because of their gender.

NoSQL vs. SQL

Monday, February 1st, 2010

Your humble author was pressed, by persons who shall remain nameless, to face his nervousness about public speaking and give a five-minute lightning talk on the NoSQL movement at CloudCamp London January 2010.

Unfortunately, the event was filmed.

Further material may appear in future on my SkillsMatter profile or my CloudBook profile.

Why the most ‘alternatively dressed’ (and certainly not the most attractive) member of the team should be asked to become the public face of the company is anyone’s guess; less technical materials will probably appear at GenieDB’s CloudBook profile.

Distributed Systems

Wednesday, August 19th, 2009

As we are a company producing a distributed database, it should be no surprise that we’re big fans of distributed systems.

“Distributed software”, in practice, means software that runs on multiple physical machines, connected by a network. This has many benefits to the user; generally, there is little or no dependency on central points of failure, meaning that the system can continue to operate (at least partially) in the event of failure of any given machine or network link.

But we like to eat our own dogfood, which is why we use git as our version control system, and are currently setting up a VPN with n2n for people who are working from home to connect securely.
