Why Use Eventsourcing Database

Yurii Rashkovskii
Eventsourcing Publications
Jun 16, 2016

It took me a good while to start writing this article. It’s the hardest of them all: explaining concepts, reasons and benefits that are already clear to you to somebody who has little to no clue about them. I wanted to answer the main question I keep hearing:

“What’s in it for me?”

Alright. Software is not just eating the world. Open source is really no different; perhaps it’s even worse. It is now impractical to try to catch up on what’s possible and what’s available. The days of occasional freshmeating are long gone. A sane choice is to stick to your guns. Personally, I would really resist changing something as foundational as a database.

Ironically enough, I want you to try a new one. Perhaps not for a production-grade project (at the time of writing, it’s still a new, unstable, buggy piece of software anyway!), but for a toy one.

Here’s why.

The “data” in “database” is misleading

It leads us into seeing the world as a continuously captured state, but that’s a very limited model. Imagine having a users table (almost every project has one!).
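The table itself is not reproduced here, so here is a minimal sketch of what it might look like, using Python's built-in sqlite3 module. The column names from created_at onward come from the discussion that follows; the id, name and email columns and all column types are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: most column names are taken from the surrounding text;
# the types and the id/name/email columns are assumptions for illustration.
USERS_SCHEMA = """
CREATE TABLE users (
    id              INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    email           TEXT NOT NULL,
    hashed_password TEXT NOT NULL,
    created_at      TEXT NOT NULL,
    updated_at      TEXT,
    last_seen_at    TEXT,
    last_ip         TEXT,
    failed_logins   INTEGER NOT NULL DEFAULT 0,
    locked_at       TEXT
)
"""


def create_users_table(conn):
    """Create the table and return its column names in declaration order."""
    conn.execute(USERS_SCHEMA)
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute("PRAGMA table_info(users)")]


conn = sqlite3.connect(":memory:")
columns = create_users_table(conn)
```

Each row holds exactly one value per column: the latest one.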

What can we derive from this?

We can figure out when the user signed up (created_at). We can figure out when something about this user changed (updated_at), but not what. We can check the user’s current password hash (hashed_password). We can see the last IP address this user was seen from, and when that happened (last_seen_at, last_ip). We can even manage user lockouts if somebody is trying to brute-force the password (failed_logins, locked_at).

Nothing particularly wrong about this. It’s useful and usable. However, this works well only if we see the world as a static, nearly perfect phenomenon. That is, if business rules don’t change, bugs don’t happen and there’s only one right way to see the world: a perfect consensus.

But consensus is an expensive protocol. We must agree on everything, and then, every time something changes, we have to amend that agreement.

Let’s assume we want to make failed login attempt tracking more sophisticated to improve our risk analysis and fraud prevention. If we want to be able to analyze different behaviours and patterns, it would help us if we stored every login attempt, successful or not, in a separate table…
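A rough sketch of that idea in Python, kept in memory rather than in a real table. The LoginAttempt record, its fields and the 15-minute window are all hypothetical, picked only to show that a counter like failed_logins becomes a question we ask of the recorded attempts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LoginAttempt:
    # Hypothetical record; one instance per attempt, successful or not.
    user_id: int
    ip: str
    succeeded: bool
    attempted_at: datetime


def recent_failures(attempts, user_id, now, window=timedelta(minutes=15)):
    """Count failed attempts for one user inside the window ending at `now`."""
    return sum(
        1
        for a in attempts
        if a.user_id == user_id
        and not a.succeeded
        and now - window <= a.attempted_at <= now
    )


now = datetime(2016, 6, 16, 12, 0)
attempts = [
    LoginAttempt(1, "10.0.0.1", False, now - timedelta(minutes=20)),
    LoginAttempt(1, "10.0.0.2", False, now - timedelta(minutes=5)),
    LoginAttempt(1, "10.0.0.2", False, now - timedelta(minutes=1)),
    LoginAttempt(1, "10.0.0.1", True, now),
    LoginAttempt(2, "10.0.0.3", False, now - timedelta(minutes=2)),
]
failures = recent_failures(attempts, user_id=1, now=now)
```

Changing the lockout rule later means changing the function, not migrating the data.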

Great. If we want to track all IP addresses the user was coming from and be able to see irregularities, we’d need a very similar table for tracking IP address changes. The same applies to email changes: what if we want to see whether the email was recently changed? A similar table will appear.

What do these tables have in common? They certainly feel similar… What I see is that all of them in fact store events that have occurred, and we make decisions (business logic) based on them. What we end up with now is a number of tables that represent models (such as users, groups, items, etc.) and a number of tables that represent events.

What if we can represent models through events?

A lot of user activities are already represented through events, even if we don’t acknowledge this at first. If we look at the users table, we can see that it can also be seen as a partial projection of a few user-related events that have happened: user creation, user update. If we record the creation event (UserCreated) and update events (NameChanged, EmailChanged, PasswordChanged) instead of a record in the users table, we can still query the latest name, the latest email, the latest password without losing the history and the possibilities, just like we did with failed login attempts and visits.

That is how models are typically constructed using Eventsourcing. By querying events we can figure out the state of the model:
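As a plain-Python illustration (this is not Eventsourcing's actual API, and the field names are assumptions), the current state of a user can be derived by folding over their events in time order:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UserCreated:
    user_id: int
    name: str
    email: str
    at: datetime


@dataclass(frozen=True)
class NameChanged:
    user_id: int
    name: str
    at: datetime


@dataclass(frozen=True)
class EmailChanged:
    user_id: int
    email: str
    at: datetime


def current_user(events, user_id):
    """Fold a user's events, oldest first, into their latest known state."""
    state = None
    own_events = (e for e in events if e.user_id == user_id)
    for e in sorted(own_events, key=lambda e: e.at):
        if isinstance(e, UserCreated):
            state = {"name": e.name, "email": e.email, "created_at": e.at}
        elif state is not None and isinstance(e, NameChanged):
            state["name"] = e.name
        elif state is not None and isinstance(e, EmailChanged):
            state["email"] = e.email
    return state


# Events may arrive in any order; the fold sorts them by timestamp.
events = [
    UserCreated(1, "Alice", "alice@example.com", datetime(2016, 1, 1)),
    EmailChanged(1, "alice@work.example", datetime(2016, 3, 1)),
    NameChanged(1, "Alice B.", datetime(2016, 2, 1)),
]
user = current_user(events, 1)
```

The dictionary returned here plays the role of a row in the users table, except that every earlier value is still available in the event list.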

Let’s go a little further. Imagine what we’re developing has a financial component to it (otherwise, why all the fraud mitigation efforts?). Our first feature is to enable transfer of funds between users.

Following the classic double-entry bookkeeping system, we need to record two entries, one for debit and one for credit (it’s obvious enough that we need to record these event entries as opposed to an “account balance state” to begin with).
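A small sketch of such a ledger in Python. The LedgerEntry shape is an assumption: it keeps only user_id, a signed amount and recorded_at, which is enough to record both sides of a transfer and to sum a balance.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class LedgerEntry:
    # Hypothetical entry shape for illustration.
    user_id: int
    amount: Decimal  # negative for a debit, positive for a credit
    recorded_at: datetime


def record_transfer(ledger, sender_id, recipient_id, amount, recorded_at):
    """Append the two entries of a transfer: a debit and a matching credit."""
    ledger.append(LedgerEntry(sender_id, -amount, recorded_at))
    ledger.append(LedgerEntry(recipient_id, amount, recorded_at))


def balance(ledger, user_id):
    """Sum every entry recorded for the user."""
    return sum((e.amount for e in ledger if e.user_id == user_id), Decimal("0"))


ledger = []
record_transfer(ledger, 1, 2, Decimal("25.00"), datetime(2016, 6, 1))
record_transfer(ledger, 2, 1, Decimal("10.00"), datetime(2016, 6, 2))
```

Nothing in an entry says where the money came from or went to, which is the gap discussed next.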

Now we can calculate each user’s balance. The problem, however, is that we can’t easily tell who transferred the money to whom. We can either compare recorded_at and amount or add a new field signifying the source of the transaction. Let’s say source_user_id. But what if later we decide there could be other sources of funds, for example, ATM deposits? Should we create a fictitious user to represent a deposit ATM subsystem? How would they get credited, with a NULL source_user_id? How do we know which ATM the cash was deposited at?

Causal information to the rescue

What if, instead of the above, we simply remember the original intent of the action that led to an entry in the ledger?

If a user wants to transfer money to another user, he or she sends a request (command) like TransferMoney(sender: me, recipient: theOtherUser, amount: N) that results in two entry events, UserDebited(user: me, amount: N) and UserCredited(user: theOtherUser, amount: N). This way, we still maintain the same double-entry ledger, and we can look up what caused these entries and easily figure out what happened, provided we also store the commands (and that’s what Eventsourcing does, too).

If it is the cash deposit scenario, an ATM would send a command DepositCash(terminal: T, user: someUser, amount: N) which, in turn, would result in one entry event: UserCredited(user: someUser, amount: N). The subsystem that renders the balance doesn’t need to know anything about how the credit happened, only that it did. But other systems, for example one that tracks unusual cash deposit activity, would be able to analyze the information of interest at the source.
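Here is how the two commands and their entry events could be modelled in plain Python. The command and event names follow the text; the id and caused_by fields, the handle function and the in-memory journal are assumptions used to show the causal link, not how Eventsourcing stores it.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from uuid import UUID, uuid4


@dataclass(frozen=True)
class TransferMoney:
    sender: int
    recipient: int
    amount: Decimal
    id: UUID = field(default_factory=uuid4)


@dataclass(frozen=True)
class DepositCash:
    terminal: str
    user: int
    amount: Decimal
    id: UUID = field(default_factory=uuid4)


@dataclass(frozen=True)
class UserDebited:
    user: int
    amount: Decimal
    caused_by: UUID  # hypothetical link back to the originating command


@dataclass(frozen=True)
class UserCredited:
    user: int
    amount: Decimal
    caused_by: UUID


def handle(command):
    """Turn a command into the entry events it causes."""
    if isinstance(command, TransferMoney):
        return [
            UserDebited(command.sender, command.amount, command.id),
            UserCredited(command.recipient, command.amount, command.id),
        ]
    if isinstance(command, DepositCash):
        return [UserCredited(command.user, command.amount, command.id)]
    raise TypeError(f"unknown command: {command!r}")


journal = {}  # stored commands, keyed by id
events = []   # stored entry events

for command in (
    TransferMoney(sender=1, recipient=2, amount=Decimal("25.00")),
    DepositCash(terminal="ATM-7", user=2, amount=Decimal("100.00")),
):
    journal[command.id] = command
    events.extend(handle(command))


def balance(user):
    """The balance view looks only at entry events, never at commands."""
    total = Decimal("0")
    for e in events:
        if e.user == user:
            total += e.amount if isinstance(e, UserCredited) else -e.amount
    return total


def cash_deposits(user):
    """Commands behind this user's credits that were ATM cash deposits."""
    causes = (journal[e.caused_by] for e in events
              if isinstance(e, UserCredited) and e.user == user)
    return [c for c in causes if isinstance(c, DepositCash)]
```

The balance function and the cash_deposits function read the same events but answer different questions, and only the second one needs to follow the link back to the command.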

There is no single version of the state

Just like we humans experience the world and reality in different ways, programs can see reality through different prisms, and there is little to no reason why they shouldn’t be able to.

The aggregation of events that you see today as a user can be rearranged, augmented and seen as a fraud risk analysis report.

If we can ignore entries associated with certain kinds of activities or date ranges, we can build “what-if” balances or balances “as of” a given date with little to no effort. While this is a common and well-accepted practice in accounting, it is typically not possible for other types of information. For example, can we tell what the user’s email was yesterday, before an attacker took over the account?
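With recorded events, the email question becomes an “as of” query. A minimal Python sketch, assuming a hypothetical EmailChanged event that carries a timestamp:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EmailChanged:
    # Hypothetical event shape for illustration.
    user_id: int
    email: str
    at: datetime


def email_as_of(events, user_id, moment):
    """Return the most recent email set at or before `moment`, or None."""
    candidates = [e for e in events if e.user_id == user_id and e.at <= moment]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.at).email


events = [
    EmailChanged(1, "alice@example.com", datetime(2016, 1, 1)),
    EmailChanged(1, "attacker@evil.example", datetime(2016, 6, 15, 23, 0)),
]
before_takeover = email_as_of(events, 1, datetime(2016, 6, 15, 12, 0))
after_takeover = email_as_of(events, 1, datetime(2016, 6, 16))
```

The same filter-by-time idea applies to ledger entries when building a balance “as of” a date.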

I see it as a way to design applications without worrying about the future too much. Instead of trying to predict the future, go with the flow.

Eventsourcing is really young, but give it a try

I can’t stress enough just how young the project is. I do build real software on it, but that’s because I can usually quickly fix things. That said, it’s definitely usable for learning and experiments.

It’s getting more usable every day and it has formally adopted the C4 process that gives everybody a right to contribute, even if their contribution doesn’t fit my personal vision. That’s my goal, after all: I want to help build software that improves the lives of other people, too.

A friend of mine said he understood the value of this approach but couldn’t really use it because his stack is Ruby on Rails, not Java. And that’s a great piece of feedback. While the current implementation is an embeddable Java database, there are two things that can happen.

One can develop a server API for the Java version (unless I get to this issue sooner than anybody else). The other approach is to take our specifications and create a Ruby implementation. Growing the project has never been easier.

Getting started is also quite easy!

Visit eventsourcing.com

There are also slides with some more detailed information on how Eventsourcing works. Check them out!

Yurii Rashkovskii
Eventsourcing Publications

Tech entrepreneur, open source developer. Amateur runner, skier, cyclist, sailor.