The Virtual Quill

Distributed and replicated IDaaS

In the last issue I waxed nostalgic about our discussions of directory services from twelve years ago, and why the directory needed to be both pervasive and ubiquitous. In today’s world it’s Identity Services, along with all other cloud services, that need to be, as we said, “available anywhere and every time we want to use it” as well as “available everywhere and any time we want to use it.”

In order to be pervasive and ubiquitous, cloud-based Services, including Identity, will need the ability to be distributed, replicated and partitioned.

The Services need to be replicated first and foremost for fault tolerance. If there's only one copy of the data, on one server, then the data is only available as long as that hardware is available. Any hardware or network failure could disable the system, making it impossible for users to reach cloud-based services. That's neither ubiquitous nor pervasive.

Replicating the data also helps balance the load on any particular hardware platform, but the mechanism of replication needs to be carefully designed so that bandwidth is used efficiently. After replicas are initially moved to a platform, only data changes should be sent to the copies. The finer grained the update, the better: sending only a changed attribute is better than sending the entire object with all of its attributes, which in turn is better than sending entire containers, branches or trees.
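
As a rough illustration of attribute-level replication, here is a minimal Python sketch. It assumes an identity object can be represented as a plain dictionary of attributes; the function names attribute_delta and apply_delta and the sample attributes are hypothetical, chosen only for this example, and are not drawn from any particular directory or IDaaS product.

```python
def attribute_delta(old, new):
    """Return only the attributes that differ between two versions of one object."""
    # Attributes that were added or whose value changed.
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    # Attributes that existed before but are gone now.
    removed = [k for k in old if k not in new]
    return {"set": changed, "remove": removed}


def apply_delta(replica_object, delta):
    """Apply a delta to a replica's copy of the object and return the result."""
    updated = dict(replica_object)
    updated.update(delta["set"])
    for k in delta["remove"]:
        updated.pop(k, None)
    return updated


# Hypothetical identity object, before and after a single attribute change.
old = {"cn": "Pat Lee", "mail": "pat@example.com", "title": "Engineer"}
new = {"cn": "Pat Lee", "mail": "pat.lee@example.com", "title": "Engineer"}

delta = attribute_delta(old, new)   # only the "mail" attribute travels
replica_copy = apply_delta(old, delta)
```

Only the delta crosses the wire, so the cost of an update scales with what changed rather than with the size of the object, container or tree that holds it.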

For Identity services, replication could be handled by a catalog service which periodically publishes a static listing to other platforms while maintaining a single, changeable version. While this is less fault tolerant than having multiple read-write copies of the data itself, it does at least maintain multiple copies of the data, which allows for reconstruction in case of disaster, a form of fault tolerance. But a static catalog is only synchronized at the moment it is created, and it becomes progressively less accurate from then until the next synchronization. It therefore does not satisfy the pervasive and ubiquitous criteria.

A distributed system, however, can be considered accurate because all of its replicas are synchronized as often as is needed to ensure that whichever copy is read contains up-to-date information. The distributed system should also allow (but not require) every replica to be written to as well as read, and should synchronize information written to any writable replica with all other replicas. In other words, there should be no requirement to choose one copy of the identity datastore as a "master" or sole authoritative source of data.
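
To make the "no master" idea concrete, here is a small Python sketch in which every replica accepts writes and then exchanges changes with its peers. The conflict rule used here, last writer wins per attribute with the replica name as a tie-breaker, is an assumption made for illustration; the column does not prescribe one, and real systems often need synchronized clocks or version vectors instead. The Replica class, the synchronize function and the sample data are all hypothetical.

```python
class Replica:
    """A writable copy of the identity store; no replica is the master."""

    def __init__(self, name):
        self.name = name
        # (object_id, attribute) -> (timestamp, replica_name, value)
        self.data = {}

    def write(self, object_id, attribute, value, timestamp):
        # The timestamp is supplied by the caller to keep the example deterministic.
        self.data[(object_id, attribute)] = (timestamp, self.name, value)

    def read(self, object_id, attribute):
        entry = self.data.get((object_id, attribute))
        return entry[2] if entry else None

    def merge_from(self, other):
        # Assumed rule: keep whichever entry has the later (timestamp, replica_name).
        for key, entry in other.data.items():
            mine = self.data.get(key)
            if mine is None or entry[:2] > mine[:2]:
                self.data[key] = entry


def synchronize(replicas):
    """Have every replica pull changes from every other replica."""
    for target in replicas:
        for source in replicas:
            if target is not source:
                target.merge_from(source)


east, west = Replica("east"), Replica("west")
east.write("pat", "mail", "pat@example.com", timestamp=1)
west.write("pat", "mail", "pat.lee@example.com", timestamp=2)  # later write
west.write("pat", "title", "Engineer", timestamp=1)

synchronize([east, west])
```

After synchronization, reading from either replica returns the same values, even though the writes arrived at different copies.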

Finally, there’s partitioning. But that’s a topic for another post.