The Virtual Quill

Properties necessary for an IdP and an AP

In reviewing some early Directory Service newsletters, I came across a series of three issues defining the necessary qualities of a DS. But they're also necessary qualities of an Identity Service (as offered by an Identity Provider - IdP) and an Attribute Service (as offered by an Attribute Provider - AP). I've updated them a bit (mostly for terminology) but the originals are here, here, and here. Enjoy!

In traipsing around the country giving my "Unlocking Directory Services" seminar the past few weeks [fall of 2000], I was struck by the number of times someone challenged (or, perhaps, just asked about) my assertion that a well-designed directory service needs to be capable of being distributed, replicated and partitioned. If my live audience questioned this, perhaps you too have some reservations, so for the next few issues [consolidated here] I'll put forward my reasoning.

First, though, two more basic concepts. To be useful, the service must be pervasive and ubiquitous. Pervasive, meaning it's available anywhere and every time we want to use it; ubiquitous, meaning it's available everywhere and any time we want to use it. For an application or device (or even a user, for that matter) to be identity-enabled it has to be able to rely on the information being present when, and where, needed.

It follows, then, that the service needs to be distributable, replicatable and partitionable. First, we'll look at replication.

The identity service needs to be replicated first and foremost for fault tolerance. If there's only one copy of the data, on one server, then the data is only available as long as that hardware is available. That's neither ubiquitous nor pervasive.

Replicating the data also helps balance the load on any particular hardware platform, but the mechanism of replication needs to be carefully designed so that bandwidth is used efficiently. After replicas are initially moved to a platform, only data changes should be sent to the copies. The finer grained, the better: sending only a changed attribute is better than sending the entire object/attribute combination, but that, in turn, is better than sending entire containers, branches or trees.
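To make the "send only the changed attribute" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function names, the dictionary representation of an entry, and the sample attributes are hypothetical and are not taken from any particular directory product.

```python
# Hypothetical sketch: ship only the attributes that changed, not the whole entry.
_MISSING = object()  # sentinel so an attribute set to None is still seen as a change


def attribute_delta(old, new):
    """Compare two versions of one entry and return only what differs."""
    changed = {k: v for k, v in new.items() if old.get(k, _MISSING) != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "remove": removed}


def apply_delta(entry, delta):
    """Bring a replica's copy of the entry up to date using just the delta."""
    updated = dict(entry)
    updated.update(delta["set"])
    for k in delta["remove"]:
        updated.pop(k, None)
    return updated


# The entry as every replica holds it, and the entry after one change.
old = {"cn": "Pat Lee", "mail": "pat@example.com", "title": "Engineer"}
new = {"cn": "Pat Lee", "mail": "pat@example.com", "title": "Manager"}

delta = attribute_delta(old, new)   # only the "title" attribute travels the wire
replica = apply_delta(old, delta)   # the replica now matches the changed entry
```

In this sketch the delta carries one attribute rather than three; on a real entry with dozens of attributes, the saving in bandwidth is correspondingly larger.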

The ability to be replicated could be handled by a catalog service which periodically published a static listing of the identity data to other platforms, while maintaining a single, changeable version of that information. While this is less fault tolerant than having multiple copies of the information itself, it does maintain multiple copies of the data which allows for reconstruction of the identity service in case of disaster, a form of fault tolerance.

But because a static catalog of the data is only synchronized with the service at the moment the catalog is created, and immediately begins to become progressively less accurate as time goes by until the next synchronization, it does not satisfy the pervasive and ubiquitous criteria.

A distributed identity service, however, can be considered accurate because all of its replicas are synchronized as often as is needed to ensure that whichever copy is read contains up-to-date information.

A distributed identity service should also allow (but not require) every replica to be written to as well as read, with anything written to one writable replica synchronized to all the others. In other words, there should be no requirement to choose one copy of the service as a "master" or sole authoritative source of data.
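Here is one way writable replicas without a "master" might be sketched in Python. Note the assumptions: the conflict rule used (the most recent write wins, with the replica name breaking ties) is my choice for the illustration, not something this column prescribes, and the Replica class, its methods, and the sample data are all hypothetical.

```python
# Hypothetical sketch: every replica accepts writes; synchronizing merges both ways.
class Replica:
    def __init__(self, name):
        self.name = name
        self.clock = 0   # counts writes seen, used to order changes
        self.data = {}   # (entry, attribute) -> (version, value)

    def write(self, entry, attr, value):
        """Accept a change locally and stamp it with (counter, replica name)."""
        self.clock += 1
        self.data[(entry, attr)] = ((self.clock, self.name), value)

    def read(self, entry, attr):
        return self.data[(entry, attr)][1]

    def merge_from(self, other):
        """Take any attribute the other replica holds in a newer version."""
        for key, (version, value) in other.data.items():
            if key not in self.data or self.data[key][0] < version:
                self.data[key] = (version, value)
        self.clock = max(self.clock, other.clock)


def synchronize(a, b):
    """Two-way sync: neither side is the authoritative copy."""
    a.merge_from(b)
    b.merge_from(a)


east, west = Replica("east"), Replica("west")
east.write("pat", "title", "Manager")         # a change made at one replica
west.write("pat", "mail", "pat@example.com")  # a different change made at another
synchronize(east, west)                       # both replicas now hold both changes
```

Any real service needs a far more careful answer to conflicting writes than this, but the sketch shows the essential point: changes flow in both directions and no single copy has to be designated authoritative.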

If we're going to maintain multiple replicas of the identity service, and if we're going to allow changes from multiple replicas which must then be synchronized to all other replicas, then we're going to create quite a bit of network traffic. Add to that the sheer quantity of data which today's (but especially tomorrow's) identity-enabled applications and devices will be placing in the service’s datastore and you can see that all this replication and distribution could take up a huge amount of bandwidth.

One solution is partitioning - breaking up the identity service into manageable parts. Then, through well-designed placement of replicas of the partitions, you can ensure that data is both physically and logically stored near the point where it will be used while still minimizing the amount of traffic on the network necessary for synchronizing the data.

Also, because the identity service is distributed as well as partitioned, you can view the entire information tree as if it were stored in one place - even if there is no physical copy of the entire tree. And, of course, you can see this (and so can the identity-enabled apps and devices) from anywhere in the network because now your identity service is pervasive and ubiquitous.
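The combination of partitioning and a single logical view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Partition and Directory classes, the server names, and the sample names in the tree are all hypothetical, and a real service would also replicate each partition rather than keep one copy.

```python
# Hypothetical sketch: the tree is split into partitions held on different servers,
# yet a client asks one question and gets one answer, as if the tree were whole.
class Partition:
    def __init__(self, root, server):
        self.root = root      # the subtree this partition holds
        self.server = server  # made-up name of a server holding it
        self.entries = {}


class Directory:
    def __init__(self, partitions):
        # Most specific (longest) subtree root first, so it wins the match.
        self.partitions = sorted(partitions, key=lambda p: len(p.root), reverse=True)

    def _owner(self, name):
        """Find the partition whose subtree contains this name."""
        for p in self.partitions:
            if name == p.root or name.endswith("," + p.root):
                return p
        raise KeyError(name)

    def add(self, name, attrs):
        self._owner(name).entries[name] = attrs

    def lookup(self, name):
        """Return (server that answered, attributes); the caller never picks a server."""
        p = self._owner(name)
        return p.server, p.entries[name]


tree = Directory([
    Partition("o=acme", "server-hq"),
    Partition("ou=sales,o=acme", "server-chicago"),
    Partition("ou=eng,o=acme", "server-austin"),
])
tree.add("cn=pat,ou=sales,o=acme", {"title": "Manager"})
tree.add("cn=lee,ou=eng,o=acme", {"title": "Engineer"})
tree.add("cn=admin,o=acme", {"title": "Administrator"})

server, attrs = tree.lookup("cn=pat,ou=sales,o=acme")
```

No one server in this sketch holds the whole tree, but every lookup goes through the same Directory object, which is the "as if it were stored in one place" view described above.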