In traipsing around the country giving my "Unlocking
Directory Services" seminar the past few weeks [fall of 2000], I was
struck by the number of times someone challenged (or, perhaps, just asked
about) my assertion that a well-designed directory service needs to be capable
of being distributed, replicated and partitioned. If my live audience
questioned this, perhaps you too have some reservations, so for the next few
issues [consolidated here] I'll put forward my reasoning.
First, though, two more basic concepts. To be useful, the service
must be pervasive and ubiquitous. Pervasive, meaning it's available anywhere and
every time we want to use it; ubiquitous, meaning it's available everywhere and
any time we want to use it. For an application or device (or even a user, for
that matter) to be identity-enabled it has to be able to rely on the information
being present when, and where, needed.
It follows, then, that the service needs to be
distributable, replicatable and partitionable. First, we'll look at
replication.
The identity service needs to be replicated first and
foremost for fault tolerance. If there's only one copy of the data, on one
server, then the data is only available as long as that hardware is available.
That's neither ubiquitous nor pervasive.
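To make the fault-tolerance point concrete, here is a minimal sketch in Python. Everything in it (ReplicaDown, make_replica, read_with_failover) is a hypothetical name invented for illustration, not part of any real directory product. It only shows that a read can succeed as long as at least one copy is reachable.

```python
class ReplicaDown(Exception):
    """Raised when a replica cannot be reached (hypothetical error type)."""


def make_replica(data, up=True):
    """Return a read function backed by `data`; it fails if the replica is down."""
    def read(key):
        if not up:
            raise ReplicaDown(key)
        return data[key]
    return read


def read_with_failover(replicas, key):
    """Try each replica in turn and return the first successful read."""
    for read in replicas:
        try:
            return read(key)
        except ReplicaDown:
            continue
    raise ReplicaDown(f"no replica could serve {key!r}")


identity_data = {"cn=pat": {"mail": "pat@example.com"}}
failed = make_replica(identity_data, up=False)   # the hardware is unavailable
healthy = make_replica(identity_data, up=True)   # a second copy elsewhere

# With two copies, the read survives the loss of one server.
result = read_with_failover([failed, healthy], "cn=pat")
```

With only the failed copy in the list, the same call raises ReplicaDown, which is the single-server situation described above.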
Replicating the data also helps balance the load on any
particular hardware platform, but the replication mechanism needs to be
carefully designed so that it doesn't waste bandwidth. After replicas are
initially moved to a platform, only data changes should be sent to the copies.
The finer grained, the better - sending only a changed attribute is better than
sending the entire object with all of its attributes, and that, in turn, is better
than sending entire containers, branches or trees.
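As a rough illustration of attribute-level replication, the sketch below computes only what changed between two versions of an entry and applies that change to a replica's copy. The function names and the dictionary representation of an entry are assumptions made for the example, not a description of how any particular directory does it.

```python
_MISSING = object()  # sentinel so an absent attribute differs from a None value


def attribute_delta(old, new):
    """Return only the attributes that differ between two versions of an entry."""
    changed = {k: v for k, v in new.items() if old.get(k, _MISSING) != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "remove": removed}


def apply_delta(entry, delta):
    """Apply a delta from attribute_delta to a replica's copy of the entry."""
    updated = dict(entry)
    updated.update(delta["set"])
    for k in delta["remove"]:
        updated.pop(k, None)
    return updated


old = {"cn": "Pat Lee", "mail": "pat@example.com", "telephoneNumber": "555-0100"}
new = {"cn": "Pat Lee", "mail": "pat@example.com", "telephoneNumber": "555-0199"}

delta = attribute_delta(old, new)    # only the phone number travels over the wire
replica = apply_delta(old, delta)    # the replica ends up identical to `new`
```

Here the delta carries one attribute instead of three; on a large object, or a whole container, the savings grow accordingly.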
Replication could be handled by a catalog
service which periodically publishes a static listing of the identity data to
other platforms, while maintaining a single, changeable version of that
information. While this is less fault tolerant than having multiple live copies of
the information itself, it does maintain multiple copies of the data, which
allows for reconstruction of the identity service in case of disaster, a form
of fault tolerance.
But because a static catalog of the data is only
synchronized with the service at the moment the catalog is created, and
immediately begins to become progressively less accurate as time goes by until
the next synchronization, it does not satisfy the pervasive and ubiquitous
criteria.
A distributed identity service, however, can be considered
accurate because all of its replicas are synchronized as often as is needed to
ensure that whichever copy is read contains up-to-date information.
A distributed identity service should also allow (but not
require) every replica to be written to as well as read, and should
synchronize information written to any writable replica with all other replicas
- in other words, there should be no requirement to choose one copy of the service
as a "master" or sole authoritative source of data.
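One way to picture writable replicas with no designated master is sketched below. Reconciling conflicting writes is not covered above; this example assumes a simple "last writer wins" rule per attribute, with a (clock, replica name) stamp as the tie-breaker. The Replica class and synchronize function are hypothetical, and real services may resolve conflicts quite differently.

```python
class Replica:
    """A writable copy of the identity data (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # (entry, attribute) -> (stamp, value)

    def write(self, entry, attribute, value, clock):
        # The stamp orders writes; the replica name breaks ties between clocks.
        self.data[(entry, attribute)] = ((clock, self.name), value)

    def merge_from(self, other):
        """Adopt any attribute value from `other` that is newer than ours."""
        for key, (stamp, value) in other.data.items():
            if key not in self.data or self.data[key][0] < stamp:
                self.data[key] = (stamp, value)

    def read(self, entry, attribute):
        return self.data[(entry, attribute)][1]


def synchronize(replicas):
    """Have every replica merge changes from every other replica."""
    for a in replicas:
        for b in replicas:
            if a is not b:
                a.merge_from(b)


east, west = Replica("east"), Replica("west")
east.write("cn=pat", "mail", "pat@old.example", clock=1)
west.write("cn=pat", "mail", "pat@new.example", clock=2)   # later write, other replica
east.write("cn=pat", "telephoneNumber", "555-0100", clock=3)

synchronize([east, west])
```

After synchronizing, both replicas hold the later mail value and the phone number, and neither one had to be chosen as the authoritative copy.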
If we're going to maintain multiple replicas of the identity
service, and if we're going to allow changes from multiple replicas which must
then be synchronized to all other replicas, then we're going to create quite a
bit of network traffic. Add to that the sheer quantity of data which today's
(but especially tomorrow's) identity-enabled applications and devices will be
placing in the service’s datastore and you can see that all this replication
and distribution could take up a huge amount of bandwidth.
One solution is partitioning - breaking up the identity
service into manageable parts. Then, by well-designed placement of replicas
of the partitions, you can ensure that data is both physically and logically
stored near to the point it will be used while still minimizing the amount of
traffic on the network necessary for synchronizing the data.
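The effect of partitioning on synchronization traffic can be sketched as follows. The partition roots, the server names and the placement table are all invented for the example; the point is only that a change needs to reach the servers holding a replica of the affected partition, not every server on the network.

```python
# Hypothetical placement: partition root -> servers holding a replica of it.
PARTITIONS = {
    "ou=sales,o=example": ["nyc-1", "nyc-2"],
    "ou=engineering,o=example": ["sfo-1", "nyc-1"],
    "o=example": ["nyc-1", "sfo-1", "lon-1"],
}


def partition_for(dn):
    """Pick the most specific partition whose root is a suffix of the name."""
    matches = [root for root in PARTITIONS
               if dn == root or dn.endswith("," + root)]
    if not matches:
        raise KeyError(f"no partition holds {dn!r}")
    return max(matches, key=len)  # the longest matching root is the most specific


def servers_to_update(dn, origin):
    """List the servers that must receive a change made at `origin`."""
    return [s for s in PARTITIONS[partition_for(dn)] if s != origin]


# A change to a sales entry made on nyc-1 only has to travel to nyc-2.
targets = servers_to_update("cn=pat,ou=sales,o=example", origin="nyc-1")
```

In this made-up layout the sales data lives on two servers at the site that uses it, so a change there never has to cross to the other sites.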
Also, because the identity service is distributed as well as
partitioned, you can view the entire information tree as if it were stored in
one place - even if there is no physical copy of the entire tree. And, of
course, you can see this (and so can the identity-enabled apps and devices)
from anywhere in the network because now your identity service is pervasive and
ubiquitous.
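The "one tree, many places" idea might look something like the sketch below. The Server class, its peers list and the way a server asks its peers are assumptions made for illustration; an actual service would have its own way of locating the server that holds a partition. What matters is that the caller asks any server and gets an answer, without knowing where the entry physically lives.

```python
class Server:
    """One server holding some partitions of the tree (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.partitions = {}  # partition root -> {entry name: attributes}
        self.peers = []       # other servers this one can ask

    def find_local(self, dn):
        """Return the entry if one of this server's partitions holds it."""
        for entries in self.partitions.values():
            if dn in entries:
                return entries[dn]
        return None

    def lookup(self, dn):
        """Answer from local data if possible, otherwise ask the peers."""
        local = self.find_local(dn)
        if local is not None:
            return local, self.name
        for peer in self.peers:
            found = peer.find_local(dn)
            if found is not None:
                return found, peer.name
        raise KeyError(dn)


nyc, sfo = Server("nyc"), Server("sfo")
nyc.partitions["ou=sales,o=example"] = {
    "cn=pat,ou=sales,o=example": {"mail": "pat@example.com"},
}
sfo.partitions["ou=engineering,o=example"] = {
    "cn=kim,ou=engineering,o=example": {"mail": "kim@example.com"},
}
nyc.peers, sfo.peers = [sfo], [nyc]

# Neither server holds the whole tree, yet either one can answer for any entry.
entry, served_by = nyc.lookup("cn=kim,ou=engineering,o=example")
```

The lookup is issued against nyc but answered from sfo's partition, which is what lets an application treat the tree as if it were stored in one place.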