Best Practices for Virtualizing Active Directory Domain Controllers (AD DC)

By Matt Liebowitz
Global Multi-Cloud Infrastructure Discipline Lead, Dell Technologies Consulting Services 

Virtualized Active Directory is ready for Primetime, Part II!

In the first part
of this two-part blog series, I discussed how virtualization-first is
the new normal and fully supported, and elaborated on best practices for
Active Directory availability, achieving integrity in virtual
environments, and making AD confidential and tamper-proof.

In this second installment, I’ll discuss the elements of time
in Active Directory; touch on replication, latency, and convergence;
cover preventing and remediating lingering objects and cloning; and, of much
relevance, address preparedness for Disaster Recovery.

Proper Time with Virtualized Active Directory Domain Controllers (AD DC)

Time in virtual machines can easily drift if they are not receiving
constant and consistent time cycles. Windows operating systems keep time
based on interrupt timers set by CPU clock cycles. In a VMware ESXi host with multiple virtual machines, CPU cycles are not allocated to idle virtual machines, so an idle guest can miss timer interrupts and its clock can fall behind.

To plan for an Active Directory
implementation, you must carefully consider the most effective way of
providing accurate time to domain controllers and understand the
relationship between the time source used by clients, member servers,
and domain controllers.

The Domain Controller with the PDC Emulator role for the forest root domain
ultimately becomes the “master” timeserver for the forest – the root
time server for synchronizing the clocks of all Windows computers in the forest.
You can configure the PDC to use an external source to set its time. By
changing this domain controller’s default configuration so that it synchronizes
with an external stratum 1 time source, you can ensure that
all other DCs and workstations within the domain keep accurate time.
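
As a rough illustration, the external time source on the PDC emulator is commonly set with the w32tm utility. The Python sketch below only assembles that command line; it does not run it. The helper name build_pdc_time_config and the pool.ntp.org peers are placeholders of mine, not something taken from this series, so substitute the time source your organization has approved.

```python
def build_pdc_time_config(peers):
    """Return the w32tm argument list that points a PDC emulator at external NTP peers."""
    if not peers:
        raise ValueError("at least one NTP peer is required")
    # The 0x8 flag asks the Windows Time service to use client mode for that peer.
    peer_list = " ".join(f"{peer},0x8" for peer in peers)
    return [
        "w32tm", "/config",
        f"/manualpeerlist:{peer_list}",
        "/syncfromflags:manual",   # use only the peers listed above
        "/reliable:yes",           # advertise this DC as a reliable time source
        "/update",                 # tell the service to pick up the new settings
    ]


# Placeholder peers; these names are assumptions for the example only.
cmd = build_pdc_time_config(["0.pool.ntp.org", "1.pool.ntp.org"])
print(cmd)
```

The list form can be handed to subprocess.run on the PDC emulator itself; remember to repeat the configuration whenever the PDC Emulator role moves to another domain controller.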

Why Time Synchronization Is Important in Active Directory

Every domain-joined device is affected by time!

Ideally, all computer clocks in an AD DS domain are synchronized with
the time of an authoritative computer. Many factors can affect time
synchronization on a network. The following factors often affect the
accuracy of synchronization in AD DS:

  • Network conditions
  • The accuracy of the computer’s hardware clock
  • The amount of CPU and network resources available to the Windows Time service

Prior to Windows Server 2016, the W32Time service was not designed to
meet time-sensitive application needs. Improvements in Windows Server 2016
allow you to implement a solution with 1 ms accuracy in your domain.

Figure 1: How Time Synchronization Works in Virtualized Environments

See Microsoft’s How the Windows Time Service Works for more information.

How Synchronization Works in Virtualized Environments

An AD DS forest has a predetermined time synchronization hierarchy.
The Windows Time service synchronizes time between computers within the
hierarchy, with the most accurate reference clocks at the top. If more
than one time source is configured on a computer, Windows Time uses NTP
algorithms to select the best time source from the configured sources
based on the computer’s ability to synchronize with that time source.
The Windows Time service does not support network synchronization from
broadcast or multicast peers.
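
To make the idea of choosing between several configured sources more concrete, here is a small Python sketch of the standard NTP offset and round-trip delay arithmetic. The selection rule (prefer the lowest delay) is a deliberate simplification of what real NTP implementations do, and the source names and timestamps are invented for the example.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Return (offset, delay) from the four NTP timestamps, all in the same unit.

    t0: client transmit, t1: server receive, t2: server transmit, t3: client receive.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


def pick_lowest_delay_source(samples):
    """Simplified stand-in for NTP source selection: prefer the smallest round-trip delay."""
    return min(samples, key=lambda name: ntp_offset_and_delay(*samples[name])[1])


# Hypothetical timestamps in milliseconds for two configured sources.
samples = {
    "source-a": (0, 120, 121, 50),
    "source-b": (0, 110, 112, 30),
}
for name, stamps in samples.items():
    offset, delay = ntp_offset_and_delay(*stamps)
    print(f"{name}: offset={offset} ms, delay={delay} ms")
print("selected:", pick_lowest_delay_source(samples))
```

A lower round-trip delay generally means less uncertainty in the measured offset, which is why it is a reasonable first-order proxy for "the source this computer can synchronize with best".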

Replication, Latency and Convergence

Eventually, changes must converge in a multi-master replication model…

The Active Directory database is replicated between domain
controllers. The units of data replicated between controllers are
called ‘naming contexts.’ Once a domain controller has been established,
only the changes are replicated. Active Directory uses a
multi-master model; changes can be made on any controller and the
changes are sent to all other controllers. The replication path in
Active Directory forms a ring, which adds reliability to the replication.

Latency is the time required for an update to be applied on all domain controllers in the domain or forest.

Convergence is the state in which all domain controllers have the same replica contents of the Active Directory database.
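
The relationship between ring topology, latency, and convergence can be sketched with a toy simulation. This is only a simplified model under my own assumptions (every DC pulls from both ring neighbours once per round, and nothing fails); real replication schedules and topologies are more involved.

```python
def rounds_to_converge(num_dcs, origin=0):
    """Count replication rounds until a change made on one DC reaches every DC in a ring."""
    if num_dcs < 1 or not 0 <= origin < num_dcs:
        raise ValueError("need at least one DC and a valid origin index")
    has_change = [False] * num_dcs
    has_change[origin] = True
    rounds = 0
    while not all(has_change):              # convergence: every replica holds the change
        before = list(has_change)           # state at the start of this round
        for dc in range(num_dcs):
            left = (dc - 1) % num_dcs
            right = (dc + 1) % num_dcs
            # A DC picks up the change if either ring neighbour already had it.
            if before[left] or before[right]:
                has_change[dc] = True
        rounds += 1
    return rounds


for size in (2, 6, 8):
    print(size, "DCs converge after", rounds_to_converge(size), "round(s)")
```

In this model, latency is simply the number of rounds multiplied by whatever replication interval applies in your environment, and the ring means a change can travel in both directions, so a single broken link delays convergence rather than preventing it.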

Figure 2: How Active Directory Replication Works

For more information on Replication, Latency and Convergence, see Microsoft’s “Detecting and Avoiding Replication Latency.”

Preventing and Remediating Lingering Objects

Don’t revert to snapshots or restore backups older than the TSL.

Lingering objects
are objects in Active Directory that have been created, replicated,
deleted, and then garbage collected on at least the Domain Controller
that originated the deletion but still exist as live objects on one or
more DCs in the same forest. Lingering object removal has traditionally
required lengthy cleanup sessions using various tools, such as the Lingering Objects Liquidator (LoL).

Dominant Causes of Lingering Objects

  1. Long-term replication failures

While knowledge of creates and modifies is persisted in Active
Directory forever, replication partners must inbound replicate knowledge
of deleted objects within a rolling Tombstone Lifetime (TSL) # of days
(default 60 or 180 days, depending on which OS version created your AD
forest). For this reason, it’s important to keep your DCs online and
replicating all partitions between all partners within a rolling TSL #
of days. Tools like REPADMIN /SHOWREPL * /CSV, REPADMIN /REPLSUM and
AD Replication Status should be used to continually identify and
resolve replication errors in your AD forest.
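
As a sketch of the kind of check this implies, the Python below flags domain controllers whose last successful inbound replication is approaching or past the tombstone lifetime. The input dictionary, the DC names, and the 50 percent warning threshold are all hypothetical; in practice you would populate the data from your replication monitoring output.

```python
from datetime import datetime, timedelta


def dcs_at_risk(last_success, now, tsl_days=180, warn_fraction=0.5):
    """Return (overdue, warning) DC names based on time since last successful replication."""
    overdue, warning = [], []
    for dc, when in sorted(last_success.items()):
        age = now - when
        if age > timedelta(days=tsl_days):
            overdue.append(dc)      # past the TSL: lingering objects are possible
        elif age > timedelta(days=tsl_days * warn_fraction):
            warning.append(dc)      # still inside the TSL, but worth fixing now
    return overdue, warning


# Hypothetical data: DC name -> last successful inbound replication.
now = datetime(2024, 1, 1)
last_success = {
    "DC1": now - timedelta(days=1),
    "DC2": now - timedelta(days=100),
    "DC3": now - timedelta(days=200),
}
overdue, warning = dcs_at_risk(last_success, now)
print("overdue:", overdue)
print("warning:", warning)
```

Set tsl_days to the value that actually applies to your forest, since it depends on which OS version created it.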

  2. Time jumps

A system time jump of
more than TSL # of days into the past or future can cause deleted objects
to be prematurely garbage collected before all DCs have inbound
replicated knowledge of all deletes. The protection against this is to
ensure that:

  • The forest root PDC is continually configured with a reference time source (including following FSMO transfers).
  • All other DCs in the forest are configured to use the NT5DS hierarchy.
  • Time rollback and roll-forward protection has been enabled via the
    maxnegphasecorrection and maxposphasecorrection registry settings or
    their policy-based equivalents.

The importance of configuring these safeguards can’t be stressed enough.

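
The rollback and roll-forward protection described above boils down to a bounds check on each proposed clock correction. Here is a minimal Python sketch of that logic; the 48-hour limit (172,800 seconds) is a commonly cited value that I am assuming for illustration, not a figure from this series, and the function name is my own.

```python
# Assumed limit for the example: 48 hours, expressed in seconds.
FORTY_EIGHT_HOURS = 48 * 60 * 60


def accept_time_sample(offset_seconds,
                       max_pos_phase_correction=FORTY_EIGHT_HOURS,
                       max_neg_phase_correction=FORTY_EIGHT_HOURS):
    """Decide whether a proposed clock correction is within the configured limits.

    offset_seconds > 0 means the time source is ahead of the local clock.
    """
    if offset_seconds >= 0:
        return offset_seconds <= max_pos_phase_correction
    return -offset_seconds <= max_neg_phase_correction


one_day = 24 * 60 * 60
print(accept_time_sample(3600))            # small forward correction
print(accept_time_sample(-3600))           # small backward correction
print(accept_time_sample(90 * one_day))    # jump far into the future
print(accept_time_sample(-200 * one_day))  # jump far into the past
```

With limits like these in place, a time source that suddenly reports a date months away is ignored rather than followed, which is what keeps tombstones from being garbage collected early.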
  3. USN rollbacks

USN rollbacks
are caused when the contents of an Active Directory database move back
in time via an unsupported restore. Root causes for USN Rollbacks
include:

  • Manually copying previous version of the database into place when the DC is offline.
  • P2V conversions in multi-domain forests.
  • Snapshot restores of physical and especially virtual DCs. For
    virtual environments, both the virtual host environment AND the
    underlying guest DCs should be compatible with VM Generation ID. Windows
    Server 2012 or later, and vSphere 5.0 Update 2 or later, support this
    feature.
  • Events, errors and symptoms that indicate you have lingering objects.

Figure 3: USN Rollbacks – How Snapshots Can Wreak Havoc on Active Directory
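
To show why an unsupported snapshot revert is so damaging, here is a deliberately simplified Python model. The class, its fields, and the integer generation counter are illustrative assumptions of mine, not the real Active Directory or hypervisor data structures. The idea is that a partner remembers the highest USN it has received per source identity, so a DC that goes back in time under the same identity reuses USNs the partner thinks it already has.

```python
import uuid


class DomainController:
    """Toy model of a DC's replication identity (not the real AD implementation)."""

    def __init__(self, name):
        self.name = name
        self.invocation_id = uuid.uuid4()   # identity of this database instance
        self.highest_usn = 0
        self.vm_generation_id = 1           # stand-in for the hypervisor-provided value

    def commit_change(self):
        self.highest_usn += 1
        return self.highest_usn

    def snapshot(self):
        return (self.invocation_id, self.highest_usn, self.vm_generation_id)

    def revert(self, snap, hypervisor_generation_id):
        self.invocation_id, self.highest_usn, self.vm_generation_id = snap
        if hypervisor_generation_id != self.vm_generation_id:
            # Generation ID changed: the DC notices the revert and takes a new identity.
            self.vm_generation_id = hypervisor_generation_id
            self.invocation_id = uuid.uuid4()


def partner_skips_changes(watermarks, dc):
    """True if the partner believes it already holds everything up to the DC's current USN."""
    return watermarks.get(dc.invocation_id, 0) >= dc.highest_usn


dc = DomainController("DC1")
for _ in range(10):
    dc.commit_change()
snap = dc.snapshot()                        # snapshot taken at USN 10
for _ in range(5):
    dc.commit_change()
partner_watermarks = {dc.invocation_id: dc.highest_usn}  # partner has seen USN 15

# Revert on a platform that does not signal the change: same identity, lower USN.
dc.revert(snap, hypervisor_generation_id=1)
for _ in range(3):
    dc.commit_change()                      # new changes reuse USNs 11 to 13
unsafe_revert_loses_changes = partner_skips_changes(partner_watermarks, dc)

# Revert on a platform that changes the generation ID: the DC gets a fresh identity.
dc.revert(snap, hypervisor_generation_id=2)
for _ in range(3):
    dc.commit_change()
safe_revert_loses_changes = partner_skips_changes(partner_watermarks, dc)

print("unsafe revert, changes skipped:", unsafe_revert_loses_changes)
print("safe revert, changes skipped:", safe_revert_loses_changes)
```

In the first case the partner silently ignores the new changes, which is the divergence that leads to lingering objects; in the second, the fresh identity means the partner treats the DC as a source it has not yet replicated from.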

Cloning

You should always use a test environment before deploying the clones to your organization’s network.

DC Cloning enables faster, safer Domain Controller provisioning through a clone operation.

When you create the first domain controller in your organization, you
are also creating the first domain, the first forest, and the first
site. It is the domain controller, through group policy, that manages
the collection of resources, computers, and user accounts in your
organization.

Active Directory Disaster Recovery Plan: It’s a Must

Build, test, and maintain an Active Directory Disaster Recovery Plan!

AD is indisputably one of an organization’s most critical pieces of
software plumbing and in the event of a catastrophe – the loss of a
domain or forest – its recovery is a monumental task. You can use Site Recovery to create a disaster recovery plan for Active Directory.

The Microsoft Active Directory Disaster Recovery Plan
is an extensive document: a set of high-level procedures and guidelines
that must be extensively customized for your environment. It serves as a
vital point of reference when determining root cause and how to proceed
with recovery with Microsoft Support.

Summary

There are several excellent reasons for virtualizing Windows Active
Directory. The release of Windows Server 2012 and its
virtualization-safe features and support for rapid domain controller
deployment alleviates many of the legitimate concerns that
administrators have about virtualizing AD DS. VMware® vSphere® and our recommended best practices also help achieve 100 percent virtualization of AD DS.

Please reach out to your Dell EMC representative or check out Dell EMC Consulting Services to learn how we can help you with virtualizing AD DS, or leave me a comment below and I’ll be happy to respond.

Sources

Virtualizing a Windows Active Directory Domain Infrastructure