Sunday, September 23, 2012

Transactional Replication - Deep dive


Transactional Replication is most widely used form of SQL Server replication. It’s more full featured than snapshot replication but much easier to set up and manage than merge replication. There is common belief that merge replication is only replication type that supports bidirectional data replication. That’s not the case as both snapshot and transaction replication offer immediate and queued updating subscriptions.
SQL Server implements transactional replication via snapshot agent , log reader agent , and the distributor agent. The snapshot agent prepare the initial snapshot of a transactional publication. The log reader agent scans the T-log on the publisher and detects the changes made to the data after the snapshot has been taken and records them in the distribution database. The distribution agent reads the changes recorded in the distribution database and applies them to subscriber.
Each modification to a published table causes the Log Reader agent to write at least one row to MSrepl_commands. Unlike snapshot replication here data in MSrepl_commands is not stored in human readable form. You have to use sp_browsereplcmds.This sp returns readable version of replicated commands stored in distribution database.


The log reader agent uses the extended procedure sp_replcmds (implemented internally by SQL Server) to retrieve log records produced by DML statements in the published database's T-log. Each publisher database participating in transactional replication will have only one log reader agent regardless of howmany transactional publications it contains. Each database published using transactional replication has its own Log Reader Agent that runs on the Distributor and connects to the Publisher.
If one publication calls sp_replcmds for a db then log reader is considered for that database until it disconnects. Other publications attempting to run sp_replcmds before the first disconnects will get error stating that ' Another log reader is replicating the database'. One other reason that why only one log reader agent is permitted for each database is that scanning the log for changes can impact performance. Each time the log reader agent invokes sp_replcmds, it causes log reader code within the SQL Server process to scan the published database's transaction log for changes that need to be replicated. When it does this , the log reader agent changes the typical sequential method SQL Server uses to access the log into something more random. While the server writes new entries to the end of the T-log as changes are made to the database, the log reader agent may be reading a different section of log in order to write replication commands to MSrepl_commands and MSrepl_Transactions.This can cause resource contention for the T log and impact the performance of the server.
SQL Server maintains a global chache of article metadata known as the article cache. This cache stores metadata from sysarticles and syscolumns for the replicated article. SQL Server consults this chache when it requires metadata for a particular article.
Once the log reader agent finishes calling sp_replcmds and writing new entries to MSrepl_commands and MSrepl_Transactions, it calls sp_repldone to indicate that the specified log records have been successfully replicated. This allows SQL Server to purge log records as necessary. Log records for articles cannot be purged utill they are replicated to distributor.

 

No comments:

Post a Comment