About myself and my experience.
I’ve worked quite a bit with replication, and I decided to write up some of the replication tools and settings I’ve found useful.
I’ve worked with transactional, peer-to-peer, and transactional replication with updatable subscriptions.
In almost all production scenarios I’ve used remote distributors; for non-production systems I don’t mind using a local distributor.
Replication was set up with OLTP databases as the publishers, with a two-day retention period.
Here are a few things to keep an eye on.
Only publish the columns you need. Replication moves a fair amount of data around, so one way to speed things up is to publish only the columns you actually need. In some cases you may have a flag column used to toggle a record on the publisher, and every update to that column fires a replicated transaction. With a high volume of transactions you could be doing a lot of unnecessary work.
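As a sketch of how column filtering is set up (publication, article and column names here are made-up examples; adjust to your environment):

```sql
-- Add the article with vertical partitioning enabled, so only
-- explicitly added columns are published.
EXEC sp_addarticle
    @publication        = N'MyPublication',
    @article            = N'Orders',
    @source_object      = N'Orders',
    @vertical_partition = N'true';

-- Include only the columns you actually need at the subscriber.
EXEC sp_articlecolumn
    @publication = N'MyPublication',
    @article     = N'Orders',
    @column      = N'OrderID',
    @operation   = N'add';

EXEC sp_articlecolumn
    @publication = N'MyPublication',
    @article     = N'Orders',
    @column      = N'OrderDate',
    @operation   = N'add';
```

With the flag column left out of the article, updates that touch only that column don’t generate replicated commands.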
Log reader – The log reader can read on the order of 10,000 transactions per second; I’ve never seen it unable to cope. I have, however, seen it scan for a long time before finding any data. This happens when there are large logging operations on the publisher. For instance, rebuilding an index in the full recovery model generates a lot of log. If you haven’t learned to switch to the bulk-logged recovery model for this type of operation, I suggest you head over to K. Tripp’s website; it’s in my links. Because of the large volume of entries in the log, the reader may scan huge stretches of the log without finding anything marked for replication. The default scan (read batch) size is relatively small; it can be increased on the Log Reader Agent to stop replication struggling with a big log.
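A rough illustration of both points (database, table and server names are placeholders; the `-ReadBatchSize` default is 500 transactions per processing cycle):

```sql
-- Reduce log volume for a large index rebuild by switching recovery
-- models around it (take a log backup afterwards to restore the chain).
ALTER DATABASE PublisherDB SET RECOVERY BULK_LOGGED;
ALTER INDEX ALL ON dbo.BigTable REBUILD;
ALTER DATABASE PublisherDB SET RECOVERY FULL;

-- To let the Log Reader Agent scan more of the log per cycle, append
-- -ReadBatchSize to the agent job step's run command, e.g.:
-- logread.exe -Publisher [PUB] -PublisherDB [PublisherDB]
--             -Distributor [DIST] -ReadBatchSize 5000
```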
Distribution Agent – There is some fun to be had here, and you can really tune this agent to your needs. For instance, if your subscriber is being queried all the time, you might prefer less frequent replication, i.e. a larger polling interval. If you want transactions delivered as close to real time as possible, lower the polling interval. You can set the commit batch size to commit larger or smaller batches. For systems without FKs on the subscriber I can also suggest using multiple subscription streams; this only really works when you don’t require a specific sequence for the data to be delivered. On high-latency systems you may prefer larger batches, since they reduce the number of round trips between distributor and subscriber.
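The knobs above map to real distrib.exe parameters; the values below are illustrative, not recommendations:

```sql
-- Appended to the Distribution Agent job step's run command:
--
-- -PollingInterval 60       -- seconds between distribution DB polls
--                           -- (default 5); larger = less frequent delivery
-- -CommitBatchSize 500      -- transactions per commit at the subscriber
--                           -- (default 100); larger batches suit high latency
-- -SubscriptionStreams 4    -- parallel connections to the subscriber;
--                           -- only safe without ordering/FK dependencies
```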
Cleanup Agent – The cleanup code is not up to par for Microsoft, if you ask me. The cleanup agent deletes in batches of 2,000 rows, as far as I can remember, and it is the most IO-intensive process in replication. The deletes modify the clustered indexes on msrepl_commands and msrepl_transactions in the distribution database and can adversely affect insert operations (the log reader). If you can afford to run it during quiet times, schedule it that way. If you can’t run it after hours, i.e. a 24/7 environment, you can run it more frequently, e.g. every 15 minutes; each run deletes the data that has fallen outside your retention window.
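For reference, the cleanup job just runs a stored procedure in the distribution database, so rescheduling it is a plain Agent-job change; the retention values below (hours) assume the two-day retention mentioned earlier:

```sql
-- What the "Distribution clean up: distribution" job executes;
-- min/max retention are in hours (48 = the two-day window).
EXEC dbo.sp_MSdistribution_cleanup
    @min_distretention = 0,
    @max_distretention = 48;
```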
Snapshot – I’ve never had a need to modify this; however, I know you can increase the number of bcp threads, which should make snapshot generation a bit faster. There are a few cases where this gives good results. You can read further in the remarks for the agent in BOL.
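The relevant knob is a Snapshot Agent command-line parameter; the value here is only an example:

```sql
-- Appended to the Snapshot Agent job step's run command:
-- -MaxBcpThreads 4    -- number of parallel bcp operations during
--                     -- snapshot generation (default is 1)
```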
Jobs I disable – Replication agents checkup and Replication monitoring refresher. In my opinion these jobs do work that is not required. Note that some things may not work exactly as expected afterwards, such as Replication Monitor, and the checkup job is what detects stale agents. The other job I didn’t mention is the one that marks validation as inactive, or expires subscriptions; I don’t enable that one either. I usually delete the jobs to avoid confusion.
Note that Replication Monitor will still work accurately enough to be usable even after deleting these jobs.
sysarticles, sysarticlecolumns – publisher-side system tables worth knowing for checking what is published.
msdistribution_status – this is a view that deserves a special mention. It calculates a whole bunch of useful goodies, but also some goodies that take a whole lot of processing which I don’t think is needed. For instance, it calculates both the pending commands and the commands already delivered. To do this it uses the xact_seqno, then looks in the msrepl_commands and msrepl_transactions tables and counts rows greater than the seqno and rows less than it. The problem is that on a large distributor, say 500 GB, those two tables, being the core tables of distribution information, will account for 90% of that 500 GB. This means you will spend considerable IO finding out how many commands have been delivered. I don’t want to know how many were delivered; I just want to know how many are outstanding. So this view is not great for that.
The solution is to follow the same logic as the view, except you only calculate the pending commands. This reduces the execution time significantly. If I remember correctly you can also add an index on artid in msrepl_commands. But be aware that you might fall out of SLA and support if you modify MS tables and you then experience issues.
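A minimal sketch of the pending-only approach, run in the distribution database; the agent id is a placeholder, and exact join columns may vary by version, so treat this as a starting point rather than a drop-in script:

```sql
-- Example: count only the undelivered commands for one distribution
-- agent (agent_id 5 is hypothetical) instead of also counting what
-- was already delivered, as MSdistribution_status does.
DECLARE @last_seqno varbinary(16);

-- Highest sequence number the agent has confirmed as delivered.
SELECT @last_seqno = MAX(h.xact_seqno)
FROM dbo.MSdistribution_history AS h
WHERE h.agent_id = 5;

-- Everything beyond that sequence number is still pending.
SELECT COUNT_BIG(*) AS pending_commands
FROM dbo.MSrepl_commands AS c
JOIN dbo.MSsubscriptions AS s
    ON  s.publisher_database_id = c.publisher_database_id
    AND s.article_id            = c.article_id
WHERE s.agent_id = 5
  AND c.xact_seqno > @last_seqno;
```

This touches only the rows past the delivered watermark, which is what keeps the IO down on a large distributor.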
If you run with error-skipping profiles, such as skipping error 20598 (row not found), it can be costly, because the distribution agent will still try to apply each transaction at the subscriber. When the error is encountered it writes the error back to the distribution database, logging the error number, seqno and other detail in msrepl_errors. If there is latency between your subscriber and distributor, you can imagine how this delays commands being delivered.
One fix is to repair the data so that you can run the normal (default) profile rather than the skip-errors profile (continue on data consistency errors).
The other fix is to go into the subscriber’s stored procedures, which you can find using the system view sys.sql_modules. You can identify the replication delete and update procedures because, if you haven’t named them something other than the default, they look like ms_upd and ms_del procedures. Once you have found them, you can modify them and take out the last bit where the error is trapped. It will look something like: if @@rowcount = 0 raiserror. Comment this out and add a note as to why. The procedures will no longer throw these errors and replication will continue to run smoothly.
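A quick way to locate those procedures on the subscriber is a sketch like this (the name patterns assume the default sp_MSupd/sp_MSdel naming; adjust if yours were renamed):

```sql
-- Run in the subscriber database. Lists auto-generated replication
-- update/delete procedures that still contain the row-not-found check.
SELECT OBJECT_NAME(m.object_id) AS proc_name
FROM sys.sql_modules AS m
WHERE m.definition LIKE '%@@rowcount = 0%'
  AND (OBJECT_NAME(m.object_id) LIKE 'sp_MSupd%'
       OR OBJECT_NAME(m.object_id) LIKE 'sp_MSdel%');
```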
A note on this: if you do anything such as modifying the article on the publisher, or anything else that calls sp_changearticle, the procedure on the subscriber will be regenerated and you will have to modify it and comment out the error handling again.
If you use change tracking you might also see the log reader agent in the jobs. The change tracking feature is a great new way to track changes on the database, which you can use to write your own replication. It employs the log reader, which is super fast, and you can add your own logic from there onwards. You could replace replication parts such as cleanup and distribution. It’s a great new area to be creative in. Between change tracking and Service Broker? Wow. You can write something far superior to replication, but it depends what you are into 🙂
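Getting started with change tracking is only a couple of statements; database, table and column names below are examples, and the retention here mirrors the two-day window used earlier:

```sql
-- Enable change tracking at the database level.
ALTER DATABASE MyDb SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

-- Then per table you want to track.
ALTER TABLE dbo.Orders ENABLE CHANGE_TRACKING
    WITH (TRACK_COLUMNS_UPDATED = ON);

-- Your own "distribution" logic then pulls changes since the last
-- version you processed (persist @last_sync between runs).
DECLARE @last_sync bigint = 0;
SELECT ct.OrderID, ct.SYS_CHANGE_OPERATION
FROM CHANGETABLE(CHANGES dbo.Orders, @last_sync) AS ct;
```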
If you have a specific replication query feel free to comment or email me.