The Many Faces of Database Replication

Thomas LaRock

(The following is an excerpt from "DBA Survivor: Become a Rock Star DBA", published by APress.)

You will often be involved in conversations about replication. My first word of caution is to make certain that everyone understand what type of replication is being discussed. You have two main types of replication to be concerned with: SQL replication and SAN replication. They are not the same thing-in fact they are very different, and can exist together or separately depending on your needs.

SAN replication is simply a way your shop has configured the SAN to accommodate for moving bits of data from point A to point B. This is done primarily for high availability (HA) purposes, but you will find someone who thinks it is also being done for DR purposes. Depending on how the SAN replication is configured, it most certainly is an HA option and is possibly a disaster recovey (DR) option. For example, if you have two sites and are doing SAN replication from site A to site B, then you certainly have an HA scenario configured. However, when an event happens that affects both site A and site B, both SANs will go down. Your HA is clearly going to be out of the water, and so will your DR unless you are also replicating to a third site.

I can hear people now saying that they can recover from tape, so technically they have a DR solution in place. And I will tell them the same thing I am telling you right now: then your tape backups are your DR solution, not your SAN replication. I hope that helps clear up the confusion.

As for SQL replication, you get your choice of flavors. Each one is unique from the rest, and each appropriate according to your particular needs. You should have a cursory knowledge of the different types and when it might make sense to deploy. Your environment and the nature of the requirements you are given will be your primary decision factors when selecting a mode of SQL replication.

Before we even get started, a quick overview of replication terminology is in order. Many of the terms used (publisher, subscriber, distributor) are used in the publishing industry, and thus the publishing industry is often used to describe replication concepts. So the term publisher means exactly what you would expect: someone who publishes information, like a newspaper. A distributor would be someone who distributes the newspapers to the people who are subscribers. Simple enough? Great, now let us have a look at the different types of SQL replication.

Transactional Replication

Transactional replication is when transactions committed at the publisher are then replicated and committed at the subscriber. This makes no sense in terms of newspapers or magazines, so think of this as if you were reading the online version of your local newspaper. If someone makes a change to the articles on the web site, you will see the changes the next time you refresh your browser.

When would you want to use this type of replication? Often the answer has to do with systems that have a high volume of transactions and the need for low latency, and the fact that one of your subscribers is a not a Microsoft SQL Server. Also, the subscribers in a transactional replication topology are usually considered to be read-only, but it is possible to allow for subscribers to make changes to data that are then pushed back to the publisher.

Transactional replication makes sense when you have a server that is not SQL Server, or when you have the need for more than one subscriber. If you only have one subscriber and you only have SQL Server database servers, then you should go with database mirroring instead. But take into consideration the
scope of the changes you want replicated. With database mirroring, all transactions are mirrored and thus replicated, but with transactional replication, only those transactions marked for replication will be replicated.

Merge Replication

This might be the biggest beast of all replication technologies. Changes at both the publisher and subscribers are merged together through the use of triggers. Think of a newspaper needing to publish the evening news that has ten reporters in the field updating stories. Now let's say that two reporters are covering the same story and try to publish different versions. Which version will get written? How would you resolve the conflicts?

Merge replication is not for the faint of heart. When this replication breaks, you had better have a method in place to put it back together again. There can be a lot of administrative overhead with merge replication. So why would anyone want to use this?

Most often I see it used when you have disconnected users from a central publisher. Often times you will see the example of having members of a sales team in the field visiting clients. Their mobile devices and laptops are not always connected, but they may be gathering data. When they do connect to the network, they want their data to update accordingly. Conflicts can (and will) arise if another sales representative tries to update the same data.

This replication topology should only be used if you have the need for many disconnected clients to connect back to the main office. If that is a requirement for your shop, then merge replication may be the answer. I would rather eat shards of broken glass than implement merge replication, but that's just me. Some people make a great living eating broken glass at carnivals. You should choose your own path in life.

Snapshot Replication

Snapshot replication requires less administrative overhead than the other two forms of replication. Just as it sounds, a snapshot of the database is taken and replicated as a whole to a subscriber. This is very useful when you have data that changes infrequently.

Perhaps you might need to update your list of available products to your sales team a few times a year. When the sales representatives connect to your network, a snapshot of your data can be pushed out to them so they see all the new products.

Be careful using snapshot replication on very large data sets. You can easily cause network issues if you were to decide to push a 100GB snapshot out to 100 subscribers in the middle of the day.

If you do not have the multi-server platform requirements that would force you into transactional replication, and if you do not want the hassles of merge replication, then snapshot replication may be the right solution for your shop. While snapshot replication is the easiest type to manage, there's another method of moving data around that requires even less work and time.

Add Comment      Leave a comment on this blog post

Post a comment





(Maximum characters: 1200). You have 1200 characters left.



Subscribe to our Newsletters

Sign up now and get the best business technology insights direct to your inbox.