Distributed Locks with Consul and Golang

Maciej Winnicki · December 28, 2016 · 4 minutes read

Consul is a powerful tool for service discovery and configuration management. One of its lesser-known features is the Sessions mechanism, which is useful for building distributed systems. Combined with the Key/Value store, it allows the creation of distributed locks.

What is a distributed lock?

A distributed lock is a mechanism for controlling access to a shared resource in a mutually exclusive way. It is called distributed because it coordinates processes running on multiple servers.

[Figure: webhook diagram]

For example, suppose we are building a horizontally scalable webhook server. There is a queue of messages that emitters push to webhook receivers. Because we do not want to flood receiving servers, only one emitter may send messages to a particular receiver at any given time. The receiver is a shared resource, so before pushing a message the emitter needs to lock the receiver to prevent another emitter from pushing a message to the same receiver.

During normal operation the emitter acquires a lock, sends a message, and releases the lock. But what happens when the emitter exits (for example, because of an underlying node failure) before releasing the lock? Is the receiver locked indefinitely? A distributed lock prevents this situation: failure detection mechanisms release locks owned by failed processes.

Consul acts as a distributed lock manager. It provides a way to acquire locks on entries in the Key/Value store and handles failure detection. Before locking, a process needs to create a session.

Sessions

A session is a link between a remote process and a Consul node. It is created by a remote process and can be invalidated explicitly or as a result of the health check mechanism. Depending on the session's configured behavior, when a session is invalidated the locks it holds are either released (the default) or the locked keys are deleted.
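As a rough sketch of what happens at the HTTP level, the example below creates two sessions with PUT /v1/session/create and then tries to lock the same key with PUT /v1/kv/<key>?acquire=<session>, which answers true or false. The helper names (createSession, acquire, put) are hypothetical, and fakeAgent is a tiny stand-in server so the example runs without a real Consul agent; only the endpoint paths and the true/false response are taken from Consul's HTTP API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
	"sync"
)

// put sends a PUT request and returns the response body.
func put(target string, body []byte) ([]byte, error) {
	req, err := http.NewRequest(http.MethodPut, target, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// createSession asks the agent at base for a new session and returns its ID.
func createSession(base, behavior, ttl string) (string, error) {
	body, err := json.Marshal(map[string]string{"Behavior": behavior, "TTL": ttl})
	if err != nil {
		return "", err
	}
	resp, err := put(base+"/v1/session/create", body)
	if err != nil {
		return "", err
	}
	var out struct{ ID string }
	if err := json.Unmarshal(resp, &out); err != nil {
		return "", err
	}
	return out.ID, nil
}

// acquire tries to lock key for session; the agent answers "true" or "false".
func acquire(base, key, session string, value []byte) (bool, error) {
	resp, err := put(base+"/v1/kv/"+key+"?acquire="+url.QueryEscape(session), value)
	if err != nil {
		return false, err
	}
	return strings.TrimSpace(string(resp)) == "true", nil
}

// fakeAgent is a hypothetical stand-in for a Consul agent: it hands out
// sequential session IDs and lets only one session hold each key.
func fakeAgent() *httptest.Server {
	var mu sync.Mutex
	sessions := 0
	holders := map[string]string{}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		if r.URL.Path == "/v1/session/create" {
			sessions++
			fmt.Fprintf(w, `{"ID":"session-%d"}`, sessions)
			return
		}
		key, session := r.URL.Path, r.URL.Query().Get("acquire")
		if owner, held := holders[key]; held && owner != session {
			fmt.Fprint(w, "false")
			return
		}
		holders[key] = session
		fmt.Fprint(w, "true")
	}))
}

func main() {
	agent := fakeAgent()
	defer agent.Close()
	a, errA := createSession(agent.URL, "release", "10s")
	b, errB := createSession(agent.URL, "release", "10s")
	if errA != nil || errB != nil {
		panic("session creation failed")
	}
	okA, _ := acquire(agent.URL, "webhook_receiver/1", a, []byte("sender 1"))
	okB, _ := acquire(agent.URL, "webhook_receiver/1", b, []byte("sender 2"))
	fmt.Println(a, okA, b, okB) // session-1 true session-2 false
}
```

Pointing the same helpers at a real agent address such as http://127.0.0.1:8500 should behave the same way, although the session IDs will be UUIDs rather than the sequential names used by the stand-in.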

Health checks

Consul supports several kinds of checks (HTTP, TCP, and others). During session creation a list of health checks can be defined. Those checks are used to determine whether the session needs to be invalidated.

Let us say that a remote process is part of a web server. We can define a health check that expects 200 OK responses from that web server. When the web server stops responding, the health check goes into the critical state and the session created by the remote process is invalidated. By default, Consul uses the serfHealth check and invalidates sessions created on a Consul node that has failed.

TTL

In addition to health checks, sessions also have built-in support for TTL. When the TTL expires the session is invalidated. The remote process is responsible for renewing the session before the TTL expires.

Golang API

The Consul API client provides a handy abstraction on top of Sessions and the K/V store. There is a Lock struct with Lock, Unlock and Destroy methods. There are also helper methods for creating a Lock instance. The API client also takes care of renewing sessions.

Creating the Consul client

client, err := api.NewClient(&api.Config{Address: "127.0.0.1:8500"})

Creating a Lock instance

LockOpts accepts a LockOptions struct as an argument. LockOptions is a container for all possible options and can be used for setting the key and value, customising sessions, or setting the TTL.

opts := &api.LockOptions{
  Key:   "webhook_receiver/1",
  Value: []byte("set by sender 1"),
  SessionOpts: &api.SessionEntry{
    // SessionTTL on LockOptions is ignored when SessionOpts is given,
    // so the TTL is set on the session entry itself.
    TTL:      "10s",
    Checks:   []string{"check1", "check2"},
    Behavior: "release",
  },
}
lock, err := client.LockOpts(opts)

Another helper method is LockKey. It creates a Lock with all options set to their defaults except the key name.

lock, err := client.LockKey("webhook_receiver/1")

Acquiring the lock

Both helper methods return a handle to a Lock struct that is used for acquiring the lock. The Lock method is thoroughly described in the documentation:

Lock attempts to acquire the lock and blocks while doing so. Providing a non-nil stopCh can be used to abort the lock attempt. Returns a channel that is closed if our lock is lost or an error. This channel could be closed at any time due to session invalidation, communication errors, operator intervention, etc. It is NOT safe to assume that the lock is held until Unlock() unless the Session is specifically created without any associated health checks. By default Consul sessions prefer liveness over safety and an application must be able to handle the lock being lost.

It is important to use the returned channel to cancel processing when the lock is lost. With that in mind, let us review code that acquires a lock, emits a message to a receiver (with cancellation signalled by the lock channel), and releases the lock.

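// Lock blocks until the lock is acquired or stopCh is closed.
stopCh := make(chan struct{})
lockCh, err := lock.Lock(stopCh)
if err != nil {
  panic(err)
}

// Bind the request to a context so it can be aborted if the lock is lost.
cancelCtx, cancelRequest := context.WithCancel(context.Background())
req, err := http.NewRequest("GET", "https://example.com/webhook", nil)
if err != nil {
  panic(err)
}
req = req.WithContext(cancelCtx)

go func() {
  resp, err := http.DefaultClient.Do(req)
  if err == nil {
    resp.Body.Close()
  }
  select {
  case <-cancelCtx.Done():
    log.Println("request cancelled")
  default:
    log.Println("request done")
    // Unlock returns an error if the lock is no longer held.
    if err := lock.Unlock(); err != nil {
      log.Println("unlock failed:", err)
    }
  }
}()

// Cancel the in-flight request as soon as the lock is lost.
go func() {
  <-lockCh
  cancelRequest()
}()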

Summary

The locking mechanism provided by Consul is very powerful. I have only scratched the surface of how it can be configured. I highly recommend studying the Consul internals documentation about Sessions.
