Evolution of Consensus Algorithms
Blockchain technology is a disruptive technology built on distributed systems. A distributed system has a specific set of characteristics, including:
- Concurrency: each computer in the network executes events independently at the same time as other computers in the network.
- No common physical clock: there is no single global clock that determines the sequence of events across all the computers in the network. For a distributed system to work, we need a way to determine the order of events. However, in a set of computers operating concurrently, it is sometimes impossible to say which of two events occurred first, as the computers are spatially separated. So, the time and order of events are fundamental obstacles in a system of spatially separated computers (Lamport, 1978).
- Independent failure of components: components in a distributed system can be faulty; this is why it is called "fault-tolerant distributed computing." No distributed network has zero faults, and the faults that occur in different computers/nodes of the network are independent of each other. Real systems are subject to a number of possible flaws or defects, such as a process crashing; messages being lost, distorted, or duplicated; a network partition delaying or dropping messages; or even a process going completely haywire and sending messages according to some malevolent plan.
- No shared memory: this is a key feature that requires "message passing" for communication, and it implies the absence of a common physical clock. Messages may pass between any of the computers in the system, and the coordination and communication of a distributed system are achieved by message passing, which can be done synchronously or asynchronously.
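The ordering problem described above is what Lamport's logical clocks (from the 1978 paper cited above) address: each process keeps a counter, stamps outgoing messages with it, and advances past any larger timestamp it receives, so a message receive is always ordered after its send. Below is a minimal illustrative sketch; the class and method names are my own, not from any library.

```python
class LamportClock:
    """A minimal Lamport logical clock (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # Timestamp attached to an outgoing message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Jump past the sender's timestamp, then count the receive event.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.local_event()   # a.time == 1
t = a.send()      # a.time == 2; the message carries timestamp 2
b.receive(t)      # b.time == 3: the receive is ordered after the send
```

This gives only a partial order of events; it does not tell us which of two concurrent events "really" happened first, which is exactly the obstacle described above.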
The failures in distributed systems can be modeled as process failures and communication failures, and can be broadly grouped into three categories:
- Crash-fail: the component stops working without warning (e.g., the computer crashes). Process failure models here can be fail-stop or crash; in the fail-stop model, other processes can learn that the process has failed.
- Omission: the component sends a message, but it is not received by the other nodes (e.g., the message was dropped). Process failure models here can have receive omission errors, send omission errors, or general omission errors.
- Byzantine: The component behaves arbitrarily. This type of fault is irrelevant in controlled environments (e.g., Google or Amazon data centers) where there is presumably no malicious behavior. Instead, these faults occur in what’s known as an “adversarial context.” Basically, when a decentralized set of independent actors serve as nodes in the network, these actors may choose to act in a “Byzantine” manner. This means they maliciously choose to alter, block, or not send messages at all.
Synchronous message passing
In a synchronous system, it is assumed that,
- There is a known upper bound on the message communication delay.
- There is a known bounded drift rate for the local clock of each processor with respect to real-time. The drift rate between two clocks is defined as the rate at which their values diverge.
- There is a known upper bound on the time taken by a process to execute a logical step in the execution.
This is conceptually less complex because users have a guarantee: when they send a message, the receiving component will get it within a certain time frame. This allows users to model their protocol with a fixed upper bound on how long the message will take to reach its destination. This model is not very practical for real-world scenarios, where computers can crash or go offline and messages can be dropped, duplicated, delayed, or received out of order.
Asynchronous message passing
In an asynchronous message-passing system, none of the above assumptions hold. It is assumed that the network may delay messages indefinitely, duplicate them, or deliver them out of order. In other words, there is no fixed upper bound on how long a message will take to be received.
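The practical difference between the two models can be sketched as follows: under a synchrony assumption, a receiver that waits out the known delay bound can soundly conclude that a silent sender has failed, whereas in an asynchronous model a timeout proves nothing. The bound and names below are illustrative.

```python
import queue
import threading

DELTA = 0.5  # known upper bound on delivery delay (the synchrony assumption)

channel = queue.Queue()

def sender():
    channel.put("hello")

threading.Thread(target=sender).start()

# Synchronous model: waiting DELTA is sound -- a timeout implies failure.
try:
    msg = channel.get(timeout=DELTA)
    print("delivered:", msg)
except queue.Empty:
    print("sender has failed (a valid conclusion only under synchrony)")

# Asynchronous model: the same timeout proves nothing -- the message may
# merely be slow, so a crashed sender is indistinguishable from a slow one.
```

This indistinguishability between "crashed" and "slow" is precisely what makes consensus hard in asynchronous systems, as the FLP result below formalizes.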
Blockchains, like many other distributed systems, are implemented using the architecture of replicated state machines. The state machine approach is a general method for implementing fault-tolerant services in distributed systems: a set of distributed computers all start with the same initial value, and for each state transition, each of the processes decides on the next value. Reaching "consensus" means that all the computers must collectively agree on the output of this value, which maintains a consistent transaction log across every computer in the system. The replicated state machine must continually accept new transactions into this log in spite of the following:
- Some of the computers/nodes being faulty.
- The network being unreliable, so that messages may fail to be delivered, be delayed, or arrive out of order.
- The absence of a global clock to help determine the order of events.
These can be considered as fundamental goals of a consensus algorithm.
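The replicated state machine idea above can be sketched in a few lines: every replica starts from the same initial state and applies the same agreed, ordered log of transactions, so all replicas end up in the same state. The class and field names here are illustrative, and the "agreed log" stands in for whatever a consensus algorithm would produce.

```python
class Replica:
    """One replica of the state machine (illustrative sketch)."""

    def __init__(self):
        self.log = []      # agreed, ordered transaction log
        self.balance = 0   # the state machine's state

    def apply(self, tx):
        self.log.append(tx)
        self.balance += tx  # deterministic state transition

# Consensus is what guarantees every replica sees this same ordered log.
agreed_log = [+50, -20, +5]

replicas = [Replica() for _ in range(3)]
for tx in agreed_log:
    for r in replicas:
        r.apply(tx)

assert all(r.balance == 35 for r in replicas)
assert all(r.log == agreed_log for r in replicas)
```

The hard part, of course, is not applying the log but agreeing on it under the failure conditions listed above; that is the job of the consensus algorithms discussed next.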
Byzantine agreement problem and consensus problem
1. Byzantine agreement problem
The Byzantine agreement problem originates from the Byzantine generals problem, which first appeared in the paper "The Byzantine Generals Problem" (Lamport et al., 1982). Simply put, the involved parties must agree on a single strategy in order to avoid complete failure, even though some of the involved parties are corrupt, disseminating false information, or otherwise unreliable. The Byzantine agreement problem requires a designated process, called the source process, with an initial value, to reach agreement with the other processes about its initial value.
For a Byzantine fault tolerant algorithm to reach consensus, it must satisfy the following conditions:
Agreement: All non-faulty processes must agree on the same value.
Validity: If the source process is non-faulty, then the agreed upon value by all the non-faulty processes must be the same as the initial value of the source.
Termination: Each non-faulty process must eventually decide on a value.
2. Consensus problem
The consensus problem can be considered a variation of the Byzantine agreement problem in which each process has an initial value and all correct processes must agree on a single value.
For an algorithm to reach consensus, it must satisfy the following conditions:
Agreement: All non-faulty nodes decide on the same output value.
Validity: If all the non-faulty processes have the same initial value, then the agreed upon value by all the non-faulty processes must be that same value.
Termination: All non-faulty nodes eventually decide on some output value.
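The Agreement and Validity conditions above can be expressed as simple checks over the initial and decided values of the non-faulty processes in one round. The function names below are illustrative, not from any formal framework.

```python
def agreement(decided):
    """All non-faulty processes decide on the same output value."""
    return len(set(decided)) == 1

def validity(initial, decided):
    """If all non-faulty processes start with the same value v,
    the decided value must be v."""
    if len(set(initial)) == 1:
        return set(decided) == set(initial)
    return True  # mixed initial values: any common decision is valid

initial = [1, 1, 1]
decided = [1, 1, 1]
print(agreement(decided) and validity(initial, decided))  # True

decided_bad = [1, 2, 1]
print(agreement(decided_bad))  # False: processes disagreed
```

Termination cannot be checked this way, since it is a property of the execution (every non-faulty process eventually decides), not of the decided values, which is exactly why the FLP result below targets it.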
Different algorithms use different variations of the conditions above. For example, some divide the agreement property into consistency and totality (or safety), some use the term liveness for termination and non-triviality for validity, and some add concepts of integrity or efficiency.
FLP impossibility result
Overcoming the above problems in a synchronous environment is possible. In synchronous environments, messages are delivered within a fixed time frame; in asynchronous environments, there is no guarantee of a message being delivered. Reaching consensus in a synchronous environment is possible because we can make assumptions about the maximum time it takes for messages to get delivered. Thus, in this type of system, the different nodes can be allowed to take turns proposing new transactions, poll for a majority vote, and skip any node that does not offer a proposal within the maximum time limit. But in real-world systems, these synchrony assumptions are not practical outside of controlled environments.
In asynchronous systems, where we cannot assume a maximum message delivery time, achieving termination is much harder, if not impossible, since one of the conditions that must be met to achieve consensus is "termination": every non-faulty node must decide on some output value. In fact, consensus is not attainable in an asynchronous system even if only a single process fails by crashing (Fischer et al., 1985).
To overcome this impossibility, modern consensus algorithms have followed two paths:
- Using synchrony assumptions.
- Using non-determinism.
The FLP impossibility result essentially shows that if we cannot make progress in a system, then we cannot reach consensus. In other words, if messages are delivered asynchronously, termination cannot be guaranteed.
1. Consensus using synchrony assumptions
The first family of consensus algorithms emerged aiming to overcome the FLP impossibility result in asynchronous systems by using synchrony assumptions.
Asynchronous consensus protocols cannot guarantee both safety (agreement) and liveness (termination), so they all come with their own inherent trade-offs. Paxos ensures that a proposed value is eventually selected by the group of participants in a consensus round.
There are three roles in Paxos consensus, known as agents: proposers, acceptors, and learners.
The goal of consensus is for a group of participants to come to an agreement on a single value per each round. A round of consensus begins when a proposer sends a proposed value to a group of acceptors. Acceptors may accept the proposed value by a given proposer, and once a certain threshold is met, then that value is approved by the network.
Condition 1: Acceptors must accept the first proposed value that they receive.
Phase 1: Prepare request
- The proposer chooses a new proposal version number (n) and sends a “prepare request” to the acceptors.
(A problem could arise if several proposers issue proposals but none of them achieves a majority. This is avoided by uniquely indexing each proposed value that an acceptor receives, which allows acceptors to accept more than one proposal.)
- If acceptors receive a prepare request (“prepare,” n) with n greater than that of any prepare request they had already responded to, the acceptors send out (“ack,” n, n’, v’) or (“ack,” n, ^ , ^).
- Acceptors respond with a promise not to accept any more proposals numbered less than n.
(Restrictions for acceptors:
- Promise never to accept a proposal less than n (n is the new proposal number)
- Respond with the proposal with the highest number less than n that the acceptor has accepted.)
- Acceptors suggest the value (v) of the highest-number proposal that they have accepted, if any. Or else, they respond with ^.
(Multiple proposals can be chosen, but it is necessary to preserve the safety property by guaranteeing that these proposals all have the same value, as per Leslie Lamport's definition of the second condition of Paxos, which ensures safety.)
Condition 2: If a proposal with value v is chosen, then every higher-numbered proposal that is chosen has value v.
Phase 2: Accept request
- If the proposer receives responses from a majority of acceptors, then it can issue an accept request (“accept,” n, v) with number n and value v.
- n is the number that appeared in the prepare request.
- v is the value of the highest-numbered proposal among the responses.
- If the acceptor receives an accept request (“accept,” n, v), it accepts the proposal unless it has already responded to a prepare request with a number greater than n.
Phase 3: Learning phase
- Whenever an acceptor accepts a proposal, it responds to all learners (“accept,” n, v).
- Learners receive (“accept,” n, v) from a majority of acceptors, decide v, and send (“decide,” v) to all other learners.(Variations of this process can be used where either all acceptors inform corresponding learners of their decisions or acceptors respond to a distinct set of learners who then propagate the message to the rest of the learners.)
- Learners receive (“decide,” v) and act on the decided value v.
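The prepare/accept flow above can be condensed into a toy single-decree Paxos round. This sketch runs in one process and omits learners, failures, and retries; the class and function names are illustrative, not from Lamport's specification.

```python
class Acceptor:
    def __init__(self):
        self.promised_n = -1   # highest prepare request answered
        self.accepted_n = -1   # highest proposal number accepted
        self.accepted_v = None

    def prepare(self, n):
        # Promise not to accept proposals numbered less than n, and report
        # the highest-numbered proposal accepted so far (if any).
        if n > self.promised_n:
            self.promised_n = n
            return ("ack", n, self.accepted_n, self.accepted_v)
        return None

    def accept(self, n, v):
        # Accept unless a higher-numbered prepare has already been answered.
        if n >= self.promised_n:
            self.promised_n = self.accepted_n = n
            self.accepted_v = v
            return True
        return False

def propose(acceptors, n, v):
    # Phase 1: prepare. Collect acks from a majority.
    acks = [a.prepare(n) for a in acceptors]
    acks = [x for x in acks if x is not None]
    if len(acks) <= len(acceptors) // 2:
        return None
    # Condition 2: if any acceptor already accepted a value, propose the
    # value of the highest-numbered accepted proposal instead of our own.
    prior = max(acks, key=lambda x: x[2])
    if prior[3] is not None:
        v = prior[3]
    # Phase 2: accept request.
    ok = sum(a.accept(n, v) for a in acceptors)
    return v if ok > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, n=1, v="blue"))  # "blue" is chosen
print(propose(acceptors, n=2, v="red"))   # still "blue": a chosen value sticks
```

The second call illustrates the safety condition: once "blue" has been accepted by a majority, any later proposer is forced to re-propose "blue" rather than its own value.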
The design of Paxos guarantees that it can accept values when a majority of nodes agree despite other nodes ignoring or denying a proposed value. This differs from previous iterations of consensus that required all nodes to agree and were subject to blockage of the protocol from the failure of single nodes.
Paxos is considered one of the most difficult algorithms to understand because, in favor of offering flexibility in implementation, several specifications in key areas are left open-ended: things like leader election, failure detection, and log management are vaguely or completely undefined.
It is often said in consensus science that there is only one consensus algorithm and it is Paxos. This is, on the one hand, a statement of the significance of the Paxos algorithm for the field and, on the other hand, a reflection of the universal foundation of consensus protocols, which is in every case “Paxos-like” (Howard, Malkhi, Spiegelman, 2016).
Raft, on the other hand, was created to be a more understandable version of Paxos. It guarantees the same fault tolerance and performance metrics as Paxos.
Raft employs a leader and follower model based on the assumption that a cluster of nodes only has one elected leader. The leader manages the log replication across the participating nodes and is replaced once it fails or disconnects.
The leader is elected at the start of the algorithm through a process initiated by a candidate node. A leader-election process of some kind can be identified in many types of consensus algorithms. For example, in Nakamoto Proof of Work (described below), leader selection is achieved through the lottery-like mining process for each round, approximately every 10 minutes; in Practical Byzantine Fault Tolerance (PBFT) (described below), leader selection is performed in a round-robin style.
If candidates do not receive communication during a phase known as the election timeout(1), they increase their term counter, vote for themselves, and broadcast this to the other nodes. Candidates become followers of other candidates that have a term number(2) at least as large as theirs, and this ripple effect continues among the nodes until one candidate receives a majority of followers.
The leader controls log replication among the nodes where it sends the client request commands to its followers. If a majority of followers confirm replication, then the request is committed. Followers also apply the commits to their local state machines.
Raft retains fault-tolerance from nodes subject to failure or a leader failure by having a new leader force its followers to duplicate its own logs. Any entries that do not agree with each other are deleted, maintaining consistency of log replication.
Leader candidates are required to have a more up-to-date log than follower logs. If a candidate’s log is less up-to-date than a potential follower (a voter in this context), then the candidate is rejected.
Overall, Raft deconstructs consensus into 3 individual sub-problems:
- Leader Election
- Log Replication
- Safety
The consensus protocol uses a strong leader, meaning that the leader node in Raft exerts substantial influence on the process while remaining restricted by the confines of the protocol. As a result, Raft is more straightforward in design than Paxos.
One important new thing in Raft is the concept of using a shared timeout to deal with termination: in Raft, a node that crashes and restarts waits at least one timeout period before trying to get itself declared leader, and it is guaranteed to make progress.
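The election mechanics above can be sketched as a simplified single round: each node gets a randomized election timeout, the node whose timeout fires first becomes a candidate, increments its term, and wins if a majority (including its own vote) grants it. Networking, heartbeats, and the log up-to-dateness check are omitted; all names here are illustrative.

```python
import random

class Node:
    def __init__(self, name):
        self.name = name
        self.term = 0
        self.voted_for = None
        # Randomized timeout makes simultaneous candidacies unlikely.
        self.timeout = random.uniform(150, 300)  # milliseconds

    def request_vote(self, candidate, term):
        # Grant at most one vote per term, to the first valid candidate seen.
        if term > self.term:
            self.term = term
            self.voted_for = candidate
            return True
        return False

def elect(nodes):
    # The node whose election timeout fires first starts the election.
    candidate = min(nodes, key=lambda node: node.timeout)
    candidate.term += 1
    candidate.voted_for = candidate.name
    votes = 1 + sum(node.request_vote(candidate.name, candidate.term)
                    for node in nodes if node is not candidate)
    return candidate if votes > len(nodes) // 2 else None

nodes = [Node(i) for i in range(5)]
leader = elect(nodes)
print("leader:", leader.name, "term:", leader.term)
```

The randomized timeouts are the key design choice: they make it improbable that two candidates start elections in the same instant and split the vote indefinitely.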
Paxos and Raft are important consensus protocols that are core components of the larger distributed fault-tolerance ecosystem. While not directly employed in cryptocurrencies, the consensus protocols used in cryptocurrency networks derive many of their characteristic assumptions from the design of both Paxos and Raft.
However, the Raft and Paxos algorithms described above can only tolerate crashes; they are merely crash fault tolerant. These algorithms assume that processes cannot act maliciously and lie, so they are suitable when a simple majority is enough to reach consensus. But in distributed systems like public blockchains, a simple majority is not enough, because half or more of the supposedly honest nodes can coordinate with each other to lie.
Byzantine fault tolerant algorithms
Practical Byzantine Fault Tolerance (PBFT)
As mentioned above, classical consensus algorithms are unable to overcome Byzantine failures, so the algorithms that emerged to solve the Byzantine agreement problem became known as Byzantine fault tolerant algorithms. In their paper, Leslie Lamport, Robert Shostak, and Marshall Pease showed that a system with x Byzantine nodes must have at least 3x + 1 total nodes in order to reach consensus (Lamport et al., 1982). But that solution was only designed to work in synchronous environments, which revealed a trade-off: an environment that is both Byzantine and asynchronous is much harder to handle. To address this, an algorithm called Practical Byzantine Fault Tolerance (PBFT) was introduced.
The first BFT algorithms relied on synchrony assumptions for safety. If nodes are not deciding on some output value, the system simply halts; so if synchrony assumptions (i.e., timeouts) are made to guarantee termination and one of them fails, it makes sense that this would also bring the system to a stop. But an algorithm designed around timeouts (to guarantee correctness) carries the risk of producing two valid chains, which may lead to the double-spending problem.
Byzantine fault tolerant algorithms that can be implemented in a practical environment using synchrony assumptions carry the risk of producing two valid transaction logs (i.e., forks) if the synchrony assumption fails. Designing a distributed system is always about trade-offs: there is no point in having a useful service (i.e., liveness) if the service is corrupt (i.e., no safety). Basically, having two different blockchains is worse than having the entire blockchain come to a halt.
PBFT makes a weak synchrony assumption only for achieving liveness. In their paper, Castro and Liskov argued that previous BFT algorithms, while shown to be "theoretically possible," were either too slow to be used or assumed synchrony for safety (Castro & Liskov, 1999).
The algorithm provided safety regardless of how many nodes were faulty; in other words, it did not assume synchrony for safety. It did, however, rely on synchrony for liveness.
The algorithm moved through a succession of “views,” where each view had one “primary” node (i.e., a leader) and the rest were “backups.”
- A new transaction happened on a client and was broadcast to the primary.
- The primary multicasted it to all the backups.
- The backups executed the transaction and sent a reply to the client.
- The client waited for x + 1 replies from backups with the same result. This was the final result, and the state transition happened.
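The client side of the flow above can be sketched as a reply counter: with at most x Byzantine nodes, x + 1 identical replies guarantee that at least one honest node vouches for the result. The function name and reply values below are illustrative.

```python
from collections import Counter

def client_decide(replies, x):
    """Accept a result once x + 1 identical replies have been seen
    (at most x of the replies can come from Byzantine backups)."""
    counts = Counter(replies)
    result, n = counts.most_common(1)[0]
    return result if n >= x + 1 else None

# 4 replicas tolerate x = 1 fault (n >= 3x + 1); one replica lies here.
replies = ["commit", "commit", "commit", "abort"]
print(client_decide(replies, x=1))       # "commit": 3 matching >= x + 1
print(client_decide(["a", "b"], x=1))    # None: no x + 1 matching replies
```

Note that x + 1 matching replies is enough for the client even though the replicas themselves need the stronger 3x + 1 bound to agree in the first place.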
If the leader was non-faulty, the protocol worked just fine. However, in order to reach consensus, PBFT required a quadratic number of message exchanges, meaning every computer had to communicate with every other computer in the network. This is not practical for real-world cases involving a large number of peers (e.g., public blockchains). PBFT has since evolved into many protocol variants in modern blockchain platforms: Tendermint, Q/U, HQ, ABsTRACT, Zyzzyva, etc. address issues of performance and cost; implementations of PBFT in a segmented (sharding) architecture keep the PBFT consensus groups small within individual segments, maintaining high throughput while limiting the size of the consensus group; and Aardvark and RBFT have addressed reliability problems.
Even with synchrony assumptions, classical consensus protocols have drawbacks, e.g., the quadratic communication cost [O(n²)] among the participating nodes. This is significant when developing public, permissionless platforms with a large number of nodes. So a substitute was needed that could overcome the FLP impossibility, achieve safety and liveness in asynchronous environments, and address the newer problem of scalability. A new family of consensus protocols was introduced: Nakamoto consensus.
2. Non-deterministic consensus algorithms
Nakamoto consensus algorithms
Nakamoto consensus, as the name suggests, was first introduced with the advent of blockchains. Rather than electing a leader for consensus and communicating with all nodes, Nakamoto consensus employs a method of selecting the node that can solve a computational puzzle the fastest. The main shared feature of these algorithms is the presence of some kind of "difficulty" (work, a resource, an asset, etc.) that a network member must "invest" in order to be able to create a new block, while other members of the network can easily verify that the work was actually done. The network continues to build on this time-stamped chain, and the canonical chain is the one with the most cumulative computational effort expended (i.e., cumulative difficulty). Nakamoto consensus is probabilistic, meaning that the algorithm validates a new entry by utilizing the whole history of previous entries, and the probability that a transaction will not be reverted grows with the number of blocks appended after it. However, performance, scalability, throughput, and the wastefulness of energy are major concerns in Nakamoto consensus. Also, in contrast to other BFT algorithms, Nakamoto consensus has some special features designed to overcome particular issues, such as:
- Block rewards: nodes must use a huge amount of energy in order to compete to become the leader. To economically motivate nodes to repeatedly perform computationally expensive puzzles, a block reward is introduced, giving nodes a chance of randomly winning a large reward for appending a correct block.
- Sybil resistance: resisting attacks in which one person tries to take over the network by creating multiple accounts, nodes, or computers. Such attacks may lead to out-voting honest nodes, or even a 51% attack(1), which may lead to double spending.
- Peer-to-peer gossip: the gossip protocol is a peer-to-peer communication protocol based on the way epidemics spread. Assume a node is only connected to a subset of other nodes; messages are then gossiped between nodes using the peer-to-peer protocol until they reach the whole network.
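The "hard to produce, easy to verify" puzzle at the heart of Nakamoto consensus can be sketched as a hash search: find a nonce so that the block's hash falls below a difficulty target. Producing the nonce costs many hash attempts, but any node can verify it with a single hash. The block contents and difficulty below are illustrative.

```python
import hashlib

DIFFICULTY = 4  # require 4 leading zero hex digits (illustrative target)

def block_hash(header, nonce):
    return hashlib.sha256(f"{header}:{nonce}".encode()).hexdigest()

def mine(header):
    """The 'work': brute-force search for a nonce meeting the target."""
    nonce = 0
    while not block_hash(header, nonce).startswith("0" * DIFFICULTY):
        nonce += 1
    return nonce

def verify(header, nonce):
    """Verification is cheap: one hash, regardless of mining effort."""
    return block_hash(header, nonce).startswith("0" * DIFFICULTY)

header = "prev_hash|merkle_root|timestamp"
nonce = mine(header)
print(nonce, verify(header, nonce))  # verification succeeds
```

Raising DIFFICULTY by one hex digit multiplies the expected mining work by 16 while leaving verification cost unchanged; this asymmetry is what makes the work both a Sybil-resistance mechanism and a cheap-to-check leader lottery.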
Nakamoto consensus is essentially Byzantine fault tolerant, but it does not technically guarantee safety in asynchronous environments. The consensus algorithms discussed before it deterministically finalize a value at every step, whereas in Nakamoto consensus it is possible for a network partition to let an attacker with sufficiently large hash power on one side of the partition create an alternative chain faster than the honest chain on the other side, which could lead to the double-spending problem. The assumption, however, is that nodes will not rush to create an alternative chain, because it is too difficult to coordinate enough hashing power to adopt a new chain.
There are many variants of Nakamoto consensus, which were introduced after the advent of Proof of Work (PoW) by Bitcoin.
Proof of Work (PoW)
The energy footprint is an issue in PoW: it is estimated that Bitcoin mining consumes roughly 24 terawatt-hours of electricity per year, approximately the same amount of electricity used by the entire nation of Ireland in a year (O'Dwyer & Malone, 2014; de Vries, 2018).
Because of this immense use of energy, single miners could not bear the cost of PoW, so several miners banded together into grouped hashing, forming what are called mining pools. As a result, the hashing power of the network became centralized in a few mining pools, which again allows a 51% attack and, eventually, double spending. There is also the argument that the use of pure Proof of Work can lead to a decrease in network security as the mining reward decreases over time (as is the case with Bitcoin): even with transaction fees in place, many miners will find it unprofitable to support the network and will remove their processing power, which will reduce the difficulty of performing PoW tasks and therefore the cost of potential attacks.
This has led to a search for alternative consensus mechanisms that overcome these issues; still, PoW remains the most widely implemented consensus algorithm for modern blockchain systems.