Practical Byzantine Dynamic Slashing (PBDS)

Mitja · August 21, 2020, 11:38am

by Mitja Goroshevsky and Leonid Kholodov

with an invaluable contribution from Dmitry Shtukenberg and Sergei Zaytzev

Abstract

There is a fishermen-based slashing implemented in TON currently. Why do we think it is not enough? What security assumptions do we think slashing should follow? What mechanisms we should implement and what would be the benefits of such mechanisms?

Firstly, fisherman mechanism will remain there. This mechanism can provide a layer of protection against something like sudden wrong block approval, transaction miscalculation or improper spending. Yet these events are extremely rare or even may be altogether impossible without breaking broader network protocol functions. Bugs can and will happen, for sure, wrong messages may appear, wrong blocks might be approved — therefore fisherman protection mechanism is important to have, but collator missing slots, validator not functioning due to connectivity or memory issue is way more common and therefore represents way greater practical risk to the functionality of the network, its security and other properties.

We have witnessed several catastrophic events related to nodes running incompatible software, malconfiguration, data-center failures, etc. When 1/3 of nodes are not participating in a consensus as a result of such, often unintentional, attack — the network stalls.

Secondly, lets agree for the purpose of this proposal that a >2/3 attack is outside the scope. Some would argue that a 2/3 attack is not an attack but a simple super majority protocol change. Even in the event of a 2/3 approved double spending we should assume it is supported by the community of validators. In the end, if that decision was a result of a sudden event such as a hack, the community can easily restart the network from the desired block after the hack is discovered and fixed.

1/3 attack in an entirely different matter because it is conducted by a minority. In most Proof-of-Stake networks the intentional >1/3 attack will result in practical slashing of the attacking party, because the stopped consensus will inevitably destroy the network value. Because a 33% attack can not promote double spent transactions and wrong blocks, it will not be able to fully economically benefit from such an attack, except in the case of some sort of off-chain ransom extortion from other network participants. In order to mitigate an intentional 1/3 attack Free TON blockchain takes other measures, such as DePool staking. DePools are outside the scope of this proposal and are relevant here merely since whatever protection they are offering to the staking economy, they are not protecting against the Validator Node assigned to a DePool going rogue or simply failing.

Such individual Validator Node failures will result in gradual decrease of network security over the whole period where such nodes are registered as part of the validator set until the next elections. In practice, at any given moment in any Proof-of-Stake blockchain there are many Validators which are not performing partially or at all, meaning we are no longer talking about a fault rate of up to 1/3, but a much smaller number. We believe this security assumption is intolerable.

One of the ways to mitigate this risk is to run a very large number of validator nodes, like thousands or tens of thousands of them. There are POS protocols which take this path. Unfortunately it comes with a drastic cost to their performance (which should be evident even if you don’t like PACELC theorem, for example, for whatever reason).

We take a different approach. Instead of taking a large set of validators and trying to improve their performance (by, for example, trying to rotate them fast) we propose to have a limited set of validators as per current Free TON configuration (maximum 1000) and try to dynamically slash the ones that are not performing.

This proposal describes an agreement of 2/3 of all validators on periodic slashing of non-performing Validator Nodes up to the point where they are dynamically excluded from the current validator set.

Description

The idea of PBDS is to apply fines in each case of evidence for non-performance, forks and incorrect validations during some predefined number of blocks for the blamed validators (let’s call it a slashing period), up to the point in which their stake is decreased below the last election participation threshold, at which point, in turn, the Elector smart contract will issue a message with new network configuration parameters which will exclude the blamed validator from the current validator set.

During the slashing period each validator (reporter validator) computes performance metrics about other validators for all masterchain and shardchain validation sessions. Then the reporter validator computes slashing score based on gathered metrics for each validator and compares slashing scores gathered for different validators using mathematical expectation and variance for the slashing scores from all validators. Based on these metrics the reporter validator computes slashing threshold after which validators will be blamed. If the particular validator’s slashing score is greater than the slashing threshold, the reporter validator generates new messages with this score and sends it to a special Smart Contract which is responsible for slashing conditions processing for each validator (Validator Smart Contract — VSC). VSC is deployed for each validator after each election for processing of related blames. VSC processes messages from different validators and if 2/3 of validators (in the meaning of their weights) blame the particular validator, the VSC computes median for received slashing scores from all reporter validators (median computation is needed to cut off the noise and use the average value of scoring). This median is sent to the Elector Smart Contract. The Elector Smart Contract computes a fine based on it. This fine is subtracted from the Validator’s stake and, if the stake after the fine becomes less than the minimum stake required to win the last elections, the new key block with updated network configuration parameters is produced, in which the validator is removed from the validator set. Once voted by 2/3 of the current validator set (including the excluded validators) the new validator set comes into effect.

Metrics

Short list of metrics which may be used for validators performance benchmarking:

number of proposed blocks of a particular validator which have been signed compared with number of validation rounds where validator might have the ability to propose a block [0;1]
number of rounds where a particular validator approves blocks compared with number of rounds where validator might have the ability to approve blocks [0;1]
number of rounds where a particular validator signs blocks compared with number of rounds where validator might have the ability to sign blocks [0;1]
number of attempts where a particular validator proposes block for voting with number of attempts where validator might have the ability to propose block for voting [0;1]
number of attempts where a particular validator votes for blocks with number of attempts where validator might have the ability to vote [0;1]
one minus average number of warnings per validator session round compared with predefined max average number of allowed warnings clamped by [0;1]
one minus catchain blames count compared to number of catchain sessions [0;1]

How metrics are gathered

The metrics are gathered during the work of consensus layer in validator session and catchain components at the end of each round (just before block is committed or skipped):

for each round rounds counter is increased
collation rounds counter is increased for all validators which have priority more than 0 for this round, so they could propose blocks
vote attempts counter is increased by total number of votes for this round
collations counter is increased for a validator which proposed the signed in this round block
approvals counter is increased for all validators which approved block in this round
signs counter is increased for all validators which signed block in this round
votes counter is increased if node votes for block in an attempt
vote-for counter is increased if node proposed block for voting in an attempt
vote-for attempts counter is increased if node has possibility in attempt to propose block for voting
catchain sessions count is increased for each new created catchain session where validator is participated
catchain blames count is increased each time when blame for validator is appeared (but only once for each catchain session)

Slashing computation

The general idea is to measure M normalized metrics for N validators in matrix Score[N][M]. Each Score[i][j] is in range [0;1], where 0 means absence of performance, 1 is max possible performance. The SlashingScore[i] = 1-SUM(Score[i][j]*Weight[j] ***for*** j ***in*** [0;M)) whereSUM(Weight[j] ***for*** j ***in*** [0;M))=1 and Weight[j] ≥ 0.

Having the SlashingScore[N] matrix makes it possible to find first two moments for it, especially math expectation and standard deviation (m and sigma).

Now it is possible to find out slashing threshold:

SlashingThreshold=min(1,m+sigma*RelativeSlashingThreshold), where RelativeSlashingThreshold is a configurable parameter which allows to clamp validators with bad slashing score metrics based on computed moments (proposed value for RelativeSlashingThreshold is 3, based on Three-sigma rule which allows to slash only 0.3% of all validators assuming that 99.7% of them have enough performance)

If slashing score of a particular validator is greater than SlashingThreshold, the reporter sends blame-message to corresponding validator smart-contract with the normalized slashing score:

NormalizedSlashingScore=(SlashingScore-SlashingThreshold)/(1-SlashingThreshold)

The validator’s smart-contract has dictionary which maps reporter validator public key hash to normalized slashing score. The steps of the smart-contract are following:

check that the reporter validator is allowed to blame someone (only validators can send such messages)
check if the received normalized slashing score is less than already saved slashing score from the specified reporter validator:
- yes → do nothing,
- no → update the slashing score
check if 2/3 of validators (in terms of weights) are blaming a particular validator; if yes, find out median of normalized slashing scores and send the blame message to elector contract with the number of block when slashing condition appeared (2/3 of blames appeared) median slashing score (for performance reasons it may be more convenient to add slashing score to the always sorted list/array for each blame rather than sort slashing scores before sending to elector); after messages sending all internal structures should be reset.

The math above is approximate and may be improved during the concrete implementation of slashing conditions in a validator.

Dynamic Fines

The elector smart-contract receives messages from slashing smart-contracts and does the following steps:

computes max fine bases on the following parameters which tries to predict number of blocks between elections based on current blockchain performance:

blamed validator’s stake
unix time interval between elections
unix time interval between first block for the current validator set and current time
number of blocks produced between first block for the current validator set and current time
number of blocks in a slashing period
computes fine based on the normalized slashing score and max fine (fine=max_fine*slashing_score)
subtracts fine from validator’s stake, and adds it to rewards of other validators
if new validator’s stake is less than min stake of validators in last elections, elector removes the blamed validator from current validators’ set
elector produces new key block with updated validators set and their weights

Indulgences

Despite the fact that proposed fines seem adequate to the performance degradation regarding current validation period, it may seems a little harsh in respect to the total stake of such a validator in particular if it represents other participants stakes in a DePool, for instance.

Therefore the following could be proposed:

to freeze the amount of slashed stake in the elector contract
to count that amount as previously described towards the total validator stake
to exclude the validator from the current validator set if the stake minus freeze is less than minimum stake required
to return the frozen amount minus the DePool Minimum Validator Stake (currently set to 20%)
to distribute the leftover to other validators

Note:

This Proposal is submitted to the Free TON Slashing Condition Specification Contest