DevOps Failover Solution - Validator Redundancy and Network Availability

Contest: Validator Failover Solution - Validator Redundancy and Network Availability

[Dates TBD, depending on the Rust validator release date]
According to Meetup #23, the Rust validator may be ready soon. I think it would be a good idea to wait for the Rust validator before starting this contest.

Contest dates

  • Warm-up / commenting / contest improvement period: TBD
  • Submission period: TBD

Short description

Develop an architecture, scripts, and instructions for deploying a secure validator failover system.

Motivation

A secure failover solution for validators will provide two important functions:

  1. Prevent slashing: if the main server fails, a backup server continues validating.

  2. Improve network availability: significantly reduce the probability of more than 1/3 of mainnet validators going offline.

Professional validators need redundancy to avoid slashing in the event of a catastrophic server failure, such as a hardware fault or a local internet outage. In addition, if a large share of validators use a redundant failover solution, the network's availability improves. For example, if more than 1/3 of servers are in the same geographic area and a regional internet outage occurs, backup servers in other locations will keep the network from stalling.

General requirements

  • After each election, the main server securely sends the backup server all keys, files, and other information the backup server needs to continue validating if the main server fails or goes offline.
  • Remote monitoring and triggering solution:
    • Monitor the status of the main and backup servers
    • Detect when the main server fails to validate, and trigger the backup server to start validating
    • Alerting system for the status of the main and backup servers and for failure events
  • A method for preventing conflicts between the main server and the backup server (for example, both validating at the same time if the main server comes back online after a failure)
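To make the monitoring, triggering, and conflict-prevention requirements more concrete, here is a minimal Python sketch of the decision logic a remote monitor might run. It assumes a hypothetical health check that reports, once per polling interval, whether the currently active server is still validating; how that check is implemented is left to submissions. Conflict prevention here uses a generic fencing-token (epoch) technique, which is an assumption for illustration and not something this contest prescribes: every switch increments the epoch, and a server may sign only if it is the active role and holds the current epoch, so a main server that returns with a stale epoch stays passive. All names (`FailoverMonitor`, `Role`, `may_sign`) are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Role(Enum):
    MAIN = "main"
    BACKUP = "backup"


@dataclass
class FailoverMonitor:
    """Hypothetical remote watchdog for one main/backup validator pair.

    The standby server is promoted after `threshold` consecutive failed
    health checks of the currently active server.
    """
    threshold: int = 3
    active: Role = Role.MAIN
    epoch: int = 0  # fencing token: incremented on every switch
    failures: int = 0
    alerts: List[str] = field(default_factory=list)

    def check(self, active_is_healthy: bool) -> Role:
        """Record one health-check result for the active server; return the active role."""
        if active_is_healthy:
            self.failures = 0
            return self.active
        self.failures += 1
        self.alerts.append(
            f"epoch {self.epoch}: {self.active.value} check failed "
            f"({self.failures}/{self.threshold})"
        )
        if self.failures >= self.threshold:
            # Switch to the other server and invalidate the old epoch.
            self.active = Role.BACKUP if self.active is Role.MAIN else Role.MAIN
            self.epoch += 1
            self.failures = 0
            self.alerts.append(f"failover: {self.active.value} promoted, epoch {self.epoch}")
        return self.active

    def may_sign(self, role: Role, epoch: int) -> bool:
        """Conflict prevention: only the active server holding the current epoch may sign."""
        return role is self.active and epoch == self.epoch


if __name__ == "__main__":
    monitor = FailoverMonitor(threshold=3)
    for healthy in [True, False, False, False]:
        monitor.check(healthy)
    print(monitor.active, monitor.epoch)   # Role.BACKUP 1
    print(monitor.may_sign(Role.MAIN, 0))  # False: main returned with a stale epoch
```

In a real deployment the monitor would presumably run on a third host, forward `alerts` to the alerting system, and both validator nodes would need to receive and check the current epoch before signing; those parts are outside this sketch. Switching back to the main server when the backup fails is also a simplification.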

Terms

  • The system must be secure

Evaluation criteria and winning conditions

Hard criteria

  • The final document should be presented as a white paper, including an abstract that gives a preliminary overview of the system
  • Link to the documentation on GitHub/GitLab or another open repository, with the obligatory backlink to your submission in the repository’s README
  • Detailed, structured documentation for each part of the system
  • Easy-to-follow instructions for setting up and operating the failover system
  • The system must be tested and proven to operate reliably

Soft criteria

  • Simple, elegant solution
  • Maximize network robustness and availability
  • Minimize the resources and cost required for maintaining the backup server(s) and monitoring. The backup server only needs to operate until the next election/validation round (minimum requirement) or until the main server comes back online. Assume validation rounds can last from one day to one month.
  • Bonus: Obscure/hide the location of the backup server for additional network security. For example, external network analysis (or a synced node) cannot determine the location or IP address of a main server’s backup server. This could use a mix network (for sending election keys/files) shared with other validators using the same failover solution, onion routing, or a different solution.
  • Bonus: Provide an optional shared solution for validators who do not want to maintain a full personal backup server. For example, several validators could share one dedicated server with multiple containers and IPs to back up multiple main servers (this must be in addition to the personal backup solution).

Artifacts

  • Google Doc with the white paper, open for commenting and containing the backlink to the submission.
  • Block diagrams, schemes, etc. are preferred.

Rewards

1st place: 100,000 TONs
2nd place: 75,000 TONs
3rd place: 50,000 TONs
4th place: 25,000 TONs

Bonuses:

  • +50% of the main reward amount for each bonus listed above that is achieved
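Read literally, the bonus rule scales a place's reward by the number of bonuses achieved. As a worked example, assuming "main reward" means the base reward $B_p$ for the participant's place and $k$ is the number of bonuses achieved (0, 1, or 2, since two bonuses are listed above):

```latex
\begin{aligned}
R_p &= B_p\left(1 + \tfrac{1}{2}k\right), \qquad k \in \{0, 1, 2\} \\
R_1 &= 100{,}000 \times (1 + 1) = 200{,}000 \text{ TONs} && \text{(1st place, both bonuses)} \\
R_4 &= 25{,}000 \times \left(1 + \tfrac{1}{2}\right) = 37{,}500 \text{ TONs} && \text{(4th place, one bonus)}
\end{aligned}
```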

If no participant demonstrates a reliable working system, an additional stage of the contest may be announced later.

Voting

  • Jury members who vote in this contest must have a solid understanding of the technology. Jurors who do not should not vote, or should choose “Abstain.”
  • Jurors whose team(s) intend to participate in this contest by providing submissions lose their right to vote in this contest.
  • Each juror will rate each submission on a scale of 1 to 10, reject it if it does not meet the requirements, or abstain from voting if they feel unqualified to judge.
  • Jurors will provide feedback on submissions.
  • The Jury will reject duplicate, sub-par, incomplete, or inappropriate submissions.

Jury rewards

An amount equal to 5% of the prize fund will be divided equitably among all jurors who vote and provide feedback, based on the quantity and quality of their votes. Both voting and feedback are mandatory to collect this reward.
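Assuming "prize fund" means the sum of the four base rewards (the text does not say whether bonus payouts are included), the jury pool works out as shown below. How the pool is split among individual jurors depends on the quantity and quality of their votes, which the rules do not quantify.

```latex
0.05 \times (100{,}000 + 75{,}000 + 50{,}000 + 25{,}000) = 0.05 \times 250{,}000 = 12{,}500 \text{ TONs}
```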

Procedural reminders to all contestants

  • Accessibility. All submissions must be accessible for the Jury to open and view, so please double-check your submission. If the submission is inaccessible or does not fit the criteria described, jurors may reject it.
  • Timing. Contestants must submit their work before the submission period closes. Late submissions will not count.
  • Contact information. All submissions must contain the contestant’s contact information, preferably a Telegram username, by which jurors can verify that the submission belongs to the individual who submitted it. If not, jurors may reject your submission.
  • Content. The content published in the forum and the provided PDF file should not differ, except for formatting. Otherwise, jurors may reject the submission.
  • Well-formed links. If your submission has links to the work performed, the content of those links must include the contestant’s contact details, preferably a Telegram username, so jurors can match them and verify whom the work belongs to. If not, jurors may reject your submission.
  • Multiple submissions.
    • Each contestant may provide several submissions if they contain different approaches to solving the contest problem. However, if the submissions are not sufficiently distinct or differ only in insignificant details, jurors may reject the repeated submissions.
    • If a contestant wants to make an additional submission that overrides a previously published one, they must inform the Jury and indicate the correct revision to assess. In this case, only the indicated work will count. If the contestant has not indicated the updated submission as the correct one, only the first one will count, and the Jury will reject all the others.

Disclaimer

Anyone can participate, but Free TON cannot distribute Tons to US citizens or US entities.

Feedback on This Contest is Strongly Encouraged

Please consider this proposal a draft. Feedback to improve this contest’s specifications and design is strongly encouraged.


Please add


It’s very important for all validators to ensure high uptime and keep network performance at a high level. Great proposal.


A question right away: I think many of us will prepare a solution, and some of them will even be similar. How will this contest be evaluated if there are 10-15 applicants for four prizes of very different sizes?


Good competition. There will be more and more validators, but there are no good instructions.


I think 100,000 TONs for creating instructions, rather than scripts for automatically deploying failover FreeTON nodes, is a bit too much.
And I agree with Aleksandr that the criteria are too vague.
