Validator Contest: Devops tools [Finished]

rainblowing · May 24, 2020, 3:47pm

Here are some more validation automation scripts - https://github.com/rainblowing/ton-auto-validation

And some ideas regarding multisig wallet decentralization - https://github.com/rainblowing/ton-auto-validation/wiki/Validation-Decentralization-Ideas

Mitja · May 25, 2020, 12:08pm

vgk88 · May 25, 2020, 4:43pm

I think you need to add the staked amount and stake weight.

dnugget · May 26, 2020, 7:49pm

Dear All,

I’ve published some scripts too:

a script to install a systemd service for an existing node, plus a couple of improvements to the existing scripts so that they transparently support running in service mode;
a script to update the node with minimal downtime. It automatically pulls fresh git updates, builds them, stops the node, archives node.log to a zip file while leaving the tail lines in place so the transition can be examined manually, and then starts the new version. It also supports automatic execution from crontab, say once a day;
a couple of utility scripts for external monitoring tools:
– one to get the average duration metric from node.log (the entries that carry the famous “SLOW” tag). It is a very basic way of measuring TON node performance;
– and a second one to get the current wallet balance, since monitoring tools need the clean amount without anything else.

Available at: net.ton.dev/scripts on the toolscripts branch of samorodkin/net.ton.dev on GitHub,
under the Apache 2.0 license (cheers go to M)
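The log handling in the update script described above (archive node.log, keep only the tail) can be sketched roughly as follows. This is a Python illustration, not the published shell script: the function name, the archive path and the default of 100 kept lines are assumptions, and it presumes the node is already stopped so nothing is writing to the log.

```python
import zipfile
from collections import deque
from pathlib import Path


def archive_log_keep_tail(log_path, archive_path, keep_lines=100):
    """Zip the full log, then truncate it to its last `keep_lines` lines.

    Hypothetical helper: assumes the node is stopped while this runs.
    Returns the number of lines left in the log.
    """
    log_path = Path(log_path)

    # 1. Store the complete log in a compressed archive.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(log_path, arcname=log_path.name)

    # 2. Read only the tail; deque(maxlen=...) keeps memory bounded.
    with log_path.open("rb") as fh:
        tail = list(deque(fh, maxlen=keep_lines))

    # 3. Overwrite the log with just the tail lines.
    log_path.write_bytes(b"".join(tail))
    return len(tail)
```

Archiving before truncating means an interrupted run leaves either the full log or the full archive behind, never neither.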

Also, I have made a very pleasant dashboard on the TIG stack (Telegraf + InfluxDB + Grafana); please have a look:

The idea was to separate three layers: server, network and business (the last one will come later), and to keep the dashboard clean rather than overload it with excessive indicators.

Network interface utilization %;
CPU utilization %, iowait, Load average;
Memory and disk utilization %;
FreeTON network sync status (TIME_DIFF);
Node.log duration (famous SLOW tags in log);
Wallet(s) balance history.

The filters support several hosts, disks, wallets and network interfaces. The dashboard indicators also contain threshold values, so you can easily tune Grafana alerts.

To set up the dashboard, besides the standard TIG stack you need to update your telegraf.conf according to the dashboard variables: https://samorodkin.grafana.net/d/IeFxBvzMk/freeton-example-dashboard

sergemedvedev · May 27, 2020, 12:50am

Updated! Thanks a lot for the idea!

Also I’ve added a “SLOW-meter” which shows SLOW-to-all logs ratio.
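A SLOW-to-all ratio of this kind could be computed roughly as below. This is only a sketch and not the code behind the dashboard: it assumes that every non-empty line of node.log is one log record and that slow records contain the literal substring "SLOW"; the function name is made up for illustration.

```python
def slow_ratio(lines, tag="SLOW"):
    """Return the share of log records that contain `tag` (0.0 for no records).

    Assumption: one record per non-empty line, slow records contain `tag`.
    """
    total = 0
    slow = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, they are not log records
        total += 1
        if tag in line:
            slow += 1
    return slow / total if total else 0.0
```

It accepts any iterable of strings, so an open file object can be passed directly, e.g. `slow_ratio(open("node.log", errors="replace"))`.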

Stanislav · May 27, 2020, 12:43pm

This post has so many screenshots that if I included all of them for every feature, the page would take very long to load.

Hello!
Zabbix, Grafana, ELK… We need to know how to install all this software and how to configure it.
But what if we are not near a computer? What if we are on a trip, for example relaxing on a beach, or somewhere else where we have no access to our computers?
We wouldn’t know that our validator node is down, or that our CPU/RAM is overloaded. Sounds sad =(
Or we get an SMS from the monitoring tool saying that something is wrong, and again we need a computer.
So? Validators need a fast and satisfactory solution. I created “TON Telegram Bot” with alerts, statistics, and many useful tools for validators.
Are you ready?
And yes

Goedenmiddag, God eftermiddag, Guten Tag, Buenas tardes, Bonne après-midi, नमस्कार, Buon pomeriggio, Boa tarde, Hyvää iltapäivää, God eftermiddag, Tünaydın, Καλό απόγευμα, Добрый день, Доброго дня, こんにちは

My telegram bot supports all languages above!

Let’s go!

What this bot can do for now (this is only the start)

Monitoring

Validator node
CPU load
RAM load
Network
Time diff
Wallet balance
Stake monitoring
Error log monitoring
Slow log monitoring

Historical data

1. CPU Utilization (Dynamic)

2. RAM Load (Dynamic)

3. Time Diff (Dynamic)

4. Slow log events

5. Disk I/O (Dynamic)

6. Network performance (Dynamic)

7. Ping test (Dynamic)

Alert

1. Validator node down

2. High CPU Utilization

3. High RAM load
/No screenshot, but it will look like the other alerts/

4. Network degradation

5. Stake < Wallet balance

Features

Validator

Restart validator node
Check current stake
Update stake
Check wallet balance
Check current time diff + Historical data
Know your ADNL key
Get your error log
Get your slow log + Historical data
Validators count (New)
Election status & validators count

Server

Check CPU load + Historical data
Check RAM load + Historical data
Check disk usage
Check disk i/o + Historical data
Check validator ports
Check server ping + Historical data
Analyze server traceroute
Get top processes
Check uptime
Check network load + Historical data
Check server network speed to different countries (Some countries may not work because speedtest servers may have problems. On Hetzner, many countries didn’t work. In the future, I will add much more servers for tests)

Some screenshots
Start screen

Español example

Alerts (Node not running, diff time, high ping, high CPU load, High RAM load, Validator node is down!, Stake lower than your wallet balance)

Validator tools (You can just restart your node in a second)

Linux tools

Check server network speed test

On mobile everything looks awesome


Future: history graphs (time diff, CPU, RAM, network, etc.) and many other interesting things.
Looks good?
And installation takes just a minute!
Download GitHub - anvme/TONTgBot: Like a Swiss Army knife, this Telegram bot will help you in any situation with your validator server. Well, almost any, but I'm working.

bakarapara · May 27, 2020, 2:12pm

very nice! thank you!

4hash · May 28, 2020, 10:23am

Nice and beautiful, I’m working on a similar one, but with a distinct set of features.
There won’t be any possibility to change state of the node (i.e. change stake or restart the node), though, for security’s sake.

W1ldberry · May 28, 2020, 12:20pm

Hello!

This is a script for TON validator nodes that automatically registers in elections and automatically confirms, on behalf of the custodians, transactions to the elector smart contract.

Features:

More reliable than validator_msig.sh
Checks wallet balance before transactions
Confirms registration with “participant_list” method
Telegram and email notifications
Fully supports multisig wallets with reqConfirm > 1
Auto confirmation of multisig transactions
Requests blockchain global configuration parameters (minimal hardcode)
Uses tonos-cli

Criticism and suggestions are welcome!

exs4all · May 28, 2020, 11:29pm

Thank you! Handy telegram bot!!!
It works perfectly on Ubuntu 18.04. My greetings. I hope you get first place!

Stanislav · May 29, 2020, 1:36am

Upgrades in my telegram bot.

Added Historical data (New)

1. CPU Utilization (Dynamic)

2. RAM Load (Dynamic)

3. Time Diff (Dynamic)

4. Disk I/O

5. Network performance

6. Ping test (Dynamic)

Jack85 · May 29, 2020, 7:52am

@Stanislav
Can you please allow the bot to reboot the server? A command like /rebootmyserver
Thank you! All commands work well!

Stanislav · May 29, 2020, 5:32pm

Maybe, but I don’t think we need this function right now.
Maybe I will add it in the next few months.

Katoshicoins · May 30, 2020, 11:08am

I also want to join the validators; let me know.

Dezz · May 30, 2020, 2:59pm

Great solution! This bot is like a Swiss Army knife: it has tons of functions in one place, it’s easy to install, and you don’t have to open any additional ports. Thank you!

Can you please improve the stake update functionality? I’d prefer to update the stake in a dialogue style instead of typing the /updstake command. It is not a big deal, but it is rather inconvenient to type commands on a smartphone.

4hash · May 31, 2020, 5:03am

Introducing ftvmon - Free TON Validator’s Node Monitoring and Alerting, written in Go.

Uses Telegram as an endpoint for status messages and alerts, supports multiple users. Sends alerts or reports status for every metric if a user issues the /status command.

Has a powerful log inspection engine that can monitor multiple logs simultaneously in real time, with multiple event-matching criteria per log. Events can be matched against a simple substring or a regex; regular expressions are compiled and guaranteed to run in time linear in the size of the input, so thousands of log records per second can be inspected. Log files are seeked to the end at launch, and all new log records are inspected against the match criteria in real time. An alert message for each log event class can be triggered either by a single event or by the number of events exceeding a predefined threshold during a predefined time window; in the latter case the system sends an off message when the condition clears (i.e. when the number of events during the last n minutes drops below the threshold set in the config). All log inspection parameters are set in the config file. Some validator-specific log matching entries have been added to the config template.
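The threshold-within-a-window behaviour, including the off message when the condition clears, can be illustrated with a small sliding-window counter. This is a Python sketch and not ftvmon's Go code: the class name and the "on"/"off" return values are invented, timestamps are plain seconds, and firing at a count equal to the threshold is a choice made here because the description leaves the boundary case open.

```python
from collections import deque


class WindowAlert:
    """Hypothetical sliding-window alert for one log event class."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self.events = deque()  # timestamps of matched log records
        self.active = False    # True while the alert is raised

    def _expire(self, now):
        # Drop events that have fallen out of the time window.
        while self.events and now - self.events[0] >= self.window:
            self.events.popleft()

    def _evaluate(self):
        firing = len(self.events) >= self.threshold
        if firing and not self.active:
            self.active = True
            return "on"   # condition arose: send the alert
        if not firing and self.active:
            self.active = False
            return "off"  # condition cleared: send the off message
        return None       # no state change, nothing to send

    def record(self, now):
        """Register one matched log record seen at time `now`."""
        self.events.append(now)
        self._expire(now)
        return self._evaluate()

    def tick(self, now):
        """Periodic check so the alert can clear when the log goes quiet."""
        self._expire(now)
        return self._evaluate()
```

A caller would invoke `record()` for every log line that matches a criterion and `tick()` on a timer, sending a message whenever either returns "on" or "off".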

Constantly monitors a number of system performance metrics. System metrics are monitored using native code without invoking any external processes. Sends a message when a condition arises and when it clears. Metrics include CPU, Memory, Free Disk Space, Disk Device IOPS, Disk Mb/s, Network Mb/s, existence of a process with a given name in the system, and Disk I/O % utilization. Disk I/O % utilization (derived from Weighted time spent doing I/Os) is the most meaningful disk counter, device saturation occurs when this value is close to 100% for a single disk (for RAIDs capable of multiple I/O operations simultaneously it can be higher).
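As a rough illustration of how a percentage can be derived from such a time counter: take two samples of a cumulative "milliseconds spent on I/O" counter and divide the difference by the elapsed wall-clock time. The function below is a sketch with invented names; which kernel counter ftvmon actually samples and how it handles counter wraparound are not shown here and are assumptions.

```python
def io_utilization_percent(prev_busy_ms, curr_busy_ms, interval_seconds):
    """Percent utilization from two samples of a cumulative I/O time counter.

    Hypothetical helper: the counter is assumed to be in milliseconds and
    monotonically increasing between the two samples.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta_ms = curr_busy_ms - prev_busy_ms
    return 100.0 * delta_ms / (interval_seconds * 1000.0)
```

With a weighted counter, where concurrent requests each add their own time, the result can exceed 100%, which matches the remark above about RAIDs.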

Validator’s node specific metrics are:

Sync status (TIME_DIFF);
Is the validator’s node in the active set? Checks the status using the ADNL address. Since the default scripts overwrite the ADNL key file after submitting a stake for the elections, the software saves the previous ADNL address. Sends an alert if neither of the ADNL keys can be found in the active set;
Is the validator’s node in the elections? During elections, if the validator tried to submit a stake but its public key can’t be found in the list of election participants, sends an alert. If the validator is found, adds the stake amount to the status message;
Is the validator’s node in the next set? If the next set is active, checks the status using the current ADNL key and sends an alert if the validator is not found.
Thus, monitoring covers the whole validation cycle.

Easily extendable. Uses run-time reflection: a metric can be added by writing a function (which returns the status and sets the corresponding messages) and creating a config entry with the name of that function.
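The "function named in the config" idea can be shown with a Python analogue, where `getattr` plays the role that run-time reflection plays in Go. Everything here is hypothetical (the `Metrics` class, the `func` and `min_free_percent` config keys, the return shape); it only illustrates the lookup-by-name pattern, not ftvmon's actual structure.

```python
import shutil


class Metrics:
    """Hypothetical container: every public method is one metric."""

    def disk_free(self, cfg):
        usage = shutil.disk_usage(cfg.get("path", "/"))
        free_pct = 100.0 * usage.free / usage.total if usage.total else 0.0
        ok = free_pct >= cfg["min_free_percent"]
        return ok, f"free disk space: {free_pct:.1f}%"


def run_checks(metrics, config):
    """Call each metric function named in `config` and collect (ok, message)."""
    results = {}
    for entry in config:
        name = entry["func"]
        func = getattr(metrics, name, None)  # look the function up by name
        if name.startswith("_") or not callable(func):
            results[name] = (False, f"no metric function named {name!r}")
            continue
        results[name] = func(entry)
    return results
```

Adding a metric then means adding one method and one config entry such as `{"func": "disk_free", "path": "/", "min_free_percent": 10}`; the dispatch loop itself never changes.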

FreeTONi · May 31, 2020, 5:12pm

A cloud bot for monitoring validator nodes: @TON_Validators_Bot

Hello everybody!

Monitoring is good. You can configure many different monitoring systems on the server where your validator node is running. But what do you do if the server crashes or freezes, or something else happens and the monitoring system itself fails? You cannot rely on monitoring that is unavailable or broken.

I propose a solution. My Telegram bot @TON_Validators_Bot does not require installation on your server. It runs in my cloud and does not make any requests to your validator node. However, it can check the time when your node signed its last block in the blockchain. If your node does not sign new blocks for a long time, you will receive a notification in Telegram. You will immediately see that your server requires attention.
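The core check described above reduces to comparing the time of the last signed block with the current time. A minimal sketch, assuming Unix timestamps in seconds and an arbitrary 10-minute limit (the bot's real threshold and data source are not stated in the post, and the function name is invented):

```python
def silence_alert(last_signed_at, now, max_silence_seconds=600):
    """Return an alert text if no block was signed for too long, else None.

    Hypothetical helper: both timestamps are Unix seconds; the 600 s default
    is an illustrative value, not the bot's actual setting.
    """
    silence = now - last_signed_at
    if silence > max_silence_seconds:
        return f"No signed blocks for {int(silence)} s, check your validator"
    return None
```

Because only on-chain data is needed, a check like this can run anywhere and keeps working when the validator's own server is down.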

What can this bot do?

Monitoring

The bot periodically checks whether your validator node signs blocks.
The bot checks to see if your validator is participating in future validator elections.
The bot can automatically calculate your public keys and ADNL addresses, which is convenient.
You can find out information about the validator without knowing its Account Address, just enter the public key or adnl address.

Alerts*

If for a long time there are no new signed blocks, you will receive a notification.
If your node does not participate in future elections of validators, you will see it.

*Alerts are under construction.

Functions

Easy to use! The bot does not require you to take any steps to install and configure it. Just “/start” and enter your Account Address in hex.
The bot does not interact with your server. It is completely autonomous.
The bot checks the result of the validator’s work, not the process. All that matters to it is that the validator correctly signs the blocks and that they are accepted by the network.
The entire message history will be saved in the telegram chat history.

Screenshot of the bot when the validator is broken

This bot @TON_Validators_Bot is already running in test mode on the network net.ton.dev
Telegram: Contact @TON_Validators_Bot - just type “/start”

Future: *Alerts are under construction.
Sources: GitHub - FreeTONi/Ton_Validators_Bot: Telegram: @TON_Validators_Bot

4hash · May 31, 2020, 6:09pm

Fixed a bug: Status set incorrectly on elections close.

Lev · May 31, 2020, 7:02pm

docker-compose for a validator node.
General ideas are:

Not to use the bash scripts from the Tonlabs repo for build and startup.
Fast and easy node upgrades with almost zero downtime.

I started a couple of hours ago, so there isn’t much yet.
https://github.com/asddsa1137/ton-compose

jarig · May 31, 2020, 7:12pm

Hey,

I joined the contest just yesterday and it seems I won’t be able to deliver a working solution by the 1st of June, but I wanted to share the architecture that I came up with and am going to implement next week.

Notes:

The Validator node doesn’t have any extra ports exposed.
Every deployment can be scaled independently, whenever required.
Very flexible in controlling costs: the Validator, Controller and Logstash are deployed via Docker (backed by docker-compose) to either a bare-metal machine or a VM. (Ansible can help with some maintenance later on; I’ve excluded Terraform as it’s not that good for bare-metal cases.)
At the same time, monitoring can be either a custom solution or one of the SaaS solutions with pay-as-you-go subscriptions. The same applies to the message queue (either a custom deployment or SaaS).
With the current specs for a Validator node, I believe bare-metal machines will be the most cost-effective compared to any VM at any cloud provider.
The Pub/Sub layer provides a good abstraction and allows injecting many types of notifications and ways to control the validator(s), including web interfaces that are safe for the validator.
It will be easy to integrate any kind of alerting and automatic responses to those alerts.
I plan to implement the Controller as a set of standalone libraries for tonos-cli, lite-client and validator-engine (reusable by any other Python apps as well), plus the controller logic itself with an interface to the message queue (so that extension to any MQ will be possible).
The Controller will be responsible for automating participation in elections, querying configuration and blockchain data, and helping with smart contract interaction.
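To make the Controller/message-queue split in the notes above more concrete, here is one possible shape for it in Python. It is only a sketch of the idea, not the planned implementation: the `MessageQueue` protocol, the in-memory queue, the topic names and the `get_balance` action are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Protocol


class MessageQueue(Protocol):
    """Minimal interface the Controller depends on (any MQ could implement it)."""

    def publish(self, topic: str, message: dict) -> None: ...
    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None: ...


class InMemoryQueue:
    """Stand-in for a real broker, good enough for local experiments."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in list(self._handlers.get(topic, [])):
            handler(message)


class Controller:
    """Receives commands from the queue and publishes the results back."""

    def __init__(self, queue: MessageQueue, actions: Dict[str, Callable[[dict], object]]):
        self.queue = queue
        self.actions = actions  # e.g. thin wrappers around tonos-cli calls
        queue.subscribe("validator.commands", self._on_command)

    def _on_command(self, message: dict) -> None:
        name = message.get("action")
        action = self.actions.get(name)
        if action is None:
            event = {"action": name, "ok": False, "error": "unknown action"}
        else:
            event = {"action": name, "ok": True, "result": action(message.get("args", {}))}
        self.queue.publish("validator.events", event)
```

Since the Controller only sees the two-method interface, swapping the in-memory queue for a hosted broker would not touch the controller logic, and the validator host needs no inbound ports beyond its outbound connection to the broker.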

Sad that I tackled this TON contest so late :\ But anyway, I will be striving to join the validators group.

The implementation is going to land here and, once it is in working order, some parts will likely be moved to separate repositories (e.g. py-tonos-cli, py-ton-lite-client, py-validator-engine).