Validator Contest: Devops tools [Finished]

4hash · May 31, 2020, 5:03am

Introducing ftvmon - Free TON Validator’s Node Monitoring and Alerting, written in Go.

Uses Telegram as an endpoint for status messages and alerts, and supports multiple users. Sends alerts as they occur, and reports the status of every metric when a user issues the /status command.

Has a powerful log inspection engine that can monitor multiple logs simultaneously in real time, with multiple event-matching criteria per log. Events can be matched against a simple substring or a regex. Regular expressions are compiled and guaranteed to run in time linear in the size of the input, so thousands of log records per second can be inspected. Log files are seeked to the end at launch, and all new log records are checked against the match criteria in real time. An alert for a log event class can be triggered by a single event or by the number of events exceeding a predefined threshold within a predefined time window. In the latter case the system sends an “off” message when the condition clears (i.e. when the number of events during the last n minutes drops below the threshold set in the config). All log inspection parameters are set in the config file. Some validator-specific log matching entries have been added to the config template.
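The threshold-and-window behaviour described above could be sketched roughly as follows. This is not ftvmon's code: the type and field names are hypothetical, timestamps are plain Unix seconds, and treating a count equal to the threshold as "not alerting" is an assumption.

```go
package main

import "fmt"

// WindowAlert tracks event timestamps (Unix seconds) for one log event class
// and reports transitions between the "off" and "on" alert states.
type WindowAlert struct {
	Threshold int     // alert when more than Threshold events fall in the window
	WindowSec int64   // length of the sliding window in seconds
	events    []int64 // timestamps of recent events, oldest first
	active    bool    // true while the alert condition holds
}

// prune drops events that have fallen out of the window relative to now.
func (w *WindowAlert) prune(now int64) {
	i := 0
	for i < len(w.events) && w.events[i] <= now-w.WindowSec {
		i++
	}
	w.events = w.events[i:]
}

// Update records a new event at time now if matched is true, then returns
// "on" when the alert fires, "off" when it clears, and "" otherwise.
func (w *WindowAlert) Update(now int64, matched bool) string {
	if matched {
		w.events = append(w.events, now)
	}
	w.prune(now)
	over := len(w.events) > w.Threshold
	switch {
	case over && !w.active:
		w.active = true
		return "on"
	case !over && w.active:
		w.active = false
		return "off"
	}
	return ""
}

func main() {
	w := &WindowAlert{Threshold: 2, WindowSec: 60}
	for _, t := range []int64{0, 10, 20} {
		fmt.Printf("t=%d -> %q\n", t, w.Update(t, true))
	}
	// A later tick with no new events lets the window drain and clears the alert.
	fmt.Printf("t=90 -> %q\n", w.Update(90, false))
}
```

Calling Update on a timer as well as on every matched record is what allows the “off” message to be sent even when the log goes quiet.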

Constantly monitors a number of system performance metrics. System metrics are monitored using native code without invoking any external processes. Sends a message when a condition arises and when it clears. Metrics include CPU, Memory, Free Disk Space, Disk Device IOPS, Disk Mb/s, Network Mb/s, existence of a process with a given name in the system, and Disk I/O % utilization. Disk I/O % utilization (derived from weighted time spent doing I/Os) is the most meaningful disk counter: a single disk is saturated when this value is close to 100% (for RAIDs capable of multiple simultaneous I/O operations it can be higher).
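As a rough illustration of how a utilization percentage can be derived from a cumulative I/O time counter, here is a minimal sketch. It assumes the counter is expressed in milliseconds and sampled twice; the function name is hypothetical and this is not taken from ftvmon.

```go
package main

import "fmt"

// utilizationPercent converts two samples of a cumulative I/O time counter
// (milliseconds) into a utilization percentage over the sampling interval.
func utilizationPercent(prevMs, curMs, intervalMs uint64) float64 {
	if intervalMs == 0 || curMs < prevMs {
		return 0 // no elapsed time, or the counter was reset
	}
	return float64(curMs-prevMs) / float64(intervalMs) * 100
}

func main() {
	// 500 ms of I/O time accumulated during a 1000 ms interval.
	fmt.Printf("%.1f%%\n", utilizationPercent(1000, 1500, 1000))
	// A weighted counter can grow faster than wall-clock time when several
	// requests are in flight at once, which pushes the value above 100%.
	fmt.Printf("%.1f%%\n", utilizationPercent(1000, 3000, 1000))
}
```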

Validator’s node specific metrics are:

Sync status (TIME_DIFF);
Is the validator’s node in the active set? Checks status using the ADNL address. Since the default scripts overwrite the ADNL key file after submitting a stake for the elections, the software saves the previous ADNL address. Sends an alert if neither of the ADNL keys can be found in the active set;
Is the validator’s node in the elections? During elections, if the validator tried to submit a stake but its public key can’t be found in the list of election participants, sends an alert. If the validator is found, adds the stake amount to the status message;
Is the validator’s node in the next set? If the next set is active, checks status using the current ADNL key and sends an alert if the validator is not found.
Thus, monitoring covers the whole validation cycle.
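The active-set check with a saved previous ADNL address might look something like the sketch below. The function name and the use of a map for the set are assumptions made for illustration, and the addresses are placeholders.

```go
package main

import "fmt"

// inActiveSet reports whether the node is validating under either the current
// ADNL address or the one saved before the key file was overwritten.
func inActiveSet(activeSet map[string]bool, currentADNL, previousADNL string) bool {
	if activeSet[currentADNL] {
		return true
	}
	return previousADNL != "" && activeSet[previousADNL]
}

func main() {
	// Placeholder addresses, not real ADNL values.
	active := map[string]bool{"adnl-old": true}
	// The new key is not yet in the set, but the saved one still is.
	fmt.Println(inActiveSet(active, "adnl-new", "adnl-old"))
	// Without a saved key the same lookup fails and would raise an alert.
	fmt.Println(inActiveSet(active, "adnl-new", ""))
}
```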

Easily extendable. Uses run-time reflection: a metric can be added by writing a function (which returns a status and sets the corresponding messages) and creating a config entry with the name of that function.
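A minimal sketch of name-based dispatch with Go's reflect package is shown below. The Status and Metrics types and the FreeDisk metric are hypothetical stand-ins, not ftvmon's actual types; only the reflect calls are standard library.

```go
package main

import (
	"fmt"
	"reflect"
)

// Status is what every metric function returns in this sketch.
type Status struct {
	OK      bool
	Message string
}

// Metrics groups metric functions as methods so they can be found by name.
type Metrics struct{}

// FreeDisk is a stand-in metric; a real one would query the system.
func (Metrics) FreeDisk() Status {
	return Status{OK: true, Message: "disk space is fine"}
}

// runMetric looks up an exported method by the name taken from the config
// and calls it, rejecting unknown names and unexpected signatures.
func runMetric(m Metrics, name string) (Status, error) {
	method := reflect.ValueOf(m).MethodByName(name)
	if !method.IsValid() {
		return Status{}, fmt.Errorf("unknown metric %q", name)
	}
	if method.Type().NumIn() != 0 || method.Type().NumOut() != 1 {
		return Status{}, fmt.Errorf("metric %q has an unexpected signature", name)
	}
	st, ok := method.Call(nil)[0].Interface().(Status)
	if !ok {
		return Status{}, fmt.Errorf("metric %q does not return Status", name)
	}
	return st, nil
}

func main() {
	// The names would normally come from config entries.
	for _, name := range []string{"FreeDisk", "Missing"} {
		st, err := runMetric(Metrics{}, name)
		fmt.Println(name, st, err)
	}
}
```

With this layout, adding a metric means adding one exported method and one config entry; no dispatch table has to be edited.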

FreeTONi · May 31, 2020, 5:12pm

Cloud bot for monitoring validator nodes: @TON_Validators_Bot

Hello everybody!

Monitoring is good. You can configure many different monitoring systems on the server where your validator node is running. But what do you do if the server crashes or freezes, or the monitoring system itself fails? You cannot rely on monitoring that is unavailable or broken.

I propose a solution. My Telegram bot @TON_Validators_Bot does not require installation on your server. It runs in my cloud and does not make any requests to your validator node. However, it can check the time when your node signed its last block in the blockchain. If your node does not sign new blocks for a long time, you will receive a notification in Telegram. You will immediately see that your server requires attention.

What can this bot do?
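The core of such an external check can be sketched as a comparison between the time of the last signed block and the current time. How the bot actually obtains that timestamp is not described here, and the function name and the threshold below are hypothetical.

```go
package main

import "fmt"

// shouldNotify reports whether the gap since the last signed block
// exceeds maxGapSec. All times are Unix seconds.
func shouldNotify(lastSignedAt, now, maxGapSec int64) bool {
	return now-lastSignedAt > maxGapSec
}

func main() {
	const maxGapSec = 600 // hypothetical threshold, not the bot's real setting
	// 300 seconds since the last signed block: within the allowed gap.
	fmt.Println(shouldNotify(1000, 1300, maxGapSec))
	// 1000 seconds since the last signed block: the node needs attention.
	fmt.Println(shouldNotify(1000, 2000, maxGapSec))
}
```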

Monitoring

The bot periodically checks whether your validator node signs blocks.
The bot checks to see if your validator is participating in future validator elections.
The bot can automatically calculate your public keys and ADNL addresses, which is convenient.
You can look up information about a validator without knowing its Account Address; just enter the public key or ADNL address.

Alerts*

If for a long time there are no new signed blocks, you will receive a notification.
If your node does not participate in future elections of validators, you will see it.

*Alerts are under construction.

Functions

Easy to use! The bot does not require you to take any steps to install and configure it. Just “/start” and enter your Account Address in hex.
The bot does not interact with your server. It is completely autonomous.
The bot checks the result of the validator’s work, not the process. All that matters to it is that the validator correctly signs blocks and that they are accepted by the network.
The entire message history is saved in the Telegram chat history.

Screenshot when the validator is broken

This bot @TON_Validators_Bot is already running in test mode on the network net.ton.dev
Telegram: Contact @TON_Validators_Bot - just type “/start”

Future: alerts are under construction.
Sources: GitHub - FreeTONi/Ton_Validators_Bot: Telegram: @TON_Validators_Bot

4hash · May 31, 2020, 6:09pm

Fixed a bug: status was set incorrectly when elections closed.

Lev · May 31, 2020, 7:02pm

docker-compose for a validator node.
The general ideas are:

Avoid using the bash scripts from the Tonlabs repo for build and startup.
Fast and easy node upgrade with almost zero downtime.

I started a couple of hours ago, so there isn’t much yet.
https://github.com/asddsa1137/ton-compose

jarig · May 31, 2020, 7:12pm

Hey,

I joined the contest just yesterday and it seems I won’t be able to deliver a working solution by the 1st of June, but I wanted to share the architecture that I came up with and am going to implement next week.

Notes:

Validator node doesn’t have any extra ports exposed
Every deployment can be scaled independently, whenever required
Very flexible in controlling costs: Validator, Controller and Logstash are deployed via Docker (backed by docker-compose) to either a bare-metal machine or a VM. (Ansible can help with some maintenance later on; I’ve excluded Terraform as it’s not that good for bare-metal cases.)
At the same time, monitoring can be either a custom solution or one of the SaaS solutions with pay-as-you-go subscriptions. The same applies to the message queue (either a custom deployment or SaaS).
With the current specs for a Validator node, I believe bare-metal machines will be the most cost-effective compared to any VM from any cloud provider.
The Pub/Sub layer provides a good abstraction and allows injecting many types of notifications and ways to control validator(s), including web interfaces that are safe for the validator.
It will be easy to integrate any kind of alerting and automatic response to those alerts.
I plan to implement the Controller as a set of standalone libraries for tonos-cli, lite-client and validator-engine (re-usable by any other Python apps as well) plus the controller logic itself, with an interface to the message queue (so that extension to any MQ will be possible)
The Controller will be responsible for automating participation in elections, querying configuration and blockchain data, and helping with smart contract interaction.

Sad that I tackled this TON contest so late :\ But anyway, I will be striving to join the validators group

The implementation is going to land here, and once it is in working order, some parts will likely be moved to separate repositories (e.g. py-tonos-cli, py-ton-lite-client, py-validator-engine).

Gofman · May 31, 2020, 9:46pm

Hello! I want to support a validator node for at least one year with this infrastructure. This is just the initial configuration; a lot of tests and additional functionality will be available soon.

For now I have:

Dockerfile for the C++ node (which can also be run in OpenShift)
Helm chart for Validator Node
High Availability infrastructure based on AWS
Logging system based on CloudWatch
EC2 monitoring

qwertys318 · June 1, 2020, 1:25am

Hello
Where should we put our solution for this contest?

Stanislav · June 1, 2020, 1:42am

Hello!
Many validators have asked me about a graph of slow events, and now it is ready! A graph by slow event groups will be available later.
Of course, you can download the latest version of my Telegram bot on GitHub

Best regards,
Stanislav

qwertys318 · June 1, 2020, 2:43am

Hello
This is our solution based on Kubernetes, ELK and Zabbix, with alerting to Telegram
https://github.com/freeton-dreamteam/contest

isheldon · June 1, 2020, 9:07am

Hi
Please check out the validator contest submission from our team

br3d · June 1, 2020, 11:10am

Hello everyone
We are pleased to present our ansible-freeton repository
GitHub - br3d/ansible-freeton: Ansible playbook for deploy freeton
Roles list:

common - prepares the system and installs dependencies
freeton - builds and sets up the FreeTon node
netdata - real-time monitoring
prometheus-node-exporter - exporter for hardware and OS metrics; it also makes it possible to get the balance and diff in the freeton network

Dashboard example based on prometheus-node-exporter with custom metrics from freeton network

nka1202 · June 1, 2020, 5:37pm

wiz wiz2

CLI wizard for manually managing requests to participate in elections, with the ability to place a bid (with a validity check) and to return the stake.
Ability to work with many nodes in different networks: FreeTON mainnet/testnet via tonos-cli calls, or nodes of other TON networks via lite-client calls.
Wallet keys are kept on the local computer, not on the node. Only permkey and adnlkey are generated on the node.
Script for automatic deployment of the environment, which builds tonos-cli, lite-client, validator-engine-console, fift and fift-scripts from source and places them in the file system.

Setup is performed in /tmp, and everything left over after compilation is then erased.

Stanislav · June 1, 2020, 9:31pm

Hello!

A new category is opening in the Validator tool: Info.
Step by step, I will add useful information there.
Monitoring based on this information will come later.

Let’s start

Election status & validators count


Gofman · June 1, 2020, 9:42pm

UPD:
Initial TONTgBot integration. Thanks @anvme for the bot and support.

Link to repo: https://github.com/SkySonR/freeton-infra

Stanislav · June 1, 2020, 10:02pm

Right now, some functions may not work correctly in Docker. I hope that we will solve this.
But on Ubuntu, everything works perfectly.

PhillHuge · June 1, 2020, 10:35pm

When will the results be available?

Mitja · June 1, 2020, 11:11pm

We will discuss that tomorrow on the weekly call.

PhillHuge · June 2, 2020, 7:23am

Great news! Thanks a lot.

Gofman · June 5, 2020, 5:39pm

UPD: Fixed all Docker/Kubernetes issues with the TGbot integration. 99% of the functionality works.

Repository: https://github.com/SkySonR/freeton-infra

hortonelectric · June 5, 2020, 10:33pm

In this submission there is no reference to any other network than the FREETON network. The toolkit is called ‘GRAM’ because it’s easy to differentiate it from the currency TONs while still being unique enough to differentiate between itself and the actual freeTON code.

Right now you must build and compile the TON C++ locally (even if you are using Docker). I’m working on replacing generate-initial-keys binary and many other improvements.

You can build Docker images for a number of OSes (probably one of the more robust Docker test kits out there, and the images are somewhat optimized).

It uses pm2 to manage the different processes: validator node, nodeJS API, documentation server, and a web server to house the Vue app toolkit.

It also uses tmux to throw up some logs so you can kind of see what’s going on at the core layer. For more advanced monitoring and joining elections I recommend some of the other great tools; I didn’t spend my time there since others have done such a great job.

The toolkit also comes with a nodeJS API which can talk to lite-client on the node, so it makes it as easy as a POST request to send files and GET request to get account info/run methods, no pub key needed. It also serves up the config json and liteserver pub keys.

I’ve added a JSON-based block explorer to the TON C++ (which is merely a copy of the other block explorer with all of the garbage HTML removed).

It also comes with a VueJS / cordova / electron app that you can extend to your liking, and I plan much more to come there.

The documentation is nice, you can check that out by looking in the docs/README.md folder.

It does include the WASM kit for fift/func in case you want to use those.