a script to install a systemd service for an existing node, as well as a couple of improvements to the existing scripts so that they transparently support running in service mode;
a script to update the node with minimal downtime. It automatically fetches fresh git updates, builds them, stops the node, moves node.log into a zip archive while leaving the tail lines so the transition can be examined manually, and then starts the new version. It also supports automatic execution via crontab, say once a day;
a couple of utility scripts for external monitoring tools:
– one that computes an average duration metric from node.log (based on the famous “SLOW” tags). It is a very basic way of measuring the performance of TON node operations;
– and a second one that gets the current wallet balance, since monitoring tools need the bare amount without anything else.
The idea was to separate three layers: server, network and business (the last one to come later). Keep the dashboard clean and do not overload it with excessive indicators.
Network interface utilization %;
CPU utilization %, iowait, Load average;
Memory and disk utilization %;
FreeTON network sync status (TIME_DIFF);
Node.log duration (famous SLOW tags in log);
Wallet(s) balance history.
Filters support several hosts/disks/wallets/network interfaces. The dashboard indicators also contain threshold values, so you can easily tune Grafana alerts.
This post already has many screenshots; if I included screenshots of every feature, the page would take too long to load.
Hello!
Zabbix, Grafana, ELK… We need to know how to install and configure all this software.
But what if we are not near a computer? We are on a trip, for example, relaxing on a beach or somewhere else where we have no access to our computers.
We won’t know that our validator node is down, or that our CPU/RAM is overloaded. Sounds sad =(
Or we get an SMS from the monitoring tool saying that something is wrong? Again, we need a computer.
So? Validators need a fast and convenient solution. I created “TON Telegram Bot” with alerts, statistics, and many useful tools for validators.
Are you ready?
And yes
Check server network speed to different countries (some countries may not work because their speedtest servers may have problems; on Hetzner, many countries didn’t work. In the future, I will add many more test servers).
Nice and beautiful! I’m working on a similar one, but with a different set of features.
For security’s sake, though, it won’t be possible to change the state of the node (e.g. change the stake or restart the node).
This is a script for TON validator nodes that automatically registers the node in elections and automatically confirms, on the custodians’ behalf, the transactions sent to the elector smart contract.
Features:
More reliable than validator_msig.sh
Checks wallet balance before transactions
Confirms registration with the “participant_list” method
Telegram and email notifications
Fully supports multisig wallets with reqConfirm > 1
Auto confirmation of multisig transactions
Requests blockchain global configuration parameters (minimal hardcoding)
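The “checks wallet balance before transactions” feature amounts to a guard like the one sketched here. The script's real rule is not described in the post. The function name `canSubmitStake` and the idea of a separate fee reserve are hypothetical. The only fixed fact used is that balances are counted in nano-units (1 token = 10^9 nanotokens).

```go
package main

import "fmt"

// nanoPerToken: balances are kept in nano-units (1 token = 10^9 nanotokens).
const nanoPerToken uint64 = 1_000_000_000

// canSubmitStake reports whether the wallet balance covers the stake plus a
// reserve kept for transaction fees. feeReserve is a hypothetical parameter.
func canSubmitStake(balance, stake, feeReserve uint64) bool {
	need := stake + feeReserve
	if need < stake { // unsigned overflow: the requirement cannot be met
		return false
	}
	return balance >= need
}

func main() {
	balance := 10_050 * nanoPerToken
	stake := 10_000 * nanoPerToken
	fmt.Println(canSubmitStake(balance, stake, 2*nanoPerToken))   // true
	fmt.Println(canSubmitStake(balance, stake, 100*nanoPerToken)) // false
}
```

Doing the arithmetic in unsigned nano-units avoids floating-point rounding on large balances.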
Great solution! This bot is like a Swiss Army knife: it has tons of functions in one place, it’s easy to install, and you don’t have to open any additional ports. Thank you!
Could you please improve the stake-update functionality? I’d prefer to update the stake in a dialogue style instead of typing the /updstake command. It is not a big deal, but it is rather inconvenient to type commands on a smartphone.
Introducing ftvmon: Free TON Validator’s Node Monitoring and Alerting, written in Go.
Uses Telegram as an endpoint for status messages and alerts, and supports multiple users. Sends alerts, and reports the status of every metric when a user issues the /status command.
Has a powerful log inspection engine that can monitor multiple logs simultaneously in real time, with multiple event-matching criteria per log. Event matching can be done against a simple substring or a regular expression; regular expressions are compiled and guaranteed to run in time linear in the size of the input (thousands of log records per second can be inspected). On launch, log files are seeked to the end, and all new log records are inspected against the match criteria in real time. An alert for each log event class can be triggered either by a single event or by the number of events exceeding a predefined threshold within a predefined time window; in the latter case the system sends an “off” message when the condition clears (i.e. when the number of events during the last n minutes drops below the threshold set in the config). All log inspection parameters are set in the config file. Some validator-specific log-matching entries have been added to the config template.
Constantly monitors a number of system performance metrics. System metrics are collected by native code without invoking any external processes. Sends a message when a condition arises and when it clears. Metrics include CPU, memory, free disk space, disk device IOPS, disk MB/s, network MB/s, the existence of a process with a given name, and disk I/O % utilization. Disk I/O % utilization (derived from the weighted time spent doing I/Os) is the most meaningful disk counter: a single disk is saturated when this value is close to 100% (for RAIDs capable of multiple simultaneous I/O operations it can be higher).
Validator’s node specific metrics are:
Sync status (TIME_DIFF);
Is the validator’s node in the active set? Checks status using the ADNL address. Since the default scripts overwrite the ADNL key file after submitting a stake for the elections, the software saves the previous ADNL address. Sends an alert if neither ADNL key can be found in the active set;
Is the validator’s node in the elections? During elections, if the validator has tried to submit a stake but its public key can’t be found in the list of election participants, it sends an alert. If the validator is found, it adds the stake amount to the status message;
Is the validator’s node in the next set? If the next set is active, checks status using the current ADNL key and sends an alert if the validator is not found.
Thus, monitoring covers the whole validation cycle.
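The “current or previous ADNL key” check for the active set can be sketched as follows. This is a hypothetical helper, `inActiveSet`. It assumes the active set has already been fetched as a list of hex ADNL addresses, and how that list is obtained is out of scope here. Hex is compared case-insensitively.

```go
package main

import (
	"fmt"
	"strings"
)

// inActiveSet reports which of the current or previously saved ADNL addresses
// (hex strings) appears in the active set. An empty key is ignored.
func inActiveSet(activeADNL []string, current, previous string) (string, bool) {
	seen := make(map[string]bool, len(activeADNL))
	for _, a := range activeADNL {
		seen[strings.ToLower(a)] = true
	}
	for _, k := range []string{current, previous} {
		if k != "" && seen[strings.ToLower(k)] {
			return k, true
		}
	}
	return "", false
}

func main() {
	active := []string{"AA11", "BB22"}
	// After a stake submission the key file holds a new ADNL ("cc33"),
	// but the node still validates in this round with the saved one.
	if k, ok := inActiveSet(active, "cc33", "aa11"); ok {
		fmt.Println("in active set via", k)
	} else {
		fmt.Println("ALERT: not in active set")
	}
}
```

Keeping the previous key avoids a false alert between a stake submission and the start of the next validation round.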
Easily extendable: it uses run-time reflection, so a metric can be added by writing a function (which returns a status and sets the corresponding messages) and creating a config entry with that function’s name.
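Name-based dispatch through run-time reflection can look roughly like this in Go. The type `Metrics`, the check signature `func() (bool, string)`, and the `DiskFree` check are hypothetical; the post does not show ftvmon's actual signatures. Note that `reflect.Value.MethodByName` only finds exported methods, so metric functions would need capitalized names.

```go
package main

import (
	"fmt"
	"reflect"
)

// Metrics holds metric checks. Each check has the hypothetical signature
// func() (ok bool, msg string); the config lists the method names to run.
type Metrics struct{ diskFreePct float64 }

// DiskFree is an example check; adding a metric means adding such a method.
func (m *Metrics) DiskFree() (bool, string) {
	if m.diskFreePct < 10 {
		return false, fmt.Sprintf("low disk space: %.0f%% free", m.diskFreePct)
	}
	return true, "disk space OK"
}

// runCheck looks up an exported method by name and calls it.
func runCheck(m *Metrics, name string) (bool, string, error) {
	meth := reflect.ValueOf(m).MethodByName(name)
	if !meth.IsValid() {
		return false, "", fmt.Errorf("no metric function %q", name)
	}
	fn, ok := meth.Interface().(func() (bool, string))
	if !ok {
		return false, "", fmt.Errorf("metric %q has the wrong signature", name)
	}
	status, msg := fn()
	return status, msg, nil
}

func main() {
	m := &Metrics{diskFreePct: 7}
	// Names as they might appear in the config file.
	for _, name := range []string{"DiskFree", "NoSuchCheck"} {
		ok, msg, err := runCheck(m, name)
		fmt.Println(name, ok, msg, err)
	}
}
```

Checking the signature with a type assertion turns a misconfigured entry into an error at lookup time instead of a panic during the call.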
Cloud bot for monitoring validator nodes: @TON_Validators_Bot
Hello everybody!
Monitoring is good. You can configure many different monitoring systems on the server where your validator node is running. But what do you do if the server crashes or freezes, something else happens, or the monitoring system itself fails? You cannot rely on monitoring that is unavailable or broken.
I propose a solution. My Telegram bot @TON_Validators_Bot does not require installation on your server. It runs in my cloud and does not make any requests to your validator node. Instead, it checks when your node last signed a block in the blockchain. If your node has not signed new blocks for a long time, you will receive a notification in Telegram and immediately see that your server requires attention.
What can this bot do?
Monitoring
The bot periodically checks whether your validator node signs blocks.
The bot checks whether your validator is participating in upcoming validator elections.
The bot can automatically calculate your public keys and ADNL addresses. It’s convenient.
You can look up information about a validator without knowing its Account Address: just enter its public key or ADNL address.
Alerts*
If there are no new signed blocks for a long time, you will receive a notification.
If your node is not participating in upcoming validator elections, you will see it.
*Alerts are under construction.
Functions
Easy to use! The bot does not require you to take any steps to install and configure it. Just send “/start” and enter your Account Address in hex.
The bot does not interact with your server. It is completely autonomous.
The bot checks the validator’s results, not its process: all that matters to it is that the validator signs blocks correctly and that they are accepted by the network.
The entire message history is saved in the Telegram chat.
I joined the contest only yesterday and it seems I won’t be able to deliver a working solution by the 1st of June, but I wanted to share the architecture I came up with and am going to implement next week.
The validator node doesn’t expose any extra ports
Every deployment can be scaled independently, whenever required
Very flexible in controlling costs: Validator, Controller and Logstash are deployed via Docker (backed by docker-compose) either to a bare-metal machine or to a VM (Ansible can help with some maintenance later on; I’ve excluded Terraform as it’s not that good for bare-metal cases).
At the same time, monitoring can be either a custom solution or one of the SaaS solutions with pay-as-you-go subscriptions. The same applies to the message queue (either a custom deployment or SaaS).
With the current specs for a Validator node, I believe bare-metal machines will be more cost-effective than a VM from any cloud provider.
The Pub/Sub layer provides a good abstraction and allows injecting many types of notifications and ways to control the validator(s), including web interfaces that are safe for the validator.
It will be easy to integrate any kind of alerting and automatic responses to those alerts.
I plan to implement the Controller as a set of standalone libraries for tonos-cli, lite-client and validator-engine (reusable in any other Python apps as well), plus the controller logic itself with an interface to the message queue (so that it can be extended to any MQ).
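Coding the controller against a message-queue interface, rather than one concrete MQ, is what makes “extension to any MQ” possible. The idea is shown here in Go for consistency with the other examples, although the author plans a Python implementation. The `Broker` interface and the in-memory `memBroker` are hypothetical illustrations, not part of the planned libraries. A hosted or self-run MQ would be a second implementation of the same interface.

```go
package main

import (
	"fmt"
	"sync"
)

// Broker is a hypothetical MQ abstraction the controller would depend on.
type Broker interface {
	Publish(topic string, msg string)
	Subscribe(topic string) <-chan string
}

// memBroker is an in-memory Broker, handy for tests or a single host.
type memBroker struct {
	mu   sync.Mutex
	subs map[string][]chan string
}

func newMemBroker() *memBroker { return &memBroker{subs: map[string][]chan string{}} }

// Subscribe registers a new buffered channel for the topic.
func (b *memBroker) Subscribe(topic string) <-chan string {
	ch := make(chan string, 16)
	b.mu.Lock()
	b.subs[topic] = append(b.subs[topic], ch)
	b.mu.Unlock()
	return ch
}

// Publish fans the message out to every subscriber of the topic; a message
// is dropped for a subscriber whose buffer is full, so publishers never block.
func (b *memBroker) Publish(topic, msg string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs[topic] {
		select {
		case ch <- msg:
		default:
		}
	}
}

func main() {
	var mq Broker = newMemBroker()
	alerts := mq.Subscribe("alerts") // e.g. a Telegram notifier
	mq.Publish("alerts", "validator out of sync")
	fmt.Println(<-alerts)
}
```

Notifiers, web interfaces and automatic responders would then be independent subscribers, which matches the idea of injecting notifications and control paths through the Pub/Sub layer.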
Controller will be responsible for automation of participation in elections, querying for configuration and blockchain data, help interaction with smart contracts.
It’s a pity I tackled this TON contest so late :\ But anyway, I will be striving to join the validators group.
The implementation is going to land here, and once it is in working order, some parts will likely be moved to separate repositories (e.g. py-tonos-cli, py-ton-lite-client, py-validator-engine).