What a day it was resolving issues with Hive Engine nodes
I did not expect my weekend to go like this. I actually had some plans, and everything changed because of an issue with Hive Engine nodes today. When I woke up in the morning, I saw a lot of messages and tags in the Hive Engine witness chat channel. I make it a habit to check this channel every single day. I did the same today and was surprised to see that the nodes had already been down for several hours.
The root cause was that one of the Hive node operators had been testing an HF26 feature for some time to see whether anything would break. Unfortunately, it broke Hive Engine nodes, and most of them had to restore the database from a backup. Restoring the database from backups is a really painful process. Mongo should improve a lot in that aspect.
Usually, when there is a problem, only a few nodes are affected, and it is easy for me to restore them from one of the nodes that is still working. To my surprise, this time all the witness nodes I manage had issues. Luckily, one node had no problem at all, and it happened to be a light node. I was happy with that, because a DB backup from that node let me restore at least some of the other nodes.
Usually, when something like this happens, the withdrawal system is also affected. Verifications are delayed, and so withdrawals take longer to get processed. That is the right behavior, because without verification we cannot know whether the transactions are correct. Verification is nothing but validating the data against multiple nodes. The issue is not fully over. There are still over 9000 blocks to be verified at the time of writing this article, but that is far better than the 20k blocks we had a few hours back.
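To make the idea of validating data against multiple nodes a bit more concrete, here is a minimal Python sketch. It is only an illustration and not the actual Hive Engine verification code: the function names, the `min_agreeing` threshold, and the use of a single block hash per node are all assumptions made for the example. It treats a block as verified only when enough nodes report the same hash and they form a strict majority of the nodes that answered.

```python
from collections import Counter


def verify_block(hashes_by_node, min_agreeing=2):
    """Return (verified, agreed_hash) for a single block.

    hashes_by_node maps a node name to the block hash that node reported,
    or None if the node did not answer (for example, while it is restoring).
    All names and the threshold are hypothetical, chosen for illustration.
    """
    reported = [h for h in hashes_by_node.values() if h is not None]
    if not reported:
        return False, None

    # Pick the hash reported by the largest number of nodes.
    agreed_hash, count = Counter(reported).most_common(1)[0]

    # Require a minimum number of agreeing nodes and a strict majority
    # of the nodes that actually answered.
    verified = count >= min_agreeing and count * 2 > len(reported)
    return verified, (agreed_hash if verified else None)


def pending_blocks(blocks, min_agreeing=2):
    """Return the block numbers that cannot be verified yet, in order."""
    return [
        number
        for number, hashes in sorted(blocks.items())
        if not verify_block(hashes, min_agreeing)[0]
    ]


if __name__ == "__main__":
    # Made-up sample data: block 101 was answered by only one node.
    sample = {
        100: {"node-a": "h1", "node-b": "h1", "node-c": "h1"},
        101: {"node-a": "h2", "node-b": None, "node-c": None},
        102: {"node-a": "h3", "node-b": "xx", "node-c": "h3"},
    }
    print(pending_blocks(sample))
```

In this sketch a block that only one node can answer for stays pending, which matches why withdrawals wait while most nodes are still being restored.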
It is really sad to see this happen on a weekend, or when people are on vacation, because it becomes a challenge to fix the issue quickly. Some people have already quit running witness nodes, saying that the revenue is not great and the node is not worth running. Others question the decentralization and don't want to run nodes. A few members and I have been here from day 1, and I really enjoy being on this channel and being one of the fellow witnesses.
I have some dApps that depend on Hive Engine nodes, so it is really important for me to keep the nodes intact. I also offer node management as a service. People who would like to run a node approach me, and I take care of managing it for them when there are issues or situations like this. It becomes a headache when the database has to be restored yet again. This is the saddest part here.
Overall it was a fun day. I had to do multiple restores and multiple checks to get some witnesses up and running again. I'm still not done: I still have some full nodes to restore from scratch.
If you like what I'm doing on Hive, you can vote for me as a witness with the links below.
Vote @balaz as a Hive Witness
Vote @kanibot as a Hive Engine Witness