MN timeout on transient error

Team,

Some of my nodes break down on a daily basis with “Timeout on transient error: Could not connect to the server X.X.X.X. Make sure the syscoind server is running and that you are connecting to the correct RPC port.”

Reindexing and a full reinstall only help temporarily; the nodes break down again a few days later with the same error. Any suggestions? What are the common causes I should be looking into?

Many thanks.

I have a similar error. I noticed that the debug log contains a message indicating that my block data is corrupt:

error: verifydb(): *** irrecoverable inconsistency in block data

A reindex seems to fix it for a few days, but it seems to happen again after that. Any suggestions?

The inconsistent DB is likely caused by the way you are shutting down syscoind. Perhaps you are not using syscoin-cli stop but just killing the process, which is not recommended. A graceful shutdown means using the CLI, or pressing Ctrl-C if the process is running in the foreground.

As for the syscoind server not responding, it could be halted by a deadlock in the code, or it could be running out of file handles. You can try raising the number of file descriptors your operating system allows above the default to see if that helps.

When you get that error, check whether the CPU is at 100%. If it is, the process is deadlocked and unresponsive, which is a problem in the code (hopefully fixed in the coming 4.3.0).
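
To check the file descriptor theory before changing any limits, a small script along these lines may help. It is only a rough sketch, assuming a Linux host with /proc mounted. The helper names (nofile_limits, open_fd_count, fd_pressure) and the 80% threshold are made up for illustration and are not part of syscoind or syscoin-cli.

```python
import os
import resource  # Unix-only standard-library module


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid):
    """Count the open descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_pressure(open_fds, soft_limit, threshold=0.8):
    """Return True when open descriptors reach `threshold` of the soft limit.

    The 0.8 default is an arbitrary illustrative value, not a recommendation.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return open_fds >= threshold * soft_limit


if __name__ == "__main__":
    soft, hard = nofile_limits()
    pid = os.getpid()  # assumption: replace with the PID of your syscoind process
    used = open_fd_count(pid)
    print(f"soft={soft} hard={hard} open={used} pressure={fd_pressure(used, soft)}")
```

Note that the limits printed here are those of the shell that runs the script, which may differ from the limits syscoind was started with; the per-process values can be read from /proc/<pid>/limits. Reading another user's /proc/<pid>/fd usually requires running as that user or as root.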