
Stake pool maintenance.

I took some days off from my day job to perform maintenance on the stake pool that requires a bit more time than the normal daily operations.

This is the TODO list:

  • Look into sending slot info to pooltool.io
  • Performance tuning
  • Add the UPS to monitoring and alerting
  • Build a riscv64 version of GHC 8.10.7
  • Rebuild cardano-node 1.30.1 and dependencies using GHC 8.10.7
  • Documentation
  • Website update
  • Upgrade cnitools

pooltool.io

A few weeks ago I started reporting block heights to pooltool.io. Using cncli, that is pretty easy if your producer node is allowed to connect to the internet.
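
For anyone setting this up, the block-height reporting side looks roughly like the sketch below. It assumes cncli is already installed and that the pooltool API key and pool details live in a pooltool.json; the paths and exact flag names are assumptions on my part, so check cncli sendtip --help for your cncli version.

 # Hedged sketch: keep reporting the node's tip to pooltool.io with cncli.
 # /etc/cncli/pooltool.json and the cardano-node path are placeholder locations.
 cncli sendtip \
   --config /etc/cncli/pooltool.json \
   --cardano-node /usr/local/bin/cardano-node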

Sending slots to pooltool.io turned out to be a bigger challenge. The problem is not that it’s hard to configure or set up. The problem lies with the cardano-cli, which is used to retrieve the poolStakeMark and activeStakeMark values that cncli needs to calculate the leader log. Running cardano-cli query stake-snapshot ... will put some load on the cardano-node, as it will temporarily need about 1GB of memory, and the cardano-cli itself needs about 5.6GB of memory to run to completion. When I ran the command it resulted in the cardano-node process being OOM killed by the kernel.

It might be wise to limit the amount of heap the cardano-cli can allocate by adding the -M RTS option; 5GB should be enough.


 cardano-cli +RTS -M5G -RTS query stake-snapshot ...
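
To make that concrete, here is a hedged sketch of how the two snapshot values mentioned above could be pulled out of the heap-capped query for cncli. The <POOL_ID> placeholder, the use of jq and the --mainnet flag are my additions, not something from this post.

 # Hedged sketch: run the heap-capped query and extract the two values cncli needs.
 SNAPSHOT=$(cardano-cli +RTS -M5G -RTS query stake-snapshot \
   --stake-pool-id <POOL_ID> --mainnet)
 POOL_STAKE=$(echo "$SNAPSHOT" | jq -r .poolStakeMark)
 ACTIVE_STAKE=$(echo "$SNAPSHOT" | jq -r .activeStakeMark)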

You can tell the kernel not to kill the cardano-node, and to try another process first, in case the system runs out of memory.


 echo -15 > /proc/$(pidof cardano-node)/oom_adj
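
On newer kernels oom_adj is deprecated in favour of oom_score_adj, which takes values from -1000 (never kill) to 1000. The -900 below is my own choice of a strongly negative value, not a setting from this post.

 # Roughly equivalent setting via the newer interface.
 echo -900 > /proc/$(pidof cardano-node)/oom_score_adj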

If the cardano-node is not shut down cleanly, the complete immutable and volatile databases will be validated on the next start of the cardano-node, which can take considerable time.
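
One way to reduce the chance of an unclean shutdown is to give the node enough time to stop when it runs under systemd. The drop-in below is a sketch under the assumption that the node runs as cardano-node.service and shuts down cleanly on SIGINT; verify the signal handling and unit name for your own setup, for example in /etc/systemd/system/cardano-node.service.d/shutdown.conf.

 [Service]
 # Give cardano-node several minutes to flush its databases before systemd kills it.
 TimeoutStopSec=300
 KillSignal=SIGINT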

Tuning GHC garbage collection

In the last couple of weeks I have been adding more metrics to Grafana, specifically to get a better understanding of garbage collection. The number of minor and major garbage collections can vary greatly depending on the GHC settings, and a major GC can lead to missed leader slots on the producer node. It turns out that saving ledger snapshots generates a considerable amount of garbage and might result in multiple major GCs. The number of peers the node is connected to also affects the number of major GCs.
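
For reference, the GHC RTS flags that are commonly tuned for this kind of GC behaviour look like the sketch below. The values are illustrative and not the settings used on this pool, and the "..." stands for the node's usual run arguments.

 # -N4    use 4 cores
 # -A64m  larger allocation area, so fewer minor collections
 # -c     use a compacting collection for the oldest generation
 # -T     collect RTS/GC statistics for the node's metrics
 # GHC 8.10 also offers a low-latency collector via +RTS --nonmoving-gc.
 cardano-node run ... +RTS -N4 -A64m -c -T -RTS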

UPS monitoring and alerting

The complete stake pool, including the build server that doubles as the testnet stake pool, is now on the UPS. The UPS is now also partly observable via Prometheus and is available to Grafana and Alertmanager. Unfortunately the LOADPCT is not reported by apcupsd for my UPS, so I can’t graph the load over time. On the display of the UPS it varies between 97W and 100W.
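
For the values apcupsd does report, a small script feeding node_exporter's textfile collector is one way to get the readings into Prometheus (a dedicated apcupsd exporter also exists). The metric names and the textfile directory below are assumptions of mine, not the setup used on this pool.

 # Hedged sketch: export a few apcupsd readings for node_exporter's textfile collector.
 DIR=/var/lib/node_exporter/textfile_collector
 apcaccess status | awk '
   /^BCHARGE/  { print "apcupsd_battery_charge_percent " $3 }
   /^TIMELEFT/ { print "apcupsd_battery_time_left_minutes " $3 }
   /^LINEV/    { print "apcupsd_line_volts " $3 }
 ' > "$DIR/apcupsd.prom.$$" && mv "$DIR/apcupsd.prom.$$" "$DIR/apcupsd.prom"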

GHC 8.10.7

I plan on writing another blog post after the cardano-node is rebuilt with the new GHC and tested.
