The problem

A problem I discovered after compiling the cardano-node for the first time is that the node would simply …
Why is memory usage important?
Can’t we just put more memory in the node; isn’t memory cheap? The official Cardano node is written in Haskell. The Haskell runtime uses a garbage collector to reclaim memory that is no longer in use by the program. What is important to remember is that garbage collection takes time and, more importantly, that the whole program is paused while it runs. During garbage collection memory is also compacted, which requires copying memory from one location to another. We’re talking gigabytes here rather than megabytes.
Every slot, the node checks whether it is the slot leader and is allowed to mint a block. If the node is paused for a garbage collection at that moment, it can’t mint the block.
The GHC runtime has a lot of parameters you can use to configure the garbage collector. Which combination of parameters and values to use is difficult to say, as it depends on many factors: the memory available, CPU speed and type, how quickly memory can be copied, the role of the node, maybe even the OS you run on. Luckily the GHC runtime statistics are exposed as Prometheus metrics, so we can gain some insight.
These are the metrics to keep an eye on:
rts_gc_gc_cpu_ms: The CPU time in milliseconds spent in garbage collection; when multiple cores are performing garbage collection, this is the sum of the time spent on each core.
rts_gc_gc_wall_ms: The actual wall-clock time in milliseconds spent in garbage collection.
cardano_node_metrics_RTS_gcMinorNum_ints: The number of minor GC pauses.
cardano_node_metrics_RTS_gcMajorNum_int: The number of major GC pauses.
cardano_node_metrics_slotsMissedNum_int: The number of missed slot leader checks.
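As a quick sanity check, the GC time metrics can be combined with the pause counters to estimate the average pause length. A minimal sketch with made-up sample numbers (in practice the values come from the node’s metrics endpoint):

```shell
# Sample values; on a real node, scrape rts_gc_gc_wall_ms and the
# GC pause counters from the Prometheus endpoint instead.
gc_wall_ms=480000   # total wall-clock ms spent in GC (assumed)
total_gcs=1200      # minor + major GC pauses (assumed)

# Average GC pause in milliseconds:
awk -v w="$gc_wall_ms" -v n="$total_gcs" 'BEGIN { printf "%.0f ms\n", w / n }'
# prints: 400 ms
```

Comparing the same ratio for rts_gc_gc_cpu_ms against rts_gc_gc_wall_ms also gives a rough idea of how well the parallel GC is using its cores.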
After collecting metrics for a few weeks it was time to adjust the GHC runtime parameters. That helped to reduce the number of major GCs significantly. Some GCs still remained, but the information gained from the metrics gave some hints on where to look.
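For reference, this is the shape such tuning takes; the flags below are genuine GHC RTS options, but the values are purely illustrative and not a recommendation:

```shell
# Values are illustrative; tune them against your own metrics.
#   -N2    use two cores for the runtime (parallel GC)
#   -A64m  larger allocation area, so fewer minor GCs
#   -H4G   suggested initial heap size, delaying the first major GC
#   -I0    disable the idle-time major GC
cardano-node run ... +RTS -N2 -A64m -H4G -I0 -RTS
```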
The first thing I noticed was that after a node had run for about 12 hours, GC pauses would occur at regular intervals. This turned out to be the creation of ledger snapshots. By default a node saves a snapshot of the ledger state every 72 minutes, i.e. 20 times a day. These snapshots help a node start quickly by loading the ledger state from disk and reapplying only the transactions made after the snapshot was created. The ledger state is about 1GB on disk, stored in CBOR encoding. Saving a snapshot creates roughly 3 times its size in garbage.
The second memory-related issue I stumbled upon is the memory required to run
cardano-cli query stake-snapshot --stake-pool-id REDACTED --mainnet. This command requires about 1GB of memory inside the cardano-node. But the
cardano-cli process itself needs between 5 and 6GB.
The HiFive Unmatched has 16GB of RAM on the board, and with my GHC parameters the node uses roughly 10GB. With the cli using 6GB to create the stake snapshot and the cardano-node peaking at 11GB, the cardano-node ended up being OOM (Out of Memory) killed by the kernel.
It takes a lot more time to start a cardano-node that has not been shut down cleanly, and this will only get slower over time as the chain grows.
You can make the cardano-node process less likely to be killed first when memory runs out by running:
echo -15 > /proc/$(pidof cardano-node)/oom_adj
The kernel will then try to kill some other process first to free up memory, if it can. Note that on recent kernels oom_adj is deprecated in favour of oom_score_adj, which takes values from -1000 (never kill) to 1000.
We can limit the maximum heap for the cli to 6GB by adding some GHC RTS parameters like this:
cardano-cli +RTS -M6G -RTS query stake-snapshot ....
With -M set, the process exits with a heap-overflow error instead of growing without bound.
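The same limit can also be set through the GHCRTS environment variable, which the GHC runtime reads on startup; this should work here because the binary was evidently built with RTS options enabled, given that the +RTS form above is accepted:

```shell
# Equivalent to passing +RTS -M6G -RTS on the command line.
GHCRTS="-M6G" cardano-cli query stake-snapshot --stake-pool-id REDACTED --mainnet
```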
One thing I am curious about is the memory-copy performance of the SiFive Freedom U740 SoC in the HiFive Unmatched. Accessing memory that is aligned on word boundaries is usually much faster on this kind of architecture. I don’t know whether the GHC runtime takes this into account.
Something I want to dig into deeper.