MongoDB Feedback Portal

121 votes
Status: Submitted
Category: Atlas
Created by: Guest
Created on: Jul 22, 2020

Force restart of unhealthy node

Right now there is a "test failover" option, which shuts down the primary and forces an election. However, the option is only available when the cluster is in a healthy state. If, for whatever reason, the cluster is unhealthy, there is no way to manually restart the primary. It should be possible to force an election in an unhealthy state; often this is all that is required to get back to a healthy state (e.g., if the primary is stuck in a CPU-burning loop caused by an unexpected write pattern that has since stopped).
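For context, on a self-managed replica set this kind of recovery is a one-line operation from mongosh, and the request is essentially to expose an equivalent self-service action in Atlas even when the cluster is flagged unhealthy. A minimal sketch, assuming direct shell access to the affected primary (which Atlas does not provide in this situation):

    // Ask the current primary to relinquish its role and trigger an election.
    // The argument is how many seconds it will refuse to be re-elected.
    rs.stepDown(60);

    // If the mongod process itself is wedged (e.g. the CPU-burning loop
    // described above), cleanly shutting that member down also forces the
    // remaining electable members to hold an election.
    db.getSiblingDB("admin").shutdownServer();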
  • Guest
    Jul 10, 2025
    We've been waiting 18 hours for support (which we are paying good money for) to reboot our cluster after a large operation caused the dirty cache fill ratio to jump to over 20%, which completely CPU-locked a secondary node evicting pages (a way to check this ratio yourself is sketched at the end of this thread). We can't scale it up ourselves; we tried, and it is simply stuck and the operation failed. It's completely ridiculous that the best they can suggest for self-service is "Test Resilience", which causes a primary failover that their UI blocks you from triggering in many cases (such as the one we currently find ourselves in). I will never recommend Atlas again until this is resolved. This suggestion has been open for nearly 5 years, so I wouldn't hold my breath.
  • Guest
    Dec 27, 2024
    We have faced this problem many times on our production servers; there is no way to recover from a 100% CPU-burn situation unless someone from the support team restarts the server for us, causing almost 2-3 hours of downtime.
  • Guest
    Oct 25, 2023
    We also have this issue: if the primary gets loaded, auto-scaling doesn't have the capacity to respond, it just gets stuck, and you have to call support while your application is effectively down or struggling. We've simply scaled up our instances to overcome this, but it's a massive waste of resources to keep them scaled up so much just because scaling can't respond under load and isn't configurable. We've started to look at other DB solutions with more effective scaling. It's unbelievable that 'dark mode' for the UI is being worked on while critical scaling issues that cause outages are not.
  • Guest
    May 12, 2023
    We have faced this issue multiple times: when the primary got loaded and we tried to scale up the instance, it became unresponsive, and we had to get help from the support team, which again is a process-driven task. If we had the ability to restart the node ourselves, it would be much faster than what we are facing right now.
  • Guest
    Oct 29, 2020
    This is a badly needed feature. The only way to force an election is to contact MongoDB support and wait upwards of 2 hours for them to force a restart of the process on the unhealthy node. This has happened several times since we started using this service, and it's getting to the point where we may need to start looking at alternatives, because we might lose customers due to a lack of confidence in the system's availability.
  • Guest
    Oct 15, 2020
    This is definitely a useful feature that should be implemented. Having to wait for support to restart a faulty node increases MTTR, which matters when you are trying to avert a disaster, or at least mitigate it more quickly. Please consider implementing this.
  • 21 more comments
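Regarding the dirty cache fill ratio mentioned in the Jul 10, 2025 comment above: a rough way to watch that ratio yourself is to read the WiredTiger statistics from serverStatus() in mongosh. The statistic names below are the ones reported by recent MongoDB versions, and 20% is the usual default at which application threads start being drafted into eviction; treat this as an illustrative sketch, not official Atlas guidance.

    // Compute the WiredTiger dirty-cache fill ratio from serverStatus().
    const cache = db.serverStatus().wiredTiger.cache;
    const dirtyBytes = cache["tracked dirty bytes in the cache"];
    const maxBytes = cache["maximum bytes configured"];
    print(`dirty cache fill ratio: ${(100 * dirtyBytes / maxBytes).toFixed(1)}%`);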