Autoscaling improvements

Hi, we have tried the beta of the autoscaling feature and have some thoughts on how to make it better. In its current setup it's not really suitable for our production workloads. Here are our suggestions:

1. Define separate scaling steps
At the moment the scaling step is always 1 (e.g. M10 -> M20), which is not really suitable for burst loads, where going one step up might not be enough. The same goes for rapid scaling down. Example:
Scale range = M10 - M50
Scale step up 4 = (M10 -> M50)
Scale step down 2 = (M50 -> M30 -> M10)

2. Define a custom timescale
It seems that the current setup starts scaling down after 72h and then repeats every 24h. Our system can be scaled down much more rapidly: when our burst load goes away, it's done for a few days, so we know we can start scaling down after 12h and repeat every 6h. With the current setup it takes 6 days to scale down from M50 -> M10.

3. Define custom scaling metrics and thresholds
It seems that the current system does not use the number of connected clients as a scaling metric.
- When connecting from cloud functions, it's easy to have a lot of connections that don't use much CPU or memory; when we are scaled down to M10, the connection limit is only 200.
- Additionally, we would like to scale up when CPU usage is above 50%, which is not possible at the moment.

Nice to have:
4. Time-based scaling events
Scale up/down at a specified time and day. Useful for scaling up DEV/Research environments within working hours.

PS: writing a long post in this form is terrible.
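
The step and timing arithmetic in points 1 and 2 can be sketched as follows. This is a minimal illustration, not how Atlas actually scales: the tier ladder and the `next_tier` and `hours_to_scale_down` helpers are hypothetical, and the timing assumes the workload stays low for the whole scale-down period.

```python
# Hypothetical tier ladder matching the example scale range M10 - M50.
TIERS = ["M10", "M20", "M30", "M40", "M50"]


def next_tier(current: str, direction: str, step_up: int = 1, step_down: int = 1,
              tiers: list[str] = TIERS) -> str:
    """Move `step_up` or `step_down` tiers, clamped to the configured range."""
    i = tiers.index(current)
    if direction == "up":
        i = min(i + step_up, len(tiers) - 1)
    elif direction == "down":
        i = max(i - step_down, 0)
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    return tiers[i]


def hours_to_scale_down(start: str, target: str, step_down: int,
                        first_delay_h: int, repeat_h: int,
                        tiers: list[str] = TIERS) -> int:
    """Hours from the end of a burst until `target` is reached, assuming load stays low."""
    i, goal = tiers.index(start), tiers.index(target)
    steps = 0
    while i > goal:
        i = max(i - step_down, goal)  # never overshoot the target tier
        steps += 1
    # First step happens after `first_delay_h`, each further step `repeat_h` later.
    return 0 if steps == 0 else first_delay_h + (steps - 1) * repeat_h


print(next_tier("M10", "up", step_up=4))             # M50
print(next_tier("M50", "down", step_down=2))         # M30
print(hours_to_scale_down("M50", "M10", 1, 72, 24))  # 144 (6 days)
print(hours_to_scale_down("M50", "M10", 1, 12, 6))   # 30
```

With the proposed 12h/6h timing, the same one-step-at-a-time M50 -> M10 scale-down would take 30 hours instead of 144.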

Guest

Oct 25, 2023

We've found that when a sudden burst of activity takes Atlas to 100%, autoscaling fails, because the scaling operation itself relies on there being spare capacity. You then need to call MongoDB support and have an engineer intervene, which is exactly the situation autoscaling is meant to prevent. They need to change this architecture, and also make scaling more configurable so that you can take into account what you know about your workload.

Guest

Jun 21, 2023

"Define custom scaling metrics and thresholds", is critical for us to be able to handle unpredictable data growths. Having the capability to set storage threshold to a lower value than the current fixed 90% would save us from Atlas downtime caused by disks getting full.

Guest

Mar 21, 2023

Any update on custom and time-based auto-scaling? Implementing these features would move my team from AWS to Atlas.

Guest

Feb 28, 2023

Transparency as a feature: as a Mongo Atlas user for years, I believe it's important for the activity feed to include the exact reason/criteria/trigger that caused the cluster to be upgraded or downgraded. Without it, in our experience, it can be difficult, if not impossible, to understand why the cluster was upgraded or downgraded, and our hands are tied when it comes to making the changes needed to use our database infrastructure efficiently.

Guest

Feb 25, 2023

+1 for "Time based scaling events" as it will be useful for DEV environments
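
A time-based schedule of this kind could be expressed as a simple rule that maps the current day and hour to a tier. The sketch below is hypothetical: `scheduled_tier`, the M30/M10 tiers and the 08:00-18:00 weekday window are illustrative assumptions, not an existing Atlas setting.

```python
from datetime import datetime


def scheduled_tier(now: datetime, work_tier: str = "M30", off_tier: str = "M10",
                   start_hour: int = 8, end_hour: int = 18) -> str:
    """Hypothetical rule: larger tier on weekdays during working hours, smaller otherwise."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    in_hours = start_hour <= now.hour < end_hour
    return work_tier if (is_weekday and in_hours) else off_tier


print(scheduled_tier(datetime(2023, 2, 27, 10)))  # Monday 10:00 -> M30
print(scheduled_tier(datetime(2023, 2, 25, 10)))  # Saturday 10:00 -> M10
```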

Guest

Aug 15, 2022

It's a much-needed feature.

Guest

Jan 13, 2022

Is there any timeline to implement such functionality? It is a very important feature for us, as the current auto-scale functionality is not answering our needs.

Guest

Jan 7, 2022

It's in the top 5 and has been open for 2+ years: any update/ETA on this? We definitely need some way of configuring auto-scaling conditions and windows:
- Atlas is billed hourly
- most use cases would need to scale up fast during peak hours, and scale down 1-2 hours after
These conditions mean that if we have 1 hour of very high traffic per day (~30 hours per month), we'd have to pay for the higher tier for the full month (~720 hours), or 24 times more.
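
The arithmetic above can be checked with a quick sketch. It assumes the higher tier is billed per hour and ignores the time needed to scale up and back down; `peak_tier_hours` is a hypothetical helper.

```python
def peak_tier_hours(peak_hours_per_day: float, days: int = 30) -> tuple[float, float]:
    """Hours billed at the higher tier: (scaled up only for the peak, scaled up all month)."""
    only_peak = peak_hours_per_day * days
    all_month = 24 * days
    return only_peak, all_month


only_peak, all_month = peak_tier_hours(1)
print(only_peak, all_month, all_month / only_peak)  # 30 720 24.0
```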

Guest

Jun 9, 2021

A must-have feature for real-time applications.

Guest

May 3, 2021

Hi Jonathan, the way Atlas cluster tier auto-scaling works is that you select the maximum tier you're willing to be scaled up to. In other words, what you're looking for is already there today. Cheers, -Andrew

Guest

Apr 30, 2021

We'd use more cluster tier autoscaling if it could be turned on with a maximum permitted tier. (upper limit to prevent runaway costs beyond some acceptable X.)

Guest

Feb 26, 2021

This is a really good suggestion. Scaling up and down based on custom rules and times would be a huge improvement. For example, one hour to scale up is rather long, and for us scaling is usually driven by IO load rather than CPU. For now, cluster scaling seems to be targeted more at workloads with varying CPU loads, but there is definitely also a need for IO-based scaling. For example, while IO load is low, M20 is sufficient; when it increases, we need M30 with provisioned IOPS at different levels. That is something we would like to automate instead of doing manually.

Guest

Jan 13, 2021

AWS now also offers IOPS scaling. Those interested in this feature should also vote for that one: https://feedback.mongodb.com/forums/924145-atlas/suggestions/42288652-aws-ebs-gp3-volumes

Guest

Sep 16, 2020

THIS!

Guest

Feb 25, 2020

Our workload is highly predictable. We serve K-12 students. For 8 hours a day, Monday to Friday, we have very heavy loads; evenings, weekends, holidays and summers we have nothing. I'd like to +1 the time-based scaling... but only as a substitute for better granularity on performance metrics. It would be better to trigger scale-up/scale-down on IOPS or ops. Ops is the better metric because it does not change when the scale changes (whereas read IOPS can drop precipitously after a scale-up). For instance, scale up when ops hit 500, 1000, 2000. To scale down, you could specify these metrics as pairs:
500, 100 => scale up when hitting 500, down when falling back to 100.
1000, 500 => scale up when hitting 1000, down when falling back to 500.
2000, 1000 => scale up when hitting 2000, down when falling back to 1000.
Or just take single points (100, 500, 1000, 2000) and infer each scale-down point from the previous scale-up point.
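
The paired thresholds described here amount to hysteresis: each tier boundary has a higher scale-up point and a lower scale-down point, so the cluster does not flap when ops hover around a single value. A minimal sketch, assuming ops/sec as the metric; `THRESHOLDS`, `pairs_from_points` and `next_level` are hypothetical names, and a "level" is simply an index into whatever tier ladder is configured.

```python
# (scale_up_at, scale_down_at) pairs from the comment, one per tier boundary (ops/sec).
THRESHOLDS = [(500, 100), (1000, 500), (2000, 1000)]


def pairs_from_points(points: list[int]) -> list[tuple[int, int]]:
    """Single-point variant: each scale-up point scales down at the previous point."""
    return list(zip(points[1:], points[:-1]))


def next_level(level: int, ops: float,
               thresholds: list[tuple[int, int]] = THRESHOLDS) -> int:
    """Return the new level (0 = base tier) for the current ops/sec reading."""
    # Climb while ops reaches the scale-up point of the next boundary.
    while level < len(thresholds) and ops >= thresholds[level][0]:
        level += 1
    # Descend while ops has fallen back to the scale-down point of the boundary below.
    while level > 0 and ops <= thresholds[level - 1][1]:
        level -= 1
    return level


print(pairs_from_points([100, 500, 1000, 2000]))  # same pairs as THRESHOLDS
level = 0
for ops in [600, 300, 100, 2500, 800]:
    level = next_level(level, ops)
    print(ops, level)  # 600 -> 1, 300 -> 1 (held), 100 -> 0, 2500 -> 3, 800 -> 2
```

The reading of 300 after 600 shows the point of the pairs: the cluster stays at level 1 until ops fall all the way back to 100.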

Guest

Oct 18, 2019

Hi Rez and Andrew, I have sent you an email, looking forward to speaking with you.

Guest

Oct 17, 2019

@Marek - Thanks a lot for the feedback! Would love to chat with you more about this in person. Do you mind shooting me an email at rez@mongodb.com ?

Guest

Oct 17, 2019

Hi Marek, Thank you so much for this detailed suggestion. We will likely reach out to you to get a chance to speak to you in more detail. You've got great ideas here--I think you can tell that our initial auto-scaling capability is definitely very conservative and you bring up great examples of use cases that we need to better address in the future. -Andrew
