What is my next best option if I currently have one RDS instance, db.t3.medium running Aurora MySQL?
I currently have one RDS instance, db.t3.medium running Aurora MySQL.
Due to increased workload (expected), the CPU is, on average, running just over its baseline and therefore in about 10 days will run out of CPU credits and then start getting expensive. The 15-minute average CPU Utilisation is around 22% (baseline is 20%).
RDS doesn't support any larger db.t3. versions for Aurora MySQL so I either need to change instance class, or move to Aurora Serverless, or look at a second instance alongside. I'm not sure what's going to be the best move so I'm keen to get some advice.
Second Instance
Up until now I've avoided running a second db instance because:
I don't fully understand how to set it up correctly to share the workload (adding different endpoints for read vs write in my applications would be tricky)
I'm not familiar if this would actually share the load like an EC2 load balancer does
most of my MySQL CPU usage is from reading (relatively little write), so I don't predict splitting it to a reader and a writer helping too much (the reader will probably still run high). But my understanding around how this all works is patchy.
I'm aware of the pitfalls of a single instance around availability/failover etc so it would be nice to mitigate this as a bonus from this upgrade.
Switch Instance Type
If I switch instance types, a db.r5.large is twice the cost of a db.t3.medium but still only has 2 x vCPU. It has a lot more Memory (16 v 4) but my issue is CPU.
I'm not sure how the ECU (the actual relative measure of processing power) compares on db.r5.large vs db.t3.medium as it's documented as 'variable', so not sure if I'm comparing apples with oranges between the r5 and the t3.
db.r5.large ECU = 10
db.t3.medium ECU = Variable
The db.r5.xlarge has 4 vCPU but would quadruple my RDS costs.
Sidenote: I have 9 months left on a t3.medium reserved instance, so would lose that if I do anything other than setup a second t3 instance.
Serverless
Based on the fairly vague documentation around serverless resources (1 ACU has approximately 2 GB of memory with corresponding CPU and networking, similar to what is used in Aurora user-provisioned instances.), I would guesstimate 2 ACUs is the equivalent of a db.t3.medium, so 4 ACUs would give me double the CPU as my current setup, for roughly the same cost as a db.r5.large.
Serverless would throw in the added benefit of built-in availability and replication. The RDS load is fairly steady, rather than spiky, so I wouldn't benefit from any time the Serverless scales down to 1 ACU for example.
What's best?
What would you do, and why?!
Rate Per Second = RPS
Suggestions to consider for your t3 medium AWS Aurora instance in your Parameter Group
read_rnd_buffer_size=131072 # from 524288 to reduce handler_read_rnd_next RPS of 98,949
read_buffer_size=512288 # from 262144 to reduce handler_read_next RPS of 1788
log_output=TABLE,FILE # from TABLE only to have useful visible Slow and General logs
thread_cache_size=32 # from 2 - not sure how you are getting by with only 2
query_cache_type=OFF # to eliminate CPU cycles used for QC mgmt 24 times a SECOND
query_cache_size=0 # from ~ 84M to no RAM for Query Cache
For additional suggestions, view profile, Network profile for contact info and free downloadable Utility Scripts to assist with performance tuning. Many more Global Variables should be adjusted. This is only a start. Another observation, what possible reason do you have 1 million handler_rollback events a day, average?