Systems today always run in partial failure mode: some part is always broken,
but most of it works.
Good Multi-AZ design means having enough resources in each AZ to handle the
additional failover traffic that gets rerouted from a failing AZ.
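To put a number on that headroom, here is a back-of-the-envelope sketch. The function name and the example figures are my own assumptions, not from the talk; it only assumes that load is spread evenly across AZs and that the survivors must cover peak on their own.

```python
import math


def instances_per_az(peak_instances: int, az_count: int, az_failures_tolerated: int = 1) -> int:
    """Instances to run in EACH AZ so the surviving AZs still cover peak load.

    peak_instances: instances needed in total to serve peak traffic (hypothetical input).
    """
    surviving = az_count - az_failures_tolerated
    if surviving < 1:
        raise ValueError("need at least one surviving AZ")
    # Each surviving AZ takes an equal share of the full peak load.
    return math.ceil(peak_instances / surviving)


if __name__ == "__main__":
    per_az = instances_per_az(peak_instances=12, az_count=3)
    print(per_az, per_az * 3)  # per-AZ count and total fleet size
```

With 12 instances needed at peak and 3 AZs, that works out to 6 per AZ, so 18 in total: you pay for 50% extra capacity to survive the loss of one AZ.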
Now he talks about how to decouple APIs from the underlying systems.
This is something where I still see a lot of resistance, because many engineers
have limited experience with these designs and are used to interacting with
10-to-20-year-old synchronous REST APIs.
The more important and popular a service is in your system, the more you need
to look into caching it. Because if the backend for it goes down, you don't want
to have your users hitting your failing services over and over.
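A minimal sketch of that idea, as I understand it: a cache in front of the backend that keeps serving the last known value while the backend is down. The class name, the `fetch` callable and the TTL are all hypothetical, and a plain dict stands in for a real cache.

```python
import time
from typing import Callable, Dict, Tuple


class StaleOnErrorCache:
    """Cache that serves the last known value when the backend call fails."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical backend call
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (stored_at, value)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit, backend is not touched
        try:
            value = self._fetch(key)
        except Exception:
            if entry is None:
                raise  # nothing cached, so the failure has to surface
            # Backend is failing: re-stamp the stale value so it is only
            # retried after another TTL instead of on every request.
            self._entries[key] = (now, entry[1])
            return entry[1]
        self._entries[key] = (now, value)
        return value
```

The re-stamping on failure is the part that keeps users from hammering a backend that is already down; the price is that they may see stale data for a while.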
If you add data to a database that is going to be mutated in the next 20-30
seconds, it does NOT belong in a database. Put it in a memcache and persist it
once in a while.
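Roughly what that could look like, as a sketch under my own assumptions: a dict stands in for the memcache, `persist` is a hypothetical callable that writes a batch to the database, and the flush interval is made up.

```python
import time
from typing import Callable, Dict, Optional, Set


class WriteBehindBuffer:
    """Keeps fast-changing values in memory and persists them periodically."""

    def __init__(self, persist: Callable[[Dict[str, int]], None], flush_interval: float,
                 clock: Callable[[], float] = time.monotonic):
        self._persist = persist            # hypothetical batch write to the database
        self._interval = flush_interval
        self._clock = clock
        self._values: Dict[str, int] = {}  # stand-in for the memcache
        self._dirty: Set[str] = set()      # keys changed since the last flush
        self._last_flush = clock()

    def set(self, key: str, value: int) -> None:
        self._values[key] = value
        self._dirty.add(key)
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def get(self, key: str) -> Optional[int]:
        return self._values.get(key)

    def flush(self) -> None:
        """Write only the latest value of each changed key, then reset the timer."""
        if self._dirty:
            self._persist({k: self._values[k] for k in self._dirty})
            self._dirty.clear()
        self._last_flush = self._clock()
```

However many times a key changes between flushes, the database sees one write for it. The trade-off is that anything not yet flushed is lost if the process dies.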