Monzo service failure explanation

It’s probably best I say nothing 🤷😂

https://community.monzo.com/t/we-had-issues-with-monzo-on-29th-july-heres-what-happened-and-what-we-did-to-fix-it/75903

3 Likes

They’re not having much luck with outages/issues at the moment :see_no_evil:

I did laugh at that explanation. I figured out the problem after the first few lines - it was so obvious that the new servers going live was the cause - and I like the fact that they totally discounted that as the problem straight away. Most people work on the basis of: we have just done something new, and now we have problems, so the new thing is the problem. Whereas that explanation was: yes, we did something new, but it can't be that.

4 Likes

Yes, this was what really struck me. What a weird approach.

In every company I’ve worked for the approach has always been to immediately roll back any changes if issues occur right after deployment.

On a different matter: anyone else not seeing images in that post?

2 Likes

In every company I’ve worked for the approach has always been to carry out changes to core systems out of hours to minimise business impact. Who on earth approves this kind of change for a Monday lunchtime…:rofl:

2 Likes

A DBA massively fucked up

Time for everyone to apologise to us lowly software engineers :joy:

Nope. The engineers didn’t read the documentation. Can’t blame the DBA for that.

Was the DBA on holiday?

1 Like

When I want to deploy a new server, I specifically have to explain what my rollback plan is if things go wrong.

That said, I think they were pretty unlucky. Sure, they didn’t properly understand the intended behaviour of the database they are using - but I’ve seen worse blunders, just thankfully on a much, much smaller scale.
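
For what it's worth, the kind of guard I mean is something like this (a rough Python sketch, nothing to do with Monzo's actual tooling - the metric source, threshold and rollback hook are all placeholders I've made up):

```python
# Rough sketch of a post-deploy guard: if error rates climb right after a
# rollout, assume the rollout is the cause, roll back, and investigate later.
import time

ERROR_RATE_THRESHOLD = 0.02   # assumed acceptable error rate (2%)
CHECK_WINDOW_SECONDS = 300    # watch the first five minutes after deploy
CHECK_INTERVAL_SECONDS = 30


def current_error_rate() -> float:
    """Placeholder: fetch the service's current error rate from your metrics system."""
    raise NotImplementedError


def roll_back() -> None:
    """Placeholder: redeploy the previous known-good version."""
    raise NotImplementedError


def watch_deployment() -> bool:
    """Return True if the deploy looks healthy; otherwise roll back and return False."""
    deadline = time.time() + CHECK_WINDOW_SECONDS
    while time.time() < deadline:
        if current_error_rate() > ERROR_RATE_THRESHOLD:
            # Don't debate whether the change "can't be the problem" -
            # revert first, then work out what actually happened.
            roll_back()
            return False
        time.sleep(CHECK_INTERVAL_SECONDS)
    return True
```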

3 Likes

Fair enough. I misread DBA :unamused:

Is it me or do Monzo seem to employ well-trained monkeys, or people straight from the local FE college, as their devs and engineers? They seem to have never heard of test, test, test and then test again.

New servers during working hours, a half-baked app that seems to be constantly riddled with bugs. What next?

2 Likes

I can see why they would have thought that simply adding a new node to a cluster would have had virtually zero effect on their systems.

Would have liked to have seen such an action run against a test environment before they did it in anger, though.

1 Like

Theory and practice are two different things. Everyone should know that.

I agree - their logic makes sense as far as deploying the new servers and testing etc. Where they've fallen down is the lack of an immediate roll-back plan, and in making assumptions that ruled the changes they'd just made out of their troubleshooting - not actually confirming that what they thought was happening was actually happening.

They did though

Do they need a better test environment for that particular factor? Yes.

Did they not test this? No.

Not directed at you, Liam, but there is some amazing expertise after the event on this thread, seemingly going to waste rather than scooping up the big bucks on offer at Monzo for those in the know.

Maybe we could all do that. I fancy a crack at running their marketing; at least things would get a little less ugly. I also have experience of creating a company which makes a significant loss every year :rofl: although, unlike Monzo, at least that one is nearing profitability, and the others are wildly profitable.

That will be my answer to “Why do you think we should hire you?” “Because, unlike the people who currently work here, I don’t know how to waste vast amounts of other people’s money, and I can actually make a company profitable, which is quite important in business, or so I understand…”

2 Likes

What people have said so far is don’t make changes to core systems during peak hours and always assume (or at least don’t discount) that the change you have literally just made is the cause of a subsequent issue.

You don’t need hindsight for this, it is standard stuff. If Monzo don’t know this, or think they’re above it, then that’s a cultural problem and these things will keep happening…

5 Likes

Which is why most banks make their changes at 2am on a Sunday morning!

5 Likes

Yup, and undoubtedly exactly the reason why there is currently a planned Starling maintenance period commencing at 03.15 and due to complete by 06.15 overnight tonight.

I’ve just read Dan Mullen’s comment on the other side:

It absolutely isn’t. On occasion, when making big changes, you need staff in overnight.

I’m now just waiting for the Monzo apologists to flag him and tell him he’s talking utter horse crap, which of course he’s not, but someone will be thinking he is.

2 Likes

I’d actually prefer they always kept their systems up.

People travel, you know - 2AM here on a Sunday morning is 10AM in Korea or Japan (two very tourist-heavy destinations), as well as late evening in some parts of the States.

It's really disruptive when you're in one of those places, and considering it's entirely avoidable, I don't see why it still happens.

Your card should never be down unless MasterCard and Visa go down in general.