Deployments stuck on Waiting

Hey Cleavr Team!

I had a client contact me saying their website isn’t updating anymore. I’ve taken a look, and it seems the build times took a massive dive before freezing completely.

This is Astro being built, and I know there have been issues with Astro in the past. I was wondering if you have any ideas about what may be causing this?

Thanks,
Tom

Hello @Tom,

Sorry about the trouble it has caused.

We’ll look into the issue and get back to you.

Hello @Tom,

Can you please use the Cancel Deployments option and re-deploy, or use Force Deploy on the most recent deployment?

Do let us know if that works for you or not.

Thanks @anish,

My concern is what happens if this occurs again while I’m on holiday, for example. They have only had the CMS for a short time and this problem has cropped up almost immediately. Just looking for some reassurance really.

Thanks,
Tom

Hello @Tom,

That’s a very valid point, and thank you for bringing it up. We’ll discuss it with the team and come up with a solution as soon as possible.

Thanks @anish for understanding.

What shall I do in the meantime? I can create a new environment and leave this one alone so you can test it. Or will the queue also stop other sites from deploying?

Thanks,
Tom

Hello @Tom,

In the meantime, you can continue the deployments with the options mentioned above in the same environment.

I will only be able to answer this once we figure out what’s causing the deployments to be stuck in the waiting state for so long.

Okay thanks @anish,

To confirm, you don’t need to use my environment to test what has caused it to freeze? I just wouldn’t want to clear all the deployments and not give you a chance to test it.

Thanks,
Tom

Thanks a lot for that, @Tom. I really appreciate it.

You can just leave the old deployments as they are (for us to test) and continue with the new ones.

Thanks @anish,

I have added a new site and created a new deployment. I’ve left the old one as it is for you to test. If you can see my account it’s “PoliNations Static (Froozen)”.

I look forward to hearing what the issue may have been. Keep up the great work, Cleavr is an amazing piece of software!

Thanks,
Tom

Update:

The new app is also now stuck on waiting. Is there an issue with deployment hooks?

Hello @Tom,

I can see that the first deployment completed. Do you remember which hook you enabled before the deployments started getting stuck, or whether anything else changed just before that?

Hi @anish,

I think I understand the problem. I had the CMS fire the “deployment trigger hook” on any update. The issue is that when there were repeaters, it fired an update for each repeater item, so in this instance the “deployment trigger hook” was fired 3 times at the same time. This caused a complete freeze.

This also means the issue could occur if multiple people are editing in the CMS at the same time. I wonder whether, when there is already a “Waiting” or “In Progress” deployment, a new deployment could cancel it and start afresh. That way the queue doesn’t get flooded and build up a backlog of deployments that all effectively do the same thing.
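To make the idea concrete, here is a rough sketch in Python of a queue that behaves this way. It is only an illustration of the suggestion, not how Cleavr works internally: the `Deployment` and `CoalescingQueue` names and the status strings are made up for the example. It also takes the cautious variant of letting an in-progress deployment finish and only replacing the one that is still waiting.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Deployment:
    # Hypothetical record; statuses: waiting, in_progress, completed, cancelled
    id: int
    status: str = "waiting"


@dataclass
class CoalescingQueue:
    """Keeps at most one running and one waiting deployment for a site."""
    running: Optional[Deployment] = None
    waiting: Optional[Deployment] = None
    history: List[Deployment] = field(default_factory=list)
    _next_id: int = 1

    def trigger(self) -> Deployment:
        """Called every time the deployment trigger hook fires."""
        new = Deployment(id=self._next_id)
        self._next_id += 1
        if self.waiting is not None:
            # A newer request supersedes the one that is still waiting.
            self.waiting.status = "cancelled"
            self.history.append(self.waiting)
        self.waiting = new
        self._start_next()
        return new

    def finish_running(self) -> None:
        """Called when the running deployment ends; starts the waiting one."""
        if self.running is not None:
            self.running.status = "completed"
            self.history.append(self.running)
            self.running = None
        self._start_next()

    def _start_next(self) -> None:
        # Promote the waiting deployment only when nothing is running.
        if self.running is None and self.waiting is not None:
            self.running, self.waiting = self.waiting, None
            self.running.status = "in_progress"


if __name__ == "__main__":
    queue = CoalescingQueue()
    # Three hook calls arrive together, as in the repeater case.
    first, second, third = queue.trigger(), queue.trigger(), queue.trigger()
    print(first.status, second.status, third.status)
```

With three simultaneous triggers, the first one runs, the second is cancelled, and the third waits its turn, so the site is built at most twice instead of three times.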

I’m unsure what causes it to get stuck in “Waiting”, though; you would expect it to make its way through the queue of deployments. There must be some sort of “rush” happening that causes the system to panic and shut down its process?

Hope any of this is helpful,
Tom

EDIT:
This is still an issue even with just two firing at the same time. It is still happening now, even after I’ve limited the number of calls to the deployment trigger hook.

Hello @Tom,

We’re investigating the issue right now. We’ll try to find out why deployments are stuck in waiting and get it fixed as a first step so that you’ll be unblocked for now.

We apologize for the inconvenience caused.

We have found one path which could lead to this, but only in a rare case - the health checks we perform after each deployment could be waiting to complete and blocking the next deployment. We have already fixed the bug and the fix is currently in QA.
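For anyone wondering how a post-deployment health check can hold up a queue, here is a generic Python sketch of the usual guard: run the check with a timeout so that a hung check is reported instead of waited on forever. This is not Cleavr's code or the actual fix; `run_health_check` and the result strings are hypothetical names used only for illustration.

```python
import threading
from typing import Callable


def run_health_check(check: Callable[[], bool], timeout_s: float) -> str:
    """Run `check` in a worker thread and give up after `timeout_s` seconds."""
    outcome = {"ok": False}

    def worker() -> None:
        try:
            outcome["ok"] = bool(check())
        except Exception:
            # A check that crashes is treated as unhealthy.
            outcome["ok"] = False

    # Daemon thread: a check that never returns cannot keep the process alive.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # Still hanging: report it so the next deployment is not blocked.
        return "timed_out"
    return "healthy" if outcome["ok"] else "unhealthy"
```

The caller can then mark the deployment according to the returned string and move on to the next item in the queue, whatever the check did.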

We are also thinking of a way to not fire deployments consecutively like what happened to you. If you have any suggestions on how to better handle it, we would love to hear them.

Sorry about the issue and thank you for your support and patience.

Thanks,

We are actually still having the issue and it’s delaying the website launch. At the moment I have pressure on me to get this fixed.

I’ve upgraded the server to 2 vCPU hoping this would help, but it doesn’t seem to have had any effect. It happens whenever two “deployment trigger hook” calls are fired at the same time.

At the moment, any time they make CMS changes I have to deploy manually. I can’t babysit the server like this.

EDIT:
“We are also thinking of a way to not fire deployments consecutively like what happened to you. If you have any suggestions on how to better handle it, we would love to hear them.”

I think it would be fine to cancel any duplicated requests at the same time and only action the most recent. I can’t think of any example in which this would cause issues but it would prevent a rush from the CMS.

The CMS is Directus. It has webhooks, but repeaters and relationship data can trigger several of them at once. For example, you have a list of “events” and also a “featured events” relationship on the homepage. Adding new events or re-ordering the events on the homepage will trigger an update for both “homepage” and “events”. This might be something I can fix on the Directus side, but I’m unsure. It also means deployments could freeze if they update content quickly, which worries me.
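As a rough idea of what “only action the most recent” could look like, here is a small debounce sketch in Python. It is an assumption-laden illustration, not a Directus or Cleavr feature: `DeployDebouncer` is a made-up name, and it assumes something (for example a small relay between the CMS webhook and the deployment trigger hook) can call `notify` on each webhook and poll `should_deploy` periodically. The caller passes in the current time, e.g. from `time.monotonic()`.

```python
from typing import Optional


class DeployDebouncer:
    """Collapses a burst of webhook calls into a single deployment trigger."""

    def __init__(self, quiet_seconds: float) -> None:
        self.quiet_seconds = quiet_seconds
        self._last_event_at: Optional[float] = None

    def notify(self, now: float) -> None:
        """Record a webhook call; each call restarts the quiet period."""
        self._last_event_at = now

    def should_deploy(self, now: float) -> bool:
        """Return True once, after the quiet period passes with no new call."""
        if self._last_event_at is None:
            return False
        if now - self._last_event_at < self.quiet_seconds:
            return False
        self._last_event_at = None  # reset so the burst fires only once
        return True


if __name__ == "__main__":
    debouncer = DeployDebouncer(quiet_seconds=10)
    debouncer.notify(now=0.0)  # "homepage" updated
    debouncer.notify(now=0.2)  # "events" updated by the same save
    print(debouncer.should_deploy(now=5.0))   # still inside the quiet period
    print(debouncer.should_deploy(now=15.0))  # quiet period over
```

The trade-off is a short delay before every deployment (the quiet period), in exchange for one build per burst of edits instead of one per webhook.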

Hello @Tom,

We’ve made some fixes but found that they wouldn’t resolve your issue. We’re working on another fix that we think will.

We’re really sorry for all the inconvenience it has caused.

Thank you so much Anish!

Just want to say that Cleavr is an amazing bit of software and I completely believe in it. The support is excellent and you guys are doing a great job. I really appreciate you looking into this.