Outage: Master LDAP (resolved)

Mon 20th May 15:43 - 19:55

Services Affected: All services

Description: The Master LDAP test and the DNS servers are both reporting DOWN. Our operations team is working on it.

UPDATE: LDAP and services are now restored. We apologize for the inconvenience.

UPDATE Post-mortem 5/23/2013:

Outage Duration - 15:28 UTC - 15:35 UTC, 16:11 UTC - 18:08 UTC, 18:50 UTC - 19:09 UTC [ Total 2 hrs 23 min ]
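
For anyone who wants to tally the downtime themselves, here is a minimal Python sketch that sums the three windows listed above. The names `WINDOWS` and `minutes_between` are illustrative only, and the sketch assumes every timestamp falls on the same UTC day.

```python
from datetime import datetime

# Outage windows (UTC) exactly as listed in the post-mortem.
WINDOWS = [("15:28", "15:35"), ("16:11", "18:08"), ("18:50", "19:09")]


def minutes_between(start, end):
    """Whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


per_window = [minutes_between(s, e) for s, e in WINDOWS]
total_minutes = sum(per_window)
hours, minutes = divmod(total_minutes, 60)

print("Minutes per window:", per_window)
print(f"Total: {total_minutes} min ({hours} h {minutes} min)")
```

As listed, the windows come to 7, 117 and 19 minutes, or 143 minutes in all.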
 
Problem statement - Access to most CloudForge services was affected because the LDAP server was unavailable during three separate periods between 15:28 and 19:09 UTC. The initial outage, caused by a chassis failure in our datacenter, had a ripple effect on our LDAP server, which accumulated too many connections and had to be rebooted. Our backup server, due to a misconfiguration, mirrored the problem seen on the primary server. Recovery took longer than expected because the database needed cleanup.
 
Remediation - CollabNet has corrected the configuration to allow additional connections and avoid this problem in the future. Additionally, CollabNet will work with the hosting service provider to reconfigure the server architecture, with the aim of providing better failover performance.
 
Regards,
Mohan Achar

50 Comments

  • 0
    Avatar
    Shannon Massman

    Your "99.9% uptime" is ticking. You will soon consume the entire year's commit.

  • 0
    Avatar
    Richard Cook

    or your .1% downtime is ticking, glass half full?

     

  • 0
    Avatar
    Edward St. Lawrence

    In one hour I switch to github.

  • 0
    Avatar
    Shannon Massman

    My mistake, .1% was fully consumed the first week of the year: https://help.cloudforge.com/entries/23489316-Outage-production-database-db01-is-now-resolved

  • 0
    Avatar
    Richard Cook

    i should have said your .1% downtime is tocking ... if uptime ticks, downtime tocks?

    anyway, tick tock ...

     

  • 0
    Avatar
    Akshay Jain

    I urgently need to sync with SVN. What is the expected time to fix this issue?

  • 0
    Avatar
    Brian Comer

    Have a build I need to sort in the next hour to ship out heh. Would also be interested in how progress goes :)

  • 0
    Avatar
    Richard Cook

    when they don't update progress stats that's a pretty good indication of how progress goes ... it doesn't ...

  • 0
    Avatar
    Jonathan Peterson

    Oh give 'em a break. They're working on it and when they're done, they'll let us know.

    Haven't you ever been there? I know I have.

  • 0
    Avatar
    Shannon Massman

    Yes, we've been there; at an organization that doesn't have the availability they advertise, and those companies deservedly lose business. We aren't hacking on individuals, rather the organization that made claims that aren't met. We aren't saying people aren't doing the best with the resources they have currently, we are saying that after the organization already surpassed their downtime commitment in the first week of the year, they should have reset expectations or found other means to improve uptime.

  • 0
    Avatar
    Shannon Massman

    Also, there's no excuse for not providing status updates when their core service is out. That's something that any person in the whole organization can do. Get the freaking owner on the web to come post here every half hour until it's fixed. Do it now.

  • 0
    Avatar
    Dmytro BAZULIN

    0.1% is about 45min, so everybody should be requesting 7 free days because of this. I doubt they will drop below 97% though, it's about a day-long outage. But at that point I will not be looking for 2 free weeks of service, I will be switching.

     

  • 0
    Avatar
    Gregg Cirielli

    Was up for 5 mins for me, then down again.

  • 0
    Avatar
    Jesse Yowell

    Sorry all, obviously this has been a huge pain for everyone. We didn't expect the LDAP slave to fail when our main LDAP server was unreachable. Shannon is correct; there is no excuse. Unfortunately, a lot of the time was waiting for engineers to resolve this, and by the time I sent out an ETA update to the support staff the problem was solved roughly 15 minutes later. We apologize that it took so long to resolve something that could have been easily avoided.

  • 0
    Avatar
    Edward St. Lawrence

    The update says all services are available- Git is still down for me. Is this working for other people yet?

  • 0
    Avatar
    Jesse Yowell

    Edward,

    Strange.. can you try again real quick? I'm hoping there just needed to be a services resync

  • 0
    Avatar
    Gregg Cirielli

    Still down for me, and my company.

  • 0
    Avatar
    Akshay Jain

    Same here. SVN is still not working for me.

  • 0
    Avatar
    Jesse Yowell

    Looks like I spoke too soon.. once was up, now is down. Sorry all.. I've alerted the engineers.

  • 0
    Avatar
    Anton Matosov

    Git is down for me, too!

  • 0
    Avatar
    Edward St. Lawrence

    Just tried again, pull/push hangs. My build server isn't able to connect either (see attachment), and even the CloudForge web interface (ViewVC) hangs.

    gitnowork.PNG
  • 0
    Avatar
    Paul Bennett

    SVN is still not responding for me

  • 0
    Avatar
    Travis Romney

    SVN is still broken for me too

  • 0
    Avatar
    Jesse Yowell

    We've updated the status once again..I'm sorry to have misled. Apparently it was only up and healthy for a good 5-10 minutes. Our engineers are investigating as we type this.

    Jesse Yowell

    Support Engineer

  • 0
    Avatar
    Andy Hsiung

    Svn works for me from the command line. However, if I want to browse the repository, it won't connect.

  • 0
    Avatar
    QMX Development

    Unfortunately, this is not the first time. I have been a customer since the "Dude" days and now I am looking elsewhere.

  • 0
    Avatar
    Rafal Luberda

    OMG, I just signed up and they have an LDAP crash.... FAIL OF THE DAY

  • 0
    Avatar
    Gregg Cirielli

    Just saw some signs of life again....  time will tell if steady or not.

  • 0
    Avatar
    Jesse Yowell

    Engineering said Main LDAP was brought back up. We're hoping it stays steady.. sorry for the eventful morning, everyone

  • 0
    Avatar
    Patrick Oleary

    Jesse it's still not working - svn is out for the count 

Article is closed for comments.