Mon 20th May 15:43 - 19:55
Services Affected: All services
Description: The Master LDAP Test and the DNS server status checks report DOWN; our operations team is working on it.
UPDATE: LDAP and all affected services are now restored. We apologize for the inconvenience.
UPDATE: Post-mortem 5/23/2013:
Outage Duration - 15:28-15:35 UTC, 16:11-18:08 UTC, 18:50-19:09 UTC [Total 2 h 23 min (143 min)]
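The total outage time is the sum of the three windows listed above: 7 + 117 + 19 = 143 minutes. The short Python sketch below computes that sum so it can be checked. It assumes all three windows fall on the same day and that each timestamp is UTC in HH:MM form. `WINDOWS` and `total_minutes` are names chosen only for this illustration.

```python
from datetime import datetime

# Outage windows (UTC) as listed in the post-mortem; assumed to fall on the same day.
WINDOWS = [("15:28", "15:35"), ("16:11", "18:08"), ("18:50", "19:09")]

def total_minutes(windows):
    """Sum the length of each (start, end) window, in whole minutes."""
    total = 0
    for start, end in windows:
        t0 = datetime.strptime(start, "%H:%M")
        t1 = datetime.strptime(end, "%H:%M")
        total += int((t1 - t0).total_seconds() // 60)
    return total

minutes = total_minutes(WINDOWS)
hours, mins = divmod(minutes, 60)
print(f"{minutes} min = {hours} h {mins} min")  # 143 min = 2 h 23 min
```

The result, 2 h 23 min, corresponds to about 2.38 decimal hours.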
Problem statement - Access to most CloudForge services was affected because the LDAP server was unavailable during three separate periods between 15:28 and 19:09 UTC. The initial outage, caused by a chassis failure in our datacenter, had a ripple effect on our LDAP server: it accumulated too many connections and had to be rebooted. Because of a misconfiguration, our backup server mirrored the problem faced by the primary server. Recovery took longer than expected because the database needed to be cleaned up.
Remediation - CollabNet has corrected the configuration to allow additional connections, which should prevent this problem in the future. CollabNet will also work with the hosting service provider to reconfigure the server architecture, with the aim of providing better failover performance.