Background: Over the past 6-9 months we have been planning for our Technology Refresh migration from Cisco Unified Contact Center Enterprise version 8.5 to version 9.0. This is both an upgrade and a hardware refresh to virtualized servers. The upgrade is being done to maintain software currency (in the Cisco Collaboration application space you should always know when your next upgrade is planned) as well as allow us to upgrade our CUCM cluster to version 9.X. We also want to eliminate our legacy Cisco MCS servers and migrate everything to virtual servers.
This is the first in a series of (long overdue) posts about odd bugs and behavior we have experienced with the Cisco Unified Border Element (CUBE), which is built into Cisco IOS. I will spare you all the details, but at a high level our environment looks like this:
- Cisco Unified Communications Manager (CUCM) – multisite deployment with centralized call processing with geographical diversity
- Contact Center – Cisco CVP including Call Studio, UCCE, Nuance ASR/TTS, Cisco Unified Presence Server (SIP Proxy)
- SIP Trunks with CUBE for Local/Long-Distance and Inbound Toll-Free
Recently, at work, we have had two separate instances with our SIP Service Provider where both their primary and secondary Acme Session Border Controller (SBC) clusters went into a “hung” state and we were off the air from the outside telephone world’s perspective. Despite all the provisioning precautions of having two geographically diverse carrier SBCs accessed from two geographically diverse MPLS transport circuits (used exclusively for SIP trunking) that route to two geographically diverse data centers with a dedicated CUBE router in each, we were still hosed. Doing a quick packet capture on the CUBE’s external interface we could see the provider’s SBCs were responding with SIP 503 “Service Unavailable” messages for every call attempt we made outbound. Inbound calls resulted in an “All Circuits Busy” message to callers and nothing was signaling ingress to our CUBEs from the provider.
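The kind of check described above, seeing that every response in a capture is a 503, can be sketched in a few lines of Python. This is only an illustration: it assumes the SIP messages have already been exported from the capture as plain text, the function name `tally_sip_responses` is hypothetical, and the sample messages are made up rather than taken from the actual incident.

```python
import re
from collections import Counter

# A SIP response starts with a status line: "SIP/2.0 <3-digit code> <reason phrase>"
STATUS_LINE = re.compile(r"^SIP/2\.0 (\d{3}) (.*)$")


def tally_sip_responses(messages):
    """Count SIP response codes across an iterable of raw SIP message strings."""
    counts = Counter()
    for message in messages:
        lines = message.splitlines()
        if not lines:
            continue
        match = STATUS_LINE.match(lines[0].strip())
        if match:  # requests (e.g. "INVITE ... SIP/2.0") do not match and are skipped
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample data, not from the real capture
sample = [
    "INVITE sip:+15555550100@sbc.example.net SIP/2.0\r\nVia: SIP/2.0/UDP 192.0.2.10",
    "SIP/2.0 100 Trying\r\nVia: SIP/2.0/UDP 192.0.2.10",
    "SIP/2.0 503 Service Unavailable\r\nVia: SIP/2.0/UDP 192.0.2.10",
    "SIP/2.0 503 Service Unavailable\r\nVia: SIP/2.0/UDP 192.0.2.10",
]
counts = tally_sip_responses(sample)
print(counts.most_common())
```

If 503 dominates the tally for outbound attempts, as it did for us, the problem is on the far side of the trunk rather than in the CUBE dial-peers.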
OK, it is another post from the network engineering voice trenches. We have been working for the past 19 months (longest project ever) with a major carrier to get their SIP trunking solution in place to eventually replace our tons of standalone PRI and NFAS T-1 circuits. We have had more than our fair share of problems along the way and, I promise, some day I plan to share some of our horror stories, or rather experiences, but I will save that for later.
First, a little background. Our reason for investigating SIP trunking was not cost savings (which is what most carriers try to push when they come to talk with you) but rather redundancy: redundancy for our high-value phone number blocks. These include not only the toll-free numbers that route into our contact center for our customers (which are already very redundant thanks to advanced feature capabilities not available on normal PRIs), but even more so our DID (or DDI, if you prefer) phone numbers that power outside communications for our back office employees.
Yesterday was a big day for us voice geeks at work. We did both a 7.1(5) to 8.6(2a) migration on Cisco Unified Communications Manager (CUCM) and an 8.0(3) to 8.5(3) migration on UCCE. This upgrade was the last in several days of upgrades to get to the most recent releases on these products. The CUCM upgrade went well (the 8.6 install process is much different than other CUCM releases, but it’s documented well). The UCCE upgrade also went fine, well, until we started to test call routing to agents…
This post is a deviation from my typical aviation topics towards a problem experienced in my work life. In my day job I do not fly airplanes; I work as a network engineer on a team that supports data and voice network infrastructure for a company in town with a global presence. If you have said or heard the saying “the network is slow”, I work on the team that handles (or rather redirects) those problem reports daily.
This morning started off with a trouble report that outbound calls to our external conference bridge number were resulting in a “fast busy” after 2-3 minutes of being on a call. Of course, there also happened to be a corporate-wide “all managers” meeting this morning that was using the bridging service, which increased the urgency greatly. Upon testing, the problem was easily reproducible. Sure enough, the call would go “fast busy” at about 2 minutes 53 seconds.