This is the first in a series of (long overdue) posts about odd bugs and behavior we have experienced in the Cisco Unified Border Element (CUBE), which is built into Cisco IOS. I will spare you all the details, but at a high level our environment looks like this:
Cisco Unified Communications Manager (CUCM) – multisite deployment with centralized call processing and geographic diversity
Contact Center – Cisco CVP including Call Studio, UCCE, Nuance ASR/TTS, Cisco Unified Presence Server (SIP Proxy)
SIP Trunks with CUBE for Local/Long-Distance and Inbound Toll-Free
Recently, at work, we have had two separate incidents where both the primary and secondary Acme Session Border Controller (SBC) clusters at our SIP service provider went into a “hung” state, and we were off the air from the outside telephone world’s perspective. We had taken every provisioning precaution: two geographically diverse carrier SBCs, reached over two geographically diverse MPLS transport circuits (used exclusively for SIP trunking), routing to two geographically diverse data centers with a dedicated CUBE router in each. We were still hosed. A quick packet capture on the CUBE’s external interface showed the provider’s SBCs responding with SIP 503 “Service Unavailable” to every outbound call attempt we made. Inbound callers heard an “All Circuits Busy” message, and no signaling was arriving at our CUBEs from the provider.
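Spotting that pattern in a capture comes down to reading the SIP response status lines. Below is a minimal sketch, assuming the captured SIP messages have already been exported as plain text lines; the function names and sample lines are hypothetical and not part of any Cisco or capture-tool API. It tallies response codes by their RFC 3261 start line (`SIP/2.0 <code> <reason>`) and checks whether every final response was a 503.

```python
import re
from collections import Counter

# SIP response start line (RFC 3261): "SIP/2.0 <3-digit status code> <reason phrase>"
STATUS_LINE = re.compile(r"^SIP/2\.0\s+(\d{3})\s+(.*)$")


def tally_sip_responses(lines):
    """Count SIP response status codes in an iterable of text lines.

    Request lines (e.g. "INVITE sip:... SIP/2.0") end with the version
    rather than starting with it, so they are not counted.
    """
    counts = Counter()
    for line in lines:
        match = STATUS_LINE.match(line.strip())
        if match:
            counts[int(match.group(1))] += 1
    return counts


def all_attempts_rejected(counts, code=503):
    """True if at least one final response (>= 200) was seen and all of them are `code`.

    Provisional responses such as 100 Trying are ignored.
    """
    finals = {c: n for c, n in counts.items() if c >= 200}
    return bool(finals) and set(finals) == {code}


if __name__ == "__main__":
    # Hypothetical capture excerpt; numbers and host are placeholders.
    capture = [
        "INVITE sip:+18005550100@carrier.example SIP/2.0",
        "SIP/2.0 100 Trying",
        "SIP/2.0 503 Service Unavailable",
        "INVITE sip:+18165550100@carrier.example SIP/2.0",
        "SIP/2.0 503 Service Unavailable",
    ]
    counts = tally_sip_responses(capture)
    print(counts)                         # Counter({503: 2, 100: 1})
    print(all_attempts_rejected(counts))  # True
```

Ignoring provisional responses matters here: a 100 Trying from the SBC only means the request was received, so it says nothing about whether the call was ultimately refused.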
OK, it’s another post from the network engineering voice trenches. For the past 19 months (our longest project ever) we have been working with a major carrier to get their SIP trunking solution in place, eventually to replace our many standalone PRI and NFAS T-1 circuits. We have had more than our fair share of problems along the way, and I promise that someday I will share some of our horror stories (er, experiences), but that is for another post.
First, a little background. Our reason for investigating SIP trunking was not cost savings, which is what most carriers push when they come to talk with you, but redundancy: specifically, redundancy for our high-value phone number blocks. These include not only the toll-free numbers that route customers into our contact center (which are already very redundant thanks to advanced feature capabilities not available on normal PRIs), but even more so the DID (or DDI, if you prefer) numbers that power outside communications for our back office employees.
Yesterday was a big day for us voice geeks at work. We did both a 7.1(5) to 8.6(2a) migration on Cisco Unified Communications Manager (CUCM) and an 8.0(3) to 8.5(3) migration on UCCE. These were the last of several days of upgrades to bring these products to their most recent releases. The CUCM upgrade went well (the 8.6 install process differs considerably from earlier CUCM releases, but it is well documented). The UCCE upgrade also went fine... well, until we started to test call routing to agents…
This post departs from my typical aviation topics to cover a problem from my work life. In my day job I do not fly airplanes; I am a Network Engineer on a team that supports the data and voice network infrastructure for a company in town with a global presence. If you have ever said or heard “the network is slow,” I am on the team that handles (er, redirects) those problem reports daily.
This morning started off with a trouble report that outbound calls to our external conference bridge number were dropping to a “fast busy” after 2 to 3 minutes on the call. Of course, a corporate-wide “all managers” meeting happened to be using the bridging service this morning, which greatly increased the urgency. The problem was easy to reproduce: sure enough, the call would go to fast busy at about 2 minutes 53 seconds.
Today I flew the second of two flights with an instructor who was new to me, to complete an Instrument Proficiency Check (IPC). I did not bring a cockpit audio recorder, so I had to “harvest” what I could from the LiveATC.net audio archive. I was able to recover most of the departure audio from KOJC and the audio from the last leg returning to KOJC. Some radio calls are missing simply because of how scanners work and the number of frequencies the KOJC LiveATC.net feeder scans.
This instrument practice flight was 1.5 hours. We flew four approaches, including a hold over the TOP VOR. The route was KOJC -> KTOP (LOC BC 31) -> KLWC (VOR-A) -> KIXD (GPS 19) -> KOJC (LOC 18). Overall, I was happy with my radio calls. I was a bit distracted setting up the approach on the G1000 when I made my call to KOJC for the LOC 18 approach; you will hear it toward the middle of the recording.
The audio was edited to trim long delays and non-essential transmissions, but the remaining transmissions were left unaltered.