Wednesday 28 December 2016

ESXi 6.0 & 6.5 VMware Tools Missing or Not Started

Ran into this one after upgrading my lab to ESXi 6.5: the VMware tools status was showing "Installed but not running" on the VMs.
Then after rebooting them (with "check and upgrade VMware tools before each power on" enabled) the VMware tools status went to "Not installed" on some of the VMs.

It turns out that CUCM, CUC, CIM&P & UCCX are all affected when v10.x of the VMware tools is in use: SELinux interferes with the tools & the resulting logs can also fill up the free disk space. Full bug details: CSCux90747
There are separate patches for CUCM, CUC, CIM&P & UCCX to resolve this; at the time of writing these are:

  • CUCM & CUC - ciscocm.VMwareTools2016c.cop.sgn
  • CIM&P - ciscocm.IMP_VMwareTools2016c.cop.sgn
  • UCCX - ciscouccx.VMwareTools2016V2.cop.sgn
At least for UCCX after the patch I still had to reinstall the VMware tools manually via utils vmtools refresh from the CLI after mounting the install ISO.

Wednesday 21 December 2016

Finding Lines With a Specified External Phone Number Mask

Via the power of SQL queries you can quickly determine what devices & lines have a specified external phone number mask. Handy for homing in on possible causes of calls with incorrect caller ID. Using like in the SQL query means that it's also possible to use certain wildcards (e.g. % being zero or more characters) to aid in the search.

select d.name, d.description, n.dnorpattern, dmap.e164mask from device as d inner join devicenumplanmap as dmap on dmap.fkdevice = d.pkid inner join numplan as n on dmap.fknumplan = n.pkid where dmap.e164mask like '%2081234567' order by d.name

The output will include the device name, device description, DN & external phone number mask of any matching devices & lines:

admin:run sql select d.name,d.description,n.dnorpattern,dmap.e164mask from device as d inner join devicenumplanmap as dmap on dmap.fkdevice=d.pkid inner join numplan as n on dmap.fknumplan=n.pkid where dmap.e164mask like '%2081234567' order by d.name
name            description     dnorpattern e164mask
=============== =============== =========== =============
SEPF09E636E5656 SEPF09E636E5656 1000        +442081234567
SEPF09E636E5657 SEPF09E636E5657 1001        +442081234567
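
To get a feel for what a given LIKE pattern will match before running it against a live cluster, here's a minimal Python sketch that mimics standard SQL LIKE semantics (% for zero or more characters, _ for exactly one). The function names & the sample masks are made up for illustration, it doesn't talk to CUCM at all & it assumes no escape characters are in use:

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches zero or more characters
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal (e.g. the + prefix)
    return re.compile("".join(parts), re.DOTALL)

def matching_masks(pattern, masks):
    """Return the masks that the LIKE pattern would match, in their original order."""
    rx = like_to_regex(pattern)
    return [m for m in masks if rx.fullmatch(m)]

# Hypothetical external phone number masks, purely for illustration
masks = ["+442081234567", "02081234567", "+442081234568", "2081234567"]

print(matching_masks("%2081234567", masks))
# ['+442081234567', '02081234567', '2081234567']
```

Note that a leading % also matches masks with nothing in front of the number, which is usually what you want when you don't know how the mask was prefixed.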


Tuesday 1 November 2016

ISR 4000 Series RTP Port Numbers

You might bump into one-way or no-audio issues when deploying 4000 series routers in a locked down environment where firewalls or ACLs are heavily restricting traffic.
Most Cisco documentation specifies that RTP & RTCP traffic will use a dynamically chosen port number in the range 16384 to 32767, with RTP using an even port number & RTCP using the subsequent odd numbered port. However, as of IOS XE 3.10.2 the 4000 series routers actually use the range 8000 to 48200 by default; fortunately this information is in the release notes. This change means that any ACLs that restrict traffic based on the 16384 to 32767 range, or firewalls that aren't H323, MGCP, SCCP or SIP aware, may block the RTP audio packets.
If you're unable to get the ACL or firewall configuration updated, then as a workaround you can force the 4000 series router to use the same port range as older Cisco routers:

voice service voip
 rtp-port range 16384 32766

Note 32766 as the maximum as 32767 would be used for RTCP.
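
The even/odd pairing & the effect of an ACL written for the old range can be sketched in a few lines of Python. This is only a rough model: the function names are mine, & it assumes the configured range bounds the RTP (even) ports with RTCP on the next port up, which is how the 32766 maximum above works out:

```python
LEGACY_RANGE = (16384, 32767)          # range quoted in most Cisco documentation
ISR4K_DEFAULT_RANGE = (8000, 48200)    # 4000 series default as of IOS XE 3.10.2

def rtcp_port(rtp_port):
    """RTCP uses the odd port immediately after the even RTP port."""
    if rtp_port % 2:
        raise ValueError("RTP uses an even port number")
    return rtp_port + 1

def permitted(port, acl_range=LEGACY_RANGE):
    """True if the port falls inside the range the ACL allows."""
    low, high = acl_range
    return low <= port <= high

def blocked_rtp_ports(router_range, acl_range=LEGACY_RANGE):
    """Even RTP ports the router may pick where RTP or RTCP would be dropped by the ACL."""
    low, high = router_range
    first_even = low + (low % 2)
    return [
        p for p in range(first_even, high + 1, 2)
        if not (permitted(p, acl_range) and permitted(rtcp_port(p), acl_range))
    ]

# With the default range a lot of calls can land on a blocked port
print(len(blocked_rtp_ports(ISR4K_DEFAULT_RANGE)) > 0)
# With the workaround applied nothing falls outside the old range
print(blocked_rtp_ports((16384, 32766)))
```

Since the port is chosen dynamically, only some calls hit a blocked port under the default range, which is why this tends to show up as an intermittent audio problem rather than a total failure.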

Friday 12 August 2016

ASA NAT Into VPN Tunnel

This scenario is sometimes needed when connecting via VPN to a 3rd party & a requirement is that IP addressing is unique. In this example a server (192.168.0.10) behind the ASA should be NATed to a public IP address (1.2.3.4) when communicating across the VPN, but PATed to the outside interface when communicating with the Internet. The local network is 192.168.0.0/24 & the remote network 172.16.0.0/24.


interface Ethernet0/0
 nameif inside
 security-level 100
 ip address 192.168.0.1 255.255.255.0
!
interface Ethernet0/1
 nameif outside
 security-level 0
 ip address 100.64.0.1 255.255.255.252
!
object network SERVER-INSIDE
 host 192.168.0.10
!
object network SERVER-NAT-IP
 host 1.2.3.4
!
object network REMOTE-NETWORK
 subnet 172.16.0.0 255.255.255.0
!
access-list VPN-TUNNEL extended permit ip object SERVER-NAT-IP object REMOTE-NETWORK
!
object network NAT-LAN
 subnet 192.168.0.0 255.255.255.0
 nat (inside,outside) dynamic interface
!
nat (inside,outside) source static SERVER-INSIDE SERVER-NAT-IP destination static REMOTE-NETWORK REMOTE-NETWORK
!
crypto ipsec ikev1 transform-set AES256-SHA esp-aes-256 esp-sha-hmac
crypto map OUTSIDE_MAP 10 match address VPN-TUNNEL
crypto map OUTSIDE_MAP 10 set peer 100.64.1.1
crypto map OUTSIDE_MAP 10 set ikev1 transform-set AES256-SHA
crypto map OUTSIDE_MAP interface outside
crypto isakmp identity address
crypto ikev1 enable outside
crypto ikev1 policy 10
 authentication pre-share
 encryption aes-256
 hash sha
 group 2
 lifetime 86400
!
tunnel-group 100.64.1.1 type ipsec-l2l
tunnel-group 100.64.1.1 ipsec-attributes
 ikev1 pre-shared-key Password123


The key is to use twice NAT so that the 192.168.0.10 address gets NATed only when destined for 172.16.0.0/24. The interesting traffic ACL for the tunnel then covers the 1.2.3.4 public IP address & the VPN will establish with traffic NATed in & out of it. Alternatively if we wanted the 192.168.0.10 address NATed to 1.2.3.4 at all times we could just use object NAT instead:

object network NAT-SERVER
 host 192.168.0.10
 nat (inside,outside) static SERVER-NAT-IP
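
If it helps to reason about which translation applies to which flow in the twice NAT configuration, the decision can be modelled roughly in Python. This is a simplified sketch using the addresses from the example, not anything the ASA exposes: the function names are invented, & it relies on manual (twice) NAT rules being evaluated before object NAT rules:

```python
from ipaddress import ip_address, ip_network

SERVER_INSIDE = ip_address("192.168.0.10")
SERVER_NAT_IP = ip_address("1.2.3.4")
OUTSIDE_INTERFACE = ip_address("100.64.0.1")
LOCAL_NETWORK = ip_network("192.168.0.0/24")
REMOTE_NETWORK = ip_network("172.16.0.0/24")

def translate_source(src, dst):
    """Return the source address a packet leaves the outside interface with."""
    src, dst = ip_address(src), ip_address(dst)
    # Twice NAT: only the server is translated, & only towards the remote network
    if src == SERVER_INSIDE and dst in REMOTE_NETWORK:
        return SERVER_NAT_IP
    # Object NAT: everything else on the LAN is PATed to the outside interface
    if src in LOCAL_NETWORK:
        return OUTSIDE_INTERFACE
    return src

def matches_vpn_acl(translated_src, dst):
    """The interesting traffic ACL is written against the post-NAT address."""
    return translated_src == SERVER_NAT_IP and ip_address(dst) in REMOTE_NETWORK

for src, dst in [("192.168.0.10", "172.16.0.5"),
                 ("192.168.0.10", "8.8.8.8"),
                 ("192.168.0.20", "172.16.0.5")]:
    out = translate_source(src, dst)
    print(src, "->", dst, "leaves as", out, "| into tunnel:", matches_vpn_acl(out, dst))
```

The third flow is the one to watch: another LAN host heading for 172.16.0.0/24 gets PATed to the outside interface & doesn't match the tunnel ACL, so in this design only the server can use the VPN.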

Tuesday 5 July 2016

Deprecated Phone Models

I guess it had to happen eventually! The CUCM 11.5 release notes state that the following phones are no longer supported & thus won't work:

  • Cisco IP Phone 12 S
  • Cisco IP Phone 12 SP
  • Cisco IP Phone 12 SP+
  • Cisco IP Phone 30 SP+
  • Cisco IP Phone 30 VIP
  • Cisco Unified IP Phone 7902G
  • Cisco Unified IP Phone 7905G
  • Cisco Unified IP Phone 7906G
  • Cisco Unified IP Phone 7910
  • Cisco Unified IP Phone 7910G
  • Cisco Unified IP Phone 7910+SW
  • Cisco Unified IP Phone 7910G+SW
  • Cisco Unified IP Phone 7912G
  • Cisco Unified Wireless IP Phone 7920
  • Cisco Unified IP Conference Station 7935

Another thing to bear in mind is that CUCM 11.5 also won't allow the installation of patches that haven't been signed with the v3 keys (i.e. those without ".k3." in the filename).
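
If you've got a folder full of COP files to check before an upgrade, the filename test is easy to script. A minimal Python sketch, assuming the only thing being checked is the ".k3." marker in the name; the function name & both filenames are hypothetical & it says nothing about whether the signature itself is valid:

```python
def has_v3_key_marker(filename):
    """True if the patch filename carries the '.k3.' marker for v3 signing keys."""
    return ".k3." in filename

# Hypothetical filenames, for illustration only
candidates = [
    "ciscocm.example-patch.k3.cop.sgn",
    "ciscocm.example-patch.cop.sgn",
]

for name in candidates:
    verdict = "looks v3 signed" if has_v3_key_marker(name) else "no .k3. marker"
    print(name, "-", verdict)
```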

Monday 27 June 2016

3850 Switch Upgrade Failure

Every so often a software upgrade fails, be it a corrupted download from Cisco's website or an occasional hiccup on the network whilst copying it to the device. Rather usefully the 3650 & 3850 switches have a USB port on the front which can be used with a USB memory stick to recover from a failed upgrade. This is a walk through of the process & a couple of obstacles along the way.

To enter ROMMON to recover from a failed upgrade, either wait for 5 failed boot attempts or hold the Mode button down for 10 seconds whilst the switch boots. The command prompt should be "switch:". Plug in the USB drive & you can view the contents via dir usbflash0:. According to the official documentation you can at this point either boot from the USB drive or copy the image to flash; however, when testing with IOS XE 3.7.x the flash was only ever read only:

switch: dir usbflash0:
Directory of usbflash0:/

     4  -rw-  328157104  cat3k_caa-universalk9.SPA.03.07.04.E.152-3.E4.bin

7689236480 bytes available (328175616 bytes used)

switch: copy usbflash0:cat3k_caa-universalk9.SPA.03.07.04.E.152-3.E4.bin flash:
flash:usbflash0:cat3k_caa-universalk9.SPA.03.07.04.E.152-3.E4.bin: read only file system


If you boot off the USB drive via the boot command, you should be able to get to a point where you can remove the corrupt image & tidy up. However there's another hurdle if the IOS XE image includes a firmware update for the switch ASICs, as it will try to boot using the corrupt image on flash after completing the firmware update:

switch: boot usbflash0:cat3k_caa-universalk9.SPA.03.07.04.E.152-3.E4.bin
Reading full image into memory............................................................................................................................................................................................................................................................................................................................done
Bundle Image
--------------------------------------
Kernel Address    : 0x5344e290
Kernel Size       : 0x3fb4ba/4175034
Initramfs Address : 0x5384974c
Initramfs Size    : 0xd47999/13924761
Compression Format: .mzip

Bootable image at @ ram:0x5344e290
Bootable image segment 0 address range [0x81100000, 0x820b0000] is in range [0x80180000, 0x90000000].
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
File "usbflash0:cat3k_caa-universalk9.SPA.03.07.04.E.152-3.E4.bin" uncompressed and installed, entry point: 0x816734a0
Loading Linux kernel with entry point 0x816734a0 ...
Bootloader: Done loading app on core_mask: 0x3f

### Launching Linux Kernel (flags = 0x5)

All packages are Digitally Signed
Starting System Services
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=4,mode=600,ptmxmode=000 0 0


FIPS(NGWC): Flash Key Check : Begin
FIPS(NGWC): Flash Key Check : End, Not Found, FIPS Mode Not Enabled


Front-end Microcode IMG MGR: found 4 microcode images for 1 device.
Image for front-end 0: /tmp/microcode_update/front_end/fe_type_6_0
Image for front-end 0: /tmp/microcode_update/front_end/fe_type_6_1
Image for front-end 0: /tmp/microcode_update/front_end/fe_type_6_2
Image for front-end 0: /tmp/microcode_update/front_end/fe_type_6_3

Front-end Microcode IMG MGR: Preparing to program device microcode...
Front-end Microcode IMG MGR: Preparing to program device[0]...594412 bytes.... Skipped[0].
Front-end Microcode IMG MGR: Preparing to program device[0]...392946 bytes.
Front-end Microcode IMG MGR: Programming device 0...rwRrrrrrrw..0%.........................................................................10%........................................................................20%..........................................................................30%........................................................................40%........................................................................50%..........................................................................60%.........................................................................70%........................................................................80%..........................................................................90%........................................................................100%
Front-end Microcode IMG MGR: Preparing to program device[0]...25166 bytes.
Front-end Microcode IMG MGR: Programming device 0...rrrrrrw..0%....10%....20%......30%...40%......50%....60%......70%...80%......90%....100%wRr!
Front-end Microcode IMG MGR: Microcode programming complete for device 0.
Front-end Microcode IMG MGR: Preparing to program device[0]...86370 bytes.
Front-end Microcode IMG MGR: Programming device 0...rwRrrrrrrw..0%................10%...............20%.................30%...............40%.................50%...............60%.................70%................80%...............90%.................100%w

Front-end Microcode IMG MGR: Microcode programming complete in 291 seconds

% Front-end Microcode IMG MGR: HW image is upgraded. MCU reset causes the switch to reload


If this happens, use the boot command to boot from the USB drive again & proceed with tidying up as normal.

Friday 6 May 2016

7800 Series Phones Not Prompting For MRA Domain & User Credentials

I've seen this happen a few times after upgrading from 7800 series firmware 10.3.1 to the latest 11.0.1 firmware (which introduced official MRA support). Phones that could previously connect now loop endlessly between "registering" & "not registered"; they never prompt for the domain to resolve the _collab-edge._tls SRV record, nor do they attempt to resolve a cached domain either.
The correct behaviour is that if DHCP option 150 isn't present or the TFTP server is unreachable, the phone tries to resolve the host names of the CUCM servers in its cached configuration file; if this also fails, the phone should determine that it should use MRA instead. The status messages on the phone confirm that it never gets beyond the lack of TFTP or CUCM connectivity:

[12:22:54,29/04/16] TFTP Error : SEPF09E636E5656.cnf.xml.sgn
[12:22:54,29/04/16] DNS Unknown IPv4 Host CUCM2.somewhere.com
[12:22:54,29/04/16] DNS Unknown IPv4 Host CUCM1.somewhere.com
[12:22:57,29/04/16] TFTP Error : SK93886772-e459-a46f-ff8d-e15df16797e8.xml.sgn
[12:22:58,29/04/16] Configuring IP
[12:23:03,29/04/16] No IPv4 TFTP Server
[12:23:03,29/04/16] Trust List Update Failed
[12:23:04,29/04/16] TFTP Error : SEPF09E636E5656.cnf.xml.sgn
[12:23:04,29/04/16] DNS Unknown IPv4 Host CUCM2.somewhere.com
[12:23:04,29/04/16] DNS Unknown IPv4 Host CUCM1.somewhere.com

Looking at the console logs on the phone also shows that the phone fails to connect to TFTP or resolve the server names but thinks it's not in Collaboration Edge mode:

1102 ERR Apr 29 12:23:27.529537 JAVA-CFG : config_process_ccm_properties : No valid IPv4 TFTP server configured
1103 NOT Apr 29 12:23:27.529953 JAVA-ccm0=CUCM2.somewhere.com ccm1=CUCM1.somewhere.com  ccm2= sip_port_0=5060 sip_port_1=5060 sip_port_2=5060 length=0
1104 NOT Apr 29 12:23:27.530006 JAVA-ccm0=CUCM2.somewhere.com ccm1=CUCM1.somewhere.com  ccm2= sec_sip_port_0=5061 sec_sip_port_1=5061 sec_sip_port_2=5061 length=0
1105 NOT Apr 29 12:23:27.530341 JAVA-ccm0_v6= ccm1_v6=  ccm2_v6= lenght=0
1106 NOT Apr 29 12:23:27.530382 JAVA-ip_mode is IPv4
1107 NOT Apr 29 12:23:27.530614 JAVA-config_process_ccm_properties: is in Edge mode = FALSE
1108 NOT Apr 29 12:23:27.597478 dgetserver(9651)-dgetserver.RESULT [_0_  [PASS] No error     ]
1109 NOT Apr 29 12:23:27.606755 downd-SOCKET accept errno=4 "Interrupted system call"
1110 INF Apr 29 12:23:27.630973 dnsmasq[2033]: query[A] CUCM1.somewhere.com from 127.0.0.1
1111 INF Apr 29 12:23:27.631110 dnsmasq[2033]: cached CUCM1.somewhere.com is NXDOMAIN-IPv4
1112 INF Apr 29 12:23:27.632905 dnsmasq[2033]: query[A] CUCM1.somewhere.com from 127.0.0.1
1113 INF Apr 29 12:23:27.633020 dnsmasq[2033]: cached CUCM1.somewhere.com is NXDOMAIN-IPv4
1114 ERR Apr 29 12:23:27.631788 JAVA-dnsGetHostByName :lookup CUCM1.somewhere.com failed.
1115 INF Apr 29 12:23:27.634020 dnsmasq[2033]: query[A] CUCM2.somewhere.com from 127.0.0.1
1116 INF Apr 29 12:23:27.634110 dnsmasq[2033]: cached CUCM2.somewhere.com is NXDOMAIN-IPv4
1117 INF Apr 29 12:23:27.634374 dnsmasq[2033]: query[A] CUCM2.somewhere.com from 127.0.0.1
1118 INF Apr 29 12:23:27.634565 dnsmasq[2033]: cached CUCM2.somewhere.com is NXDOMAIN-IPv4
1119 ERR Apr 29 12:23:27.634845 JAVA-dnsGetHostByName :lookup CUCM2.somewhere.com failed.

Eventually I stumbled across Admin Settings > Reset Settings > Service Name fixing the problem. It's not exactly a clearly described setting, nor does it explain why the phone isn't getting as far as prompting for a domain; if it had cached an incorrect one I'd expect to see it trying to resolve the _collab-edge._tls SRV record for the incorrect domain.
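
If you're wading through console logs from several phones, the two tell-tale signs above (the Edge mode flag & the NXDOMAIN lookups) can be pulled out with a couple of regular expressions. A rough Python sketch: the function name is mine, the sample lines are copied from the log above, & it assumes the flag reads TRUE when the phone is in Edge mode, which I haven't confirmed from a working phone's log:

```python
import re

EDGE_MODE_RE = re.compile(r"is in Edge mode = (TRUE|FALSE)")
NXDOMAIN_RE = re.compile(r"cached (\S+) is NXDOMAIN")

def summarise(log_lines):
    """Report the last Edge mode flag seen & the host names that failed to resolve."""
    edge_mode = None
    unresolved = []
    for line in log_lines:
        flag = EDGE_MODE_RE.search(line)
        if flag:
            edge_mode = flag.group(1) == "TRUE"
        miss = NXDOMAIN_RE.search(line)
        if miss and miss.group(1) not in unresolved:
            unresolved.append(miss.group(1))   # keep first-seen order, no duplicates
    return {"edge_mode": edge_mode, "unresolved_hosts": unresolved}

sample = [
    "1107 NOT Apr 29 12:23:27.530614 JAVA-config_process_ccm_properties: is in Edge mode = FALSE",
    "1111 INF Apr 29 12:23:27.631110 dnsmasq[2033]: cached CUCM1.somewhere.com is NXDOMAIN-IPv4",
    "1113 INF Apr 29 12:23:27.633020 dnsmasq[2033]: cached CUCM1.somewhere.com is NXDOMAIN-IPv4",
    "1116 INF Apr 29 12:23:27.634110 dnsmasq[2033]: cached CUCM2.somewhere.com is NXDOMAIN-IPv4",
]

print(summarise(sample))
```

A phone showing Edge mode as false alongside unresolved CUCM host names is a candidate for the reset described above.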

Friday 1 April 2016

ATA 186 Refusing to Register to New Server

Whilst working on a CUCM merger project, the process below worked for moving regular phones from the 2 existing CUCM clusters over to the new CUCM cluster:
  1. Set DHCP scope option 150 to new CUCM server's IP address
  2. Bulk erase ITL files on all phones using a 3rd party tool
  3. Reset all phones
However, all the ATA 186s flat out refused to register to the new CUCM despite multiple attempts. Looking at the ATA's built in web server showed the correct TFTP server had been picked up from DHCP. Eventually I found that you can force the CUCM IP addresses to register to in the GUI.


Reset the ATA 186 after applying the change & voila it works!

Wednesday 6 January 2016

Restoring Publisher Database From the Subscriber

I've been meaning to write about this feature for a while now, since it's actually pretty handy. Cisco introduced a feature whereby you can restore the publisher's database from a subscriber server in v8.6(1) of CUCM & CUC. You still have to rebuild a new publisher identical to the failed one, then get the licences migrated (if applicable) & the method differs between CUCM & CUC. Having seen several scenarios where customers have either only the original DRS backup done at go live or out of date backups available, this will help you get out of a sticky situation & back to a working cluster without losing any configuration data.
The Cisco documentation is pretty good for these features, so I'm going to cover some caveats & defer to the official instructions. Be aware that there are quite a few steps that must be followed correctly, otherwise you run the risk of pushing data from the blank publisher to subscribers. The process also includes multiple reboots, so it's time consuming.

CUCM
The DRS restore mechanism has been enhanced so that you can choose to restore the CCMDB component from a selected subscriber server. Now the Cisco guide says you can do a fresh DRS backup from the newly built publisher & do the restore from a subscriber, but whenever I've tried the option never appears unless the actual backup contains files from the intended subscriber server. So you'll need a DRS backup that contains both the publisher & subscriber server.

The main caveat is that your new publisher won't have files that aren't in the backup or have changed since then, such as certificates, device packs, TFTP server files (e.g. phone loads, music on hold, etc.) or logs/traces. This is because this information isn't held in the Informix database, which is what gets copied from the subscriber, so you'll need to sort these out manually.

Cisco Guide to CUCM Restore From Subscriber

CUC
CUC's clustering is different from CUCM, in that you still have a publisher & subscriber, but there's also the concept of the primary & secondary server. The primary server, regardless of whether it's the publisher or subscriber, contains a writeable configuration database & message store; normally this is automatically replicated to the secondary server.
After rebuilding the publisher you're basically rebuilding the cluster, with the subscriber as primary & manually starting the synchronisation process.

The main caveat is that the publisher needs locales & any other patches installing manually, as these aren't pulled across from the subscriber. The Informix database with configuration, message store, spoken names, greetings & master certificates are what gets copied from the subscriber. You'll still need to deal with certificates that were specific to the publisher server, such as the Tomcat certificate.

Cisco Guide to CUC Restore From Subscriber