Sunday 3 January 2021

Debugging .1x / RADIUS on 3850 Switches

It used to be that you could simply enable the AAA and/or RADIUS debugs & review the output at your leisure in your favourite syslog analysis tool, such as Splunk or the ELK stack. On the 3650 & 3850 switches that's no longer the case: you have to use traces, which aren't sent to syslog. The amended process is below, though do note that the commands changed between the IOS XE 3.x & 16.x releases.

 

First of all enable rotating the traces between files, so that you don't overwrite the outputs accidentally:

request platform software trace rotate all

Enable the traces that cover .1x, AAA & RADIUS:

set platform software trace smd R0 radius debug
set platform software trace smd R0 dot1x-all debug
set platform software trace smd R0 auth-mgr-all debug
set platform software trace smd R0 epm-all debug


Reproduce the issue & view the last 1,000 lines of traces:

show platform software trace message smd switch active R0

Viewing more than that requires exporting the traces for a set time period, then copying the archive off the switch:

request platform software trace archive last x days target flash:blah
copy flash:blah ftp:


Once you're done, return the traces to their usual state:

set platform software trace smd R0 radius notice
set platform software trace smd R0 dot1x-all notice
set platform software trace smd R0 auth-mgr-all notice
set platform software trace smd R0 epm-all notice
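
If you end up doing this often, it can help to generate both sets of commands from one list so the enable & restore steps never drift apart. The following is only a rough Python sketch for producing paste-ready text: the module names and command syntax are copied from the commands above, while the function name trace_commands and the SMD_MODULES list are made up for illustration.

```python
# Trace modules copied from the commands in this post.
SMD_MODULES = ["radius", "dot1x-all", "auth-mgr-all", "epm-all"]


def trace_commands(level, modules=SMD_MODULES):
    """Build one 'set platform software trace' line per module at the given level."""
    return [f"set platform software trace smd R0 {m} {level}" for m in modules]


if __name__ == "__main__":
    # Commands to paste before reproducing the issue.
    print("\n".join(trace_commands("debug")))
    print()
    # Commands to paste once you're done.
    print("\n".join(trace_commands("notice")))
```

It doesn't talk to the switch at all, it just prints the lines, so you still paste them in yourself.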



Which segues into why I'm writing about this... I had to diagnose wired .1x authentication failures that turned out to be a bug where a 3850 with equal-cost uplinks (pretty common!) will chew up some of the RADIUS Access-Requests. The fun part is that the RADIUS server never replies to the broken RADIUS message, so the switch thinks the RADIUS server is timing out & marks it dead, which you'll see along with incrementing timeouts under show aaa servers. This was fixed in IOS XE 16.9.6.