Ubuntu 20.04 Docker image – Python For Network Engineers

This is an updated Docker image of Python For Network Engineers (PFNE) based on Ubuntu 20.04 (minimal server distro).

It contains all necessary tools for network / devops engineers to test automation and learn Python:

Openssl
Net-tools
IPutils
IProute
IPerf
TCPDump
NMAP
Python 2
Python 3
Paramiko
Netmiko
Ansible
Pyntc
NAPALM
Netcat
Socat

If you notice a missing package that would add value to the Ubuntu PFNE image, please let me know in the comments below.

Before testing the new Ubuntu 20.04 PFNE Docker image, please pull it from Docker Hub:

docker pull yotis/ubuntu2004-pfne

To start using it:

docker run -i -t yotis/ubuntu2004-pfne /bin/bash
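Once inside the container, a quick way to confirm that the tools from the list above are present is to look them up on the PATH. The snippet below is only a minimal sketch: the executable names in EXPECTED_TOOLS are assumptions about how the packages are installed, not something taken from the image itself.

```python
import shutil

# Executable names are assumptions; adjust them to what the image actually ships.
EXPECTED_TOOLS = ["openssl", "ip", "ping", "iperf", "tcpdump", "nmap",
                  "python3", "ansible", "nc", "socat"]


def missing_tools(names, which=shutil.which):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(EXPECTED_TOOLS)
    print("All tools found" if not missing else "Missing: " + ", ".join(missing))
```

The which parameter is only there so the lookup can be replaced in a test; in normal use the default shutil.which is enough.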

For more details about how to install, operate and create your own Docker images, please check my older article on How to create your own Docker image.

Cisco WLAP and WLC fail to create CAPWAP connection

In the last few days I encountered an issue where some of the Wireless Lightweight Access Points (WLAP) just disappeared from the Wireless LAN Controller (WLC).

I have seen this kind of problem before and usually, whatever the reason, the WAP cannot discover the WLC. That was not the case now: everything seemed to be in order, both the IP connectivity and the parameters pointing the WAP to the correct WLC.

Looking back now, the problem was caused by an obvious issue, but back then it took me a while to troubleshoot. I’ll share my findings so others can resolve it quickly in case they hit this problem.

The WLC logs didn’t point to an obvious reason. Maybe this was due to the log volume and the fact that this particular WLC had other WAPs which were working fine. Just a couple of them suddenly disappeared.
I went the other way and started troubleshooting from the WAP. Once I got remote access to the WAP (yes, it had an IP address and was reachable), the logs showed something like this:

WAP#
*Oct 17 19:54:55.757: %DOT11-7-AUTH_FAILED: Station MAC_ADDRESS Authentication failed
*Oct 17 19:54:56.000: %CAPWAP-5-DTLSREQSEND: DTLS connection request sent peer_ip: WLC_IP peer_port: 5246
*Oct 17 19:54:56.352: %DTLS-5-ALERT: Received FATAL : Certificate unknown alert from WLC_IP
*Oct 17 19:54:56.352: %DTLS-5-SEND_ALERT: Send FATAL : Close notify Alert to WLC_IP:5246
*Oct 17 19:56:01.000: %CAPWAP-5-DTLSREQSEND: DTLS connection request sent peer_ip: WLC_IP peer_port: 5246
*Oct 17 19:56:01.364: %DTLS-5-ALERT: Received FATAL : Certificate unknown alert from WLC_IP
*Oct 17 19:56:01.364: %DTLS-5-SEND_ALERT: Send FATAL : Close notify Alert to WLC_IP:5246

It was obvious that something was wrong with the CAPWAP tunnel, and it seemed to be related to DTLS, since the certificate unknown alert was present.
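When several access points are affected, the same check can be done on saved log output instead of reading it line by line. This is a minimal sketch that counts the “Certificate unknown” alerts per peer; the pattern is based only on the log lines shown above, so other software releases may word the message differently.

```python
import re

# Pattern based on the log lines shown above; other releases may differ.
CERT_ALERT = re.compile(
    r"%DTLS-5-ALERT: Received FATAL : Certificate unknown alert from (?P<peer>\S+)"
)


def count_cert_alerts(log_text):
    """Return {peer: number of 'Certificate unknown' alerts} found in log_text."""
    counts = {}
    for match in CERT_ALERT.finditer(log_text):
        peer = match.group("peer")
        counts[peer] = counts.get(peer, 0) + 1
    return counts
```

A non-empty result for the WLC address is a strong hint to look at certificates rather than at IP connectivity.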

I’ll spare you the research, but I finally found this Field Notice: FN – 63942 – Wireless Lightweight Access Points and WLAN Controllers Fail to Create CAPWAP Connections Due to Certificate Expiration – Software Upgrade Recommended – Cisco, which in turn pointed to bug CSCuq19142. The bug description says that a WAP will fail to join a WLC if the SSC (Self-Signed Certificate) or MIC (Manufacturing Installed Certificate) has expired.

Going back to the WAP CLI to check the MIC (the SSC does not apply here), it seemed that the suggested command “show crypto pki certificates” was not available. At least it seemed so…

You need to issue another command, “debug capwap console cli”, before “show crypto pki certificates”:

WAP# debug capwap console cli
WAP# show crypto pki certificates
!! removed output!!
Certificate
  Status: Available
  Certificate Serial Number (hex): HEX_VALUE
  Certificate Usage: General Purpose
  Issuer:
    cn=Cisco Manufacturing CA
    o=Cisco Systems
  Subject:
    Name: AP_NAME
    [email protected]
    cn=AP_NAME
    o=Cisco Systems
    l=San Jose
    st=California
    c=US
  CRL Distribution Points:
    http://www.cisco.com/security/pki/crl/cmca.crl
  Validity Date:
    start date: 07:21:37 UTC Oct 13 2012
    end   date: 07:31:37 UTC Oct 13 2022
  Associated Trustpoints: Cisco_IOS_MIC_cert
  Storage:
!! removed output !!

If you check the validity dates, this AP had just passed its 10-year anniversary, and 10 years is also the default lifetime of the MIC.
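The expiry check itself is easy to script once the “end date” line is captured from the output. The following is a minimal sketch, assuming the date format is exactly the one shown in the output above and that month names are in English.

```python
from datetime import datetime, timezone


def parse_cert_date(line):
    """Parse a line such as 'end   date: 07:31:37 UTC Oct 13 2022'."""
    _, _, value = line.partition("date:")
    parsed = datetime.strptime(value.strip(), "%H:%M:%S UTC %b %d %Y")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(end_line, now=None):
    """Return True if the certificate end date is in the past."""
    now = now or datetime.now(timezone.utc)
    return parse_cert_date(end_line) < now
```

The now argument is optional and only makes it possible to check against a specific date, for example to see which access points will expire within the next few months.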

The Field Notice above recommends upgrading the WLC OS, but a lot of OS versions are affected, so in the meantime I went with the suggested workaround:

WLC> config ap cert-expiry-ignore mic enable

The WLC will ignore the expired MIC and, as a result, the WAP will immediately join the WLC.

I hope this basic explanation and the quick workaround will help somebody if they run into the same issue.

Nginx reverse proxy and Webmin

Before going into the “How”, you may wonder “Why” I need a reverse proxy in front of Webmin.

First, and most important, is laziness. Yes, you read that right. In my home lab I have a one-page html listing all the http(s) resources I have in my IT lab. Instead of typing numerous URLs I just type one and click the needed link. You may argue that I can use browser bookmarks, true, but I use the one html landing page to access various resources.

The second reason is a bit more realistic (at least professionally realistic).

I’m using Sophos XG (home version) to access my home lab and other in-house smart devices when on the road. This product has a very nice User Portal feature where you can add various “bookmarks” to resources accessible via various protocols (rdp, vnc, ssh, http(s)…)

Recently Sophos decided to retire the http(s) bookmark feature “in order to improve security and reduce the potential for cross-site scripting (XSS) exploits”

In my opinion you work on features to improve security and fix issues, you don’t just retire them. If this were the way, then let’s shut down the electrical grid, stop cars or terminate the Internet and we’ll all be more secure. But that’s just my opinion…

Anyway, this action leaves a gap in my happiness accessing my home IT resources. Sophos recommends using WAF, which is good advice from a security perspective, but I don’t plan to have 50 redirections (as in DNAT) from my public facing IP address / router to LAN just to access the various URLs I have in my home lab.

I plan to use one port redirection from the Internet to a LAN-hosted webserver (protected with WAF) which, you guessed it, hosts a page listing my home lab resources (in the form of webpage links).

For this to work properly I need just one domain / subdomain with various URI resources (e.g. https://mydomain.com/resource1, https://mydomain.com/resource2, etc…), hence the use of a reverse proxy.

Nginx reverse proxy is nothing new and it works great in a lot of situations, but it gave me some headaches with Webmin. After quite some research, I decided to put together a quick and dirty how-to in case somebody else needs it.

My scenario involves one server with Nginx as reverse proxy (https://mypage.local.lan) and one Webmin server (https://webmin01.local.lan:10000) for this example.

The http traffic is secured with SSL certificates issued by a LAN CA. In case you don’t have secure http, just make sure to replace https with http in the example below.

My Nginx SSL config is very basic at this point:

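server {
  server_name mypage.local.lan;
  # "listen 443 ssl" replaces the deprecated "ssl on;" directive
  listen 443 ssl;

  root /var/www/html;

  ssl_certificate /etc/ssl/private/mypage.local.lan.crt;
  ssl_certificate_key /etc/ssl/private/mypage.local.lan.key;
  access_log off;
  # "error_log off;" would log to a file named "off", so discard errors explicitly
  error_log /dev/null crit;
}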

The next part is to add the reverse proxy configuration for https://webmin01.local.lan:10000 so it can be accessed via https://mypage.local.lan/webmin01

  location /webmin01/ {
    proxy_pass      https://webmin01.local.lan:10000/;

    #Proxy Settings
    proxy_set_header   Host             $http_host;
    proxy_set_header   X-Real-IP        $remote_addr;
    proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;

  }

Add the part above just before the closing } in the first Nginx configuration part.

Very important: don’t forget the trailing / after webmin01 in the location /webmin01/ line.

This should satisfy the majority of scenarios where a resource is accessed via reverse proxy. However, Webmin needs a bit more fine tuning.
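To see why the trailing slash matters, it helps to model what Nginx does when proxy_pass carries a URI part (the final / after the port): the part of the request path that matched the location prefix is replaced by that URI. The function below is a simplified sketch of that rule only; it ignores URI normalization and query strings, and the file name index.cgi is just a hypothetical example.

```python
def upstream_path(request_path, location="/webmin01/", proxy_uri="/"):
    """Mimic how proxy_pass with a URI part rewrites the request path."""
    if not request_path.startswith(location):
        return None  # this location block would not handle the request
    # The matched prefix is swapped for the URI given in proxy_pass.
    return proxy_uri + request_path[len(location):]


if __name__ == "__main__":
    # With the trailing slash the prefix is cut at a path boundary.
    print(upstream_path("/webmin01/index.cgi"))
    # Without it, an unrelated path that only shares the prefix also matches.
    print(upstream_path("/webmin01x/index.cgi", location="/webmin01"))
```

With location /webmin01/ only paths below that directory are forwarded, while a location without the trailing slash also catches any other path that merely starts with the same characters.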

Restart your Nginx service after modifying the configuration files.

On my webmin01 server, I needed to modify the following files, which are part of the Webmin installation (btw, this is on Ubuntu 20.04).

/etc/webmin/miniserv.conf

Add or modify the following parameters:

cookiepath=/webmin01
trust_real_ip=1
logouttimes=

/etc/webmin/config

Add or modify the following parameters:

referers=mypage.local.lan
webprefix=/webmin01
relative_redir=0

Referers needs to list the host from which the request comes. This is part of the Webmin security to avoid malicious redirects from untrusted locations.

Webprefix is for proper redirection of the response from webmin pages. A word of advice, once you modify this part, you may not be able to access the webmin installation directly (e.g. https://webmin01.local.lan:10000) since it will expect a /webmin01 part in the URL, which of course is not there on the webmin server.

Restart your Webmin service after modifying the configuration files.

After the above configuration, I added to my one html page located on https://mypage.local.lan a link called Webmin01 (pointing to https://mypage.local.lan/webmin01).
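Both Webmin files are plain key=value lists, so the “add or modify” step can be scripted if you have more than one Webmin server. This is a minimal sketch, assuming one key=value pair per line; the helper name apply_settings is made up for this example, and it only returns the new text, so writing the file back (and keeping a backup) is left to you.

```python
# Values taken from the two parameter lists above.
MINISERV = {"cookiepath": "/webmin01", "trust_real_ip": "1", "logouttimes": ""}
WEBMIN_CONFIG = {"referers": "mypage.local.lan", "webprefix": "/webmin01",
                 "relative_redir": "0"}


def apply_settings(text, settings):
    """Return text with each key in settings updated in place or appended."""
    remaining = dict(settings)
    out = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        key = key.strip()
        if sep and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")  # modify existing key
        else:
            out.append(line)  # keep unrelated lines untouched
    # Keys that were not present yet are added at the end.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

For example, apply_settings(open("/etc/webmin/miniserv.conf").read(), MINISERV) returns the updated content of the first file.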

Once I access that URL resource, I’ll be redirected to the login page of Webmin01 instance.

In case you give it a try, let me know if it works for you.

Last but not least, I did quite some research on this topic, but the best information was from the Github user 1985a and the folks at https://github.com/webmin/webmin/issues/420. Thanks a lot!

ESXi VM – The CPU has been disabled by the guest operating system

For some weeks now, a couple of my virtual machines on ESXi would stop working out of nowhere. They were completely unresponsive (including via the ESXi VM Console). Nothing would help, except a shutdown / start of the VM. Just to find out later that, randomly, the VM would become unresponsive again.

The only human readable information about these failures was in the ESXi host Events and was saying something like this (among other things):

 The CPU has been disabled by the guest operating system

One other thing which I should mention is that all my VMs encountering this issue were Linux-based, mainly with Ubuntu 20.04 as the OS distribution.

Not much to work with, but I gave it a try and searching for the error did point me to this VMware KB: https://kb.vmware.com/s/article/2000542

The KB is clearly accurate, it just didn’t help me at all to resolve my problem. The troubleshooting process explained in the KB led me to a dead end.

Other web resources (for the above error) pointed to articles which explained a procedure for VMware Workstation / Player. Not my case, since I’m using ESXi.

More research, which took a while (that’s why I’m writing this article, hopefully others with this problem will find it easier), pointed to a bug. It seems this bug is a particular interaction between my VM’s Linux kernel and the version of ESXi I’m currently using.

I arrived at this VMware KB https://kb.vmware.com/s/article/2151480 which was a game changer. In my case this KB was hard to find, because the title – Linux VM fails with the error “kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c:1413!” (2151480) – is completely different from the error I was seeing and which I used when searching the web.

Skipping the long output at the beginning of the KB, I saw something interesting in the lower part of the page:

This issue occurs due to a bug in VMXNET3 vNIC backend which is part of the vmkernel. This issue occurs if the following conditions are met:

    Linux VM is running kernel >= 4.8
    HW version of VM is >=13
    ESXi version is 6.5

All of the above fits my scenario: VMXNET3 as vNIC, kernel 5.4, VM HW version 13 and ESXi 6.5.

As with most bugs, the obvious solution is to upgrade. Same here:
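If you have more than a handful of VMs, the three conditions from the KB can be checked in a small helper. This is only a sketch under a few assumptions: the kernel string comes from uname -r inside the guest, the HW version is given as a plain number, and the ESXi check only looks at major.minor, so it cannot tell 6.5 apart from 6.5 U1, where the issue is resolved.

```python
import re


def major_minor(version):
    """Turn a string such as '5.4.0-42-generic' into the tuple (5, 4)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def matches_kb_conditions(kernel, hw_version, esxi_version):
    """Return True when all three conditions listed in the KB are met."""
    return (
        major_minor(kernel) >= (4, 8)
        and hw_version >= 13
        and major_minor(esxi_version) == (6, 5)
    )
```

The VM also has to use a VMXNET3 vNIC for the bug to apply, which is not covered by this check.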

This issue is resolved in VMware ESXi 6.5 U1

It’s just that I cannot upgrade now for various reasons.

So, I decided to look into the workarounds.

The second workaround on the page seems simpler and I don’t even have to restart the VM:

ethtool -G ethX rx-mini 0

Of course, replace ethX with your interface name.
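On a VM with several vNICs the same command has to be repeated for each interface. A minimal sketch of how this could be wrapped is shown below; it assumes ethtool is installed and the script runs as root, and the interface names in the example are hypothetical.

```python
import subprocess


def rx_mini_command(interface):
    """Build the workaround command for one interface."""
    return ["ethtool", "-G", interface, "rx-mini", "0"]


def apply_workaround(interfaces, run=subprocess.run):
    """Run the workaround on each interface; raises if a command fails."""
    for interface in interfaces:
        run(rx_mini_command(interface), check=True)


if __name__ == "__main__":
    # Print the commands only; call apply_workaround() to really run them.
    for name in ["ens160", "ens192"]:  # hypothetical interface names
        print(" ".join(rx_mini_command(name)))
```

The run parameter exists so the commands can be captured instead of executed, which is handy for a dry run before touching a production VM.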

Worked like a charm without any visible side-effects.

The other workaround is also doable, but I didn’t want to modify the .vmx file:

Power off the virtual machine
         
Edit the vmx file and add the below parameter:
vmxnet3.rev.30 = FALSE
         
Power on the virtual machine

Now I’m just curious if I would encounter the same issues using another vNIC adapter type, like E1000 or E1000E instead of VMXNET3. Maybe I’ll give it a try…

VCSA, 503 Service Unavailable – possible fix

My ESXi hosting the VCSA crashed for whatever reason and after reboot the VCSA was displaying a “503 Service Unavailable” error.

What I was seeing actually was a blabbering long line:

503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x00007fa69401a900] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)

The ESXi hosting my VCSA is not the fastest in the world, so I waited a while, but the error was still there. Searching the Internet returned a lot of possible root causes for this error, ranging from simple to complex ones (like a duplicate database table entry where you have to manually touch the postgresql instance).

I didn’t want to jump directly into touching things like the database, so I started with something simpler.

Below is what worked for me, maybe you’ll find it useful and can try before going into advanced troubleshooting.

I connected to the VCSA CLI using the root credentials.

[email protected]'s password:
Connected to service

* List APIs: "help api list"
* List Plugins: "help pi list"
* Launch BASH: "shell"

Command>

I launched BASH by typing shell at the Command> prompt.

Now I have a Linux-like CLI terminal.

Next, I ran

service-control --status --all

which resulted in the following output:

root@vcsa [ ~ ]# service-control --status --all
Running:
 lwsmd vmafdd vmcad vmdird vmdnsd vmonapi vmware-cis-license vmware-cm vmware-eam vmware-rhttpproxy vmware-sca vmware-sts-idmd vmware-stsd vmware-vapi-endpoint vmware-vmon vmware-vpostgres vmware-vpxd-svcs vsphere-client
Stopped:
 applmgmt pschealth vmcam vmware-content-library vmware-imagebuilder vmware-mbcs vmware-netdumper vmware-perfcharts vmware-psc-client vmware-rbd-watchdog vmware-sps vmware-statsmonitor vmware-updatemgr vmware-vcha vmware-vpxd vmware-vsan-health vmware-vsm vsphere-ui

I’m not a certified expert in VCSA, but this doesn’t look good. Too many stopped services.
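With this many services, the interesting part is whether a specific service (vmware-vpxd, for example) is in the Stopped list. A minimal sketch for splitting the output into lists follows; it assumes the layout shown above, where a header line ending in a colon is followed by a line of space-separated service names.

```python
def parse_status(output):
    """Split 'service-control --status --all' output into {state: [services]}."""
    states = {}
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and " " not in stripped:
            current = stripped[:-1]       # header such as "Running:"
            states[current] = []
        elif stripped and current is not None:
            states[current].extend(stripped.split())
    return states
```

After saving the command output to a string, "vmware-vpxd" in parse_status(text).get("Stopped", []) tells you whether the vCenter service itself is down.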

So, I gave it a try to see if I could start them by running

service-control --start --all

The next output is a long one, but basically it checks which services are up and starts the ones which are stopped:

root@vcsa [ ~ ]# service-control --start --all
Perform start operation. vmon_profile=ALL, svc_names=None, include_coreossvcs=True, include_leafossvcs=True
2020-04-06T17:31:57.180Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'lwsmd']
2020-04-06T17:31:57.185Z   Done running command
2020-04-06T17:31:57.188Z   Service lwsmd does not seem to be registered with vMon. If this is unexpected please make sure your service config is a valid json. Also check vmon logs for warnings.
2020-04-06T17:31:57.188Z   Running command: ['/sbin/service', u'lwsmd', 'status']
2020-04-06T17:31:57.213Z   Done running command
Successfully started service lwsmd
2020-04-06T17:31:57.217Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmafdd']
2020-04-06T17:31:57.589Z   Done running command
2020-04-06T17:31:57.593Z   Service vmafdd does not seem to be registered with vMon. If this is unexpected please make sure your service config is a valid json. Also check vmon logs for warnings.
2020-04-06T17:31:57.593Z   Running command: ['/sbin/service', u'vmafdd', 'status']
2020-04-06T17:31:57.617Z   Done running command
Successfully started service vmafdd
2020-04-06T17:31:57.621Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmdird']
2020-04-06T17:31:57.627Z   Done running command
2020-04-06T17:31:57.630Z   Service vmdird does not seem to be registered with vMon. If this is unexpected please make sure your service config is a valid json. Also check vmon logs for warnings.
2020-04-06T17:31:57.630Z   Running command: ['/sbin/service', u'vmdird', 'status']
2020-04-06T17:31:57.654Z   Done running command
Successfully started service vmdird
2020-04-06T17:31:57.657Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmcad']
2020-04-06T17:31:57.663Z   Done running command
2020-04-06T17:31:57.667Z   Service vmcad does not seem to be registered with vMon. If this is unexpected please make sure your service config is a valid json. Also check vmon logs for warnings.
2020-04-06T17:31:57.667Z   Running command: ['/sbin/service', u'vmcad', 'status']
2020-04-06T17:31:57.690Z   Done running command
Successfully started service vmcad
2020-04-06T17:31:57.694Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmware-sts-idmd']
2020-04-06T17:31:57.700Z   Done running command
2020-04-06T17:31:57.703Z   Service vmware-sts-idmd does not seem to be registered with vMon. If this is unexpected please make sure your service config is a valid json. Also check vmon logs for warnings.
2020-04-06T17:31:57.703Z   Running command: ['/sbin/service', u'vmware-sts-idmd', 'status']
2020-04-06T17:31:57.727Z   Done running command
Successfully started service vmware-sts-idmd
2020-04-06T17:31:57.730Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmware-stsd']
2020-04-06T17:31:57.736Z   Done running command
2020-04-06T17:31:57.739Z   Service vmware-stsd does not seem to be registered with vMon. If this is unexpected please make sure your service config is a valid json. Also check vmon logs for warnings.
2020-04-06T17:31:57.740Z   Running command: ['/sbin/service', u'vmware-stsd', 'status']
2020-04-06T17:31:57.763Z   Done running command
Successfully started service vmware-stsd
2020-04-06T17:31:57.767Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmdnsd']
2020-04-06T17:31:57.773Z   Done running command
2020-04-06T17:31:57.777Z   Service vmdnsd does not seem to be registered with vMon. If this is unexpected please make sure your service config is a valid json. Also check vmon logs for warnings.
2020-04-06T17:31:57.777Z   Running command: ['/sbin/service', u'vmdnsd', 'status']
2020-04-06T17:31:57.801Z   Done running command
Successfully started service vmdnsd
2020-04-06T17:31:57.805Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmware-psc-client']
2020-04-06T17:31:57.812Z   Done running command
2020-04-06T17:31:57.815Z   Service vmware-psc-client does not seem to be registered with vMon. If this is unexpected please make sure your service config is a valid json. Also check vmon logs for warnings.
2020-04-06T17:31:57.815Z   Running command: ['/sbin/service', u'vmware-psc-client', 'status']
2020-04-06T17:31:57.839Z   Done running command
2020-04-06T17:31:57.843Z   Running command: ['/usr/bin/systemctl', 'daemon-reload']
2020-04-06T17:31:57.927Z   Done running command
2020-04-06T17:31:57.927Z   Running command: ['/usr/bin/systemctl', 'set-property', u'vmware-psc-client.service', 'MemoryAccounting=true', 'CPUAccounting=true', 'BlockIOAccounting=true']
2020-04-06T17:31:57.943Z   Done running command
Successfully started service vmware-psc-client
Service-control failed. Error Failed to start vmon services.vmon-cli RC=1, stderr=Failed to start statsmonitor services. Error: Operation timed out

The last line above is not too encouraging; the “failed” keyword is not something I wanted to see in the output. I thought my attempt didn’t work.

However, checking the service status again, I saw the following:
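Since the output is long, the two things worth extracting are the services reported as started and the final error, if any. The sketch below does that with two patterns taken from the wording of the output above; other VCSA versions may phrase these lines differently.

```python
import re

# Both patterns follow the wording of the output shown above.
STARTED = re.compile(r"^Successfully started service (\S+)$", re.MULTILINE)
FAILED = re.compile(r"^Service-control failed\. Error (.+)$", re.MULTILINE)


def summarize_start(output):
    """Return (list of started services, error text or None)."""
    started = STARTED.findall(output)
    failure = FAILED.search(output)
    error = failure.group(1) if failure else None
    return started, error
```

An error in the result does not necessarily mean nothing was started, so it is still worth checking the service status afterwards.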

root@vcsa [ ~ ]# service-control --status --all
Running:
 applmgmt lwsmd pschealth vmafdd vmcad vmdird vmdnsd vmonapi vmware-cis-license vmware-cm vmware-content-library vmware-eam vmware-perfcharts vmware-psc-client vmware-rhttpproxy vmware-sca vmware-sps vmware-sts-idmd vmware-stsd vmware-updatemgr vmware-vapi-endpoint vmware-vmon vmware-vpostgres vmware-vpxd vmware-vpxd-svcs vmware-vsan-health vmware-vsm vsphere-client vsphere-ui
Stopped:
 vmcam vmware-imagebuilder vmware-mbcs vmware-netdumper vmware-rbd-watchdog vmware-statsmonitor vmware-vcha

This was for sure better than before.

I gave it a try by opening https://vcsa.local.domain and there it was, the webpage working fine.

I’m not sure exactly why some services did not start properly after the VCSA restart, but it seems that a kick will do the job.