OcNOS DC : Troubleshooting Guide : System Management
System Management
This chapter contains steps to resolve system management issues.
 
Symptom/Cause
Solution
Telnet/SSH service not available
All remote access is disabled while the node boots up. The xinetd service starts when hostpd starts.
Make sure that hostpd is running (or is started during the board initialization sequence) and that the xinetd service is running. At the Linux prompt, run the following command and verify the listening sockets:
 
ip netns exec zebosfib1 netstat -tpln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:705 0.0.0.0:* LISTEN 30044/snmpd
tcp 0 0 0.0.0.0:199 0.0.0.0:* LISTEN 30044/snmpd
tcp6 0 0 :::22 :::* LISTEN 29997/xinetd
tcp6 0 0 :::23 :::* LISTEN 29997/xinetd
tcp6 0 0 :::830 :::* LISTEN 29997/xinetd
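The check above can be automated. The following Python sketch parses "netstat -tpln" output, passed in as text, and reports which of the SSH (22) and Telnet (23) ports xinetd is not listening on. The column layout is taken from the sample output above. The function names are hypothetical helpers and are not part of OcNOS.

```python
from __future__ import annotations


def listening_ports(netstat_output: str, program: str = "xinetd") -> set[int]:
    """Return the TCP ports on which `program` is in the LISTEN state."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
        if len(fields) < 7 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        if fields[6].split("/", 1)[-1] != program:
            continue
        # The port follows the last colon, for both 0.0.0.0:22 and :::22.
        ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports


def missing_remote_access_ports(netstat_output: str, expected=(22, 23)) -> set[int]:
    """Return the expected SSH/Telnet ports that xinetd is not listening on."""
    return set(expected) - listening_ports(netstat_output)
```

On the device, the text could be captured with subprocess.run(["ip", "netns", "exec", "zebosfib1", "netstat", "-tpln"], capture_output=True, text=True).stdout. An empty result from missing_remote_access_ports means xinetd is serving both ports.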
 
Failure to authenticate a user
If the basic Linux authentication files for a user are missing or corrupted, login to the node is denied. As the root user on the console, make sure that the /etc/passwd file has an entry for the user who is trying to log in. For more details about such failures, look for authentication errors in /var/log/messages.
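The /etc/passwd check can be sketched in Python as follows. The function reads the standard seven-field passwd format (name:password:UID:GID:GECOS:home:shell) and treats a line with a non-numeric UID or GID as corrupted. The helper name and the user name "admin" in the usage line are hypothetical.

```python
def passwd_entry(passwd_text: str, user: str):
    """Return the /etc/passwd fields for `user` as a dict, or None if there is no valid entry."""
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # A well-formed entry has 7 fields: name:password:UID:GID:GECOS:home:shell
        if len(fields) != 7 or fields[0] != user:
            continue
        if not (fields[2].isdigit() and fields[3].isdigit()):
            continue  # corrupted UID/GID; treat as no usable entry
        return {"name": fields[0], "uid": int(fields[2]), "gid": int(fields[3]),
                "home": fields[5], "shell": fields[6]}
    return None
```

Usage from the root console: with open("/etc/passwd") as f: print(passwd_entry(f.read(), "admin")). A result of None means the user has no valid entry.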
Remote access to the node via telnet/ssh hangs
The imish/cmlsh shell is configured for all OcNOS users except root, which can log in only through the console. If the imi or cmld module is not responding, no imish/cmlsh prompt appears after a successful login.
 
The system monitoring module (pservd) restarts hung modules, recovering from hang states in one or more modules. To see the actions taken by the system monitoring module, check the core directory (/var/log/crash/cores) and the syslog messages in /var/log/messages.
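To see which modules crashed most recently, a Python sketch like the one below can list the newest files in the core directory. It assumes that core files are regular files placed directly in /var/log/crash/cores. Their naming scheme is not documented here, so the sketch orders them by modification time only.

```python
from pathlib import Path


def recent_cores(core_dir: str = "/var/log/crash/cores", limit: int = 10) -> list:
    """Return up to `limit` (file name, modification time) pairs from core_dir, newest first."""
    path = Path(core_dir)
    if not path.is_dir():
        return []
    entries = [(p.name, p.stat().st_mtime) for p in path.iterdir() if p.is_file()]
    entries.sort(key=lambda e: e[1], reverse=True)
    return entries[:limit]
```

Compare the timestamps with the pservd-related entries in /var/log/messages to match each core file to a restart.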
Continuous restart of any module
If a module restarts continuously, disable monitoring of that module with:
 
no software-watchdog <module name>
 
If the NSM/HSL module crashes or hangs, the system reboots.
 
The system does not reboot automatically if the previous two reboots were caused by HSL or NSM crashes within the first few minutes after board bootup. This prevents continuous reboots of the system caused by NSM/HSL crashes.
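The automatic-reboot rule described above can be restated as a small Python model for reasoning purposes. It does not reproduce how OcNOS implements the rule. The guide only says "the initial few minutes" of bootup, so the 300-second window and the RebootRecord structure are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical value: the guide only says "the initial few minutes of board boot up".
EARLY_CRASH_WINDOW_S = 300


@dataclass
class RebootRecord:
    cause: str                 # e.g. "NSM", "HSL", or another cause
    seconds_after_boot: float  # how long the board had been up when the crash occurred


def is_early_nsm_hsl_crash(record: RebootRecord, window_s: float) -> bool:
    return record.cause in ("NSM", "HSL") and record.seconds_after_boot <= window_s


def should_auto_reboot(history: list, window_s: float = EARLY_CRASH_WINDOW_S) -> bool:
    """Restate the documented rule: do not reboot automatically when the previous
    two reboots were both caused by early NSM/HSL crashes."""
    last_two = history[-2:]
    if len(last_two) == 2 and all(is_early_nsm_hsl_crash(r, window_s) for r in last_two):
        return False
    return True
```

In this model, one early crash followed by a later, unrelated one still allows an automatic reboot. Only two consecutive early NSM/HSL crashes stop the reboot loop.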
 
The only way to disable this behavior is to stop the pservd service.
 
If the pservd module itself hangs, it is restarted in 5 minutes.
Deleting ZebOS.conf loses the management IP address
During ONIE installation, if you do not configure a static IP address, OcNOS boots and gets an IP address for eth0 (management port) through DHCP and updates the /etc/network/interfaces file. Once you configure a static IP address from the OcNOS command line and save the configuration, OcNOS updates /etc/network/interfaces and changes the method used to configure eth0 from dhcp to static.
In this scenario, if you delete ZebOS.conf, then the management IP address is lost and you can only recover management access by assigning an IP address via the console.
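Before you delete ZebOS.conf, you can check which method /etc/network/interfaces currently uses for eth0. The Python sketch below assumes that the file follows the standard Debian ifupdown syntax ("iface <name> inet <method>"). The guide does not show the file's exact contents, so treat that syntax as an assumption.

```python
def iface_method(interfaces_text: str, iface: str = "eth0", family: str = "inet"):
    """Return the method ("dhcp", "static", ...) of the last matching
    'iface <name> <family> <method>' stanza, or None if there is none."""
    method = None
    for raw in interfaces_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(fields) >= 4 and fields[:3] == ["iface", iface, family]:
            method = fields[3]
    return method
```

A result of "static" means that the management address was set from the OcNOS command line and saved. In that case, keep console access available before you remove ZebOS.conf.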
sys-update install <installer> failure
No free space left on system. Minimum 1 GB space is needed: remove some files so that more than 1 GB is available on the device.
Binaries not compatible with the board: use the correct installer file for the board.
Installer not downloaded properly, try again: the downloaded installer file is incomplete.
Source Interface not found.
OcNOS version you are trying to upgrade is already Installed: the same version is already installed, so no upgrade is needed.
File not found on board: the installer file is not present on the board at the given path; provide a valid path to the installer file.
File not found on server: the installer file is not present on the server given in the link; provide a valid link to the installer file.
Server connection timed out: the server did not respond within 60 seconds.
Unsupported protocol: only the ftp, http, tftp, and file protocols are supported.
Invalid installer: the installer file is not valid.
% Source interface is not up: ensure that the source interface is up.
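The free-space requirement can be checked before you run sys-update. The Python sketch below uses shutil.disk_usage. It assumes that "1 GB" means 2**30 bytes and that the root file system is the one that matters. The guide states neither, so pass a different path or threshold if needed.

```python
import shutil

MIN_FREE_BYTES = 1 << 30  # 1 GB, assumed here to mean 2**30 bytes


def has_space_for_sys_update(path: str = "/", min_free: int = MIN_FREE_BYTES) -> bool:
    """Return True if the file system holding `path` has more than `min_free` bytes free."""
    return shutil.disk_usage(path).free > min_free
```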
 
Note: If the sys-update operation stops without reporting an error, check that the device has IP reachability to the server from which the installer file is downloaded.
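A basic reachability test for the installer server can be sketched in Python as a TCP connection attempt with the same 60-second limit that sys-update reports. This is only an approximation of what sys-update does. TFTP runs over UDP, so the sketch rejects tftp URLs instead of giving a misleading answer. It also does not use the source interface that sys-update may be configured with.

```python
import socket
from urllib.parse import urlsplit

# Well-known TCP ports; TFTP is UDP-based, so a TCP connect test does not apply to it.
DEFAULT_TCP_PORTS = {"http": 80, "ftp": 21}


def server_reachable(url: str, timeout: float = 60.0) -> bool:
    """Return True if a TCP connection to the server named in `url` succeeds within `timeout` s."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_TCP_PORTS.get(parts.scheme)
    if not parts.hostname or port is None:
        raise ValueError(f"cannot derive a TCP host and port from {url!r}")
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False
```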
sys-update install <deb package> failure
No free space left on system. Minimum 1 GB space is needed: remove some files so that more than 1 GB is available on the device.
Unsupported protocol: only the ftp, http, tftp, and file protocols are supported.
Unsupported OCNOS image format (need:*.deb): the package file name must have the form <filename>.deb.
Kernel changes are present in this version, sysupdate not possible: upgrade using the installer.
Binaries not compatible with the board: use the correct installer file for the board.
OcNOS version you are trying to upgrade is already Installed: the same version is already installed, so no upgrade is needed.
Non-ZEBM to ZEBM upgrade using *.deb not allowed: use the installer for a non-ZEBM to ZEBM upgrade.
ZEBM to non-ZEBM upgrade using *.deb not allowed: use the installer for a ZEBM to non-ZEBM upgrade.
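Two of the errors above (unsupported protocol, and a package name that does not end in .deb) can be caught before you run the command. The Python sketch below parses the source as a standard URL and applies only the rules stated in this guide. The exact URL syntax that sys-update accepts is not documented here, so treat this as an approximation.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"ftp", "http", "tftp", "file"}  # per the "Unsupported protocol" message


def precheck_source(url: str, deb: bool = False) -> list:
    """Return a list of problems found in a sys-update source URL (empty if none)."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        problems.append(f"unsupported protocol: {parts.scheme or '(none)'}")
    name = PurePosixPath(parts.path).name
    if not name:
        problems.append("URL does not name a file")
    elif deb and not name.endswith(".deb"):
        problems.append(f"not a .deb package name: {name}")
    return problems
```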
Inserting incompatible transceivers may disable Device Monitoring (DDM) and may cause the board to hang.
If inserting incompatible transceivers disables device monitoring and possibly hangs the board, two recovery methods are available:
Remove the incompatible transceiver
Power-cycle the board (a reboot is not sufficient)
When this issue occurs, rebooting the system or restarting CMMd can lead to undefined behavior, including an inaccessible system.
If an incompatible transceiver that triggers this issue is present when the system first boots after an upgrade, the system might be inaccessible after the upgrade.
License Troubleshooting
Note: If you install OcNOS version 1.3.8 (or later) on a device for the first time and then activate the license, the activated license is deactivated if you later install any version earlier than 1.3.8. To recover, activate/install the license again.
 
Symptom/Cause
Error
Solution
Failure: license get <url> / license refresh
License file (IPI-DEVICEID.bin) Not Found
The license is not present on the system. Use "license get <url>" to install the license.
License installation failed due to incorrect Device ID in the License file, please use the relevant device specific license file
The downloaded license is not for this device and has been removed. Use "license get <url>" to install the correct license.
The allowed time to process response has expired
The lifetime of the license file has expired. This is not an actual license expiration, and the lifetime value is not visible to the user. Download the license from the FNO portal again using "license get <url>".
 
Note: The Lifetime field is the lifetime of the capability response, in seconds, after which the response is considered “stale” and cannot be processed by the client or server. IP Infusion sets the lifetime to 3628800 seconds (42 days). If a capability response is created and held for more than this period (42 days) without being installed, it becomes stale and the target device cannot process it.
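The staleness rule in the note above is a simple time comparison, sketched below in Python with the stated lifetime of 3628800 seconds (42 days). The creation time of a capability response is not shown to the user, so the created_at argument is a hypothetical input used only to reason about the rule.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RESPONSE_LIFETIME = timedelta(seconds=3628800)  # 42 days, as stated in the note


def is_stale(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True if a capability response created at `created_at` has outlived its lifetime."""
    now = now or datetime.now(timezone.utc)
    return now - created_at > RESPONSE_LIFETIME
```

For example, a response generated on 1 January and installed on 10 February is still valid (40 days). The same response installed on 15 February is stale (45 days).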
Response is out of order with previous responses; in addition, "show license" does not reflect the new license features.
A license that was downloaded more recently than the current license is already installed. Once this error occurs, reinstalling either of these two licenses does not help. Download a fresh license from the portal and install it using the "license get <url>" command.
Failed to create trusted storage
Remove the contents of /cfg/license/, then install the license using the "license get <url>" command.
Invalid license file
The license file might be corrupt. Download the license from the FNO license portal and install it using "license get <url>". If it still fails, compare the checksum of the license file in /cfg/license/bin/ with that of the file downloaded from the FNO portal.
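The checksum comparison can be done with Python's hashlib. The guide does not say which checksum algorithm to use, so this sketch defaults to SHA-256. You can pass another hashlib algorithm name, such as "md5", if needed. The file paths in the usage line are assumptions that combine the directory and file name mentioned in this guide.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256") -> str:
    """Hex digest of a file, read in chunks so large files need little memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def same_license_file(on_device: str, downloaded: str, algorithm: str = "sha256") -> bool:
    """True if both copies of the license file have the same digest."""
    return file_digest(on_device, algorithm) == file_digest(downloaded, algorithm)
```

Usage (paths assumed): same_license_file("/cfg/license/bin/IPI-DEVICEID.bin", "IPI-DEVICEID.bin"). A False result means the copy on the device differs from the download.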
Start date for the license is in the future
Correct the system clock and issue the "license refresh" command to install the license.
Empty license file
Download the license from the FNO license portal and install it using "license get <url>".
Failed to process capability response / Failed to process the license file
Remove the contents of /cfg/license/, then install the license using the "license get <url>" command.
The "license get" command does not install the given license file; instead, it processes the old license and fails.
Correct the system clock and issue "license get <url>" to install the license again.
 
License is not matching with device software
The SKU of the license file is not compatible with the device software. Map the correct SKU, then generate and install the license.
 
Empty license response received: license is not mapped with SKU or the license server exhausted its limit
Select a SKU when generating the license on the FNO license portal, or increase the license pool on the license server to accommodate more devices.
Zero Touch Provisioning
 
Symptom/Cause
Solution
ONIE Image/IP address not fetched from DHCP server
Ensure that the DHCP server is up and reachable from the device.
ONIE Image/IP address not fetched from DHCP server
Ensure that the DHCP server configuration file has the correct information for this device. Check the following in the dhcpd.conf file:
The device MAC address is correct, if MAC address-based configuration is used.
The device VCI (vendor class identifier) in the DHCP configuration file is correct. Use the onie-sysinfo command to check it.
The syntax and values of the DHCP options used by this device are correct.
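The first two checks in the list above can be partly automated. The Python sketch below assumes an ISC-style dhcpd.conf, in which host entries declare "hardware ethernet <mac>;". It only tests whether the device MAC appears in such a statement and whether the VCI string reported by onie-sysinfo appears anywhere in the file. It does not validate option syntax, so still review the file by hand.

```python
import re

_HW_ETHERNET = re.compile(r"hardware\s+ethernet\s+([0-9A-Fa-f:]+)\s*;")


def check_dhcpd_conf(conf_text: str, mac: str, vci: str) -> dict:
    """Report whether dhcpd.conf text declares the device MAC in a
    'hardware ethernet' statement and mentions the device VCI string."""
    # Drop '#' comments so commented-out entries are not counted.
    text = "\n".join(line.split("#", 1)[0] for line in conf_text.splitlines())
    macs = {m.lower() for m in _HW_ETHERNET.findall(text)}
    return {"mac_found": mac.lower() in macs, "vci_found": vci in text}
```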
Error: license & config already exist on device. Skipping ZTP provided data check.
The device already has an existing license and configuration, so the information provided by ZTP is discarded.
Error: Lease info from DHCP server not found. Skipping
Ensure that the DHCP server is up and reachable. Once the device is up, install the license and then the configuration manually.
Error: Unable to download the startup config mentioned at ZTP/DHCP server!
Ensure that the configuration file path provided by ZTP is reachable and has download permission. Once the device is up, install the license and then the configuration manually.
Error: Unable to download the license provisioned through ZTP/DHCP server!
Ensure that the license file path provided by ZTP is reachable and has download permission. If the license path was provided in the DHCP server configuration, ensure that a license file with this Device ID exists. Once the device is up, install the license and then the configuration manually.
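To confirm that a configuration or license path provided by ZTP is reachable and downloadable, you can try to read its first byte. The Python sketch below uses urllib.request, which handles http, https, ftp, and file URLs but not tftp. It is only an approximation of how the ZTP agent downloads files, and any URL you pass to it is your own example.

```python
import urllib.request


def can_download(url: str, timeout: float = 10.0) -> bool:
    """Return True if the first byte of `url` can be read (reachable and permitted)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1)
        return True
    except (OSError, ValueError):  # URLError and HTTPError are OSError subclasses
        return False
```

A False result for the URL in the DHCP server configuration points to a path, permission, or reachability problem on the file server.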
Error: eth0 is not configured using dhcp
The eth0 IP address was not obtained dynamically from the DHCP server. Install the license and configuration manually.
Error: ZTP provided config didn't get applied successfully
Ensure that a valid license was installed or provided by the ZTP server.