SOLVED: Hardware Agent Rebooting Status
-
We have been using this hardware agent for about a week now and it is working great. We were able to narrow down a couple of internal and service provider issues. I noticed this morning, after a switch reboot, that the agent is reporting "Your agent is REBOOTING". It is still reporting all metrics properly, but I no longer see the heartbeat/status. Everything else is reporting: speed tests, hops, pings, outages, stats, temp/environmental info. Is this cosmetic, or how should I resolve it?
-
Hi Jon,
Let me get a dev on this right now and we'll take a look. It seems cosmetic as the agent appears to be communicating and sending updates.
I'll report back shortly. FYI, the dev may purposely reboot the device in the meantime.
-
@OutagesIO_Support
Sounds good. No rush right now because we are still collecting the data. Thanks for your help!
-
@jonmill1234
Hi Jon,
Can you please tell me what time (in your local timezone) the switch was rebooted?
I am asking since:
- the HW agent reboots every day to check for OTM or firmware updates
- this changes the status to REBOOTING to avoid false inactive notifications
- when all the checks are over it sends a confirmation message that all is ok
- that confirmation is missing
I need to be sure that this was happening while the switch was rebooted before I dive deeper to check if there is a bug.
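To make the sequence above concrete, here is a minimal sketch of how that status handling could be modelled. It is an illustration only, not the actual agent or server code: the class name `AgentStatus`, its methods, and the 30-minute grace period are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class AgentStatus:
    """Hypothetical server-side view of one agent's status."""
    state: str = "ACTIVE"
    rebooting_since: Optional[datetime] = None

    def on_reboot_started(self, now: datetime) -> None:
        # The daily reboot flips the status so no inactive alert is raised.
        self.state = "REBOOTING"
        self.rebooting_since = now

    def on_confirmation(self) -> None:
        # The "all is ok" message after the update checks restores the status.
        self.state = "ACTIVE"
        self.rebooting_since = None

    def is_stuck(self, now: datetime,
                 grace: timedelta = timedelta(minutes=30)) -> bool:
        # Still REBOOTING after the (assumed) grace period means the
        # confirmation message was probably lost.
        return (self.state == "REBOOTING"
                and self.rebooting_since is not None
                and now - self.rebooting_since > grace)


status = AgentStatus()
reboot_time = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)  # example value
status.on_reboot_started(reboot_time)
print(status.is_stuck(reboot_time + timedelta(hours=1)))
```

In this model, the situation in this thread is the case where `on_confirmation` is never called, so the status stays at REBOOTING even though metrics keep arriving.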
-
@jonmill1234
I need to dive into this, since the agent's daily reboot happens at 3:00 am UTC.
I will get back to you as soon as I have more info.
-
Hi,
I just checked and now the status is back.
Yes, this is because we sent it another reboot command to see if it would do the same thing again, and it didn't.
What you are seeing should be addressed in the next agent version release.
Initially, we thought that the multi-threaded part of the agent code, which handles a number of simultaneous functions, might be failing to send some data now and then.
However, situations like this one help to confirm what may actually be happening: if the agent is not able to send something for whatever reason, it may give up, and the data is eventually lost if it doesn't retry.
It is supposed to always retry, but something in the libraries or the method we are using prevents the retry now and then. It could even be CPU overload causing the retry thread to be lost.
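For anyone curious what "always retry" could look like, below is a rough sketch of one common approach: keep each message in a queue and remove it only after a confirmed send, with exponential backoff between attempts. This is not the agent's actual code; `flush_with_retry`, its parameters, and the message names are made up for illustration.

```python
import time
from collections import deque
from typing import Callable, Deque


def flush_with_retry(queue: Deque[str],
                     send: Callable[[str], bool],
                     max_attempts: int = 5,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Send queued messages in order and return how many were delivered.

    A message leaves the queue only after send() reports success, so a
    failed attempt never drops data; it stays queued for the next flush.
    """
    delivered = 0
    while queue:
        message = queue[0]  # peek; do not remove until the send succeeds
        for attempt in range(max_attempts):
            try:
                ok = send(message)
            except OSError:
                ok = False  # treat network errors like a failed send
            if ok:
                queue.popleft()
                delivered += 1
                break
            sleep(base_delay * 2 ** attempt)  # back off: 1s, 2s, 4s, ...
        else:
            # All attempts failed: stop now and keep the rest queued.
            break
    return delivered


# Example with a hypothetical sender that fails twice before succeeding.
attempts = {"count": 0}

def flaky_send(message: str) -> bool:
    attempts["count"] += 1
    return attempts["count"] > 2

pending = deque(["reboot-confirmed", "heartbeat"])
print(flush_with_retry(pending, flaky_send, sleep=lambda seconds: None))
```

The point of the design is that the queue, not the sending thread, owns the data, so a lost or overloaded thread would delay a message rather than lose it.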
The new version should be ready for internal testing sometime this week. How long that testing takes depends on what problems come up, as this is a rather heavy rewrite.
-
@OutagesIO_Support
Excellent! Thank you so much for your help and I'm glad I was able to get you more data to analyze for the bugfix. Let me know if there is anything else you need from me, otherwise I'll be looking forward to the firmware update. Thanks again!