2.5Gbps Hardware Agent Repeatedly "Rebooting", Missing Data
-
I recently got a 2.5Gbps hardware agent to replace my old yellow agent.
The new hardware agent seems to be constantly rebooting, as marked on the dashboard. Sometimes it seems like it is actually rebooting; mostly it seems the dashboard is just incorrect. This "rebooting" can last well over 60 minutes. That amount of time offline due to random "reboots" makes it much less useful for any reliable monitoring.
However, it also seems not to be logging times when I know the internet was offline, like when I swapped cables or when I rebooted the router.
Maybe I have it set up incorrectly? There were no setup instructions included, only the activation email, and activation was successful.
So far, this new hardware agent has been a huge disappointment, but I am hoping it is just due to user error.
-
Hi,
Thanks for reporting the problem.
First, no setup instructions are needed; you just connect it to a LAN port and power. Once it is online, you activate it into your account, which you have done, so you're fine on all of that. Activating basically just means telling OutagesIO that this is your agent, so it goes into your account/inventory.
Correct, the agent should not be rebooting or showing rebooting all the time. This means that it is not fully communicating so we'll have to look into that with you to find the cause.
The fact that it's not sending a response to OutagesIO to confirm it's done rebooting and back online is what tells me that.
I'll contact our lead developer, Ed, about this to see if he can help you.
On the outages not being reported, what you explained is the correct behavior. Pulling a cable would cause the agent status to go Inactive and eventually Disconnected, but since there was no actual IP outage, it would not send anything.
You can, however, set your notifications so that you are notified if the agent becomes Inactive or Disconnected.
Assuming it was not power cycled, and assuming there was an actual IP outage, it would send that report once it comes back online.
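As a rough illustration of the distinction described above, here is a minimal Python sketch. The status names (Inactive, Disconnected) come from this thread; the function names, the inputs, and the time thresholds are assumptions invented for the example and are not OutagesIO's actual logic.

```python
from enum import Enum


class AgentStatus(Enum):
    ACTIVE = "Active"
    INACTIVE = "Inactive"
    DISCONNECTED = "Disconnected"


# Hypothetical thresholds, chosen only for illustration.
INACTIVE_AFTER_S = 120
DISCONNECTED_AFTER_S = 900


def agent_status(seconds_since_last_contact: float) -> AgentStatus:
    """Map the time since the agent last reached the service to a status."""
    if seconds_since_last_contact >= DISCONNECTED_AFTER_S:
        return AgentStatus.DISCONNECTED
    if seconds_since_last_contact >= INACTIVE_AFTER_S:
        return AgentStatus.INACTIVE
    return AgentStatus.ACTIVE


def should_report_outage(ip_outage_detected: bool, power_cycled: bool) -> bool:
    """An outage report goes out after reconnecting only if the agent saw an
    actual IP outage and was not power cycled in the meantime."""
    return ip_outage_detected and not power_cycled
```

Under these assumptions, a pulled LAN cable changes the status (for example, `agent_status(300)` is Inactive) but `should_report_outage(False, False)` stays false, which matches the behavior described: a status change without an outage record.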
Here are a few things that can help explain how statuses, disconnections and outages are handled. Ed will respond as soon as he can.
https://support.outagesio.com/topic/112/important-cable-and-wireless-internet-services
https://support.outagesio.com/topic/12/about-internet-outages-alerts-and-agent-statuses
We can tell when we have something problematic because we would get a lot of complaints. Agents are manually prepared and fully tested before they are shipped. I'm sure there is nothing wrong with the agent itself, but something is causing a problem which we need to dig into.
We appreciate when people take the time to let us know about problems because our number one priority is always making sure the service works. We spend too much of our lives on this service to offer it broken, so we are always eager to fix any problems that come up.
-
Notifications were turned on for both email and dashboard for agent inactive. When I looked at the dashboard I saw agent inactive, but now a couple hours later, it looks like there was never any alert logged on the dashboard or any sent for being inactive. Still wondering if maybe I have something set up wrong.
-
You mentioned seeing the status being 'rebooting' for a long time. Can you give us any more details about this?
The normal reboot is in the middle of the night and only takes 30 seconds or so. We do this to make sure there are no memory leaks but also to have a chance to recover agents if something ever goes wrong.
As for the notifications not being sent, we just noticed that too since your agent was Inactive a moment ago. We see your settings are set to get those.
Still digging into this because we want to solve problems so again, thanks for taking the time to let us know.
-
Hi,
Here is an update.
Thankfully, you took the time to report what you were seeing and sure enough, we've found some bugs that need attention.
First, we found a bug that started in mid-December and caused notifications to never go out, but only for certain agents. Hard to believe that no one noticed or took the time to let us know about that, but it should be fixed now. If it's not, at least with a bit more feedback, it'll get resolved.
Second, there is indeed an odd problem going on with these new agents and again, not all of them but yours is affected. While it does reboot nightly as it should, it is also being sent a command that causes the monitoring code to keep running but the status to show Rebooting.
We worked on this all of last night trying to find out where this is coming from, what is sending it and why but didn't find anything. We stopped by leaving additional logging in place to get some leads which I see we did capture overnight.
The investigating will continue in an hour or so and I'll update this as we find or solve it.
If you see anything unusual that you've not reported, please feel free to mention it.
-
Thanks for looking into this so quickly!
Not sure if it makes any difference in terms of what the monitoring and outage reporting looks like, but this is currently installed on a network where the ISP is WeLink, which uses CG-NAT and also may be more subject to signal problems than a fiber or cable connection would be. This WeLink rooftop-antenna-based wireless high-speed (2 Gbps) internet may be generating more events / different events than typical installations would see.
-
The setup is unrelated to the problems you've reported. Once we've resolved those issues, we can try to help with what you're seeing.
-
Just so you know, we are still monitoring this. Since the problem is not clear, we're trying tests which take a while to confirm.
For example, we've updated your agent to the newest version, which is not released yet, to see if that might help, among other things.
We do see that the agent gets a fair number of disconnections which are not IP outages.
Still on this.
-
Update.
Day two of looking into this. We spent a large portion of the day trying to understand why this is happening to your agent only.
There is barely a pattern but we can tell that the main program stops communicating after a while and eventually, just comes back. In the meantime, another program keeps communicating but that one does not monitor the Internet.
Your agent is running the same firmware as all the others and the same agent version, yet yours has this odd problem of the main program not communicating. That said, while it reports the same version, it could still be a slightly different build that somehow was flashed on yours by mistake.
One thing that came up. Are you using the power supply that came with the agent or another one?
Second, is there any chance you can give us remote SSH access to this agent? You can lock it down to one of our IPs of course, but being able to look on the agent itself might show us something we simply cannot see remotely.
If SSH is possible, Ed would get into a chat with you tomorrow to get the info and would then take a look.
-
I am using the power supply that came with the 2.5Gbps agent.
I still think there is something odd going on with the ISP (WeLink). Other devices on the LAN exhibit problems too when using WeLink, which I think are an artifact of WeLink service.
So, I have switched connection back to AT&T fiber for now, starting about 1715 Central 13 Jan.
I have also left a yellow hardware agent (129878) and a software agent (131236) running on the same LAN in case that helps.
I would need specific step-by-step instructions to enable SSH access to the 2.5Gbps hardware agent.
-
For SSH access, you'd have to look up whatever firewall you are using and create a rule that allows a remote IP to reach a LAN device (port forwarding).
Your other option would be to send it back so we can look at it.
The problem is that your ISP could be having certain problems but it would not explain the behavior we're seeing from your agent. From all we can tell, the programs are running correctly but something keeps blocking the monitoring program for long periods of time.
Both programs communicate with the same remote networks, so if this were a communications issue, both should stop communicating, but only one does.
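The reasoning above can be sketched in a few lines of Python. This is only an illustration: the two "last seen" timestamps, the five-minute gap, and the result labels are hypothetical and do not describe how the agent or the OutagesIO backend actually works.

```python
from datetime import datetime, timedelta


def is_silent(last_seen: datetime, now: datetime, max_gap: timedelta) -> bool:
    """True if nothing has been heard from a program for longer than max_gap."""
    return now - last_seen > max_gap


def diagnose(
    monitor_last_seen: datetime,
    helper_last_seen: datetime,
    now: datetime,
    max_gap: timedelta = timedelta(minutes=5),  # hypothetical threshold
) -> str:
    """Both programs share the same network path, so compare their silence."""
    monitor_silent = is_silent(monitor_last_seen, now, max_gap)
    helper_silent = is_silent(helper_last_seen, now, max_gap)
    if monitor_silent and helper_silent:
        return "connectivity"      # both quiet: consistent with a link problem
    if monitor_silent:
        return "monitor_blocked"   # link works, monitoring program does not report
    if helper_silent:
        return "helper_blocked"
    return "ok"
```

With the monitoring program silent for an hour while the other program checked in 30 seconds ago, `diagnose` returns `"monitor_blocked"`, which is the situation described for this agent.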
It's quite weird which is why gaining access to the device might give us a better understanding of what is going on.
-
BTW, I assume you noticed that the missing notifications was solved right after you brought it up so again, thanks for bringing that up.
At the moment, it seems to be something on your end, but what, we cannot tell from here. Quite challenging. SSH access to the agent, where the problem shows up, would give us a better chance of finding out what's going on.
No need to send it back, there is nothing wrong with the agent. You might notice it seems to be behaving a bit better already.
We continue testing small changes to see if those affect your agent and the odd behavior. You might notice it rebooting now and then, and if you do, that's us testing incremental changes on your agent only because we don't want to affect all of the other working agents.
We do not see these things with other agents, so whatever is happening is specific to your location or, better said, something on your network.
We are still investigating and making some progress which we'll share.
-
Hi,
I am trying to compare different behaviors within the same LAN and to do that I am asking if it would be possible to have all 3 agents 131236 (wash geek), 131232 (wash 2.5) and 129878 (wash) up and running.
For the moment I can say that both 131236 and 131232 are behaving in a similar way: i.e. they become inactive but NO outage.
Usually this means the problem is located within the LAN (your network, firewall or switching system) and not on the WAN (the ISP), but it is not always so crystal clear.
In short, some data stops being collected when the agent doesn't have access to our servers, but there is no evident "internet outage" recorded: this can happen for some of the reasons that were mentioned at the beginning of this thread (cable, signal, etc.).
Let me know if 129878 can be powered on.
-
The 3 agents are recording a similar situation on Jan 13 at around 13:33 Chicago time but at the same time are not recording any type of outage.
These are two different technologies (Windows, OpenWrt) and three different versions (the MT300 and MT3000 both run OpenWrt but use different binaries), yet they all:
- cannot identify an outage
- are recording inactive periods
- disagree on some minor timing, which can be related to the way the three agents are connected to the LAN
So the next question is: is it possible, without any specific detail, to understand if all three agents are connected the same way within the LAN (different VLANs, directly connected to the router or through a switch, different rules in the firewall)?
A simple hand-drawn picture is more than enough. As I said, no company detail is needed; I am just trying to see where the problem originates and why they behave in such a way.
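One way to make this kind of cross-agent comparison concrete is sketched below in Python. It assumes each agent's inactive events are available as start times in seconds; the function name, the data layout, and the tolerance are hypothetical, and the numbers in the usage note are made up rather than taken from these agents.

```python
def shared_events(
    events_by_agent: dict[str, list[float]], tolerance_s: float
) -> list[float]:
    """Return event times from the first agent that every other agent
    also recorded within tolerance_s seconds."""
    agents = list(events_by_agent)
    if not agents:
        return []
    reference, others = agents[0], agents[1:]
    shared = []
    for t in events_by_agent[reference]:
        # Keep t only if each other agent has some event close enough to it.
        if all(
            any(abs(t - u) <= tolerance_s for u in events_by_agent[other])
            for other in others
        ):
            shared.append(t)
    return shared
```

For example, with made-up data `{"a": [100.0, 5000.0], "b": [130.0], "c": [90.0, 9000.0]}` and a tolerance of 60 seconds, only the event at 100.0 is shared by all three. Events seen by every agent at nearly the same time point to a common upstream cause, while events seen by a single agent point to that agent's own connection.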
-
Retracted. Accidentally replied to Ed :)
-
I see @OutagesIO_Support seeming to quote/reply to @SBK and not sure what portion is directed to @SBK versus the OP (me).
The entire purpose of my getting the new 2.5Gbps hardware agent was to better monitor outages from the new ISP (WeLink) and compare them to the old ISP (AT&T Fiber). Unfortunately, there may be something else weird going on with the local network, which may or may not be an artifact of something the new ISP does (e.g. CGNAT, RF at unknown reliability vs. fiber, maybe some filtering they did not tell me about, etc.).
Since the bug that kept certain (but not all) 2.5Gbps hardware agents from reporting was quickly fixed, I am not sure what's up, but I do know I did NOT keep the LAN and WAN/ISP setup in a steady state suitable for any in-depth monitoring throughout that time, until late evening Chicago time 14 Jan. Work was underway to get internet service back online for users. But that no doubt made any troubleshooting of agents difficult.
Over those couple of days I had to swap back and forth between the old and new ISP a couple of times (eventually switching back to the original ISP, the one I had before the 2.5Gbps agent installation), added a new software agent, and put the old yellow hardware agent back online. A few, maybe several, of the times when the agent(s) ceased communication were likely caused by: the ISP tech coming to attempt to adjust their RF (which did not solve it), the ISP tech taking the connection offline, my swapping back and forth between ISPs by physically moving cables on 2 different floors and in 2 separate buildings, my moving HW agent 129878 to a different floor and a different switch, routers rebooting, etc.
The current setup, in place since late evening Chicago time 14 Jan, is this:
ISP ONT/Router BGW320-500 (in passthrough mode) 3GbE port -> 2.5GbE TPLink Deco BE25 2.5GbE -> 2.5GbE switch:
  -> HW 2.5GB Agent 131232
  -> [other devices]
  -> Win11 PC running SW Agent 131236 (and in live use by a user, so potential PC reboots or disconnects)
  -> ~30-40m cat5e cable -> another 2.5GbE switch:
    -> HW Agent 129878
    -> [other devices]
That setup should now remain stable for a month or two, until WeLink can persuade me they have solved whatever technical issues they were having and want to attempt again to supplant the current ISP (AT&T fiber), after WeLink failed to perform adequately this month.
-
Yes sorry, I replied to Ed's post in error. I was tired. You can disregard that. Sorry for the confusion.
By the way, only our dedicated hardware agents are rebooted. On Windows, only the service restarts itself nightly, not the PC. This is because PCs can also be in use by people and we would not want them to lose any work they were doing.