Track Internet disconnections, provider outages with historical data, and automated speed testing.
For Windows, Linux, ARM64, ARMa7. Learn more by visiting www.outagesio.com
2.5Gbps Hardware Agent Repeatedly "Rebooting", Missing Data
-
The setup is unrelated to the problems you've reported. Once we've resolved those issues, we can try to help with what you're seeing.
-
Just so you know, we are still monitoring this. Since the problem is not clear, we're trying tests which take a while to confirm.
For example, we've updated your agent to the newest version, which is not released yet, to see if that helps, and we are trying other things as well.
We do see that the agent gets a fair number of disconnections which are not IP outages.
Still on this.
-
Update.
Day two of looking into this. We spent a large portion of the day trying to understand why this is happening to your agent only.
There is barely a pattern, but we can tell that the main program stops communicating after a while and eventually just comes back. In the meantime, another program keeps communicating, but that one does not monitor the Internet.
Your agent is running the same firmware and the same agent version as all the others, yet yours has this odd problem of the main program not communicating. That said, while it reports the same version, it could still be a slightly different build that was somehow flashed on yours by mistake.
One thing that came up. Are you using the power supply that came with the agent or another one?
Second, is there any chance you can give us remote ssh access to this agent? You can lock it down to one of our IPs of course, but being able to look on the agent itself might show us something we simply cannot see remotely.
If ssh is possible, Ed would get into a chat with you tomorrow to get the info and would then take a look.
-
I am using the power supply that came with the 2.5Gbps agent.
I still think there is something odd going on with the ISP (WeLink). Other devices on the LAN exhibit problems too when using WeLink, which I think are an artifact of WeLink service.
So, I have switched connection back to AT&T fiber for now, starting about 1715 Central 13 Jan.
I have also left a yellow hardware agent (129878) and a software agent (131236) running on the same LAN in case that helps.
I would need specific step-by-step instructions to enable ssh access to the 2.5Gbps hardware agent.
-
For ssh access, you'd have to look up whatever firewall you are using and create a rule that allows a remote IP to reach a LAN device (port forwarding).
Your other option would be to send it back so we can look at it.
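If you do set up a forwarding rule, a quick check from outside the LAN can confirm the port is actually reachable before anyone tries to log in. Here is a minimal Python sketch of that check. The function name, the port number 22 and the address in the comment are assumptions for illustration only, not values from your setup.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (placeholder address from the documentation-only range,
# replace with your own public IP and forwarded port):
#   port_is_open("203.0.113.10", 22)
```

This only tells you that something is listening behind the forwarded port. It does not verify that the firewall rule is limited to a single source IP, so that part still has to be checked in the firewall itself.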
The problem is that your ISP could be having certain problems but it would not explain the behavior we're seeing from your agent. From all we can tell, the programs are running correctly but something keeps blocking the monitoring program for long periods of time.
Since both programs communicate with the same remote networks, then both should stop communicating if it was a communications issue but only one does.
It's quite weird which is why gaining access to the device might give us a better understanding of what is going on.
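To make the reasoning above concrete, here is a rough Python sketch of the comparison: find the periods where the monitoring program went quiet and keep only those during which the other program still checked in. The function names and the sample check-in times (plain minute counts) are made up for illustration; this is not how our backend is implemented.

```python
def silent_gaps(times, threshold):
    """Return (start, end) pairs where consecutive check-ins are more than `threshold` apart."""
    ordered = sorted(times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]


def one_sided_gaps(monitor_times, other_times, threshold):
    """Gaps in the monitoring program during which the other program still checked in."""
    return [
        (start, end)
        for start, end in silent_gaps(monitor_times, threshold)
        if any(start < t < end for t in other_times)
    ]


# Hypothetical data: the second program checks in every minute,
# while the monitoring program is quiet between minute 2 and minute 10.
other = list(range(0, 16))
monitor = [0, 1, 2, 10, 11, 12, 13, 14, 15]
print(one_sided_gaps(monitor, other, threshold=3))
```

If the link itself were down, both programs would be silent over the same window and the result would be empty. A non-empty result is the one-sided pattern we keep seeing from this agent.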
-
BTW, I assume you noticed that the missing notifications were fixed right after you brought them up, so again, thanks for bringing that up.
At the moment, it seems to be something on your end, but what, we cannot tell from here. Quite challenging. With ssh access to the agent, where the problem shows up, we would have a better chance of finding out what's going on.
No need to send it back, there is nothing wrong with the agent. You might notice it seems to be behaving a bit better already.
We continue testing small changes to see if those affect your agent and the odd behavior. You might notice it rebooting now and then, and if you do, that's us testing incremental changes on your agent only, because we don't want to affect all of the other working agents.
We do not see these things with other agents, so whatever is happening is specific to your location or, better said, to something on your network.
We are still investigating and making some progress which we'll share.
-
-
Hi,
I am trying to compare different behaviors within the same LAN and to do that I am asking if it would be possible to have all 3 agents 131236 (wash geek), 131232 (wash 2.5) and 129878 (wash) up and running.
For the moment I can say that both 131236 and 131232 are behaving in a similar way: i.e. they become inactive but record NO outage.
Usually this means the problem is located within the LAN (your network, firewall or switching system) and not on the WAN (the ISP), but it is not always so crystal clear. In short, the agent stops collecting some data when it doesn't have access to our servers, but no evident "internet outage" is recorded: this can happen for some of the reasons mentioned at the beginning of this thread (cable, signal, etc.).
Let me know if 129878 can be powered on.
-
The 3 agents are recording a similar situation on Jan 13 at around 13:33 Chicago time but at the same time are not recording any type of outage.
Two different technologies (Windows, Openwrt) and three different versions (MT300 and MT3000 even if they both are openwrt are different in binaries) but they all:
- cannot identify an outage
- are monitoring inactives
- they disagree in some minor timing, which can be related to the way the three agents are connected to the LAN
So next question is: is it possible, without any specific detail, to understand if all three agents are connected the same way within the LAN (different VLANs, directly connected to the router or thru a switch, different rules in the firewall)?
A simple hand drawn picture is more than enough; as I said, no company detail is needed, just trying to see where the problem originates and why they behave in such a way.
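For anyone who wants to repeat this kind of comparison from exported data, the idea can be sketched in Python as follows: sort the events from all agents by time and group those that fall within a small tolerance of each other. The agent IDs are the ones from this thread, but the function names, the times (seconds since midnight) and the 60-second tolerance are assumptions for illustration.

```python
def group_events(events, tolerance):
    """Group (agent_id, time) events; an event joins the current group
    if it is within `tolerance` of the previous event in that group."""
    groups = []
    for agent, t in sorted(events, key=lambda e: e[1]):
        if groups and t - groups[-1][-1][1] <= tolerance:
            groups[-1].append((agent, t))
        else:
            groups.append([(agent, t)])
    return groups


def seen_by_all(groups, agents):
    """Keep only the groups in which every listed agent appears."""
    return [g for g in groups if {a for a, _ in g} == set(agents)]


# Hypothetical events: three agents log something within seconds of each other,
# then one agent logs an unrelated event much later.
agents = ["129878", "131232", "131236"]
events = [
    ("129878", 48780),
    ("131232", 48795),
    ("131236", 48810),
    ("131232", 52000),
]
groups = group_events(events, tolerance=60)
print(seen_by_all(groups, agents))
```

Events that all agents see at nearly the same time point to something shared, such as the router or the uplink, while events seen by one agent only point to that agent's own path through the LAN.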
-
> The 3 agents are recording a similar situation on Jan 13 at around 13:33 Chicago time but at the same time are not recording any type of outage.
We are confused as to why you keep saying the agents are not reporting outages; all three have done just that.
This is your 129878;
This is your 131232;
This is your 131236;
As can be seen, outages are being logged. There was a problem with notifications which we fixed as soon as you told us about it.
Did you read the links that were sent? Outages are being recorded but most of your problems are not IP outages, they are some other kind of disconnection.
> Two different technologies (Windows, Openwrt) and three different versions (MT300 and MT3000 even if they both are openwrt are different in binaries) but they all:
The technology is the same in terms of how the agent works but the binaries are of course different since they run on different platforms.
> - cannot identify an outage
See images above.
> - are monitoring inactives
Because that is where most of the problems you are experiencing are: disconnections, not IP outages.
> - they disagree in some minor timing, which can be related to the way the three agents are connected to the LAN
Yes, there would be minor differences in timing, which is expected since loops have to run, so timing can be a bit different.
> So next question is: is it possible, without any specific detail, to understand if all three agents are connected the same way within the LAN (different VLANs, directly connected to the router or thru a switch, different rules in the firewall)?
Sorry, I'm not seeing the actual question. Can you elaborate on this please.
> A simple hand drawn picture is more than enough; as I said, no company detail is needed, just trying to see where the problem originates and why they behave in such a way.
An image of what? From whom? For what?
We've spent two solid days on this and are still willing to help, but we cannot know anything beyond what you are explaining without remote access. Even with remote access, it might not be clear what's going on, but we're still trying to help. We're just kind of blindfolded at our end, other than what we see from your agents.
What we see is that there is something on your network that seems to be blocking communications here and there. It seems to be random, like maybe a firewall rule suddenly kicking in after it's seen a certain amount of traffic to/from our network.
If this were a common problem, we would see it across all of our agents but we don't. We only see it in your reports and it's as odd to us as it is to you.
Remember that this service is not the ultimate answer to everything; it is just another tool. It automates, as much as possible, the logging of disconnections and IP outages. It is meant to monitor your ISP, not your LAN. It is not a network diagnostic tool; it is one of the tools you would use to find and solve IP problems.
Other than monitoring to see if we can catch a clue, so far we have not seen anything that tells us what is going on, but we are also blind since we cannot see what's going on from the agent's point of view.