Track Internet disconnections, provider outages with historical data, and automated speed testing.
For Windows, Linux, ARM64, ARMa7. Learn more by visiting www.outagesio.com
Agents is running but not reporting
-
-
Sorry but I am getting confused by these two threads.
Both agent 129005 (which we are discussing in another forum post) and this one have the same IP, 192.168.0.200, which doesn't make much sense to me.
Let's first finish the post titled "Agents reports wrong router was down ?", since from there I can understand the infrastructure where the agent has been installed, and then I can answer this one, OK?
-
@SBK Most of the servers where the agents are installed have the same IP because, although they are in different cities, we use the same LAN configuration everywhere.
For example, 129004 is in Bangladesh and 129005 is in Nepal.
I would prefer to focus on this issue as I have this problem on other servers and I don't know what to do about it.
Thank you :)
-
@SBK So what do you suggest doing about this? Are there any logs I could check?
I suspect one thing. I sometimes install the agent right after a fresh Windows Server installation, before applying Windows updates. After a fresh installation there are many updates to apply, and I'm wondering whether those updates break something in the agent. But it might also be totally unrelated.
-
Agent 129004 is sending pings correctly while 129005 is not.
This means 129005 is not allowed to process ICMP; otherwise that information would be recorded. I also see that the agent is alive only for a few hours, meaning that either the PC is up only for a few hours or the process running in Windows gets stopped. Usually that means the installation was not done with admin privileges, which could be one possible cause.
On agent 129004 I see this:
and this:
It's a bit strange that the outages on your router (192.168.0.1) always last around 11 to 14 seconds (I still think that is the 10-second handover).
While on 129005 I see this:
and this:
and this too:
The downtime of around 10 seconds for your router is quite consistent.
-
@SBK I had this problem on agent 129013 too. The agent was reported as disconnected on the portal but was actually running on the server. I stopped it from services.msc and started it again, which solved the problem.
This is not a communication issue between the agent and your servers but most likely a bug in the agent.
-
@SBK I uninstalled and reinstalled agent 129004. It was working after the installation, then I shut down the computer. After starting the computer again, the agent was reported as disconnected on the portal.
I had to restart the agent from the Windows Services console to see it online in the portal.
-
@SBK Same problem with agent 128994: it stopped pinging at 9:24 am even though the computer was on and the internet connection was working properly. I had to restart the agent to see it online on the portal again.
Maybe the agent has some issues with Windows Server 2016 ?
-
I am checking into this, but I seem to recall that old operating systems like this can no longer be supported because MS no longer maintains some of the .NET components the agent needs to run correctly.
Alternatively, check whether the servers where the agents are not working correctly are fully updated. That could be another reason.
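Whether a given server has a current .NET Framework can at least be checked directly. Which .NET version the agent actually requires is not stated in this thread, so the minimal Python sketch below only reports what is installed. It reads the registry `Release` value that Microsoft documents for .NET Framework 4.5 and later and maps it to a version using Microsoft's published minimum values; `installed_release` only works on Windows, since `winreg` is a Windows-only standard-library module.

```python
from typing import Optional

# Minimum registry "Release" values per .NET Framework 4.x version, as listed in
# Microsoft's "Determine which .NET Framework versions are installed" article.
# Sorted newest first so the first match is the highest installed version.
RELEASE_THRESHOLDS = [
    (533320, "4.8.1"),
    (528040, "4.8"),
    (461808, "4.7.2"),
    (461308, "4.7.1"),
    (460798, "4.7"),
    (394802, "4.6.2"),
    (394254, "4.6.1"),
    (393295, "4.6"),
    (379893, "4.5.2"),
    (378675, "4.5.1"),
    (378389, "4.5"),
]


def release_to_version(release: int) -> str:
    """Map the Release DWORD to the highest .NET Framework 4.x version it implies."""
    for minimum, version in RELEASE_THRESHOLDS:
        if release >= minimum:
            return version
    return "older than 4.5"


def installed_release() -> Optional[int]:
    """Windows only: read the Release value, or None if .NET 4.5+ is not installed."""
    import winreg  # standard library, but only available on Windows

    key_path = r"SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, "Release")
    except OSError:  # key or value missing
        return None
    return int(value)
```

On a server, `release = installed_release()` followed by `print(release_to_version(release) if release else "no .NET 4.5+")` gives a quick comparison point between a working and a non-working machine.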
-
Oh ok that may be a reason. Thank you for checking.
I have a routine that installs all Windows updates daily, so most of the servers are up to date. I checked the server of agent 128994 and it is indeed at the latest patch level.
-
I looked at that agent and it does look fine. The last time it communicated was 2022-06-18 12:03:51 and it seems to be correctly installed since it was sending everything and getting what it needed.
It just suddenly stopped communicating. Either the service is not running or something has blocked it.
Can you check that the service is still allowed in the server's firewall, and that no other security software on the server itself has blocked it?
Also, is it possible that something upstream blocks it after a while? If the service is running and the agent shows Disconnected, it means it is not able to reach the OutagesIO network.
-
@OutagesIO_Support said in Agents is running but not reporting:
Either the service is not running or something has blocked it.
You seem convinced there is a communication problem between the agent and your servers. Can you explain why, and how I can troubleshoot this further? Couldn't it be a bug in the agent preventing it from communicating with your servers? Can I check this with Wireshark or access any log to confirm it? (e.g. if I don't see the agent trying to communicate with your servers, it's not a network problem).
My Windows Server configurations are very simple and there is no security software apart from the built-in Windows Defender. I don't change the firewall configuration, so outbound ICMP is allowed. And as the 2nd post of this thread shows, I was able to ping your servers from a computer that was reported as disconnected while the agent was running.
-
I would not say I'm convinced of anything yet, just trying to work through possible things to narrow down what's going on. Since you know the Internet is up but you see the agent Disconnected, it means it is unable to reach the OutagesIO servers.
Which server were you pinging and getting something back from? The app and several other things don't allow ICMP. You should be able to ping tpw.outagesio.com however which is a test ping reply server.
A bug is always possible but in this case, it doesn't seem to be an agent problem because the agent did show that it was communicating for a while.
The firewall basically just needs ports 80 and 443 plus ICMP out/reply for some of the tests. There isn't any log, and Wireshark will only show connections to the OutagesIO network when the agent is able to communicate.
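Without a log, one way to test the "ports 80 and 443 out" requirement from an affected server is a plain TCP connection attempt. The minimal Python sketch below does only that; it assumes app.outagesio.com is a reasonable target host (the agent's actual endpoints are not listed in this thread), and it does not cover ICMP, for which `ping tpw.outagesio.com` is the test suggested above.

```python
import socket
from typing import Dict, Iterable


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refusal, timeout, or firewall drop
        return False


def check_ports(host: str, ports: Iterable[int] = (80, 443)) -> Dict[int, bool]:
    """Try each outbound TCP port; the host is an assumption, not a documented agent endpoint."""
    return {port: can_reach(host, port) for port in ports}
```

Running `print(check_ports("app.outagesio.com"))` on a server while the portal shows it as Disconnected would indicate whether outbound web traffic is getting through at that moment.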
The only thing I can think of right now are the two things I mentioned.
I truly suspect the Windows versions are too old, because we update the agent now and then when MS makes changes that force us to.
Since you are having to use that version, maybe we can find an older version of the agent that would function on your servers but with that would come some missing functionality that we've added over the years.
Depending on how critical your needs are, our hardware agent is inexpensive, runs 24/7, is self updating so you'll never have any of these issues and it will never miss a thing.
All of the above said, I'll get together tomorrow with SBK, the other person helping you, and see if there is anything obvious we can spot.
-
I investigated this further today and noticed that OtmWinClient.exe is not running on the servers that the portal reports as disconnected:
If I restart the service or launch OtmWinClient.exe manually, then the portal sees the agent connected.
Can you explain what OtmWinClient.exe is, and do you know why it is sometimes not running?
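As a stopgap while the root cause is investigated, the manual fix described here (restarting the service when OtmWinClient.exe is missing) could be scripted and run periodically, for example from Task Scheduler. The minimal Python sketch below assumes the `tasklist` CSV output format and uses a hypothetical service name, `OtmService`, which must be replaced with the real name shown in services.msc. `net stop`/`net start` are used because they wait for the state change; restarting requires administrator rights.

```python
import csv
import io
import subprocess
from typing import List

PROCESS_NAME = "OtmWinClient.exe"
# Hypothetical: replace with the actual agent service name from services.msc.
SERVICE_NAME = "OtmService"


def agent_pids(tasklist_csv: str, image: str = PROCESS_NAME) -> List[int]:
    """Return the PIDs of `image` found in `tasklist /FO CSV /NH` output."""
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Columns: Image Name, PID, Session Name, Session#, Mem Usage.
        # The "INFO: No tasks are running..." line has a single field and is skipped.
        if len(row) >= 2 and row[0].lower() == image.lower():
            pids.append(int(row[1]))
    return pids


def query_tasklist(image: str = PROCESS_NAME) -> str:
    """Windows only: list processes whose image name matches `image`, as CSV."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def restart_agent(service: str = SERVICE_NAME) -> None:
    """Windows only, run as administrator: stop and start the agent service."""
    subprocess.run(["net", "stop", service], check=False)
    subprocess.run(["net", "start", service], check=False)


def ensure_agent_running() -> bool:
    """Restart the service if no agent process is found; return True if a restart was issued."""
    if agent_pids(query_tasklist()):
        return False
    restart_agent()
    return True
```

More than one PID from `agent_pids` would also point to several agent instances running side by side.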
-
Hi,
So, the first thing we need in order to help each other is to pick one question and stick to it. Right now we have a mix of things across two questions, which makes it very hard to follow everything.
Can we stick to this one, please?
As for the agent, I'm not sure what you mean by asking what the exe is. That is the agent itself; it's the service. Maybe I'm not understanding the question, so if you can clarify, that would be great.
As for why the agent might not be running, as mentioned earlier, it might be because these are old operating systems that lack some of the more modern updates MS provides on newer OSes. That is the main thing I suspect, at least.
Also, you were asked whether you could install a regular Win 7/8/10/11 machine at the same location where one of your agents is not working, so that we can compare side by side what is happening. It can be a VM or a BMS, it doesn't matter.
Now, one thing I do see that looks very wrong: the installation notes specifically state that you must remove any existing agents installed on the server before installing a new one. From what I can see, you have multiple instances running on this server, which could be why it's not able to run correctly or runs erratically, with one instance interfering with the other.
Can you try this, please?
Pick a server that doesn't seem to be working right but DID run correctly at one point, meaning it sent hops, pings, etc. This just proves that ICMP isn't being blocked or wasn't at one point at least.
Head to that agent dashboard on app.outagesio.com.
Click on manage, then pick the re-install option. Before installing anything, remove ALL of the instances you have on this server. Check the services and the software list to make sure there are no echo networks packages or otm services left.
At that point, continue the installation as usual. Click yes to agree, download the agent, save it where you know you can find it, confirm the download, then click on the file to start the installation, enter the codes and complete the process.
Then show us what the above screen looks like and whether it's working.
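For the step of checking that no leftover services remain before reinstalling, the output of `sc query state= all` can be scanned instead of eyeballing services.msc. The minimal Python sketch below assumes that agent-related services contain "otm" or "echo" in their service or display name (taken from the wording of the instructions above, not from any documented naming scheme), so its matches should be reviewed by hand before anything is removed.

```python
import subprocess
from typing import List, Sequence


def find_agent_services(sc_output: str, needles: Sequence[str] = ("otm", "echo")) -> List[str]:
    """Return service names whose SERVICE_NAME or DISPLAY_NAME contains any needle."""
    matches: List[str] = []
    current = None
    for raw in sc_output.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:  # flag lines such as "(STOPPABLE, ...)" have no colon
            continue
        key, value = key.strip().upper(), value.strip()
        if key == "SERVICE_NAME":
            current = value
            if any(n in value.lower() for n in needles):
                matches.append(value)
        elif key == "DISPLAY_NAME" and current and current not in matches:
            if any(n in value.lower() for n in needles):
                matches.append(current)
    return matches


def query_all_services() -> str:
    """Windows only: sc.exe expects "state=" and "all" as separate arguments."""
    result = subprocess.run(
        ["sc", "query", "state=", "all"], capture_output=True, text=True, check=False
    )
    return result.stdout
```

`print(find_agent_services(query_all_services()))` listing more than one entry after an uninstall would suggest an earlier instance is still registered.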