Agents is running but not reporting
-
I looked at that agent and it does look fine. The last time it communicated was 2022-06-18 12:03:51 and it seems to be correctly installed since it was sending everything and getting what it needed.
It just suddenly stopped communicating. Either the service is not running or something has blocked it.
Can you check that the server still has the service allowed in the firewall, and that no other security software on the server itself has blocked it?
Also, is it possible something upstream might block it after a while? If the service is running and it's showing Disconnected, it means it is not able to reach the OutagesIO network.
-
@OutagesIO_Support said in Agents is running but not reporting:
> Either the service is not running or something has blocked it.
You seem convinced there is a communication problem between the agent and your servers. Can you elaborate on why, and how I can troubleshoot this further? Couldn't it be a bug in the agent preventing it from communicating with your server? Can I check this with Wireshark or access any log to confirm this? (E.g. if I don't see the agent trying to communicate with your servers, it means it's not a network problem.)
My Windows Server configurations are very simple and there is no security software apart from the built-in Windows Defender. I haven't changed the firewall configuration, so outbound ICMP is allowed. And if you check the 2nd post of this thread, I was able to ping your servers from a computer that was reported as disconnected despite the agent running.
-
I would not say I'm convinced of anything yet, just trying to work through possible things to narrow down what's going on. Since you know the Internet is up but you see the agent Disconnected, it means it is unable to reach the OutagesIO servers.
Which server were you pinging and getting something back from? The app and several other things don't allow ICMP. You should, however, be able to ping tpw.outagesio.com, which is a test ping reply server.
A bug is always possible but in this case, it doesn't seem to be an agent problem because the agent did show that it was communicating for a while.
The firewall basically just needs to allow ports 80 and 443, plus ICMP out/reply for some of the tests. There isn't any log, and Wireshark will only show connections to the OutagesIO network when the agent is able to communicate.
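For anyone wanting to verify those firewall requirements from an affected server, here is a rough Python sketch. It only checks that outbound TCP on ports 80 and 443 and a system ping succeed. The choice of app.outagesio.com as the TCP target is an assumption (the thread does not say which hosts the agent actually contacts); tpw.outagesio.com is the ping reply server mentioned above. All function names are made up for illustration.

```python
import platform
import socket
import subprocess


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def build_ping_command(host: str, count: int = 4, system: str = "") -> list:
    """Build a ping command; Windows uses -n for the count, most others use -c."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def can_ping(host: str, count: int = 4) -> bool:
    """Return True if the system ping utility exits with status 0.

    This is only a rough signal: exit codes vary between platforms.
    """
    result = subprocess.run(build_ping_command(host, count), capture_output=True)
    return result.returncode == 0


def outbound_report(tcp_host: str, ping_host: str) -> dict:
    """Collect the three checks discussed in the thread into one dict."""
    return {
        "tcp_80": can_connect(tcp_host, 80),
        "tcp_443": can_connect(tcp_host, 443),
        "icmp": can_ping(ping_host),
    }
```

Calling `outbound_report("app.outagesio.com", "tpw.outagesio.com")` on a server that shows as Disconnected would give a quick yes/no per requirement. A passing result does not prove the agent itself can communicate, only that the basic paths are open.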
The only things I can think of right now are the two I mentioned.
I truly suspect the Windows versions as being too old because we update the agent now and then when MS makes changes that force us to.
Since you are having to use that version, maybe we can find an older version of the agent that would function on your servers but with that would come some missing functionality that we've added over the years.
Depending on how critical your needs are, our hardware agent is inexpensive, runs 24/7, is self updating so you'll never have any of these issues and it will never miss a thing.
All of the above said, I'll get together with SBK tomorrow, the other person helping you and see if there is anything obvious we can spot.
-
I investigated this further today and noticed OtmWinClient.exe is not running on the servers that are reported disconnected by the portal :
If I restart the service or launch OtmWinClient.exe manually, then the portal sees the agent connected.
Can you explain what is OtmWinClient.exe and do you know why it is sometimes not running ?
-
Hi,
So, the first thing we need in order to help each other is to pick a question and stay there. Right now, we have a mix of things in two questions which is making it very hard to follow everything.
Can we stick to this one please?
As for the agent, I'm not sure what you are asking when you say what is the exe? That is the agent itself, that's the service. Maybe I'm not understanding the question so if you can clarify, that would be great.
As for why the agent might not be running, as mentioned earlier, it might be because these are old operating systems and might not have all of the more modern updates that MS has on the newer OS's. That is the main thing I suspect at least.
Also, you were asked if you could install a regular Win 7/8/10/11 at the same location where one of your agents is not working so that we can compare side by side what is happening. It can be a vm or a BMS, doesn't matter.
Now, one thing I do see that looks very wrong: the installation notes specifically state that you must remove any existing agents installed on the server before installing a new one. From what I can see, you have multiple instances running on this server, which could be why it's not able to run correctly or runs erratically, one instance messing up the other.
Can you try this please.
Pick a server that doesn't seem to be working right but DID run correctly at one point, meaning it sent hops, pings, etc. This just proves that ICMP isn't being blocked or wasn't at one point at least.
Head to that agent dashboard on app.outagesio.com.
Click on manage, then pick the re-install option.
Before installing anything, remove ALL of the instances you have on this server. Check the services and the software list to make sure there are no echo networks packages or otm services left.
At that point, continue the installation as usual. Click yes to agree, download the agent, save it where you know you can find it, confirm the download, then click on the file to start the installation, enter the codes and complete the process.
Then show us what the above screen looks like and see if it's working.
-
Hi,
The agent developer says you seem to have installed the agent multiple times using different users.
Just uninstall all of those as mentioned then install just one agent using the highest level user you have on the system and it should work again.
And of course, you need to install only on 64bit systems.
-
- I can't install a regular Windows 7/8/10/11 because we need Windows server
- There was no previous installation and I have never installed the agent multiple times. I always install the agent only once and under the Administrator user. The reason you see multiple "OtmServiceApplication.exe" instances is that it seems to be launched every time a user logs in on the server. If you check the screenshot, you will see that the user is different for each instance (PC1, PC7, Administrator...). I assume this is the tray icon, so I am not surprised it is launched for every user. There can be up to 15 to 20 users logged in at the same time on one server.
- Yesterday I checked on a few servers that had this problem and I confirm I can ping tpw.outagesio.com. I already tested it in my 2nd post on a server that was previously working correctly and sent you a screenshot, so please recheck.
- I have already tried the reinstall procedure on some servers, yesterday and over the last couple of weeks, and it doesn't make a difference. The agent works fine after the installation but it will stop working at some point.
I think I have covered all your points.
The problem has now become obvious to me. OtmWinClient.exe is sometimes not started on the servers, which is why I see the agent disconnected. The reason remains unknown; this problem is random and seems to happen on every installation. The only solution I have so far is to restart the service manually. "OtmService.exe" and "OtmServiceApplication.exe" are always running; the problem is only with "OtmWinClient.exe". Can you please explain the difference between these 3 programs?
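As a stopgap for the manual restart described above, something like the following Python sketch could be scheduled on each server. It asks the Windows `tasklist` command whether OtmWinClient.exe is present and restarts the service if it is not. The service name `OtmService` is an assumption taken from the executable name (check the real name in the Services console), and the function names are made up for illustration. This is not an official OutagesIO tool.

```python
import csv
import io
import subprocess

CLIENT_IMAGE = "OtmWinClient.exe"  # process name reported in this thread
SERVICE_NAME = "OtmService"        # ASSUMPTION: the registered service name may differ


def is_image_listed(tasklist_csv: str, image_name: str) -> bool:
    """Return True if CSV output from `tasklist /FO CSV /NH` lists image_name."""
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # The first CSV column is the image name; compare case-insensitively.
        if row and row[0].strip().lower() == image_name.lower():
            return True
    return False


def client_is_running(image_name: str = CLIENT_IMAGE) -> bool:
    """Ask Windows `tasklist` whether the client process exists."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
    )
    return is_image_listed(result.stdout, image_name)


def restart_service(service_name: str = SERVICE_NAME) -> None:
    """Stop and start the service with `net`; needs an elevated prompt."""
    # Stopping may fail if the service is already stopped, so do not raise on it.
    subprocess.run(["net", "stop", service_name], check=False)
    subprocess.run(["net", "start", service_name], check=True)


def ensure_client_running() -> bool:
    """Restart the service if the client is missing; return True if a restart ran."""
    if client_is_running():
        return False
    restart_service()
    return True
```

Logging the time of each restart would also give the developer a record of when the client disappears, which may help find the pattern behind the random stops.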
-
Yes, you covered all of the points and I appreciate that.
Thank you for responding to all that so that we can be in sync. Now we better understand where we are at with everything.
In this case, this needs to be looked at by the developer. He mentioned one thing when saying that the agent appeared to have been installed multiple times:
That’s an interesting screenshot. It looks like multiple different users (column 4) have installed otm. So I need to look into that and make sure the app is always installed as “all users”. Interesting that and something I may have overlooked.
I have to wait for a reply to better answer why there are three programs installed as I've only ever seen two but have never run it on 'server' versions.
One more thing that was mentioned is that the server must support .NET 4.7.2 I think but the installer should download it automatically if it's not on the OS already.
Have you tried the new version we made available today? It's possible that the newest version might pull down something different that the OS needs to get this to run right.
Just make sure there aren't any other agents installed which I know you know now :).
And please hang in there. I'm sure we'll figure this out. It just takes a little time to get in sync, gather more info then fix the problem.
-
I have installed agent V1.69.2106 on all the servers so far.
I tried to install the agent V1.72.2202 today and the installation fails when trying to download .Net framework :
I have pasted the logs here : https://zerobin.net/?ba17bcb3567c4c9c#YWUMi/KRCnDBQMbaTB+hOC133xxGI5NBPFvoxbCIvNQ=
I checked on the server and here are the .Net versions currently installed :
-
I don't know why the installer has a certificate error when trying to download the .Net installer. I downloaded the installer from Chrome using the same download link as in the logs without any problem.
After installing .NET 4.7.2, the agent was installed successfully.
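For checking many servers, the installed .NET Framework 4.x release can be read from the registry instead of being looked up by hand. The sketch below is Python and Windows-only for the registry read; it relies on the `Release` value under the `NDP\v4\Full` key, where 461808 is the minimum value Microsoft documents for 4.7.2. The function names are made up for illustration.

```python
from typing import Optional

# Minimum "Release" value that Microsoft documents for .NET Framework 4.7.2.
NET_472_MIN_RELEASE = 461808
NDP_KEY = r"SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full"


def has_net_472(release: Optional[int]) -> bool:
    """Return True if the Release value indicates .NET Framework 4.7.2 or later."""
    return release is not None and release >= NET_472_MIN_RELEASE


def read_release() -> Optional[int]:
    """Read the Release value from the registry, or None if it is missing."""
    import winreg  # standard library, available on Windows only

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, NDP_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "Release")
            return int(value)
    except OSError:
        # Key or value not present: no .NET Framework 4.5+ detected.
        return None
```

On a server, `has_net_472(read_release())` returning False would mean 4.7.2 still needs to be installed before running the agent installer.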
-
@OutagesIO_Support For now the agent is running fine but it's too early to tell; the problem usually appears after a few days.
-
That might help.
Just a suggestion but before you do too much work on many servers though, maybe stick to one until we have this figured out. Once we do, then do the rest.
The dev noticed he left his test cert on the build. Sometimes, it's hard to fully test new builds. We might not see problems but as members install and report them, we fix them.
The problem is, most people don't report them so it's always nice when someone reports or works with us so we can figure things out, just like in this situation.
We appreciate your patience. We'll get a new build out to you asap.
-
Yes I agree with you and we should do only one change at a time.
However it will be easier for me to apply this change everywhere as it will enable me to see faster if it's working or not. That's why I updated all the servers to .Net 4.7.2. However, I only updated the agent on one server and will not make the change on the other servers for now.
It didn't take too much effort to upgrade all the servers to .NET 4.7.2 as I just had to write a little script and run it on all the servers (Itarian will automate that with a few clicks).
To know which servers have a problem, I just compare the list of online servers in Itarian (a remote management solution) with the list of online servers in outages.io. Then I know which ones have an issue. I recently realized that this problem seems to be on all the servers and it's random.
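The comparison described above is a simple set difference. A minimal Python sketch, assuming both tools can export a plain list of server names (the names below are made up, and neither export format is taken from Itarian or OutagesIO documentation):

```python
def find_unreported(rmm_online, portal_online):
    """Return servers that are online in the RMM tool but not in the portal."""
    # Normalize names so "SRV01" and "srv01 " are treated as the same server.
    rmm = {name.strip().lower() for name in rmm_online}
    portal = {name.strip().lower() for name in portal_online}
    return sorted(rmm - portal)


# Hypothetical example data
itarian_online = ["SRV01", "srv02", "SRV03"]
outagesio_online = ["srv01", "SRV03 "]
suspects = find_unreported(itarian_online, outagesio_online)
```

Here `suspects` lists the servers that are up according to the remote management tool but show as disconnected in the portal, which are the ones where the agent needs checking.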
So if I test only one server, it may take a long time before I can confirm if the change is stable or not. Whereas if I apply this change on all the servers, I will be quick to see because until now there are always some servers where the agent crashes (I manage about 22 servers).
Tomorrow the servers will be rebooted and the .NET 4.7.2 update will be applied. I will check again if the problem is still there or not.
People often report to me that the internet is slow or not working. With your tool I can check whether it's coming from the ISP or from the local installation, It's very useful for me and I am really willing to solve this agent problem so thank you for your support.
-
> However it will be easier for me to apply this change everywhere as it will enable me to see faster if it's working or not.
No worries, whatever works for you. I was just a bit concerned that you might spend a lot of time updating a bunch of servers only to find out the agent won't work.
> To know which servers have a problem, I just compare the list of online servers in Itarian (a remote management solution) with the list of online servers in outages.io.
We use something called Zabbix, to monitor servers/health etc but actually use our own hardware agents for connectivity monitoring.
> So if I test only one server, it may take a long time before I can confirm if the change is stable or not.
Yes, that makes sense. I hope we can figure out what the random aspect of it is. While we don't actually support the version of OS you are using, if we can get to a point where it's working right, we could add it to our list of supported platforms.
> Whereas if I apply this change on all the servers, I will be quick to see because until now there are always some servers where the agent crashes
I suspect it's not crashing but the service is stopping for some reason. We've not seen much for actual crashes since even our early versions but definitely see that the service can stop for various reasons.
The problem of course is there are countless variations of software, drivers, packages, updates, etc etc, even when we think they are nearly clones. Sometimes, something really small can affect one server but not another.
The dev is looking into what you've said and testing some ideas.
> People often report to me that the internet is slow or not working. With your tool I can check whether it's coming from the ISP or from the local installation,
That's why we built it actually. Customers would call us thinking what we manage was broken but 99% of the time, it was their own Internet that was not working right.
We thought it would be better to monitor from their own location's perspective to see it for ourselves. They didn't have to explain anything to us; we could see it and deal with it.
The service can also be used to gain analytics across all of the ISPs that your organization uses. From your own resources to remote customers/employees, all of that can be consolidated to reveal a lot of interesting information that you'd never get from any ISP. Our Enterprise level gives you that.
For example, you could break down which areas by country, states/provinces, cities, towns, etc, experience the most or least problems.
You could see which ISPs are most or least reliable, when they have the most problems and where. You could see which are more reliable in terms of the speeds you are paying for and all sorts of other metrics.
You can even see where they have weak or trouble points.
> It's very useful for me and I am really willing to solve this agent problem so thank you for your support.
We love hearing this kind of input and yes, we'll solve it. It just takes some time to find the leads to know what to fix.
-
Ok thank you.
Checking this morning, many servers are reported disconnected from outages.io meaning updating to .NET 4.7.2 did not solve the issue.
-
Hi,
If you re-install yet again, you'll get the same version but with the cert fixed. Here is the answer to your question about the various services.
OtmWinClient is the actual client app. This is the same app as on the other platforms and is built from the same source code.
OtmService is a Windows service that launches and kills OtmWinClient. It is accessible via the Windows service control and starts otm when Windows boots.
OtmServiceApplication is the app that shows the tray icon and allows starting/stopping the service via a popup menu. It is maybe badly named and would be better named OtmTrayApplication.
The rest, we are still working on and are building servers so we can test here as well.
-
Thanks for the clarifications.
Are you building Windows servers to test ? Should I wait for your instructions before testing any further ?
-
Yes, Windows servers for testing. I'm not sure which yet but one is a very old 2003 server. I don't seem to have a 2016 version otherwise I would give that a try too.
You can try the most recent version just to see if the cert problem is gone, which it should be. Otherwise, it's just a matter of time for us to do some testing, coding, testing.
Here is what I know so far.
The current and previous versions of the installer are already installing for “all users”.
It will probably work on most server editions as long as it’s 64 bit.
On server, each user gets a copy of the tray icon app.
That tray icon app just starts and stops the service.
There is only one instance of the agent running but multiple instances of the tray app. That should not be a problem. It should be possible to restrict that to administrators so I will check that.
I know we touched on this but you could try what we talked about. Make sure all instances are removed then re-install just one as full admin using the new version and see what happens.
The new version pulls everything down needed so maybe it would pull something down that got missed before.