Track Internet disconnections, provider outages with historical data, and automated speed testing.
For Windows, Linux, ARM64, ARMa7. Learn more by visiting www.outagesio.com
Agent is running but not reporting
-
Hello
I deployed the new Agent only on one server, but I haven't seen the problem again.
What is the difference between v1.76, which I am using now, and v1.77?
-
Well that's good news. I assume you installed it on a server you were seeing those service problems with.
The v1.77 has some additional updates that we've been working on.
Here's a post about it: About v1.77.2207
The main changes are:
- Improved thread management, which was a core requirement to prevent the occasional loss of data being sent to the dashboard.
- Improved Windows service function to prevent the service from potentially stopping on some Windows machines.
- For those using Extended reports, automated speed testing now shows download, upload, and latency.
Also, we have lowered our pricing. A subscription to Extended is only $4.00/month for the holidays.
-
Today I noticed agent 129006 was online in my tools, but the outages.io portal was reporting it as disconnected.
I stopped the service and copied the log files.
After restarting the service, the agent is showing online again on your portal. I looked at the logs, and I can reach the IP address the agent is trying to connect to from the host, so it is not a connectivity issue.
Let me know if you find something interesting.
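For reference, a reachability check like the one described above can be scripted. This is only a minimal sketch: the host and port are placeholders (the thread does not state the address the agent connects to), and `can_connect` is a hypothetical helper name. A successful TCP connection only shows that the network path is open; it does not prove the agent itself is sending data.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call (placeholder address and port, not taken from the logs):
# can_connect("203.0.113.10", 443)
```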
-
Hi again,
Thanks for sharing this.
Got the log (I removed the shared URL).
Looking at the log, the agent never crashed or stopped, so whatever happened, it doesn't seem to be the agent itself stopping; something is preventing it from communicating.
Is it possible that the agents are being blocked by something at the network edge or by the provider for some reason?
-
I don't think something is blocking it on the network, it's a very basic setup.
It really looks like the problem is coming from the agent as it started reporting as soon as I restarted the agent.
-
In both those logs, the agent just stopped communicating.
It wasn't experiencing a problem internally, at least, but either the service stopped abruptly or something blocked it so it could not continue. Really not sure what is happening at this point.
Is there any MS tool you could use to monitor the service so that we could get more information? You said nothing ever shows up in the server logs but we've never seen that happen. Not sure why this is happening either.
If we could get more system logs, that could help us find whatever is happening and fix it once we know what it is.
All I can think of is that this version of MS is somehow different from newer versions, but we spent months looking at this and making a few changes, and you're still seeing it.
I did a quick search and there are tools for MS that can help find the cause of crashing services/processes.
For example:
https://techcommunity.microsoft.com/t5/ask-the-performance-team/what-killed-my-process/ba-p/375329
-
I can try that and also share system logs but I would like some guidance (what to export, do I need to put any filter, what to configure with the process monitoring tool, etc)
Also, the process is not crashing or anything; I can see OtmWinClient.exe running. It just needs to be restarted to report again. The problem is definitely occurring less frequently than with the previous version of the agent, but it still happens sometimes.
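One low-effort way to gather more information is to log the Windows service state at regular intervals, so the timestamps can be compared with the gaps seen on the portal. The sketch below is an example under stated assumptions, not an official tool: it shells out to the built-in `sc query` command, assumes English-language output, and uses hypothetical helper names. The actual service name is not given in this thread (only the process name OtmWinClient.exe is), so it has to be supplied by the caller.

```python
import re
import subprocess

# Matches the "STATE : 4  RUNNING" line in English-language `sc query` output.
_STATE_RE = re.compile(r"STATE\s*:\s*(\d+)\s+([A-Z_]+)")


def parse_service_state(sc_output):
    """Return the state name (e.g. 'RUNNING') from `sc query` output, or None."""
    match = _STATE_RE.search(sc_output)
    return match.group(2) if match else None


def query_service_state(service_name):
    """Run `sc query <service_name>` (Windows only) and return its state name."""
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True,
        text=True,
        check=False,  # a missing service should yield None, not an exception
    )
    return parse_service_state(result.stdout)
```

Because the process stays running while it stops reporting, a state log alone will not catch the hang, but it does help confirm or rule out the service stopping.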
-
Hi,
I'm sorry, I don't know how to use it, just something I found on the net doing a quick search.
All MS server products should have tools that help monitor processes to find problems.
The point is that there are tools available that can monitor a process on a deeper level to figure out what is happening.
Really not sure what to suggest. I think I would first suggest that you uninstall the logging version and throw it out since it won't be of any value. Get back to the normal version that you can download from the site and see how that goes.
The logging version isn't meant for long term use anyhow as it has a lot of extra baggage just to try and log what it's doing.
Maybe going back to the standard version on that server will solve the problem since you mentioned this doesn't seem to be happening so much lately with the new version.
-
It turns out the dev was able to see something in the log you shared and is testing something now.
He says it was very useful so hoping we'll see a fix soon. It takes a while to test but as soon as possible, I'll share the update with you.
-
Ok, I'll wait for an updated version
-
Believe me, we've worked on that so many times and what you see is the simplest we've been able to make it to date. This was after trying over a dozen different ways.
The keys are mainly there to ensure that the agents are installed to the correct account. If the software didn't have to be tied to the application site, it would be simpler, but you would not have a central place to check on all locations.
Most people don't care because they have only one or two agents and typically, companies use the hardware agent which automatically updates itself.
By the way, the logs you sent were absolutely helpful. The dev was able to notice a condition that sends the service into a sort of spiral, with processes backing up internally.
We'll release another update after testing. We're trying hard to get to what we call the final version.
-
OK, it's good that you identified the bug and are working on a fix. Hope we'll get to a stable agent at some point.
FYI, for other tools that require an agent (Action1 or Itarian, for example), I can directly download a customized agent from their website with my credentials embedded in the installer. I just need to execute it; there is no need to input any credentials during or after installation, and the agents are self-updating.
I understand you don't have this use case now and don't want to spend time on it, but just to let you know, it has become quite an industry standard nowadays.
-
There are a lot of things we'd like to add and something like this could be done but we simply don't have the resources at this time.
Most of our members are using the service for free while development and ongoing costs are very high. We actually want to have a free version that can help people and want to add features that organizations would be willing to buy.
We have a hard time advertising, constantly being outbid by large ISPs and outage sites that would prefer people never learn about a service like ours. We work mainly by word-of-mouth advertising.
I think it would be a good idea and I'll pass it along to development but I think you are the only person that's asked about this.
BTW, if the hardware agent could work for you, you don't have to ship them around the world. If you can buy them and have them shipped inside each country you are in, we offer a service where we can prep them remotely for $20.00 each.
This way, you would not have to update software, it would be done automatically each time there is a release.
-
No problem, I understand all of this.
I am not interested in the hardware agent because:
- delivery in our countries and each center will be very complicated to manage
- adding hardware in the existing computer rooms will look like additional complexity for the people working there who are afraid of touching digital equipment
- I like to have the agent on the server because there is sometimes a problem with the network cable of that server and your tool enables me to detect it.
-
I only brought it up because it might work for you, at least in some locations. In terms of installation, they simply need to connect it to a free DHCP port.
Anyhow, I have some good news. The log you sent seems to have been a lot more useful than I thought.
The dev found two things that could spiral some threads out of control, causing the agent to stop communicating. It would explain why, in some cases, it looks like the service is down when it's actually not; it's the agent itself that is stuck. In other cases, it might cause the service to get shut down, depending on what's running on the server.
This kind of problem is the worst possible because it's so random, making it very hard to test, as the only thing we can do is wait for it to finally misbehave.
So it seems there will be another update as soon as this is fixed.
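As a general illustration of how a "running but stuck" condition can be detected (not a description of how the agent is actually built), a worker can record a heartbeat each time it makes progress, and a separate check can flag it when the heartbeat goes quiet. The class name `Heartbeat` and the 30-second threshold are hypothetical.

```python
import threading
import time


class Heartbeat:
    """Tracks the last time a worker reported progress."""

    def __init__(self, max_silence_s, clock=time.monotonic):
        self._max_silence_s = max_silence_s
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._lock = threading.Lock()
        self._last_beat = clock()

    def beat(self):
        """Called by the worker each time it completes a unit of work."""
        with self._lock:
            self._last_beat = self._clock()

    def is_stalled(self):
        """True if the worker has been silent longer than the allowed window."""
        with self._lock:
            return self._clock() - self._last_beat > self._max_silence_s


# Example: a sending thread would call hb.beat() after each successful report,
# and a supervisor would poll hb.is_stalled() to decide whether to restart it.
hb = Heartbeat(max_silence_s=30.0)
```

The point of the design is that the process being alive and the worker making progress are tracked separately, which is exactly the distinction seen in this thread.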
-
I'm glad you found the cause of the issue. I'll wait for you to release another update.
-
Hi,
You can remove any logging versions you have at this point and go back to the normal download one.
The special logging versions become useless once we update on our end.
The log definitely helped; it just takes a while to test, since this is a problem that's eluded us for quite some time, as you know.