day-long outage not recorded

OutagesIO_Support

The agent is supposed to keep track of outages no matter how long they are: seconds, minutes, hours, days and then some. So long as it doesn't get turned off (the Windows agent) or rebooted (the hardware agent), it will send its report once the Internet is back up.

In some business cases, this is an important event to know about, especially if it was hours or days long.

The issue in this case seems to be that, because it's a hardware agent, it rebooted automatically while it was logging the outage, so the outage report was lost.

No need to be puzzled, no one has mentioned this before. In most business cases, IT people respond quickly to problems, so even seeing inactive or disconnected is enough for them to take action. In cases like Internet-connected remote sensors, for example, a long outage might go unnoticed as unimportant, but to someone that report can be quite important.

The issue is that even if you wanted to know about outages that were 15+ minutes long, the agent could not tell you that if it was not able to get online for 24hrs for example. That's why we have the inactive and disconnected notifications.

OutagesIO_Support

I forgot to add that the hardware agent also reboots daily to ensure no memory or other corruption and because it gets any updates that have become available.

jwladd

@OutagesIO_Support
Maybe I didn't have notifications set up properly on the dashboard, but I never received a notification about an inactive agent or disconnected agent. Possibly a notification was not in my primary Google inbox. What would be the sender's email address? I could use that to search my email.

OutagesIO_Support

I believe it's always 'noreply@outages.io' or 'noreply@outagesio.com'. I think there are some services still using 'outages.io' here and there on the back end.

If you share the agent and the date again, I can look in the logs. We get a copy of all outgoing email notifications specifically to always know that emails are going out.

jwladd

agent # 129676 out from ~ 5/14 1:12 until ~ 5/16 3:24
When I look at the historical record for this time period, nothing shows up.

OutagesIO_Support

This is what I see before and after that range;

These are the emails sent;

And the reasons why;

jwladd

I can't see emails older than a month if they were in Spam or Trash. Let's assume that I had email notifications set wrong during the time period May 14-May 16, so let's ignore the email question for the time being. However, I don't see the inactive events on my dashboard that you find in your logs.

OutagesIO_Support

Those are only available on the back end to help us with support.

jwladd

Also, as I noted in my initial post in this thread, I am fairly certain that my Google IoTs were offline for a couple of days. If I understand correctly the log info that you just sent me, my internet service was out from 5/13 22:15 to 5/15 23:01; however, the historical log of outages that I have shows short outages on 5/14, which imply internet connectivity on most of 5/14.

OutagesIO_Support

Keep in mind that the service is really meant only to monitor from your LAN to, and out of, your provider. Anything 'beyond' is worthless since monitoring the Internet as a whole is not our function.

What I can say is you are experiencing most of the problems around 3am and 10pm and they are all very short other than the long one we've been talking about.

I would discount anything beyond the provider as the first step as those are only informational.
I would discount all inactive as those only mean your agent was not communicating for up to 30 seconds and that could be for any reason.
I would discount any Secured servers related ones also as those mean the agent wasn't able to reach our network but it made it past your ISP which is what really matters.

In other words, what you want to monitor is your provider only. If you see local network issues, those are things you have to look into locally and the rest is mainly about your provider.

jwladd

What I want to monitor is my internet connectivity at a remote site. It's important to know whether the cause is local or my provider, but the first order question is "Is my remote site connected to the internet?" I thought that was the point of your hardware agent. The back end logs available to you seem to be different from the front end logs available to me. I guess I am misunderstanding the system.

OutagesIO_Support

You aren't misunderstanding the service but maybe I'm explaining too many things :).
Yes, the service is specifically to monitor the provider/communications at remote locations so your use is what it's for.
The additional information I'm showing you is simply to help us talk about what you're asking about. That extra information is simply our own in-house logging so we can help members.

I'm not sure at this point where there is a disconnect in our conversation.

jwladd

Thanks for your patience. Let's start by getting me to understand your back end log. Does this log indicate no communication between Arizona and my agent from 5/13 22:15 to 5/15 23:01?

SBK

Hi John,
Gimme some time to review what has been said here so I can try to clarify the doubts

jwladd

I'll be away for the rest of today, but will resume conversation on Sunday. Thanks for your time.

SBK

First of all let me clarify some terms we are using to avoid confusion or misunderstanding:

outage: an IP outage that is reported by the agent
inactive/disconnected/online: statuses that we check against the agent from our network (read this)

With this in mind, outages are a confirmation of the inactive/disconnected statuses our network detects while "polling" the agent.

The important thing shown in previous comments is this sequence:

Inactive 2023-05-13 22:15:53
Disconnected 2023-05-13 22:45:24
Online 2023-05-15 23:01:05
Inactive 2023-05-17 08:00:12

This means your agent stopped responding to our "polling" on May 13 @ 22:15 and responded again on May 15 @ 23:01.
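As a rough illustration of how the window above can be read off the sequence, here is a minimal Python sketch. It assumes the four timestamps are naive local times (the thread does not state a time zone), and the function name `unreachable_windows` is made up for this example; it is not part of any OutagesIO tool.

```python
from datetime import datetime

# Status events copied from the sequence above. The thread does not state
# a time zone, so the timestamps are treated as naive times (assumption).
EVENTS = [
    ("Inactive", "2023-05-13 22:15:53"),
    ("Disconnected", "2023-05-13 22:45:24"),
    ("Online", "2023-05-15 23:01:05"),
    ("Inactive", "2023-05-17 08:00:12"),
]


def unreachable_windows(events):
    """Return closed (start, end) windows: first non-Online status to next Online.

    A trailing window with no closing Online event is ignored.
    """
    windows = []
    start = None
    for status, stamp in events:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if status == "Online":
            if start is not None:
                windows.append((start, when))
                start = None
        elif start is None:
            # Inactive or Disconnected opens a window; later ones fall inside it.
            start = when
    return windows


windows = unreachable_windows(EVENTS)
for start, end in windows:
    print(f"{start} -> {end} ({end - start})")
```

With the events listed, this gives a single window of 2 days and about 45 minutes. The final Inactive on May 17 has no matching Online in the list, so it is left out.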

My interpretation of the above is twofold:

we definitely had no communication with the agent between May 13 @ 22:15 and May 15 @ 23:01 but it was not confirmed by an outage
the dates and times you gave us are not the ones we see

Our goal is to give our customers more info so they can troubleshoot, and sometimes non-IP outages (explanation here) are the most complex to find and highlight.

Maybe in the future we could release to the public some of the internal tools we are now testing to give a better experience!

jwladd

Thank you for your thoughtful response! I will study it and possibly respond with more questions. For the moment my only question is why the historical record of inactive/disconnected that I find on my dashboard is different from the historical record of inactive/disconnected that you record on the back end?

OutagesIO_Support

As explained earlier, it's for us to know some of the events that happened so that we can help members.
Some things have no value in being offered to members because they might be more confusing than useful, and that's the case for these logs.

Ed might be able to explain it better if this is not clear.

SBK

@jwladd

I bet you are calling "historical record of inactive/disconnected" what is really a historical alerts list, which is not the same thing.

Dashboard alerts assume there is a person managing them from the GUI, acknowledging and resetting them.
Just to clarify with an example, let's suppose you have an "inactive alert" at a certain time. If that alert is not acknowledged and reset, then even if the agent comes back online and goes inactive again, you will only see one event.

On the other hand, we keep track of every single event.

So if you compare the historical alerts with what I commented last time you will be missing some of them.
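To make the difference concrete, here is a small Python sketch of the behaviour described above. It is only a hypothetical model: the class `LatchedAlert`, its methods, and the timestamps are invented for illustration and do not reflect how the dashboard is actually implemented. Only "Inactive" is modelled, to keep it short.

```python
class LatchedAlert:
    """Hypothetical dashboard alert that stays raised until acknowledged and reset."""

    def __init__(self):
        self.raised = False
        self.acknowledged = False
        self.history = []  # what a member would see in the alerts list

    def on_status(self, status, stamp):
        # A new entry is recorded only when no alert is currently raised.
        if status == "Inactive" and not self.raised:
            self.raised = True
            self.history.append((status, stamp))

    def acknowledge(self):
        if self.raised:
            self.acknowledged = True

    def reset(self):
        # Reset only takes effect after the alert has been acknowledged.
        if self.raised and self.acknowledged:
            self.raised = False
            self.acknowledged = False


# Made-up timestamps, for illustration only.
statuses = [
    ("Inactive", "08:00"),
    ("Online", "08:05"),
    ("Inactive", "09:00"),
    ("Online", "09:02"),
]

alert = LatchedAlert()
backend_log = []  # the back end keeps every single event
for status, stamp in statuses:
    backend_log.append((status, stamp))
    alert.on_status(status, stamp)

print(len(backend_log), len(alert.history))

# After acknowledging and resetting, the next Inactive is recorded again.
alert.acknowledge()
alert.reset()
alert.on_status("Inactive", "10:00")
print(len(alert.history))
```

In this model the back end log holds four events while the alerts list holds one, until the alert is acknowledged and reset.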

Hope this clarifies that part

jwladd

I didn't realize I was supposed to acknowledge an alert. I see that once I acknowledge an alert, I then need to do a reset. It all makes sense as I learn more. Thanks for your awesome support.
