Razberry: Strange device communication failures
Posted: 14 Jan 2019 23:02
Hello,
I've got a Razberry with 45 Z-Wave devices throughout my house, mostly mains-powered light switches plus about 5 outlet plugs. Most are plain Z-Wave, but some are Z-Wave+.
I've recently started having an issue where a large number of devices (~20) show as failed. I fiddle with the system in the Expert UI for a while, and somehow things start working again. However, within a day or so (sometimes much sooner), they fail again. I believe it's the same set of devices that fails each time. Strangely, even while they show as failed, they sometimes still receive commands and update. In the job queue I can see these devices even get the NoOperation delivered ("Delivered"), yet a job is still logged saying the node has failed. This seems very strange to me.
I do have a script that polls many of the switches, as they are old and don't support instant status updates. I capped the script at 15 jobs in the queue, so it won't keep flooding the network if the queue stalls. Normally this runs fine. I mention it because I have a hunch that increased network activity may trigger the failures. Most recently, things were running mostly fine, but a couple of nodes were requiring retransmissions (these either succeeded, or the node was marked failed and then marked operating again shortly afterwards by the automatic NoOp check). I started a network reorganization that worked great for the first 10 or so nodes, then everything started failing and I was back in the bad state.
In earlier attempts to troubleshoot this, I suspected the battery-powered motion sensors might be causing issues, so I excluded them. Upon excluding a device, Z-Way seems to do a number of things that somehow restore all functionality to normal. I've also triggered the same recovery by re-including existing switches that are near the Razberry. But I'd really like to get to the bottom of the failures, especially since I don't know a reliable way to recover remotely.
Finally, something strange has happened: I can no longer start node inclusion or exclusion from the Expert UI. Those buttons simply do nothing (they appear to be disabled). I've never seen that before and I'm not sure why. The state even survives full power cycles of the Raspberry Pi, so I also need to figure out what is wrong there.
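For anyone wondering what "capped at 15 jobs" means in practice, here is a minimal Python sketch of that kind of throttled poller. It is not the actual script: `queue_length()` and `send_poll()` are hypothetical callables standing in for however the real script reads the Z-Way job queue and sends a status request to a node, and the timeout values are assumptions.

```python
import time
from typing import Callable, Iterable, List

MAX_QUEUE_JOBS = 15  # cap described above: never add work while 15+ jobs are queued


def poll_switches(
    node_ids: Iterable[int],
    queue_length: Callable[[], int],   # hypothetical: returns current job-queue depth
    send_poll: Callable[[int], None],  # hypothetical: queues a status request for a node
    max_jobs: int = MAX_QUEUE_JOBS,
    wait_s: float = 1.0,               # assumed back-off interval between queue checks
    max_wait_s: float = 30.0,          # assumed limit before giving up on a stalled queue
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Poll each node once, backing off while the controller queue is full.

    Returns the node IDs that were skipped because the queue never drained.
    """
    skipped: List[int] = []
    for node in node_ids:
        waited = 0.0
        # Wait for the queue to drop below the cap before adding another job.
        while queue_length() >= max_jobs and waited < max_wait_s:
            sleep(wait_s)
            waited += wait_s
        if queue_length() >= max_jobs:
            # Queue looks stalled: skip this node rather than piling on more jobs.
            skipped.append(node)
            continue
        send_poll(node)
    return skipped
```

The point of skipping (rather than waiting forever or queueing anyway) is that a stalled queue never grows past the cap, so the poller itself can't be what floods the network, although it still adds steady background traffic.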
Thanks for any help!