Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential mesh.checkConnection() improvement #249

Closed
TMRh20 opened this issue Jun 24, 2024 · 6 comments · Fixed by #250
Closed

Potential mesh.checkConnection() improvement #249

TMRh20 opened this issue Jun 24, 2024 · 6 comments · Fixed by #250

Comments

@TMRh20
Copy link
Member

TMRh20 commented Jun 24, 2024

Right now mesh.checkConnection() is a bit intensive on the network and especially the master node, because each node will, in turn, check its connection which requires an address lookup + reception + response from the master node.

I am thinking about a change to the general checkConnection() function as follows:

template<class network_t, class radio_t>
bool ESBMesh<network_t, radio_t>::checkConnection()
{
    RF24NetworkHeader header;
    header.to_node = network.parent();
    header.type = NETWORK_PING;
    for(uint8_t i = 0; i < MESH_CONNECTION_CHECK_ATTEMPTS; i++){
      if(network.write(header, 0, 0)){
        return 1;
      }
    }
    return 0;
    
}

Essentially, each node would only be contacting their parent node directly with an auto-ack message. This way if say top nodes like 05 and 04 lose their connection for a short time, it could potentially take a bit longer for the mesh to partially fall apart in some cases, then reconverge. This might be a good thing, because other nodes should take up the top spots in a short period of time.

This will make it much easier for distant nodes to verify their connectivity as well.

This would leave it up to users to call getAddress(); if they want the old method of connection checking. The whole theory here is to make the mesh less dependent on the master node.

In any case, this is just up for discussion right now.

@2bndy5
Copy link
Member

2bndy5 commented Jun 24, 2024

This would definitely be acceptable.

Should it also safeguard from using checkConnection() when mesh_address == 04444? I'm guessing that such a condition would still return accurate info under the given proposal.

The current implementation has an added prevention about calling checkConnection() on the master node embedded into getAddress(). This prevention should probably be added the proposal here as well.

The docs would need to be clarified as well. Something like:

checkConnection() only pings the current node's parent to avoid network/bandwidth congestion. If validation should depend on a response from the master node, then getAddress() can be used instead:

if (mesh.getAddress(nodeID) < 0) {
   // not connected
}

Even though _node_id is currently a private member, user code should/can be aware of the node's ID since that is supposed to be managed by the network admin anyway.

@TMRh20
Copy link
Member Author

TMRh20 commented Jun 24, 2024

Should it also safeguard from using checkConnection() when mesh_address == 04444?

Yes, it should return immediately. Good call.

...added prevention about calling checkConnection() on the master node

Yes that is probably a good idea as well

I will do some more testing before putting in a PR, but I think this would be a good change.

@2bndy5
Copy link
Member

2bndy5 commented Jun 25, 2024

This way if say top nodes like 05 and 04 lose their connection for a short time, it could potentially take a bit longer for the mesh to partially fall apart in some cases, then reconverge

I especially like this bit. It seems much better suited to the name "mesh".

@TMRh20
Copy link
Member Author

TMRh20 commented Jun 25, 2024

Wondering if we should do a define or something, so the old behaviour is easier to get to?

Like #define RF24MESH_CONN_CHECK_TYPE and let advanced users select the old methodology if required.
Its easy to add a few lines of code, but its easier to un-comment a single line.

@2bndy5
Copy link
Member

2bndy5 commented Jun 25, 2024

It is a good idea. Since it is a binary toggle, I think we could further eliminate the magic number used as a macro value

#define RF24MESH_CONN_CHECK_PARENT 1
#define RF24MESH_CONN_CHECK_MASTER 0
#define RF24MESH_CONN_CHECK_TYPE RF24MESH_CONN_CHECK_PARENT
// To use old behavior: 
// #define RF24MESH_CONN_CHECK_TYPE RF24MESH_CONN_CHECK_MASTER

Then in function body

#if RF24MESH_CONN_CHECK_TYPE == RF24MESH_CONN_CHECK_PARENT
// New behavior
#else 
// Old behavior 
#endif

@TMRh20
Copy link
Member Author

TMRh20 commented Jun 26, 2024

ConnTime2

The above chart shows how long a group of 10 nodes in my radio network have been connected to the MQTT server, as interference and re-organization of the mesh interferes with this and nodes will disconnect/reconnect from time to time.

All nodes are reporting in decimals (hours) except for nodes 13, 5 and 14 which are in minutes. Node 2, 8, 9, 11, 16 and 240 are all running the updated checkConnection() function.

As can be seen, the only node with a PA+LNA module, node 2, has been connected for 12.22 hours, with no loss of connectivity to the MQTT server. The next best is node 11 with 5.35hrs. This is on a very busy mesh with 17 nodes reporting in every few seconds.

The following is over a number of hours of operation showing the success rate of pinging a RPi5 close to the master node.
PingGuage2

Its starting to look like a significant improvement in overall connection time and connectivity in general.

TMRh20 added a commit that referenced this issue Jun 28, 2024
- Adjust mesh.checkConnection function to verify connectivity with parent only. Leave old functionality in place with a #define
- Suggestions in place per @2bndy5
closes #249
@TMRh20 TMRh20 closed this as completed in 2f5aea2 Jun 30, 2024
2bndy5 added a commit that referenced this issue Jul 7, 2024
* mesh.checkConnection via parent not master

- Adjust mesh.checkConnection function to verify connectivity with parent only. Leave old functionality in place with a #define
- Suggestions in place per @2bndy5
closes #249

* Fix last commit - _nodeID not nodeID

* review changes

* trigger compile size reports for PRs

Specifically for PRs that only change the lib src code and not examples.

* doc review

use doxygen references

* [docs] clarify return value of checkConnection()

* format RF24Mesh.h

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* review supported CMake options

also defaults RF24_DRIVER to SPIDEV (per nRF24/RF24#971)

* add reference to new config macro in checkConnection() doc.

---------

Co-authored-by: Brendan <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants