Snapshot test #1284

gluk256 · 2019-03-06T11:53:11Z

Tests the message propagation in simulated networks.

supersedes #1016

closes #987

holisticode

Great work! Some of my comments are redundant, and some are just about asking more comments / docs.

I can't really assess the semantics of the tests, though, because I lack an understanding of what the desired behavior really is. Also, taking into consideration @nolash's comment in gitter about the issues of these tests not being overcome, I don't know what else needs to be done.

swarm/network/simulation/kademlia.go

holisticode · 2019-03-06T21:18:50Z

swarm/pss/snapshot_test.go

+		var prev int
+
+		// loop through all nodes and add the message to recipient indices
+		for _, nod := range sim.Net.GetNodes() {


There are a few implicit assumptions here which are not clear for someone not really informed about what needs to be done (including me). Please add more documentation of what's going on

holisticode · 2019-03-06T21:19:27Z

swarm/pss/snapshot_test.go

+			}
+
+			if po >= depth {
+				maxMessages++


For example, why increase the number of maxMessages only if po >= depth?

because in this case we expect the msg to be delivered to this node

holisticode · 2019-03-06T21:19:38Z

swarm/pss/snapshot_test.go

+				allowedMsgs[nod.ID()] = append(allowedMsgs[nod.ID()], uint64(i))
+			}
+
+			// a node with the smallest PO (wrt msg) will be the sender


? Why is this?

in order to see if the msg will propagate properly

@holisticode or in other words; we want to maximize the distance the message has to travel. Therefore we select the node that's farthest away from the message address.
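To make the sender selection concrete, here is a minimal sketch, not the code from this PR. It assumes the proximity order is the number of leading bits two addresses share; `proximity` and `farthestNode` are hypothetical names standing in for `pof` and the selection loop in the test.

```go
package main

import (
	"fmt"
	"math/bits"
)

// proximity returns the number of leading bits shared by a and b.
// It is a simplified, hypothetical stand-in for the pof call in the test.
func proximity(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if x := a[i] ^ b[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return n * 8
}

// farthestNode returns the index of the node address with the smallest
// proximity order relative to msg, i.e. the node the message has to
// travel farthest from. Ties go to the first such node.
func farthestNode(msg []byte, nodes [][]byte) int {
	sender, minPO := -1, -1
	for i, addr := range nodes {
		if po := proximity(msg, addr); sender < 0 || po < minPO {
			sender, minPO = i, po
		}
	}
	return sender
}

func main() {
	msg := []byte{0xF0}
	nodes := [][]byte{{0xF1}, {0x70}, {0xE0}}
	fmt.Println(farthestNode(msg, nodes))
}
```

In this toy network the second node differs from the message address in the very first bit, so it is picked as the sender.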

holisticode · 2019-03-06T21:23:22Z

swarm/pss/snapshot_test.go

+	return snap
+}
+
+func assingTestVariables(sim *simulation.Simulation, msgCount int) {


I don't understand the logic of when a node is supposed to receive a message and when not. This should be described somewhere (apologies if it's there and I couldn't find or understand it)

I agree. The test needs a very clear description on the feature it tests and how it tests it. This is pretty complicated stuff.

holisticode · 2019-03-06T21:24:33Z

swarm/pss/snapshot_test.go

+		// however, it might just mean that not all possible messages are received
+		// now we must check if all required messages are received
+		log.Debug("--------------------------------------------------------------------------------", "rcv", msgCnt)
+		if msgCnt < msgsToReceive {


I don't understand the difference between msgsToReceive and maxMessages

zelig · 2019-03-06T22:28:02Z

swarm/pss/snapshot_test.go

+	return snap
+}
+
+func assingTestVariables(sim *simulation.Simulation, msgCount int) {


assingTestVariables?

zelig · 2019-03-06T22:30:33Z

swarm/pss/snapshot_test.go

+}
+
+func TestProxNetwork(t *testing.T) {
+	if (*runNodes > 0 && *runMessages == 0) || (*runMessages > 0 && *runNodes == 0) {


set proper defaults, no need to check

this is needed in case of invalid explicit input

I think it's better to add the flag in the actual test itself, since these flags are only relevant for this specific test, unlike the pss_test.go flags, which apply to the full package test. Keep in mind you will need flag.Parse() too in that case (you get it for free now with pss_test.go:init()).
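For illustration, a rough sketch of keeping both the flags and the validity check local to one place. The flag names `-nodes` and `-messages` and the helper `parseRunParams` are hypothetical, and a dedicated `flag.FlagSet` is used here only so the example runs on its own; the condition mirrors the one in the diff above.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// parseRunParams parses the hypothetical -nodes and -messages flags from args.
// Both default to 0; supplying only one of them is rejected as invalid
// explicit input, mirroring the check in TestProxNetwork.
func parseRunParams(args []string) (nodes, messages int, err error) {
	fs := flag.NewFlagSet("prox", flag.ContinueOnError)
	fs.IntVar(&nodes, "nodes", 0, "number of nodes (0 = use defaults)")
	fs.IntVar(&messages, "messages", 0, "number of messages (0 = use defaults)")
	if err = fs.Parse(args); err != nil {
		return 0, 0, err
	}
	if (nodes > 0 && messages == 0) || (messages > 0 && nodes == 0) {
		return 0, 0, errors.New("nodes and messages must be set together")
	}
	return nodes, messages, nil
}

func main() {
	_, _, err := parseRunParams([]string{"-nodes", "16"})
	fmt.Println(err)
}
```

With sensible defaults, the check only fires when a caller explicitly passes one value without the other.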

zelig · 2019-03-06T22:40:24Z

swarm/pss/snapshot_test.go

+	t.Logf("completed %d", result.Duration)
+}
+
+func sendAllMsgs(sim *simulation.Simulation, msgs [][]byte, senders map[int]enode.ID) {


why use rpc here? just 'bucket' the pss object

actually, i am not sure. i inherited this code, and it works :)

@zelig all the send and network tests in pss_test.go use RPC. If we are to change this, please let's change it in one go when we clean up the test files, which is a pending task anyway.

However, let's also keep in mind that most usage of Pss is indeed through RPC. Thus I don't think it's unreasonable to use RPC calls in these higher level tests.

zelig · 2019-03-06T22:41:43Z

swarm/pss/snapshot_test.go

+			}
+		}
+	}
+	return nil


unreachable code?

some compilers might not detect this, and without this statement they will complain that the function does not return a value.
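A side note for Go specifically: the language specification treats a "for" statement with no condition and no "break" referring to it as a terminating statement, so a function whose body ends in such a loop compiles without a trailing return. The excerpt above does not show the whole function, so whether that applies here is an assumption; `firstNegative` below is a made-up example.

```go
package main

import "fmt"

// firstNegative reads from in until it sees a negative value.
// The body ends in an unconditional for loop without break, which is a
// terminating statement, so no return is needed after the loop.
func firstNegative(in <-chan int) error {
	for {
		v := <-in
		if v < 0 {
			return fmt.Errorf("negative value %d", v)
		}
	}
}

func main() {
	in := make(chan int, 3)
	in <- 1
	in <- 2
	in <- -3
	fmt.Println(firstNegative(in))
}
```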

zelig · 2019-03-06T22:44:10Z

swarm/pss/snapshot_test.go

+		select {
+		case <-doneC: // graceful exit
+			setDone()
+			errC <- nil


this will block if doneC is closed on line 263, no?

zelig · 2019-03-06T22:47:02Z

swarm/pss/snapshot_test.go

@@ -0,0 +1,433 @@
+package pss


this filename is incorrect. only testing prox

zelig · 2019-03-06T22:48:27Z

swarm/pss/pss.go

-		if err := p.enqueue(pssmsg); err != nil {
-			return err
-		}
+	if len(pssmsg.To) < addressLength || prox {


you can submit this fix separately if you want quick result :) the rest will take a bit...

I warmly encourage this, too.

zelig · 2019-03-06T22:53:08Z

swarm/pss/snapshot_test.go

+// within their nearest neighborhood depth, and stores them as recipients.
+// Upon sending the messages, it verifies that the respective message is passed to the message handlers of these recipients.
+// It will fail if a recipient handles a message it should not, or if after propagation not all expected messages are handled (timeout)
+func testProxNetwork(t *testing.T) {


your test is functionally LGTM but i feel it could be improved and simplified.

global vars

RPC vs API call via bucket object

snapshot handling?

flags

conn labels

I concur on global vars, snapshot handling and flags. My opinions given in individual comments.

nolash · 2019-03-08T08:00:40Z

swarm/pss/snapshot_test.go

+	mu    sync.Mutex // keeps handlerDonc in sync
+	sim   *simulation.Simulation
+
+	handlerDone   bool // set to true on termination of the simulation run


I see you still are using globals in the test. I've already given my opinion on this; I think it is risky design, and I have even gotten scorn from the golang irc channel for doing the same myself at some point (code smell, I believe they called it).

At a minimum I would recommend creating a struct that can encapsulate the state and all methods accessing the state (isDone(), setDone()), which every relevant method gets passed. I will not recommend merging this PR without this amendment.

I would also recommend that we look at how to refactor this test to fit the action/trigger/expectation paradigm of p2p/simulations.go:Simulation but I recommend that can be a separate PR after this.
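A minimal sketch of the kind of encapsulation meant here, assuming only the `handlerDone` flag and its mutex from the diff above. The type name `runState` is hypothetical; the real struct would carry the rest of the test state as well.

```go
package main

import (
	"fmt"
	"sync"
)

// runState is a hypothetical container for state that would otherwise
// live in package-level variables. Each test run gets its own instance.
type runState struct {
	mu          sync.Mutex // guards handlerDone
	handlerDone bool       // set to true on termination of the simulation run
}

// setDone marks the run as terminated.
func (s *runState) setDone() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.handlerDone = true
}

// isDone reports whether the run has terminated.
func (s *runState) isDone() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.handlerDone
}

func main() {
	state := &runState{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			state.setDone() // safe to call from several goroutines
		}()
	}
	wg.Wait()
	fmt.Println(state.isDone())
}
```

Because the state is passed explicitly, two test runs in the same process cannot leak flags into each other.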

nolash · 2019-03-08T08:07:37Z

swarm/pss/snapshot_test.go

+	t.Logf("completed %d", result.Duration)
+}
+
+func sendAllMsgs(sim *simulation.Simulation, msgs [][]byte, senders map[int]enode.ID) {


@zelig all the send and network tests in pss_test.go use RPC. If we are to change this, please let's change it in one go when we clean up the test files, which is a pending task anyway.

However, let's also keep in mind that most usage of Pss is indeed through RPC. Thus I don't think it's unreasonable to use RPC calls in these higher level tests.

nolash · 2019-03-08T08:19:07Z

swarm/network/simulation/kademlia.go

@@ -96,3 +98,100 @@ func (s *Simulation) kademlias() (ks map[enode.ID]*network.Kademlia) {
 	}
 	return ks
 }
+
+func (s *Simulation) WaitTillSnapshotRecreated(ctx context.Context, snap simulations.Snapshot) error {
+	expected := listSnapshotConnections(snap.Conns)


Please use the prefix get rather than list; we don't seem to use "list" for this behavior elsewhere:

find $GOPATH/src/github.com/ethereum/go-ethereum/swarm/ -iname "*.go" -exec grep -P "list\S*\(" {} \;

nolash · 2019-03-08T08:23:53Z

swarm/network/simulation/kademlia.go

+func isAllDeployed(expected []uint64, actual []uint64) bool {
+	exp := make([]uint64, len(expected))
+	copy(exp, expected)
+	if len(exp) > 0 {


cosmetic nitpick; rather check len(expected) before anything else
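Roughly what I have in mind, as a sketch only: `containsAll` is a hypothetical stand-in for isAllDeployed, and it assumes the function should report true when every expected value occurs in actual (and trivially when nothing is expected).

```go
package main

import "fmt"

// containsAll reports whether every value in expected also occurs in actual.
// The length check comes first, so no copy or lookup table is built when
// there is nothing to compare.
func containsAll(expected, actual []uint64) bool {
	if len(expected) == 0 {
		return true // nothing to wait for
	}
	seen := make(map[uint64]struct{}, len(actual))
	for _, v := range actual {
		seen[v] = struct{}{}
	}
	for _, v := range expected {
		if _, ok := seen[v]; !ok {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(containsAll([]uint64{1, 2}, []uint64{3, 2, 1}))
	fmt.Println(containsAll([]uint64{1, 4}, []uint64{3, 2, 1}))
}
```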

swarm/network/simulation/kademlia.go

nolash · 2019-03-08T08:45:17Z

swarm/pss/snapshot_test.go

+				allowedMsgs[nod.ID()] = append(allowedMsgs[nod.ID()], uint64(i))
+			}
+
+			// a node with the smallest PO (wrt msg) will be the sender


@holisticode or in other words; we want to maximize the distance the message has to travel. Therefore we select the node that's farthest away from the message address.

nolash · 2019-03-08T08:47:14Z

swarm/pss/snapshot_test.go

+
+		msgsToReceive += len(targets)
+		for _, id := range targets {
+			recipients[i] = append(recipients[i], id)


why can't you assign directly to recipients[] above? (if you implement a struct to hold the state this will be a single method call to update both arrays).

because in some cases targets might be reset (if new closest is found).
please see line 157:

targets = nil

nolash · 2019-03-08T08:48:58Z

swarm/pss/snapshot_test.go

+			po, _ := pof(msgs[i], nodeAddrs[nod.ID()], 0)
+			depth := kademlias[nod.ID()].NeighbourhoodDepth()
+
+			// only nodes with closest IDs (wrt msg) will receive the msg


only nodes with closest IDs

This doesn't really explain the distinction between "target" and "allowed." Can you please be more specific?

nolash · 2019-03-08T08:50:36Z

swarm/pss/snapshot_test.go

+	if err != nil {
+		t.Fatalf("failed to recreate snapshot: %s", err)
+	}
+	assingTestVariables(sim, msgCount)


Please correct this typo; assing -> assign

nolash · 2019-03-08T08:53:24Z

swarm/pss/snapshot_test.go

+		case hn := <-msgC:
+			received++
+			log.Debug("msg received", "msgs_received", received, "total_expected", msgsToReceive, "id", hn.id, "serial", hn.serial)
+			if received >= maxMessages {


Why would we allow received > maxMessages?

nolash · 2019-03-08T08:59:22Z

For the record, during meeting yesterday the neighborhood reciprocity issue was discussed with @zelig, and we agreed that indeed the functionality tested here - where only the closest peer to the message can be guaranteed recipient - is the best we can do, at least according to the thinking of the current time.
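To illustrate the distinction as I understand it from this thread (a sketch under assumptions, not the test code): a node is allowed to handle a message when its proximity order to the message is at least its own neighbourhood depth, while only the node or nodes closest to the message are required to handle it. The `node` type and `classify` function are hypothetical.

```go
package main

import "fmt"

// node carries the two values the classification needs: the proximity
// order of the node to the message and the node's neighbourhood depth.
type node struct {
	name  string
	po    int
	depth int
}

// classify splits nodes into allowed recipients (po >= depth) and
// required recipients (the nodes with the highest po to the message).
func classify(nodes []node) (allowed, required []string) {
	best := -1
	for _, n := range nodes {
		if n.po >= n.depth {
			allowed = append(allowed, n.name)
		}
		if n.po > best {
			best = n.po
			required = nil // a closer node was found: reset the targets
		}
		if n.po == best {
			required = append(required, n.name)
		}
	}
	return allowed, required
}

func main() {
	nodes := []node{{"a", 1, 3}, {"b", 4, 3}, {"c", 6, 5}, {"d", 6, 5}}
	allowed, required := classify(nodes)
	fmt.Println(allowed, required)
}
```

Here b, c and d may receive the message, but only c and d must; the count of allowed deliveries is an upper bound and the count of required deliveries is a lower bound.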

nolash · 2019-03-14T06:44:40Z

swarm/network/simulation/kademlia.go

@@ -96,3 +98,107 @@ func (s *Simulation) kademlias() (ks map[enode.ID]*network.Kademlia) {
 	}
 	return ks
 }
+
+// WaitTillSnapshotRecreated is blocking until all the connections specified
+// in the snapshot are actually up and running.


Actually, that is not quite precise. It blocks until all the connections are registered in the kademlia.
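The blocking behaviour can be pictured as a polling loop bounded by a context. This is a generic sketch, not the body of WaitTillSnapshotRecreated; `waitUntil` and `demo` are hypothetical names, and the condition stands in for "all snapshot connections are registered in the kademlia".

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitUntil polls cond every interval until it returns true or ctx is done.
func waitUntil(ctx context.Context, interval time.Duration, cond func() bool) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if cond() {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// demo runs waitUntil against a condition that becomes true on the third check.
func demo() (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	calls := 0
	err := waitUntil(ctx, time.Millisecond, func() bool {
		calls++
		return calls >= 3
	})
	return calls, err
}

func main() {
	calls, err := demo()
	fmt.Println(calls, err)
}
```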

nolash · 2019-03-14T06:47:41Z

swarm/network/simulation/kademlia.go

 	for _, c := range conns {
 		res = append(res, getConnectionHash(c.One, c.Other))
+		c.String()


swarm/network/simulation/kademlia_test.go

nolash · 2019-03-14T07:16:06Z

swarm/pss/prox_test.go

 	if err != nil {
 		t.Fatal(err)
 	}
-	ctx, cancel := context.WithTimeout(context.Background(), time.Second*3)
+	ctx, cancel := context.WithTimeout(context.Background(), time.Second*60)


Do we need different timeouts for different tests params?

i don't see how i can do better

Do we have any idea how long those long-running tests take to complete?

nolash · 2019-03-14T07:17:56Z

swarm/pss/prox_test.go

-func runFunc(ctx context.Context, sim *simulation.Simulation) error {
-	go handlerChannelListener(ctx)
-	go sendAllMsgs(sim, msgs, senders)
+func runFunc(tstdata *testData, ctx context.Context, sim *simulation.Simulation) error {


Please comment the individual functions.

nolash · 2019-03-14T07:19:58Z

swarm/pss/prox_test.go

 	if err != nil {
 		t.Fatalf("failed to recreate snapshot: %s", err)
 	}
-	assingTestVariables(sim, msgCount)
-	result := sim.Run(ctx, runFunc)
+	initializeTestData(&tstdata, msgCount)


Again, please consider inline comments in the tests. Here, for example,

// initialize and run the test

it may seem mundane, but I find it really helps readability.

nolash · 2019-03-14T07:21:17Z

swarm/pss/prox_test.go

+			log.Debug("msg received", "msgs_received", received, "total_expected", tstdata.requiredMessages, "id", hn.id, "serial", hn.serial)
+			if received == tstdata.allowedMessages {
+				tstdata.doneC <- struct{}{}
+				close(tstdata.doneC)


Isn't it enough to merely close here?
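For context on why closing alone can be enough in Go: a receive on a closed channel returns immediately for every receiver, whereas a send on an unbuffered channel is handed to a single receiver and blocks until one is ready. The sketch below uses a hypothetical `notifyAll` helper and is not taken from the test.

```go
package main

import (
	"fmt"
	"sync"
)

// notifyAll starts n goroutines waiting on a done channel, closes the
// channel once, and returns how many waiters were released.
func notifyAll(n int) int {
	done := make(chan struct{})
	var wg sync.WaitGroup
	var mu sync.Mutex
	released := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-done // returns as soon as done is closed
			mu.Lock()
			released++
			mu.Unlock()
		}()
	}
	close(done) // no preceding send is needed to wake the waiters
	wg.Wait()
	return released
}

func main() {
	fmt.Println(notifyAll(3))
}
```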

swarm/pss/prox_test.go

zelig

pss change as PR - can quickly merge
simulation change another - see swarm/simulation: wait till snapshot connections are recreated #1298
then the pure prox test simulation, using constructors instead of init

zelig · 2019-03-14T21:01:56Z

swarm/network/simulation/kademlia.go

@@ -96,3 +98,106 @@ func (s *Simulation) kademlias() (ks map[enode.ID]*network.Kademlia) {
 	}
 	return ks
 }
+


ideally we should submit kademlia changes as a separate PR tracking as #1298

zelig · 2019-03-14T21:04:24Z

swarm/pss/prox_test.go

+	return int(msgCount), int(nodeCount)
+}
+
+func readSnapshot(t *testing.T, nodeCount int) simulations.Snapshot {


should this not move under swarm/network/simulation?

no, this is just a helper function, only relevant to this particular test, and not needed otherwise.

zelig · 2019-03-14T21:05:08Z

swarm/pss/prox_test.go

+func initializeTestData(d *testData, msgCount int) {
+	log.Debug("TestProxNetwork start")
+	d.nodeAddrs = make(map[enode.ID][]byte)
+	d.recipients = make(map[int][]enode.ID)


yes make it in the constructor?

zelig · 2019-03-14T21:05:59Z

swarm/pss/prox_test.go

+
+// Here we test specific functionality of the pss, setting the prox property of
+// the handler. The tests generate a number of messages with random addresses.
+// Then, for each message it calculates which nodes in the network the msg address


which nodes in the network have the msg

zelig · 2019-03-14T21:07:31Z

swarm/pss/prox_test.go

+// recipients. The difference between allowed and required recipients results
+// from the fact that the nearest neighbours are not necessarily reciprocal.
+// Upon sending the messages, the test verifies that the respective message is
+// passed to the message handlers of these required recipients. Test will fail


The test fails if

zelig · 2019-03-14T21:15:01Z

swarm/pss/prox_test.go

+}
+
+// runFunc is the main test function, called by Simulation.Run()
+func runFunc(tstdata *testData, ctx context.Context) error {


could we rename this?

zelig · 2019-03-14T21:15:33Z

swarm/pss/prox_test.go

+	t.Logf("completed %d", result.Duration)
+}
+
+func sendAllMsgs(tstdata *testData) {


this could be a func with testData receiver no?

nolash · 2019-03-15T07:09:59Z

swarm/network/simulation/kademlia_test.go

+
+	if !isAllDeployed(b, c) {
+		t.Fatal("isAllDeployed failed")
+	}


... and the positive? :)

this condition remains valid (if i understand your question correctly)

nolash · 2019-03-15T07:19:45Z

swarm/pss/prox_test.go

+func initializeTestData(d *testData, msgCount int) {
+	log.Debug("TestProxNetwork start")
+	d.nodeAddrs = make(map[enode.ID][]byte)
+	d.recipients = make(map[int][]enode.ID)


func newTestData() *tstData {
	return &tstData{
		nodeAddrs:    make(map[enode.ID][]byte),
		recipients:   make(map[int][]enode.ID),
		allowed:      make(map[int][]enode.ID),
		expectedMsgs: make(map[enode.ID][]uint64),
		allowedMsgs:  make(map[enode.ID][]uint64),
		senders:      make(map[int]enode.ID),
		handlerC:     make(chan handlerNotification),
		doneC:        make(chan struct{}),
		errC:         make(chan error),
		msgC:         make(chan handlerNotification),
		kademlias:    make(map[enode.ID]*network.Kademlia),
	}
}

func (d *tstData) init() {
	log.Debug("TestProxNetwork start")
	for _, nodeId := range d.sim.NodeIDs() {
		d.nodeAddrs[nodeId] = nodeIDToAddr(nodeId)
	}
	[....]
}

func testProxNetwork(t *testing.T) {
	testData := newTestData()
	[...]
}

nolash

Thanks for your patience :)

zelig · 2019-03-15T10:48:31Z

@gluk256 approved as is, but consider following up with #1298

zelig · 2019-03-16T10:00:47Z

merged as ethereum/go-ethereum#19278

gluk256 added 5 commits March 1, 2019 16:36

swarm/pss: fixed bug in pss.process, test added

0f4db1c

swarm/pss: test case updated

1c94504

swarm/pss: WaitTillSnapshotRecreated() func added

243724e

swarm/pss: snapshot test updated

f322700

swarm/pss: WaitTillSnapshotLoaded() fixed

1d7aa22

gluk256 requested a review from nolash March 6, 2019 11:54

gluk256 self-assigned this Mar 6, 2019

gluk256 requested a review from holisticode March 6, 2019 11:54

gluk256 added the ready for review label Mar 6, 2019

swarm/pss: gofmt applied

4de05a3

holisticode reviewed Mar 6, 2019

View reviewed changes

zelig suggested changes Mar 6, 2019

View reviewed changes

zelig mentioned this pull request Mar 8, 2019

swarm/pss: Network tests with prox handlers #1016

Closed

nolash suggested changes Mar 8, 2019

View reviewed changes

nolash added pss push-sync labels Mar 8, 2019

gluk256 added 8 commits March 8, 2019 19:30

swarm/pss: refactoring, file renamed

9f1b685

swarm/pss: input data fixed

acc14ba

swarm/pss: race condition fixed

3105ab6

swarm/pss: test timeout increased

26ac00d

swarm/pss: eliminated the global variables

3fcebc1

swarm/pss: tests added

15ba825

swarm/pss: comments added

7657525

swarm/pss: comment fixed

eda75e1

holisticode approved these changes Mar 14, 2019

View reviewed changes

zelig mentioned this pull request Mar 14, 2019

swarm/simulation: wait till snapshot connections are recreated #1298

Closed

nolash suggested changes Mar 14, 2019

View reviewed changes

swarm/pss: refactored according to review

b86a8ac

zelig suggested changes Mar 14, 2019

View reviewed changes

nolash suggested changes Mar 15, 2019

View reviewed changes

swarm/pss: style fix

144e477

nolash approved these changes Mar 15, 2019

View reviewed changes

swarm/pss: increased timeout

ec45587

zelig approved these changes Mar 15, 2019

View reviewed changes

gluk256 mentioned this pull request Mar 15, 2019

swarm/pss: negihbourhood addressing simulation tests ethereum/go-ethereum#19278

Merged

nolash added submitted to ethereum/go-ethereum and removed ready for review labels Mar 15, 2019

zelig closed this Mar 16, 2019

zelig mentioned this pull request Mar 20, 2019

kademlia, connectivity, stability -- meta #1068

Closed

53 tasks
