This repository has been archived by the owner on Jun 1, 2021. It is now read-only.

support retry on recovery #352

Merged
1 commit merged into RBMHTechnology:master on Nov 4, 2016

Conversation

kongo2002 (Contributor):

Hi,

Especially during recovery of huge event logs it is desirable to have some kind of retry mechanism for the event log's replay. Otherwise a small hiccup on the storage backend, such as not responding in a timely manner, leads to an immediate shutdown of the EventsourcedView and therefore discards all of the replay progress made so far.

Cheers,
Gregor

@@ -44,6 +44,9 @@ eventuate {
# resumed automatically after the replayed event batch has been handled
# (= replay backpressure).
replay-batch-size = 4096

# Maximum number of replay attempts before finally stopping the actor itself.
max-replay-attempts = 10
Contributor:

We have a naming convention for this kind of parameter (*-retry-max).
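
A minimal sketch of how the setting could look after following that convention. The key names are inferred from the replayRetryMax/replayRetryDelay settings referenced later in this review; the default values and the companion delay key are assumptions for illustration, not the merged values:

# Maximum number of replay attempts before finally stopping the actor itself.
replay-retry-max = 10

# Delay between two consecutive replay attempts.
replay-retry-delay = 10s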

@@ -133,6 +136,12 @@ trait EventsourcedView extends Actor with Stash {
settings.replayBatchSize

/**
* Maximum number of replay attempts before finally stopping the actor itself.
*/
def maxReplayAttempts: Int =
Contributor:

This method is only overridden in tests. Why do we need this method instead of a special test config?
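
A rough sketch of the suggested alternative: override the setting through a test-specific configuration instead of overriding the method. The config path eventuate.log.replay-retry-max is an assumption for illustration only:

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// hypothetical test setup: lower the replay retry limit via configuration only,
// instead of overriding maxReplayAttempts in a test subclass
// (config path is an assumption for illustration)
val testConfig = ConfigFactory.parseString("eventuate.log.replay-retry-max = 2")
val system = ActorSystem("test", testConfig.withFallback(ConfigFactory.load()))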

@@ -84,7 +84,7 @@ object EventsourcingProtocol {
/**
* Failure reply after a [[Replay]].
*/
case class ReplayFailure(cause: Throwable, instanceId: Int)
case class ReplayFailure(cause: Throwable, fromSequenceNr: Long, instanceId: Int)
Contributor:

Please name this parameter like the corresponding one in ReplaySuccess. They are semantically equivalent.
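
A one-line sketch of the suggested rename, assuming the corresponding ReplaySuccess parameter is called replayProgress:

case class ReplayFailure(cause: Throwable, replayProgress: Long, instanceId: Int)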

// retry replay request while decreasing the remaining attempts
logger.warning("replay failed - {} attempts remaining [{}]", remainingAttempts, cause.getMessage)
context.become(initiating(remainingAttempts))
replay(from)
Contributor:

Whenever Eventuate retries something it also delays the retry by a configurable amount of time. This should be done here as well.
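
A rough sketch of such a delayed retry, using the ReplayRetry message and the replayRetryDelay setting that appear later in this review (the scheduler usage shown here is illustrative, not the merged code, and settings.replayRetryDelay is assumed to be a FiniteDuration):

// re-enter the initiating behavior with one attempt less and delay the next
// replay attempt instead of retrying immediately
import context.dispatcher
context.become(initiating(remainingAttempts))
context.system.scheduler.scheduleOnce(settings.replayRetryDelay, self, ReplayRetry(from))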

Contributor:

The Akka Streams adapter currently uses an adapter-specific read-retry-delay parameter for such a delay. When introducing a retry-delay for replay, the adapter should be changed to use that new config parameter instead.

Try(onRecovery(Failure(cause)))
context.stop(self)
case ReplayFailure(cause, from, iid) => if (iid == instanceId) {
val remainingAttempts = replayAttempts - 1
Contributor:

Is remainingAttempts really needed? Set the replayAttempts parameter accordingly and use it directly.
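
A sketch of the suggested simplification (logging omitted), letting the behavior parameter carry the remaining attempts directly; this is roughly the shape the code takes further down in this review:

case ReplayFailure(cause, from, iid) => if (iid == instanceId) {
  if (replayAttempts < 1) {
    // no attempts left -> hand the failure to the recovery handler and stop
    Try(onRecovery(Failure(cause)))
    context.stop(self)
  } else {
    // pass the decremented count directly into the next behavior
    context.become(initiating(replayAttempts - 1))
    replay(from)
  }
}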

val remainingAttempts = replayAttempts - 1
if (remainingAttempts < 1) {
// all replay attempts exceeded -> stop the actor
logger.error(cause, "replay failed ({} attempts exceeded), stopping self", maxReplayAttempts)
Contributor:

Is the number of attempts exceeded or reached?

val actor = unrecoveredEventsourcedWriter()
actor ! "cmd"
processRead(Success("rs"))
processLoad(actor)
processReplay(actor, 1)
appProbe.expectMsg("cmd")
}
"retry on failure" in {
Contributor:

retry what on which failure?

logProbe.expectMsg(Replay(1, 2, None, instanceId))
logProbe.sender() ! ReplaySuccess(Nil, 0L, instanceId)

appProbe.expectMsg("cmd")
Contributor:

Why do you test command stashing here again?

Contributor (author):

That way we can easily ensure that the recovery eventually succeeds, I think.

msgProbe.expectMsg(TestException)
}
"retry recovery on replay failure" in {
Contributor:

is recovery or replay retried?

krasserm self-assigned this on Nov 3, 2016
krasserm (Contributor) left a comment:

LGTM after addressing the new review comments.

context.stop(self)
case ReplayFailure(cause, progress, iid) => if (iid == instanceId) {
if (replayAttempts < 1) {
// all replay attempts exceeded -> stop the actor
Contributor:

Maximum number of replay attempts reached? Is that comment needed at all? It just repeats what the logger.error in the next line says.

case ReplayFailure(cause, progress, iid) => if (iid == instanceId) {
if (replayAttempts < 1) {
// all replay attempts exceeded -> stop the actor
logger.error(cause, "replay failed ({} retries reached), stopping self", settings.replayRetryMax)
Contributor:

replay failed (maximum number of {} replay attempts reached), stopping self (see also next comment)

} else {
// retry replay request while decreasing the remaining attempts
val attemptsRemaining = replayAttempts - 1
logger.warning("replay failed - {} attempts remaining [{}] - scheduling retry in {}ms",
Contributor:

logger.warning("replay failed: [{}] ({} replay attempts remaining), scheduling retry in {}ms", cause.getMessage, attemptsRemaining, settings.replayRetryDelay.toMillis) (more consistent with previous logger.error)

}
case ReplayRetry(from) =>
Contributor:

from -> progress, like in the matching ReplaySuccess and ReplayFailure cases?

@@ -536,3 +536,44 @@ class EventsourcedViewSpec extends TestKit(ActorSystem("test")) with WordSpecLik
}
}
}

object EventsourcedViewReplaySpec {
Contributor:

Should be called EventsourcedViewReplayRetrySpec because other replay tests run elsewhere.

import EventsourcedViewSpec._
import EventsourcingProtocol._

val instanceId: Int = EventsourcedView.instanceIdCounter.get
Contributor:

This will break as soon as you add a new test below. Add to beforeEach.


val instanceId: Int = EventsourcedView.instanceIdCounter.get
val logProbe: TestProbe = TestProbe()
val msgProbe: TestProbe = TestProbe()
Contributor:

Same for these two probes
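
A sketch of what moving these fixtures into beforeEach could look like, assuming the spec mixes in ScalaTest's BeforeAndAfterEach and keeps its existing TestKit/TestProbe imports (names and structure are illustrative):

// hypothetical fixture setup, re-created before each test
private var instanceId: Int = _
private var logProbe: TestProbe = _
private var msgProbe: TestProbe = _

override def beforeEach(): Unit = {
  instanceId = EventsourcedView.instanceIdCounter.get
  logProbe = TestProbe()
  msgProbe = TestProbe()
}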


msgProbe.expectMsg(TestException)
}
}
Contributor:

No success test case?

logProbe.sender() ! ReplaySuccess(Nil, 0L, instanceId)

appProbe.expectMsg("cmd")
}
Contributor:

No failure test case?

krasserm added this to the 0.8 milestone on Nov 4, 2016
krasserm (Contributor) left a comment:

LGTM. Please squash and push. Thanks for your contribution, Gregor!

krasserm merged commit 331d6cd into RBMHTechnology:master on Nov 4, 2016