Refactor `Miner` to use lock propagating technique #7079

0x7CFE · 2017-11-17T06:23:03Z

The problem

Currently, there are several places in the Miner code that contain the following note:

// --------------------------------------------------------------------------
// | NOTE Code below requires transaction_queue and sealing_work locks.     |
// | Make sure to release the locks before calling that method.             |
// --------------------------------------------------------------------------

As far as I understand, this is done to prevent deadlocks in case when someone tries to acquire a lock that would also be acquired somewhere down the stack.

Unfortunately, such approach is not ideal. Since such contract is not expressed in the code, compiler could not check it at runtime, because rustc does not understand English (yet?).

Another major limitation is refactoring safety. People tend to make mistakes during large code refactoring because of a limited attention budget and because they typically thinking about something else.

tl;dr:

I propose to refactor locking code to make lock requirements explicit. It may be achieved by accepting resource references as method parameters. Luckily we're using Rust that makes it easy and safe by statically checking all references to be valid.

Medical history

Consider the following (simplified) code:

/// Some resource to be accessed concurrently
struct Resource;

/// Struct that accesses the resource in its methods
struct User {
    resource: Mutex<Resource>;
}

impl User {
    fn foo(&self) {
        // (some code specific to foo)

        // use the resource
        let resource = self.resource.lock();
        resource.do_something();
    }

    fn bar(&self) {
        // (some code specific to bar)

        // use the resource
        let resource = self.resource.lock();
        resource.do_something();
    }
}

Since we have duplicated functionality in both methods, at some point we decided to extract it into the separate method use_resource:

    fn use_resource(&self) {
        let resource = self.resource.lock();
        resource.do_something();
    }

    fn foo(&self) {
        // (some code specific to foo)

        // use the resource
        self.use_resource();
    }

    fn bar(&self) {
        // (some code specific to bar)

        // use the resource
        self.use_resource();
    }

But now we have a problem. There is nothing in the foo or bar code that reminds us about a lock that's taken deep inside the use_resource, so we write the Big Fat Warning™:

    fn foo(&self) {
        // WARNING: The code below takes the resource lock, 
        // be sure not to call it when lock is already taken!!!
        self.use_resource();
    }

The solution

In order to avoid disaster we need to make lock requirements explicit. This may be done by accepting resource references as one of method parameters:

    fn use_resource(&self, resource: &Resource) {
        resource.do_something();
    }

Yep, that simple and that's all. In places where we would need to call use_resource we would have no other options, but to:

Lock the resource and pass the reference
Propagate resource requirement above the stack

Rule of thumb is that you should propagate the locks above, except for the methods that act as an entry point to the struct. So, helper methods should accept references whereas API methods should acquire locks themselves.

Rust will guarantee that such references will not outlive the lock. For convenience we may provide companion method:

    fn lock_and_use_resource(&self) {
        let resource = self.resource.lock();
        self.use_resource(&resource);
    }

It's name reminds us of a lock taken and prevents misuse.

Pros

All lock requirements are explicit and expressed in method signatures
Error driven development — users could not forget to acquire the lock
Fearless refactoring
More readable code
Less footguns, etc.

Cons

May extend the period when lock is taken
User may cheat and take lock locally instead of accepting a reference
Verbose code, extra transitive parameters in methods just to propagate them below
Lifetime and ref/mut fiddling may be required
Still does not prevent deadlocks completely

P.S.

Of course, this technique may be used not only for Miner, but in other parts of the code as well. Miner was just the first thing where I found it.

The text was updated successfully, but these errors were encountered:

0x7CFE · 2017-11-17T07:07:48Z

I've tried to write as simple as possible, but actually there are way more details. Here are some of them.

Technically, instead of &Resource we may accept and propagate MutexGuard or RwLockReadGuard depending on the locking mechanism used.

That would advertise our intentions even better, we literally say "this method requires mutex guard to be taken".

Also this technique does not prevent user from passing reference to another object of the same type. It may be solved using the newtype pattern but that may be too verbose:

struct Resource;

struct A(Resource);
struct B(Resource);

struct User {
    a: Mutex<A>;
    b: Mutex<B>;
}

impl User {
    fn accepts_a(a: &A) { ... }
    fn accepts_b(b: &B) { ... }
}

debris · 2017-11-17T09:02:19Z

I love this idea. It'd be a great improvement, but what about cases where deadlocks happen because of different lock order? Do you have any idea how we could prevent this from happening?

/// Some resource to be accessed concurrently
struct Resource;

/// Struct that accesses the resource in it's methods
struct User {
    resource: Mutex<Resource>,
    resource2: Mutex<Resource>,
}

impl User {
    fn use_resources(resource: &MutexGuard<Resource>, resource2: &MutexGuard<Resource>) {
        let resource = resource.lock();
        let resource2 = resource2.lock();
    }

    fn use_resources2(resource: &MutexGuard<Resource>, resource2: &MutexGuard<Resource>) {
        // possible deadlock, cause resource and resource2 may be acquired in the same time
        let resource2 = resource2.lock();
        let resource = resource.lock();
    }
}

0x7CFE · 2017-11-17T09:23:46Z

It'd be a great improvement, but what about cases where deadlocks happen because of different lock order?

Good point, @debris! Usual approach is to introduce custom lock guard, something like ResourceLock, and provide a single lock constructor that would take locks in a known order. Methods should request a reference to such lock instead of individual resources. This pattern works well for cases, where you need to take the very same locks in many places.

As for your code (real code should declare lifetimes as required):

struct ResourceLock {
    resource1: MutexGuard<Resource>,
    resource2: MutexGuard<Resource>,
}

fn lock_resource() -> ResourceLock {
    ResourceLock {
        resource1: resource.lock(),
        resource2: resource2.lock(),
    }
}

impl User {
    fn use_resources(resources: &ResourceLock) { ... }
}

There might even be a lock hierarchy, where we may upgrade the lock structure by locking additional resources.

Suppose we have two resources Foo and Bar. FooLock may be upgraded to BarLock but not vice versa, because that would lead to a deadlock. We may only downgrade BarLock to FooLock by releasing Bar lock. Also, AsRef and Deref may be implemented for convenience.

impl FooLock {
    fn into_bar(self) -> BarLock {
        BarLock {
            foo_guard: self.guard, // inherit existing lock
            bar_guard: bar.lock(); // *
        }
    }
}

impl BarLock {
    fn into_foo(self) -> FooLock {
        self.bar_guard.unlock();

        FooLock {
            guard: self.foo_guard
        }
    }
}

*: bar should be either known to FooLock already, or be passed as parameter.

adria0 · 2020-07-27T14:42:17Z

Closing the issue due to its stale state

0x7CFE added F6-refactor 📚 Code needs refactoring. M4-core ⛓ Core client code / Rust. labels Nov 17, 2017

0x7CFE assigned debris and arkpar Nov 17, 2017

5chdn added this to the 1.10 milestone Nov 17, 2017

5chdn added the P5-sometimesoon 🌲 Issue is worth doing soon. label Nov 17, 2017

5chdn modified the milestones: 1.10, 1.11 Jan 23, 2018

5chdn modified the milestones: 1.11, 1.12 Mar 1, 2018

0x7CFE mentioned this issue Mar 10, 2018

New Transaction Queue implementation #8074

Merged

10 tasks

0x7CFE unassigned debris and arkpar Mar 12, 2018

5chdn modified the milestones: 1.12, 1.13 Apr 24, 2018

5chdn modified the milestones: 2.1, 2.2 Jul 17, 2018

5chdn added this to the 2.2 milestone Sep 27, 2018

5chdn modified the milestones: 2.2, 2.3 Oct 29, 2018

5chdn modified the milestones: 2.3, 2.4 Jan 10, 2019

5chdn modified the milestones: 2.4, 2.5 Feb 21, 2019

soc1c removed this from the 2.5 milestone Apr 2, 2019

soc1c added this to the 2.6 milestone Apr 2, 2019

ordian modified the milestones: 2.6, 2.7 Jul 12, 2019

niklasad1 mentioned this issue Oct 17, 2019

Possible deadlock by recursive calling of parking_lot::read() in one thread in ethcore client #11176

Closed

adria0 closed this as completed Jul 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor `Miner` to use lock propagating technique #7079

Refactor `Miner` to use lock propagating technique #7079

0x7CFE commented Nov 17, 2017 •

edited

Loading

0x7CFE commented Nov 17, 2017 •

edited

Loading

debris commented Nov 17, 2017

0x7CFE commented Nov 17, 2017 •

edited

Loading

adria0 commented Jul 27, 2020

Refactor Miner to use lock propagating technique #7079

Refactor Miner to use lock propagating technique #7079

Comments

0x7CFE commented Nov 17, 2017 • edited Loading

The problem

tl;dr:

Medical history

The solution

Pros

Cons

P.S.

0x7CFE commented Nov 17, 2017 • edited Loading

debris commented Nov 17, 2017

0x7CFE commented Nov 17, 2017 • edited Loading

adria0 commented Jul 27, 2020

Refactor `Miner` to use lock propagating technique #7079

Refactor `Miner` to use lock propagating technique #7079

0x7CFE commented Nov 17, 2017 •

edited

Loading

0x7CFE commented Nov 17, 2017 •

edited

Loading

0x7CFE commented Nov 17, 2017 •

edited

Loading