Conversation
Some early feedback from my side. I see three problems here:

1.) If the renter deletes a file early, the farmer will not get paid for it but will still hold the shard until it expires. 2 TB used space but only 200 GB paid space? The farmer has no way to see the difference between unpaid and paid used space, so farmers will delete all shards and start over with a new farmer identity until that one has accumulated too much unpaid space as well. We need a way to clean up the space on the farmer.

2.) The farmer will lose all contracts if he has a too high timeoutRate. timeoutRate is not working at the moment; see storj-archived/complex#82 and storj-archived/complex#81. As long as timeoutRate is not stable and tested we should not use it, or we will burn reliable farmers in a few seconds.

3.) timeoutRateThreshold is only 1 hour. A hardware upgrade, ISP downtime, a UPnP issue (storj-archived/core#600) or farmer maintenance jobs like the farmer GBh calculation script require a higher timeoutRateThreshold. How about 12 hours of downtime per week? That would be a timeoutRateThreshold of 0.07 (12 / 168 ≈ 0.07) with 168 hours for the timeout calculation.
Force-pushed from e222499 to 07cc7e1
lib/server/routes/reports.js
Outdated
let token = pointer.token;
self._createStorageEvent(token,
  farmer.nodeID,
  mirror.contact,
`mirror.contact` itself is not a nodeID. I would expect dirty StorageEvents stored in the database.
Hmm... okay. Will take a second look at this.
lib/cron/storage-events.js
Outdated
const cursor = StorageEvent.find({
  timestamp: {
    $lt: finalityTime,
    $gt: lastTimestamp
Two or more StorageEvents can have the same timestamp. If the job gets canceled by StorageEventsCron.MAX_RUN_TIME, the lastTimestamp could be set at the first StorageEvent, and restarting the job with `$gt: lastTimestamp` would skip all other StorageEvents with the same timestamp.
I don't think `$gte` is a better choice: in that case it would not skip StorageEvents, but it would read them twice :/
It should be safe to repeat processing storage events within a certain window (many hours). The event should remain unmodified, and the update to the user should not be applied a second time, based on the timestamp: https://github.com/Storj/service-storage-models/pull/136/files#diff-6cde9fa75ac789afe7ff15f670cfc049R462
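To make that replay tolerance concrete, here is a minimal sketch of the pattern being described, assuming mongoose-style models; `finalityTime` and `lastTimestamp` come from the quoted diff, while `User` and the `lastStorageEventTimestamp` field are illustrative assumptions, not the actual service-storage-models code:

```js
// Sketch only: resume with $gte so events sharing the boundary timestamp are
// not skipped, and make the per-user update a no-op when it is replayed.
const cursor = StorageEvent.find({
  timestamp: {
    $lt: finalityTime,
    $gte: lastTimestamp // may re-read events with the boundary timestamp
  }
}).sort({ timestamp: 1 }).cursor();

cursor.on('data', function(event) {
  // Guarding on the stored timestamp makes a replayed event harmless:
  // the update only matches if this event has not been counted yet.
  User.update({
    _id: event.user,
    lastStorageEventTimestamp: { $lt: event.timestamp }
  }, {
    $set: { lastStorageEventTimestamp: event.timestamp }
  }, function(err) {
    if (err) { /* handle or log the error */ }
  });
});
```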
lib/server/routes/reports.js
Outdated
  event.success = true;
  this.updateReputation(event.farmer, points.TRANSFER_SUCCESS);
} else {
  this.updateReputation(event.farmer, points.TRANSFER_FAILURE);
In the case of mirror creation I will send a ClientReport. By claiming all mirror transfers as failed I can reduce the reputation of all my neighbors and get a better relative reputation for myself.
Not critical. We can solve this one later.
Agree, we can take a look at this later. We would likely need to do something similar to what we have with users.
Let me explain that a little bit better with an example:
@littleskunk Offline time is calculated between timeouts, not from lastSeen. If the lastSeen is not greater than the last timeout, the time is considered offline. See: https://github.com/Storj/service-storage-models/blob/master/lib/models/contact.js#L155
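As a rough illustration of that rule, the logic amounts to something like the following; this is a simplified sketch, not the exact contact.js code, and the one-week window is only an assumption:

```js
// Sketch: offline time accumulates between consecutive timeouts, and only
// when the contact has not been seen since the previous timeout.
ContactSchema.methods.recordTimeoutFailure = function() {
  const now = Date.now();
  const lastTimeout = this.lastTimeout ? this.lastTimeout.getTime() : now;
  const lastSeen = this.lastSeen ? this.lastSeen.getTime() : 0;
  const WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // assumed one-week window

  if (lastSeen <= lastTimeout) {
    // Not seen since the previous timeout: count the whole gap as offline.
    this.timeoutRate = (this.timeoutRate || 0) + (now - lastTimeout) / WINDOW_MS;
  }

  this.lastTimeout = new Date(now);
  return this;
};
```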
Rebased on the latest master branch
I see a problem with the offline penalty. Let's take an example: 1.) A farmer joins the network and gets 1 contract. Now we have 1 TB of offline shards, but the bridge doesn't notice it and will not create new mirrors. Other timeouts are very unlikely: the bridge will not send any alloc requests or request an upload pointer because spaceAvailable is set to false, and download pointers are ordered by lastSeen. Is it possible to modify the bridge monitor? Let's ping all farmers with more than one contract. That should solve the problem: the bridge would ping the farmer again and stop pinging as soon as the offline penalty is applied.
Sounds like we need to fix timeoutRate to reset without a timeout event?
@braydonf I guess you can implement that faster than me. How about a check whether lastTimeout is more than 72 hours old, and in that case set timeoutRate to 0? Somewhere around this function should affect all important messages. For storj-archived/complex#82 I will try to add ALLOC and RENEW to that function in order to trigger a responseTime calculation.
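A minimal sketch of the suggested reset, assuming a mongoose-style Contact model with lastTimeout and timeoutRate fields; the method name and where it would be called from are hypothetical:

```js
// Sketch: clear the timeout penalty when no timeout has been recorded for
// 72 hours, so a recovered farmer is treated as reliable again.
const RESET_AFTER_MS = 72 * 60 * 60 * 1000;

ContactSchema.methods.maybeResetTimeoutRate = function() {
  const lastTimeout = this.lastTimeout ? this.lastTimeout.getTime() : 0;

  if (this.timeoutRate > 0 && Date.now() - lastTimeout > RESET_AFTER_MS) {
    this.timeoutRate = 0;
  }

  return this;
};
```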
Rebased again from latest master
@littleskunk We can likely add a timeout reset in |
Rebased again from latest master, and resolved conflicts. |
Fix cron scheduling pattern
Deleting an empty bucket failed with "404 Bucket not found" because no shards were found for this bucket.
lib/server/routes/reports.js
Outdated
}

const isClientReport = req.user ?
  (event.client === req.user.id) : req.farmerNodeID ?
I believe this needs to compare `event.client` to `req.user._id` rather than `req.user.id`
Okay, what was the problem that came up due to it?
Fix for deleting empty bucket
I noticed some strange storage events generated while uploading a file.
This event is never updated with EDIT: Something that can help debugging is that the client/farmer id
Both events are between the same |
@kaloyan-raev Events without a user are replication events, such as the first example; however, that one looks like an attempt to mirror from and to the same farmer. That would explain why it didn't work, but not why it was created, so we would probably need to dig a bit more to find out why.
@kaloyan-raev For the second one, it looks like the duplicate event was created just 100 ms after the first. It likely failed because the first one had already started. There may be a race condition as to why there are two to begin with, though.
Duplicate mirror traffic. That is correct. The shard was transferred twice to the same farmer. See #564
Today I downloaded a 200 MB file. The download was successful. But for the first 3 shards I got
It does not really make sense, as I doubt any download was unsuccessful. It's a perfect networking environment: downloading from a Storj/integration docker container running on the same machine. I also noticed that for those shards that got
BTW, in a mirroring storage event, is it the "farmer" downloading from the "client", or vice versa?
The client is downloading from the farmer. The mirror was completed at
I would expect a download retry with the other farmer. That should work, and finally you get the complete file. The download retry could be hidden somewhere in the logfile. Do you see any download success for the same shard hash?
Change `req.user.id` to `req.user._id` to match middleware _id assignment
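For context, the corrected check in reports.js looks roughly like this; the farmer branch is cut off in the quoted diff above, so its completion here is only a guess:

```js
// A client report is one authenticated as the user whose _id the auth
// middleware assigns; the farmer branch below is an assumed completion.
const isClientReport = req.user ?
  (event.client === req.user._id) :
  req.farmerNodeID ?
    (event.client === req.farmerNodeID) : false;
```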
Okay, summary of what we've found so far with another run of testing:
Thanks @dylanlott and @kaloyan-raev for the PRs!
There is also an issue we just found with the cron worker processing storage events multiple times and then recording those events on the user as the totalRate. So in a case where the client is not reporting with SIP9, the unknown rate is still less than totalRate.
As a follow-up to the previous comment, it looks like the lastTimeout isn't accurate. For three separate runs of the cron the lastTimeout was the same, even though they were 2 minutes apart each:
incorrect reporting rates.
There was an issue where the reputation points were awarded too soon, without having the full perspective of both client and farmer reports. This now updates reputation after there has been a chance for both of those reports and the storage event has been resolved.
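As a rough sketch of that ordering, assuming hypothetical clientReport/farmerReport/resolved fields on the storage event (the real field names may differ):

```js
// Sketch: only touch reputation once the event has been resolved and both
// sides have had a chance to report; otherwise leave it to the cron worker.
function maybeUpdateReputation(event, points, updateReputation) {
  const haveBothReports = Boolean(event.clientReport && event.farmerReport);

  if (!haveBothReports || !event.resolved) {
    return;
  }

  updateReputation(event.farmer,
    event.success ? points.TRANSFER_SUCCESS : points.TRANSFER_FAILURE);
}
```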
There was an issue where duplicate mirror events could be triggered for each side of the exchange report, once for the farmer and once for the client. The replication/mirror is triggered by a successful client report. There was another issue where a mirror would be attempted from/to the same farmer. A check has been added to make sure that the source and destination are not the same.
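A minimal sketch of that guard, with illustrative names for the origin farmer and the candidate list (not the exact route code):

```js
// Sketch: never pick the origin farmer itself as the mirror destination.
function selectMirrorDestination(originNodeID, candidates) {
  const eligible = candidates.filter(function(mirror) {
    return mirror.contact !== originNodeID;
  });

  return eligible.length ? eligible[0] : null;
}
```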
There was previously code that would immediately resolve the storage event to true once the report was made. However, we've moved that functionality to exist in one place in the cron worker, so it's necessary for the resolving there to handle this case too.
Okay, all known issues reported here should now be resolved.
- service storage models
- mongodb adapter
- complex
Features
Implements: https://github.com/Storj/sips/blob/master/sip-0009.md
Specific updates
Todo:
- package.json with released service-storage-models with necessary updates

Part of: