-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCC scanner commit in batches #27527
Conversation
@PVince81, thanks for your PR! By analyzing the history of the files in this pull request, we identified @DeepDiver1975, @butonic and @oparoz to be potential reviewers. |
also @mmattel might be interested too |
Idea, can also be done in a second step, but would like to see it together: |
Maybe helps to fix: #25029 |
@@ -45,6 +45,8 @@ | |||
* @package OC\Files\Utils | |||
*/ | |||
class Scanner extends PublicEmitter { | |||
const MAX_ENTRIES_TO_COMMIT = 10000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has it been tested with a low memory limit, let's say 255MB or less?. It looks too big to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tried locally without this patch with 47k files, did not run into the memory issue unfortunately.
In fact, in all of my tests I never ran into a memory issue. I'll redo one one of these days as scanning 100k files can take many many hours...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you run the test, would you be able to check the memory consumption and share the result?
The consumption should show a pattern like a sawtooth.
Such a result would be eg informative in the documenation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mmattel what tools do you recommend to make sure measurement ? Or just log memory usage to the ownCloud log ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imho, owncloud log and the use of php memory consumption methods would be sufficient.
We need to get a guideline.
Currently we are flying blind...
lib/private/Files/Utils/Scanner.php
Outdated
@@ -69,6 +85,7 @@ public function __construct($user, $db, ILogger $logger) { | |||
$this->logger = $logger; | |||
$this->user = $user; | |||
$this->db = $db; | |||
$this->useTransaction = !(\OC::$server->getLockingProvider() instanceof DBLockingProvider); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might need a comment. It looks weird that you're using transactions when you aren't using DB locking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was there before I just changed the way the state is stored.
no idea why this condition exists... might need some git archeology
lib/private/Files/Utils/Scanner.php
Outdated
$this->triggerPropagator($storage, $internalPath); | ||
$this->entriesToCommit++; | ||
if ($this->useTransaction) { | ||
$propagator = $storage->getPropagator(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can move this inside the if
. No need to get a propagator if it isn't going to be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, also no need to increase entriesToCommit
if we don't use it.
Will move both deeper.
@@ -236,5 +252,20 @@ public function scan($dir = '') { | |||
private function triggerPropagator(IStorage $storage, $internalPath) { | |||
$storage->getPropagator()->propagateChange($internalPath, time()); | |||
} | |||
|
|||
private function postProcessEntry(IStorage $storage, $internalPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can make the function easier. The overall approach is to open a transaction, scan files and finally commit the transaction. The problem I see with the function is that it doesn't follow the same approach, but instead do a scan (in fact, propagate the change), commit a transaction and open a new one, which makes the process difficult to follow.
Maybe a conditional open transaction, scan and finally conditional commit is easier to follow. The only thing to be careful about should be the opening transaction and the last commit, which aren't conditional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate on what you mean with "conditional open transaction" ? I don't really follow.
Note that one thing we do not want to do is commit at the end of a folder because said folder could contain more than 10k entries or so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function would something along these lines (I don't expect the db->inTransaction
works, but I think there is something similar)
if ($this->entriesToCommit === 0 && !$this->db->inTransaction) {
$this->db->beginTransaction();
}
// do whatever you need
if ($this->entriesToCommit >= self::MAX_ENTRIES_TO_COMMIT && $this->db->inTransaction) {
$this->db->commit();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem with your proposed change is that we also need to include the propagator's transaction, which somehow doesn't fit with the $this->db->inTransaction()
approach. While both are supposed to be in sync, I find that having this specific condition using inTransaction
more confusion.
Additionally we also need to check $this->useTransaction
in the if
statements to find out whether we should at all use transactions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about removing the the first and last "transactions" (the first beginTransaction and the last commit) and let it being handled by the postProcessEntry function completely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason is readablity. The "natural" way of doing things is open -> do something -> close. The current way is do something -> close -> open new.
The function supposes that there is a transaction in progress (if any, obviously), and that there is "something" that will close the transaction afterwards.
From my point of view, either the function doesn't know a transaction is in progress (as it was before) or it knows and can handle transactions properly. If it knows, it shouldn't depend on others to open and / or close the transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it work if I simply add a comment "note: transaction is initiazed and closed elsewhere" to de-confuse devs ?
I don't feel it's worth spending time rewriting this bit if you balance the time needed (and retesting needed) with the small benefit of a bit less confusion. I'd rather add a comment then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that rewriting this bit could take some time to re-test, but I think it's something we should do now. I don't think that adding a comment is enough in this case.
Calling @butonic and / or @DeepDiver1975 to break the tie.
@jvillafanez I've adjusted the other bits |
@butonic has tested this and confirmed that memory usage is reduced |
Great, do you have numbers that you can share, as we discussed above? |
Tested with very long file and folder names:
Memory profile without this PR looks like this: |
@butonic how is the profile looking with this PR? |
Anyway, 👍 |
HURRA, congrats, pls merge ! @PVince81 I propose a note in the feature list of v10! @settermjd I propose a documentation note which is helpful for physical servers sizing. |
Nah... need to first rearrange the code to make @jvillafanez and future devs who read this code happy |
@mmattel 25Mb is only the amount of ram in use as seen by php-memprof. Top shows a usage of 73952kb so ~74Mb. |
Any update on this? |
I don't have time to finish this, sorry |
Cant we merge with the suggested comment as note above and do the rest later? |
ok, here you go then |
Thanks, 😄 |
Maybe a note at ownCloud 10.0 Features under General to be added ? |
@mmattel sounds good, can you add it ? |
ok will do, was not sure if I am allowed to. |
Done |
stable9.1. #27881 |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Description
Autocommit after 10k entries when running occ scanner to avoid excessive memory usage.
Related Issue
Fixes #27516
Motivation and Context
How Has This Been Tested?
Manually
Screenshots (if appropriate):
Types of changes
Checklist:
@jvillafanez @butonic
also see #27516 (comment) for potential concerns.