
csync: mtime diff < 1s => conflict not detected #5589

Closed
moscicki opened this issue Mar 8, 2017 · 29 comments
Labels: ReadyToTest (QA, please validate the fix/enhancement), research
Milestone: 2.4.0

moscicki commented Mar 8, 2017

Seen on versions 2.2.4, 2.1.1, 1.7. This is likely a csync issue.

Reproducer (depends on how quick the sync is at this line): https://github.com/cernbox/smashbox/blob/master/lib/test_basicSync.py#L179

./bin/smash -t 4 ./lib/test_basicSync.py

When two clients (loser and winner) create the conflicting files within 1 s, the synchronization of the client that syncs second (the loser) does not detect the conflict and the file is left unsynchronized.

In this case, what gets created on disk is this:

# ls -la --full-time /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/*/TEST_FILE_ADDED_BOTH.dat
-rw-r--r--. 1 root root 1000 2017-03-07 15:51:42.000000000 +0100 /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/checker/TEST_FILE_ADDED_BOTH.dat
-rw-r--r--. 1 root root 1000 2017-03-07 15:51:42.000000000 +0100 /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/creator/TEST_FILE_ADDED_BOTH.dat
-rw-r--r--. 1 root root 1000 2017-03-07 15:51:42.642944123 +0100 /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat
-rw-r--r--. 1 root root 1000 2017-03-07 15:51:42.175954142 +0100 /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/winner/TEST_FILE_ADDED_BOTH.dat
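Both copies have the same size (1000 bytes) and the same whole-second mtime; only the sub-second part differs. As a minimal illustration of the failure mode (a sketch, not the actual csync code; the epoch value matches the modtime in the journal insert further down), a second-resolution comparison treats the winner's and loser's copies as identical:

```cpp
#include <cstdint>
#include <iostream>

struct StatInfo {
    int64_t mtimeSec;   // whole seconds since the epoch, all a 1 s resolution keeps
    long    mtimeNsec;  // sub-second part, ignored by the comparison below
    int64_t size;
};

int main() {
    // Values from the `ls --full-time` output above (epoch seconds from the journal insert).
    const StatInfo winner = {1488898302, 175954142, 1000}; // 15:51:42.175954142 +0100
    const StatInfo loser  = {1488898302, 642944123, 1000}; // 15:51:42.642944123 +0100

    // With second resolution, only mtimeSec and size are compared, so the files
    // look unchanged and no conflict is raised.
    const bool looksUnchanged =
        winner.mtimeSec == loser.mtimeSec && winner.size == loser.size;

    std::cout << std::boolalpha << "treated as identical: " << looksUnchanged << '\n'; // prints true
}
```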

As you can see below, the content with checksum cac0c3f51680222f14025641074e35fa is not synchronized to the loser, nor is the loser's copy turned into a conflict file:

# md5sum /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/*/TEST_FILE_ADDED_BOTH.dat
cac0c3f51680222f14025641074e35fa  /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/checker/TEST_FILE_ADDED_BOTH.dat
cac0c3f51680222f14025641074e35fa  /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/creator/TEST_FILE_ADDED_BOTH.dat
4501f99f2b79d0345f26f1394aca58a3  /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat
cac0c3f51680222f14025641074e35fa  /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/winner/TEST_FILE_ADDED_BOTH.dat

The test ends with failed checks:

2017-03-07 15:51:46,828 - ERROR - loser - inconsistent md5 of /data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat: expected cac0c3f51680222f14025641074e35fa, got 4501f99f2b79d0345f26f1394aca58a3 error_check(actual_md5 == md5, "inconsistent md5 of %s: expected %s, got %s"%(fn,md5,actual_md5)) failed in expect_content() ["/root/functional_test/smashbox/lib/test_basicSync.py" at line 74]
2017-03-07 15:51:46,834 - ERROR - loser - inconsistent md5 of /data/smashdir.minitest/test_basicSync/loser/TEST_FILE_MODIFIED_BOTH.dat: expected cac0c3f51680222f14025641074e35fa, got 4501f99f2b79d0345f26f1394aca58a3 error_check(actual_md5 == md5, "inconsistent md5 of %s: expected %s, got %s"%(fn,md5,actual_md5)) failed in expect_content() ["/root/functional_test/smashbox/lib/test_basicSync.py" at line 74]
2017-03-07 15:51:46,836 - ERROR - loser - expected 2 got 0 conflict files error_check(len(expected_conflict_files) == len(actual_conflict_files), "expected %d got %d conflict files"%(len(expected_conflict_files),len(actual_conflict_files))) failed in expect_conflict_files() ["/root/functional_test/smashbox/lib/test_basicSync.py" at line 94]
2017-03-07 15:51:46,837 - ERROR - loser - expected conflict file for TEST_FILE_ADDED_BOTH not found error_check(any([bfn in fn for fn in actual_conflict_files]), "expected conflict file for %s not found"%bfn) failed in expect_conflict_files() ["/root/functional_test/smashbox/lib/test_basicSync.py" at line 102]
2017-03-07 15:51:46,838 - ERROR - loser - expected conflict file for TEST_FILE_MODIFIED_BOTH not found error_check(any([bfn in fn for fn in actual_conflict_files]), "expected conflict file for %s not found"%bfn) failed in expect_conflict_files() ["/root/functional_test/smashbox/lib/test_basicSync.py" at line 102]
2017-03-07 15:51:47,819 - ERROR - loser - 5 error(s) reported

At the level of the sync client log, the conflict is silently skipped:

[03/07 15:51:42.681348, 9] csync_walker:  file: /data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat [inode=1179749 size=1000]
[03/07 15:51:42.681407, 8] _csync_detect_update:  file: TEST_FILE_ADDED_BOTH.dat, instruction: INSTRUCTION_NEW <<=
[03/07 15:51:42.750369, 9] csync_walker:  file: ownclouds://u:[email protected]/cernbox/desktop/remote.php/webdav/home/my-minitest/TEST_FILE_ADDED_BOTH.dat [file_id=90631270840664064 size=1000]
[03/07 15:51:42.750558, 8] _csync_detect_update:  file: TEST_FILE_ADDED_BOTH.dat, instruction: INSTRUCTION_NEW <<=
[03/07 15:51:42.751783, 9] _csync_merge_algorithm_visitor:  INSTRUCTION_NONE     client file: TEST_FILE_ADDED_BOTH.dat
[03/07 15:51:42.751924, 9] _csync_merge_algorithm_visitor:  INSTRUCTION_NONE     server file: TEST_FILE_ADDED_BOTH.dat
No journal entry found for  "TEST_FILE_ADDED_BOTH.dat" 
OCC::SyncJournalFileRecord::SyncJournalFileRecord(const OCC::SyncFileItem&, const QString&) "/data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat" Retrieved inode  1179749 (previous item inode:  0 ) 
No journal entry found for  "TEST_FILE_ADDED_BOTH.dat" 
"INSERT OR REPLACE INTO metadata (phash, pathlen, path, inode, uid, gid, mode, modtime, type, md5, fileid, remotePerm, filesize, ignoredChildrenRemote, contentChecksum, contentChecksumTypeId) VALUES (?1 , ?2, ?3 , ?4 , ?5 , ?6 , ?7,  ?8 , ?9 , ?10, ?11, ?12, ?13, ?14, ?15, ?16);" 5426047158163907264 24 "TEST_FILE_ADDED_BOTH.dat" 1179749 "1488898302" "0" "90631270840664064:50a6bf69" "90631270840664064" "RWCKNVD" 1000 0 "" "" 0 

If you compare it to a correct run (when the timing difference is > 1 s):

[03/08 17:50:55.204342, 9] csync_walker:  file: /data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat [inode=1835071 size=1000]
[03/08 17:50:55.204397, 8] _csync_detect_update:  file: TEST_FILE_ADDED_BOTH.dat, instruction: INSTRUCTION_NEW <<=
[03/08 17:50:55.263984, 9] csync_walker:  file: ownclouds://u:[email protected]/cernbox/desktop/remote.php/webdav/home/my-minitest/TEST_FILE_ADDED_BOTH.dat [file_id=90910166689513472 size=1000]
[03/08 17:50:55.264088, 8] _csync_detect_update:  file: TEST_FILE_ADDED_BOTH.dat, instruction: INSTRUCTION_NEW <<=
[03/08 17:50:55.265093, 9] _csync_merge_algorithm_visitor:  INSTRUCTION_NONE     client file: TEST_FILE_ADDED_BOTH.dat
[03/08 17:50:55.265211, 8] _csync_merge_algorithm_visitor:  INSTRUCTION_CONFLICT server file: TEST_FILE_ADDED_BOTH.dat
virtual void OCC::PropagateDownloadFileQNAM::start() "TEST_FILE_ADDED_BOTH.dat" 0 
"INSERT OR REPLACE INTO downloadinfo (path, tmpfile, etag, errorcount) VALUES ( ?1 , ?2, ?3, ?4 )" "TEST_FILE_ADDED_BOTH.dat" ".TEST_FILE_ADDED_BOTH.dat.~542b85e" "90910166689513472:50a6bf69" 0 
!!! OCC::GETFileJob created for "https://cernbox.cern.ch/cernbox/desktop/" + "home/my-minitest/TEST_FILE_ADDED_BOTH.dat" "" 
void OCC::PropagateDownloadFileQNAM::slotGetFinished()  QUrl( "https://u:[email protected]/cernbox/desktop/remote.php/webdav/home/my-minitest/TEST_FILE_ADDED_BOTH.dat" )  FINISHED WITH STATUS 0 "" 0 1000 1000 0 "" "1000" 
Created conflict file "/data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat" -> "/data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH_conflict-20170308-175054.dat" 
OCC::SyncJournalFileRecord::SyncJournalFileRecord(const OCC::SyncFileItem&, const QString&) "/data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat" Retrieved inode  1835087 (previous item inode:  0 ) 
"INSERT OR REPLACE INTO metadata (phash, pathlen, path, inode, uid, gid, mode, modtime, type, md5, fileid, remotePerm, filesize, ignoredChildrenRemote, contentChecksum, contentChecksumTypeId) VALUES (?1 , ?2, ?3 , ?4 , ?5 , ?6 , ?7,  ?8 , ?9 , ?10, ?11, ?12, ?13, ?14, ?15, ?16);" 5426047158163907264 24 "TEST_FILE_ADDED_BOTH.dat" 1835087 "1488991854" "0" "90910166689513472:50a6bf69" "90910166689513472" "RWCKNVD" 1000 0 "" "" 0 
"DELETE FROM downloadinfo WHERE path=?1" "TEST_FILE_ADDED_BOTH.dat" 
void OCC::SyncEngine::slotItemCompleted(const OCC::SyncFileItem&, const OCC::PropagatorJob&) "TEST_FILE_ADDED_BOTH.dat" INSTRUCTION_CONFLICT 5 "" 


moscicki commented Mar 8, 2017

I "fixed" the test for the moment with this workaround (you need to undo it to reproduce the problem):

https://github.com/cernbox/smashbox/blob/master/lib/test_basicSync.py#L181


ogoffart commented Mar 9, 2017

The mtime resolution of the csync primitives is indeed one second.
The other way to get a change detected is to change the size of the file.


ogoffart commented Mar 9, 2017

It has been like that from the beginning, and so far nobody has asked for a better resolution.
Changing it would not be easy, as second-resolution mtimes may already be stored in the existing sync database, so we'd need a migration plan.


moscicki commented Mar 9, 2017 via email


hodyroff commented Mar 9, 2017

We need to think about this at a later stage. With human interaction it seems very unlikely to trigger this; when the sync client is used by more machines we will eventually need to come up with a better architectural resolution. I do understand that the limits are that the changes occur within the same second and that the size of the file has also not changed.
@settermjd for documentation (desktop client limitations or sync algorithm architecture).


moscicki commented Mar 9, 2017 via email


hodyroff commented Mar 9, 2017

If this is a real-life scenario we would need to put a higher priority on it. Can this happen without a size change (which would also trigger the right behaviour)? My understanding is that right now you hit it in a testing scenario.


moscicki commented Mar 9, 2017

The problem with the real-life scenario is that we have no means of detecting this problem other than a user actively looking into the content and noticing the difference. So there is no way for me to say how often (if at all) someone in a sufficiently large user base is affected. Would it be possible to at least add a warning or error?

@hodyroff

Yes, an error sounds like a good idea for now. We will discuss it for 2.4. This reminds me that I wanted to ask for a new error tab so the errors can easily be seen on the client side and don't go away after a second or so.

hodyroff added this to the 2.4.0 milestone Mar 10, 2017

phil-davis commented Mar 10, 2017

To log/report an error, the code would have to know that the problem happened. At the moment it thinks the files at the client and server ends are the same (same size and a "close enough" mtime).
To know that they are different, something would have to checksum the file at each end and send one of the checksums to the other side for verification.
If that is done, then the client will know that there is a conflict and can handle it "properly" anyway.

So I don't see how there is a viable interim stage that just logs/reports an error.

I think the issue with the checksum approach is that for big files it takes time and resources to calculate the checksum. If the code only asks for it when this occasional mtime/size match happens, then the checksum does not have to be calculated (and stored) for every file, but there will be a delay whenever a check is needed on a big file. If checksums are stored for every file, then the check is quick, but the system always carries the overhead of calculating and storing the checksums.
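One way to balance that trade-off, sketched below with hypothetical names (this is not client code), would be to compute the checksum lazily, only when the mtime/size heuristic is inconclusive, and cache it keyed by (path, mtime, size) so repeated syncs don't pay the cost again:

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <tuple>

using CacheKey = std::tuple<std::string, int64_t, int64_t>; // path, mtime (s), size

class LazyChecksumStore {
public:
    // computeFn stands in for an expensive SHA1/MD5 pass over the file contents.
    std::string get(const std::string &path, int64_t mtimeSec, int64_t size,
                    std::string (*computeFn)(const std::string &)) {
        const CacheKey key{path, mtimeSec, size};
        auto it = m_cache.find(key);
        if (it == m_cache.end())
            it = m_cache.emplace(key, computeFn(path)).first; // pay the cost only once
        return it->second;
    }

private:
    // A real client would persist this in the sync journal rather than in memory.
    std::map<CacheKey, std::string> m_cache;
};

// Dummy checksum function for the sketch only.
static std::string fakeChecksum(const std::string &path) {
    std::cout << "computing checksum of " << path << '\n';
    return "SHA1:dummy";
}

int main() {
    LazyChecksumStore store;
    store.get("TEST_FILE_ADDED_BOTH.dat", 1488898302, 1000, fakeChecksum); // computes
    store.get("TEST_FILE_ADDED_BOTH.dat", 1488898302, 1000, fakeChecksum); // cache hit
}
```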


ckamm commented Mar 10, 2017

@moscicki see also #2169

I think it would be possible to use higher-resolution mtimes locally where they are available, but that doesn't help with your test case since local and remote mtimes are compared. On casual inspection, it looks like the server mtimes have second resolution (<d:getlastmodified>Thu, 11 Dec 2014 10:21:40 GMT</d:getlastmodified>).

To address this particular case (NEW locally, NEW remotely, same mtime/size but different content) the checksum comparison @phil-davis mentions would make sense. That would work if the server is aware of a checksum for the file. Then the local checksum could be computed in this edge case only and used as an extra criterion. The case is detected at https://github.com/owncloud/client/blob/master/csync/src/csync_reconcile.c#L275
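For illustration, a hedged sketch of that extra criterion (hypothetical types and names, not the actual csync_reconcile.c code): keep the mtime/size fast path and, only in the ambiguous NEW/NEW case, compare checksums when both sides can provide one (the local one would be computed on demand in this branch):

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>

enum class Instruction { None, Conflict };

struct FileEntry {
    int64_t mtimeSec;                    // second resolution, as discussed above
    int64_t size;
    std::optional<std::string> checksum; // e.g. "MD5:<hex>"; may be absent
};

Instruction reconcileNewNew(const FileEntry &local, const FileEntry &remote) {
    if (local.mtimeSec != remote.mtimeSec || local.size != remote.size)
        return Instruction::Conflict;     // ordinary conflict handling

    // Ambiguous case: same whole-second mtime and same size.
    if (!remote.checksum || !local.checksum)
        return Instruction::None;         // current behaviour: assumed identical

    return *local.checksum == *remote.checksum
               ? Instruction::None        // genuinely the same content
               : Instruction::Conflict;   // same mtime/size, different content
}

int main() {
    const FileEntry local  = {1488898302, 1000, std::string("MD5:4501f9...")};
    const FileEntry remote = {1488898302, 1000, std::string("MD5:cac0c3...")};
    std::cout << (reconcileNewNew(local, remote) == Instruction::Conflict
                      ? "conflict detected\n"
                      : "treated as identical\n");
}
```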


ckamm commented Mar 10, 2017

@ogoffart From an implementation point of view, I expect we could mark the file as CONFLICT and create PropagateDownloadFile jobs instead of skipping files in this situation. Then we'd calculate the checksum in these jobs and skip the download if it matches the remote checksum.

The main potential issue I see is that NEW/NEW with identical size/mtime happens for all files on db-loss. Previously this would be a really fast operation, but now it'd need to calculate the checksum for each file.
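A hedged sketch of that flow (hypothetical class and placeholder hash, not the real OCC::PropagateDownloadFileQNAM): in this conflict case, hash the local file first and skip the transfer when it matches the checksum the server reported:

```cpp
#include <fstream>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// Placeholder so the sketch compiles on its own; a real job would compute the
// algorithm named by the server-provided checksum (e.g. SHA1 or MD5).
std::string hashLocalFile(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();
    return std::to_string(std::hash<std::string>{}(buf.str()));
}

class ConflictAwareDownloadJob {
public:
    ConflictAwareDownloadJob(std::string localPath, std::string remoteChecksum)
        : m_localPath(std::move(localPath)), m_remoteChecksum(std::move(remoteChecksum)) {}

    void start() {
        if (m_remoteChecksum.empty()) {
            // No remote checksum available: keep the current behaviour (skip).
            std::cout << "skip: cannot distinguish without a remote checksum\n";
            return;
        }
        if (hashLocalFile(m_localPath) == m_remoteChecksum) {
            // Same content on both sides: no download, no conflict file,
            // just record the server metadata in the sync journal.
            std::cout << "checksums match: update metadata only\n";
        } else {
            // Genuinely different content: normal conflict handling,
            // i.e. keep the local copy as a conflict file and download.
            std::cout << "checksums differ: create conflict file and download\n";
        }
    }

private:
    std::string m_localPath;
    std::string m_remoteChecksum;
};

int main() {
    ConflictAwareDownloadJob job("TEST_FILE_ADDED_BOTH.dat", "1234567890");
    job.start();
}
```

With this shape, the db-loss concern above translates into one local hash per file whose mtime/size still match the server's.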

@ogoffart

@ckamm: We need to keep the current behavior for NEW/NEW.

For SYNC/SYNC, we might change the behavior and download the server version. We already do a byte-by-byte comparison of the two files before marking the conflict.
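For reference, an illustrative sketch of that kind of byte-by-byte comparison between two local files (not the client's actual code; the file names are placeholders):

```cpp
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

// Returns true when both files are readable and have identical contents.
bool filesHaveSameContent(const std::string &a, const std::string &b) {
    std::ifstream fa(a, std::ios::binary), fb(b, std::ios::binary);
    if (!fa || !fb)
        return false;

    // Cheap size check first, then stream the bytes.
    fa.seekg(0, std::ios::end);
    fb.seekg(0, std::ios::end);
    if (fa.tellg() != fb.tellg())
        return false;
    fa.seekg(0);
    fb.seekg(0);

    return std::equal(std::istreambuf_iterator<char>(fa),
                      std::istreambuf_iterator<char>(),
                      std::istreambuf_iterator<char>(fb));
}

int main() {
    std::cout << std::boolalpha
              << filesHaveSameContent("TEST_FILE_ADDED_BOTH.dat",
                                      "downloaded.tmp")
              << '\n';
}
```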

@ogoffart

Of course, if the server were to give us a content checksum before we download, that would solve the issue as well.


guruz commented Mar 10, 2017

@ogoffart @ckamm FYI update on checksums by @IljaN owncloud/core#26655


ckamm commented Mar 10, 2017

@ogoffart If the server has checksums (probably only @moscicki's servers at this point?) querying owncloud.org:checksums in the PROPFIND should return them.
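For reference, a hedged sketch of what such a PROPFIND request body could look like (the oc:checksums property name comes from this thread; the other properties are standard DAV ones, and the server's response format is not shown):

```cpp
#include <iostream>

int main() {
    // Request body for a PROPFIND that also asks for the ownCloud checksum
    // property alongside the usual ones; printed here for illustration.
    const char *propfindBody = R"(<?xml version="1.0"?>
<d:propfind xmlns:d="DAV:" xmlns:oc="http://owncloud.org/ns">
  <d:prop>
    <d:getlastmodified/>
    <d:getcontentlength/>
    <d:getetag/>
    <oc:checksums/>
  </d:prop>
</d:propfind>)";
    std::cout << propfindBody << '\n';
}
```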

ckamm added a commit to ckamm/owncloud-client that referenced this issue Mar 10, 2017
Given that we want to check checksums in NEW/NEW situations.
@ogoffart

A checksum actually is not enough. We need a collision-safe hash.


ckamm commented Mar 10, 2017

@ogoffart Yes. We'd need to verify whether what the server returns is usable for this check. It currently looks like the oc server will maintain SHA1 checksums; that would be suitable.


ckamm commented Jun 14, 2017

PR #5838

ckamm added a commit that referenced this issue Jun 15, 2017
* For conflicts where mtime and size are identical:

  a) If there's no remote checksum, skip (unchanged)
  b) If there's a remote checksum, create a PropagateDownload
     and compute the local checksum. If the checksums are identical,
     don't download the file and just update metadata.

* Avoid exposing the existence of checksumTypeId beyond the database
  layer. This makes handling checksums easier in general because they
  can usually be treated as a single blob.

  This change was prompted by the difficulty of producing file_stat_t
  entries uniformly from PROPFINDs and the database.
ckamm added a commit that referenced this issue Jun 15, 2017
* For conflicts where mtime and size are identical:

  a) If there's no remote checksum, skip (unchanged)
  b) If there's a remote checksum that's a useful hash, create a
     PropagateDownload job and compute the local hash. If the hashes
     are identical, don't download the file and just update metadata.

* Avoid exposing the existence of checksumTypeId beyond the database
  layer. This makes handling checksums easier in general because they
  can usually be treated as a single blob.

  This change was prompted by the difficulty of producing file_stat_t
  entries uniformly from PROPFINDs and the database.
ckamm added the ReadyToTest (QA, please validate the fix/enhancement) label Jun 15, 2017

guruz commented Jun 15, 2017

@moscicki Would you be able to test the PR with your server setup?


moscicki commented Jun 15, 2017 via email


ckamm commented Jun 16, 2017

@moscicki Note that there's only a behavior change if the server sends MD5 or SHA1 hashes in the oc:checksums field of PROPFINDs.


guruz commented Jun 16, 2017


guruz commented Jul 27, 2017

@moscicki Any news?


moscicki commented Jul 27, 2017 via email


ckamm commented Oct 26, 2017

@moscicki Have you had a chance to test this?

@moscicki

Working on it. Please allow a few more days. @giorgosalex is having a look.


guruz commented Dec 4, 2017

You can try the 2.4.0 rc1 @giorgosalex @moscicki https://owncloud.org/changelog/desktop/


guruz commented Feb 8, 2018

Assuming it works...

guruz closed this as completed Feb 8, 2018