
csync: mtime diff < 1s => conflict not detected #5589

Closed
moscicki opened this issue Mar 8, 2017 · 29 comments
Labels: ReadyToTest (QA, please validate the fix/enhancement), research
Milestone: 2.4.0

moscicki commented Mar 8, 2017

Seen on versions 2.2.4, 2.1.1, 1.7. This is likely a csync issue.

Reproducer (depends on how quick the sync is at this line): https://github.com/cernbox/smashbox/blob/master/lib/test_basicSync.py#L179

./bin/smash -t 4 ./lib/test_basicSync.py

When two clients (loser and winner) create the conflicting files within 1 s, the synchronization of the client that syncs second (the loser) does not detect the conflict and the file is left unsynchronized.

In this case, what gets created on disk is this:

# ls -la --full-time /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/*/TEST_FILE_ADDED_BOTH.dat
-rw-r--r--. 1 root root 1000 2017-03-07 15:51:42.000000000 +0100 /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/checker/TEST_FILE_ADDED_BOTH.dat
-rw-r--r--. 1 root root 1000 2017-03-07 15:51:42.000000000 +0100 /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/creator/TEST_FILE_ADDED_BOTH.dat
-rw-r--r--. 1 root root 1000 2017-03-07 15:51:42.642944123 +0100 /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat
-rw-r--r--. 1 root root 1000 2017-03-07 15:51:42.175954142 +0100 /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/winner/TEST_FILE_ADDED_BOTH.dat
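Both copies have the same size (1000 bytes) and the same whole-second mtime; only the sub-second part differs. As a minimal illustration of the failure mode (a sketch, not the actual csync code; the epoch value matches the modtime in the journal insert further down), a second-resolution comparison treats the winner's and loser's copies as identical:

```cpp
#include <cstdint>
#include <iostream>

struct StatInfo {
    int64_t mtimeSec;   // whole seconds since the epoch, all a 1 s resolution keeps
    long    mtimeNsec;  // sub-second part, ignored by the comparison below
    int64_t size;
};

int main() {
    // Values from the `ls --full-time` output above (epoch seconds from the journal insert).
    const StatInfo winner = {1488898302, 175954142, 1000}; // 15:51:42.175954142 +0100
    const StatInfo loser  = {1488898302, 642944123, 1000}; // 15:51:42.642944123 +0100

    // With second resolution, only mtimeSec and size are compared, so the files
    // look unchanged and no conflict is raised.
    const bool looksUnchanged =
        winner.mtimeSec == loser.mtimeSec && winner.size == loser.size;

    std::cout << std::boolalpha << "treated as identical: " << looksUnchanged << '\n'; // prints true
}
```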

As you can see below, the content with checksum cac0c3f51680222f14025641074e35fa is not synchronized to the loser, nor is the loser's copy turned into a conflict file:

# md5sum /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/*/TEST_FILE_ADDED_BOTH.dat
cac0c3f51680222f14025641074e35fa  /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/checker/TEST_FILE_ADDED_BOTH.dat
cac0c3f51680222f14025641074e35fa  /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/creator/TEST_FILE_ADDED_BOTH.dat
4501f99f2b79d0345f26f1394aca58a3  /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat
cac0c3f51680222f14025641074e35fa  /data/smashdir.minitest-NO_CONFLICT_PROBLEM/test_basicSync/winner/TEST_FILE_ADDED_BOTH.dat

The test ends with failed checks:

2017-03-07 15:51:46,828 - ERROR - loser - inconsistent md5 of /data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat: expected cac0c3f51680222f14025641074e35fa, got 4501f99f2b79d0345f26f1394aca58a3 error_check(actual_md5 == md5, "inconsistent md5 of %s: expected %s, got %s"%(fn,md5,actual_md5)) failed in expect_content() ["/root/functional_test/smashbox/lib/test_basicSync.py" at line 74]
2017-03-07 15:51:46,834 - ERROR - loser - inconsistent md5 of /data/smashdir.minitest/test_basicSync/loser/TEST_FILE_MODIFIED_BOTH.dat: expected cac0c3f51680222f14025641074e35fa, got 4501f99f2b79d0345f26f1394aca58a3 error_check(actual_md5 == md5, "inconsistent md5 of %s: expected %s, got %s"%(fn,md5,actual_md5)) failed in expect_content() ["/root/functional_test/smashbox/lib/test_basicSync.py" at line 74]
2017-03-07 15:51:46,836 - ERROR - loser - expected 2 got 0 conflict files error_check(len(expected_conflict_files) == len(actual_conflict_files), "expected %d got %d conflict files"%(len(expected_conflict_files),len(actual_conflict_files))) failed in expect_conflict_files() ["/root/functional_test/smashbox/lib/test_basicSync.py" at line 94]
2017-03-07 15:51:46,837 - ERROR - loser - expected conflict file for TEST_FILE_ADDED_BOTH not found error_check(any([bfn in fn for fn in actual_conflict_files]), "expected conflict file for %s not found"%bfn) failed in expect_conflict_files() ["/root/functional_test/smashbox/lib/test_basicSync.py" at line 102]
2017-03-07 15:51:46,838 - ERROR - loser - expected conflict file for TEST_FILE_MODIFIED_BOTH not found error_check(any([bfn in fn for fn in actual_conflict_files]), "expected conflict file for %s not found"%bfn) failed in expect_conflict_files() ["/root/functional_test/smashbox/lib/test_basicSync.py" at line 102]
2017-03-07 15:51:47,819 - ERROR - loser - 5 error(s) reported

At the level of the sync client log, the conflict is silently skipped:

[03/07 15:51:42.681348, 9] csync_walker:  file: /data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat [inode=1179749 size=1000]
[03/07 15:51:42.681407, 8] _csync_detect_update:  file: TEST_FILE_ADDED_BOTH.dat, instruction: INSTRUCTION_NEW <<=
[03/07 15:51:42.750369, 9] csync_walker:  file: ownclouds://u:[email protected]/cernbox/desktop/remote.php/webdav/home/my-minitest/TEST_FILE_ADDED_BOTH.dat [file_id=90631270840664064 size=1000]
[03/07 15:51:42.750558, 8] _csync_detect_update:  file: TEST_FILE_ADDED_BOTH.dat, instruction: INSTRUCTION_NEW <<=
[03/07 15:51:42.751783, 9] _csync_merge_algorithm_visitor:  INSTRUCTION_NONE     client file: TEST_FILE_ADDED_BOTH.dat
[03/07 15:51:42.751924, 9] _csync_merge_algorithm_visitor:  INSTRUCTION_NONE     server file: TEST_FILE_ADDED_BOTH.dat
No journal entry found for  "TEST_FILE_ADDED_BOTH.dat" 
OCC::SyncJournalFileRecord::SyncJournalFileRecord(const OCC::SyncFileItem&, const QString&) "/data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat" Retrieved inode  1179749 (previous item inode:  0 ) 
No journal entry found for  "TEST_FILE_ADDED_BOTH.dat" 
"INSERT OR REPLACE INTO metadata (phash, pathlen, path, inode, uid, gid, mode, modtime, type, md5, fileid, remotePerm, filesize, ignoredChildrenRemote, contentChecksum, contentChecksumTypeId) VALUES (?1 , ?2, ?3 , ?4 , ?5 , ?6 , ?7,  ?8 , ?9 , ?10, ?11, ?12, ?13, ?14, ?15, ?16);" 5426047158163907264 24 "TEST_FILE_ADDED_BOTH.dat" 1179749 "1488898302" "0" "90631270840664064:50a6bf69" "90631270840664064" "RWCKNVD" 1000 0 "" "" 0 

If you compare it to a correct run (when the timing difference is > 1 s):

[03/08 17:50:55.204342, 9] csync_walker:  file: /data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat [inode=1835071 size=1000]
[03/08 17:50:55.204397, 8] _csync_detect_update:  file: TEST_FILE_ADDED_BOTH.dat, instruction: INSTRUCTION_NEW <<=
[03/08 17:50:55.263984, 9] csync_walker:  file: ownclouds://u:[email protected]/cernbox/desktop/remote.php/webdav/home/my-minitest/TEST_FILE_ADDED_BOTH.dat [file_id=90910166689513472 size=1000]
[03/08 17:50:55.264088, 8] _csync_detect_update:  file: TEST_FILE_ADDED_BOTH.dat, instruction: INSTRUCTION_NEW <<=
[03/08 17:50:55.265093, 9] _csync_merge_algorithm_visitor:  INSTRUCTION_NONE     client file: TEST_FILE_ADDED_BOTH.dat
[03/08 17:50:55.265211, 8] _csync_merge_algorithm_visitor:  INSTRUCTION_CONFLICT server file: TEST_FILE_ADDED_BOTH.dat
virtual void OCC::PropagateDownloadFileQNAM::start() "TEST_FILE_ADDED_BOTH.dat" 0 
"INSERT OR REPLACE INTO downloadinfo (path, tmpfile, etag, errorcount) VALUES ( ?1 , ?2, ?3, ?4 )" "TEST_FILE_ADDED_BOTH.dat" ".TEST_FILE_ADDED_BOTH.dat.~542b85e" "90910166689513472:50a6bf69" 0 
!!! OCC::GETFileJob created for "https://cernbox.cern.ch/cernbox/desktop/" + "home/my-minitest/TEST_FILE_ADDED_BOTH.dat" "" 
void OCC::PropagateDownloadFileQNAM::slotGetFinished()  QUrl( "https://u:[email protected]/cernbox/desktop/remote.php/webdav/home/my-minitest/TEST_FILE_ADDED_BOTH.dat" )  FINISHED WITH STATUS 0 "" 0 1000 1000 0 "" "1000" 
Created conflict file "/data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat" -> "/data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH_conflict-20170308-175054.dat" 
OCC::SyncJournalFileRecord::SyncJournalFileRecord(const OCC::SyncFileItem&, const QString&) "/data/smashdir.minitest/test_basicSync/loser/TEST_FILE_ADDED_BOTH.dat" Retrieved inode  1835087 (previous item inode:  0 ) 
"INSERT OR REPLACE INTO metadata (phash, pathlen, path, inode, uid, gid, mode, modtime, type, md5, fileid, remotePerm, filesize, ignoredChildrenRemote, contentChecksum, contentChecksumTypeId) VALUES (?1 , ?2, ?3 , ?4 , ?5 , ?6 , ?7,  ?8 , ?9 , ?10, ?11, ?12, ?13, ?14, ?15, ?16);" 5426047158163907264 24 "TEST_FILE_ADDED_BOTH.dat" 1835087 "1488991854" "0" "90910166689513472:50a6bf69" "90910166689513472" "RWCKNVD" 1000 0 "" "" 0 
"DELETE FROM downloadinfo WHERE path=?1" "TEST_FILE_ADDED_BOTH.dat" 
void OCC::SyncEngine::slotItemCompleted(const OCC::SyncFileItem&, const OCC::PropagatorJob&) "TEST_FILE_ADDED_BOTH.dat" INSTRUCTION_CONFLICT 5 "" 


moscicki commented Mar 8, 2017

I "fixed" the test for the moment with this workaround (you need to undo it to reproduce the problem):

https://github.com/cernbox/smashbox/blob/master/lib/test_basicSync.py#L181


ogoffart commented Mar 9, 2017

The mtime resolution of the csync primitives is indeed one second.
The other way to get a change detected is to change the size of the file.


ogoffart commented Mar 9, 2017

It has been like that from the beginning, and so far nobody has asked for a better resolution.
Changing it would not be easy, as second-resolution mtimes may already be stored in the existing sync database, so we'd need a migration plan.


moscicki commented Mar 9, 2017 via email


hodyroff commented Mar 9, 2017

We need to think about this at a later stage. With human interaction it seems very unlikely to trigger this; when the sync client is used by more machines we will eventually need to come up with a better architectural resolution. I do understand that the limits are that the changes occur within the same second and that the size of the file has also not changed.
@settermjd for documentation (desktop client limitations or sync algorithm architecture).


moscicki commented Mar 9, 2017 via email


hodyroff commented Mar 9, 2017

If this is a real-life scenario we would need to put a higher priority on it. Can this happen without a size change (which would also trigger the right behaviour)? My understanding is that right now you hit it in a testing scenario.


moscicki commented Mar 9, 2017

The problem with the real-life scenario is that we have no means of detecting this problem other than a user actively looking into the content and noticing the difference. So there is no way for me to say how often (if at all) someone in a sufficiently large user base is affected. Would it be possible to at least add a warning or error?

@hodyroff

Yes, an error sounds like a good idea for now. We will discuss it for 2.4. This reminds me that I wanted to ask for a new error tab so the errors can easily be seen on the client side and don't go away after a second or so.

hodyroff added this to the 2.4.0 milestone Mar 10, 2017

phil-davis commented Mar 10, 2017

To log/report an error, the code would have to know that the problem happened. At the moment it thinks the files at the client and server ends are the same (same size and a "close enough" mtime).
To know that they are different, something would have to checksum the file at each end and send one of the checksums to the other side for verification.
If that is done, then the client will know that there is a conflict and can handle it "properly" anyway.

So I don't see how there is a viable interim stage that just logs/reports an error.

I think the issue with the checksum approach is that for big files it takes time and resources to calculate the checksum. If the code only asks for it when this occasional mtime/size match happens, then the checksum does not have to be calculated (and stored) for every file, but there will be a delay whenever a check is needed on a big file. If checksums are stored for every file, then the check is quick, but the system always carries the overhead of calculating and storing the checksums.
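One way to balance that trade-off, sketched below with hypothetical names (this is not client code), would be to compute the checksum lazily, only when the mtime/size heuristic is inconclusive, and cache it keyed by (path, mtime, size) so repeated syncs don't pay the cost again:

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <tuple>

using CacheKey = std::tuple<std::string, int64_t, int64_t>; // path, mtime (s), size

class LazyChecksumStore {
public:
    // computeFn stands in for an expensive SHA1/MD5 pass over the file contents.
    std::string get(const std::string &path, int64_t mtimeSec, int64_t size,
                    std::string (*computeFn)(const std::string &)) {
        const CacheKey key{path, mtimeSec, size};
        auto it = m_cache.find(key);
        if (it == m_cache.end())
            it = m_cache.emplace(key, computeFn(path)).first; // pay the cost only once
        return it->second;
    }

private:
    // A real client would persist this in the sync journal rather than in memory.
    std::map<CacheKey, std::string> m_cache;
};

// Dummy checksum function for the sketch only.
static std::string fakeChecksum(const std::string &path) {
    std::cout << "computing checksum of " << path << '\n';
    return "SHA1:dummy";
}

int main() {
    LazyChecksumStore store;
    store.get("TEST_FILE_ADDED_BOTH.dat", 1488898302, 1000, fakeChecksum); // computes
    store.get("TEST_FILE_ADDED_BOTH.dat", 1488898302, 1000, fakeChecksum); // cache hit
}
```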


ckamm commented Mar 10, 2017

@moscicki see also #2169

I think it would be possible to use higher-resolution mtimes locally where they are available, but that doesn't help with your test case since local and remote mtimes are compared. On casual inspection, it looks like the server mtimes have second resolution (<d:getlastmodified>Thu, 11 Dec 2014 10:21:40 GMT</d:getlastmodified>).

To address this particular case (NEW locally, NEW remotely, same mtime/size but different content) the checksum comparison @phil-davis mentions would make sense. That would work if the server is aware of a checksum for the file. Then the local checksum could be computed in this edge case only and used as an extra criterion. The case is detected at https://github.com/owncloud/client/blob/master/csync/src/csync_reconcile.c#L275
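For illustration, a hedged sketch of that extra criterion (hypothetical types and names, not the actual csync_reconcile.c code): keep the mtime/size fast path and, only in the ambiguous NEW/NEW case, compare checksums when both sides can provide one (the local one would be computed on demand in this branch):

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>

enum class Instruction { None, Conflict };

struct FileEntry {
    int64_t mtimeSec;                    // second resolution, as discussed above
    int64_t size;
    std::optional<std::string> checksum; // e.g. "MD5:<hex>"; may be absent
};

Instruction reconcileNewNew(const FileEntry &local, const FileEntry &remote) {
    if (local.mtimeSec != remote.mtimeSec || local.size != remote.size)
        return Instruction::Conflict;     // ordinary conflict handling

    // Ambiguous case: same whole-second mtime and same size.
    if (!remote.checksum || !local.checksum)
        return Instruction::None;         // current behaviour: assumed identical

    return *local.checksum == *remote.checksum
               ? Instruction::None        // genuinely the same content
               : Instruction::Conflict;   // same mtime/size, different content
}

int main() {
    const FileEntry local  = {1488898302, 1000, std::string("MD5:4501f9...")};
    const FileEntry remote = {1488898302, 1000, std::string("MD5:cac0c3...")};
    std::cout << (reconcileNewNew(local, remote) == Instruction::Conflict
                      ? "conflict detected\n"
                      : "treated as identical\n");
}
```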


ckamm commented Mar 10, 2017

@ogoffart From an implementation point of view, I expect we could mark the file as CONFLICT and create PropagateDownloadFile jobs instead of skipping files in this situation. Then we'd calculate the checksum in these jobs and skip the download if it matches the remote checksum.

The main potential issue I see is that NEW/NEW with identical size/mtime happens for all files on db-loss. Previously this would be a really fast operation, but now it'd need to calculate the checksum for each file.
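A hedged sketch of that flow (hypothetical class and placeholder hash, not the real OCC::PropagateDownloadFileQNAM): in this conflict case, hash the local file first and skip the transfer when it matches the checksum the server reported:

```cpp
#include <fstream>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// Placeholder so the sketch compiles on its own; a real job would compute the
// algorithm named by the server-provided checksum (e.g. SHA1 or MD5).
std::string hashLocalFile(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();
    return std::to_string(std::hash<std::string>{}(buf.str()));
}

class ConflictAwareDownloadJob {
public:
    ConflictAwareDownloadJob(std::string localPath, std::string remoteChecksum)
        : m_localPath(std::move(localPath)), m_remoteChecksum(std::move(remoteChecksum)) {}

    void start() {
        if (m_remoteChecksum.empty()) {
            // No remote checksum available: keep the current behaviour (skip).
            std::cout << "skip: cannot distinguish without a remote checksum\n";
            return;
        }
        if (hashLocalFile(m_localPath) == m_remoteChecksum) {
            // Same content on both sides: no download, no conflict file,
            // just record the server metadata in the sync journal.
            std::cout << "checksums match: update metadata only\n";
        } else {
            // Genuinely different content: normal conflict handling,
            // i.e. keep the local copy as a conflict file and download.
            std::cout << "checksums differ: create conflict file and download\n";
        }
    }

private:
    std::string m_localPath;
    std::string m_remoteChecksum;
};

int main() {
    ConflictAwareDownloadJob job("TEST_FILE_ADDED_BOTH.dat", "1234567890");
    job.start();
}
```

With this shape, the db-loss concern above translates into one local hash per file whose mtime/size still match the server's.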

@ogoffart

@ckamm: We need to keep the current behavior for NEW/NEW.

For SYNC/SYNC, we might change the behavior and download the server version. We already do a byte-by-byte comparison of the two files before marking the conflict.
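For reference, an illustrative sketch of that kind of byte-by-byte comparison between two local files (not the client's actual code; the file names are placeholders):

```cpp
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

// Returns true when both files are readable and have identical contents.
bool filesHaveSameContent(const std::string &a, const std::string &b) {
    std::ifstream fa(a, std::ios::binary), fb(b, std::ios::binary);
    if (!fa || !fb)
        return false;

    // Cheap size check first, then stream the bytes.
    fa.seekg(0, std::ios::end);
    fb.seekg(0, std::ios::end);
    if (fa.tellg() != fb.tellg())
        return false;
    fa.seekg(0);
    fb.seekg(0);

    return std::equal(std::istreambuf_iterator<char>(fa),
                      std::istreambuf_iterator<char>(),
                      std::istreambuf_iterator<char>(fb));
}

int main() {
    std::cout << std::boolalpha
              << filesHaveSameContent("TEST_FILE_ADDED_BOTH.dat",
                                      "downloaded.tmp")
              << '\n';
}
```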

@ogoffart

Of course, if the server were to give us a content checksum before we download, that would solve the issue as well.


guruz commented Mar 10, 2017

@ogoffart @ckamm FYI update on checksums by @IljaN owncloud/core#26655


ckamm commented Mar 10, 2017

@ogoffart If the server has checksums (probably only @moscicki's servers at this point?) querying owncloud.org:checksums in the PROPFIND should return them.
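For reference, a hedged sketch of what such a PROPFIND request body could look like (the oc:checksums property name comes from this thread; the other properties are standard DAV ones, and the server's response format is not shown):

```cpp
#include <iostream>

int main() {
    // Request body for a PROPFIND that also asks for the ownCloud checksum
    // property alongside the usual ones; printed here for illustration.
    const char *propfindBody = R"(<?xml version="1.0"?>
<d:propfind xmlns:d="DAV:" xmlns:oc="http://owncloud.org/ns">
  <d:prop>
    <d:getlastmodified/>
    <d:getcontentlength/>
    <d:getetag/>
    <oc:checksums/>
  </d:prop>
</d:propfind>)";
    std::cout << propfindBody << '\n';
}
```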

ckamm added a commit to ckamm/owncloud-client that referenced this issue Mar 10, 2017
Given that we want to check checksums in NEW/NEW situations.
@ogoffart

A checksum actually is not enough. We need a collision-safe hash.


ckamm commented Mar 10, 2017

@ogoffart Yes. We'd need to verify whether what the server returns is usable for this check. It currently looks like the oc server will maintain SHA1 checksums; that would be suitable.


ckamm commented Jun 14, 2017

PR #5838

ckamm added a commit that referenced this issue Jun 15, 2017
* For conflicts where mtime and size are identical:

  a) If there's no remote checksum, skip (unchanged)
  b) If there's a remote checksum, create a PropagateDownload
     and compute the local checksum. If the checksums are identical,
     don't download the file and just update metadata.

* Avoid exposing the existence of checksumTypeId beyond the database
  layer. This makes handling checksums easier in general because they
  can usually be treated as a single blob.

  This change was prompted by the difficulty of producing file_stat_t
  entries uniformly from PROPFINDs and the database.
ckamm added a commit that referenced this issue Jun 15, 2017
* For conflicts where mtime and size are identical:

  a) If there's no remote checksum, skip (unchanged)
  b) If there's a remote checksum that's a useful hash, create a
     PropagateDownload job and compute the local hash. If the hashes
     are identical, don't download the file and just update metadata.

* Avoid exposing the existence of checksumTypeId beyond the database
  layer. This makes handling checksums easier in general because they
  can usually be treated as a single blob.

  This change was prompted by the difficulty of producing file_stat_t
  entries uniformly from PROPFINDs and the database.
ckamm added the ReadyToTest (QA, please validate the fix/enhancement) label Jun 15, 2017

guruz commented Jun 15, 2017

@moscicki Would you be able to test the PR with your server setup?


moscicki commented Jun 15, 2017 via email


ckamm commented Jun 16, 2017

@moscicki Note that there's only a behavior change if the server sends MD5 or SHA1 hashes in the oc:checksums field of PROPFINDs.


guruz commented Jun 16, 2017


guruz commented Jul 27, 2017

@moscicki Any news?


moscicki commented Jul 27, 2017 via email


ckamm commented Oct 26, 2017

@moscicki Have you had a chance to test this?

@moscicki

Working on it. Please allow a few more days. @giorgosalex is having a look.


guruz commented Dec 4, 2017

You can try the 2.4.0 rc1 @giorgosalex @moscicki https://owncloud.org/changelog/desktop/


guruz commented Feb 8, 2018

Assuming it works...

guruz closed this as completed Feb 8, 2018