Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce pagination in files-filter report #26507

Merged
merged 3 commits into from
Mar 22, 2017

Conversation

DeepDiver1975
Copy link
Member

@DeepDiver1975 DeepDiver1975 commented Oct 28, 2016

Description

Allow pagination when listing files

ToDos

  • integration tests
  • add cache for file list
  • integrate on client side

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@DeepDiver1975 DeepDiver1975 added this to the 9.2 milestone Oct 28, 2016
@mention-bot
Copy link

@DeepDiver1975, thanks for your PR! By analyzing the history of the files in this pull request, we identified @PVince81 and @nickvergessen to be potential reviewers.

@DeepDiver1975 DeepDiver1975 force-pushed the filelist-pagination-webdav-report branch from d67ffa0 to 980d966 Compare November 4, 2016 16:02
@PVince81
Copy link
Contributor

PVince81 commented Nov 9, 2016

for #13915

@PVince81
Copy link
Contributor

PVince81 commented Dec 6, 2016

add cache for file list

memcache ? 😦

$requestedProps[] = $propVal['name'];
}
$requestedProps = $report->properties;
$filterRules = $report->filters;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really ? did the format change ?

use Sabre\Xml\Reader;
use Sabre\Xml\XmlDeserializable;

class FilterRequest implements XmlDeserializable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oohhhh, clever

@PVince81
Copy link
Contributor

PVince81 commented Dec 6, 2016

Code looks good so far

throw new PreconditionFailed('Cannot filter by non-existing tag', 0, $e);
if (!is_null($report->limit)) {
$length = $report->limit['size'];
$offset = $report->limit['page'] * $length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first page should be 1, not 0.

How about using limit and offset instead ? Might be easier to grasp...

@PVince81
Copy link
Contributor

Test report report.xml:

<?xml version="1.0" encoding="utf-8" ?>
<oc:filter-files xmlns:a="DAV:" xmlns:oc="http://owncloud.org/ns" >
        <a:prop>
                <a:getetag/>
        </a:prop>
        <oc:limit page="0" size="10" />
</oc:filter-files>
% curl -u admin:admin -X REPORT -H "Content-Type: text/xml" --data-binary "@report.xml" 'http://localhost/owncloud/remote.php/dav/files/admin/'

@PVince81
Copy link
Contributor

Some concerns: the regular filter report returns a flat list of files from all folders. This flat list can be paginated in this report, this is fine.

However we will also need a paginated call for folder-specific contents, without recursing. Should we make a separate report time for this maybe ? Or add a property <oc:recursive /> to make it a flat list, and by default only return for the current folder ?

@PVince81 PVince81 force-pushed the filelist-pagination-webdav-report branch from a108477 to f4aa5d2 Compare December 19, 2016 17:57
@PVince81
Copy link
Contributor

@DeepDiver1975 I've rebased this.

Also I discovered today that Sabre has a shorter way of compiling results from multiple nodes using getPropertiesForMultiplePaths. Since we're touching this report anyway, I added it here f4aa5d2

There will still be a performance issue here because we need to resolve every node individually.
Ideally for pagination we might already know when we're fetching from a single folder, so this fetching should be optimizable. Might be doable using the IMultiget interface which getPropertiesForMultiplePaths uses internally: https://github.com/fruux/sabre-dav/blob/master/lib/DAV/IMultiGet.php#L34. When implementing this, we could sort the results by folder and optimize fetching props of specific grouped folders.

Side note: if we ever implement Webdav sync it also uses IMultiGet 😉

@PVince81
Copy link
Contributor

@PVince81
Copy link
Contributor

Arghh, no, not good. getPropertiesForMultiplePaths would redo a getChild() on the parent nodes because we haven't created any Sabre node yet for all the results.

I suspect that the old implementation also suffered from this. Since we have the files API node already, we should already be able to wrap it (it's a FileInfo after all) into a Sabre node and treat it directly as a result. No need for DB refetching...

I'll revert my commit because it will degrade the performance. From what I see we already had created the matching Sabre File and Directory in findNodesByFileIds.

More about this in a minute after the revert...

@PVince81
Copy link
Contributor

PVince81 commented Dec 19, 2016

So, I've reverted the commit instead of deleting it, in case we still want it. (Squashing will kill it later).

Now looking at findNodesByFileIds(): this has BAD performance because it will resolve every fileid using Folder::getById(). Now for arbitrary results like "/test/x1", "/test/sub/x2", "/abc/def" it makes sense as we're not in the same folder. This was valid for the "filter by tags" and "filter by favorite" feature in which results are usually not inside the same folder.

However if we want REPORT to be used for pagination on simple folders, then we need a different approach there. Usually calling getChildren() on the parent is more efficient as it does a single DB query for getDirectoryContents().

So to solve the above question we really need to discuss "recursive / flattened / arbitrary results" vs "only show results from the current folder".

One idea would be to provide one generic REPORT type that only does a PROPFIND on the files endpoints (or even any part of the tree!) and paginates whatever the PROPFIND / getChildren() returns. All results are children of the current node.

Then we provide another REPORT, which is the one from this PR, to really do search-like queries. In such queries the results are not limited to the current folder, so we have to use getById() for every result.

Or a hybrid variant is to have a single REPORT type but with a oc:recursive property. When unset, we'd resort to a simple getChildren() + post-filter on the results. When set, we'd do the other approach that gets the file ids of the results first, then resolve then. Note that both implementations are completely different.

@DeepDiver1975 thoughts ?

@PVince81
Copy link
Contributor

I wrote down some ideas about implementing a generic pagination report that would work for every PROPFIND-enabled endpoint: #26840

@PVince81
Copy link
Contributor

Regarding recursive vs children, I think we could even call them differently:

  • recursive: is more a SEARCH operation because you're searching for all nodes that match a specific criteria in all subcollections. So the approach is more like asking an index (if we had one) to give results in many folders and then compile these results in one result set.
  • children: is more of a filter where you basically just get the contents of the current collection (children) and then apply a filter on their properties. All data is already there within the current collection, so no need for an index, just need to remove the nodes from the result that do not match the filter criteria.

@PVince81
Copy link
Contributor

So for "recursive" vs "flat" the answer from @DeepDiver1975 was using the "Depth" header.
Which means that to search on a single collection one specifies "Depth: 1" and in all subcollections "Depth: infinite".

@PVince81 PVince81 force-pushed the filelist-pagination-webdav-report branch from 1d87fb6 to c355695 Compare February 28, 2017 18:52
@PVince81
Copy link
Contributor

Rebased.

Also I found a way to simplify the response code to use a more Sabr-y way to do it: c355695

All discovered while working on the customgroups REPORT: owncloud/customgroups#27.

Note: in the customgroups pagination I used a different format for the search pattern + pagination:

<?xml version="1.0" encoding="utf-8" ?>
<oc:search-query xmlns:a="DAV:" xmlns:oc="http://owncloud.org/ns">
        <a:prop>
                <oc:role></oc:role>
                <oc:display-name></oc:display-name>
        </a:prop>
        <oc:search>
                <oc:pattern>us</oc:pattern>
                <oc:limit>5</oc:limit>
                <oc:offset>0</oc:offset>
        </oc:search>
</oc:search-query>

@PVince81
Copy link
Contributor

How about we finish this without the cache ? It will be crude but at least it will make it possible for clients to at least request the first page with limit.

@PVince81 PVince81 force-pushed the filelist-pagination-webdav-report branch from c355695 to a7ed90f Compare March 17, 2017 18:42
@PVince81
Copy link
Contributor

Rebased and pushed some changes:

  • pagination (offset+limit) now works also for tags/favorite
  • disabled case where no filter is specified because getChildren() was not enough and would only search one level deep. Actually we need to use Node::search() but when I tried I noticed that it was missing attributes in its FileInfo properties (like owner) which makes it currently unusable. Something to dig into in the future (<= @DeepDiver1975)
  • added unit tests

Next up: integration tests for pagination when filtering.

@PVince81 PVince81 changed the title [WIP] Introduce pagination in files-filter report Introduce pagination in files-filter report Mar 20, 2017
@PVince81
Copy link
Contributor

21:23:41     /var/lib/jenkins/workspace/owncloud-core_core_PR-26507-XBSZ3PUQKGNP3NP62UQHR4G7JKGONNKXFRY3NXEAITQYFZVK5JDQ/tests/integration/features/favorites.feature:81
21:23:41     /var/lib/jenkins/workspace/owncloud-core_core_PR-26507-XBSZ3PUQKGNP3NP62UQHR4G7JKGONNKXFRY3NXEAITQYFZVK5JDQ/tests/integration/features/favorites.feature:93

@PVince81 PVince81 assigned PVince81 and unassigned DeepDiver1975 Mar 20, 2017
@PVince81
Copy link
Contributor

Fixed integration tests and also added a new integration test for pagination.

@DeepDiver1975 @jvillafanez please review

$requestedProps = $report->properties;
$filterRules = $report->filters;

if (empty($filterRules['systemtag']) && is_null($filterRules['favorite'])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should refine the condition... empty is always scary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure... "systemtag" here is an array so empty fits but "favorite" is only a single string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if "systemtag" will be always an array no matter what, then it's fine. Something similar for "favorite" (wondering if '' or '0' would behave as we expect).
Anyway a comment would be useful to clarify any doubt, mainly to say that this is on purpose.

}
// TODO: if depth one, could use a different search algo based on getChildren() ...
// TODO: Depth header filtering?
$results = $folder->search($report->search['pattern']);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove dead code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boohoo, ok I'll move it to another WIP PR

}
}

$obj = new self();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to use self instead of explicitly calling the current class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Written by @DeepDiver1975 and likely copied from other Sabre deserializer classes where self was used. More convenient than rewriting the current class name I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was copied from Sabre, i guess it's fine. Don't like it, but ok.

' . $filterRules . '
</oc:filter-rules>
</oc:filter-files>';
<oc:filter-files xmlns:a="DAV:" xmlns:oc="http://owncloud.org/ns" >
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to find a better way to generate this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'll let @owncloud/qa go wild to improve this separately.

return new File($this->fileView, $filesNode);
} else if ($filesNode instanceof \OCP\Files\Folder) {
return new Directory($this->fileView, $filesNode);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should log something here unless this null is properly detected and reported. I wouldn't like to miss this case if something goes horribly wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll throw a "WTF this can't be" style exception

Copy link
Member

@jvillafanez jvillafanez Mar 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance that this exception is catched in the middle and doesn't show up? just making sure this will be visible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's wrapped in a SabreException later upstream:

<?xml version="1.0" encoding="utf-8"?>
<d:error xmlns:d="DAV:" xmlns:s="http://sabredav.org/ns">
  <s:exception>Exception</s:exception>
  <s:message>Unrecognized Files API node returned, aborting</s:message>
</d:error>

@jvillafanez
Copy link
Member

I haven't checked deeply the FilesReportPlugin.php file, in case someone wants to recheck that file

@PVince81
Copy link
Contributor

I haven't checked deeply the FilesReportPlugin.php file, in case someone wants to recheck that file

There aren't many changes there so I think it's ok.

Basically the response stays the same except that the array we pass there is sliced.
Most of the added logic is to slice at the right moment.

@jvillafanez adjusted, see last two commits

@PVince81
Copy link
Contributor

Final review ? @jvillafanez @DeepDiver1975
Thanks.

@jvillafanez
Copy link
Member

#26507 (comment) and #26507 (comment) . Other than that 👍

- disable REPORT without filter

Because we need to use Node::search($pattern) to find all matching nodes
in all the subfolder recursively, but the result nodes contain
incomplete information like owner.

- remove unneeded trailing slash when buildling response href

This got obsoleted when switching to generateMultiStatus()

- add integration pagination test for favorites REPORT

- throw exception for unknown node types in FilesReportPlugin
@PVince81 PVince81 force-pushed the filelist-pagination-webdav-report branch from 9a23cd4 to fad6a05 Compare March 22, 2017 17:28
@PVince81
Copy link
Contributor

Added comment, squashed my commits into a smaller set.

Since there are no actual code changes -> instamerge

@PVince81 PVince81 merged commit b31294a into master Mar 22, 2017
@PVince81 PVince81 deleted the filelist-pagination-webdav-report branch March 22, 2017 17:29
@PVince81
Copy link
Contributor

Raised doc ticket with some instructions: https://github.com/owncloud/documentation/issues/2950

@lock
Copy link

lock bot commented Aug 3, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 3, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants