Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sandbox ZStandard #95

Merged
merged 1 commit into from
Feb 2, 2022
Merged

Sandbox ZStandard #95

merged 1 commit into from
Feb 2, 2022

Conversation

oshogbo
Copy link
Contributor

@oshogbo oshogbo commented Jan 20, 2022

This change is Reviewable

Copy link
Member

@cblichmann cblichmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 14 files at r1.
Reviewable status: 8 of 14 files reviewed, 15 unresolved discussions (waiting on @cblichmann and @oshogbo)


contrib/README.md, line 11 at r1 (raw file):

------------------------------------------------ | ---------------------------------------------------------------- | -----------
YARA - The pattern matching swiss knife          | [github.com/VirusTotal/yara](https://github.com/VirusTotal/yara) | Bazel
Zstandard - Fast real-time compression algorithm | [github.com/facebook/zstd](https://github.com/facebook/zstd)     | cmake

We should add a separate section for this. The current "list" was meant for sandboxes that are upstreamed.


contrib/zstd/CMakeLists.txt, line 1 at r1 (raw file):

# SPDX-License-Identifier: Apache-2.0

We use a full license header everywhere:

# Copyright 2022 Google LLC
#                Mariusz Zaborski <[email protected]>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

I'm not sure about the copyright line, though. Shouldn't we add you as a contributor to an AUTHORS file instead?


contrib/zstd/CMakeLists.txt, line 5 at r1 (raw file):

#                Mariusz Zaborski <[email protected]>

cmake_minimum_required(VERSION 3.18)

We need this to work back to at least Debian Buster, which ships with 3.13. The rest of SAPI uses 3.12, so I suggest changing that here, too.


contrib/zstd/CMakeLists.txt, line 20 at r1 (raw file):

  libzstd

  GIT_REPOSITORY https://github.com/facebook/zstd.git

Please pin this to a specific commit. main/master will make our builds unreliable


contrib/zstd/CMakeLists.txt, line 38 at r1 (raw file):

  FUNCTIONS
    ${FUNCTIONS_LIST}

I think we can include the list of sandboxed functions directly.


contrib/zstd/CMakeLists.txt, line 52 at r1 (raw file):

)

if (SAPI_ZSTD_ENABLE_EXAMPLES)

We should check SAPI_ENABLE_TESTS and SAPI_ENABLE_EXAMPLES instead.


contrib/zstd/sandboxed.h, line 27 at r1 (raw file):

};

#endif // SAPI_LIBZSTD_SANDBOXED_

Use two spaces between code and comment


contrib/zstd/example/main.cc, line 22 at r1 (raw file):

}

int main(int argc, char** argv) {

Prefer char* argv[]

Code quote:

 char** argv

contrib/zstd/example/main.cc, line 23 at r1 (raw file):

int main(int argc, char** argv) {
  bool memMode = false;

Google C++ style: Use snake_case for local variables.


contrib/zstd/example/main.cc, line 30 at r1 (raw file):

  google::InitGoogleLogging(argv[0]);

  while ((opt = getopt(argc, argv, "mdl:")) != -1) {

As we're already using Abseil, you can also use its Flags library. That does not support short options, though.


contrib/zstd/example/main.cc, line 91 at r1 (raw file):

  }

  return 0;

Suggest using EXIT_SUCCESS and EXIT_FAILURE constants.


contrib/zstd/test/zstd_test.cc, line 283 at r1 (raw file):

};

};  // namespace

No need for a semicolon after closing a namespace.


contrib/zstd/utils/utils_zstd.cc, line 23 at r1 (raw file):

  std::streamsize ssize = getStreamSize(inFile);
  sapi::v::Array<uint8_t> inBuf(ssize);
  inFile.read((char*)inBuf.GetData(), ssize);

Please do not use C-style casts, but the appropriate C++ one => reinterpret_cast<>

Code quote:

(char*)

contrib/zstd/utils/utils_zstd.cc, line 86 at r1 (raw file):

  int iserr;

  /* Create necessary buffers. */

Prefer single line comments starting with //

Code quote:

 /* Create necessary buffers. */

contrib/zstd/utils/utils_zstd.cc, line 102 at r1 (raw file):

  SAPI_ASSIGN_OR_RETURN(iserr, api.ZSTD_CCtx_setParameter(
                                   &rdctx, ZSTD_c_compressionLevel, level));

This indentation looks odd. Can you run clang-format over this file again?

Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 14 files reviewed, 15 unresolved discussions (waiting on @cblichmann and @oshogbo)


contrib/README.md, line 11 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

We should add a separate section for this. The current "list" was meant for sandboxes that are upstreamed.

Fixed.


contrib/zstd/CMakeLists.txt, line 1 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

We use a full license header everywhere:

# Copyright 2022 Google LLC
#                Mariusz Zaborski <[email protected]>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

I'm not sure about the copyright line, though. Shouldn't we add you as a contributor to an AUTHORS file instead?

Fixed.
I don't see AUTHORS file - should we create one?


contrib/zstd/CMakeLists.txt, line 5 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

We need this to work back to at least Debian Buster, which ships with 3.13. The rest of SAPI uses 3.12, so I suggest changing that here, too.

Hym there is some issue as some of the commands used with FetchContent_Decalre isn't supported in 3.13 (for example FetchContent_MakeAvailable).


contrib/zstd/CMakeLists.txt, line 20 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

Please pin this to a specific commit. main/master will make our builds unreliable

Done.
But I'm not sure if this is a good idea.
Shouldn't we move together with the project and thanks to tests detect issues?


contrib/zstd/CMakeLists.txt, line 38 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

I think we can include the list of sandboxed functions directly.

Fixed.


contrib/zstd/CMakeLists.txt, line 52 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

We should check SAPI_ENABLE_TESTS and SAPI_ENABLE_EXAMPLES instead.

Fixed.


contrib/zstd/sandboxed.h, line 27 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

Use two spaces between code and comment

Fixed.


contrib/zstd/example/main.cc, line 22 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

Prefer char* argv[]

Fixed.


contrib/zstd/example/main.cc, line 23 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

Google C++ style: Use snake_case for local variables.

Fixed. I look into whole project I hope I changed in all places.


contrib/zstd/example/main.cc, line 91 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

Suggest using EXIT_SUCCESS and EXIT_FAILURE constants.

Done.


contrib/zstd/test/zstd_test.cc, line 283 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

No need for a semicolon after closing a namespace.

Done.


contrib/zstd/utils/utils_zstd.cc, line 23 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

Please do not use C-style casts, but the appropriate C++ one => reinterpret_cast<>

Done.


contrib/zstd/utils/utils_zstd.cc, line 86 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

Prefer single line comments starting with //

Done.


contrib/zstd/utils/utils_zstd.cc, line 102 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

This indentation looks odd. Can you run clang-format over this file again?

Yes this is generated by clang-format

Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 14 files reviewed, 15 unresolved discussions (waiting on @cblichmann)


contrib/zstd/example/main.cc, line 30 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

As we're already using Abseil, you can also use its Flags library. That does not support short options, though.

Done.

Copy link
Member

@cblichmann cblichmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, 8 of 10 files at r3, all commit messages.
Reviewable status: 12 of 14 files reviewed, 9 unresolved discussions (waiting on @cblichmann and @oshogbo)


contrib/zstd/CMakeLists.txt, line 1 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Fixed.
I don't see AUTHORS file - should we create one?

If you want to be credited then your name should go in a CONTRIBUTORS file and we should create a standard AUTHORS that lists Google as an org. Let me know and I'll set this up.


contrib/zstd/CMakeLists.txt, line 5 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Hym there is some issue as some of the commands used with FetchContent_Decalre isn't supported in 3.13 (for example FetchContent_MakeAvailable).

That should now be resolved as I have landed 5c9245d that brings us to 3.13.


contrib/zstd/CMakeLists.txt, line 20 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Done.
But I'm not sure if this is a good idea.
Shouldn't we move together with the project and thanks to tests detect issues?

While I agree that this helps with the detection of breakages when upstream changes, I'm rather strongly in favor of pinning versions. It's IMO the only way to stay sane on a larger scale and with many dependencies.

Also, we should solve this on a broader level: Once all contrib sandboxes are in, the CI should test them all. Then we can add another CI configuration that builds all the sandboxes with the latest version of their respective upstream projects.


contrib/zstd/example/main.cc, line 30 at r1 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Done.

I see. This is using SAPI's gflags dependency, though (which I'm trying to ultimately get rid of but the removal is blocked on glog). We should use the more modern Abseil Flags library here though:

#include "absl/flags/flag.h"

ABSL_FLAG(bool, decompress, false,"decompress");
ABSL_FLAG(bool, memory_mode, false, "in memory operations");
ABSL_FLAG(uint32_t, level, 0, "compression level");

Flag access works a bit differently then: absl::GetFlag(FLAGS_xxxx).

See https://abseil.io/docs/cpp/guides/flags for all the details.


contrib/zstd/test/CMakeLists.txt, line 1 at r3 (raw file):

# Copyright 2022 Google LLC

Tests can actually live in the same directory as the code-under-test.

The test data should then be in a directory names testdata.


contrib/zstd/test/zstd_test.cc, line 29 at r3 (raw file):

namespace {

std::string getTestFilePath(const std::string& filename) {

GetTestFilePath :)

Also, in the rest of this file.


contrib/zstd/test/zstd_test.cc, line 105 at r3 (raw file):

  std::string infile_s = getTestFilePath("text");

  absl::StatusOr<std::string> status_or_path =

I'd simply name this path


contrib/zstd/test/zstd_test.cc, line 109 at r3 (raw file):

  ASSERT_THAT(status_or_path, IsOk()) << "Could not create temp output file";
  std::string outfile_s = sapi::file::JoinPath(
      sapi::file_util::fileops::GetCWD(), status_or_path.value());

Alternative spelling: *status_or_path

Code quote:

 status_or_path.value()

contrib/zstd/test/zstd_test.cc, line 121 at r3 (raw file):

  ASSERT_LT(outfile.tellp(), infile.tellg());
};

These semicolons are not necessary

Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 12 of 14 files reviewed, 9 unresolved discussions (waiting on @cblichmann)


contrib/zstd/CMakeLists.txt, line 1 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

If you want to be credited then your name should go in a CONTRIBUTORS file and we should create a standard AUTHORS that lists Google as an org. Let me know and I'll set this up.

Maybe we can get back to this when more contributions will be done.


contrib/zstd/CMakeLists.txt, line 5 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

That should now be resolved as I have landed 5c9245d that brings us to 3.13.

Oh perfect. So I guess the documentation should be fixed:
https://cmake.org/cmake/help/latest/module/FetchContent.html
I have changed it to 3.13.


contrib/zstd/CMakeLists.txt, line 20 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

While I agree that this helps with the detection of breakages when upstream changes, I'm rather strongly in favor of pinning versions. It's IMO the only way to stay sane on a larger scale and with many dependencies.

Also, we should solve this on a broader level: Once all contrib sandboxes are in, the CI should test them all. Then we can add another CI configuration that builds all the sandboxes with the latest version of their respective upstream projects.

Thats a good idea.


contrib/zstd/example/main.cc, line 30 at r1 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

I see. This is using SAPI's gflags dependency, though (which I'm trying to ultimately get rid of but the removal is blocked on glog). We should use the more modern Abseil Flags library here though:

#include "absl/flags/flag.h"

ABSL_FLAG(bool, decompress, false,"decompress");
ABSL_FLAG(bool, memory_mode, false, "in memory operations");
ABSL_FLAG(uint32_t, level, 0, "compression level");

Flag access works a bit differently then: absl::GetFlag(FLAGS_xxxx).

See https://abseil.io/docs/cpp/guides/flags for all the details.

Oh, thank you!
Fixed.


contrib/zstd/test/CMakeLists.txt, line 1 at r3 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

Tests can actually live in the same directory as the code-under-test.

The test data should then be in a directory names testdata.

Can or should?
I prefer a separated directories.
But I can changed that if it's required.


contrib/zstd/test/zstd_test.cc, line 29 at r3 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

GetTestFilePath :)

Also, in the rest of this file.

Fixed.


contrib/zstd/test/zstd_test.cc, line 105 at r3 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

I'd simply name this path

I have changed this to path.

Code quote:

status_or_path

contrib/zstd/test/zstd_test.cc, line 109 at r3 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

Alternative spelling: *status_or_path

I converted all it to *variable.


contrib/zstd/test/zstd_test.cc, line 121 at r3 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

These semicolons are not necessary

Fixed.

Copy link
Member

@cblichmann cblichmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r4.
Reviewable status: 12 of 14 files reviewed, 3 unresolved discussions (waiting on @cblichmann and @oshogbo)


contrib/zstd/test/zstd_test.cc, line 105 at r3 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

I have changed this to path.

Odd, I can't see this yet.


contrib/zstd/test/zstd_test.cc, line 109 at r3 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

I converted all it to *variable.

See above, I can't see this change yet.

@oshogbo
Copy link
Contributor Author

oshogbo commented Jan 26, 2022

Can you check now?

Copy link
Member

@cblichmann cblichmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 10 files at r3, 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @oshogbo)


contrib/zstd/test/CMakeLists.txt, line 1 at r3 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Can or should?
I prefer a separated directories.
But I can changed that if it's required.

Well, "required"... It's what we usually do, but I don't feel too strongly about it.

Copy link
Member

@cblichmann cblichmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be the last round for now

Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @oshogbo)


contrib/zstd/sandboxed.h, line 15 at r6 (raw file):

// limitations under the License.

#ifndef SAPI_LIBZSTD_SANDBOXED_

Header guards should contain the full path:
CONTRIB_ZSTD_SANDBOXED_H_

Code quote:

SAPI_LIBZSTD_SANDBOXED_

Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @cblichmann)


contrib/zstd/sandboxed.h, line 15 at r6 (raw file):

CONTRIB_ZSTD_SANDBOXED_H_
Fixed :)

Copy link
Member

@cblichmann cblichmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r7, 4 of 4 files at r8, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @oshogbo)

Copy link
Member

@cblichmann cblichmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 14 files reviewed, 2 unresolved discussions (waiting on @cblichmann and @oshogbo)


contrib/zstd/utils/utils_zstd.cc, line 74 at r9 (raw file):

    return absl::UnavailableError("Unable to decompress file");
  }
  sapi::v::Array<uint8_t> outbuf(size);

size is untrusted and we should check and limit it to some safe upper bound.

Code quote:

size

contrib/zstd/utils/utils_zstd.cc, line 93 at r9 (raw file):

absl::Status CompressStream(ZstdApi& api, std::ifstream& instream,
                            std::ofstream& outstream, int level) {

Wiktor suggested that instead of CompressStream/DecompressStream, we should provide functions that work on file descriptors instead.

His main point was that we need to transfer chunks of data to/from the sandboxee in a loop, which will be quite slow.
Passing FDs will mean the sandboxee can access the data freely and wihtout having to cross the sandbox boundary.

Code quote:

absl::Status CompressStream(ZstdApi& api, std::ifstream& instream,
                            std::ofstream& outstream, int level) {

@oshogbo oshogbo force-pushed the zstd branch 2 times, most recently from cbcec79 to fc95c1b Compare February 1, 2022 11:43
Copy link
Collaborator

@okunz okunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 8 files at r9, all commit messages.
Reviewable status: 11 of 14 files reviewed, 3 unresolved discussions (waiting on @cblichmann, @okunz, and @oshogbo)


contrib/zstd/test/zstd_test.cc, line 25 at r10 (raw file):

#include "sandboxed_api/util/temp_file.h"

using ::sapi::IsOk;

IIRC we put the using-declaration inside the namespace.

Code quote:

using ::sapi::IsOk;

@oshogbo oshogbo force-pushed the zstd branch 2 times, most recently from 692d381 to 12b71e7 Compare February 2, 2022 09:57
Copy link
Member

@cblichmann cblichmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 8 files at r9, 1 of 1 files at r10, 3 of 3 files at r11, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @oshogbo)


contrib/zstd/utils/utils_zstd.cc, line 74 at r9 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

size is untrusted and we should check and limit it to some safe upper bound.

Everything else LGTM (can add the FD functions later), but we should add a size check.

Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @okunz and @oshogbo)


contrib/zstd/test/zstd_test.cc, line 25 at r10 (raw file):

Previously, okunz (Oliver Kunz) wrote…

IIRC we put the using-declaration inside the namespace.

Fixed.

Copy link
Contributor Author

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @cblichmann)


contrib/zstd/utils/utils_zstd.cc, line 74 at r9 (raw file):

Previously, cblichmann (Christian Blichmann) wrote…

Everything else LGTM (can add the FD functions later), but we should add a size check.

The size is checked.

Copy link
Member

@cblichmann cblichmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r12, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @cblichmann)

@copybara-service copybara-service bot merged commit 5708985 into google:main Feb 2, 2022
happyCoder92 pushed a commit that referenced this pull request May 4, 2022
PiperOrigin-RevId: 425893255
Change-Id: I6760fe1ab7734f1a27dd65e4c761c57961306a85
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants