Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Icu 63.1 min static data #666

Merged
merged 8 commits into from
Nov 9, 2018
Merged

Icu 63.1 min static data #666

merged 8 commits into from
Nov 9, 2018

Conversation

kevinkreiser
Copy link
Contributor

@kevinkreiser kevinkreiser commented Nov 6, 2018

This is a build of ICU with -Os and with the data packaging set to put the data in a static lib. This also removes a significant amount of the data. It narrows down the datasets to just: locales, unit, currency and a couple basic things in misc. Additionaly for those datasets the only locales supported are:

da de en eo es fi fr he id it ko my nl pl pt pt_PT ro ru sv tr uk vi zh zh_Hans

With all of these changes the total data size of the data library is down from 26MB to 3.1MB. Which is quite convenient for cross platform libraries that have to ship the same code to a variety of platforms.

There was a slight issue with the standard download/unpacking workflow. ICU does not ship the "code" used to build the data in the same targz as the code. Because of this the package must download the code and the data-code and splice them together (to get the static library with data built-in). What I've done is updated the downloading function with the ability to add a suffix to the file you downloaded. This allows the unpacking functions to then use the same suffix and thefore avoids destroying the cache of the other files that were downloaded for the package in question. Because of this change I am wondering if I should update the mason version from 0.18.0 to 0.19.0 or if even just new packages in mason also obviate the version update? @springmeyer or maybe @brunoabinader can you advise?

@kevinkreiser
Copy link
Contributor Author

nice pull request number! 👿

a0eb02a525822ea5ab409e04cf280bacccbb5d4d \
data

mason_extract_zip -d ${MASON_NAME}/source
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prior to the SLUG_SUFFIX this function would have zapped the cache for the previous download.

fi
}

function trim_data {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function is what "patches" the ICU data files so that we only build some of the datasets and only for some of the locales. i could have used a patch file but ICU docs say to copy files and modify those so I regex'd my way there instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

./configure ${MASON_HOST_ARG} --prefix=${MASON_PREFIX} \
$(icu_debug_configure) \
$(cross_build_configure) \
--with-data-packaging=static \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the flag that makes it so that the data is included in the static library itself as opposed to a separate file


MASON_NAME=icu
MASON_VERSION=63.1-min-static-data
MASON_LIB_FILE=lib/libicuuc.a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@springmeyer can one put more than one library on this line?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used to check that the install worked. It only supports a single file and can be anything that exists after the build completes. So, the way you have it here is 👍. No need (or ability) to point to multiple files and not pointing to the other libraries is okay - they will still get installed.

Copy link
Member

@brunoabinader brunoabinader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes looks good, but I'll defer the approval to @springmeyer.


# Using uint_least16_t instead of char16_t because Android Clang doesn't recognize char16_t
# I'm being shady and telling users of the library to use char16_t, so there's an implicit raw cast
ICU_CORE_CPP_FLAGS="-DU_CHARSET_IS_UTF8=1 -DU_CHAR_TYPE=uint_least16_t"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinkreiser this U_CHAR_TYPE customization (and the below -DUCHAR_TYPE=char16_t in the mason_cflags) was done by @ChrisLoer for MBGL and may not benefit your usecase, so I would consider removing it unless you specific think it is helpful for your usecase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, mbgl-core is no longer using mason for ICU, so totally fine for the mason build to stop using the U_CHAR_TYPE customization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, we looked at vendoring as well (which i think we want to do long term) but decided against biting the bullet of figuring out how to vendor the data into the lib (so we can ship without having to bundle the data separately)

Copy link
Contributor

@springmeyer springmeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinkreiser this is fantastic work. Awesome job getting the data library re-packing working. However I'm requesting changes here since I think the MASON_SLUG_SUFFIX should:

  • Land in a different PR if we want to support this
  • For this PR I think it would be simpler to just download the http://download.icu-project.org/files/icu4c/63.1/icu4c-63_1-data.zip manually with curl (rather than use mason_download and running into the cache issue)

So, how about that? You revert the MASON_SLUG_SUFFIX work and just download the data.zip directly? That would avoid us needing to worry about potential breakages with other packages and would isolate this particular nuance (of needing to download a second package) to just the ICU package.

@kevinkreiser
Copy link
Contributor Author

kevinkreiser commented Nov 6, 2018

@springmeyer your comments should now be addressed. can i get another look?

Copy link
Contributor

@springmeyer springmeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I think this is ready for ./mason trigger which will start giving you feedback about which platforms are working and which are not.

@kevinkreiser kevinkreiser merged commit e8fc6bf into master Nov 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants