Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Where do data struct definitions live? #88

Closed
sffc opened this issue May 14, 2020 · 8 comments · Fixed by #61 or #145
Closed

Where do data struct definitions live? #88

sffc opened this issue May 14, 2020 · 8 comments · Fixed by #61 or #145
Assignees
Labels
C-data-infra Component: provider, datagen, fallback, adapters T-docs-tests Type: Code change outside core library
Milestone

Comments

@sffc
Copy link
Member

sffc commented May 14, 2020

My current WIP for the data provider (#61) passes around abstract hunks of data (Any in Rust), which can be cast to specific structs using dynamic type checking at runtime.

let request = datap::Request {
    locale: "root".to_string(),
    category: datap::Category::Decimal,
    key: datap::Key::Decimal(Key::SymbolsV1),
    payload: None,
};
let response = my_data_provider.load(request).unwrap();
let decimal_data = response.borrow_payload::<&SymbolsV1>().unwrap();

My question is: where does the struct definition SymbolsV1 live in code? It could live:

  1. In the number crate, near where the struct is being consumed.
    • Pro: Data definitions are close to where they are used. No need to touch the common crate when adding or updating a struct.
    • Con: Data provider implementations (e.g., a JSON or Bincode data provider) need to depend on dozens of crates in order to pull in all the struct definitions they need.
  2. In a common ICU4X crate, alongside other struct definitions.
    • Pro: Everything the data provider needs is in one place. We can release canonical "macro" structs that are a collection of several smaller ones, useful for JSON serialization.
    • Con: Not as extensible to people who want to build their own structs on top of the ICU4X data pipeline machinery.

I'm doing Option 2 in #61. I noticed Elango is doing Option 1 in #86. Because of the extensibility argument, I have a slight preference for Option 1, but I fear it could get unwieldy for data provider implementations.

@sffc sffc added T-docs-tests Type: Code change outside core library C-data-infra Component: provider, datagen, fallback, adapters labels May 14, 2020
@sffc
Copy link
Member Author

sffc commented May 14, 2020

@echeran @zbraniecki

@echeran
Copy link
Contributor

echeran commented May 14, 2020

I'm actually in favor of Option 2 even if I inadvertently did Option 1. I think it helps keep the separate concerns of data and functions separate. Struct definitions by themselves do not cause any external code dependencies, even if they may depend on each other.

Also, I've seen precedent in monorepos (which are complex undertakings) to keep struct/schema/IDL definitions apart from everything else, which acts as a source of truth for understanding data found in the wild. I think that strategy would work in the small like it does in the big.

@sffc sffc added the discuss Discuss at a future ICU4X-SC meeting label May 28, 2020
@sffc sffc self-assigned this May 28, 2020
@sffc
Copy link
Member Author

sffc commented Jun 4, 2020

Consensus: start with Option 2.

Open question: how do we structure our common crate(s)?

  1. A crate called icu-common, which includes the data struct definitions as well as other traits like DataProvider and @filmil's 402-compatibility traits
  2. Separate crates for each type of common struct/trait definition

Tentative decision: make separate crates. It allows versioning of each type of common struct/trait to be independent, and prevents us from getting a giant kitchen sink.

@Manishearth Does this sound good to you?

@sffc sffc removed the discuss Discuss at a future ICU4X-SC meeting label Jun 4, 2020
@Manishearth
Copy link
Member

I'm not fond of crate explosions so I'd slightly prefer the first option.

If we are to split perhaps we can group things? Have a common one, plus specific crates that contain a lot of related datatypes. I'm not sure.

Maybe worth starting off with common and splitting if we see it getting big?

@sffc
Copy link
Member Author

sffc commented Jun 4, 2020

Yeah, the idea was to create crates around themes.

  • Data struct definitions (may end up containing dozens to hundreds of structs)
  • ECMA-402 traits
  • Other core traits like DataProvider

Does that sound like the right split?

@Manishearth
Copy link
Member

That seems fine! I don't want to have a bajillion tiny crates is all

@echeran
Copy link
Contributor

echeran commented Jun 10, 2020

Hmm, I was interpreting our decision here as putting all structs into one component (internal crate), similar to consolidating all protobuf schemas/definitions in one place, and having executable code / impls in a separate component. It seemed fair to allow that struct component to be used everywhere since it would not drag in many deps and become large because it didn't have executable code.

But I don't think it's possible to separate struct definitions from their impls -- rust-lang/rfcs#493. Am I understanding this correctly? If so, what do you think we should do?

@sffc
Copy link
Member Author

sffc commented Jun 10, 2020

Now that #61 is checked in, here's how you can add a new data struct:

  1. Decide if you want to add a new category, or piggy-back on an existing category. https://github.com/unicode-org/icu4x/blob/master/components/data-provider/src/lib.rs#L22
  2. If adding a new category, add an entry in the above enum (Category) and create a file with the same name, module-case, in the same directory, such as decimal.rs.
  3. Add a struct to your module file, similar to other structs in that file, such as SymbolsV1.

The structs do not require a heavy impl; mostly just extending the Serde traits, etc.

@sffc sffc added this to the 2020 Q2 milestone Jun 17, 2020
@sffc sffc linked a pull request Jun 23, 2020 that will close this issue
@sffc sffc closed this as completed in #145 Jun 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-data-infra Component: provider, datagen, fallback, adapters T-docs-tests Type: Code change outside core library
Projects
None yet
3 participants