
Binary implementation in readme #11

Closed
alizain opened this issue Dec 8, 2016 · 11 comments


alizain commented Dec 8, 2016

Hey everyone,

I've added a column in the readme for binary implementations in your libraries. If you have implemented it, please let me know here or submit a PR so I can add the ✓!

@SuperPaintman @savonarola @merongivian @imdario @Lewiscowles1986 @ararslan @RobThree @fvilers @mdipierro @rafaelsales


RobThree commented Dec 8, 2016

What do you define as a "binary implementation"? I assume you mean not only the "string representation" but also (access to) the "(raw) 128 bits"?

In that case: you can add a ✓ for me 😉 (amongst others I have implemented a ToByteArray(), and you can construct a ULID from a byte array).
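The round trip RobThree describes (serialize to a byte array, reconstruct from one) can be sketched in a few lines. This is a hypothetical Python illustration, not code from his C# library; the function names are made up for this sketch.

```python
# Hypothetical sketch of the byte-array round trip described above,
# in Python rather than C#; names are illustrative, not from any real library.

def ulid_to_bytes(ulid_int: int) -> bytes:
    """Serialize a 128-bit ULID value to 16 big-endian bytes."""
    return ulid_int.to_bytes(16, byteorder="big")

def ulid_from_bytes(raw: bytes) -> int:
    """Reconstruct the 128-bit ULID value from its 16 raw bytes."""
    if len(raw) != 16:
        raise ValueError("a ULID is exactly 16 bytes")
    return int.from_bytes(raw, byteorder="big")

# Round trip: a 48-bit timestamp plus 80 bits of randomness survives unchanged.
ulid = (1481155200000 << 80) | 0x1234_5678_9ABC_DEF0_1234
assert ulid_from_bytes(ulid_to_bytes(ulid)) == ulid
```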


ararslan commented Dec 8, 2016

What do you define as a "binary implementation"?

Presumably an implementation of what's described in https://github.com/alizain/ulid#binary-layout-and-byte-order

@Lewiscowles1986

I actually will try over Christmas to set this up; it's very clear, but I'm in a bit of an over-subscribed period at the moment. I also need to ensure that my library is compatible with yours, as I'm a tad unsure about the sources of randomness...


alizain commented Dec 8, 2016

I actually will try over Christmas to set this up; it's very clear, but I'm in a bit of an over-subscribed period at the moment

Don't worry, I only asked because I wanted to make sure I mention it on the readme. I've been pretty busy as well, and have been unable to give much time to open source work.

I'm a tad unsure about the sources of randomness

What are you using today? What do you want to use in the future?

@Lewiscowles1986

Pretty sure if the string is the same, then the binary representation is the same in most major languages... Also why not just make it string compatible? Then nearly everything gets a tick


alizain commented Dec 9, 2016

Hmm, that's not how I understand it. Python, for example, stores strings by default as UTF-8 Unicode, so it uses at least 1 byte per character; for the 26 characters, that adds up to 208 bits, more than the 128 specified. Actually, the same math applies to any language that stores its strings as ASCII, since we're not using any special characters.
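The counting argument above is just one byte per base32 character; a tiny Python sketch restates it:

```python
# One byte per base32 character: why the text form is larger than 128 bits.
chars = 26          # length of a canonical ULID string
bits_per_char = 8   # ASCII/UTF-8 stores each of these characters in one byte
print(chars * bits_per_char)  # → 208, vs. the 128 bits of actual information
```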

Also why not just make it string compatible?

I don't understand what you mean

@Lewiscowles1986

Also why not just make it string compatible?

I don't understand what you mean

My bad, I read https://github.com/alizain/ulid/blob/master/README.md#specification and assumed that by "char" you were referring to a data type which is (on all machines I've ever worked on) at least 8 bits wide... I see what you've done there; what probably led to the confusion is that I thought you were talking about actual char strings (in C terms). It makes things a little confusing though...

Could you explain to me how JS is outputting the chars in a sub-8-bit format (otherwise it's 208 bits for JS too)? It's definitely using string concatenation, which should lead to 208-bit output even from your reference implementation.

Right now the .NET, PHP and Java ports seem to be literally outputting the Base32-mapped representation as strings, using the chars native to their language. I'm pretty sure any conversion to an alternative encoding should be done outside the library, but I'm more worried that the math has gone awry somewhere.

At the level we are talking about, interoperability would need to cover more than just ULID strings. If a ULID is stored in a DB as a string, the specific encoding conversion would be handled by the client, so again I ask: can we not just use strings instead of "binary compatibility"?


alizain commented Jan 30, 2017

Sorry for the overdue reply.

@RobThree ✓ has been added!

@Lewiscowles1986

Could you explain to me how JS is outputting the chars in a sub-8-bit format (otherwise it's 208 bits for JS too)?

It's using 208 bits for the string representation, because the JS implementation isn't working directly with bytes.

# String representation
128 bits of information -> encoded as 26 characters with base32 -> stored as UTF-8/ASCII text on disk, which takes up 208 bits

# Binary representation
128 bits of information -> stored as 128 bits on disk
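The two representations can be sketched concretely, assuming the spec's Crockford base32 alphabet. The `decode_ulid` helper below is illustrative, not taken from any of the listed implementations:

```python
# Crockford base32 alphabet from the ULID spec (no I, L, O, U).
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def decode_ulid(s: str) -> bytes:
    """26-character string representation -> 16-byte (128-bit) binary form."""
    value = 0
    for ch in s:
        value = value * 32 + ALPHABET.index(ch)
    return value.to_bytes(16, byteorder="big")

s = "01ARZ3NDEKTSV4RRFFQ69G5FAV"    # a well-known example ULID string
raw = decode_ulid(s)
print(len(s) * 8, len(raw) * 8)     # 208 bits as text vs. 128 bits as bytes
```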

@Lewiscowles1986

@alizain so what is the solution for systems that don't support byte-level manipulation? Also, what is the solution to the fact that many systems will use different character encodings for their strings? This seems like a nice idea, but something that really doesn't belong in most high-level languages, IMO.


alizain commented Jan 30, 2017

What systems are you working on that don't support byte-level manipulation?

Character encoding for strings is beyond the scope of this project. As you rightly point out, we are not concerned with how strings are stored; the previous example was just an illustration that covers 99.9% of existing systems.


alizain commented Jan 30, 2017

Also, I don't know of any character encoding scheme in use today that is not ASCII-compatible. Do you know of any?
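For the ULID alphabet specifically, ASCII compatibility is easy to check: every character is plain ASCII, so common encodings produce byte-for-byte identical text. A quick Python check, using a well-known example ULID string:

```python
# The ULID alphabet is plain ASCII, so any ASCII-compatible encoding
# (UTF-8, Latin-1, ...) yields the exact same bytes for a ULID string.
s = "01ARZ3NDEKTSV4RRFFQ69G5FAV"
assert s.encode("utf-8") == s.encode("ascii") == s.encode("latin-1")
print(len(s.encode("utf-8")))  # → 26 bytes, i.e. the 208-bit text form
```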

@alizain alizain closed this as completed Jun 10, 2017