TuckBytesInCode

Experimenting with ways to stuff binary data into source code - Ended up creating a general purpose text-encoded base conversion program

math

There seems be some unerlying math for any base converison formula.

The next table shows that bytes in and characters out can be used to calculate the base using the formula: ceiling(2 ^ (8 * bytes in / chars out)) Ceiling is used since we need a whole number of characters in a base

base name	bytes-in	chars-out	calculated base
base 64	3	4	64
base 85	4	5	85
base 91	13	16	91
base 14938	26	15	14938

This next table shows adjusting the chars-out to fit a multiple of 8. i'm choosing 8 so that things fit nicely within bytes

base	co-LCM	mult	bi * mult	2 ^ bi * mult	log2(base)
64	8	2	6	64	6
85	40	8	32	4294967296	6.40939093
91	16	8	13	8192	6.50779464
14938	120	1	208	4.113761E+062	13.8666993

co-LCM: least common multiple between chars-out and 8
mult: co-LCM / co - what we would need to multiply bytes-in to preserve the ratio
bi * mult: bytes-in * multiplier
2 ^ bi * mult: number of characters needed to represent the adjusted bytes-in
log2(base): number of bits needed to fit the number base

What's notable about this table is that base 64 fits exacly - so we could read 6 bits and directly map to the output character map - no base conversion needed.

ratios

What if we wanted to determine the bytes-in to chars-out ratio for any base ? We can use the already esablished formula base = 2 ^ (8 * bytes in / chars out) replacing the bytes in / chars out with just r we get base = 2 ^ (8 * r) solving for r we get r = log(base) / (8 * log(2))

Here's the ratios for the bases being used so far compared to the bytes-in / chars-out ratio

base	log(base)/8*log2	ratio	decimal	base no ceiling
64	0.75	3 / 4	0.75	64
85	0.801173867017213	4 / 5	0.8	84.4485062894653
91	0.813474330024837	13 / 16	0.8125	90.5096679918781
14938	1.733337422860250	26 / 15	1.73333333333333	14937.6612525378

You can see that the fractions that were picked produce decimals that are just slightly below the target (r) ratio. Also notice that when we don't ceiling the ratio to base calculation, that the base number is close to the whole number base.

The difficulty with finding a bytes-in / chars-out fraction starting from just the ratio is that it's somewhat difficult to find a fraction that fits - we want a ratio that as close as possible without going over but that still uses small-ish numbers.

There are several algorithms for finding fractions that fit our decimal. I've included some links with more information. One method I've been using with excel is a brute force search which is described below.

links related to converting decimal to fraction

Algorithm for simplifying decimal to fractions
Converting decimal to fraction c++
Decimal to fraction conversion

brute force fraction using a spread sheet

These are the formulas use in the spread sheet. formulas #3-6 are meant to be copied to each row in the spread sheet. Most spread sheets support coping entries by highlighting the cells on row 4 and 'draggin down' using the small box in the bottom-right corner.

#	name	formula
#1	ratio from base	=LOG(A2)/(8*LOG(2))
#2	min matcher	=MATCH(MIN(D4:D33),D4:D33,0)+3
#3	calc numerator	=FLOOR(B4*$B$2)
#4	ratio	=A4/B4
#5	difference	=ROUND(1000*ABS(C4-$B$2),2)
#6	closest base	=CEILING(POWER(2,8*A4/B4))

here's an example of the spread sheet using base 91 (in A2). The best row turns out to be 19 with the fraction 13 / 16 which matches the optimum values from before.

	A	B	C	D	E
1	base	base ratio #1		minimum #2
2	91	0.81347433002		19
3	numerator #3	denominator	ratio #4	difference #5	closest base #6
4	0	1	0	813.47	1
5	1	2	0.5	313.47	16
6	2	3	0.66666666666	146.81	41
7	3	4	0.75	63.47	64
8	4	5	0.8	13.47	85
9	4	6	0.66666666666	146.81	41
10	5	7	0.71428571428	99.19	53
11	6	8	0.75	63.47	64
12	7	9	0.77777777777	35.7	75
13	8	10	0.8	13.47	85
14	8	11	0.72727272727	86.2	57
15	9	12	0.75	63.47	64
16	10	13	0.76923076923	44.24	72
17	11	14	0.78571428571	27.76	79
18	12	15	0.8	13.47	85
19	13	16	0.8125	0.97	91
20	13	17	0.76470588235	48.77	70
21	14	18	0.77777777777	35.7	75
22	15	19	0.78947368421	24	80
23	16	20	0.8	13.47	85

TuckBytes.ChangeBase

In my first attempt at base conversion I started to write encode and decode functions, but I found that most of the code was the similar. On second try, I was able to combine both into a single function. TuckBytes.ChangeBase is a binary-to-text style converter and is not an intended for general-purpose numeric base conversion. Currently it really only works well if one of the sides is base256 (binary). It's possible that there might be a way to go directly say from base64 to base85, but I have not determined if that's possible without first going through base256.

TODO

base91 doesn't seem to encode the same way as base64 or base85. Need to investigate further.
add support for variants (see RFC-4648)
take a look at https://en.wikipedia.org/wiki/Binary-to-text_encoding
take a look at https://www.edn.com/design/systems-design/4460458/2/Convert-binary-number-to-any-base
maybe implement normal number base conversions
- for integers only ? might be difficult to parse fractional non-base10 inputs
maybe add a way to input a custom base
- provide an alphabet (ascii or utf8)
- determine in/out based on calculation
  - denominator (loop from 1 to 40 (or more))
  - numerator = FLOOR(den * base)
  - calculate closest base =CEILING(POWER(2,8 * num / den))
  - search until closest base matches base - num/dev is the in/out

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ReadMe.md

ReadMe.md

TuckBytesInCode

math

ratios

links related to converting decimal to fraction

brute force fraction using a spread sheet

TuckBytes.ChangeBase

TODO

Files

ReadMe.md

Latest commit

History

ReadMe.md

File metadata and controls

TuckBytesInCode

math

ratios

links related to converting decimal to fraction

brute force fraction using a spread sheet

TuckBytes.ChangeBase

TODO