diff --git a/infra.bs b/infra.bs index b30d752..d218392 100644 --- a/infra.bs +++ b/infra.bs @@ -3,7 +3,7 @@ Group: WHATWG H1: Infra Shortname: infra Text Macro: TWITTER infrastandard -Text Macro: LATESTRD 2024-01 +Text Macro: LATESTRD 2025-01 Abstract: The Infra Standard aims to define the fundamental concepts upon which standards are built. Translation: ja https://triple-underscore.github.io/infra-ja.html Required IDs: tracking-vector @@ -701,6 +701,7 @@ Standard that should be reported and addressed.
Let |x| be "Aperture Science".
Assert: |x| is "Aperture Science".
+
Let |jsValue| be ! [$OrdinaryObjectCreate$](null). -
For each |mapKey| → |mapValue| of |value|: +
For each |mapKey| → |mapValue| of |value|: -
Assert: |mapKey| is a string. +
Assert: |mapKey| is a string. -
Let |mapValueJSValue| be the result of - converting an Infra value to a JSON-compatible JavaScript value, given |mapValue|. +
Let |mapValueJSValue| be the result of + converting an Infra value to a JSON-compatible JavaScript value, given |mapValue|. -
Perform ! [$CreateDataPropertyOrThrow$](|jsValue|, |mapKey|, |mapValueJSValue|). -
Perform ! [$CreateDataPropertyOrThrow$](|jsValue|, |mapKey|, |mapValueJSValue|). +
Return |jsValue|. - +
Return |jsValue|.
Because it is rarely appropriate to manipulate JavaScript values directly in diff --git a/review-drafts/2025-01.bs b/review-drafts/2025-01.bs new file mode 100644 index 0000000..7be4872 --- /dev/null +++ b/review-drafts/2025-01.bs @@ -0,0 +1,2284 @@ +
+Group: WHATWG +Status: RD +Date: 2025-01-20 +H1: Infra +Shortname: infra +Text Macro: TWITTER infrastandard +Text Macro: LATESTRD 2025-01 +Abstract: The Infra Standard aims to define the fundamental concepts upon which standards are built. +Translation: ja https://triple-underscore.github.io/infra-ja.html +Required IDs: tracking-vector ++ +
+urlPrefix: https://tc39.github.io/ecma262/#; spec: ECMA-262; + type: dfn + text: %JSON.parse%; url: sec-json.parse + text: %JSON.stringify%; url: sec-json.stringify + text: List; url: sec-list-and-record-specification-type + text: The String Type; url: sec-ecmascript-language-types-string-type + text: realm; url: realm + url: sec-ecmascript-language-types-bigint-type + text: is a BigInt + text: is not a BigInt + url: sec-ecmascript-language-types-boolean-type + text: is a Boolean + text: is not a Boolean + url: sec-ecmascript-language-types-number-type + text: is a Number + text: is not a Number + url: sec-ecmascript-language-types-string-type + text: is a String + text: is not a String + url: sec-ecmascript-language-types-symbol-type + text: is a Symbol + text: is not a Symbol + url: sec-object-type + text: is an Object + text: is not an Object + type: method; for: Array; text: sort(); url: sec-array.prototype.sort + type: abstract-op; + text: ArrayCreate; url: sec-arraycreate + text: Call; url: sec-call + text: CreateDataPropertyOrThrow; url: sec-createdatapropertyorthrow + text: Get; url: sec-get-o-p + text: IsArray; url: sec-isarray + text: OrdinaryObjectCreate; url: sec-ordinaryobjectcreate + text: ToLength; url: sec-tolength + text: ToString; url: sec-tostring ++ + + + +
Deduplicate boilerplate in standards. + +
Align standards on conventions, terminology, and data structures. + +
Be a place for concepts used by multiple standards without a good home. + +
Help write clear and readable algorithmic prose by clarifying otherwise ambiguous concepts. +
Suggestions for more goals welcome.
+ + +To make use of this standard in a document titled X, use: + +
X depends on Infra. [[!Infra]] + + +
Additionally, cross-referencing all terminology is strongly encouraged to avoid ambiguity. + + +
All assertions, diagrams, examples, and notes are non-normative, as are all sections explicitly +marked non-normative. Everything else is normative. + +
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", +"RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in +RFC 2119. [[!RFC2119]] + +
These keywords have equivalent meaning when written in lowercase and cannot appear in +non-normative content. + +
This is a willful violation of RFC 8174, motivated by legibility and a desire +to preserve long-standing practice in many non-IETF-published pre-RFC 8174 documents. [[RFC8174]] + +
All of the above is applicable to both this standard and any document that uses this standard. +Documents using this standard are encouraged to limit themselves to "must", "must not", "should", +and "may", and to use these in their lowercase form as that is generally considered to be more +readable. + +
For non-normative content "strongly encouraged", "strongly discouraged", "encouraged", +"discouraged", "can", "cannot", "could", "could not", "might", and "might not" can be used instead. + + +
In general, specifications interact with and rely on a wide variety of other specifications. In +certain circumstances, unfortunately, conflicting needs require a specification to violate the +requirements of other specifications. When this occurs, a document using the Infra Standard should +denote such transgressions as a willful violation, and note the reason for that +violation. + +
The previous section, [[#conformance]], documents a +willful violation of RFC 8174 committed by Infra. + + +
The word "or", in cases where both inclusive "or" and exclusive "or" are possible (e.g., "if +either width or height is zero"), means an inclusive "or" (implying "or both"), unless it is called +out as being exclusive (with "but not both"). + +
A user agent is any software entity that acts on behalf +of a user, for example by retrieving and rendering web content and facilitating end user interaction +with it. In specifications using the Infra Standard, the user agent is generally an instance of the client software +that implements the specification. The client software itself is known as an implementation. +A person can use many different [=user agents=] in their day-to-day life, including by configuring an +[=implementation=] to act as several [=user agents=] at once, for example by using multiple profiles +or the implementation's private browsing mode. + +
If something is said to be implementation-defined, the particulars of what is +said to be implementation-defined are up to the implementation. In the absence of such +language, the reverse holds: implementations have to follow the rules laid out in documents using +this standard. + +
Insert U+000A (LF) code points into +input in an implementation-defined manner such that each resulting line has +no more than width code points. For the purposes of this requirement, lines are +delimited by the start of input, the end of input, and +U+000A (LF). + + +
Some features that are defined in documents using the Infra Standard might trade user convenience +for a measure of user privacy. + +
In general, due to the internet's architecture, a user can be distinguished from another by the +user's IP address. IP addresses do not perfectly match to a user; as a user moves from device to +device, or from network to network, their IP address will change; similarly, NAT routing, proxy +servers, and shared computers enable packets that appear to all come from a single IP address to +actually map to multiple users. Technologies such as onion routing can be used to further anonymize +requests so that requests from a single user at one node on the internet appear to come from many +disparate parts of the network. [[RFC791]] + +
However, the IP address used for a user's requests is not the only mechanism by which a user's +requests could be related to each other. Cookies, for example, are designed specifically to enable +this, and are the basis of most of the web's session features that enable you to log into a site +with which you have an account. More generally, any kind of cache mechanism or shared state, +including but not limited to HSTS, the HTTP cache, grouping of connections, and storage APIs, can and +ought to be expected to be abused. [[COOKIES]] [[RFC6797]] [[STORAGE]] + +
There are other mechanisms that are more subtle. Certain characteristics of a user's system can +be used to distinguish groups of users from each other. By collecting enough such information, an +individual user's browser's "digital fingerprint" can be computed, which can be better than an IP +address in ascertaining which requests are from the same user. + +
Grouping requests in this manner, especially across multiple sites, can be used for malevolent +purposes, e.g., governments combining information such as the person's home address (determined from +the addresses they use when getting driving directions on one site) with their apparent political +affiliations (determined by examining the forum sites that they participate in) to determine whether +the person should be prevented from voting in an election. + +
Since the malevolent purposes can be remarkably evil, user agent implementors and specification +authors are strongly encouraged to minimize leaking information that could be used to fingerprint or +track a user. + +
Unfortunately, as the first paragraph in this section implies, sometimes there is great benefit +to be derived from exposing APIs that can also be abused for fingerprinting and tracking purposes, +so it's not as easy as blocking all possible leaks. For instance, the ability to log into a site to +post under a specific identity requires that the user's requests be identifiable as all being from +the same user, more or less by definition. More subtly, though, information such as how wide text +is, which is necessary for many effects that involve drawing text onto a canvas (e.g., any effect +that involves drawing a border around the text) also leaks information that can be used to group a +user's requests. (In this case, by potentially exposing, via a brute force search, which fonts a +user has installed, information which can vary considerably from user to user.) + +
Features that are defined in documents using the Infra Standard that can be used +as a tracking vector are marked as this paragraph is. + +
Other features in the platform can be used for the same purpose, including, but not limited to: + +
Algorithms, and requirements phrased in the imperative as part of algorithms (such as "strip any +leading spaces" or "return false") are to be interpreted with the meaning of the keyword (e.g., +"must") used in introducing the algorithm or step. If no such keyword is used, must is implied. + +
For example, were the spec to say:
+ +To eat an orange, the user must: + +
it would be equivalent to the following:
+ +To eat an orange: + +
Here the key word is "must".
+ +Modifying the above example, if the algorithm was introduced only with "To eat + an orange:", it would still have the same meaning, as "must" is implied. +
Conformance requirements phrased as algorithms or specific steps may be implemented in any +manner, so long as the end result is equivalent. (In particular, the algorithms are intended to be +easy to follow, and not intended to be performant.) + +
Performance is tricky to get correct as it is influenced by user perception, computer +architectures, and different types of input that can change over time in how common they are. For +instance, a JavaScript engine likely has many different code paths for what is standardized as a +single algorithm, in order to optimize for speed or memory consumption. Standardizing all those code +paths would be an insurmountable task and not productive as they would not stand the test of time +as well as the single algorithm would. Therefore performance is best left as a field to compete +over. + + +
A document using the Infra Standard generally should not enforce specific limits on algorithm +inputs with regard to their size, resource usage, or equivalent. This allows for competition among +user agents and avoids constraining the potential computing needs of the future. + +
Nevertheless, user agents may impose implementation-defined limits on +otherwise unconstrained inputs, e.g., to prevent denial of service attacks, to guard against running +out of memory, or to work around platform-specific limitations. + +
Global resource limits can be used as side channels through a variant on a resource +exhaustion attack, whereby the attacker can observe whether a victim application reaches the global +limit. Limits could also be used to fingerprint the user agent, but only if they make the user agent +more unique in some manner, e.g., if they are specific to the underlying hardware. + +
An API that allows creating an in-memory bitmap might
+be specified to allow any dimensions, or any dimensions up to some large limit like JavaScript's
+Number.MAX_SAFE_INTEGER. However, implementations can choose to impose some
+implementation-defined (and thus not specified) limit on the dimensions, instead of
+attempting to allocate huge amounts of memory.
+
+
A programming language might not have a +maximum call stack size specified. However, implementations could choose to impose one for practical +reasons. + +
As code can end up depending on a particular limit, it can be useful to define a limit for +interoperability. Sometimes, embracing that is not problematic for the future, and can make the code +run in more user agents. + +
It can also be useful to constrain an implementation-defined limit with a lower limit, +i.e., to ensure that all implementations can handle inputs of a given minimum size. + + +
Algorithm names are usually verb phrases, but sometimes are given names that emphasize their +standalone existence, so that standards and readers can refer to the algorithm more idiomatically. + +
Some algorithm names in the latter category +include "attribute change steps", "internal module script graph fetching procedure", and "overload +resolution algorithm". + +
Declare algorithms by stating their name, parameters, and return type, in the following form: + +
To [algorithm name], given a [type1] +[parameter1], a [type2] [parameter2], …, perform the +following steps. They return a [return type].
+ +(For non-verb phrase algorithm names, use "To perform the +[algorithm name]…". See also [[#algorithm-params]] for more complicated +parameter-declaration forms.) + +
To parse an awesome format given +a [=byte sequence=] bytes, perform the following steps. They return a +[=string=] or null. + +
Algorithms which do not return a value use a shorter form. This same shorter form can be used even +for algorithms that do return a value if the return type is relatively easy to infer from the +algorithm steps: + +
To [algorithm name], given a [type1] +[parameter1], a [type2] [parameter2], …:
+ +To +parse an awesome format given a [=byte sequence=] bytes: + +
Very short algorithms can be declared and specified using a single sentence: + +
To parse an awesome format +given a [=byte sequence=] bytes, return the result of +ASCII uppercasing the isomorphic decoding +of bytes. + +
Types should be included in algorithm declarations, but may be omitted if the parameter name is +clear enough, or if they are otherwise clear from context. (For example, because the algorithm is +a simple wrapper around another one.) + +
To
+load a classic script given url, return the result of performing the
+internal script-loading algorithm given url and "classic".
+
+
+
Algorithm parameters are usually listed sequentially, in the fashion described in +[[#algorithm-declaration]]. However, there are some more complicated cases.
+ +Algorithm parameters can be optional, in which case the algorithm declaration must list them as +such, and list them after any non-optional parameters. They can either be given a default value, or +the algorithm body can check whether or not the argument was given. Concretely, use the following +forms: + +
… an optional [type] [parameter] … + +
… an optional [type] [parameter] (default [default +value]) … + +
Optional boolean parameters must have a default value specified, and that default must be +false. + +
To navigate to a + resource resource, with an optional string navigationType and + an optional boolean exceptionsEnabled (default false): + +
To call algorithms with such optional positional parameters, the optional argument values can be +omitted, but only the trailing ones. + +
Call sites to the previous example's algorithm would look like one of: + +
form submission
".
+
+ form submission
" and true.
+ But, there would be no way to supply a non-default value for the third + (exceptionsEnabled) argument, while leaving the second + (navigationType) argument as not-given. Additionally, the last of these calls is + fairly unclear for readers, as the fact that "true" means "exceptions enabled" requires going back + to the algorithm's declaration and counting parameters. Read on for how to fix these issues! +
Optional named parameters, instead of positional ones, can be used to increase clarity and +flexibility at the call site. Such parameters are marked up as both variables and definitions, and +linked to from their call sites. + +
To navigate to a + resource resource, with an optional string + navigationType and an + optional boolean + exceptionsEnabled + (default false): + +
Call sites would then look like one of: + +
form-submission
".
+
+ form-submission
" and
+ exceptionsEnabled set to
+ true.
+ Note how within the algorithm steps, the argument value is not linked to the parameter +declaration; it remains just a variable reference. Linking to the parameter declaration is done only +at the call sites. + +
Non-optional named parameters may also be used, using the same convention of marking them up as +both variables and definitions, and linking to them from call sites. This can improve clarity at the +call sites. + +
Boolean +parameters are a case where naming the parameter can be significantly clearer than leaving it as +positional, regardless of optionality. See +The Pitfalls of Boolean +Trap for discussion of this in the context of programming languages. + +
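As a loose programming-language analogy (not one of Infra's own conventions), Python's keyword-only arguments show the same clarity gain at the call site. The `navigate` function, its parameter names, and the URL below are hypothetical; the function only records its arguments.

```python
from typing import Optional


def navigate(resource: str, *, navigation_type: Optional[str] = None,
             exceptions_enabled: bool = False) -> dict:
    """Hypothetical stand-in for a "navigate" algorithm; it only echoes its arguments."""
    return {
        "resource": resource,
        "navigation_type": navigation_type,
        "exceptions_enabled": exceptions_enabled,
    }


# The boolean is named at the call site, so a reader does not have to count
# parameters, and the earlier optional argument can be left as not-given.
call = navigate("https://example.com/", exceptions_enabled=True)
```

Because both optional parameters are keyword-only, a bare positional `True` is rejected by Python, which mirrors the motivation for named parameters described above.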
Another complementary technique for improving clarity is to package up related values into a +struct, and pass that struct as a parameter. This is especially applicable when the same set +of related values is used as the input to multiple algorithms. + + +
A variable is declared with "let" and changed with "set". + +
Let |list| be a new list.
+ +Let |value| be null. + +
If |input| is a string, then set |value| to |input|. + +
Otherwise, set |value| to |input|, UTF-8 decoded. + +
Let activationTarget be +target if isActivationEvent is true and target has activation +behavior; otherwise null. + +
Variables must not be used before they are declared. Variables are +block scoped. +Variables must not be declared more than once per algorithm. + +
A multiple assignment syntax can be used to assign multiple variables to the tuple's +items, by surrounding the variable names with parentheses and separating each +variable name by a comma. The number of variables assigned cannot differ from the number of +items in the tuple. + +
Let |statusInstance| be the status (200, `OK`).
+
Let (|status|, |statusMessage|) be |statusInstance|. +
Assigning |status| and |statusMessage| could be written as two separate steps that use an index + or name to access the tuple's items. +
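For readers who think in code, the multiple assignment above resembles tuple unpacking. The following is a minimal Python sketch; the `Status` type and its field names `code` and `message` are assumptions made for illustration and are not defined by Infra.

```python
from typing import NamedTuple


class Status(NamedTuple):
    """Hypothetical tuple type with two named items."""
    code: int
    message: bytes


status_instance = Status(200, b"OK")

# Multiple assignment: the number of names has to match the number of items.
status, status_message = status_instance

# The equivalent two-step form, accessing items by index or by name.
status_by_index = status_instance[0]
message_by_name = status_instance.message
```

In Python, unpacking into the wrong number of names raises `ValueError`, which is comparable to the requirement that the number of variables cannot differ from the number of items.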
The control flow of algorithms is such that a requirement to "return" or "throw" terminates the +algorithm the statement was in. "Return" will hand the given value, if any, to its caller. "Throw" +will make the caller automatically rethrow the given value, if any, and thereby terminate the +caller's algorithm. Using prose, the caller can "catch" the exception and perform +another action. + + +
Sometimes it is useful to stop performing a series of steps once a condition becomes true. + +
To do this, state that a given series of steps will abort when a specific +condition is reached. This indicates that the specified steps must be evaluated, not +as-written, but by additionally inserting a step before each of them that evaluates +condition, and if condition evaluates to true, skips the remaining steps. + +
In such algorithms, the subsequent step can be annotated to run if aborted, in +which case it must run if any of the preceding steps were skipped due to the condition +of the preceding abort when step evaluated to true. + +
The following algorithm + +
Let |result| be an empty list. + +
Run these steps, but abort when the user clicks the "Cancel" button: + +
+If aborted, append "Didn't finish!" to |result|.
+
is equivalent to the more verbose formulation
+ +Let |result| be an empty list. + +
If the user has not clicked the "Cancel" button, then: + +
Compute the first million digits of π, and append the result + to |result|. + +
If the user has not clicked the "Cancel" button, then: + +
+If the user clicked the "Cancel" button, then append
+ "Didn't finish!" to |result|.
+
Whenever this construct is used, implementations are allowed to evaluate +condition during the specified steps rather than before and after each step, as long as +the end result is indistinguishable. For instance, as long as |result| in the above example is not +mutated during a compute operation, the user agent could stop the computation. + + +
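The abort when construct can be sketched in Python as a loop that evaluates the condition before each step. This is only an illustration under assumptions: `run_steps_abort_when` is a hypothetical helper, and the "Cancel" click is simulated by a flag that the first step sets.

```python
from typing import Callable, List, Optional


def run_steps_abort_when(steps: List[Callable[[], None]],
                         condition: Callable[[], bool],
                         if_aborted: Optional[Callable[[], None]] = None) -> bool:
    """Run steps in order, checking condition before each one.

    Returns True if every step ran, False if the remaining steps were skipped.
    """
    for step in steps:
        if condition():
            # Remaining steps are skipped; run the "if aborted" step, if any.
            if if_aborted is not None:
                if_aborted()
            return False
        step()
    return True


result: List[str] = []
cancel_clicked = {"value": False}


def compute_first_million_digits() -> None:
    result.append("first million digits of pi")
    cancel_clicked["value"] = True  # simulate the user clicking "Cancel"


def compute_next_million_digits() -> None:
    result.append("next million digits of pi")


completed = run_steps_abort_when(
    [compute_first_million_digits, compute_next_million_digits],
    condition=lambda: cancel_clicked["value"],
    if_aborted=lambda: result.append("Didn't finish!"),
)
```

An implementation is free to check the condition more often, for example in the middle of a step, as long as the observable outcome is the same.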
Algorithms with conditional statements should use the keywords "if", "then", and "otherwise". + +
Let |value| be null. + +
If |input| is a string, then set |value| to |input|. + +
Return |value|. +
Let |value| be null. + +
If |input| is a string, then set |value| to |input|. + +
Otherwise, set |value| to failure. + +
Return |value|. +
There's a variety of ways to repeat a set of steps until a condition is reached. + +
The Infra Standard is not (yet) exhaustive on this; please file an issue if you need +something. + +
An instruction to repeat a set of steps as long as a condition is met. + +
While |condition| is "met":
+
+
… +
An iteration's flow can be controlled via requirements to +continue or break. +Continue will skip over any remaining steps in an iteration, proceeding to the +next item. If no further items remain, the iteration will stop. Break will skip +over any remaining steps in an iteration, and skip over any remaining items as well, stopping the +iteration. + +
Let |example| be the list « 1, 2, 3, 4 ». The following prose would perform |operation| + upon 1, then 2, then 3, then 4: + +
For each |item| of |example|: +
The following prose would perform |operation| upon 1, then 2, then 4. 3 would be skipped. + +
For each |item| of |example|: +
The following prose would perform |operation| upon 1, then 2. 3 and 4 would be skipped. + +
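The three iteration outcomes described above map directly onto Python's `continue` and `break`. In this sketch the "operation" simply records each item, and the test "item is 3" is an assumption inferred from the stated results.

```python
from typing import List

example = [1, 2, 3, 4]


def perform_on_all(items: List[int]) -> List[int]:
    performed = []
    for item in items:
        performed.append(item)  # stand-in for "perform operation on item"
    return performed


def perform_with_continue(items: List[int]) -> List[int]:
    performed = []
    for item in items:
        if item == 3:
            continue  # skip the rest of this iteration, move to the next item
        performed.append(item)
    return performed


def perform_with_break(items: List[int]) -> List[int]:
    performed = []
    for item in items:
        if item == 3:
            break  # skip the rest of this iteration and all remaining items
        performed.append(item)
    return performed
```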
+To improve readability, it can sometimes help to add assertions to algorithms, stating +invariants. To do this, write "Assert:", followed by a statement that must be +true. If the statement ends up being false that indicates an issue with the document using the Infra +Standard that should be reported and addressed. + +
Since the statement can only ever be true, it has no implications for implementations. + +
Let |x| be "Aperture Science".
+
Assert: |x| is "Aperture Science".
+
The value null is used to indicate the lack of a value. It can be used interchangeably with the +JavaScript null value. [[!ECMA-262]] + +
Let element be null. + +
If input is the empty string, then return null. + + +
A boolean is either true or false. + +
Let elementSeen be false. + + +
Numbers are complicated; please see +issue #87. In due course we hope to offer +more guidance here around types and mathematical operations. Help appreciated! + +
An 8-bit unsigned integer is an integer in the range 0 to 255 (0 to +2<sup>8</sup> − 1), inclusive. + +
A 16-bit unsigned integer is an integer in the range 0 to 65535 (0 to +2<sup>16</sup> − 1), inclusive. + +
A 32-bit unsigned integer is an integer in the range 0 to 4294967295 (0 to +2<sup>32</sup> − 1), inclusive. + +
A 64-bit unsigned integer is an integer in the range 0 to 18446744073709551615 +(0 to 2<sup>64</sup> − 1), inclusive. + +
A 128-bit unsigned integer is an integer in the range 0 to +340282366920938463463374607431768211455 (0 to 2<sup>128</sup> − 1), inclusive. + +
An IPv6 address is a +128-bit unsigned integer. + +
An 8-bit signed integer is an integer in the range −128 to 127 +(−2<sup>7</sup> to 2<sup>7</sup> − 1), inclusive. + +
A 16-bit signed integer is an integer in the range −32768 to 32767 +(−2<sup>15</sup> to 2<sup>15</sup> − 1), inclusive. + +
A 32-bit signed integer is an integer in the range −2147483648 to +2147483647 (−2<sup>31</sup> to 2<sup>31</sup> − 1), inclusive. + +
A 64-bit signed integer is an integer in the range −9223372036854775808 +to 9223372036854775807 (−2<sup>63</sup> to 2<sup>63</sup> − 1), inclusive. + + +
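The ranges above all follow from the bit width. As a quick check, the following Python sketch derives them; the helper names are illustrative and not part of Infra.

```python
from typing import Tuple


def unsigned_range(bits: int) -> Tuple[int, int]:
    """Inclusive range of an N-bit unsigned integer: 0 to 2**N - 1."""
    return (0, 2 ** bits - 1)


def signed_range(bits: int) -> Tuple[int, int]:
    """Inclusive range of an N-bit signed integer: -2**(N-1) to 2**(N-1) - 1."""
    return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)


def fits_unsigned(value: int, bits: int) -> bool:
    low, high = unsigned_range(bits)
    return low <= value <= high
```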
A byte is a sequence of eight bits and is represented as "0x"
+followed by two ASCII upper hex digits, in the range 0x00 to 0xFF, inclusive. A byte's
+value is its underlying number.
+
+
0x40 is a byte whose value is 64. + +
An ASCII byte is a byte in the range 0x00 (NUL) to 0x7F (DEL), +inclusive. As illustrated, an ASCII byte, excluding 0x28 and 0x29, may be followed by the +representation outlined in the Standard Code +section of ASCII format for Network Interchange, between parentheses. [[!RFC20]] + +
0x28 may be followed by "(left parenthesis)" and 0x29 by "(right parenthesis)". + +
0x49 (I) when UTF-8 decoded becomes the +code point U+0049 (I). + + +
A byte sequence is a sequence of bytes, represented as a space-separated +sequence of bytes. Byte sequences with bytes in the range 0x20 (SP) to 0x7E (~), inclusive, can +alternately be written as a string, but using backticks instead of quotation marks, to avoid +confusion with an actual string. + +
0x48 0x49 can also be represented as `HI`.
+
+
Headers, such as `Content-Type`, are byte sequences.
+
To get a byte sequence out of a string, using UTF-8 encode from +Encoding is encouraged. In rare circumstances isomorphic encode might be needed. +[[ENCODING]] + +
A byte sequence's length is the number of +bytes it contains. + +
To byte-lowercase a byte sequence, increase each byte it +contains, in the range 0x41 (A) to 0x5A (Z), inclusive, by 0x20. + +
To byte-uppercase a byte sequence, decrease each byte it +contains, in the range 0x61 (a) to 0x7A (z), inclusive, by 0x20. + +
A byte sequence A is a byte-case-insensitive match for a +byte sequence B, if the byte-lowercase of A is the +byte-lowercase of B. + +
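The three byte-case operations above can be sketched in Python, modelling a byte sequence as `bytes`. The function names are illustrative. Only bytes in the ASCII letter ranges change; all other bytes, including those above 0x7F, are left alone.

```python
def byte_lowercase(data: bytes) -> bytes:
    """Increase each byte in the range 0x41 (A) to 0x5A (Z) by 0x20."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in data)


def byte_uppercase(data: bytes) -> bytes:
    """Decrease each byte in the range 0x61 (a) to 0x7A (z) by 0x20."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


def is_byte_case_insensitive_match(a: bytes, b: bytes) -> bool:
    """True if the byte-lowercase of a is the byte-lowercase of b."""
    return byte_lowercase(a) == byte_lowercase(b)
```

For example, `Content-Type` and `content-type` are a byte-case-insensitive match under this sketch.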
A byte sequence potentialPrefix is a +prefix of a byte sequence input if the +following steps return true: + +
Let i be 0. + +
While true: + +
If i is greater than or equal to potentialPrefix's + length, then return true. + +
If i is greater than or equal to input's + length, then return false. + +
Let potentialPrefixByte be the ith byte of + potentialPrefix. + +
Let inputByte be the ith byte of input. + +
Return false if potentialPrefixByte is not inputByte. + +
Set i to i + 1. +
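The prefix steps above translate almost line for line into Python. This is a sketch rather than a recommended implementation (Python's `bytes.startswith` gives the same answer); the parameter names follow the prose.

```python
def is_prefix(potential_prefix: bytes, input_bytes: bytes) -> bool:
    """Return True if potential_prefix is a prefix of input_bytes."""
    i = 0
    while True:
        # All bytes of potential_prefix matched.
        if i >= len(potential_prefix):
            return True
        # input_bytes ran out before potential_prefix did.
        if i >= len(input_bytes):
            return False
        if potential_prefix[i] != input_bytes[i]:
            return False
        i += 1
```

Note that the empty byte sequence is a prefix of every byte sequence, and every byte sequence is a prefix of itself.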
"input starts with +potentialPrefix" can be used as a synonym for "potentialPrefix is a +prefix of input". + +
A byte sequence a is byte less than a byte sequence +b if the following steps return true: + +
If b is a prefix of a, then return false. + +
If a is a prefix of b, then return true. + +
Let n be the smallest index such that the nth byte of + a is different from the nth byte of b. (There has to be such an + index, since neither byte sequence is a prefix of the other.) + +
If the nth byte of a is less than the nth byte of + b, then return true. + +
Return false. +
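The byte less than steps can be sketched as follows. To keep the block self-contained it uses Python's `bytes.startswith` for the prefix checks, which is an implementation shortcut and not something Infra prescribes.

```python
def is_byte_less_than(a: bytes, b: bytes) -> bool:
    """Return True if byte sequence a is byte less than byte sequence b."""
    # If b is a prefix of a (this includes a == b), a is not less than b.
    if a.startswith(b):
        return False
    # If a is a proper prefix of b, a is less than b.
    if b.startswith(a):
        return True
    # Neither is a prefix of the other, so a first differing index exists.
    n = next(i for i in range(min(len(a), len(b))) if a[i] != b[i])
    return a[n] < b[n]
```

This ordering agrees with Python's built-in `<` comparison on `bytes` objects, which can serve as a cross-check.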
To isomorphic decode a byte sequence input, return a +string whose code point length is equal to input's +length and whose code points have the same +values as the values of input's bytes, in +the same order. +
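Isomorphic decoding maps each byte to the code point with the same value. A minimal Python sketch, with an illustrative function name:

```python
def isomorphic_decode(data: bytes) -> str:
    """Return a string whose code points have the same values as the bytes."""
    return "".join(chr(byte) for byte in data)
```

In Python this produces the same result as `data.decode("latin-1")`, since that codec also maps byte values 0x00 to 0xFF onto U+0000 to U+00FF.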
A code point is a Unicode code point and is +represented as "U+" followed by four-to-six ASCII upper hex digits, in the range U+0000 to +U+10FFFF, inclusive. A code point's value is its +underlying number. + +
A code point may be followed by its name, by its rendered form between parentheses when it +is not U+0028 or U+0029, or by both. Documents using the Infra Standard are encouraged to follow +code points by their name when they cannot be rendered or are U+0028 or U+0029; otherwise, +follow them by their rendered form between parentheses, for legibility. + +
A code point's name is defined in Unicode and represented in +ASCII uppercase. [[!UNICODE]] + +
The code point rendered as 🤔 is represented as U+1F914. + +
When referring to that code point, we might say "U+1F914 (🤔)", to provide extra context. + Documents are allowed to use "U+1F914 THINKING FACE (🤔)" as well, though this is somewhat verbose. +
Code points that are difficult +to render unambiguously, such as U+000A, can be referred to as "U+000A LF". U+0029 can be referred +to as "U+0029 RIGHT PARENTHESIS", because even though it renders, this avoids unmatched parentheses. + +
Code points are sometimes referred to as characters and in certain contexts are +prefixed with "0x" rather than "U+". + +
A leading surrogate is a code point that is in the range U+D800 to +U+DBFF, inclusive. + +
A trailing surrogate is a code point that is in the range U+DC00 to +U+DFFF, inclusive. + +
A surrogate is a leading surrogate or a trailing surrogate. + +
A scalar value is a code point that is not a surrogate. + +
A noncharacter is a code point that is in the range U+FDD0 to U+FDEF, +inclusive, or U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, +U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, +U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, +U+FFFFF, U+10FFFE, or U+10FFFF. + +
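The code point classes defined above can be expressed as predicates on integer values. In this Python sketch the function names are illustrative, and the noncharacter test relies on the observation that the listed values are U+FDD0 to U+FDEF plus the last two code points of each of the 17 planes.

```python
def is_leading_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDBFF


def is_trailing_surrogate(cp: int) -> bool:
    return 0xDC00 <= cp <= 0xDFFF


def is_surrogate(cp: int) -> bool:
    return is_leading_surrogate(cp) or is_trailing_surrogate(cp)


def is_scalar_value(cp: int) -> bool:
    """A code point (U+0000 to U+10FFFF) that is not a surrogate."""
    return 0 <= cp <= 0x10FFFF and not is_surrogate(cp)


def is_noncharacter(cp: int) -> bool:
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+FFFE and U+FFFF, and the same two positions in every other plane.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFF) in (0xFFFE, 0xFFFF)
```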
An ASCII code point is a code point in the range U+0000 NULL to +U+007F DELETE, inclusive. + +
An ASCII tab or newline is +U+0009 TAB, U+000A LF, or U+000D CR. + +
ASCII whitespace is U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, or U+0020 +SPACE. + +
"Whitespace" is a mass noun. + +
The XML, JSON, and parts of the HTTP specifications exclude U+000C FF in their definition of + whitespace: + +
Prefer using Infra's [=ASCII whitespace=] definition for new features, unless your specification + deals exclusively with XML/JSON/HTTP. +
A C0 control is a code point in the range U+0000 NULL to +U+001F INFORMATION SEPARATOR ONE, inclusive. + +
A C0 control or space is a +C0 control or U+0020 SPACE. + +
A control is a C0 control or a code point in the range +U+007F DELETE to U+009F APPLICATION PROGRAM COMMAND, inclusive. + +
An ASCII digit is a code point in the range U+0030 (0) to U+0039 (9), +inclusive. + +
An ASCII upper hex digit is an ASCII digit or a code point in the +range U+0041 (A) to U+0046 (F), inclusive. + +
An ASCII lower hex digit is an ASCII digit or a code point in the +range U+0061 (a) to U+0066 (f), inclusive. + +
An ASCII hex digit is an ASCII upper hex digit or +ASCII lower hex digit. + +
An ASCII upper alpha is a code point in the range U+0041 (A) to +U+005A (Z), inclusive. + +
An ASCII lower alpha is a code point in the range U+0061 (a) to +U+007A (z), inclusive. + +
An ASCII alpha is an ASCII upper alpha or ASCII lower alpha. + +
An ASCII alphanumeric is an ASCII digit or ASCII alpha. + + +
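The ASCII classes above are narrower than the Unicode-aware helpers many languages ship. The following Python sketch spells them out for single-character strings; the names are illustrative.

```python
# U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, and U+0020 SPACE.
ASCII_WHITESPACE = frozenset("\t\n\x0c\r ")


def is_ascii_whitespace(c: str) -> bool:
    return c in ASCII_WHITESPACE


def is_ascii_digit(c: str) -> bool:
    return "0" <= c <= "9"


def is_ascii_hex_digit(c: str) -> bool:
    return is_ascii_digit(c) or "A" <= c <= "F" or "a" <= c <= "f"


def is_ascii_alpha(c: str) -> bool:
    return "A" <= c <= "Z" or "a" <= c <= "z"


def is_ascii_alphanumeric(c: str) -> bool:
    return is_ascii_digit(c) or is_ascii_alpha(c)
```

Python's own `str.isspace()` and `str.isdigit()` are not equivalent: they accept, for instance, U+000B and non-ASCII digits, so they are not drop-in replacements for these definitions.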
A string is a sequence of +16-bit unsigned integers, also known as code units. A +string is also known as a JavaScript string. +Strings are denoted by double quotes and monospace font. + +
"Hello, world!" is a string.
+
+
This is different from how Unicode defines "code unit". In particular it +refers exclusively to how Unicode defines it for Unicode 16-bit strings. [[UNICODE]] + +
A string can also be interpreted as containing code points, per the conversion +defined in The String Type section of the JavaScript specification. [[!ECMA-262]] + +
This conversion process converts surrogate pairs into their corresponding +scalar value and maps any remaining surrogates to their corresponding code point, +leaving them effectively as-is. + +
A string consisting of the +code units 0xD83D, 0xDCA9, and 0xD800, when interpreted as containing code points, +would consist of the code points U+1F4A9 and U+D800. + +
A string's +length +is the number of code units it contains. + +
A string's +code point length is the number +of code points it contains. + +
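As a non-normative illustration, the following JavaScript sketch contrasts length with code point length for the string used in the example above. The helper name `codePointsOf` is hypothetical. It relies on JavaScript string iteration, which yields a surrogate pair as one code point and a lone surrogate unchanged.

```javascript
// Hypothetical helper: list the code points of a JavaScript string.
// String iteration pairs up surrogates and yields lone surrogates as-is.
function codePointsOf(string) {
  return Array.from(string, (ch) => ch.codePointAt(0));
}

// Code units 0xD83D, 0xDCA9, and 0xD800.
const example = "\uD83D\uDCA9\uD800";
const codePoints = codePointsOf(example);

console.log(example.length);    // length, counted in code units
console.log(codePoints.length); // code point length
console.log(codePoints.map((cp) => "U+" + cp.toString(16).toUpperCase()));
```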
To signify strings with additional restrictions on the code points they can contain +this specification defines ASCII strings, isomorphic strings, and +scalar value strings. Using these improves clarity in specifications. + +
An ASCII string is a string whose code points are all +ASCII code points. + +
An isomorphic string is a string whose code points are all in the +range U+0000 NULL to U+00FF (ÿ), inclusive. + +
A scalar value string is a string whose code points are all +scalar values. + +
A scalar value string is useful for any kind of I/O or other kind of operation +where UTF-8 encode comes into play. + + +
To convert a +string into a scalar value string, replace any surrogates with U+FFFD (�). + +
The replaced surrogates are never part of surrogate pairs, since the process of interpreting the + string as containing code points will have converted surrogate pairs into + scalar values. + +
A scalar value string can always be used as a string implicitly since every + scalar value string is a string. On the other hand, a string can only be + implicitly used as a scalar value string if it is known to not contain surrogates; + otherwise a conversion is to be performed. + +
An implementation likely has to perform explicit conversion, depending on how it actually ends + up representing strings and scalar value strings. It is fairly typical for + implementations to have multiple implementations of strings alone for performance and memory + reasons. +
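The conversion can be sketched in JavaScript as follows. This is an illustration only: the function name `toScalarValueString` is hypothetical, and the loop iterates by code point so that only unpaired surrogates are replaced.

```javascript
// Hypothetical helper: replace every surrogate code point with U+FFFD.
// Iterating with for...of visits code points, so surrogate pairs arrive
// as a single scalar value and are left alone.
function toScalarValueString(string) {
  let result = "";
  for (const ch of string) {
    const cp = ch.codePointAt(0);
    const isSurrogate = cp >= 0xD800 && cp <= 0xDFFF;
    result += isSurrogate ? "\uFFFD" : ch;
  }
  return result;
}
```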
A string a is or is +identical to a string b if it consists of the same sequence of +code units. + +
Except where otherwise stated, all string comparisons use is. + +
This type of string comparison was formerly known as a "case-sensitive" +comparison in HTML. Strings that compare as identical to one another +are not only sensitive to case variation (such as UPPER and lower case), but also to other code +point encoding choices, such as normalization form or the order of combining marks. Two strings that +are visually or even canonically equivalent according to Unicode might still not be +identical to each other. [[HTML]] [[UNICODE]] + +
A string potentialPrefix is a code unit prefix of a +string input if the following steps return true: + +
Let i be 0. + +
While true: + +
If i is greater than or equal to potentialPrefix's + length, then return true. + +
If i is greater than or equal to input's length, + then return false. + +
Let potentialPrefixCodeUnit be the ith code unit of + potentialPrefix. + +
Let inputCodeUnit be the ith code unit of input. + +
Return false if potentialPrefixCodeUnit is not inputCodeUnit. + +
Set i to i + 1. +
When it is clear from context that code units are in play, e.g., because one of the +strings is a literal containing only characters that are in the range U+0020 SPACE to U+007E (~), +"input starts with potentialPrefix" can be used +as a synonym for "potentialPrefix is a code unit prefix of input". + +
With unknown values, it is good to be explicit:
+targetString is a code unit prefix of userInput. But with a
+literal, we can use plainer language: userInput starts with
+"!".
+
+
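The code unit prefix steps translate almost line for line into JavaScript. The sketch below is illustrative; `isCodeUnitPrefix` is a hypothetical name, and `charCodeAt` is used because it reads individual code units.

```javascript
// Hypothetical helper following the code unit prefix steps.
function isCodeUnitPrefix(potentialPrefix, input) {
  let i = 0;
  while (true) {
    if (i >= potentialPrefix.length) return true;
    if (i >= input.length) return false;
    // Compare the ith code unit of each string.
    if (potentialPrefix.charCodeAt(i) !== input.charCodeAt(i)) return false;
    i += 1;
  }
}
```

For JavaScript strings this should agree with `input.startsWith(potentialPrefix)`, which also compares code units.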
A string potentialSuffix is a code unit suffix of a +string input if the following steps return true: + +
Let i be 1. + +
While true: + +
Let potentialSuffixIndex be potentialSuffix's + length − i. + +
Let inputIndex be input's length − + i. + +
If potentialSuffixIndex is less than 0, then return true. + +
If inputIndex is less than 0, then return false. + +
Let potentialSuffixCodeUnit be the potentialSuffixIndexth + code unit of potentialSuffix. + +
Let inputCodeUnit be the inputIndexth code unit of + input. + +
Return false if potentialSuffixCodeUnit is not inputCodeUnit. + +
Set i to i + 1. +
When it is clear from context that code units are in play, e.g., because one of the +strings is a literal containing only characters that are in the range U+0020 SPACE to U+007E (~), +"input ends with potentialSuffix" can be used as +a synonym for "potentialSuffix is a code unit suffix of input". + +
With unknown values, it is good to be explicit:
+targetString is a code unit suffix of domain. But with a
+literal, we can use plainer language: domain ends with
+".".
+
+
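A matching sketch for code unit suffix, again with a hypothetical function name. It walks both strings backwards from their last code unit, as the steps do.

```javascript
// Hypothetical helper following the code unit suffix steps.
function isCodeUnitSuffix(potentialSuffix, input) {
  let i = 1;
  while (true) {
    const potentialSuffixIndex = potentialSuffix.length - i;
    const inputIndex = input.length - i;
    if (potentialSuffixIndex < 0) return true;
    if (inputIndex < 0) return false;
    if (
      potentialSuffix.charCodeAt(potentialSuffixIndex) !==
      input.charCodeAt(inputIndex)
    ) {
      return false;
    }
    i += 1;
  }
}
```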
A string a is code unit less than a string +b if the following steps return true: + +
If b is a code unit prefix of a, then return false. + +
If a is a code unit prefix of b, then return true. + +
Let n be the smallest index such that the nth code unit of + a is different from the nth code unit of b. (There has to be such + an index, since neither string is a prefix of the other.) + +
If the nth code unit of a is less than the nth code unit of + b, then return true. + +
Return false. +
This matches the ordering used by JavaScript's < operator, and its
+{{Array/sort()}} method on an array of strings. This ordering compares the 16-bit code units in each
+string, producing a highly efficient, consistent, and deterministic sort order. The resulting
+ordering will not match any particular alphabet or lexicographic order, particularly for
+code points represented by a surrogate pair. [[!ECMA-262]]
+
+
For example, the code point U+FF5E FULLWIDTH +TILDE (～) is obviously less than the code point U+1F600 (😀), but the tilde is composed of a single +code unit 0xFF5E, while the smiley is composed of two code units 0xD83D and 0xDE00, so the smiley is +[=code unit less than=] the tilde. + +
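The surprising ordering in this example can be reproduced with a short JavaScript sketch. The function name `codeUnitLessThan` is hypothetical, and `startsWith` stands in for the code unit prefix check on the assumption that it compares code units.

```javascript
// Hypothetical helper following the code unit less than steps.
function codeUnitLessThan(a, b) {
  if (a.startsWith(b)) return false; // b is a code unit prefix of a
  if (b.startsWith(a)) return true;  // a is a code unit prefix of b
  // Neither string is a prefix of the other, so a differing index exists.
  let n = 0;
  while (a.charCodeAt(n) === b.charCodeAt(n)) n += 1;
  return a.charCodeAt(n) < b.charCodeAt(n);
}

const fullwidthTilde = "\uFF5E";  // one code unit: 0xFF5E
const smiley = "\u{1F600}";       // two code units: 0xD83D 0xDE00

console.log(codeUnitLessThan(smiley, fullwidthTilde)); // compares 0xD83D with 0xFF5E
```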
The code unit substring from start with length length +within a string string is determined as follows: + +
Assert: start and length are nonnegative.
Assert: start + length is less than or equal to + string's length.
Let result be the empty string.
For each i in the range from + start to start + length, exclusive: append the ith + code unit of string to result.
Return result.
The code unit substring from +start to end within a string string is the code +unit substring from start with length end − start within +string. + +
The code unit substring from +start to the end of a string string is the +code unit substring from start to +string's length within string. + +
The code unit substring from 1 with
+length 3 within "Hello world" is "ell". This can also be expressed as the
+code unit substring from 1 to 4.
+
+
The numbers given to these algorithms are best thought of as positions +between code units, not indices of the code units themselves. The substring returned +is then formed by the code units between these positions. That explains why, for example, the +code unit substring from 0 to 0 within the empty string +is the empty string, even though there is no code unit at index 0 within the empty string. + +
The code point substring within a string string from +start with length length is determined as follows: + +
Assert: start and length are nonnegative.
Assert: start + length is less than or equal to + string's code point length.
Let result be the empty string.
For each i in the range from + start to start + length, exclusive: append the ith + code point of string to result.
Return result.
The code point substring from +start to end within a string string is the +code point substring within string from start with length +end − start. + +
The code point substring +from start to the end of a string string is the +code point substring from start to +string's code point length within string. + +
Generally, code unit substring is used when given developer-supplied positions or + lengths, since that is how string indexing works in JavaScript. See, for example, the methods of + the {{CharacterData}} class. [[DOM]] + +
Otherwise, code point substring is likely to be better. For example, the
+ code point substring from 0 with length 1 within "👽" is "👽",
+ whereas the code unit substring from 0 with length 1 within "👽" is the
+ string containing the single surrogate U+D83D.
+
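The two substring operations can be compared side by side in JavaScript. This is a sketch under stated assumptions: both function names are hypothetical, the asserts are modeled as thrown `RangeError`s, and `Array.from` is used to split a string into code points.

```javascript
// Hypothetical helper: code unit substring from start with length.
function codeUnitSubstring(string, start, length) {
  if (start < 0 || length < 0 || start + length > string.length) {
    throw new RangeError("start and length are out of range");
  }
  let result = "";
  for (let i = start; i < start + length; i += 1) {
    result += string[i]; // string[i] is the ith code unit
  }
  return result;
}

// Hypothetical helper: code point substring from start with length.
function codePointSubstring(string, start, length) {
  const codePoints = Array.from(string);
  if (start < 0 || length < 0 || start + length > codePoints.length) {
    throw new RangeError("start and length are out of range");
  }
  return codePoints.slice(start, start + length).join("");
}
```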
To isomorphic encode an isomorphic string input: return a +byte sequence whose length is equal to input's +code point length and whose bytes have the same values as +the values of input's code points, in the same order. +
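A minimal JavaScript sketch of isomorphic encode follows. The name `isomorphicEncode` is hypothetical, a `Uint8Array` stands in for a byte sequence, and the precondition that the input is an isomorphic string is checked with a thrown `RangeError`, which is an assumption of this sketch rather than part of the definition.

```javascript
// Hypothetical helper: each code point becomes the byte of the same value.
// For an isomorphic string, length and code point length coincide because
// no code point needs a surrogate pair.
function isomorphicEncode(input) {
  const bytes = new Uint8Array(input.length);
  for (let i = 0; i < input.length; i += 1) {
    const value = input.charCodeAt(i);
    if (value > 0xFF) {
      throw new RangeError("input is not an isomorphic string");
    }
    bytes[i] = value;
  }
  return bytes;
}
```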
To ASCII lowercase a string, replace all ASCII upper alphas in +the string with their corresponding code point in ASCII lower alpha. + +
To ASCII uppercase a string, replace all ASCII lower alphas in +the string with their corresponding code point in ASCII upper alpha. + +
A string A is an ASCII case-insensitive match for a +string B, if the ASCII lowercase of A is the +ASCII lowercase of B. + + +
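The ASCII-only case operations differ from JavaScript's built-in `toLowerCase()` and `toUpperCase()`, which also change non-ASCII code points. The sketch below uses hypothetical function names and only touches the ranges named by ASCII upper alpha and ASCII lower alpha.

```javascript
// Hypothetical helpers: only U+0041..U+005A and U+0061..U+007A are affected.
function asciiLowercase(string) {
  return string.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x20));
}

function asciiUppercase(string) {
  return string.replace(/[a-z]/g, (c) => String.fromCharCode(c.charCodeAt(0) - 0x20));
}

function isAsciiCaseInsensitiveMatch(a, b) {
  return asciiLowercase(a) === asciiLowercase(b);
}
```

For instance, U+212A KELVIN SIGN is not an ASCII case-insensitive match for "k", even though `toLowerCase()` maps it to "k".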
To ASCII encode an ASCII string input: return the +isomorphic encoding of input. + +
Isomorphic encode and UTF-8 encode return the same byte sequence +for input. +
To ASCII decode a byte sequence input, run these steps: + +
Assert: all bytes in input are ASCII bytes. + +
Note: This precondition ensures that isomorphic decode and + UTF-8 decode return the same string for this input. + +
Return the isomorphic decoding of input. +
To strip newlines from a string, remove any U+000A LF and U+000D CR +code points from the string. + +
To normalize newlines in a string, replace every U+000D CR U+000A LF +code point pair with a single U+000A LF code point, and then replace every remaining +U+000D CR code point with a U+000A LF code point. + +
To strip leading and trailing ASCII whitespace from a string, remove all +ASCII whitespace that are at the start or the end of the string. + +
To strip and collapse ASCII whitespace in a string, replace any sequence +of one or more consecutive code points that are ASCII whitespace in the string +with a single U+0020 SPACE code point, and then remove any leading and trailing +ASCII whitespace from that string. + +
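These four operations can be approximated with regular expressions. The sketch below is illustrative and uses hypothetical function names; it assumes ASCII whitespace is U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, or U+0020 SPACE, as defined earlier in this standard.

```javascript
// Hypothetical helpers. The character class [\t\n\f\r ] is ASCII whitespace.
function stripNewlines(s) {
  return s.replace(/[\n\r]/g, "");
}

function normalizeNewlines(s) {
  // CR LF pairs first, then any remaining CR.
  return s.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
}

function stripLeadingAndTrailingAsciiWhitespace(s) {
  return s.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
}

function stripAndCollapseAsciiWhitespace(s) {
  return stripLeadingAndTrailingAsciiWhitespace(s.replace(/[\t\n\f\r ]+/g, " "));
}
```

Note that `String.prototype.trim()` is not a substitute for the third helper, since it also removes non-ASCII whitespace such as U+00A0.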
To collect a sequence of code points meeting a condition condition from +a string input, given a position variable +position tracking the position of the calling algorithm within input:
+ +Let result be the empty string. + +
While position doesn't point past the end of input and the + code point at position within input meets the condition + condition: + +
Append that code point to the end of result. + +
Advance position by 1. +
Return result. +
In addition to returning the collected code points, this algorithm updates the +position variable in the calling algorithm. + +
To skip ASCII whitespace within a string input given a +position variable position, collect a sequence of code points that are +ASCII whitespace from input given position. The collected +code points are not used, but position is still updated. + +
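JavaScript has no by-reference variables, so one way to model a position variable is a small mutable object shared between caller and callee. The following sketch makes that assumption; the names `collectCodePoints`, `skipAsciiWhitespace`, and the `{ index }` shape are hypothetical, and the input is pre-split into code points with `Array.from`.

```javascript
// Hypothetical helper: collect code points while condition holds,
// advancing the shared position object as a side effect.
function collectCodePoints(codePoints, position, condition) {
  let result = "";
  while (position.index < codePoints.length && condition(codePoints[position.index])) {
    result += codePoints[position.index];
    position.index += 1;
  }
  return result;
}

const isAsciiWhitespace = (c) => "\t\n\f\r ".includes(c);

// The collected code points are discarded; only position is updated.
function skipAsciiWhitespace(codePoints, position) {
  collectCodePoints(codePoints, position, isAsciiWhitespace);
}

const input = Array.from("  abc def");
const position = { index: 0 };
skipAsciiWhitespace(input, position);
const word = collectCodePoints(input, position, (c) => !isAsciiWhitespace(c));
```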
To strictly split a string +input on a particular delimiter code point delimiter:
+ +Let position be a position variable for input, initially + pointing at the start of input. + +
Let tokens be a list of strings, initially empty. + +
Let token be the result of collecting a sequence of code points that are + not equal to delimiter from input, given position. + +
Append token to tokens. + +
While position is not past the end of input: + +
Assert: the code point at position within input is + delimiter. + +
Advance position by 1. + +
Let token be the result of collecting a sequence of code points that are + not equal to delimiter from input, given position. + +
Append token to tokens. +
Return tokens. +
This algorithm is a "strict" split, as opposed to the commonly-used variants +for ASCII whitespace and +for commas below, which are both more lenient in various ways involving +interspersed ASCII whitespace. + +
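A strict split can be sketched in JavaScript as below. The function name `strictlySplit` is hypothetical, and the position variable is modeled as a local index into an array of code points.

```javascript
// Hypothetical helper following the strictly split steps.
function strictlySplit(input, delimiter) {
  const codePoints = Array.from(input);
  const tokens = [];
  let position = 0;

  // Collect code points that are not equal to delimiter.
  const collect = () => {
    let token = "";
    while (position < codePoints.length && codePoints[position] !== delimiter) {
      token += codePoints[position];
      position += 1;
    }
    return token;
  };

  tokens.push(collect());
  while (position < codePoints.length) {
    // The code point at position is the delimiter; step over it.
    position += 1;
    tokens.push(collect());
  }
  return tokens;
}
```

Empty tokens are preserved, so an empty input yields a list containing one empty string.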
To split a +string input on ASCII whitespace: + +
Let position be a position variable for input, initially + pointing at the start of input. + +
Let tokens be a list of strings, initially empty. + +
Skip ASCII whitespace within input given position. + +
While position is not past the end of input: + +
Let token be the result of collecting a sequence of code points that are + not ASCII whitespace from input, given position. + +
Append token to tokens. + +
Skip ASCII whitespace within input given position. +
Return tokens. +
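The whitespace variant can be sketched the same way. `splitOnAsciiWhitespace` is a hypothetical name, and the ASCII whitespace set is assumed to be TAB, LF, FF, CR, and SPACE.

```javascript
// Hypothetical helper following the split on ASCII whitespace steps.
function splitOnAsciiWhitespace(input) {
  const isWhitespace = (c) => "\t\n\f\r ".includes(c);
  const codePoints = Array.from(input);
  const tokens = [];
  let position = 0;

  const skipWhitespace = () => {
    while (position < codePoints.length && isWhitespace(codePoints[position])) {
      position += 1;
    }
  };

  skipWhitespace();
  while (position < codePoints.length) {
    let token = "";
    while (position < codePoints.length && !isWhitespace(codePoints[position])) {
      token += codePoints[position];
      position += 1;
    }
    tokens.push(token);
    skipWhitespace();
  }
  return tokens;
}
```

Unlike a strict split, this never produces empty tokens.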
To split a string +input on commas: + +
Let position be a position variable for input, initially + pointing at the start of input. + +
Let tokens be a list of strings, initially empty. + +
While position is not past the end of input: + +
Let token be the result of collecting a sequence of code points that are + not U+002C (,) from input, given position. + +
token might be the empty string. +
Strip leading and trailing ASCII whitespace from token. + +
Append token to tokens. + +
If position is not past the end of input, then: + +
Assert: the code point at position within input is + U+002C (,). + +
Advance position by 1. +
Return tokens. +
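A sketch of the comma variant follows, with the hypothetical name `splitOnCommas`. It assumes each token has its leading and trailing ASCII whitespace stripped before being appended, which is the whitespace leniency that the note on strict splitting attributes to this variant.

```javascript
// Hypothetical helper following the split on commas steps.
// Assumption: tokens are stripped of leading and trailing ASCII whitespace.
function splitOnCommas(input) {
  const codePoints = Array.from(input);
  const tokens = [];
  let position = 0;
  while (position < codePoints.length) {
    let token = "";
    while (position < codePoints.length && codePoints[position] !== ",") {
      token += codePoints[position];
      position += 1;
    }
    tokens.push(token.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, ""));
    // If not past the end, the code point at position is U+002C (,).
    if (position < codePoints.length) position += 1;
  }
  return tokens;
}
```

Because the loop ends as soon as position is past the end of the input, a trailing comma does not add an empty token.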
To concatenate a list of +strings list, using an optional separator string separator, run +these steps: + +
If list is empty, then return the empty string. + +
If separator is not given, then set separator to the empty string. + +
Return a string whose contents are list's items, in + order, separated from each other by separator. +
To serialize a set set, return the +concatenation of set using U+0020 SPACE. + + +
Represent time using the [=moment=] and [=duration=] specification types. Follow the advice in +[[HR-TIME-3#sec-tools]] when creating these and exchanging them with JavaScript. [[HR-TIME]] + + +
Conventionally, specifications have operated on a variety of vague specification-level data +structures, based on shared understanding of their semantics. This generally works well, but can +lead to ambiguities around edge cases, such as iteration order or what happens when you +append an item to an ordered set that the set already +contains. It has also led to a variety of divergent notation and phrasing, especially +around more complex data structures such as maps. + +
This standard provides a small set of common data structures, along with notation and phrasing +for working with them, in order to create common ground. + + +
A list is a specification type consisting of a finite ordered sequence of +items. + +
For notational convenience, a literal syntax can be used to express lists, by surrounding +the list by « » characters and separating its items with a comma. An indexing syntax +can be used by providing a zero-based index into a list inside square brackets. The index cannot be +out-of-bounds, except when used with exists. + +
Let |example| be the list « "a",
+"b", "c", "a" ». Then |example|[1] is the string
+"b".
+
+
For notational convenience, a multiple assignment syntax may be used to assign multiple variables +to the list's items, by surrounding the variables to be assigned by « » +characters and separating each variable name with a comma. The list's size +must be the same as the number of variables to be assigned. Each variable given is then set to the +value of the list's item at the corresponding index. + +
When a list's contents are not fully controlled, as is the case for lists from user input, +the list's size should be checked to ensure it is the expected size before +list multiple assignment syntax is used. + +
If |list|'s size is not 3, then return failure.
+
Let « |a|, |b|, |c| » be |list|. +
To append to a list that is not an ordered set is to +add the given item to the end of the list. + +
To extend a list |A| with a list |B|, +for each |item| of |B|, append |item| to |A|. + +
To prepend to a list that is not an ordered set is to +add the given item to the beginning of the list. + +
To replace within a list that is not an ordered set is +to replace all items from the list that match a given condition with the given item, +or do nothing if none do. + +
The above definitions are modified when the list is an ordered set; see +below for ordered set append, prepend, and +replace. + +
To insert an item into a list before an +index is to add the given item to the list between the given index − 1 and the given index. If +the given index is 0, then prepend the given item to the list. + +
To remove zero or more items from a list is +to remove all items from the list that match a given condition, or do nothing if none do. + +
Removing |x| from the list « |x|, |y|, |z|, |x| » is to remove all + items from the list that are equal to |x|. The list now is equivalent to « |y|, |z| ». + +
Removing all items that start with the string "a" from the
+ list « "a", "b", "ab", "ba" » is to
+ remove the items "a" and "ab". The list is now equivalent to «
+ "b", "ba" ».
+
To empty a list is to remove +all of its items. + +
A list contains an +item if it appears in the list. We can also denote this by saying that, for a +list |list| and an index |index|, "|list|[|index|] exists". + +
A list's size is the number of +items the list contains. + +
A list is empty if +its size is zero. + +
To get the indices of a +list, return the range from 0 to the list's +size, exclusive. + +
To iterate over a list, performing a +set of steps on each item in order, use phrasing of the form +"For each |item| of list", and then operate on |item| in the +subsequent prose. + +
To clone a list |list| is to create a new +list |clone|, of the same designation, and, for each |item| of |list|, +append |item| to |clone|, so that |clone| contains the same +items, in the same order as |list|. + +
This is a "shallow clone", as the items themselves are not cloned in +any way. + +
Let |original| be the ordered set «
+"a", "b", "c" ». Cloning |original| creates
+a new ordered set |clone|, so that replacing "a" with
+"foo" in |clone| gives « "foo", "b", "c" »,
+while |original|[0] is still the string "a".
+
+
To sort in ascending order +a list |list|, with a less than algorithm |lessThanAlgo|, is to create a new list +|sorted|, containing the same items as |list| but sorted so that according to +|lessThanAlgo|, each item is less than the one following it, if any. For items that sort the same +(i.e., for which |lessThanAlgo| returns false for both comparisons), their relative order in +|sorted| must be the same as it was in |list|. + +
To sort in descending order +a list |list|, with a less than algorithm |lessThanAlgo|, is to create a new list +|sorted|, containing the same items as |list| but sorted so that according to +|lessThanAlgo|, each item is less than the one preceding it, if any. For items that sort the same +(i.e., for which |lessThanAlgo| returns false for both comparisons), their relative order in +|sorted| must be the same as it was in |list|. + +
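The stability requirement in these definitions can be illustrated in JavaScript, whose `Array.prototype.sort` is specified to be stable in current editions of the language. The sketch below is non-normative: `sortInAscendingOrder` is a hypothetical name, and two-element arrays stand in for tuples of a status code and text.

```javascript
// Hypothetical helper: build a new sorted list from a less than algorithm.
// Items for which lessThanAlgo returns false both ways compare as equal,
// and a stable sort keeps them in their original relative order.
function sortInAscendingOrder(list, lessThanAlgo) {
  return [...list].sort((a, b) => {
    if (lessThanAlgo(a, b)) return -1;
    if (lessThanAlgo(b, a)) return 1;
    return 0;
  });
}

const original = [[200, "OK"], [404, "Not Found"], [null, "OK"]];
// JavaScript's < on strings is the code unit less than ordering.
const sorted = sortInAscendingOrder(original, (a, b) => a[1] < b[1]);
```

The two items whose second item is "OK" keep their original relative order, and `original` itself is not modified.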
Let |original| be the list « (200, "OK"),
+(404, "Not Found"), (null, "OK") ». Sorting |original| in
+ascending order, with |a| being less than |b| if |a|'s second item is
+code unit less than |b|'s second item, gives the result « (404,
+"Not Found"), (200, "OK"), (null, "OK") ».
The list type originates from the JavaScript specification (where it is capitalized, as +List); we repeat some elements of its definition here for ease of reference, +and provide an expanded vocabulary for manipulating lists. Whenever JavaScript expects a +List, a list as defined here can be used; they are the same type. +[[!ECMA-262]] + +
Some lists are designated as stacks. A stack is a list, +but conventionally, the following operations are used to operate on it, instead of using +append, prepend, or remove. + +
To push onto a stack is to append to it. + +
To pop from a stack: if the stack +is not empty, then remove its last item and return +it; otherwise, return nothing. + +
To peek into a stack: if the stack +is not empty, then return its last item; otherwise, return +nothing. + +
Although stacks are lists, for each must not be used with them; +instead, a combination of while and pop is more appropriate. + +
Some lists are designated as queues. A queue is a list, +but conventionally, the following operations are used to operate on it, instead of using +append, prepend, or remove. + +
To enqueue in a queue is to append to it. + +
To dequeue from a queue is to remove its first +item and return it, if the queue is not empty, or to return +nothing if it is. + +
Although queues are lists, for each must not be used with them; +instead, a combination of while and dequeue is more appropriate. + +
Some lists are designated as ordered sets. An +ordered set is a list with the additional semantic that it must not contain the same +item twice. + +
Almost all cases on the web platform require an ordered set, instead of an +unordered one, since interoperability requires that any developer-exposed enumeration of the set's +contents be consistent between browsers. In those cases where order is not important, we still use +ordered sets; implementations can optimize based on the fact that the order is not observable. + +
To create a [=/set=], given a [=/list=] |input|: + +
Let |result| be an empty [=/set=]. +
[=list/For each=] |item| of |input|, [=set/append=] |item| to |result|. +
Return |result|. +
To append to an ordered set: if the set contains +the given item, then do nothing; otherwise, perform the normal list +append operation. + +
To prepend to an ordered set: if the set +contains the given item, then do nothing; otherwise, perform the +normal list prepend operation. + +
To replace within an ordered set +set, given item and replacement: if set +contains item or replacement, then replace the first instance +of either with replacement and remove all other instances. + +
Replacing "a" with "c" within the +ordered set « "a", "b", "c" » gives « "c", "b" ». Within « "c", "b", "a" » it gives +« "c", "b" » as well. + +
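The ordered set variants of append and replace can be sketched with a plain JavaScript array. The names `setAppend` and `setReplace` are hypothetical, and item equality is assumed to be adequately modeled by `===`, which holds for the string items used here.

```javascript
// Hypothetical helper: append only if the set does not contain the item.
function setAppend(set, item) {
  if (!set.includes(item)) set.push(item);
}

// Hypothetical helper: replace the first instance of item or replacement
// with replacement, then remove all other instances of either.
function setReplace(set, item, replacement) {
  const index = set.findIndex((x) => x === item || x === replacement);
  if (index === -1) return;
  set[index] = replacement;
  for (let i = set.length - 1; i > index; i -= 1) {
    if (set[i] === item || set[i] === replacement) set.splice(i, 1);
  }
}

const first = ["a", "b", "c"];
setReplace(first, "a", "c");

const second = ["c", "b", "a"];
setReplace(second, "a", "c");
```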
An ordered set |set| is a subset of another ordered set +|superset| (and conversely, |superset| is a superset of |set|) if, +for each |item| of |set|, |superset| contains |item|. + +
This implies that an ordered set is both a subset and a +superset of itself. + +
A [=/set=] |A| is equal to a [=/set=] |B| +if |A| is a [=subset=] of |B| and |A| is a [=superset=] of |B|. + +
The intersection of ordered sets |A| and |B|, is the result +of creating a new ordered set |set| and, for each |item| of |A|, if |B| +contains |item|, appending |item| to |set|. + +
The union of ordered sets |A| and |B|, is the result of +cloning |A| as |set| and, for each |item| of |B|, +appending |item| to |set|. + +
The difference of ordered sets |A| and |B|, is the result +of creating a new ordered set |set| and, for each |item| of |A|, if |B| +[=set/contains|does not contain=] |item|, appending |item| to |set|. + +
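Intersection, union, and difference all preserve the order of |A|, which the following JavaScript sketch makes visible. Arrays without duplicates stand in for ordered sets, and the three function names are hypothetical.

```javascript
// Hypothetical helpers over arrays that contain no duplicate items.
function intersection(a, b) {
  return a.filter((item) => b.includes(item));
}

function union(a, b) {
  const set = [...a]; // clone A
  for (const item of b) {
    if (!set.includes(item)) set.push(item); // ordered set append
  }
  return set;
}

function difference(a, b) {
  return a.filter((item) => !b.includes(item));
}

const A = [1, 2, 3];
const B = [3, 4, 1];
```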
The range n to m, +inclusive, creates a new ordered set containing all of the integers from n up to +and including m in consecutively increasing order, as long as m is greater +than or equal to n. + +
The range n to m, exclusive, +creates a new ordered set containing all of the integers from n up to and including +m − 1 in consecutively increasing order, as long as m is greater than +n. If m equals n, then it creates an empty ordered set. + +
For each n of the range 1 to +4, inclusive, … + + +
An ordered map, or sometimes just "map", is a +specification type consisting of a finite ordered sequence of tuples, each consisting +of a key and a value, with no key appearing +twice. Each such tuple is called an entry. + + +
As with ordered sets, by default we assume that maps need to be ordered for +interoperability among implementations. + +
A literal syntax can be used to express ordered maps, by surrounding the ordered map with +«[ ]» characters, denoting each of its entries as |key| → |value|, and separating its +entries with a comma. + +
Let |example| be the ordered map «[
+"a" → `x`, "b" → `y` ]». Then
+|example|["a"] is the byte sequence `x`.
+
+
To get the value of an entry in an +ordered map map given a key key: + +
+ +We can also denote getting the value of an entry using an indexing syntax, +by providing a key inside square brackets directly following a map. + +
If map["test"]
+exists, then return map["test"].
+
+
To set the value of an entry in an +ordered map to a given value is to update the value of any existing +entry if the map contains an entry with the given key, +or if none such exists, to add a new entry with the given key/value to the end of the map. We can +also denote this by saying, for an ordered map |map|, key |key|, and value |value|, +"set |map|[|key|] to |value|". + +
To remove an entry from an ordered map is to remove +all entries from the map that match a given condition, or do nothing if none do. If +the condition is having a certain key, then we can also denote this by saying, for +an ordered map |map| and key |key|, "remove |map|[|key|]". + +
To clear an ordered map is to remove all entries +from the map. + +
An ordered map contains an +entry with a given key if there exists an entry with that key. +We can also denote this by saying that, for an ordered map |map| and key |key|, "|map|[|key|] +exists". + +
To get the keys of an +ordered map, return a new ordered set whose items are each of the +keys in the map's entries. + +
To get the values of an +ordered map, return a new list whose items are each of the +values in the map's entries. + +
An ordered map's size is the size of the result +of running get the keys on the map. + +
An ordered map is empty if its +size is zero. + +
To iterate over an ordered map, performing +a set of steps on each entry in order, use phrasing of the form +"For each |key| → |value| of |map|", and then operate on |key| and |value| in the +subsequent prose. + +
To clone an ordered map |map| is to create a new +ordered map |clone|, and, for each |key| → |value| of |map|, +set |clone|[|key|] to |value|. + +
This is a "shallow clone", as the keys and values +themselves are not cloned in any way. + +
Let |original| be the ordered map «[
+"a" → «1, 2, 3», "b" → «» ]». Cloning |original| creates a
+new ordered map |clone|, so that setting |clone|["a"] to
+«-1, -2, -3» gives «[ "a" → «-1, -2, -3», "b" → «» ]» and leaves
+|original| unchanged. However, appending 4 to |clone|["b"] will modify
+the corresponding value in both |clone| and |original|, as they both point to the
+same list.
+
+
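The shallow clone behavior described in this example can be observed with a JavaScript `Map`, which also iterates in insertion order. This is only an analogy: a `Map` stands in for an ordered map and arrays stand in for lists.

```javascript
// A Map of string keys to arrays, mirroring the example above.
const original = new Map([
  ["a", [1, 2, 3]],
  ["b", []],
]);

// Constructing a Map from another Map copies entries, not values.
const clone = new Map(original);

clone.set("a", [-1, -2, -3]); // only clone sees the new value
clone.get("b").push(4);       // both maps share this array
```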
To sort in ascending order +a map |map|, with a less than algorithm |lessThanAlgo|, is to create a new map +|sorted|, containing the same entries as |map| but sorted so that according to +|lessThanAlgo|, each entry is less than the one following it, if any. For entries that sort the same +(i.e., for which |lessThanAlgo| returns false for both comparisons), their relative order in +|sorted| must be the same as it was in |map|. + +
To sort in descending order +a map |map|, with a less than algorithm |lessThanAlgo|, is to create a new map +|sorted|, containing the same entries as |map| but sorted so that according to +|lessThanAlgo|, each entry is less than the one preceding it, if any. For entries that sort the same +(i.e., for which |lessThanAlgo| returns false for both comparisons), their relative order in +|sorted| must be the same as it was in |map|. + + +
A struct is a specification type consisting of a finite set of +items, each of which has a unique and immutable +name. An item holds a value of a defined type. + +
An email is an example struct consisting of a + local part (a string) and a host (a host). + +
A nonsense algorithm might use this definition as follows: + +
hostmaster" and host
+ is infra.example.
+ A tuple is a struct whose items are ordered. For +notational convenience, a literal syntax can be used to express tuples, by surrounding the +tuple with parenthesis and separating its items with a comma. To use this notation, +the names need to be clear from context. This can be done by preceding the first +instance with the name given to the tuple. An indexing syntax can be used by providing a +zero-based index into a tuple inside square brackets. The index cannot be out-of-bounds. + +
A status is an example tuple consisting of a code (a + number) and text (a byte sequence). + +
A nonsense algorithm that manipulates status tuples for the purpose of demonstrating their + usage is then:
+ +OK
`).
+ FOO BAR
`).
+ The last step could also be written as "If |statusInstance|[0] is 404, then …". This + might be preferable if the tuple names do not have explicit definitions. +
It is intentional that not all structs are tuples. Documents using the +Infra Standard might need the flexibility to add new names to their struct +without breaking literal syntax used by their dependencies. In that case a tuple is not appropriate. + + +
The conventions used in the algorithms in this section are those of the JavaScript +specification. [[!ECMA-262]] + +
To parse a JSON string to a JavaScript value, given a string |string|: + +
Return ? [$Call$](%JSON.parse%, undefined, « |string| »). +
To +parse JSON bytes to a JavaScript value, +given a byte sequence |bytes|: + +
Let |string| be the result of running UTF-8 decode on |bytes|. [[!ENCODING]] + +
Return the result of parsing a JSON string to a JavaScript value given |string|. +
To serialize a JavaScript value to a JSON string, given a JavaScript value +|value|: + +
Let |result| be ? [$Call$](%JSON.stringify%, undefined, « |value| »). + +
Since no additional arguments are passed to %JSON.stringify%, the resulting + string will have no whitespace inserted. + +
If |result| is undefined, then throw a {{TypeError}}. + +
This can happen if |value| does not have a JSON representation, e.g., if + it is undefined or a function. + +
Return |result|. +
To +serialize a JavaScript value to JSON bytes, +given a JavaScript value |value|: + +
Let |string| be the result of serializing a JavaScript value to a JSON string given + |value|. + +
Return the result of running UTF-8 encode on |string|. [[!ENCODING]] +
The above operations operate on JavaScript values directly; in particular, this means that +the involved objects or arrays are tied to a particular JavaScript realm. In +standards, it is often more convenient to convert between JSON and realm-independent maps, +lists, strings, booleans, numbers, and nulls. + +
To +parse a JSON string to an Infra value, +given a string |string|: + +
Let |jsValue| be ? [$Call$](%JSON.parse%, undefined, « |string| »). + +
Return the result of [=converting a JSON-derived JavaScript value to an Infra value=], given + |jsValue|. +
To parse JSON bytes to an Infra value, given a byte sequence |bytes|: + +
Let |string| be the result of running UTF-8 decode on |bytes|. [[!ENCODING]] + +
Return the result of parsing a JSON string to an Infra value given |string|. +
To convert a JSON-derived JavaScript value to an Infra value, given a JavaScript value +|jsValue|: + +
If |jsValue| is
If [$IsArray$](|jsValue|) is true, then: + +
Let |result| be an empty [=list=]. + +
Let |length| be ! [$ToLength$](! [$Get$](|jsValue|, "length")).
+
+
[=list/For each=] |index| of [=the range=] 0 to |length| − 1, inclusive: + +
Let |indexName| be ! [$ToString$](|index|). + +
Let |jsValueAtIndex| be ! [$Get$](|jsValue|, |indexName|). + +
Let |infraValueAtIndex| be the result of [=converting a JSON-derived JavaScript value to an Infra value=], + given |jsValueAtIndex|. + +
[=list/Append=] |infraValueAtIndex| to |result|. +
Return |result|. +
Let |result| be an empty [=ordered map=]. + +
[=list/For each=] |key| of ! |jsValue|.\[[OwnPropertyKeys]](): + +
Let |jsValueAtKey| be ! [$Get$](|jsValue|, |key|). + +
Let |infraValueAtKey| be the result of [=converting a JSON-derived JavaScript value to an Infra value=], + given |jsValueAtKey|. + +
[=map/Set=] |result|[|key|] to |infraValueAtKey|. +
Return |result|. +
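From script, the effect of this conversion can be approximated as follows. This is a non-normative sketch with hypothetical function names: a `Map` stands in for an ordered map, an array for a list, and `Reflect.ownKeys` for the \[[OwnPropertyKeys]] internal method. Returning null, booleans, strings, and numbers unchanged is an assumption of the sketch, mirroring the primitive case of the reverse conversion in this section.

```javascript
// Hypothetical helper: convert a JSON.parse result to Map/array form.
function convertJsonDerivedValueToInfra(jsValue) {
  // Assumption: primitives are returned as-is.
  if (
    jsValue === null ||
    typeof jsValue === "boolean" ||
    typeof jsValue === "string" ||
    typeof jsValue === "number"
  ) {
    return jsValue;
  }
  if (Array.isArray(jsValue)) {
    return jsValue.map((item) => convertJsonDerivedValueToInfra(item));
  }
  const result = new Map();
  for (const key of Reflect.ownKeys(jsValue)) {
    result.set(key, convertJsonDerivedValueToInfra(jsValue[key]));
  }
  return result;
}

// Hypothetical helper: parse a JSON string to an Infra-like value.
function parseJsonStringToInfraValue(string) {
  return convertJsonDerivedValueToInfra(JSON.parse(string));
}

const value = parseJsonStringToInfraValue('{"b":1,"a":[true,null,"x"]}');
```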
To serialize an Infra value to a JSON string, given a string, +boolean, number, null, list, or string-keyed map |value|: + +
Let |jsValue| be the result of + converting an Infra value to a JSON-compatible JavaScript value, given |value|. + +
Return ! [$Call$](%JSON.stringify%, undefined, « |jsValue| »). + +
Since no additional arguments are passed to %JSON.stringify%, the resulting + string will have no whitespace inserted. +
To serialize an Infra value to JSON bytes, given a string, +boolean, number, null, list, or string-keyed map |value|: + +
Let |string| be the result of serializing an Infra value to a JSON string, given + |value|. + +
Return the result of running UTF-8 encode on |string|. [[!ENCODING]] +
To convert an Infra value to a JSON-compatible JavaScript value, given |value|: + +
If |value| is a string, boolean, number, or null, then return |value|. + +
If |value| is a list, then: + +
Let |jsValue| be ! [$ArrayCreate$](0). + +
Let |i| be 0. + +
For each |listItem| of |value|: + +
Let |listItemJSValue| be the result of + converting an Infra value to a JSON-compatible JavaScript value, given |listItem|. + +
Perform ! [$CreateDataPropertyOrThrow$](|jsValue|, ! [$ToString$](|i|), + |listItemJSValue|). + +
Set |i| to |i| + 1. +
Return |jsValue|. +
Assert: |value| is a map. + +
Let |jsValue| be ! [$OrdinaryObjectCreate$](null). + +
For each |mapKey| → |mapValue| of |value|: + +
Assert: |mapKey| is a string. + +
Let |mapValueJSValue| be the result of + converting an Infra value to a JSON-compatible JavaScript value, given |mapValue|. + +
Perform ! [$CreateDataPropertyOrThrow$](|jsValue|, |mapKey|, |mapValueJSValue|). +
Return |jsValue|. +
Because it is rarely appropriate to manipulate JavaScript values directly in +specifications, prefer using serialize an Infra value to a JSON string or +serialize an Infra value to JSON bytes instead of using this algorithm. Please +file an issue to discuss your use case if +you believe you need to use convert an Infra value to a JSON-compatible JavaScript value. +
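For readers who want to see what the steps above amount to, here is a rough TypeScript sketch. The `InfraValue` type and all function names are hypothetical. `Object.defineProperty` with a fully permissive descriptor approximates [$CreateDataPropertyOrThrow$], and `Object.create(null)` approximates [$OrdinaryObjectCreate$](null). It is meant only as an illustration; as the note above says, specifications should normally use the serialization algorithms instead.

```typescript
// Hypothetical model of Infra values (an assumption of this sketch).
type InfraValue =
  | null
  | boolean
  | number
  | string
  | InfraValue[]
  | Map<string, InfraValue>;

// Approximates CreateDataPropertyOrThrow: defines a writable, enumerable,
// configurable own data property and throws if that fails.
function createDataProperty(target: object, key: string, value: unknown): void {
  Object.defineProperty(target, key, {
    value: value,
    writable: true,
    enumerable: true,
    configurable: true,
  });
}

function convertInfraToJsonCompatibleJsValue(value: InfraValue): unknown {
  if (
    value === null ||
    typeof value === "string" ||
    typeof value === "boolean" ||
    typeof value === "number"
  ) {
    return value;
  }

  if (Array.isArray(value)) {
    const jsArray: unknown[] = []; // ArrayCreate(0)
    for (let i = 0; i < value.length; i++) {
      createDataProperty(jsArray, String(i), convertInfraToJsonCompatibleJsValue(value[i]));
    }
    return jsArray;
  }

  // Assert: value is a map.
  if (!(value instanceof Map)) {
    throw new TypeError("Expected a map");
  }

  // A null prototype keeps inherited properties out of the result.
  const jsObject: object = Object.create(null);
  value.forEach((mapValue, mapKey) => {
    createDataProperty(jsObject, mapKey, convertInfraToJsonCompatibleJsValue(mapValue));
  });
  return jsObject;
}
```

Because properties are defined rather than assigned, a map key such as "`__proto__`" ends up as an ordinary own property, so it appears in the output of `JSON.stringify` like any other key.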
To forgiving-base64 encode given a byte sequence data, apply +the base64 algorithm defined in section 4 of RFC 4648 to data and return the result. +[[!RFC4648]] + +
This is named forgiving-base64 encode for symmetry with +forgiving-base64 decode, which is different from the RFC as it defines error handling for +certain inputs. + +
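As an illustration, the base64 encoding of RFC 4648 section 4 can be sketched in TypeScript as below. The constant and function names are hypothetical, and the byte sequence is modeled as a `Uint8Array`; this is a sketch, not a reference implementation.

```typescript
// The base64 alphabet from Table 1 of RFC 4648, indexed by 6-bit value.
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function forgivingBase64Encode(data: Uint8Array): string {
  let output = "";
  for (let i = 0; i < data.length; i += 3) {
    const hasSecond = i + 1 < data.length;
    const hasThird = i + 2 < data.length;
    // Pack up to three bytes into a 24-bit group, padding with zero bits.
    const group =
      (data[i] << 16) |
      ((hasSecond ? data[i + 1] : 0) << 8) |
      (hasThird ? data[i + 2] : 0);
    // Emit four 6-bit values; missing input bytes are shown as "=" padding.
    output += BASE64_ALPHABET.charAt((group >> 18) & 0x3f);
    output += BASE64_ALPHABET.charAt((group >> 12) & 0x3f);
    output += hasSecond ? BASE64_ALPHABET.charAt((group >> 6) & 0x3f) : "=";
    output += hasThird ? BASE64_ALPHABET.charAt(group & 0x3f) : "=";
  }
  return output;
}
```

With this sketch, the single byte 0x61 encodes to "`YQ==`" and the bytes 0x61 0x62 0x63 encode to "`YWJj`".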
To forgiving-base64 decode given a string data, run these steps:
+ +Remove all ASCII whitespace from data. + + +
If data's code point length divides by 4 leaving no remainder, + then: + +
If data ends with one or two U+003D (=) code points, then remove them + from data. +
If data's code point length divides by 4 leaving a remainder of + 1, then return failure. + +
If data contains a code point that is not U+002B (+), U+002F (/), or an ASCII alphanumeric, then return failure. + +
Let output be an empty byte sequence. + +
Let buffer be an empty buffer that can have bits appended to it. + +
Let position be a position variable for data, initially + pointing at the start of data. + +
While position does not point past the end of data: + +
Find the code point pointed to by position in the second column of + Table 1: The Base 64 Alphabet of RFC 4648. Let n be the number given in the first cell + of the same row. [[!RFC4648]] + +
Append the six bits corresponding to n, most significant bit first, to + buffer. + +
If buffer has accumulated 24 bits, interpret them as three 8-bit big-endian + numbers. Append three bytes with values equal to those numbers to output, in the same + order, and then empty buffer. + +
Advance position by 1. +
If buffer is not empty, it contains either 12 or 18 bits. If it contains 12 bits, + then discard the last four and interpret the remaining eight as an 8-bit big-endian number. If it + contains 18 bits, then discard the last two and interpret the remaining 16 as two 8-bit big-endian + numbers. Append the one or two bytes with values equal to those one or two numbers to + output, in the same order.
+ +The discarded bits mean that, for instance, "YQ" and "YR" both return `a`. + +
Return output. +
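The decoding steps above can be sketched in TypeScript as follows. The names are hypothetical, failure is modeled as `null`, and the bit buffer is kept as an integer plus a bit count; these are choices of this sketch rather than anything the algorithm mandates. The sketch measures length in code units rather than code points; the result is the same, since any input containing a non-ASCII code point is rejected by the alphabet check anyway.

```typescript
// The base64 alphabet from Table 1 of RFC 4648, indexed by 6-bit value.
const RFC4648_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Returns the decoded bytes, or null to represent failure.
function forgivingBase64Decode(input: string): Uint8Array | null {
  // Remove all ASCII whitespace (TAB, LF, FF, CR, and SPACE).
  let data = input.replace(/[\t\n\f\r ]/g, "");

  // If the length is a multiple of 4, drop one or two trailing "=".
  if (data.length % 4 === 0) {
    data = data.replace(/={1,2}$/, "");
  }

  // A remainder of 1 can never correspond to whole bytes.
  if (data.length % 4 === 1) {
    return null;
  }

  // Only "+", "/", and ASCII alphanumerics may remain.
  if (/[^+\/0-9A-Za-z]/.test(data)) {
    return null;
  }

  const output: number[] = [];
  let buffer = 0; // accumulated bits, most significant first
  let bitCount = 0; // number of bits currently in buffer
  for (let position = 0; position < data.length; position++) {
    const n = RFC4648_ALPHABET.indexOf(data.charAt(position));
    buffer = (buffer << 6) | n;
    bitCount += 6;
    if (bitCount === 24) {
      output.push((buffer >> 16) & 0xff, (buffer >> 8) & 0xff, buffer & 0xff);
      buffer = 0;
      bitCount = 0;
    }
  }

  // Leftover bits: 12 bits yield one byte, 18 bits yield two bytes.
  if (bitCount === 12) {
    output.push((buffer >> 4) & 0xff);
  } else if (bitCount === 18) {
    output.push((buffer >> 10) & 0xff, (buffer >> 2) & 0xff);
  }
  return Uint8Array.from(output);
}
```

In this sketch, "`YQ`", "`YR`", and "`YQ==`" all decode to the single byte 0x61, while "`YQ=`" and "`A`" both produce failure.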
The HTML namespace is "http://www.w3.org/1999/xhtml". + +
The MathML namespace is "http://www.w3.org/1998/Math/MathML". + +
The SVG namespace is "http://www.w3.org/2000/svg". + +
The XLink namespace is "http://www.w3.org/1999/xlink". + +
The XML namespace is "http://www.w3.org/XML/1998/namespace". + +
The XMLNS namespace is "http://www.w3.org/2000/xmlns/". + + +
Many thanks to +Addison Phillips, +Andreu Botella, +Aryeh Gregor, +Ben Kelly, +Chris Rebert, +Daniel Ehrenberg, +Dominic Farolino, +Gabriel Pivovarov, +Ian Hickson, +Jakob Ackermann, +Jake Archibald, +Jeff Hodges, +Jeffrey Yasskin, +Jungkee Song, +Leonid Vasilyev, +Maciej Stachowiak, +Malika Aubakirova, +Martin Thomson, +Michael™ Smith, +Mike West, +Mike Taylor, +Ms2ger, +Pavel "Al Arz" Kurochkin, +Philip Jägenstedt, +Rashaun "Snuggs" Stovall, +Sergey Shekyan, +Simon Pieters, +Tab Atkins, +Tobie Langel, +triple-underscore, +Wolf Lammen, +and Xue Fuqiao +for being awesome! + +
This standard is written by Anne van Kesteren +(Apple, annevk@annevk.nl) and +Domenic Denicola (Google, +d@domenic.me).