range type #689
Can you give an example where putting a hypothetical range value into a TOML document would make more sense than just defining parameters of a range with the value types already defined in the spec? |
The proper point of reference is the question, why do so many languages (from Python to Ruby to Mathematica to Matlab) provide simple syntax for range construction? The answer is that it is convenient and expressive. As an example of usefulness, consider a simulation model where a TOML file is used to represent a collection of simulation experiments. Each experiment is a table, and often a key-value pair in the table will specify a parameter and a range of values. This will be far easier to read in the TOML file if there is a simple syntax for ranges. Additionally, it provides direct guidance (e.g., to a Python parser) to construct the range rather than to construct some object that merely represents the parameters of a range. |
I think ranges can also be great as a shortcut for typical common arrays, they save typing, and add clarity to the intention, and remove typos for cases where you create the range by hand. In scenarios where TOML is used for configuring unit tests, or performance tests, I certainly see the benefit. Also, they might open the door for infinite sequences, if we were to consider syntax that allows unbounded ranges. However, this would pose a potentially heavy burden on implementers, as such a thing is only possible with lazy evaluation of said range. |
Why not use an inline table? E.g. for the simulation model sample:
parameters = [
  { name="alpha", first=2, last=10, step=2 },
  { name="beta", first=1, last=100 }, # default step: 1
  { name="gamma", first=50, last=-50, step=-1 }
]
Unbounded ranges are not a problem either:
range = { first=15, step=3 }
Or, if desired, you might specify the number of repetitions (different values) instead of an upper/last value:
range = { first=4, repetitions=16, step=4 } # run tests for 4, 8, ..., 60, 64
This use case is too specialized and rare to deserve new syntax (remember what the "M" stands for?), but TOML can easily accomplish it already. |
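As a rough sketch of what the application-side handling of such inline tables could look like in Python: the function name `expand_range` is hypothetical, the dictionaries stand in for what a TOML parser would return, and treating `last` as inclusive is an assumption based on the "4, 8, ..., 60, 64" comment above.

```python
from itertools import count, islice

def expand_range(spec):
    """Turn an inline-table style range description into a lazy iterable.

    Assumed keys (taken from the examples above): first, last, step,
    repetitions. 'last' is treated as inclusive.
    """
    first = spec["first"]
    step = spec.get("step", 1)  # default step: 1
    if step == 0:
        raise ValueError("step must be non-zero")
    if "last" in spec:
        # Python's range excludes the stop value, so move one unit past 'last'.
        stop = spec["last"] + (1 if step > 0 else -1)
        return range(first, stop, step)
    if "repetitions" in spec:
        return range(first, first + spec["repetitions"] * step, step)
    # Neither bound given: an unbounded, lazily evaluated sequence.
    return count(first, step)

alpha = list(expand_range({"first": 2, "last": 10, "step": 2}))
unbounded_head = list(islice(expand_range({"first": 15, "step": 3}), 3))
repeated = list(expand_range({"first": 4, "repetitions": 16, "step": 4}))
```

Nothing here is evaluated by the TOML parser itself; the table is ordinary data and the application decides what it means.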
Hi Christian. Your proposed solution was anticipated in my original post (above). It is not parsed to produce a range of values, which is desirable. Instead, it is parsed to produce an object that can be converted to a range of values. A key feature of the TOML spec is its insistence on useful type inference (despite the "M"). And please remember the "O". The need for ranges is neither specialized nor rare, even if you do not need them often. That is why they have been requested in other settings (e.g., YAML, JSON), and that is why they are implemented in MANY programming languages. (I listed some examples above.) |
Basically every I don't think the point is that it is currently impossible. The point is to have a simple, clear, unambiguous way of expressing ranges that is portable. Ad hoc syntax never is. I personally prefer the |
I don't think we need this -- the provided functionality is not compelling enough, to justify the complexity this brings in the syntax + mental model. "YAML has it" is very much not a good reason to add syntax to TOML. Can someone please point out a real world use case where this is a problem? The premise of this issue seems very hypothetical. |
|
Programming, transformation and query languages are mostly irrelevant to TOML's primary objective: to be a minimal configuration file format.
|
Yes, it's directly understandable for a reader of the configuration. Much less obvious how to type it, or what values are valid:
And herein lies the problem: each and every application that supports TOML and needs a range has to fully specify how it deals with all of these situations. Just like with other features that are not necessarily used by everyone (nested arrays, I can't get the support engineers to understand them, but that's also true for the json-like syntax: TOML is certainly not for the average user), it is better to specify once and be clear about it than to let each and every configuration define it for themselves. Even if only 10% is going to use it, or even if it's only useful in a subset of situations, this is true for most features of TOML; rarely will you see config files that use everything. Imo, that shouldn't be the leading argument. Likewise, I can understand the hesitancy, in that you don't just want to extend the syntax on everyone's whim. Personally, I don't think this is a whim, and it has had widespread usage in both present and past languages and configuration files. Let's do it right, and help users and designers with a clear addition to the syntax, ready if they need it, ignorable if they don't. PS: for implementors, I think this is a very trivial thing to add. |
This observation is orthogonal to the point. The point is simplicity and expressiveness.
This claim is incorrect. Only a programmer would say such a thing, and even then only a programmer who assumes additional context (i.e., this conversation). Arithmetic sequences using dots are introduced in grade school. The notation is not exactly the same, but it is close. This comment also misses a key point: the range syntax should be parsed to produce a range object or an explicit array. That is not what happens with the alternative. I won't say more because Abel has said it much better than I could. |
Numeric sequences are introduced in school, the notation is like
This is not a point at all. Configuration files should be handy for those who read and write them by hand. A shiny parser API cannot be an excuse for increasing the number of syntax features that users must learn. |
Like @lmna said earlier: programming languages have tons of stuff which TOML neither has nor needs, since it's not a programming language. More relevant to the issue at hand would be whether other commonly used data serialization or configuration file formats have a built-in syntax for range types. As far as I can tell, that's not the case. Not even YAML (whose M could well mean "Maximal") seems to support it. I don't doubt that this feature has been "requested" from time to time, but the fact that these requests have apparently all been rejected should tell us something. As for obviousness: In Ruby, |
I agree, so instead of requiring users to learn the individual specifications of each and every usage of TOML, let's give both readers and writers something they can work with and that's easy to understand and easy to write. Learn once, apply everywhere.
Not at all, it's good to learn from other's mistakes, and precisely the reason why we should keep it simple and explicit. One syntax, with an obvious meaning. |
Yes. That is precisely the point. I am very confident that if the syntax |
Meaning of
Is it worth it to describe it all in the TOML spec? Will users be truly happy and enthusiastic about reading and remembering all that stuff? Will it be obvious for those who don't bother to even read the spec? |
Certainly more obvious than local time, arrays of tables, or dot notation for supertable generation. |
I do disagree with the "learn" part. In the ideal world, you should learn a lot about a program that you are writing configuration for, but the syntax of the configuration file should require no learning at all. I see this as an ultimate goal for the evolution of TOML. In the real world, TOML has some obscure syntax features (arrays of tables, first of all). Despite that, we should do our best not to screw things up even further. |
TOML is not yet at 1.0. Will you propose to remove arrays of tables before the 1.0 release? Why or why not? How about local time notation? Keep or discard? And why? |
This is a great question. Here is a possible notation for that: |
The official goal for version 1.0.0 is to be backwards compatible (as much as humanly possible) with version 0.5.0. So removal of existing syntax is not an option any more.
This could be done for 2.0, if someone comes up with an excellent alternative to the current arrays-of-tables.
The whole date&time thing, not only the "local" aspect, was a very controversial feature. I believe that first-class date&time is not worth its complexity. |
If I may paraphrase, in the absence of such an alternative, in your judgment the cost in readability is repaid by the ease of use. Yes, this is always the correct criterion. (Just fyi, I am pleased to have date-time functionality, although I wish times required a clarifying T prefix.) |
Complexity of arrays-of-tables is justified by expressive power. An alternative should reduce the complexity (make things more obvious & trivial), but not at the cost of readability and expressiveness. An important thing to note is that first-class date&time and first-class ranges do not add anything to readability and expressiveness. You can encode them as TOML strings and then interpret those strings at the application level (just like you interpret any other configuration parameter). No sacrifices here. |
This claim is obviously incorrect. Prove it to yourself by typing out any long range without ever checking to see if you made an error. A good syntax for ranges adds readability, expressiveness, and ease of use. (Which is exactly why this exists in so many programming languages.) Of course if I just want to parse everything myself, I could use an INI parser and handle the string values. A key piece of the value added by TOML is elimination of this need in config files. |
Okay, let's do it once again. |
So in fact the meaning is not obvious at all to a reader who is not in this conversation. You are simply making the point that there are available workarounds, although without any supporting standard. Yes, we all know that. That's what we're doing now. The request is for something less tiresome and more communicative. |
Just a quick reminder: it is NOT the case that the party with the highest number of comments wins 😉 |
Even more explicit would be |
Frankly? No. Just suggest a syntax that would work for you, instead of pontificating and complaining about what doesn't/can't. I was trying to help you - I think a range syntax would be useful - but... ugh. Good luck, I guess. |
@marzer I'm again confused; you asked for an actual example of current usage, which I provided. I also included a syntax that would work for me. It is the same one discussed multiple times above. In response to your question, I mentioned the syntax Just to be clear, the syntax you suggested ( Whatever the team decides is most suitable will be perfectly fine with me. I care about the functionality much more than the syntax. |
@marzer So which of the syntaxes that I've just mentioned would you choose? |
@alan-isaac My impression is that you're not just hoping for a range type in TOML – which would conceptually, regardless of the syntax chosen, encode a triple of the form: range(start at x, stop at y, proceed in steps of z) – but you're also expecting TOML to evaluate the range for you. So instead of, say, range(start at 1, stop at 10, proceed in steps of 3) you're hoping to get the array [1, 4, 7, 10]. Is that correct? |
@ChristianSi, I've always thought that was the main aim of this thread. Otherwise, it's essentially the same as using a json style object (apart from the advantage of an unambiguous syntax). There have been questions of 'how do you do it now' and how it would change. The answers in this same thread come down to: you can't do it now, so there's no example. Well, here's how I do it currently.
Obviously, there are other ways of achieving the same effect, but at the time, this seemed simplest. I looked at some existing parsers to amend them for this purpose, until I stumbled upon this thread. So I waited, in case an agreement could be reached. I guess implementations could choose to statically expand into an array, or could choose to give an enumerator, or both, depending on their interface. But that's true already for the existing syntax of arrays, though an enumerator may be more applicable in some scenarios. But that's of course an implementation detail, irrelevant for TOML itself. |
tl;dr: Yes. The answer by @abelbraaksma nicely captures the core issue. The job of the TOML spec is just to provide an unambiguous meaning to the syntax, not to determine the parser implementation details. (Although, recommendations could be made, course.) For example, for TOML tables, the popular C parser for TOML naturally produces a struct rather than a hash table. The important thing is that I can send a file to a C user or a Python user and just say "use a TOML parser to extract the configuration of this experiment". The goal is simply to have an obvious syntax that unambiguously indicates that a range of values is produced by a TOML parser, not to constrain how a particular parser might produce that (e.g., as a list, a tuple, an array, or a range object). In fact, Abel's examples have persuaded me (against my original thought) that the type of syntax he describes would be most useful to others (even though I just (!) need ranges). The printer configuration example is what really persuaded me. To meet that need, something like one of the following syntaxes seems most obvious: the printer influenced |
I see, but let's be honest: that will never happen, since, as pointed out much earlier in this thread, TOML is not a programming language. A TOML parser will parse date strings into date objects and number strings into numbers, but it will never evaluate stuff like "10 days after 2019-12-23" (regardless of the syntax used). I even doubt that stuff like On the plus side, you might be able to solve your problem by sending the file through a template engine before parsing it as TOML. |
@ChristianSi Are you offering a false dichotomy? I doubt that you can come up with any coherent way of distinguishing parsing |
@alan-isaac I'll easily parse |
@ChristianSi As I also said in my previous comment, I personally just need ranges. If you are saying that you would be happy to have a range syntax 0..100 that, say, the Python or Ruby parser would parse to a range object, then yes please! That would be extremely helpful! The rest of the discussion appears separate to me. In that separate discussion, I still believe you are drawing an untenable line between parsing and transforming, as illustrated most nicely by the date-time type. TOML has lots of syntax that provides convenient ways to say what values should be produced by a TOML parser. Indeed, this is a key feature of TOML over INI (where standard parsers produce only strings as keys and values). So I still invite you to try to make concrete the reasoning for rejecting, say, the printer-configuration syntax, with date-time parsing being the point of reference for the purposes of the discussion. But again, I would be delighted by the addition of a simple range syntax, and the Haskell influenced double-dot notation would be great. |
Perhaps because often in this discussion we refer to how things are done in programming languages, we lose sight of a key thing: a concise way of expressing arrays that are a natural sequence of numbers. Just like a time-span is a range of time. Having such an expression doesn't diminish the declarative nature of TOML, in my opinion, nor does it magically turn it into a programming language, far from it. It's basically just a different way of writing the same thing, but clearer than reading or writing, say, 100 numbers. The confusion comes perhaps from the idea that ranges are often used in loops in programming languages, but the proposal here does not intend to apply the range result. It is not a branch or loop instruction. To summarize:
foo = [1,2,3,4,5,6,7,8,9,10]
Is exactly equal to (assuming one variant of the syntax):
foo = [1..10]
The main difference being esthetics, number of keystrokes, being prone or not to errors, clarity of intent, and readability. |
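To make the claimed equivalence concrete, here is a minimal Python sketch. The `start..stop` literal with both ends inclusive is only one of the variants floated in this thread and is not part of TOML; the helper name `parse_inclusive_range` is made up for illustration.

```python
import re

# Hypothetical literal: "start..stop", integers only, both ends inclusive.
_RANGE_LITERAL = re.compile(r"^\s*(-?\d+)\s*\.\.\s*(-?\d+)\s*$")

def parse_inclusive_range(text):
    """Parse a 'start..stop' literal into a Python range covering both ends."""
    match = _RANGE_LITERAL.match(text)
    if match is None:
        raise ValueError(f"not a range literal: {text!r}")
    start, stop = int(match.group(1)), int(match.group(2))
    # range() excludes its stop argument, so add one to include 'stop'.
    return range(start, stop + 1)

explicit = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
shorthand = list(parse_inclusive_range("1..10"))
same = explicit == shorthand
```

Under these assumptions the two spellings carry the same data; the only difference is how much the author has to type and proofread.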
I read this issue and I'd like to add my 2 cents. As a toml user, if we can say that, I really like toml because it does what it claims to do and does it very well. Being so minimal allows it to be used as a drop-in replacement for json/ini/... without many issues. From my point of view, having a range/slice type is a bit similar to having a datetime type. The datetime type is a pain to implement because we live in a world with timezones, offsets and dst. There are so many ways to represent a date and as many ways to do it wrong that can end up very badly. That said, this limitation of JSON doesn't prevent developers from using json. Special types can be handled as a substructure of a json object. The same thing can be done in toml for the range/slice type. Saying that you want to have a range understood in any language is a nice thing, but a file format contains data, and how the data is read / interpreted by an application is a whole different thing. So taking this example:
There are at least 2 possibilities to interpret this:
If you generate a list from 1 to 10, you open toml up to side effects, like a file with a wrong range definition that would expand to a few terabytes of RAM being used. That's not really nice... Also it would make it a bit difficult for storage. So a range should be a dumb object that can be used to create a generator like in python3. In order to create a list from 1 to 10 in python3 you'd have to create a range for In python3, I could handle the range issue with this format for
But yes, if you need to use it in another language, you'll have to know what each parameter should be, e.g. whether 1 and 11 are inclusive or not. But as you define the format of your file, that belongs in the file format (application level), not in the file format (transport level). But nothing prevents you from having helper methods like:
This way you can have a consistent way to parse substructures in toml which are specific to your application. Having it as part of the file format could be useful, but it's so trivial to implement that I wonder if it's worth the hassle to have it be part of the language, as it also comes with incompatibilities and sacrifices. For example, Rust doesn't seem to support a step like python3; in ruby you can include or exclude the last element; in C#, the range takes start:count parameters, so 2,10 would yield [2,3...10,11], which means we can't have negative values for the second parameter. Java support for ranges seems to exist in many forms, but I couldn't find one meant to be used to generate a list as an iterator, so it might be implementation specific, like javascript. One other thing is that the design choice of how to implement the range could be influenced by the use: is it thread-safe, can it be used as an async iterator? I think it should be left to the user to implement it for their application. Saying that you should be able to read a file across languages is barely an argument: if I open a file x.json in one python app, another ruby app won't comprehend what to do with the file even if json is correctly supported by ruby. So even if toml supported ranges, it wouldn't make all apps magically understand your file; you'd still have to implement your app to expect a range or to parse a range. Handling the conversion in your app puts the maintenance burden on yourself, and having it inside toml puts the burden on every implementer of the toml format. It would suck to be unable to open a file in a certain language because the implementer couldn't decide how to implement ranges in a way that pleases everyone, when you can just put an array or an object and call it a day. |
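The differing conventions mentioned above can be illustrated with a short Python sketch. The three helper names are invented for this example; they only model the start/stop (exclusive), start/count, and first/last (inclusive) readings of the same pair of numbers and do not reproduce any particular library's API.

```python
def from_start_stop(start, stop, step=1):
    """Python-style reading: 'stop' itself is excluded."""
    return range(start, stop, step)

def from_start_count(start, count):
    """Start/count reading, as described for the C# example above."""
    if count < 0:
        raise ValueError("count cannot be negative")
    return range(start, start + count)

def from_first_last(first, last):
    """Inclusive reading: both 'first' and 'last' are part of the result."""
    return range(first, last + 1)

# The same two numbers, 2 and 10, give three different sequences.
exclusive = list(from_start_stop(2, 10))
counted = list(from_start_count(2, 10))
inclusive = list(from_first_last(2, 10))
```

This is the ambiguity a specification would have to settle once, or each application has to document for itself.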
@llacroix I suggest that the most basic question is simpler than you appear to believe. The question is simply whether TOML files should have a syntax to more simply describe a certain simple and common type of array, which would otherwise have to be typed out explicitly. For this basic question, your most powerful argument is that the syntax is so flexible that a TOML file might cause a parser to generate a dismayingly large array. This will not affect parsers that choose to return a range object of some type, rather than explicitly constructing the array. Worrying about such parser decisions is like worrying about whether the current Python parser should return a list, a tuple, or an array. I think it is out of scope? The need for this is not for sharing across instances of a single application but for sharing certain kinds of configurations across diverse applications. A good point of reference is thinking about how TOML files could be used to configure print jobs by providing an array of page numbers. The configuration should be parsed to the sequence of page numbers, not to an object that could be turned by an app here or an app there but not all printer apps into this sequence. But again, the question is just one of convenient syntax. Just as you do not insist that TOML users represent number literals with hex notation, why should they not more simply represent certain common and simple array literals? |
Because In other words, range is the functional version of
It's not out of scope: if the parser can return different types, it would make it difficult or even impossible to load certain files in some languages, as range != array. Let's say you have a file that looks like this:
In this case,
To be able to do something like this:
But let's put aside the memory limit and OOM killer issue, and let's say we want to explode ranges and join lists together to have this With that file:
Let's imagine you do that:
If those things are loaded as array literals, they're going to be saved as arrays of int. Though the data didn't really change, you go from a file with a few bytes to a couple thousand bytes, just because expansion loses the data it stores: it can't know what range was previously stored in the file once it expands it. And that would be difficult to handle correctly, because if you can expand it, you're either breaking the RAM (you're certainly going to go out of memory) or you need to keep it as a range and handle it as a completely different type, as they're not array literals, but in that case you're getting hit by platform limitations in implementation-specific ways. Also, it's not very typical to manipulate ranges directly in code. For example, I don't see why in code I'd build something like this:
The only way I see how it could make sense is if you received an input text as toml to be parsed to
It would always store a list of int, and your output file would always have the exploded version of an array literal. So in order for a software to output ranges you'd have to manually do something like this
How is that easier to write than:
Programming wise, it's not particularly different, a range is really just a start stop and possibly increment. |
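The round-trip concern can be sketched in Python. Once a range has been expanded into a plain list, the original notation is gone, and a writer would have to rediscover it. The helper `as_arithmetic_range` below is a made-up illustration of that extra step, not something any TOML library is known to provide.

```python
def as_arithmetic_range(values):
    """Return an equivalent range if 'values' is an arithmetic progression
    of integers with a non-zero step, otherwise None."""
    if len(values) < 2:
        return None
    step = values[1] - values[0]
    if step == 0:
        return None
    # Build the candidate with an inclusive end, then verify it element-wise.
    stop = values[-1] + (1 if step > 0 else -1)
    candidate = range(values[0], stop, step)
    return candidate if list(candidate) == list(values) else None

expanded = list(range(1, 1001))            # what a parser might hand to the app
recovered = as_arithmetic_range(expanded)  # what a writer would need to re-derive
not_a_range = as_arithmetic_range([1, 2, 4])

# Rough size comparison of the two serialized forms (the "1..1000" spelling
# is one of the hypothetical syntaxes from this thread).
expanded_chars = len(str(expanded))
compact_chars = len("1..1000")
```

Keeping a dedicated range type in memory avoids this re-derivation, at the cost of every consumer having to handle two shapes of value.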
|
Similar to #428 ? |
I think this is a case where a good feature might be missed due to some partisan entrenchment that has bubbled up over the course of the discussion. As someone with no horse in this race, I think that a range should only be added if it is truly a datatype. That is, the TOML parser should not evaluate the range and provide an array of numbers. It should provide a range object, which the implementation would then know how to deal with. If this was to be the case, the discussion reduces to whether TOML should provide a clearly defined range type, much like it provides a datetime type. Assuming that range is now a type that is not evaluated by TOML, but merely parsed, this also requires the implementation to recognize that in almost all cases the field could be EITHER a range or an array of numbers. This likely wouldn't be much of an issue, but does add a bit of complexity to the language. I think it then also becomes important to distinguish between bounded and unbounded ranges. These would be two separate types, ideally. It is important that an application can know whether a range is bounded or unbounded, since in many cases an unbounded range might not be appropriate and could lead to a non-terminating execution. |
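A minimal Python sketch of the "range as a datatype" idea follows, with bounded and unbounded ranges as two separate types. The class names `BoundedRange` and `UnboundedRange`, the inclusive `last`, and the `materialize` helper are all assumptions made for illustration; no TOML parser is claimed to expose them.

```python
from dataclasses import dataclass
from itertools import count, islice

@dataclass(frozen=True)
class BoundedRange:
    first: int
    last: int      # assumed inclusive
    step: int = 1

    def __iter__(self):
        stop = self.last + (1 if self.step > 0 else -1)
        return iter(range(self.first, stop, self.step))

@dataclass(frozen=True)
class UnboundedRange:
    first: int
    step: int = 1

    def __iter__(self):
        return count(self.first, self.step)

def materialize(value):
    """Application-side handling: a field may hold an array or a range."""
    if isinstance(value, (list, BoundedRange)):
        return list(value)
    if isinstance(value, UnboundedRange):
        # Refuse rather than loop forever.
        raise TypeError("an unbounded range cannot be turned into a list")
    raise TypeError(f"unsupported value: {value!r}")

pages = materialize(BoundedRange(1, 10, 3))
head = list(islice(UnboundedRange(15, 3), 3))
```

Because the two cases are distinct types, the application can reject an unbounded range up front instead of discovering the problem at run time.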
@JeppeKlitgaard Bounded ranges would be great. Personally, I have no need for unbounded ranges or non-integer ranges. |
I've just read through the whole discussion for a third time. I don't think a range type is particularly useful to add on its own -- it's just not as common -- and I don't think that adding a mechanism to have a shorthand for I still think this is best solved on a per-application basis, since this is a fairly niche concern and the inline table syntax is totally up to the task at hand here. |
Thanks for the surprisingly passionate discussion on this folks, as well as for your patience on this! :) |
This is a perfunctory assessment which seems to be no more than the following: "I don't need this feature, so without further evidence I'm going to call it 'niche' and close this discussion without seriously addressing any of the issues that have been raised." Disappointing, to say the least. I try to imagine someone offering a similar case against ranges in any of the many languages where they are a core language feature. Please reopen this issue for a more serious and objective consideration. |
It's been discussed extensively, and the overall consensus seems fairly clear to me. This is not necessarily a democratic vote, but it wasn't "just closed" on the whims of one person. As far as I'm concerned it's up to you to demonstrate that there is a demand for this feature, not for the TOML maintainers to demonstrate there is no demand for it. And with "demonstrate that there is a demand" I don't mean "it might be useful in this hypothetical scenario" but "here are a bunch of popular projects with TOML configurations that would be made better by this", and similar more concrete stuff. Remember: almost every single feature that has ever been added to any configuration file, programming language, or other piece of software was useful to someone, at some point. I don't find "it's useful in scenario X" on its own to be a very good argument in these types of discussions, as it can be used for everything. Personally it seems to me the need for such a feature is too rare; I thought about this for a few minutes and the only use-case I can think of is Vim's |
@arp242 Your comments suggest that you have not understood the discussion. They also weirdly suggest that the only use of TOML you can imagine is for project configuration files. I'm not going to rehash the whole discussion for you, but note that when you say "it can be worked around quite easily" you completely overlook that TOML does not include any facilities for communicating semantics (e.g., a grammar language, like JSON schema). This issue is already raised above, in detail. Just imagine if someone proposed that TOML tables could "quite easily" be replaced with strings or nested lists of strings. These comments signal only a desire to close this issue for lack of progress, not a desire to actually come to grips with it. |
I said what's needed to move things forward as far as I'm concerned: show popular projects with TOML configurations that would be made better by this. So if you really want this badly I suggest working on that.
That is its goal. |
@arp242 Here is the actual goal statement: This statement encompasses the configuration of simulations. |
@alan-isaac I appreciate that you feel strongly about this, and also understand that you're disappointed about the answer/responses here (from the broader group of individuals who've contributed to the discussion here as well as myself). I appreciate and empathize that this would be beneficial for your use case, if this were added. However, I don't think the arguments made here are compelling and I certainly disagree that this is as broadly applicable as has been claimed in this thread at various points. Sure, programming languages do have range syntaxes or mechanisms to generate ranges. However, TOML is not a programming language. I'm not aware of any major configuration language that has ranges. Even YAML doesn't have them and YAML somewhat famously has too-many-things.
A broadly-applicable programming language? Oh, I'd be very opposed to not having ranges in a programming language. However, TOML is not a programming language. |
@pradyunsg Thanks for your reply. You are correct that I am disappointed, but the reason is different. I am disappointed when those who reply have made little effort to understand the use case, which resulted in the justification for closing this issue being completely off base. This lack of understanding was a problem encountered early in this thread and eventually somewhat resolved. Misunderstanding of the use case and mischaracterization of the request leads to proposals of supposedly easy workarounds that entirely miss the point. Your "TOML is not a programming language" critique, also found above, again misses the point. Nobody has proposed adding say control flow. The only proposal is the addition of a range literal. So far, the primary critique that has merit is the claim that "there is no obvious syntax". However, the closed integer interval notation from mathematics (e.g., |
I'm new to TOML and really liking it. The one thing I'd really find helpful is a range type, which implementations could interpret either as a range object (e.g., Python) or as an explicit array, depending on the language. I anticipate "just use a 3-array" or "just provide start, stop, and step attributes" as responses, but if you search you'll find that YAML and JSON users also request ranges from time to time. So I think there is a desirable feature here. I'm not going to suggest syntax but array syntax without commas
[0 10 1]
or doubled periods
0..10..1
or even Mathematica span style
0;;10;;1
pop into mind.