
range type #689

Closed
alan-isaac opened this issue Dec 11, 2019 · 67 comments

Comments

@alan-isaac

I'm new to TOML and really liking it. The one thing I'd really find helpful is a range type, which implementations could interpret either as a range object (e.g., Python) or as an explicit array, depending on the language. I anticipate "just use a 3-array" or "just provide start, stop, and step attributes" as responses, but if you search you'll find that YAML and JSON users also request ranges from time to time. So I think there is a desirable feature here. I'm not going to suggest syntax but array syntax without commas [0 10 1] or doubled periods 0..10..1 or even Mathematica span style 0;;10;;1 pop into mind.

@eksortso
Contributor

Can you give an example where putting a hypothetical range value into a TOML document would make more sense than just defining parameters of a range with the value types already defined in the spec?

@alan-isaac
Author

The proper point of reference is the question, why do so many languages (from Python to Ruby to Mathematica to Matlab) provide simple syntax for range construction? The answer is that it is convenient and expressive.

As an example of usefulness, consider a simulation model where a TOML file is used to represent a collection of simulation experiments. Each experiment is a table, and often a key-value pair in the table will specify a parameter and a range of values. This will be far easier to read in the TOML file if there is a simple syntax for ranges. Additionally, it provides direct guidance (e.g., to a Python parser) to construct the range rather than to construct some object that merely represents the parameters of a range.

@abelbraaksma
Contributor

I think ranges can also be great as a shortcut for common arrays: they save typing, add clarity to the intention, and eliminate typos in cases where you would otherwise write the range out by hand.

In scenarios where TOML is used for configuring unit tests, or performance tests, I certainly see the benefit.

Also, they might open the door for infinite sequences, if we were to consider syntax that allows unbounded ranges. However, this would pose a potentially heavy burden on implementers, as such a thing is only possible with lazy evaluation of said range.

@ChristianSi
Contributor

Why not use an inline table? E.g. for the simulation model sample:

parameters = [
  { name="alpha", first=2, last=10, step=2 },
  { name="beta", first=1, last=100 },  # default step: 1
  { name="gamma", first=50, last=-50, step=-1 }
]

Unbounded ranges are not a problem either:

range = { first=15, step=3 }

Or, if desired, you might specify the number of repetitions (different values) instead of an upper/last value:

range = { first=4, repetitions=16, step=4 }  # run tests for 4, 8, ..., 60, 64

This use case is too specialized and rare to deserve new syntax (remember what the "M" stands for?), but TOML can easily accomplish it already.

@alan-isaac
Author

alan-isaac commented Dec 11, 2019

Hi Christian. Your proposed solution was anticipated in my original post (above). It is not parsed to produce a range of values, which would be desirable. Instead, it is parsed to produce an object that can be converted to a range of values. A key feature of the TOML spec is its insistence on useful type inference (despite the "M"). And please remember the "O".

The need for ranges is neither specialized nor rare, even if you do not need them often. That is why they have been requested in other settings (e.g., YAML, JSON), and that is why they are implemented in MANY programming languages. (I listed some examples above.)

@abelbraaksma
Contributor

abelbraaksma commented Dec 11, 2019

Basically every for.. in... loop, many while loops, for i=x to y loops etc are inherently ranges. So I wouldn't call it 'rare'. In addition, languages like Java, C#, F#, PHP, Perl, Python, and even XPath all have specific syntax for ranges for arrays, linked lists and/or sequences, splicing and steps in ranges. Just to say that these things wouldn't be so abundant if it was 'rare'. ;).

I don't think the point is that it is currently impossible. The point is to have a simple, clear, unambiguous way of expressing ranges that is portable. Ad hoc syntax never is. I personally prefer the .. syntax, as it is clear to the casual reader, even one without a programmer's background.

@pradyunsg
Member

pradyunsg commented Dec 12, 2019

I don't think we need this -- the provided functionality is not compelling enough to justify the complexity this brings to the syntax + mental model. "YAML has it" is very much not a good reason to add syntax to TOML.

Can someone please point out a real world use case where this is a problem? The premise of this issue seems very hypothetical.

@alan-isaac
Author

alan-isaac commented Dec 12, 2019

  1. I think that a burden falls on those who say things like "the provided functionality is not compelling enough" not to rely on personal habits but to consider why so many languages have found it compelling to provide a special syntax for ranges. (Related: see Abel's comments.)
  2. I've looked around a bit and think that Haskell's notation is simple and obvious (i.e., easy to understand). As Abel emphasizes, obviousness (the "O" in TOML) is a compelling consideration here. In Haskell notation, the user provides the first two terms of the sequence and an upper limit. So a sequence from 1 to 9 by 2s becomes [1,3..9]. I think this looks good for TOML because it resembles array syntax but nevertheless parses without ambiguity.
  3. As for real-world use cases, these arise whenever value ranges are needed. Abel mentioned unit testing. My example of simulation modeling is not at all hypothetical: TOML is now in use for the specification of simulation models. And again, range notation is much more obvious to a human reader than an actual list of sequence terms or the kinds of indirect workarounds described by Christian.
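A minimal Python sketch of the Haskell-style `[first,next..last]` form, to show how little machinery the integer case needs. It assumes integers only, treats `last` as a bound in the direction of travel (as Haskell's `enumFromThenTo` does for integers), and rejects `first == next` instead of producing an infinite list. The name `expand_haskell` is hypothetical.

```python
import re

# Integer-only "[first,next..last]", e.g. "[1,3..9]".
HASKELL_SEQ = re.compile(r"\[\s*(-?\d+)\s*,\s*(-?\d+)\s*\.\.\s*(-?\d+)\s*\]")

def expand_haskell(text):
    """Expand "[first,next..last]" the way Haskell does for integers."""
    m = HASKELL_SEQ.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a [first,next..last] sequence: {text!r}")
    first, nxt, last = (int(g) for g in m.groups())
    step = nxt - first
    if step == 0:
        # Haskell would yield an infinite list here; this sketch refuses instead.
        raise ValueError("first and next must differ")
    # `last` bounds the sequence but is only included if the step lands on it.
    stop = last + 1 if step > 0 else last - 1
    return list(range(first, stop, step))

print(expand_haskell("[1,3..9]"))        # [1, 3, 5, 7, 9]
print(expand_haskell("[10,8..1]"))       # [10, 8, 6, 4, 2]
print(len(expand_haskell("[1,2..20]")))  # 20
```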

@lmna

lmna commented Dec 12, 2019

many languages have found it compelling to provide a special syntax for ranges

Programming, transformation and query languages are mostly irrelevant to TOML's primary objective: to be a minimal configuration file format.

range notation is much more obvious to a human reader than an actual list of sequence terms or the kinds of indirect workarounds described by Christian

range = { first=4, repetitions=16, step=4 } <-- This one is instantly understandable because it is explicit (kinda self-documented).

[1,3..9] <-- This one is cryptic because average human is not used to this exact notation.

@abelbraaksma
Contributor

range = { first=4, repetitions=16, step=4 }

Yes, it's directly understandable for a reader of the configuration. Much less obvious how to type it, or what values are valid:

  • can I leave something out?
  • is it case sensitive? Note that 'mere mortals' often assume case insensitivity, without knowing it exists
  • are decimals allowed?
  • what happens with negative steps, or negative other values?
  • is the range inclusive?
  • do you start at the beginning, or is the first step added?
  • what happens if the step doesn't end exactly on the range end, is the last step included, or not?
  • what with these commas and curlies, is that necessary?
  • can I use this syntax on that other app, does it understand it, or do I need to learn new syntax?

And herewith lies the problem: each and every application that supports TOML and needs a range, has to fully specify how it deals with all of these situations.

Just like with other features that are not necessarily used by everyone (nested arrays, I can't get the support engineers to understand them, but that's also true for the json-like syntax: TOML is certainly not for the average user), it is better to specify once and be clear about it, than let each and every configuration define it for themselves.

Even if only 10% of users are going to use it, and even if it's only useful in a subset of situations, that is true of most features of TOML; rarely will you see config files that use everything. Imo, that shouldn't be the leading argument.

Likewise, I can understand the hesitancy, in that you don't just want to extend the syntax on everyone's whim. Personally, I don't think this is a whim; ranges have had widespread usage in both present and past languages and configuration files. Let's do it right, and help users and designers with a clear addition to the syntax, ready if they need it, ignorable if they don't.

PS: for implementors, I think this is a very trivial thing to add.

@alan-isaac
Author

Programming, transformation and query languages are mostly irrelevant to TOML's primary objective: to be a minimal configuration file format.

This observation is orthogonal to the point. The point is simplicity and expressiveness.

range = { first=4, repetitions=16, step=4 } <-- This one is instantly understandable because it is explicit (kinda self-documented).

[1,3..9] <-- This one is cryptic because average human is not used to this exact notation.

This claim is incorrect. Only a programmer would say such a thing, and even then only a programmer who assumes additional context (i.e., this conversation). Arithmetic sequences using dots are introduced in grade school. The notation is not exactly the same, but it is close. This comment also misses a key point: the range syntax should be parsed to produce a range object or an explicit array. That is not what happens with the alternative.

I won't say more because Abel has said it much better than I could.

@lmna

lmna commented Dec 12, 2019

Arithmetic sequences using dots are introduced in grade school.

Numeric sequences are introduced in school, the notation is like (a1, a2, ..., aN, ...), and the semantics does not by any means imply arithmetic progression. For instance, (1, 3, 9) can describe first terms of geometric progression, or just some arbitrary sequence. Semantics of well-known school notation is pretty far from what you suggest.

This comment also misses a key point: the range syntax should be parsed to produce a range object or an explicit array. That is not what happens with the alternative.

This is not a point at all. Configuration files should be handy for those who read and write them by hand. A shiny parser API cannot be an excuse for increasing the number of syntax features that users must learn.

@ChristianSi
Contributor

Like @lmna said earlier: programming languages have tons of stuff which TOML neither has nor needs, since it's not a programming language. More relevant to the issue at hand would be whether other commonly used data serialization or configuration file formats have a built-in syntax for range types. As far as I can tell, that's not the case. Not even YAML (whose M could well mean "Maximal") seems to support it.

I don't doubt that this feature has been "requested" from time to time, but the fact that these requests have apparently all been rejected should tell us something.

As for obviousness: In Ruby, 1..10 creates an inclusive range (from 1 to 10), while 1...10 creates an exclusive range (actually from 1 to 9). That's obvious? Really?

@abelbraaksma
Contributor

abelbraaksma commented Dec 12, 2019

cannot be an excuse for increasing the number of syntax features that users must learn.

I agree, so instead of requiring users to learn the individual specifications of each and every usage of TOML, let's give both readers and writers something they can work with and that's easy to understand and easy to write. Learn once, apply everywhere.

That's obvious? Really?

Not at all; it's good to learn from others' mistakes, and that is precisely the reason why we should keep it simple and explicit. One syntax, with an obvious meaning.

@alan-isaac
Author

Configuration files should be handy for those who read and write them by hand.

Yes. That is precisely the point.

I am very confident that if the syntax [first,next..max] were adopted, nobody would ever complain that it is hard to read or write. I am also very confident that not a single person will ever complain about writing or reading [1,2..20] instead of [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] -- during the typing of which I had to correct two errors and then double check that there were no others.

@lmna

lmna commented Dec 12, 2019

One syntax, with an obvious meaning.

The meaning of the [1,3..9] syntax (well, if you manage to guess that the whole construct is about an arithmetic progression) is not really obvious because of the following questions:

  • can I leave something out?
  • are decimals allowed?
  • what happens with negative values?
  • is the range inclusive?
  • what happens if the step doesn't end exactly on the range end, is the last step included, or not?
  • what with these commas and brackets and dots, is that necessary?
  • does number of dots really matter?
  • is it allowed to specify more than 3 values?
  • how do I use this feature to configure a print job for pages 3, 7, 12-15, 21-24?

Is it worth it to describe it all in the TOML spec? Will users be truly happy and enthusiastic about reading and remembering all that stuff? Will it be obvious for those who don't bother to even read the spec?

@alan-isaac
Author

Will it be obvious for those who don't bother to even read the spec?

Certainly more obvious than local time, arrays of tables, or dot notation for supertable generation.

@lmna

lmna commented Dec 12, 2019

Learn once, apply everywhere.

I do disagree with the "learn" part.

In the ideal world, you should learn a lot about a program that you are writing configuration for, but the syntax of configuration file should require no learning at all. I see this as an ultimate goal for evolution of TOML.

In the real world, TOML has some obscure syntax features (arrays of tables, first of all). Despite that, we should do our best not to screw things up even further.

@alan-isaac
Author

In the real world, TOML has some obscure syntax features (arrays of tables, first of all). Despite that, we should do our best not to screw things up even further.

TOML is not yet at 1.0. Will you propose to remove arrays of tables before the 1.0 release? Why or why not? How about local time notation? Keep or discard? And why?

@alan-isaac
Author

alan-isaac commented Dec 12, 2019

how do I use this feature to configure a print job for pages 3, 7, 12-15, 21-24

This is a great question. Here is a possible notation for that: [3, 7, 12-15, 21-24]. What if you want only every other page in the first range? Then [3, 7, 12-15 by 2, 21-24]. What about the example you are discussing? It becomes [1-9 by 2]. I would have no problem with such proposals.
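A minimal Python sketch of a parser for the page-list notation above, so the idea can be tried out. The grammar is an assumption: comma-separated items, each a single page `N`, an inclusive span `N-M`, or `N-M by S`, with an optional `+` on `S`. Only ascending, non-negative spans are handled, and the name `expand_pages` is hypothetical.

```python
import re

# One item: "7", "12-15", or "12-15 by 2" (an optional "+" before the step is accepted).
ITEM = re.compile(r"(\d+)(?:\s*-\s*(\d+)(?:\s+by\s+\+?(\d+))?)?")

def expand_pages(spec):
    """Expand a page list such as "[3, 7, 12-15 by 2, 21-24]" into page numbers."""
    body = spec.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed list")
    pages = []
    for raw in body[1:-1].split(","):
        m = ITEM.fullmatch(raw.strip())
        if m is None:
            raise ValueError(f"bad item: {raw.strip()!r}")
        start, end, step = m.groups()
        if end is None:
            pages.append(int(start))  # a single page
            continue
        step_value = int(step) if step else 1
        if step_value == 0:
            raise ValueError("step must be positive")
        # Both ends are inclusive; a descending span like 15-12 yields nothing.
        pages.extend(range(int(start), int(end) + 1, step_value))
    return pages

print(expand_pages("[3, 7, 12-15 by 2, 21-24]"))  # [3, 7, 12, 14, 21, 22, 23, 24]
print(expand_pages("[1-9 by 2]"))                 # [1, 3, 5, 7, 9]
```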

@lmna

lmna commented Dec 12, 2019

TOML is not yet at 1.0. Will you propose to remove

Official goal for version 1.0.0 is to be backwards compatible (as much as humanly possible) with version 0.5.0. So removal of existing syntax is not an option any more.

Will you propose to remove arrays of tables before the 1.0 release?

This could be done for 2.0, if someone comes up with an excellent alternative to current arrays-of-tables.

How about local time notation?

The whole date&time thing, not only the "local" aspect, was a very controversial feature. I believe that first-class date&time is not worth its complexity.

@alan-isaac
Author

if someone comes up with an excellent alternative to current arrays-of-tables

If I may paraphrase, in the absence of such an alternative, in your judgment the cost in readability is repaid by the ease of use. Yes, this is always the correct criterion. (Just fyi, I am pleased to have date-time functionality, although I wish times required a clarifying T prefix.)

@lmna

lmna commented Dec 12, 2019

If I may paraphrase, in the absence of such an alternative, in your judgment the cost in readability is repaid by the ease of use.

Complexity of arrays-of-tables is justified by expressive power. An alternative should reduce the complexity (make things more obvious & trivial), but not at cost of readability and expressiveness.

An important thing to note is that first-class date&time and first-class ranges do not add anything to readability and expressiveness. You can encode them as TOML strings and then interpret those strings at application level (just like you interpret any other configuration parameter). No sacrifices here.

@alan-isaac
Author

first-class ranges do not add anything to readability and expressiveness

This claim is obviously incorrect. Prove it to yourself by typing out any long range without ever checking to see if you made an error. A good syntax for ranges adds readability, expressiveness, and ease of use. (Which is exactly why such syntax exists in so many programming languages.)

Of course if I just want to parse everything myself, I could use an INI parser and handle the string values. A key piece of the value added by TOML is elimination of this need in config files.

@lmna

lmna commented Dec 12, 2019

Prove it to yourself by typing out any long range without ever checking to see if you made an error.

Okay, let's do it once again. range = { first=4, repetitions=16, step=4 } Hope it is long enough?

@alan-isaac
Author

alan-isaac commented Dec 12, 2019

let's do it once again

  1. You can only easily interpret the meaning of this because you are in this conversation. So, it lacks clear meaning to a reader. (This is a really important point that you are skipping over repeatedly.) This is especially true when readers need not be programmers.
  2. It is not standardized. You simply made up the keys to help you know what on earth you are talking about, which even so would not be evident if you were not in this conversation.
  3. It is parsed to an object that must be converted to an array by a knowledgeable user. So it has reduced functionality.

So in fact the meaning is not obvious at all to a reader who is not in this conversation. You are simply making the point that there are available workarounds, although without any supporting standard. Yes, we all know that. That's what we're doing now. The request is for something less tiresome and more communicative.

@ChristianSi
Contributor

Just a quick reminder: it is NOT the case that the party with the highest number of comments wins 😉

@alan-isaac
Author

What about the example you are discussing? It becomes [1-9 by 2]. I would have no problem with such proposals.

Even more explicit would be [1-9 by +2].

@marzer
Contributor

marzer commented Dec 18, 2019

Am I responding to the question now?

Frankly? No. Just suggest a syntax that would work for you, instead of pontificating and complaining about what doesn't/can't.

I was trying to help you - I think a range syntax would be useful - but... ugh. Good luck, I guess.

@alan-isaac
Author

alan-isaac commented Dec 18, 2019

@marzer I'm again confused; you asked for an actual example of current usage, which I provided. I also included a syntax that would work for me. It is the same one discussed multiple times above. In response to your question, I mentioned the syntax [0-1000 by +10] because that (or some variant) appeared to have some support, particularly since it is tied to printer configuration syntax. My own preference is [0,10..1000], taken straight from Haskell, which I also mentioned above, but there were some objections to that (i.e., claims it was not "obvious" enough). Nobody has claimed the printer configuration syntax is not "obvious", and nobody has claimed it would be hard to parse.

Just to be clear, the syntax you suggested (0-1000;10) would also work just fine for me. But I anticipate objections that it is not obvious enough. Also, if I understood correctly, Abel proposed [0-1000 +10]. This would also be just fine. So would Scala syntax: (0 to 1000 by 10).

Whatever the team decides is most suitable will be perfectly fine with me. I care about the functionality much more than the syntax.

@alan-isaac
Author

@marzer So which of the syntaxes that I've just mentioned would you choose?

@ChristianSi
Contributor

@alan-isaac My impression is that you're not just hoping for a range type in TOML – which would conceptually, regardless of the syntax chosen, encode a triple of the form: range(start at x, stop at y, proceed in steps of z) – but you're also expecting TOML to evaluate the range for you. So instead of, say, range(start at 1, stop at 10, proceed in steps of 3) you're hoping to get the array [1, 4, 7, 10]. Is that correct?

@abelbraaksma
Contributor

abelbraaksma commented Dec 19, 2019

@ChristianSi, I've always thought that was the main aim of this thread. Otherwise, it's essentially the same as using a JSON-style object (apart from the advantage of an unambiguous syntax).

There have been questions of 'how do you do it now' and how it would change. The answers in this same thread coming down to: you can't do it now, so there's no example.

Well, here's how I do it currently.

  • I have a project with thousands of small performance tests, they're logically numbered for ease of calling
  • when I'm working on a particular area, instead of running all (which takes long), I want to run a subset
  • for that, I created illegal syntax, much like [42-120, 1200-1800].
  • thanks to the nature of TOML, I have now different sections in the config for different types of test runs, the combination TOML + ranges feels very natural when using it, and simple to write without errors
  • I pre-process this TOML file and simply expand the ranges to be [42,43,44,45...], you get the idea. This makes a valid TOML file.
  • I then process the expanded TOML as normal

Obviously, there are other ways of achieving the same effect, but at the time, this seemed simplest. I looked at some existing parsers to amend them for this purpose, until I stumbled upon this thread.

So I waited, in case an agreement could be reached.

I guess implementations could choose to statically expand into an array, or could choose to give an enumerator, or both, depending on their interface. But that's true already for the existing syntax of arrays, though an enumerator may be more applicable in some scenarios. But that's of course an implementation detail, irrelevant for TOML itself.

@alan-isaac
Author

alan-isaac commented Dec 19, 2019

@ChristianSi

tl;dr: Yes.

The answer by @abelbraaksma nicely captures the core issue. The job of the TOML spec is just to provide an unambiguous meaning to the syntax, not to determine the parser implementation details. (Although recommendations could be made, of course.) For example, for TOML tables, the popular C parser for TOML naturally produces a struct rather than a hash table. The important thing is that I can send a file to a C user or a Python user and just say "use a TOML parser to extract the configuration of this experiment".

The goal is simply to have an obvious syntax that unambiguously indicates that a range of values is produced by a TOML parser, not to constrain how a particular parser might produce that (e.g., as a list, a tuple, an array, or a range object). In fact, Abel's examples have persuaded me (against my original thought) that the type of syntax he describes would be most useful to others (even though I just (!) need ranges). The printer configuration example is what really persuaded me. To meet that need, something like one of the following syntaxes seems most obvious: the printer influenced [1,3, 10-20, 50-100 +2] or the Scala influenced [1, 3, 10 to 20, 50 to 100 by 2]. In each case a list (or other sequence datatype) would be expected to result from parsing.

@ChristianSi
Contributor

ChristianSi commented Dec 23, 2019

@alan-isaac:

To meet that need, something like one of the following syntaxes seems most obvious: the printer influenced [1,3, 10-20, 50-100 +2] or the Scala influenced [1, 3, 10 to 20, 50 to 100 by 2]. In each case a list (or other sequence datatype) would be expected to result from parsing.

I see, but let's be honest: that will never happen, since, as pointed out much earlier in this thread, TOML is not a programming language. A TOML parser will parse date strings into date objects and number strings into numbers, but it will never evaluate stuff like "10 days after 2019-12-23" (regardless of the syntax used). I even doubt that stuff like num = 3.2*10^20 + 17 will ever be evaluated by a TOML parser. TOML will never have readable and writeable variables, for loops, or conditionals -- and what you're asking for is essentially of the same scope. It's a programming language construct, and those are outside of TOML's feature set.

On the plus side, you might be able to solve your problem by sending the file through a template engine before parsing it as TOML.

@alan-isaac
Author

@ChristianSi Are you offering a false dichotomy? I doubt that you can come up with any coherent way of distinguishing between parsing 1979-05-27T07:32:00-08:00 to a date-time object and parsing [0-100] to a range. Please suggest how to understand the distinction as you are trying to draw it. Thanks.

@ChristianSi
Contributor

@alan-isaac I'll easily parse [0-100] (or, as I would certainly prefer to avoid confusion with the subtraction operation, [0..100]) into "list containing one value: range(from=0, to=100)" for you. But that was decidedly not what you wanted in your preceding comment.

@alan-isaac
Author

@ChristianSi As I also said in my previous comment, I personally just need ranges. If you are saying that you would be happy to have a range syntax 0..100 that, say, the Python or Ruby parser would parse to a range object, then yes please! That would be extremely helpful!

The rest of the discussion appears separate to me. In that separate discussion, I still believe you are drawing an untenable line between parsing and transforming, as illustrated most nicely by the date-time type. TOML has lots of syntax that provides convenient ways to say what values should be produced by a TOML parser. Indeed, this is a key feature of TOML over INI (where standard parsers produce only strings as keys and values). So I still invite you to try to make concrete the reasoning for rejecting say the printer-configuration syntax, with date-time parsing being the point of reference for the purposes of the discussion.

But again, I would be delighted by the addition of a simple range syntax, and the Haskell influenced double-dot notation would be great.

@abelbraaksma
Contributor

abelbraaksma commented Dec 23, 2019

Perhaps because often in this discussion we refer to how things are done in programming languages, we lose sight of a key thing: a concise way of expressing arrays, that are a natural sequence of numbers. Just like a time-span is a range of time.

Having such an expression doesn't diminish the declarative nature of TOML, in my opinion, nor does it magically turn it into a programming language; far from it. It's basically just a different way of writing the same thing, but clearer than reading or writing, say, 100 numbers.

The confusion comes perhaps from the idea that ranges are often used in loops in programming languages, but the proposal here does not intend to apply the range result. It is not a branch or loop instruction.

To summarize:

foo = [1,2,3,4,5,6,7,8,9,10]

Is exactly equal to (assuming one variant of the syntax):

foo = [1..10]

The main difference being esthetics, number of keystrokes, being prone or not to errors, clarity of intent, and readability.

@llacroix

I read the following issue and I'd like to add my 2 cents. As a TOML user, if we can say that, I really like TOML because it does what it claims to do and does it very well.

Being so minimal allows it to be able to be used as a drop in replacement for json/ini/... without much issues. From my point of view, having a range/slice type is a bit similar to having a datetime type. The datetime type is a pain to implement because we live in a world with timezones, offset and dst. There are so many ways to represent a date and as many ways to do it wrong that can end up very badly.

That said, JSON's lack of such types doesn't prevent developers from using JSON. Special types can be handled as substructures of a JSON object. The same thing can be done in TOML, including for the range/slice type.

Saying that you want to have range being understood in any language is a nice thing but a file format contains data and how the data is read / interpreted by an application is a whole different thing.

So taking this example:

foo = [1..10]

There are at least 2 possibilities to interpret this:

  1. It's generating a list from 1 to 10
  2. It's generating an Range type that lets you iterate a value from 1 to 10

If you generate a list from 1 to 10, you open TOML up to side effects, such as a file with a wrong range definition that would expand to a few terabytes of RAM being used. That's not really nice... It would also make storage harder. So a range should be a dumb object that can be used to create a generator, like in Python 3. In order to create a list from 1 to 10 in Python 3 you'd have to create range(1, 11), while in Python 2 (which, if I'm not mistaken, is unsupported as of today) range builds a list and xrange is the lazy variant. In JavaScript a range doesn't exist, so it would have to be implemented, and other languages may have some support for ranges, but chances are a parser would have to define a custom type in many languages in order to support the extension. That alone means the file format wouldn't be just a file format anymore but would start getting into "programming language" territory, where it has to define foreign types.
That's one good way of ending up with flaky support for TOML across languages, with multiple versions of TOML in the same language because a design choice in one implementation didn't please someone else.
It's not like the Date type, which is hardly a foreign type to any language out there.
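The memory concern above applies to parsers that expand a range into a list. A lazy object avoids it; Python 3's built-in `range`, for instance, stores only its start, stop and step and answers length, indexing and membership arithmetically. This only illustrates Python's behavior; whether any TOML implementation would return such an object is up to that implementation.

```python
# A Python 3 range holds only start, stop and step, so its size does not
# depend on how many values it describes.
huge = range(1, 10**12 + 1)

print(len(huge))        # 1000000000000
print(huge[-1])         # 1000000000000
print(999_999 in huge)  # True, computed arithmetically rather than by scanning

# Materializing the values is what would exhaust memory, so don't:
# values = list(huge)
```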

In Python 3, I could handle the range issue with this format for [1,10]:

foo = [1, 11, 1]
ids = list(range(*data['foo']))  # data is the parsed TOML document

But yes, if you need to use it in another language, you'll have to know what each parameter means, e.g. whether 1 and 11 are inclusive or not. But since you define the format of your file, that belongs in the file format at the application level, not at the transport level.

But nothing prevents you from having helper methods like:

array_to_range(array) -> range
range_to_array(range) -> array

This way you can have consistent way to parse substructures in toml which are specific to your application. Having it part of the file format, could be useful but it's so trivial to implement that I wonder if it's worth the hassle to have it being part of the language as it also comes with incompatibilities and sacrifices.

For example, Rust's range syntax doesn't seem to support a step like Python 3's; in Ruby you can include or exclude the last element; and in C#, Enumerable.Range takes start and count parameters, so 2,10 would yield [2,3...10,11], which means the second parameter can't be negative. Java support for ranges seems to exist in many forms, but I couldn't find one meant to generate a list as an iterator, so it might be implementation specific, like JavaScript.
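To make the differences concrete, here is a minimal Python sketch that reads the same pair of numbers, 2 and 10, under three of the conventions mentioned: a Python-style exclusive stop, a Ruby-style inclusive `2..10`, and C#'s `Enumerable.Range(2, 10)`, whose second argument is a count. The function names are hypothetical.

```python
def exclusive(start, stop):
    """Python's range(start, stop): stop is not included (like Ruby's start...stop)."""
    return list(range(start, stop))

def inclusive(start, last):
    """Ruby's start..last: last is included."""
    return list(range(start, last + 1))

def counted(start, count):
    """C#'s Enumerable.Range(start, count): `count` consecutive integers."""
    if count < 0:
        raise ValueError("count must be non-negative")  # C# throws here too
    return list(range(start, start + count))

for convention in (exclusive, inclusive, counted):
    values = convention(2, 10)
    print(f"{convention.__name__:>9}: {values[0]}..{values[-1]} ({len(values)} values)")
```

The pair `2, 10` yields 8, 9 or 10 values depending on the convention, which is the kind of detail any shared range format, in TOML or at application level, has to pin down.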

One other thing is that design choices on how to implement the range could be influenced by its use: is it thread-safe, can it be used as an async iterator? I think it should be left to users to implement it for their applications.

Saying that you should be able to read a file across languages is hardly an argument: if I open a file x.json in one python app, another ruby app won't know what to do with the file, even if JSON is correctly supported by ruby. So even if TOML supported ranges, it wouldn't make all apps magically understand your file; you'd still have to implement your app to expect or parse a range. Handling the conversion in your app puts the maintenance burden on yourself, while having it inside TOML puts the burden on every implementer of the TOML format. It would suck to be unable to open a file in a certain language because an implementer couldn't decide how to implement ranges in a way that pleases everyone, when you can just use an array or an object and call it a day.

@alan-isaac
Author

@llacroix I suggest that the most basic question is simpler than you appear to believe. The question is simply whether TOML files should have a syntax to more simply describe a certain simple and common type of array, which would otherwise have to be typed out explicitly.

For this basic question, your most powerful argument is that the syntax is so flexible that a TOML file might cause a parser to generate a dismayingly large array. This will not affect parsers that choose to return a range object of some type, rather than explicitly constructing the array. Worrying about such parser decisions is like worrying about whether the current Python parser should return a list, a tuple, or an array. I think it is out of scope?

The need for this is not for sharing across instances of a single application but for sharing certain kinds of configurations across diverse applications. A good point of reference is thinking about how TOML files could be used to configure print jobs by providing an array of page numbers. The configuration should be parsed to the sequence of page numbers, not to an object that could be turned into this sequence by an app here or an app there, but not by all printer apps.

But again, the question is just one of convenient syntax. Just as you do not insist that TOML users represent number literals with hex notation, why should they not more simply represent certain common and simple array literals?

@llacroix

But again, the question is just one of convenient syntax. Just as you do not insist that TOML users represent number literals with hex notation, why should they not more simply represent certain common and simple array literals?

Because ranges aren't array literals; they're pretty much control-flow literals. They are used to avoid loading a complete data structure into memory. You can accumulate all the values or reduce them to a single one.

In other words, range is the functional version of

for(i=start; i<stop; i+=step)
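In Python terms, that loop and range() produce the same values, but range() keeps only start, stop and step. A minimal sketch, assuming a positive step; loop_values is a hypothetical helper written just for the comparison:

```python
import sys


def loop_values(start: int, stop: int, step: int) -> list[int]:
    # Explicit version of for(i=start; i<stop; i+=step), assuming step > 0.
    values = []
    i = start
    while i < stop:
        values.append(i)
        i += step
    return values


# range() yields the same values lazily...
assert list(range(1, 20, 3)) == loop_values(1, 20, 3)

# ...and stores only start, stop and step, so even a huge range stays tiny.
huge = range(1, 10**18)
print(huge[-1])             # 999999999999999999, computed arithmetically
print(10**12 in huge)       # True, checked without iterating
print(sys.getsizeof(huge))  # a few dozen bytes (exact value is platform-dependent)
```

Materializing huge with list() is exactly the step that would exhaust memory.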

This will not affect parsers that choose to return a range object of some type, rather than explicitly constructing the array. Worrying about such parser decisions is like worrying about whether the current Python parser should return a list, a tuple, or an array. I think it is out of scope?

It's not out of scope: if the parser can return different types, it could make it difficult or even impossible to load certain files in some languages, since range != array.

Let's say you have a file that looks like this:

[job.a]
pages = [1:3]

[job.b]
pages = [1:2, 5:7]

[job.c]
pages = [1:2, 10]

In this case, job.c.pages would be a list of [range, int], which is incompatible if you load it as a List<Range>, for example. The other jobs would be lists of ranges only. If you wanted to explode them into lists you'd have a List<List>, and in the last example you'd still have an int in conflict, so you'd have to write this instead:

[job.c]
pages = [1:2, [10]]

To be able to do something like this:

for page_ranges in data['pages']:
    for page in page_ranges:
        do_something(...)

But let's put aside the memory limit and OOM-killer issue and say we want to explode ranges and join the lists together, so that [1:2, 5, 7:10] explodes into [1,2,5,7,8,9,10]. Then we could in theory loop over all the elements as if it were a list... But like I said earlier, if you do that, you can't serialize it back into a range.
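A minimal Python sketch of that explosion step, assuming a hypothetical parser that hands back the proposed inclusive a:b items as (start, end) tuples and bare numbers as ints. Neither the tuple representation nor explode() comes from any TOML library.

```python
from typing import Union

# Hypothetical parser output for pages = [1:2, 5, 7:10]:
# an inclusive range a:b is modeled as a (start, end) tuple, a bare int stays an int.
Item = Union[int, tuple[int, int]]


def explode(items: list[Item]) -> list[int]:
    """Flatten ints and inclusive (start, end) pairs into one list of ints."""
    result: list[int] = []
    for item in items:
        if isinstance(item, tuple):
            start, end = item
            result.extend(range(start, end + 1))  # inclusive end
        else:
            result.append(item)
    return result


pages = explode([(1, 2), 5, (7, 10)])
print(pages)  # [1, 2, 5, 7, 8, 9, 10]
```

The flat result no longer records that 7..10 was written as a range, which is the serialization problem described above.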

With that file:

[stars]
indices = [1:100000000000000000]

Let's imagine you do that:

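import toml

# load the file (hypothetically expanding ranges into lists)
x = toml.load("file.toml")
x['stars']['good'] = [1, 2, 3]
# toml.dump takes the data first and the file second
with open('file.toml', 'w') as f:
    toml.dump(x, f)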

If those ranges are loaded as array literals, they're going to be saved back as arrays of ints. Though the data didn't really change, you go from a file of a few bytes to an enormous one, just because the expansion loses information: once a range is expanded, the writer can't know what range was previously stored in the file.
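The growth can be made concrete with a much smaller hypothetical range. This Python sketch compares the proposed compact form (not valid TOML today) with the explicit array a writer would have to emit once the range has been expanded:

```python
# Hypothetical range syntax vs. the explicit array a writer would emit after expansion.
compact = "indices = [1:1000]"  # proposed syntax, not valid TOML
expanded = "indices = [" + ", ".join(str(n) for n in range(1, 1001)) + "]"

print(len(compact))   # 18
print(len(expanded))  # 4903
```

At 1000 elements the file already grows by a factor of about 270; with the range from the example above, the expansion would never fit in memory at all.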

And that would be difficult to handle correctly: if you expand it, you're almost certainly going to run out of memory, or else you need to keep it as a range and handle it as a completely different type, since ranges are not array literals, but then you get hit by platform limitations in implementation-specific ways.
I mean, even if we could set a limit on the parser to cap a single expansion at 1000 elements, it doesn't prevent a malicious user from writing 1000 ranges that each expand to 1000 elements. And all those checks add complexity to a parser, and you only need to forget one case to wish it had never been implemented.
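One way a parser could close that hole is a budget shared by the whole document rather than a per-range cap. A minimal Python sketch, assuming ranges arrive as Python range objects; MAX_TOTAL_ELEMENTS, ExpansionLimitError and expand_all are hypothetical names, not from any spec or library:

```python
MAX_TOTAL_ELEMENTS = 100_000  # hypothetical document-wide limit


class ExpansionLimitError(ValueError):
    pass


def expand_all(ranges: list[range], budget: int = MAX_TOTAL_ELEMENTS) -> list[list[int]]:
    """Expand ranges only if their combined size stays within the budget."""
    total = 0
    for r in ranges:
        try:
            size = len(r)  # O(1) for range objects, nothing is materialized
        except OverflowError:  # length does not even fit in a machine integer
            raise ExpansionLimitError("a single range is far too large") from None
        total += size
        if total > budget:
            raise ExpansionLimitError(f"ranges would expand to more than {budget} elements")
    # Only now, with the total known to be safe, build the lists.
    return [list(r) for r in ranges]
```

For example, 1000 ranges of 1000 elements each are rejected before any list is built, even though each range on its own would pass a per-range cap of 1000. As noted above, this is one more check every parser would have to get right.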

Also it's not very typical to manipulate ranges directly in code. For example, I don't see why in code I'd build something like this:

pages = [range(1, 10), 1, range(13, 45)]

The only way I see it making sense is if you received input text as TOML to be parsed into [range(1, 10), 1, range(13, 45)], but as you suggest, it would return a list of ints anyway, so if you had

pages = pages_from_toml() 

It would always store a list of ints, and your output file would always contain the exploded version of the array literal. So for software to output ranges, you'd have to manually do something like this:

pages = []
for rarr in ranges:
    pages.append(range(rarr.start, rarr.stop))

How is that easier to write than:

pages = []
for rarr in ranges:
    pages.append([rarr.start, rarr.stop])

Programming-wise, it's not particularly different: a range is really just a start, a stop, and possibly an increment.

@alan-isaac
Author

  1. Your (@llacroix) first comment completely misses the point. A range syntax will be an array literal if TOML says so. It is that simple. This is a very simple point. I am not understanding why it is repeatedly ignored. @abelbraaksma has explained this multiple times, and this observation is no more than a standard CS use of the term "literal". It is just a matter of convenient syntax.
  2. I will repeat the other simple but apparently misunderstood point. The goal is not to have a TOML file plus schema that together can be used to produce a configuration. (I am not just noting the lack of a TOML schema framework.) The goal is to have a TOML file that directly represents the configuration. That is, it is to facilitate the direct use of handwritten TOML files as configuration files, which is a common use for them. Each suggested workaround completely misses the point. Both @abelbraaksma and I have tried very hard to draw this distinction, but neither you nor @ChristianSi have offered any indication of understanding what we are trying to say. That may be the fault of our communication skills -- I for one do not have formal CS training -- but surely you can overcome our shortcomings in that area.
  3. The worry about exploding array sizes is something of a red herring. Right now, a TOML file has no restriction on array sizes, so I can already send a file with enormous arrays. Of course the difference is that right now the TOML file would have to be correspondingly large. If this difference is seen as significant, then the size of arrays specified with a range notation could be limited (e.g., to 1000 items). But seriously, if array size is a concern, then parsers should protect against that no matter what the source, so that discussion should be entirely separate.

@RedHatTurtle

Similar to #428?

@JeppeKlitgaard

I think this is a case where a good feature might be missed due to some partisan entrenchment that has bubbled up over the course of the discussion.

As someone with no horse in this race, I think that a range should only be added if it is truly a datatype.

That is, the TOML parser should not evaluate the range and provide an array of numbers. It should provide a range object, which the implementation would then know how to deal with. If this was to be the case, the discussion reduces to whether TOML should provide a clearly defined range type, much like it provides a datetime type.

Assuming that range is now a type that is not evaluated by TOML, but merely parsed, this also requires the implementation to recognize that in almost all cases the field could be EITHER a range or an array of numbers. This likely wouldn't be much of an issue, but does add a bit of complexity to the language.

I think it then also becomes important to distinguish between bounded and unbounded ranges. These would be two separate types, ideally. It is important that an application can know whether a range is bounded or unbounded, since in many cases an unbounded range might not be appropriate and lead to a non-terminating execution.
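A rough Python sketch of what "parsed but not evaluated" could look like, with bounded and unbounded ranges as two distinct types. BoundedRange, UnboundedRange and require_bounded are hypothetical illustrations; no TOML library defines them, and the inclusive end and positive-only step are assumptions of this sketch.

```python
from dataclasses import dataclass
from itertools import count, islice
from typing import Iterator


@dataclass(frozen=True)
class BoundedRange:
    start: int
    end: int  # inclusive end (assumed convention)
    step: int = 1

    def __post_init__(self) -> None:
        if self.step <= 0:
            raise ValueError("this sketch only handles positive steps")

    def __iter__(self) -> Iterator[int]:
        return iter(range(self.start, self.end + 1, self.step))


@dataclass(frozen=True)
class UnboundedRange:
    start: int
    step: int = 1

    def __iter__(self) -> Iterator[int]:
        # itertools.count never stops; callers must bound it themselves.
        return count(self.start, self.step)


def require_bounded(value: object) -> BoundedRange:
    # An application that needs a finite sequence can reject unbounded ranges up front.
    if not isinstance(value, BoundedRange):
        raise TypeError(f"expected a bounded range, got {type(value).__name__}")
    return value


print(list(require_bounded(BoundedRange(1, 10, 3))))  # [1, 4, 7, 10]
print(list(islice(UnboundedRange(5), 3)))             # [5, 6, 7]
```

Keeping the two as separate types lets the application decide, by type alone, whether iterating is safe.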

@alan-isaac
Author

@JeppeKlitgaard Bounded ranges would be great. Personally, I have no need for unbounded ranges or non-integer ranges.

@pradyunsg
Member

I've just read through the whole discussion for a third time.

I don't think a range type is particularly useful to add on its own -- it's just not as common -- and I don't think that adding a mechanism to have a shorthand for [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100] is particularly useful.

I still think this is best solved on a per-application basis, since this is a fairly niche concern and the inline table syntax is totally up to the task at hand here.

@pradyunsg
Member

Thanks for the surprisingly passionate discussion on this folks, as well as for your patience on this! :)

@alan-isaac
Author

This is a perfunctory assessment which seems to be no more than the following: "I don't need this feature, so without further evidence I'm going to call it 'niche' and close this discussion without seriously addressing any of the issues that have been raised."

Disappointing, to say the least.

I try to imagine someone offering a similar case against ranges in any of the many languages where they are a core language feature.

Please reopen this issue for a more serious and objective consideration.

@arp242
Contributor

arp242 commented Mar 6, 2022

This is a perfunctory assessment which seems to be no more than the following: "I don't need this feature, so without further evidence I'm going to call it 'niche' and close this discussion without seriously addressing any of the issues that have been raised."

It's been discussed extensively, and the overall consensus seems fairly clear to me. This is not necessarily a democratic vote, but it wasn't "just closed" on the whims of one person.

As far as I'm concerned it's up to you to demonstrate that there is a demand for this feature, not for the TOML maintainers to demonstrate there is no demand for it. And with "demonstrate that there is a demand" I don't mean "it might be useful in this hypothetical scenario" but "here are a bunch of popular projects with TOML configurations that would be made better by this", and similar more concrete stuff.

Remember: almost every single feature that has ever been added to any configuration file, programming language, or other piece of software was useful to someone, at some point. I don't find "it's useful in scenario X" on its own to be a very good argument in these types of discussions, as it can be used for everything.

Personally it seems to me the need for such a feature is too rare; I thought about this for a few minutes and the only use-case I can think of is Vim's iskeyword setting. Although I'm sure there are other use-cases, they don't seem common. Plus it can be worked around quite easily with rng = "1..10 20..30", rng = [[1, 10], [20, 30]], or the inline table syntax mentioned in the 2019 discussion. This is perhaps a bit suboptimal, but seems workable enough.
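Both workarounds amount to a few lines of application code. A Python sketch, assuming both ends are inclusive (the comment above does not fix that convention, so it is the application's choice); parse_range_string and parse_range_pairs are hypothetical helper names:

```python
import re


def parse_range_string(spec: str) -> list[range]:
    """Parse a string such as "1..10 20..30" into inclusive ranges."""
    result = []
    for part in spec.split():
        m = re.fullmatch(r"(-?\d+)\.\.(-?\d+)", part)
        if m is None:
            raise ValueError(f"not a range: {part!r}")
        start, end = int(m.group(1)), int(m.group(2))
        result.append(range(start, end + 1))  # inclusive end
    return result


def parse_range_pairs(pairs: list[list[int]]) -> list[range]:
    """Parse nested arrays such as [[1, 10], [20, 30]] into inclusive ranges."""
    return [range(start, end + 1) for start, end in pairs]


# Both spellings describe the same two ranges.
assert parse_range_string("1..10 20..30") == parse_range_pairs([[1, 10], [20, 30]])
```

The nested-array form has the advantage that a TOML parser already validates the numbers; the string form needs its own error handling.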

@alan-isaac
Author

@arp242 Your comments suggest that you have not understood the discussion. They also weirdly suggest that the only use of TOML you can imagine is for project configuration files.

I'm not going to rehash the whole discussion for you, but note that when you say "it can be worked around quite easily" you completely overlook that TOML does not include any facilities for communicating semantics (e.g., a grammar language, like JSON schema). This issue is already raised above, in detail. Just imagine if someone proposed that TOML tables could "quite easily" be replaced with strings or nested lists of strings.

These comments signal only a desire to close this issue for lack of progress, not a desire to actually come to grips with it.

@arp242
Contributor

arp242 commented Mar 6, 2022

I said what's needed to move things forward as far as I'm concerned: show popular projects with TOML configurations that would be made better by this. So if you really want this badly I suggest working on that.

They also weirdly suggest that the only use of TOML you can imagine is for project configuration files.

That is its goal.

@alan-isaac
Author

@arp242 Here is the actual goal statement:
"TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics."

This statement encompasses the configuration of simulations.

@pradyunsg
Member

pradyunsg commented Mar 6, 2022

@alan-isaac I appreciate that you feel strongly about this, and also understand that you're disappointed about the answer/responses here (from the broader group of individuals who've contributed to the discussion here as well as myself). I appreciate and empathize that this would be beneficial for your use case, if this were added.

However, I don't think the arguments made here are compelling and I certainly disagree that this is as broadly applicable as has been claimed in this thread at various points.

Sure, programming languages do have range syntaxes or mechanisms to generate ranges. However, TOML is not a programming language. I'm not aware of any major configuration language that has ranges. Even YAML doesn't have them and YAML somewhat famously has too-many-things.

I try to imagine someone offering a similar case against ranges in any of the many languages where they are a core language feature.

A broadly-applicable programming language? Oh, I'd be very opposed to not having ranges in a programming language. However, TOML is not a programming language.

@alan-isaac
Author

@pradyunsg Thanks for your reply. You are correct that I am disappointed, but the reason is different. I am disappointed when those who reply have made little effort to understand the use case, which resulted in the justification for closing this issue being completely off base. This lack of understanding was a problem encountered early in this thread and eventually somewhat resolved. Misunderstanding of the use case and mischaracterization of the request leads to proposals of supposedly easy workarounds that entirely miss the point.

Your "TOML is not a programming language" critique, also found above, again misses the point. Nobody has proposed adding, say, control flow. The only proposal is the addition of a range literal.

So far, the primary critique that has merit is the claim that "there is no obvious syntax". However, the closed integer interval notation from mathematics (e.g., [1..10]) is in fact quite obvious. And in any case, if obviousness were the real objection, then it should set off a search for an acceptable syntax.
