-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support for specifying a data type "kind" in astype
#848
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting the ball rolling on this @lucascolley! Some comments inline.
For ``dtype_or_kind`` a data type, an array having the specified data type. | ||
For ``dtype_or_kind`` a kind of data type: | ||
- If ``x.dtype`` is already of that kind, the data type is maintained. | ||
- Otherwise, an attempt is made to convert to the specified kind, according to the type promotion rules (see :ref:`type-promotion`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why "an attempt"? That seems ambiguous. We have to be clear about what must work. Which I think is:
- float to complex
- unsigned to signed integer
Anything else doesn't I think? There's no point allowing 'bool'
I think, since there is only one boolean dtype so dtype=xp.bool
will be cleaner.
For 'signed integer' and
'real floating-point'` there are also no promotion rules to follow, so they can be left out - or do you see a use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reduced this down to just 'complex floating'
(use-case: mixed float/complex to complex) and 'signed integer'
(use-case: mixed signed/unsigned to signed).
I think "an attempt" would still be accurate for an implementation of this? xp.astype(some_int8_array, 'complex floating')
would attempt a conversion, whose success will depend on the implementation-specific type promotion rules, right?
Unless you think that this function should always error when the type promotion is not defined by the standard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "an attempt" would still be accurate for an implementation of this?
I think you have the right idea in mind here, it's just a "language we use to specify things" thing. We specify which behavior has to be supported - 'complex floating'
has type promotion rules defined in the standard, so it's expected to always work for a compliant implementation. Then, if we expect other input types to raise, then we specify that by "must raise ..." or "input type must be ...". In this case there's no reason to do that (implementors are free to suppport more types, it's just not standardized), so we then say "input type should be ...".
Your "attempt to ..." seems to be the same as "should be ...", it's just language we want to write in a uniform way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about the wording now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick update: I took the liberty to update the wording. I also made the call to broaden the list of data type kinds. I think there are reasonable arguments for providing any one of the numeric data type kinds (e.g., int32
and real floating
=> float64
, etc), and it is possible to delineate a set of clearly defined rules in terms of which data type should be returned. Leaving bool
out seems somewhat arbitrary, especially when the semantics are clearly specified and all other kinds can be, IMO, reasonably provided (note: even including "numeric"; i.e., convert anything provided to me to numbers so I can compute the sum, etc).
Gentle reminder of the 2024 milestone here! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just one nit, otherwise this new wording LGTM, thanks @kgryte !
When casting a boolean input array to a real-valued data type, a value of ``True`` must cast to a real-valued number equal to ``1``, and a value of ``False`` must cast to a real-valued number equal to ``0``. | ||
- When applying type promotion rules, the returned array must have the lowest-precision data type belonging to the specified data type kind to which the data type of ``x`` promotes (e.g., if ``x`` is ``float32`` and the data type kind is ``'complex floating'``, then the returned array must have the data type ``complex64``; if ``x`` is ``uint16`` and the data type kind is ``'signed integer'``, then the returned array must have the data type ``int32``). | ||
- When type promotion rules are not specified between the data type of ``x`` and the specified data type kind (e.g., ``int16`` and ``'real floating'``) and there exists one or more data types belonging to the specified data kind in which the elements in ``x`` can be represented exactly (e.g., ``int32`` and ``float64``), the function must return an array having the smallest data type (in terms of range of values) capable of precisely representing the elements of ``x`` (e.g., if ``x`` is ``int16`` and the data type kind is ``'complex floating'``, then the returned array must have the data type ``complex64``; if ``x`` is `bool`` and the data type kind is ``integral``, then the returned array must have the data type ``int8``). | ||
- When type promotion rules are not specified between the data type of ``x`` and the specified data type kind and there neither exists a data type belonging to the specified data type in which the elements of ``x`` can be represented exactly (e.g., ``uint64`` and ``'real floating'``) nor are there applicable casting rules documented below, behavior is unspecified and thus implementation-defined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- When type promotion rules are not specified between the data type of ``x`` and the specified data type kind and there neither exists a data type belonging to the specified data type in which the elements of ``x`` can be represented exactly (e.g., ``uint64`` and ``'real floating'``) nor are there applicable casting rules documented below, behavior is unspecified and thus implementation-defined. | |
- When type promotion rules are not specified between the data type of ``x`` and there neither exists a data type belonging to the specified data type in which the elements of ``x`` can be represented exactly (e.g., ``uint64`` and ``'real floating'``) nor are there applicable casting rules documented below, behavior is unspecified and thus implementation-defined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this suggestion makes sense atm, as the removed phrase is part of
"between the data type of x
and the specified data type kind"
Removing "the specified data type kind" removes the y
in between x and y
.
As discussed during the workgroup meeting on January 23, 2025, given that this is new behavior without any downstream implementations, we decided to do two things:
@ev-br Would you mind adding to |
astype
: accept a kind of data typeastype
This PR:
dtype
fromuint
toint
andfloat
tocomplex
? #859 by adding data type "kind" support inastype
.