Discussion: data types of the columns #321

Open
airvzxf opened this issue Apr 14, 2024 · 0 comments

airvzxf (Contributor) commented Apr 14, 2024

Discussion: data types of the columns

Discussion

My question is whether there should be a mixed-type column instead of automatically deciding a single data type for the whole column. In that case, “tabulate” would identify the data type of each cell in the column and treat the cell accordingly, rather than applying the overall type of the column.

Quick comment

On GitHub, you could enable the Discussions feature: https://github.com/features/discussions. With it, the community can open discussions, and any discussion that turns out to be relevant can be moved to the issues section.

Not all end users (the community) use the Discussions feature, even in repositories or projects that have it enabled. Still, the feature appears to be useful for administration.

Context

I noticed that “tabulate” inspects every row of each column and automatically assigns a type to the column. This is great, but I am concerned about columns whose rows contain mixed types.

For mixed columns, I found that the precedence is as follows:

  • If any row in the column contains a string, the whole column is treated as a string column.
  • Otherwise, if at least one float is detected, the column is a float column.
  • Finally, if the column contains only integers, it is an integer column.
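The precedence above can be modeled with a small function. This is a hypothetical, simplified sketch (the name `infer_column_type` is made up for illustration), not tabulate's actual implementation, which also deals with cases such as None, bool, bytes, and numeric strings like "42".

```python
# Hypothetical sketch of the observed precedence: str > float > int.
# Not tabulate's real code; it ignores None, bool, bytes and numeric strings.


def infer_column_type(values):
    """Return the single type that tabulate appears to assign to a whole column."""
    if any(isinstance(v, str) for v in values):
        return str    # one string makes the whole column a string column
    if any(isinstance(v, float) for v in values):
        return float  # otherwise one float promotes the column to float
    return int        # only integers remain


print(infer_column_type([82000.38, "abcd", 92165]))   # <class 'str'>
print(infer_column_type([12013, 210, 15.24, 92165]))  # <class 'float'>
print(infer_column_type([12013, 210, 92165]))         # <class 'int'>
```

The three calls mirror the three experiments in the evidence below.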

Evidence

All results below were obtained by adding debug lines to the function _format(). They print val, its actual type, and valtype, so the two types can be compared.

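def _format(val, valtype, floatfmt, intfmt, missingval="", has_invisible=True):
    # Debug lines added at the top of tabulate's internal _format():
    # show each cell value, its real Python type, and the column type.
    print(f'    val: {val}')
    print(f'   type: {type(val)}')
    print(f'valtype: {valtype}')
    print()
    # (the original body of _format() continues here unchanged)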

For the call tabulate([[82000.38], ["abcd"], [92165]], tablefmt="plain"), valtype is <class 'str'>.

The result is below.

    val: 82000.38
   type: <class 'float'>
valtype: <class 'str'>

    val: abcd
   type: <class 'str'>
valtype: <class 'str'>

    val: 92165
   type: <class 'int'>
valtype: <class 'str'>

For the call tabulate([[12013], [210], [15.24], [92165]], tablefmt="plain"), valtype is <class 'float'>.

The result is below.

    val: 12013
   type: <class 'int'>
valtype: <class 'float'>

    val: 210
   type: <class 'int'>
valtype: <class 'float'>

    val: 15.24
   type: <class 'float'>
valtype: <class 'float'>

    val: 92165
   type: <class 'int'>
valtype: <class 'float'>

For the call tabulate([[12013], [210], [92165]], tablefmt="plain"), valtype is <class 'int'>.

The result is below.

    val: 12013
   type: <class 'int'>
valtype: <class 'int'>

    val: 210
   type: <class 'int'>
valtype: <class 'int'>

    val: 92165
   type: <class 'int'>
valtype: <class 'int'>

Expectation

Based on this discussion, this is the output I would expect from the _format() function.

Solution 1

For the call tabulate([[82000.38], ["abcd"], [92165]], tablefmt="plain"), valtype should be Mixed (or something similar).

The result is below.

    val: 82000.38
   type: <class 'float'>
valtype: <class 'Mixed'>

    val: abcd
   type: <class 'str'>
valtype: <class 'Mixed'>

    val: 92165
   type: <class 'int'>
valtype: <class 'Mixed'>

Then, inside _format(), we could check whether valtype is Mixed and, if so, use the actual type of val to drive all of the formatting.
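A minimal sketch of Solution 1, assuming a hypothetical `Mixed` marker class. `infer_column_type` and `format_cell` are illustrative names, not tabulate functions; `format_cell` only imitates the idea of falling back to the cell's own type when the column is mixed.

```python
# Hypothetical sketch of Solution 1: a "Mixed" marker for columns that
# combine strings with numbers. Mixed, infer_column_type and format_cell
# are illustrative names only, not part of tabulate.


class Mixed:
    """Marker type for a column whose cells are not all of one kind."""


def infer_column_type(values):
    """Return Mixed when strings and non-strings share a column."""
    has_str = any(isinstance(v, str) for v in values)
    has_other = any(not isinstance(v, str) for v in values)
    if has_str and has_other:
        return Mixed
    if has_str:
        return str
    if any(isinstance(v, float) for v in values):
        return float
    return int


def format_cell(val, valtype, floatfmt="g", intfmt=""):
    """Format one cell; for a Mixed column, use the cell's own type."""
    effective = type(val) if valtype is Mixed else valtype
    if effective is float:
        return format(float(val), floatfmt)
    if effective is int:
        return format(int(val), intfmt)
    return str(val)


column = [82000.38, "abcd", 92165]
coltype = infer_column_type(column)
print(coltype.__name__)  # Mixed
print([format_cell(v, coltype, floatfmt=".1f") for v in column])
# ['82000.4', 'abcd', '92165']
```

In this sketch, numeric-only columns keep the float/int promotion and only a mix of strings and numbers is reported as Mixed, which matches the first example above. Whether an int/float mix should also count as Mixed is a separate design decision.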

Solution 2

Always ignore valtype and use the type of val instead, unless the user passes a parameter that explicitly specifies the column's type, for example tabulate([[82000.38], ["abcd"], [92165]], coltypes=(int,), tablefmt="plain"), which would treat every cell in the column as an integer.
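A minimal sketch of Solution 2, assuming a hypothetical `coltypes` parameter that holds one type (or None) per column; it is not an existing tabulate option. `render_rows` and `format_value` are illustrative helpers, and a value that cannot be converted (such as "abcd" to int) simply falls back to its text.

```python
# Hypothetical sketch of Solution 2: format each cell by its own type
# unless the caller overrides a column with "coltypes" (an illustrative
# parameter, not an existing tabulate option).


def format_value(val, target, floatfmt="g"):
    """Convert val to target and format it; fall back to str on failure."""
    try:
        if target is float:
            return format(float(val), floatfmt)
        if target is int:
            return format(int(val), "d")  # note: int() truncates floats
    except (TypeError, ValueError):
        pass  # e.g. int("abcd"): keep the original text
    return str(val)


def render_rows(rows, coltypes=None, floatfmt="g"):
    """Return rows of formatted strings, with one type decision per cell."""
    rendered = []
    for row in rows:
        cells = []
        for i, val in enumerate(row):
            override = coltypes[i] if coltypes and i < len(coltypes) else None
            target = override if override is not None else type(val)
            cells.append(format_value(val, target, floatfmt))
        rendered.append(cells)
    return rendered


rows = [[82000.38], ["abcd"], [92165]]
print(render_rows(rows))                   # [['82000.4'], ['abcd'], ['92165']]
print(render_rows(rows, coltypes=(int,)))  # [['82000'], ['abcd'], ['92165']]
```

Note that `(int,)` with a trailing comma is a one-element tuple, while `(int)` is just `int`. Truncating floats with int() is one possible choice; rounding would be another.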

Final note

I came to this package through Pandas, specifically its “to_markdown” function. It might be a good idea to invite the Pandas maintainers to this discussion for additional feedback.

By the way, Pandas wraps a limited version of tabulate in the function to_markdown. Separately from this discussion, it would be nice if Pandas exposed the full set of tabulate's parameters and functionality.
