Improving tables #65
I'm delighted that you're interested in taking this on. It's one of the top priority improvements for pandoc, but it has been hard to get it done because (a) it's a big change and (b) it's hard to decide what the best type is. I don't think we should let the perfect be the enemy of the good: we should discuss (b), but we should set a limit to how long we discuss it before just moving ahead with something that will be better than what we have currently. (If needed, we can make further incremental changes in the future.) More later... |
The first approach allows any cell to be a header cell. That might be an advantage for representing tables where the left column is the header (not common) -- such tables can't be represented in the second approach -- but it has the disadvantage that many table formats can't represent arbitrary header cells. (HTML is an exception obviously.) So I'm leaning more to the second approach. I don't know how important it is to represent tables where the header is a column rather than a row, and I'm not sure what the cost would be of unrepresentable tables on the first approach. |
The layout I have in my mind, incidentally, is this:
since I realized that the second representation might suggest that the row headers are not under the table head. This is also the implicit layout of the first approach. When you say that "the left column is the header", do you mean that the table is transposed during writing so that the table head rows become columns? Otherwise I think that the row head section could be used as the header. The only oddity would be that a single header line would be split up among multiple rows. In the first approach, I suppose that after separating out as many sections as the writer supports (table head, foot, row head) the writer would forget about the cell type and simply write the cell content as-is. |
I wasn't understanding what you mean by Row Head. Now I see you mean a header cell in the left position in a row. And now I notice that you have |
I'm wondering whether it would make things easier if the types were a bit more uniform. Rendering a header row will often be almost the same as rendering a body row. What if we just had
data Row = HeaderRow Attr [Cell] {- row heads -} [Cell] {- other cells -}
data Block =
...
| Table Attr Caption ShortCaption [(Alignment, ColWidth)] [Row] {- header -} [Row] {- body -} [Row] {- footer -}
The drawback is that this allows you to represent distinctions that are irrelevant in the header and footer rows. The advantage is that it makes it easier to deal with rows in a uniform way in the code. I'm not really sure about this tradeoff. If we do go with your original approach, we'll need a different type constructor:
data HeaderRow = HeaderRow Attr [Cell]
Another approach might be:
data Row a = Row a Attr [Cell]
data HeaderRow
data FooterRow
data BodyRow = BodyRow [Cell]
...
| Table Attr Caption ShortCaption [(Alignment, ColWidth)]
[Row HeaderRow] [Row BodyRow] [Row FooterRow] |
Writers that can't represent row headers might find it easier to concatenate the row head and body and operate on an If there were a uniform row type, then the table picture could be
I am not sure if this is a useful distinction, but it does give the row header in the table head some meaning. |
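The idea of concatenating the row head and row body for writers that cannot represent row headers could look roughly like the sketch below. The `Cell` and `Row` types here are simplified stand-ins (an assumption made for brevity, not the types proposed in this thread), and `rowCells` is a hypothetical helper name.

```haskell
-- Simplified stand-ins (assumptions): a cell is reduced to its text,
-- and a row only carries its row head cells and its row body cells.
newtype Cell = Cell String
  deriving (Eq, Show)

data Row = Row [Cell] [Cell]
  deriving (Eq, Show)

-- Hypothetical helper: forget the row head / row body split so that a
-- writer without row header support can treat the row as a flat list.
rowCells :: Row -> [Cell]
rowCells (Row rowHead rowBody) = rowHead ++ rowBody
```

A writer that does support row headers would instead pattern match on `Row` directly and keep the two lists apart.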
If I'm understanding you correctly, you are now suggesting a uniform type
to be used for the header, body, and footer? That sounds good to me. It's conceivable that some formats could treat "TH above row head" specially. |
That was my interpretation of the first |
OK, to summarize then:
type RowSpan = Int
type ColSpan = Int
type Caption = [Block]
type ShortCaption = [Inline]
type ColWidth = Maybe Double
data Cell = Cell Attr (Maybe Alignment) RowSpan ColSpan [Block]
type RowHead = [Cell]
type RowBody = [Cell]
data Row = Row Attr RowHead RowBody
type TableHead = [Row]
type TableBody = [Row]
type TableFoot = [Row]
data Block =
...
| Table Attr Caption ShortCaption [(Alignment, ColWidth)] TableHead TableBody TableFoot
@tarleb - what do you think of this? |
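To make the summary concrete, here is a self-contained sketch that compiles on its own. `Attr`, `Alignment`, `Inline`, and the `Plain` constructor are cut-down stand-ins for the existing pandoc-types definitions (an assumption for this example; the real types are richer), and `nullAttr`, `simpleCell`, and `example` are illustrative names only.

```haskell
-- Cut-down stand-ins for existing pandoc-types definitions (assumptions).
type Attr = (String, [String], [(String, String)])

data Alignment = AlignLeft | AlignRight | AlignCenter | AlignDefault
  deriving (Eq, Show)

newtype Inline = Str String
  deriving (Eq, Show)

-- The types from the summary above.
type RowSpan = Int
type ColSpan = Int
type Caption = [Block]
type ShortCaption = [Inline]
type ColWidth = Maybe Double

data Cell = Cell Attr (Maybe Alignment) RowSpan ColSpan [Block]
  deriving (Eq, Show)

type RowHead = [Cell]
type RowBody = [Cell]

data Row = Row Attr RowHead RowBody
  deriving (Eq, Show)

type TableHead = [Row]
type TableBody = [Row]
type TableFoot = [Row]

data Block
  = Plain [Inline]  -- stand-in for the other Block constructors
  | Table Attr Caption ShortCaption [(Alignment, ColWidth)]
          TableHead TableBody TableFoot
  deriving (Eq, Show)

nullAttr :: Attr
nullAttr = ("", [], [])

-- Hypothetical helper: a 1x1 cell with plain text and no alignment override.
simpleCell :: String -> Cell
simpleCell s = Cell nullAttr Nothing 1 1 [Plain [Str s]]

-- A two-column table: one header row, one body row with a row head cell,
-- and no footer.
example :: Block
example =
  Table nullAttr [Plain [Str "Fruit prices"]] [Str "Prices"]
        [(AlignLeft, Nothing), (AlignRight, Nothing)]
        [Row nullAttr [] [simpleCell "Fruit", simpleCell "Price"]]
        [Row nullAttr [simpleCell "Apple"] [simpleCell "1.20"]]
        []
```

Note that in the body row the row head holds one cell and the row body holds one cell, which together fill the two columns given by the column list.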
Looks good to me! Maybe we could group the arguments to |
If we want to compress things, I'd prefer something like
data Caption = Caption (Maybe [Inline]) [Block] -- short caption, full caption
type ColSpec = (Alignment, Maybe Double)
data Block =
...
| Table Attr Caption [ColSpec] TableHead TableBody TableFoot
And should we consider using |
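As a small illustration of the bundled caption, the sketch below shows how the separate short and full captions of the earlier summary might be packed into one value. `Inline`, `Block`, and `Alignment` are minimal stand-ins (assumptions), and `mkCaption` is a hypothetical helper, not something proposed in the thread.

```haskell
-- Minimal stand-ins for existing pandoc-types definitions (assumptions).
newtype Inline = Str String
  deriving (Eq, Show)

newtype Block = Plain [Inline]
  deriving (Eq, Show)

data Alignment = AlignLeft | AlignRight | AlignCenter | AlignDefault
  deriving (Eq, Show)

-- The compressed types from the comment above.
data Caption = Caption (Maybe [Inline]) [Block]  -- short caption, full caption
  deriving (Eq, Show)

type ColSpec = (Alignment, Maybe Double)

-- Hypothetical helper: an empty short caption is treated as "no short
-- caption" and becomes Nothing.
mkCaption :: [Inline] -> [Block] -> Caption
mkCaption short full
  | null short = Caption Nothing full
  | otherwise  = Caption (Just short) full

-- Two columns: left aligned with default width, right aligned at 30% width.
exampleSpecs :: [ColSpec]
exampleSpecs = [(AlignLeft, Nothing), (AlignRight, Just 0.3)]
```

Using `Maybe [Inline]` for the short caption distinguishes "no short caption" from "an explicitly empty one", which a plain list could not do.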
Having the caption components bundled together would be good. That bundling might happen anyway with a new I'm not sure how great the benefit of newtyping would be. The |
Advantage of a |
That said, we use type aliases all over the place in pandoc-types now (e.g. |
Perhaps it isn't the time to change It sounds like the most recent summary, with the modified |
Sounds good to me! |
This implements issue jgm#65 for the library itself. The tests do not compile. The Legacy modules are hidden until a way of dealing with them has been decided.
Hello, I just wanted to share that I'm in the process of submitting a Google Summer of Code 2020 proposal to provide a library with similar functionality, as it seems to be something many Haskell packages could benefit from, not least of all pandoc. The exact API is not finalized, but the proposal is in rough draft form at the moment. I do hope this is something that can benefit this project and many others. |
@Mercerenies - the proposal looks quite interesting. How were you thinking it intersects with pandoc? Do you have any suggestions about the proposal above, or does it seem reasonable to you? |
The timing ended up being quite inconvenient, as I reached out to @tarleb about the proposal a few days before this issue was opened. That being said, I do still feel like a dedicated library for this kind of thing would be very nice to have, for several reasons, even if pandoc has its own type as well. In terms of the above proposal, I share the concern about |
I completely agree that a dedicated library could be useful even if pandoc has its own type -- and there could be glue code converting between pandoc tables and this library's type. |
See the main todo list and the relevant issue. I would like to start implementing better table handling in Pandoc. Specifically, I would implement all but the last of these bullet points using one of the designs below (or a modified version of one of them).
I think something like this recently outlined approach is a good way forward for now. The representation is a little loose (any table in the intermediate representation is valid, so there are multiple ways to write a given table, but only one normalized way), but it should allow the readers and writers to be switched more easily. This is a slightly modified version of that approach:
The Maybe Alignment on the individual cells allows the cells to override the alignment of the column(s) in which they reside. This makes it easier to specify one's intentions when a cell spans multiple columns with conflicting alignments, and has the advantage of allowing better \multicolumn and \multirow support in the LaTeX reader and writer. It also comes up naturally when one thinks of possible extensions to the supported markdown table formats.
A similar design has the following modifications:
This has the advantage of making explicit the table head/body/foot and row head/body structure that seems to be assumed in the first approach, where the first entirely header rows become the table head, and the last such rows become the table foot. Cells in the head and foot sections would correspond to th cells, and cells in the body section would correspond to td cells. It does not require a CellType, but one could still be added, making these even more similar to HTML tables. This approach has the disadvantage of making the table representation more complex.
I assume that the tables are normalized (laid on a grid with a given width so that overlapping cells and empty spaces can be dealt with in the table) like so, informally:
... (ColSpan) would be lowered to fit. If it would extend past the bottom of the grid, its height (RowSpan) would be lowered to fit.
The table head, table foot, row head (the list of row head sections without the row body), and row body (the list of row body sections without the row head) should be normalized independently in any design where these exist (implicitly in the first, or explicitly in the second). The overall table width would be the length of the [(Alignment, ColWidth)] list, and the row head/body width would add to that width. (The row head width would be the width of the first row in the row head).
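The span-lowering part of the normalization can be sketched as follows. This is only an illustration under assumptions: `Cell` is reduced to its two spans, positions are zero-based, the position is assumed to lie inside the grid, and `clipCell` is a hypothetical name. It does not handle overlapping cells or empty spaces.

```haskell
type RowSpan = Int
type ColSpan = Int

-- Simplified stand-in (assumption): a cell reduced to its spans.
data Cell = Cell RowSpan ColSpan
  deriving (Eq, Show)

-- Hypothetical helper: lower the spans of a cell whose top-left corner
-- sits at zero-based (row, col) so that it stays inside a grid with
-- gridHeight rows and gridWidth columns. Spans never drop below 1.
clipCell :: Int -> Int -> (Int, Int) -> Cell -> Cell
clipCell gridHeight gridWidth (row, col) (Cell rs cs) =
  Cell (clamp (gridHeight - row) rs) (clamp (gridWidth - col) cs)
  where
    -- Keep n within [1, available].
    clamp available n = max 1 (min available n)
```

For example, in a grid of 3 rows and 4 columns, a cell at position (1, 2) with spans 5 and 5 would be lowered to spans 2 and 2, while a cell that already fits is returned unchanged.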