HTML writer - support new table features #6314
Comments
I'm starting to work on this. |
subscribing |
Is this related, or is there a ticket for the HTML reader to support jgm/pandoc-types#66? |
The ticket for the HTML reader is #6312. It has a "Good first issue" label, but I'm not sure if that really applies. |
Progress notes: The hardest problem I'm facing is that, in order to apply the correct alignment to each cell and the correct row number to each row, we need a good idea of how the table grid will look. This has become much more difficult with the new table structure. My current line of attack is building a separate T.P.Writers.Tables module which allows creating a rectangular grid representation of the table:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Text.Pandoc.Definition (Cell, Row)

-- | Table row offset (i.e., the number of rows which have to be
-- moved up to find the topmost row belonging to a cell).
newtype RowOffset = RowOffset Int
  deriving (Eq, Num, Enum)

-- | Table column offset (i.e., the number of columns which have to
-- be moved left to find the leftmost column belonging to a cell).
newtype ColOffset = ColOffset Int
  deriving (Eq, Num, Enum)

-- | Rectangular table which makes it easy to match table cells
-- with their column and row numbers.
newtype GridTable = GridTable [GridRow]

-- | Single row of a 'GridTable'.
newtype GridRow = GridRow [GridCell]

-- | Single cell of a 'GridTable'. Usually, only cells with zero
-- offsets should be rendered. Other cells serve as placeholders.
data GridCell = GridCell RowOffset ColOffset Cell

toGridTable :: [Row] -> GridTable
toGridTable = undefined -- WIP

This should be useful for other writers as well. I'd love to learn about alternative approaches and ideas. |
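As a rough illustration of how such a rectangular grid could be filled, here is a minimal sketch. It is not the planned implementation: `SimpleCell`, `Slot`, `carry`, `fillRow`, and `toGrid` are hypothetical names, `SimpleCell` is a simplified stand-in for pandoc's `Cell` (a label plus row span and column span), and the number of columns is assumed to be known. Each row is built from the slots carried over from the row above, so no mutation is needed. Cells that do not fit are clipped or dropped, and row spans are not clipped to the table height.

```haskell
import Data.Maybe (isNothing)

-- | Hypothetical simplified cell: label, row span, column span.
data SimpleCell = SimpleCell String Int Int
  deriving (Eq, Show)

-- | Grid slot: row offset and column offset back to the cell's
-- top-left corner, plus the cell itself (assumed stand-in for GridCell).
data Slot = Slot Int Int SimpleCell
  deriving (Eq, Show)

-- | Slot carried into the next row: a cell keeps covering its columns
-- until its row span is used up.
carry :: Maybe Slot -> Maybe Slot
carry (Just (Slot ro co c@(SimpleCell _ rs _)))
  | ro + 1 < rs = Just (Slot (ro + 1) co c)
carry _ = Nothing

-- | Fill one grid row from the carried-over slots and the row's own cells.
fillRow :: [Maybe Slot] -> [SimpleCell] -> [Maybe Slot]
fillRow [] _ = []                                    -- no columns left: drop cells
fillRow (Just s : rest) cells = Just s : fillRow rest cells  -- covered from above
fillRow (Nothing : rest) [] = Nothing : fillRow rest []      -- unused column
fillRow (Nothing : rest) (c@(SimpleCell _ _ cs) : cells) =
  let free  = length (takeWhile isNothing rest)      -- free columns to the right
      width = max 1 (min cs (free + 1))              -- clip the column span
      own   = [Just (Slot 0 co c) | co <- [0 .. width - 1]]
  in own ++ fillRow (drop (width - 1) rest) cells

-- | Lay all rows onto a grid with the given number of columns.
toGrid :: Int -> [[SimpleCell]] -> [[Maybe Slot]]
toGrid ncols = go (replicate ncols Nothing)
  where
    go _ [] = []
    go carried (r : rs) =
      let row = fillRow carried r
      in row : go (map carry row) rs
```

Under these assumptions, only slots with both offsets equal to zero would be rendered; the remaining slots are placeholders that record which cell covers a grid position.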
It would be great to get feedback from @despresc on this. |
I wrote an experimental HTML table writer as a Lua filter over the past few weeks, just for the heck of it, knowing it would be useless once there was an official writer for that; so I didn't publish it, but maybe it could give you some ideas now. (Sorry if it's a bit messy.) I encountered a similar problem; in particular, the relevant code is in the "Table Environment" function/pseudoclass. Basically, I keep track of which cells are occupied across rows when there are row spans: as I read each cell's info, I write its row span across its column span, and as I advance through the rows, a decrement at each row start indicates where we are in the current cell occupancy. An occupancy of 0 means it is safe to write to that cell. For example, we could have at the beginning of processing a row (for a five-cell row):
which means that the first 3 cells and the fifth are still "occupied" by row spanning cells from above, and only the 4th cell can be used to write to.
where now the 3rd and 4th cells are occupied by the row spans of cells above; all others are "free". So I didn't need to know in advance what the grid would look like; keeping track as above was sufficient. No idea how this would translate to Haskell. /Edit: oops, my code has a problem when a particular row has only cells from row spans... so there's a need for a "true row" as well... |
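One possible Haskell rendering of the occupancy idea described above, as a minimal sketch rather than a translation of the actual Lua filter. The names `Span`, `Occupancy`, `placeRow`, and `layout` are hypothetical, and a cell is reduced to its (row span, column span) pair. The occupancy list holds, per column, the number of rows for which that column is still taken; 0 means free. Instead of mutating the list, each step returns a new one.

```haskell
-- | Hypothetical stand-in for a table cell: (row span, column span).
type Span = (Int, Int)

-- | Per column: number of rows for which it is still occupied (0 = free).
type Occupancy = [Int]

-- | Place the cells of one row. Returns the zero-based start column of each
-- placed cell and the occupancy after the row's cells have been written.
placeRow :: Occupancy -> [Span] -> ([Int], Occupancy)
placeRow = go 0
  where
    go _ [] _ = ([], [])            -- no columns left: remaining cells are dropped
    go _ rest [] = ([], rest)       -- no cells left: keep remaining occupancy
    go col (o : os) cells@((rs, cs) : more)
      | o > 0 =                     -- column taken by a row span from above
          let (starts, occ') = go (col + 1) os cells
          in (starts, o : occ')
      | otherwise =                 -- free column: write row span across col span
          let width = 1 + length (takeWhile (== 0) (take (cs - 1) os))
              (starts, occ') = go (col + width) (drop (width - 1) os) more
          in (col : starts, replicate width rs ++ occ')

-- | Start columns for every row of a table with the given number of columns.
-- The decrement after each row corresponds to the decrement at row start.
layout :: Int -> [[Span]] -> [[Int]]
layout ncols = go (replicate ncols 0)
  where
    go _ [] = []
    go occ (r : rest) =
      let (starts, occ') = placeRow occ r
      in starts : go (map (\n -> max 0 (n - 1)) occ') rest
```

As an illustrative case (values chosen here, not taken from the thread), an occupancy of `[2,1,1,0,3]` at the start of a row leaves only the fourth column free, so `placeRow [2,1,1,0,3] [(1,1)]` places the single cell at column index 3. A row that receives no cells of its own is not treated specially in this sketch.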
Thanks, this is helpful. The algorithm I came up with looks similar, with the difference that I'm trying to avoid mutation. I'm considering switching to your approach; the occupancy idea is nice. I'd like to reuse your test files, but you'd have to license them as GPL2-or-later. Would you be so kind as to make a PR to add them to the repo? |
I've since seen that this "occupancy idea" is probably similar to what despresc originally did; he calls those "overhangs". That's probably a better starting place for you in Haskell. As for the test files, I've tried to give them a very permissive license (public domain) so that anyone can use them as they wish. And since most of them are transformed from others' sources, I'm not sure I can impose a stricter license. Maybe a dual license "public domain/gpl2" would do, in pandoc-lua-filters? |
Thanks. I used the "planets" table, which is indeed CC0. |
Yes, I found the occupancy/overhang method to be the easiest for dealing with these tables. You could use it to build up a I can't remember what we decided for determining cell alignments when the cell had an One thing I should note is that I assumed that no cells would be moved upward or downward while the table was being laid onto the grid. They just get clipped or dropped to fit in the available space. That means that the row number of a cell should be the same as the row number of its parent row (its index in the |
To clarify, I think that that cell placement behaviour on grid rows agrees with the HTML spec on well-formed tables (no overlapping cells, no empty rows, etc.). For invalid tables I think they simply call it a "table model error" and decline to specify what to do. |
So how would you render the following (abridged) native format
(see A "direct translation" would be this:
but the resulting table as seen in a browser is clearly wrong (Firefox, Vivaldi, IE, Edge).
and in that case the last row doesn't have the same (HTML) index as the one from the /Edit, ah, but I do see what you mentioned in the w3.org's Tabular data:
|
On the face of it the table you gave is this:
Right now the code lays the table on a grid with height equal to the length of the
They would drop all the cells in the last row, leaving a |
I suppose after laying out a row you could check if the overhang in each column is |
/Edit: oops, published while you were writing your previous comment...
Anyway, all this is about testing edge cases, and I only stumbled on that case after posting here.
|
I did it because that is how the browser accepted it, so I thought it was how it was supposed to be. Now I see such cases are malformed tables. So the question is whether to do as browsers do, and accept and create those (formally invalid) empty rows to produce what was visually intended, or reject the table as malformed... |
You are right. The native/json/lua readers and writers take in and emit the tables directly, without other processing. All the other readers and writers do actually transform the tables like I described, or at least they did formerly. The pandoc |
Sorry, that's not quite all of it. The readers and writers do perform those transformations, but a lot of it isn't apparent because they don't yet support row and column spans. |
So for readers, table handling currently looks like
and for writers that process runs in reverse. |
So, the native form should be the result of a "normalization", |
More or less. The non-native readers all try to produce internal tables that are reasonably nice if they aren't already (free from cell overlap errors, at least), and the non-native writers do not assume that they will be given nice internal tables, so they will also try to make sense of them as best they can. They do this consistently by interpreting the internal |
Re-adding this comment after I had first misplaced it in the issue for the HTML reader: Colspans and rowspans have been added in #6644. Table features which have not been added yet:
|
Add support for new table features introduced in jgm/pandoc-types#66, including table attributes (including identifier), rowspan, colspan, table head and foot, multiple header lines, row headers, and captions that allow block-level content and include an optional short caption.