Language models pretrained on large collections of tabular data have
demonstrated their effectiveness on several downstream tasks. However, many of
these models do not account for structural properties of tabular data such as
row/column permutation invariance and hierarchical structure. To alleviate these
limitations, we propose HYTREL, a tabular language model that captures the
permutation invariances and three other structural properties of tabular data by
using hypergraphs: the table cells make up the nodes, and the cells occurring
jointly in each row, each column, and the entire table form three different
types of hyperedges. We show that HYTREL is maximally
invariant under certain conditions for tabular data, i.e., two tables obtain
the same representations via HYTREL iff the two tables are identical up to
permutations. Our empirical results demonstrate that HYTREL consistently
outperforms other competitive baselines on four downstream tasks with minimal
pretraining, illustrating the advantages of incorporating the inductive biases
associated with tabular data into the representations. Finally, our qualitative
analyses showcase that HYTREL can assimilate the table structures to generate
robust representations for the cells, rows, columns, and the entire table.
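To make the hypergraph construction concrete, here is a minimal illustrative sketch (not the authors' code) of the mapping described above: each cell becomes a node, and three types of hyperedges group the cells of each row, each column, and the entire table. The function name and representation are assumptions for illustration only.

```python
# Illustrative sketch: turn a table into the node/hyperedge structure
# described in the abstract. Cell (i, j) becomes a node; each row,
# each column, and the whole table form one hyperedge.

def table_to_hypergraph(table):
    """table: list of rows, each a list of cell values.

    Returns (nodes, hyperedges), where nodes are (row, col) index pairs
    and hyperedges maps an edge name to the set of incident nodes.
    """
    n_rows, n_cols = len(table), len(table[0])
    nodes = [(i, j) for i in range(n_rows) for j in range(n_cols)]

    hyperedges = {}
    for i in range(n_rows):           # one hyperedge per row
        hyperedges[f"row_{i}"] = {(i, j) for j in range(n_cols)}
    for j in range(n_cols):           # one hyperedge per column
        hyperedges[f"col_{j}"] = {(i, j) for i in range(n_rows)}
    hyperedges["table"] = set(nodes)  # one hyperedge for the whole table
    return nodes, hyperedges

nodes, edges = table_to_hypergraph([["a", "b"], ["c", "d"]])
# 2x2 table: 4 nodes; 2 row edges + 2 column edges + 1 table edge
```

Because hyperedge membership is a set, permuting the rows or columns of the table only relabels nodes and edges without changing which cells co-occur in an edge, which is the intuition behind the permutation invariance claimed above.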