There has been a recent surge of interest in automating software engineering
tasks using deep learning. This paper addresses the problem of code generation
where the goal is to generate target code given source code in a different
language or a natural language description. Most of the state-of-the-art deep
learning models for code generation use training strategies primarily designed
for natural language. However, understanding and generating code requires a
more rigorous comprehension of the code syntax and semantics. With this
motivation, we develop an encoder-decoder Transformer model where both the
encoder and decoder are explicitly trained to recognize the syntax and data
flow in the source and target code, respectively. We not only make the encoder
structure-aware by leveraging the source code's syntax tree and data flow
graph, but we also support the decoder in preserving the syntax and data flow
of the target code by introducing two novel auxiliary tasks: AST (Abstract
Syntax Tree) paths prediction and data flow prediction. To the best of our
knowledge, this is the first work to introduce a structure-aware Transformer
decoder that models both syntax and data flow to enhance the quality of
generated code. The proposed StructCoder model achieves state-of-the-art
performance on code translation and text-to-code generation tasks in the
CodeXGLUE benchmark, and improves over baselines of similar size on the APPS
code generation benchmark. Our code is publicly available at
https://github.com/reddy-lab-code-research/StructCoder/.
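The multi-task objective described above (a generation loss plus the two auxiliary structure losses) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weighting scheme, the `alpha`/`beta` coefficients, and the function name are assumptions.

```python
def structure_aware_loss(gen_loss, ast_loss, df_loss, alpha=0.1, beta=0.1):
    """Combine the main code-generation loss with the two auxiliary
    decoder losses: AST paths prediction and data flow prediction.

    alpha and beta are hypothetical weighting hyperparameters; the
    actual values and combination rule used by StructCoder may differ.
    """
    return gen_loss + alpha * ast_loss + beta * df_loss


# Example: per-batch scalar losses from the three heads
total = structure_aware_loss(gen_loss=2.0, ast_loss=1.0, df_loss=0.5)
```

In a sketch like this, each auxiliary head shares the decoder's hidden states, so minimizing the combined loss pushes the decoder to encode syntax and data-flow structure alongside token prediction.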
No Creative Commons license.