GAST: A generic AST representation for language-independent source code analysis

DOI:

https://doi.org/10.29019/enfoqueute.957

Keywords:

Code transformation, Generic Abstract Syntax Tree, Generic Language, Code Analysis

Abstract

Organizations use various programming languages to develop their systems, aiming to exploit the features of each language that best suit a given domain. This requires programmers to master several languages while also coping with the growing complexity of software development and maintenance. Programmers therefore need tools that help them analyze programs: identifying relationships between internal elements, uncovering patterns, and calculating quality metrics. However, most such tools can parse only a limited set of programming languages and have high acquisition costs, so new methods are needed to analyze code written in multiple programming languages. This article describes the design of a method that automatically transforms the syntax of various programming languages into a universal language with a generic syntax. The generic language encapsulates the particularities of each source language, so that programs can be analyzed in a single syntax rather than in many. The advantage of this approach is that only one analysis engine is needed to study the programs, rather than one code analyzer per language.
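The abstract describes the approach only at the level of architecture. The following is a minimal sketch of the general idea, not the authors' implementation. It assumes that each language-specific parser output can be mapped onto a shared set of generic node kinds. The class names (`GNode`, `JavaMethodDecl`, `PythonFunctionDef`), the node kind `"FunctionDeclaration"` and the metric `count_functions` are all hypothetical illustrations and are not taken from the GAST schema. The sketch is written in Python for brevity and does not use any real parser.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical generic node: every source language maps onto these same kinds.
@dataclass
class GNode:
    kind: str                      # e.g. "Program", "FunctionDeclaration"
    name: str = ""
    children: list[GNode] = field(default_factory=list)


# Hypothetical language-specific nodes, standing in for real parser output.
@dataclass
class JavaMethodDecl:
    identifier: str


@dataclass
class PythonFunctionDef:
    name: str


# One small mapper per source language: the only language-aware code.
def from_java(node: JavaMethodDecl) -> GNode:
    return GNode("FunctionDeclaration", node.identifier)


def from_python(node: PythonFunctionDef) -> GNode:
    return GNode("FunctionDeclaration", node.name)


# A single analysis engine that only understands generic nodes.
def count_functions(root: GNode) -> int:
    own = 1 if root.kind == "FunctionDeclaration" else 0
    return own + sum(count_functions(child) for child in root.children)


# A "mixed" program whose functions came from two different languages.
program = GNode("Program", "mixed", [
    from_java(JavaMethodDecl("getTotal")),
    from_python(PythonFunctionDef("parse_input")),
])
print(count_functions(program))  # prints 2
```

Only the mappers change when a new language is supported. The analysis function `count_functions` stays the same, which is the single-engine advantage the abstract claims.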

Published

2023-10-01

How to Cite

Leiton-Jimenez, J., Barboza-Artavia, L., Gonzalez-Torres, A., Brenes-Jimenez, P., Pacheco-Portuguez, S., Navas-Su, J., … Arce-Orozco, A. (2023). GAST: A generic AST representation for language-independent source code analysis. Enfoque UTE, 14(4), 9–18. https://doi.org/10.29019/enfoqueute.957

Issue

Vol. 14 No. 4 (2023)

Section

Miscellaneous