GAST: A generic AST representation for language-independent source code analysis
DOI:
https://doi.org/10.29019/enfoqueute.957Keywords:
Code transformation, Generic Abstract Syntax Tree, Generic Language, Code AnalysisAbstract
Organizations use various programming languages to develop their systems. These aim to take advantage of the most appropriate features of each language for a given domain and require programmers to command different languages and also to face the growing complexity of software development and maintenance. So, they need tools to help them analyze programs to identify relationships between their internal elements, uncover patterns, and calculate quality metrics. However, most tools have limited support for parsing multiple programming languages and high acquisition costs. Therefore, there is a need for new methods to analyze code written in multiple programming languages. This article describes the design of a method to automatically transform the syntax of various programming languages into a universal language with a generic syntax. The function of the generic language is to encapsulate the specificities of each specific language, so that the analysis of programs is facilitated in a single programming syntax and not in multiple syntaxes. The advantage of this approach is that only one analysis engine is required, not multiple code analyzers, to study the programs.
Downloads
References
S. Nanz and C. A. Furia, “A Comparative Study of Programming Languages in Rosetta Code,” p. 778–788, 2015, https://doi.org/10.1109/ICSE.2015.90.
Z. Mushtaq, G. Rasool, and B. Shehzad, “Multilingual Source Code Analysis: A Systematic Literature Review,” IEEE Access, vol. 5, pp. 11 307–11 336, 2017, https://doi.org/10.1109/ACCESS.2017.2710421.
M. D’Ambros, H. Gall, M. Lanza, and M. Pinzger, “Analyzing software repositories to understand software evolution,” pp. 37–67, 2008, ISBN: 978-3-540-76440-3.
Sonar, “Sonarqube,” Electronic, Jun 2023. [Online]. Available: https://www.sonarsource.com.
O. Nierstrasz, S. Ducasse, and T. Gˇırba, “The Story of Moose: an Agile Reengineering Environment,” ACM SIGSOFT Software Engineering Notes, vol. 30, no. 5, pp. 1–10, 2005, https://doi.org/10.1145/1095430.1081707.
M. Papadakis, M. Delamaro, and Y. Le Traon, “Mitigating the Effects of Equivalent Mutants with Mutant Classification Strategies,” Science of Computer Programming, vol. 95, pp. 298–319, 2014, https://doi.org/10.1016/j.scico.2014.05.012.
F. A. Bastidas and M. Pe´rez, “A Systematic Review on Transpiler Usage for Transaction-Oriented Applications,” in 2018 IEEE Third Ecuador Technical Chapters Meeting (ETCM). IEEE, 2018, pp. 1–6, https://doi.org/10.1109/ETCM.2018.8580312.
D. L. Whitfield and M. L. Soffa, “An Approach for Exploring Code Improving Transformations,” ACM Transactions on Programming Languages Systems, vol. 19, no. 6, p. 1053–1084, Nov 1997, https://doi.org/10.1145/267959.267960.
K. An, N. Meng, and E. Tilevich, “Automatic Inference of Java-to-Swift Translation Rules for Porting Mobile Applications,” in Proceedings of the 5th International Conference on Mobile Software Engineering and Systems, 2018, pp. 180–190, https://doi.org/10.1145/3197231.
T. Dirgahayu, S. N. Huda, Z. Zukhri, and C. I. Ratnasari, “Automatic Translation from Pseudocode to Source Code: A Conceptual-Metamodel Approach,” in 2017 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom). IEEE, 2017, pp. 122–128, https://doi.org/10.1109/CYBERNETICSCOM.2017.8311696.
N. Shetty, N. Saldanha, and M. Thippeswamy, “CRUST: AC/C++ to Rust Transpiler Using a “Nano-parser Methodology” to Avoid C/C++ Safety Issues in Legacy Code,” pp. 241–250, 2019, ISBN:978-981-16-1344-9.
M. Drissi, O. Watkins, A. Khant, V. Ojha, P. Sandoval, R. Segev, E. Weiner, and R. Keller, “Program Language Translation Using a Grammar-Driven Tree-to-Tree Model,” arXiv preprint arXiv:1807.01784, 2018, https://doi.org/10.48550/arXiv.1807.01784.
“Tree-to-Tree Neural Networks for Program Translation, author=Chen, Xinyun and Liu, Chang and Song, Dawn, journal=arXiv preprint arXiv:1802.03691, year=2018, note=”https://doi.org/10.48550/arxiv.1802.03691”.
M.-A. Lachaux, B. Roziere, L. Chanussot, and G. Lample, “Unsupervised Translation of Programming Languages,” arXiv preprint arXiv:2006.03511, 2020, https://doi.org/10.48550/arXiv.2006.03511.
P. Yin and G. Neubig, “A Syntactic Neural Model for General-Purpose Code Generation,” arXiv preprint arXiv:1704.01696, 2017, https://doi.org/10.48550/arXiv.1704.01696.
M. Rabinovich, M. Stern, and D. Klein, “Abstract Syntax Networks for Code Generation and Semantic Parsing,” arXiv preprint arXiv:1704.07535, 2017, https://doi.org/10.48550/arXiv.1704.07535.
Y. Oda, H. Fudaba, G. Neubig, H. Hata, S. Sakti, T. Toda, and S. Nakamura, “Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation,” in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 574–584, https://doi.org/10.1109/ASE.2015.36.
A. T. Nguyen, T. T. Nguyen, and T. N. Nguyen, “Migrating Code with Statistical Machine Translation,” in Companion Proceedings of the 36th International Conference on Software Engineering, 2014, pp. 544–547, https://doi.org/10.1145/2591062.2591072.
OMG. (2016) About the Meta Object Facility Specification, Version 2.5.1. [Online]. Available: https://www.omg.org/spec/MOF.
T. Parr, The Definitive ANTLR 4 Reference. Pragmatic Bookshelf, 2013, ISBN=978-1-934356-99-9.
Mapstruct. (2018, Jul) Mapstruct Java Bean. [Online]. Available: https://maven.apache.org/.
W. Wang, G. Li, B. Ma, X. Xia, and Z. Jin, “Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree,” in 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2020, pp. 261–271, https://doi.org/10.1109/SANER48275.2020.9054857.
J. Navas-Sú and A. González-Torres, “A Method to Extract Indirect Coupling and Measure its Complexity,” in 2018 International Conference on Information Systems and Computer Science (INCISCOS). IEEE, 2018, pp. 186–192, https://doi.org/10.1109/INCISCOS.2018.00034.
Published
How to Cite
Issue
Section
License
Copyright (c) 2023 Los Autores
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
The articles and research published by the UTE University are carried out under the Open Access regime in electronic format. This means that all content is freely available without charge to the user or his/her institution. Users are allowed to read, download, copy, distribute, print, search, or link to the full texts of the articles, or use them for any other lawful purpose, without asking prior permission from the publisher or the author. This is in accordance with the BOAI definition of open access. By submitting an article to any of the scientific journals of the UTE University, the author or authors accept these conditions.
The UTE applies the Creative Commons Attribution (CC-BY) license to articles in its scientific journals. Under this open access license, as an author you agree that anyone may reuse your article in whole or in part for any purpose, free of charge, including commercial purposes. Anyone can copy, distribute or reuse the content as long as the author and original source are correctly cited. This facilitates freedom of reuse and also ensures that content can be extracted without barriers for research needs.
This work is licensed under a Creative Commons Attribution 3.0 International (CC BY 3.0).
The Enfoque UTE journal guarantees and declares that authors always retain all copyrights and full publishing rights without restrictions [© The Author(s)]. Acknowledgment (BY): Any exploitation of the work is allowed, including a commercial purpose, as well as the creation of derivative works, the distribution of which is also allowed without any restriction.