GAST: A generic AST representation for language-independent source code analysis

Jason Leiton-Jimenez; Luis Barboza-Artavia; Antonio Gonzalez-Torres; Pablo Brenes-Jimenez; Steven Pacheco-Portuguez; Jose Navas-Su; Marco Hernández-Vasquez; Jennier Solano-Cordero; Franklin Hernandez-Castro; Ignacio Trejos-Zelaya; Armando Arce-Orozco

doi:10.29019/enfoqueute.957

Authors

Jason Leiton-Jimenez Department of Computer Engineering at the Costa Rica Institute of Technology https://orcid.org/0000-0002-6271-6595
Luis Barboza-Artavia Department of Computer Engineering at the Costa Rica Institute of Technology https://orcid.org/0009-0000-7524-8068
Antonio Gonzalez-Torres Department of Computer Engineering at the Costa Rica Institute of Technology https://orcid.org/0000-0001-5427-0637
Pablo Brenes-Jimenez School of Computing at the Costa Rica Institute of Technology https://orcid.org/0000-0001-8394-3853
Steven Pacheco-Portuguez School of Computing at the Costa Rica Institute of Technology https://orcid.org/0000-0001-7505-1644
Jose Navas-Su School of Computing at the Costa Rica Institute of Technology https://orcid.org/0000-0003-3431-0122
Marco Hernández-Vasquez Department of Computer Engineering at the Costa Rica Institute of Technology https://orcid.org/0000-0002-9432-721X
Jennier Solano-Cordero Department of Computer Engineering at the Costa Rica Institute of Technology https://orcid.org/0000-0002-0983-6512
Franklin Hernandez-Castro School of Industrial Design at the Costa Rica Institute of Technology https://orcid.org/0000-0003-3589-4588
Ignacio Trejos-Zelaya School of Computing at the Costa Rica Institute of Technology https://orcid.org/0000-0003-4361-8444
Armando Arce-Orozco School of Computing at the Costa Rica Institute of Technology https://orcid.org/0000-0001-5005-5745

Keywords:

Code transformation, Generic Abstract Syntax Tree, Generic Language, Code Analysis

Abstract

Organizations develop their systems in several programming languages to take advantage of the features best suited to each domain. This practice requires programmers to master multiple languages while also coping with the growing complexity of software development and maintenance. Programmers therefore need tools that help them analyze programs: identifying relationships among their internal elements, uncovering patterns, and calculating quality metrics. However, most such tools offer limited support for parsing multiple programming languages and have high acquisition costs. New methods are therefore needed to analyze code written in multiple programming languages. This article describes the design of a method that automatically transforms the syntax of various programming languages into a universal language with a generic syntax. The generic language encapsulates the specifics of each source language, so that programs can be analyzed in a single syntax rather than in many. The advantage of this approach is that a single analysis engine, rather than multiple language-specific code analyzers, is sufficient to study the programs.
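The abstract describes the approach only at a high level. The following is a minimal sketch of the general idea, under stated assumptions: each language front end is assumed to produce a language-specific tree (represented here as simple `(kind, name, children)` tuples), a small per-language mapping table converts that tree into one generic tree, and a single analysis then runs unchanged over generic trees derived from either language. The node kinds (`Unit`, `Class`, `Method`, `If`, `Loop`), the `GNode` class, the mapping tables, and the functions `to_gast` and `complexity_per_method` are hypothetical illustrations, not the authors' GAST schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical generic node (an illustration, not the authors' GAST schema).
@dataclass
class GNode:
    kind: str                                   # generic kind, e.g. "Method", "If", "Loop"
    name: str = ""
    children: List["GNode"] = field(default_factory=list)

# Hypothetical per-language mapping tables: language-specific node kind -> generic kind.
# The node-kind names are illustrative only.
JAVA_KINDS = {"CompilationUnit": "Unit", "ClassDeclaration": "Class",
              "MethodDeclaration": "Method", "IfStatement": "If",
              "ForStatement": "Loop", "WhileStatement": "Loop"}
PYTHON_KINDS = {"Module": "Unit", "ClassDef": "Class", "FunctionDef": "Method",
                "If": "If", "For": "Loop", "While": "Loop"}

Tree = Tuple[str, str, list]                    # assumed front-end output: (kind, name, children)

def to_gast(node: Tree, kind_map: Dict[str, str]) -> GNode:
    """Recursively translate a language-specific tree into generic nodes."""
    kind, name, children = node
    return GNode(kind_map.get(kind, "Other"), name,
                 [to_gast(child, kind_map) for child in children])

def complexity_per_method(root: GNode) -> Dict[str, int]:
    """One analysis written once against the generic tree: a McCabe-style
    count (1 + number of If/Loop nodes) for each Method node."""
    result: Dict[str, int] = {}

    def decisions(n: GNode) -> int:
        return (1 if n.kind in ("If", "Loop") else 0) + sum(decisions(c) for c in n.children)

    def walk(n: GNode) -> None:
        if n.kind == "Method":
            result[n.name] = 1 + sum(decisions(c) for c in n.children)
        for c in n.children:
            walk(c)

    walk(root)
    return result

# The same method written in two languages, as hypothetical front-end trees.
java_tree = ("CompilationUnit", "", [("ClassDeclaration", "Account", [
    ("MethodDeclaration", "withdraw", [("IfStatement", "", []), ("ForStatement", "", [])])])])
python_tree = ("Module", "", [("ClassDef", "Account", [
    ("FunctionDef", "withdraw", [("If", "", []), ("While", "", [])])])])

if __name__ == "__main__":
    print(complexity_per_method(to_gast(java_tree, JAVA_KINDS)))      # → {'withdraw': 3}
    print(complexity_per_method(to_gast(python_tree, PYTHON_KINDS)))  # → {'withdraw': 3}
```

In this sketch, adding support for a new language only requires a new mapping table (and a front end that produces its tree), while `complexity_per_method` is left untouched, which mirrors the claim that one analysis engine suffices.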
