Publications

Published works directly discussed in the thesis

01 :: Learning to predict software testability

Software testability is the propensity of code to reveal its existing faults, particularly during automated testing. Testing success depends both on the testability of the program under test and on the coverage achieved by the test data that a given test data generation algorithm provides. However, little empirical evidence clarifies whether and how software testability affects test coverage. In this article, we propose a method to shed light on this subject. Our framework uses the coverage of the Software Under Test (SUT), obtained from different automatically generated test suites, to build machine learning models that determine the testability of programs from a large set of source code metrics. The resulting models can predict the code coverage that a given test data generation algorithm would provide before the algorithm is run, reducing the cost of additional testing. The predicted coverage is used as a concrete proxy to quantify source code testability. Experiments show an acceptable accuracy of 81.94% in measuring and predicting software testability.
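
The core idea is a regression model that maps per-class source code metrics to the coverage achieved by generated test suites. The snippet below is a minimal sketch of that setup using scikit-learn; the metric names, the tiny stand-in data, and the choice of regressor are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: predicting the coverage a test generation tool would achieve
# from source code metrics, as a proxy for testability. The tiny DataFrame is
# a made-up stand-in for per-class metric vectors labelled with the measured
# coverage of automatically generated test suites.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

data = pd.DataFrame({
    "loc":        [120, 45, 300, 80, 60, 210, 150, 95],
    "cyclomatic": [14, 4, 35, 9, 5, 22, 17, 10],
    "fan_out":    [6, 2, 12, 4, 3, 9, 7, 5],
    "coverage":   [0.55, 0.92, 0.31, 0.78, 0.88, 0.42, 0.50, 0.73],
})
X, y = data.drop(columns=["coverage"]), data["coverage"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)

# The trained model estimates coverage (and hence testability) for unseen classes.
print("R2 on held-out classes:", r2_score(y_test, model.predict(X_test)))
```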

02 :: Learning to predict test effectiveness

The high cost of testing can be dramatically reduced, provided that the coverability of the code under test, as an inherent feature, is predictable. This article offers a machine learning model to predict the extent to which tests can cover a class in terms of a new metric called Coverageability. The prediction model consists of an ensemble of four regression models. The learning samples consist of feature vectors whose features are source code metrics computed for a class. The samples are labeled by the Coverageability values computed for their corresponding classes. We offer a mathematical model to evaluate test effectiveness in terms of the size and coverage of the test suite generated automatically for each class. We extend the size of the feature space by introducing a new approach to defining submetrics in terms of existing source code metrics. Using feature importance analysis on the learned prediction models, we sort source code metrics in order of their impact on test effectiveness; as a result, we found class strict cyclomatic complexity to be the most influential source code metric. Our experiments with the prediction models on a large corpus of Java projects containing about 23,000 classes demonstrate a Mean Absolute Error (MAE) of 0.032, a Mean Squared Error (MSE) of 0.004, and an R2 score of 0.855. Compared with state-of-the-art coverage prediction models, our models improve MAE, MSE, and R2 score by 5.78%, 2.84%, and 20.71%, respectively.
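
The abstract mentions an ensemble of four regression models over source-code-metric feature vectors. The snippet below is a minimal sketch of such an ensemble using scikit-learn's VotingRegressor; the four base learners, the synthetic stand-in data, and the evaluation metric are illustrative assumptions rather than the paper's actual configuration.

```python
# Minimal sketch of an ensemble of four regression models predicting
# Coverageability from source code metrics. The base learners and the
# synthetic data are illustrative assumptions.
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

# Stand-in data: in practice X would hold source code metric vectors per class
# and y the Coverageability labels computed from generated test suites.
X, y = make_regression(n_samples=1000, n_features=50, noise=0.1, random_state=0)

ensemble = VotingRegressor(estimators=[
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("gbr", GradientBoostingRegressor(random_state=0)),
    ("ridge", Ridge(alpha=1.0)),
    ("mlp", MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)),
])

scores = cross_val_score(ensemble, X, y, cv=5, scoring="neg_mean_absolute_error")
print("MAE per fold:", -scores)
```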

03 :: An ensemble meta-estimator to predict source code testability

Unlike most other software quality attributes, testability cannot be evaluated solely based on the characteristics of the source code. The effectiveness of the test suite and the budget assigned to the test highly impact the testability of the code under test. The size of a test suite determines the test effort and cost, while the coverage measure indicates the test effectiveness. Therefore, testability can be measured based on the coverage and number of test cases provided by a test suite, considering the test budget. This paper offers a new equation to estimate testability regarding the size and coverage of a given test suite. The equation has been used to label 23,000 classes belonging to 110 Java projects with their testability measure. The labeled classes were vectorized using 262 metrics. The labeled vectors were fed into a family of supervised machine learning algorithms, regression, to predict testability in terms of the source code metrics. Regression models predicted testability with an R2 of 0.68 and a mean squared error of 0.03, suitable in practice. Fifteen software metrics highly affecting testability prediction were identified using a feature importance analysis technique on the learned model. The proposed models have improved mean absolute error by 38% due to utilizing new criteria, metrics, and data compared with the relevant study on predicting branch coverage as a test criterion. As an application of testability prediction, it is demonstrated that automated refactoring of 42 smelly Java classes targeted at improving the 15 influential software metrics could elevate their testability by an average of 86.87%.
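
The paper reports identifying the fifteen metrics that most affect testability prediction via feature importance analysis. The snippet below sketches one common way to do this, permutation importance in scikit-learn; the synthetic data, hypothetical metric names, and choice of model are assumptions for illustration, and the paper's actual technique and ranking may differ.

```python
# Minimal sketch: ranking source code metrics by their influence on a learned
# testability model using permutation importance. Data and metric names are
# placeholders; the paper's exact feature importance technique may differ.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Stand-in for 262 source code metrics per class and a testability label.
X, y = make_regression(n_samples=2000, n_features=262, noise=0.1, random_state=0)
metric_names = [f"metric_{i}" for i in range(X.shape[1])]   # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
top15 = np.argsort(result.importances_mean)[::-1][:15]
for i in top15:
    print(f"{metric_names[i]}: {result.importances_mean[i]:.4f}")
```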

04 :: Method name recommendation based on source code metrics

Method naming is a critical factor in program comprehension, affecting software quality. State-of-the-art naming techniques use deep learning to compute methods' similarity based on their textual contents. They depend highly on identifier names and do not capture the semantic interrelations among methods' instructions. Source code metrics capture such semantic interrelations. This article proposes using source code metrics to measure semantic and structural cross-project similarities. The metrics constitute the features of a KNN model that determines the k most similar methods to a given method. Experiments on the proposed model with 4,000,000 Java methods demonstrate improvements over the state of the art of 4.25% in precision and 12.08% in recall.
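
The abstract describes a KNN model over metric vectors that returns the k most similar methods, whose names can then inform a recommendation. The snippet below is a minimal sketch of that retrieval step with scikit-learn's NearestNeighbors; the metric vectors, metric choices, and method names are made-up placeholders.

```python
# Minimal sketch: retrieving the k most similar methods by their source code
# metric vectors, as a basis for method name recommendation. The vectors and
# method names below are made-up placeholders.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

method_names = ["readFile", "writeFile", "parseJson", "computeHash", "sendRequest"]
metric_vectors = np.array([
    [12, 3, 2, 5],    # e.g., LOC, cyclomatic complexity, parameters, fan-out
    [14, 3, 2, 6],
    [40, 9, 1, 12],
    [22, 5, 2, 4],
    [30, 7, 3, 10],
], dtype=float)

scaler = StandardScaler().fit(metric_vectors)
knn = NearestNeighbors(n_neighbors=3).fit(scaler.transform(metric_vectors))

query = scaler.transform([[13, 3, 2, 5]])          # metrics of the method to name
distances, indices = knn.kneighbors(query)
print("Most similar methods:", [method_names[i] for i in indices[0]])
```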

05 :: A systematic literature review on the code smells datasets and validation mechanisms

The accuracy reported for code smell detection tools varies depending on the dataset used to evaluate them. Our survey of 45 existing datasets reveals that the adequacy of a dataset for detecting smells highly depends on relevant properties such as its size, severity levels, project types, the number of each type of smell, the total number of smells, and the ratio of smelly to non-smelly samples. Most existing datasets support God Class, Long Method, and Feature Envy, while six smells in Fowler and Beck's catalog are not supported by any dataset. We conclude that existing datasets suffer from imbalanced samples, lack of support for severity levels, and restriction to the Java language.

06 :: A systematic literature review on source code similarity measurement and clone detection: Techniques, applications, and challenges

Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation and the detection of duplicate code, plagiarism, malware, and code smells. This paper presents a systematic literature review and meta-analysis of code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques in five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while many programming languages have no support at all. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate code, of which only eight are publicly accessible. The lack of reliable datasets, empirical evaluations, and hybrid methods, together with the limited focus on multi-paradigm languages, constitutes the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance phase.

Published works relevant to the thesis

07 :: An automated extract method refactoring approach to correct the long method code smell

Long Method is among the most common code smells in software systems. Despite various attempts to detect the Long Method code smell, few automated approaches have been presented to refactor it. Extract Method refactoring is mainly applied to eliminate the Long Method smell. However, current approaches still face serious problems such as insufficient accuracy in detecting refactoring opportunities, limitations on correction types, the need for human intervention in the refactoring process, and lack of attention to object-oriented principles, mainly the single responsibility and cohesion–coupling principles. This paper aims to automatically identify and refactor Long Method smells in Java code using advanced graph analysis techniques, addressing the aforementioned difficulties. First, a graph representing the project entities is created. Then, Long Method smells are detected, considering the methods' dependencies and sizes. All possible refactorings are then extracted and ranked by a modularity metric, emphasizing high cohesion and low coupling for the classes of the detected methods. Next, a proper name is assigned to the extracted method based on its responsibility, and the best destination class is determined such that design modularity is maximized. Experts' opinions were used to evaluate the proposed approach on five different Java projects. The results show the applicability of the proposed method in establishing the single responsibility principle, with a 21% improvement compared to state-of-the-art Extract Method refactoring approaches.
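
The abstract describes ranking candidate Extract Method refactorings by a modularity metric computed over a graph of program entities. The snippet below is a minimal sketch of that idea using networkx: statements become nodes, shared-variable dependencies become weighted edges, and each candidate split of the method body is scored with graph modularity. The statement graph and candidate partitions are made-up placeholders, and the paper's actual graph construction and metric may differ.

```python
# Minimal sketch: scoring candidate Extract Method splits with a graph
# modularity metric. Nodes are statements of a long method; edges connect
# statements that share variables. The graph and the candidate partitions
# below are made-up placeholders, not derived from a real program.
import networkx as nx
from networkx.algorithms.community import modularity

G = nx.Graph()
# Statements s1..s6 of a hypothetical long method; edge weights count shared variables.
G.add_weighted_edges_from([
    ("s1", "s2", 2), ("s2", "s3", 1),                    # first block of related statements
    ("s4", "s5", 3), ("s5", "s6", 2), ("s4", "s6", 1),   # second, more cohesive block
    ("s3", "s4", 1),                                     # weak coupling between the blocks
])

# Candidate refactorings: each is a partition into "kept" and "extracted" statements.
candidates = [
    [{"s1", "s2", "s3"}, {"s4", "s5", "s6"}],
    [{"s1", "s2"}, {"s3", "s4", "s5", "s6"}],
]

# Higher modularity means higher cohesion within parts and lower coupling between them.
for parts in candidates:
    print(parts, "modularity =", round(modularity(G, parts, weight="weight"), 3))
```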

08 :: Format-aware learn&fuzz: deep test data generation for efficient fuzzing

Appropriate test data are a crucial factor in successful fuzz testing. Most real-world applications, however, accept inputs with a complex structure, containing data surrounded by meta-data, which are processed in several stages comprising parsing and rendering (execution). The complex structure of some input files makes it difficult to generate efficient test data automatically. The success of deep learning in coping with complex tasks, specifically generative tasks, has motivated us to exploit it in the context of test data generation for complicated structures such as PDF files. In this respect, a neural language model (NLM) based on deep recurrent neural networks (RNNs) is used to learn the structure of complex inputs. To target both the parsing and rendering steps of the software under test (SUT), our approach generates new test data while distinguishing between data and meta-data, which significantly improves input fuzzing. To assess the proposed approach, we have developed a modular file format fuzzer, IUST-DeepFuzz. Our experimental results demonstrate the relatively high coverage of MuPDF code achieved by IUST-DeepFuzz in comparison with state-of-the-art tools such as learn&fuzz, AFL, Augmented-AFL, and random fuzzing. In summary, our experiments with many deep learning models revealed that the simpler the deep learning model applied to generate test data, the higher the code coverage of the software under test.
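
The abstract describes learning a recurrent neural language model over complex file formats and then sampling new test data from it. The sketch below shows the general shape of such a character-level RNN language model in PyTorch; the toy corpus, model sizes, training length, and sampling temperature are illustrative assumptions and not the configuration of IUST-DeepFuzz.

```python
# Minimal sketch: a character-level recurrent language model that learns the
# structure of seed inputs and samples new fuzzing data. The training corpus
# here is a tiny hypothetical PDF-object-like string; a real setup would train
# on a large corpus of input files, as the paper describes.
import torch
import torch.nn as nn

corpus = "1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj\n" * 200  # toy stand-in
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}
data = torch.tensor([stoi[c] for c in corpus], dtype=torch.long)

class CharLM(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, 64)
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state

model = CharLM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
seq_len, batch = 64, 32

for step in range(200):                      # short training loop for the sketch
    idx = torch.randint(0, len(data) - seq_len - 1, (batch,))
    x = torch.stack([data[i:i + seq_len] for i in idx.tolist()])
    y = torch.stack([data[i + 1:i + seq_len + 1] for i in idx.tolist()])
    logits, _ = model(x)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sample new test data character by character (temperature sampling).
model.eval()
out, state, x = [], None, data[:1].unsqueeze(0)
with torch.no_grad():
    for _ in range(200):
        logits, state = model(x, state)
        probs = torch.softmax(logits[0, -1] / 0.8, dim=-1)
        nxt = torch.multinomial(probs, 1)
        out.append(itos[nxt.item()])
        x = nxt.unsqueeze(0)
print("".join(out))
```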

Preprints and works under review

09 :: Supporting single responsibility through automated extract method refactoring

  • (Related chapters: Ch. 5, Ch. 6)

10 :: Measuring and improving software testability at the design level

  • (Related chapters: Ch. 6)

11 :: Flipped boosting of automatic test data generation frameworks through a many-objective program transformation approach

  • (Related chapters: Ch. 7)

12 :: Natural language requirements testability measurement based on requirement smells

  • (Related chapters: Ch. 8)

13 :: Testability-driven development: An improvement to the TDD efficiency

  • (Related chapters: Ch. 8)