Profile Picture

Greta Dolcetti

PhD Student @ Ca' Foscari University of Venice

SSV Research Group

Short Bio

I obtained my Bachelor's Degree in Computer Science in September 2021 and my Master's Degree in July 2023, both from the University of Parma. During my Master's thesis, titled "Abstract Compilation Techniques for Static Analysis", I began working in the field of static analysis by abstract interpretation. Under the supervision of Professors Arceri and Zaffanella, I implemented two abstract compilation techniques focusing on the accuracy/efficiency trade-off. During my Master's degree, I also collaborated with the Department of Food and Drugs at the University of Parma on a research project that involved using constraint programming and molecular modeling methods to predict possible protein mutations of the SARS-CoV-2 virus.

I am currently a Ph.D. student at Ca’ Foscari University of Venice, where my research interests include static analysis, abstract interpretation, and, more broadly, formal methods for program verification with a major focus on data science and artificial intelligence.

Education

Ph.D. Student in Computer Science

Ca' Foscari University of Venice
09/2023 – Current
Venice, Italy

Master's Degree in Computer Science

Thesis: Abstract Compilation Techniques for Static Analysis

Advisor: Prof. Enea Zaffanella

University of Parma
09/2021 – 13/07/2023
Parma, Italy
Final grade: 110L/110 | Level in EQF: 7

The main contribution of the thesis was the implementation of two instances of abstract compilation techniques in the Crab/Clam static analysis framework: the split operator and the likely unconstrained variable oracles.

The former provides a new operator and approach for the filter operation on a predicate on abstract domains, avoiding the inefficiencies of the default procedure, and improving the efficiency of the analysis without losing precision.

The latter proposes four different implementations of an oracle whose purpose is to suggest which variables are likely to be unconstrained in the CFG's statements. By systematically propagating the guessed lack of information, the oracle guides the simplification of the abstract program, leading to remarkable speedups.

Bachelor's Degree in Computer Science

Thesis: Development of an Interface for Explainable AI for CAT Scan Analysis

Advisor: Prof. Alessandro Dal Palù

University of Parma
09/2018 – 09/2021
Parma, Italy
Final grade: 110/110 | Level in EQF: 6

The main contribution of the thesis is the implementation of a framework that allows the user to understand clearly, in natural language, the motivations behind the result (i.e., the arteries retrieved inside the CAT scan and their connections) produced by the Answer Set Programming reasoner. The framework allows the user to also add new rules, expressed in natural language and then translated in AnsProlog, to the knowledge base used by the reasoner.

High School Diploma (Maturità Classica)

Liceo Classico Enrico Fermi
09/2013 – 06/2018
Salò, Italy
Final grade: 89/100

Publications

Dolcetti G., Cortesi A., Urban C. & Zaffanella E. (2024). Towards a High Level Linter for Data Science. Proceedings of 10th International Workshop on Numerical and Symbolic Abstract Domains (NSAD 2024), 22 October 2024, Pasadena, California, United States.

TBA

Due to its interdisciplinary nature, the development of data science code is subject to a wide range of potential mistakes that can easily compromise the final results. Several tools have been proposed that can help the data scientist in identi- fying the most common, low level programming issues. We discuss the steps needed to implement a tool that is rather meant to focus on higher level errors that are specific of the data science pipeline. To this end, we propose a static analysis assigning ad hoc abstract datatypes to the program variables, which are then checked for consistency when call- ing functions defined in data science libraries. By adopting a descriptive (rather than prescriptive) abstract type system, we obtain a linter tool reporting data science related code smells. While being still work in progress, the current proto- type is able to identify and report the code smells contained in several examples of questionable data science code.

Arceri V., Merenda S. M., Dolcetti G., Negrini L., Olivieri L., & Zaffanella E. (2024). Towards a Sound Construction of EVM Bytecode Control-flow Graphs. Proceedings of Formal Techniques for Java-like Programs (FTfJP 2024), 20 September 2024, Vienna, Austria.

@inproceedings{DBLP:conf/ftfjp/ArceriMDNOZ24, author = {Vincenzo Arceri and Saverio Mattia Merenda and Greta Dolcetti and Luca Negrini and Luca Olivieri and Enea Zaffanella}, editor = {Luca Di Stefano}, title = {Towards a Sound Construction of {EVM} Bytecode Control-Flow Graphs}, booktitle = {Proceedings of the 26th {ACM} International Workshop on Formal Techniques for Java-like Programs, FTfJP 2024, Vienna, Austria, 20 September 2024}, pages = {11--16}, publisher = {{ACM}}, year = {2024}, url = {https://doi.org/10.1145/3678721.3686227}, doi = {10.1145/3678721.3686227}, timestamp = {Tue, 22 Oct 2024 21:07:21 +0200}, biburl = {https://dblp.org/rec/conf/ftfjp/ArceriMDNOZ24.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }

Ethereum enables the creation and execution of decentralized applications through smart contracts, that are compiled to Ethereum Virtual Machine (EVM) bytecode. Once deployed in the blockchain, the bytecode is immutable; hence, ensuring that smart contracts are bug-free before their deployment is of utmost importance. A crucial preliminary step for any effective static analysis of EVM bytecode is the extraction of the control-flow graph (CFG): this presents significant challenges due to dynamically computed jump destinations. In this paper we present a novel approach, based on abstract intepretation, aiming at building a sound CFG from EVM bytecode smart contracts. Our analysis, which is implemented in our static analyzer EVMLiSA, is based on a parametric abstract domain that approximates concrete execution stacks at each program point as a $l$-sized set of abstract stacks of maximal height $h$; the results of the analysis are then used to resolve the jump destinations at jump nodes. On our preliminary experiments, by fine-tuning the analysis parameters, EVMLiSA achives sound CFGs for all smart contracts where permantent storage-related opcodes do not influence jump destinations.

Arceri V., Dolcetti G. & Zaffanella E. (2024). Speeding up Static Analysis with the Split Operator. International Journal on Software Tools for Technology Transfer, Special Issue for SOAP 2023.

@article{DBLP:journals/sttt/ArceriDZ24, author = {Vincenzo Arceri and Greta Dolcetti and Enea Zaffanella}, title = {Speeding up static analysis with the split operator}, journal = {Int. J. Softw. Tools Technol. Transf.}, volume = {26}, number = {5}, pages = {573--588}, year = {2024}, url = {https://doi.org/10.1007/s10009-024-00761-2}, doi = {10.1007/S10009-024-00761-2}, timestamp = {Mon, 07 Oct 2024 22:58:44 +0200}, biburl = {https://dblp.org/rec/journals/sttt/ArceriDZ24.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }

In the context of abstract interpretation-based static analysis, we propose a new abstract operator modeling the split of control flow paths: the goal of the operator is to enable a more efficient analysis when using abstract domains that are computationally expensive, having no negative effect on precision, and occasionally resulting in a more precise analysis. We focus on the case of conditional branches guarded by numeric linear constraints, including implicit numerical branches. We provide an experimental evaluation of real-world test cases, showing that by using the split operator we can achieve significant efficiency improvements with respect to the classical approach for a static analysis based on the domain of convex polyhedra. We also briefly discuss the applicability of this new operator to different, possibly non-numeric abstract domains.

Arceri V., Dolcetti G. & Zaffanella E. (2023). Speeding up Static Analysis with the Split Operator. Proceedings of 12th ACM SIGPLAN International Workshop on the State of the Art in Program Analysis, SOAP@PLDI 2023, June 17, 2023, Orlando, Florida, United States.

@inproceedings{DBLP:conf/pldi/ArceriDZ23, author = {Vincenzo Arceri and Greta Dolcetti and Enea Zaffanella}, editor = {Pietro Ferrara and Liana Hadarean}, title = {Speeding up Static Analysis with the Split Operator}, booktitle = {Proceedings of the 12th {ACM} {SIGPLAN} International Workshop on the State Of the Art in Program Analysis, {SOAP} 2023, Orlando, FL, USA, 17 June 2023}, pages = {14--19}, publisher = {{ACM}}, year = {2023}, url = {https://doi.org/10.1145/3589250.3596141}, doi = {10.1145/3589250.3596141}, timestamp = {Sun, 12 Nov 2023 02:12:56 +0100}, biburl = {https://dblp.org/rec/conf/pldi/ArceriDZ23.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }

In the context of static analysis based on Abstract Interpretation, we propose a new abstract operator modeling the split of control flow paths: the goal of the operator is to enable a more efficient analysis when using abstract domains that are computationally expensive, having no effect on precision. Focusing on the case of conditional branches guarded by numeric linear constraints, we provide a preliminary experimental evaluation showing that, by using the split operator, we can achieve significant efficiency improvements for a static analysis based on the domain of convex polyhedra. We also briefly discuss the applicability of this new operator to different, possibly non-numeric abstract domains.

Arceri V., Dolcetti G. & Zaffanella E. (2023). Unconstrained Variable Oracles for Faster Static Analyses. Proceedings of 30th Static Analysis Symposium, SAS 2023, Sun 22 - Tue 24 October 2023 Cascais, Portugal.

@inproceedings{DBLP:conf/sas/ArceriDZ23, author = {Vincenzo Arceri and Greta Dolcetti and Enea Zaffanella}, editor = {Manuel V. Hermenegildo and Jos{\'{e}} F. Morales}, title = {Unconstrained Variable Oracles for Faster Numeric Static Analyses}, booktitle = {Static Analysis - 30th International Symposium, {SAS} 2023, Cascais, Portugal, October 22-24, 2023, Proceedings}, series = {Lecture Notes in Computer Science}, volume = {14284}, pages = {65--83}, publisher = {Springer}, year = {2023}, url = {https://doi.org/10.1007/978-3-031-44245-2\_5}, doi = {10.1007/978-3-031-44245-2\_5}, timestamp = {Thu, 09 Nov 2023 21:13:04 +0100}, biburl = {https://dblp.org/rec/conf/sas/ArceriDZ23.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }

In the context of static analysis based on Abstract Interpretation, we propose a lightweight pre-analysis step which is meant to suggest, at each program point, which program variables are likely to be unconstrained for a specific class of numeric abstract properties. Using the outcome of this pre-analysis as an oracle, we simplify the statements of the program being analyzed by propagating this lack of information, aiming at fine-tuning the precision/efficiency trade-off of the analysis. A preliminary experimental evaluation shows that the idea underlying the approach is promising, as it improves the efficiency of the more costly analysis while having a limited effect on its precision.

Cozzini, P., Agosta, F., Dolcetti, G., & Dal Palù, A. (2023). A Computational Workflow to Predict Biological Target Mutations: The Spike Glycoprotein Case Study. Molecules, 28(20), 7082.

@Article{molecules28207082, AUTHOR = {Cozzini, Pietro and Agosta, Federica and Dolcetti, Greta and Dal Palù, Alessandro}, TITLE = {A Computational Workflow to Predict Biological Target Mutations: The Spike Glycoprotein Case Study}, JOURNAL = {Molecules}, VOLUME = {28}, YEAR = {2023}, NUMBER = {20}, ARTICLE-NUMBER = {7082}, URL = {https://www.mdpi.com/1420-3049/28/20/7082}, PubMedID = {37894561}, ISSN = {1420-3049}, DOI = {10.3390/molecules28207082} }

The biological target identification process, a pivotal phase in the drug discovery workflow, becomes particularly challenging when mutations affect proteins’ mechanisms of action. COVID-19 Spike glycoprotein mutations are known to modify the affinity toward the human angiotensin-converting enzyme ACE2 and several antibodies, compromising their neutralizing effect. Predicting new possible mutations would be an efficient way to develop specific and efficacious drugs, vaccines, and antibodies. In this work, we developed and applied a computational procedure, combining constrained logic programming and careful structural analysis based on the Structural Activity Relationship (SAR) approach, to predict and determine the structure and behavior of new future mutants. “Mutations rules” that would track statistical and functional types of substitutions for each residue or combination of residues were extracted from the GISAID database and used to define constraints for our software, having control of the process step by step. A careful molecular dynamics analysis of the predicted mutated structures was carried out after an energy evaluation of the intermolecular and intramolecular interactions using the HINT (Hydrophatic INTeraction) force field. Our approach successfully predicted, among others, known Spike mutants.

Cozzini, P., Agosta, F., Dolcetti, G., & Righi, G. (2022). How a Blockchain Approach Can Improve Data Reliability in the COVID-19 Pandemic. ACS Medicinal Chemistry Letters, 13(4), 517-519.

@article{cozzini2022blockchain, title={How a Blockchain Approach Can Improve Data Reliability in the COVID-19 Pandemic}, author={Cozzini, Pietro and Agosta, Federica and Dolcetti, Greta and Righi, Gianfranco}, journal={ACS Medicinal Chemistry Letters}, volume={13}, number={4}, pages={517--519}, year={2022}, publisher={ACS Publications} }

The rapid spread of COVID-19 made it necessary to quickly collect and share viral genomic sequences, sometimes making quantity prevail over the quality of information. Can research pay this price? Blockchain technology, based on the concept of a ledger that guarantees the authenticity and traceability of information, could be the best applicable solution.

Work Experience & Teaching

Research Internship

Inria
Paris, France
09/2024 – 12/2024

Visiting Ph.D. student at Inria in the ANTIQUE research group, hosted at the École normale supérieure in Paris, France.
Topic: static analysis by abstract interpretation of data science and machine learning software.

Research Scholarship

University of Parma
Parma, Italy
12/2021 – 03/2022

Research by constraint programming and molecular modeling methods of possible protein mutations of the SARS-CoV-2 virus.

Teaching Assistant

University of Parma
09/2022 – 06/2023
Parma, Italy

Course: Fondamenti di Programmazione A+B

Professor: Prof. Arceri Vincenzo

Hours: 48

Tutoring

University of Parma
09/2022 – 06/2023
Parma, Italy

Course: Computer Science B.Sc.

Hours: 48

Teaching Assistant

Ragazze Digitali project
06/2022 – 07/2022
Parma, Italy

Hours: 70

Teaching Assistant

Ragazze Digitali project
08/2023 – 09/2023
Parma, Italy

Hours: 26

Skills

Languages

python cplusplus java sql matlab linux latex

Frameworks and Tools

flask docker kubernetes pytorch scikit-learn pandas numpy matplotlib seaborn git

Projects

Integration of PPLite Domains in Crab/Clam Static Analysis Framework

Contributed to the Crab/Clam static analysis framework by integrating PPLite domains, enhancing the framework's capabilities for abstract interpretation.

Integration of the Decoupling Feature and Domains in Crab/Clam Static Analysis Framework

Implemented the decoupling feature and integrated additional domains into the Crab/Clam static analysis framework, improving its modularity and efficiency.

Implementation of Split Operator and Variable Oracles in Crab/Clam Static Analysis Framework

Developed and implemented the split operator and variable oracles in the Crab/Clam static analysis framework, focusing on enhancing the accuracy/efficiency trade-off in static analysis.

Implementation of Likely Unconstrained Variable Oracles in Crab/Clam Static Analysis Framework

Implemented likely unconstrained variable oracles in the Crab/Clam static analysis framework, providing a more efficient approach to handling unconstrained variables during analysis.

Implementation of a High-Level Linter for Data Science Based on Abstract Data Types in Lyra

Developed a high-level linter for data science applications based on abstract data types within the Lyra framework, aimed at improving code quality and adherence to best practices in data science projects.