Inlining in Soufflé

Soufflé now supports the ability to manually select relations to inline. A talk was presented on the process at the University of Sydney’s Programming Languages and Compiler research group meeting. Slides for the talk can be found here.

Soufflé 1.2.0 Released

Hello! We have the third open-source release of Soufflé for you. Key features are

  1. Interactive provenance browser (taipan-snake)
  2. Compilation to subprograms for separate execution (lyndonhenry)
  3. Magic Sets (azreika)
  4. Sum aggregator fixed
  5. gcc7 optimisations
  6. File IO optimisations
  7. Pragmas to enable argument options (e.g., .pragma "-mtest")
  8. Various minor bug fixes and usability enhancements

PLDI Tutorial

We ran our second tutorial Engineering Static Analyzers with Soufflé on the 23rd of June, 2017, at PLDI’17. We had approximately 25 participants. The tutorial was given by Bernhard Scholz, Pavle Subotic, and Alexander Jordan. You find some of the slides here.

An efficient tunable selective points-to analysis for large codebases

This is the paper that uses Souffle. It was published in SOAP@PLDI’17.


Points-to analysis is a fundamental static program analysis technique for tools including compilers and bug-checkers. Although object-based context sensitivity is known to improve precision of points-to analysis, scaling it for large Java codebases remains a challenge.

In this work, we develop a tunable, client-independent, object-sensitive points-to analysis framework where heap cloning is applied selectively. This approach is aimed at large codebases where standard analysis is typically expensive. Our design includes a pre-analysis that determines program points that contribute to the cost of an object-sensitive points-to analysis. A subsequent analysis then determines the context depth for each allocation site. While our framework can run standalone, it is also possible to tune it – the user of the framework can use the knowledge of the codebase being analysed to influence the selection of expensive program points as well as the process to differentiate the required context-depth. Overall, the approach determines where the cloning is beneficial and where the cloning is unlikely to be beneficial.

We have implemented our approach using Souffle (a Datalog compiler) and an extension of the DOOP framework. Our experiments on large programs, including OpenJDK, show that our technique is efficient and precise. For the OpenJDK, our analysis reduces 27% of runtime and 18% of memory usage in comparison with 2O1H points-to analysis for a negligible loss of precision, while for Jython from the DaCapo benchmark suite, the same analysis reduces 91% of runtime for no loss of precision.

Soufflé 1.1.0 Released

Hello! We have the second open-source release of Soufflé for you. Key features are

  1. Configurable I/O System with more options and support for new language extensions (mmcgr).
  2. Support of equivalence relation using union/find data-structures (pnappa) New profiling tool rewritten in C++ with HTML/Javascript output (DominicRomanowski).
  3. Replacing the Boost C-prepocessor wave by mcpp (pnappa)
  4. Adding ternary functors (b-scholz)
  5. JNI interface (psubotic)
  6. Memory optimisations (lyndonhenry)
  7. Numerous bug fixes.

First conference tutorial

We ran our first tutorial on the 5th of December, 2016, at APSEC’16. We had 11 participants. The tutorial was run by Bernhard Scholz, Raghavendra K. R., and Alexander Jordan. You find some of the slides here.

Apsec'16 Tutorial

Soufflé: On Synthesis of Program Analyzers

This is the tool paper about Souffle. It was published in CAV’16.


Soufflé is an open source programming framework that performs static program analysis expressed in Datalog on very large code bases, including points-to analysis on OpenJDK7 (1.4M program variables, 350K objects, 160K methods) in under a minute. Soufflé is being successfully used for Java security analyses at Oracle Labs due to (1) its high-performance, (2) support for rapid program analysis development, and (3) customizability. Soufflé incorporates the highly flexible Datalog-based program analysis paradigm while exhibiting performance results that are on-par with manually developed state-of-the-art tools. In this tool paper, we introduce the Soufflé architecture, usage and demonstrate its applicability for large-scale code analysis on the OpenJDK7 library as a use case.

Soufflé 1.0.0 Released

Hello! We have the first open-source release of Soufflé for you. Key features are

  1. Continuous Integration / Travis support for Souffle

  2. Automatic packaging for Debian and MAC OS X platform.

  3. Multiple Header clauses, and disjunctions in bodies of clauses.

  4. BOOST’s C-preprocessor called wave adapted so that MAC OS X port is functional without a GCC installation.

  5. Nullary relations (i.e., relations with no attributes become attributes).

  6. Liberal identifiers in Souffle programs, e.g., A(?x,?y) :- B(?y,?x).

  7. Enable type declarations in Souffle’s components.

  8. Added bitwise and logical functors, and binary, and hexadecimal constants.

  9. Configuration files for Doxygen documentation.

  10. Numerous bug fixes in all parts of the system.

Soufflé is open-source!

Oracle Labs made Soufflé open-source on the 8th of March, 2016. Soufflé is a Datalog-like language that compiles programs to efficient parallel C++ code. Currently, Soufflé is used as a domain-specific language for static program analysis, over large code bases with millions of lines of code.

Features of Soufflé

  • Efficient translation to parallel C++ of Datalog programs
  • Extended semantics of Safe Datalog, e.g., permitting unbounded recursions with numbers
  • Simple component model for Datalog specifications
  • Recursively defined record types for tuples
  • Static type system for attributes

On fast large-scale program analysis in Datalog

This is the system paper about the architecture of Souffle. It was published in CC’16.


Designing and crafting a static program analysis is challenging due to the complexity of the task at hand. Among the challenges are modelling the semantics of the input language, finding suitable abstractions for the analysis, and handwriting efficient code for the analysis in a traditional imperative language such as C++. Hence, the development of static program analysis tools is costly in terms of development time and resources for real world languages. To overcome, or at least alleviate the costs of developing a static program analysis, Datalog has been proposed as a domain specific language (DSL). With Datalog, a designer expresses a static program analysis in the form of a logical specification. While a domain specific language approach aids in the ease of development of program analyses, it is commonly accepted that such an approach has worse runtime performance than handcrafted static analysis tools. In this work, we introduce a new program synthesis methodology for Datalog specifications to produce highly efficient monolithic C++ analyzers. The synthesis technique requires the re-interpretation of the semi-naïve evaluation as a scaffolding for translation using partial evaluation. To achieve high-performance, we employ staged- compilation techniques and specialize the underlying relational data structures for a given Datalog specification. Experimentation on benchmarks for large-scale program analysis validates the superior performance of our approach over available Datalog tools and demonstrates our competitiveness with state-of-the-art handcrafted tools.