Scalable Anytime Algorithms for Learning Fragments of Linear Temporal Logic

Linear temporal logic (LTL) is a specification language for finite sequences (called traces) widely used in program verification, motion planning in robotics, process mining, and many other areas. We consider the problem of learning LTL formulas for classifying traces; despite growing interest from the research community, existing solutions suffer from two limitations: they do not scale beyond small formulas, and they may exhaust computational resources without returning any result. We introduce a new algorithm addressing both issues: our algorithm is able to construct formulas an order of magnitude larger than previous methods, and it is anytime, meaning that in most cases it successfully outputs a formula, albeit possibly not of minimal size. We evaluate the performance of our algorithm using an open-source implementation against publicly available benchmarks.


Introduction
Linear Temporal Logic (LTL) is a prominent logic for specifying temporal properties [23] over infinite traces, and was recently introduced over finite traces [7]. In this paper, we consider finite traces but, in a small abuse of notation, call this logic LTL as well. It has become a de facto standard in many fields such as model checking, program analysis, and motion planning for robotics. Over the past five to ten years, learning temporal logics (of which LTL is the core) has become an active research area and has been identified as an important goal in artificial intelligence: it formalises the difficult task of building explainable models from data. Indeed, as we will see in the examples below and as argued in the literature, e.g., by [5] and [26], LTL formulas are typically easy to interpret by human users and therefore useful as explanations. The variable-free syntax of LTL and its natural inductive semantics make LTL a natural target for building classifiers separating positive from negative traces.
The fundamental problem we study here, established in [27], is to build an explainable model in the form of an LTL formula from a set of positive and negative traces. More formally (we refer to the next section for formal definitions), given a set u_1, ..., u_n of positive traces and a set v_1, ..., v_n of negative traces, the goal is to construct a formula ϕ of LTL which satisfies all u_i's and none of the v_i's. In that case, we say that ϕ is a separating formula or, using machine learning terminology, a classifier.
To make things concrete, let us introduce our running example, a classic motion planning problem in robotics inspired by [17]. A robot collects wastebin contents in an office-like environment and empties them in a trash container. Let us assume that there is an office o, a hallway h, a container c and a wet area w. Traces u_1 and v_1 are obtained in experimentation with the robot (for instance, through simulation). In LTL learning we start from these labelled data: given u_1 as positive and v_1 as negative, what is a possible classifier including u_1 but not v_1? Informally, v_1 being negative implies that the order is fixed: o must be visited before c. We look for classifiers in the form of separating formulas, for instance F(o ∧ F X c), where the F-operator stands for "finally" and X for "next". Note that this formula requires visiting the office first and only then the container.
Assume now that two more negative traces v_2 and v_3 are added. Then the previous separating formula is no longer correct, and a possible separating formula is F(o ∧ F X c) ∧ G(¬w), which additionally requires the robot to never visit the wet area. Here the G-operator stands for "globally".
Let us emphasise at this point that for the sake of presentation, we consider only exact classifiers: a separating formula must satisfy all positive traces and none of the negative traces. However, our algorithm naturally extends to the noisy data setting where the goal is to construct an approximate classifier, replacing 'all' and 'none' by 'almost all' and 'almost none'.
State of the art. A number of different approaches have been proposed, leveraging SAT solvers [22], automata [5], and Bayesian inference [19], and extended to more expressive logics such as Property Specification Language (PSL) [26] and Computation Tree Logic (CTL) [10].
Applications include program specification [20], anomaly and fault detection [4], robotics [6], and many more: we refer to [5], Section 7, for a list of practical applications. An equivalent point of view on LTL learning is as a specification mining question. The ARSENAL [15] and FRET [16] projects construct LTL specifications from natural language; we refer to [21] for an overview.
Existing methods do not scale beyond formulas of small size, making them hard to deploy for industrial cases. A second serious limitation is that they often exhaust computational resources without returning any result. Indeed, theoretical studies [12] have shown that constructing the minimal LTL formula is NP-hard already for very small fragments of LTL, explaining the difficulties found in practice.
Our approach. To address both issues, we turn to approximation and anytime algorithms. Here approximation means that the algorithm does not ensure minimality of the constructed formula: it does ensure that the output formula separates positive from negative traces, but it may not be the smallest one. On the other hand, an algorithm solving an optimisation problem is called anytime if it finds better and better solutions the longer it keeps running. In other words, anytime algorithms work by refining solutions. As we will see in the experiments, this implies that even if our algorithm times out, it may yield a good, albeit non-optimal, formula.
Our algorithm targets a strict fragment of LTL, which does not contain the Until operator (nor its dual Release operator). It combines two ingredients:
- Searching for directed formulas: we define a space-efficient dynamic programming algorithm for enumerating formulas from a fragment of LTL that we call Directed LTL.
- Combining directed formulas: we construct two algorithms for combining formulas using Boolean operators. The first is an off-the-shelf decision tree algorithm, and the second is a new greedy algorithm called Boolean subset cover.
The two ingredients yield two subprocedures: the first one finds directed formulas of increasing size, which are then fed to the second procedure in charge of combining them into a separating formula. This yields an anytime algorithm as both subprocedures can output separating formulas even with a low computational budget and refine them over time.
Let us illustrate the two subprocedures in our running example. The first procedure enumerates so-called directed formulas in increasing size; we refer to the corresponding section for a formal definition. The directed formulas F(o ∧ F X c) and G(¬w) have small size, hence will be generated early on. The second procedure constructs formulas as Boolean combinations of directed formulas. Without getting into the details of the algorithms, let us note that both F(o ∧ F X c) and G(¬w) satisfy u_1. The first does not satisfy v_1 and the second does not satisfy v_2 and v_3. Hence their conjunction F(o ∧ F X c) ∧ G(¬w) is separating, meaning it satisfies u_1 but none of v_1, v_2, v_3.
Outline. The necessary definitions and the problem statement are given in Section 2. Section 3 gives a high-level overview of the main idea of the algorithm. Sections 4 and 5 describe the two phases of our algorithm in detail, in one section each. We discuss the theoretical guarantees of our algorithm in Section 6. We conclude with an empirical evaluation in Section 7.

Preliminaries
Traces. Let P be a finite set of atomic propositions. An alphabet is a finite non-empty set Σ = 2^P, whose elements are called symbols. A finite trace over Σ is a finite sequence t = a_1 a_2 ... a_n such that for every 1 ≤ i ≤ n, a_i ∈ Σ. We say that t has length n and write |t| = n. For example, let P = {p, q}; in the trace t = {p, q} · {p} · {q}, both p and q hold at the first position, only p holds at the second position, and only q at the third position. Note that, throughout the paper, we only consider finite traces.
A trace is a word if exactly one atomic proposition holds at each position: we used words in the introduction example for simplicity, writing h for the symbol {h}. Given a trace t = a_1 a_2 ... a_n and 1 ≤ i ≤ j ≤ n, let t[i, j] = a_i ... a_j be the infix of t from position i up to and including position j. Moreover, t[i] = a_i is the symbol at the i-th position.
Linear Temporal Logic. The syntax of Linear Temporal Logic (LTL, in short) is built from atomic propositions, Boolean connectives, and the temporal operators X (next), F (finally), G (globally), and U (until). We use the standard formulas: true = p ∨ ¬p, false = p ∧ ¬p and last = ¬X true, which denotes the last position of the trace. As a shorthand, we use X^n ϕ for the formula ϕ preceded by n nested X operators. The size of a formula is the size of its underlying syntax tree. Formulas in LTL are evaluated over finite traces. To define the semantics of LTL we introduce the notation t, i |= ϕ, which reads 'the LTL formula ϕ holds over trace t from position i'. We say that t satisfies ϕ and we write t |= ϕ when t, 1 |= ϕ. The definition of |= is inductive on the formula ϕ and follows the usual semantics of LTL over finite traces [7].
The LTL Learning Problem. The LTL exact learning problem studied in this paper is the following: given a set P of positive traces and a set N of negative traces, construct a minimal LTL separating formula ϕ, meaning such that t |= ϕ for all t ∈ P and t ⊭ ϕ for all t ∈ N. There are two relevant parameters for a sample: its size, which is the number of traces, and its length, which is the maximum length of all traces.
The problem is naturally extended to the LTL noisy learning problem where the goal is to construct an ε-separating formula, meaning such that ϕ satisfies all but an ε proportion of the traces in P and none but an ε proportion of the traces in N. For the sake of simplicity we present an algorithm for solving the LTL exact learning problem, and later sketch how to extend it to the noisy setting.
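To make the definitions above concrete, the following is a minimal Python sketch of the semantics and of the (ε-)separation check. It is an illustration under stated assumptions, not the implementation evaluated later: formulas are encoded as nested tuples (an encoding chosen here), X is read as a strong next (consistent with last = ¬X true), F and G include the current position, and the three traces are hypothetical words over {o, h, c, w} in the spirit of the running example.

```python
# Formulas are nested tuples such as ("F", ("atom", "o")); a trace is a list
# of sets of atomic propositions. Positions are 0-based here (1-based in the text).

def holds(phi, t, i=0):
    """Return True if phi holds over the finite trace t from position i."""
    op = phi[0]
    if op == "atom":
        return phi[1] in t[i]
    if op == "not":
        return not holds(phi[1], t, i)
    if op == "and":
        return holds(phi[1], t, i) and holds(phi[2], t, i)
    if op == "or":
        return holds(phi[1], t, i) or holds(phi[2], t, i)
    if op == "X":  # strong next: fails at the last position
        return i + 1 < len(t) and holds(phi[1], t, i + 1)
    if op == "F":  # some position from i onwards
        return any(holds(phi[1], t, j) for j in range(i, len(t)))
    if op == "G":  # every position from i onwards
        return all(holds(phi[1], t, j) for j in range(i, len(t)))
    raise ValueError(f"unknown operator: {op!r}")


def is_separating(phi, positives, negatives, eps=0.0):
    """Exact separation for eps = 0, eps-separation otherwise."""
    missed = sum(not holds(phi, t) for t in positives)
    accepted = sum(holds(phi, t) for t in negatives)
    return missed <= eps * len(positives) and accepted <= eps * len(negatives)


def word(letters):
    """Build a word: exactly one proposition holds at each position."""
    return [{a} for a in letters]


o, c, w = ("atom", "o"), ("atom", "c"), ("atom", "w")
order = ("F", ("and", o, ("F", ("X", c))))   # F(o ∧ F X c)
dry = ("G", ("not", w))                      # G(¬w)

# Hypothetical sample: one positive trace, two negative traces.
positives = [word("hohc")]
negatives = [word("hcho"), word("howc")]

print(is_separating(order, positives, negatives))                # False
print(is_separating(("and", order, dry), positives, negatives))  # True
```

The first formula accepts the hypothetical negative trace that crosses the wet area, so only the conjunction with G(¬w) separates this sample.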

High-level view of the algorithm
Let us start with a naive algorithm for the LTL Learning Problem. We can search through all LTL formulas in some order and check whether they are separating for our sample or not. Checking whether an LTL formula is separating can be done using standard methods (e.g., using bit vector operations [2]). However, the major drawback of this idea is that we have to search through all LTL formulas, which is hard as the number of LTL formulas grows very quickly.
To tackle this issue, instead of the entire LTL fragment, our algorithm (as outlined in Algorithm 1) performs an iterative search through a fragment of LTL, which we call Directed LTL (Line 4). We expand upon this in Section 4. In that section, we also describe how we can iteratively generate these Directed LTL formulas in a particular "size order" (not the usual size of an LTL formula) and evaluate these formulas over the traces in the sample efficiently using dynamic programming techniques.
To include more formulas in our search space, we generate and search through Boolean combinations of the most promising Directed LTL formulas (Line 11), which we describe in detail in Section 5. Note that the fragment of LTL that our algorithm searches through ultimately does not include formulas with the U operator. Thus, for readability, we use LTL to refer to the fragment LTL \ U in the rest of the paper.
During the search, our algorithm looks at each iteration for separating formulas smaller than those previously found, if any. In fact, as a heuristic, once a separating formula is found, we only search through formulas that are smaller than it. Such a heuristic, along with aiding the search for minimal formulas, also reduces the search space significantly.
Anytime property. The anytime property of our algorithm is also a consequence of storing the smallest formula seen so far (Lines 7 and 14). Once we find a separating formula, we can output it and continue the search for smaller separating formulas.
Algorithm 1 Overview of our algorithm
 1: B ← ∅
 2: ψ ← ∅: best formula found
 3: for all s in "size order" do
 4:   D ← all Directed LTL formulas of parameter s
 5:   for all ϕ ∈ D do
 6:     if ϕ is separating and smaller than ψ then
 7:       ψ ← ϕ
 8:     end if
 9:   end for
10:   B ← B ∪ D
11:   B ← Boolean combinations of the promising formulas in B
12:   for all ϕ ∈ B do
13:     if ϕ is separating and smaller than ψ then
14:       ψ ← ϕ
15:     end if
16:   end for
17: end for
18: Return ψ
Extension to the noisy setting. The algorithm is seamlessly extended to the noisy setting by rewriting lines 6 and 13: instead of outputting only separating formulas, we output ε-separating formulas.
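The anytime behaviour of the outer loop can be sketched as a Python generator that reports every improvement as soon as it is found. This is a simplified illustration, not the tool's code: the candidate batches and the `is_separating` predicate are hypothetical stand-ins supplied by the caller, and the Boolean combination step is left out.

```python
def anytime_search(batches, is_separating):
    """Yield (formula, size) each time a smaller separating formula is found.

    `batches` is an iterable of candidate lists, one per step of the size
    order; each candidate is a (formula, size) pair. Both arguments are
    hypothetical stand-ins for the procedures described in the text.
    """
    best = None
    for batch in batches:
        for formula, size in batch:
            # Heuristic from the text: once a separating formula is known,
            # only strictly smaller candidates are worth checking.
            if best is not None and size >= best[1]:
                continue
            if is_separating(formula):
                best = (formula, size)
                yield best


# Toy usage: formulas are just names and "a" is the only non-separating one.
batches = [[("a", 3), ("b", 5)], [("c", 2), ("d", 4)]]
for formula, size in anytime_search(batches, lambda f: f != "a"):
    print(formula, size)
```

A caller that runs out of time simply keeps the last pair it received, which corresponds to returning ψ when the search is interrupted.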

Searching for directed formulas
The first insight of our approach is the definition of a fragment of LTL that we call directed LTL.
A partial symbol is a conjunction of positive or negative atomic propositions. We write s = p_0 ∧ p_2 ∧ ¬p_1 for the partial symbol specifying that p_0 and p_2 hold and p_1 does not. The definition of a symbol satisfying a partial symbol is natural: for instance the symbol {p_0, p_2, p_4} satisfies s. The width of a partial symbol is the number of atomic propositions it uses.
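As a small illustration of partial symbols, the sketch below represents one as a pair of sets (required and forbidden propositions). The class and method names are chosen here for illustration and are not taken from the paper.

```python
from typing import FrozenSet, NamedTuple


class PartialSymbol(NamedTuple):
    positive: FrozenSet[str]  # propositions that must hold
    negative: FrozenSet[str]  # propositions that must not hold

    @property
    def width(self) -> int:
        """Number of atomic propositions used by the partial symbol."""
        return len(self.positive | self.negative)

    def satisfied_by(self, symbol) -> bool:
        """A symbol is the set of propositions holding at one position."""
        return self.positive <= symbol and not (self.negative & symbol)


# s = p_0 ∧ p_2 ∧ ¬p_1, as in the text.
s = PartialSymbol(frozenset({"p0", "p2"}), frozenset({"p1"}))
print(s.satisfied_by({"p0", "p2", "p4"}))  # True
print(s.satisfied_by({"p0", "p1", "p2"}))  # False
print(s.width)                             # 3
```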
Directed LTL is defined by the following grammar:
ϕ ::= X^n s | F X^n s | X^n (s ∧ ϕ) | F X^n (s ∧ ϕ),
where s is a partial symbol and n ∈ {0, 1, ...}. As an example, the directed formula F((p ∧ q) ∧ F X^2 ¬p) reads: there exists a position satisfying p ∧ q, and at least two positions later, there exists a position satisfying ¬p. The intuition behind the term "directed" is that a directed formula fixes the order in which the partial symbols occur. A non-directed formula is F p ∧ F q: there is no order between p and q. Note that Directed LTL only uses the X and F operators as well as conjunctions and atomic propositions.
Generating directed formulas. Let us consider the following problem: given the sample S = P ∪ N, we want to generate all directed formulas together with the list of traces in S that they satisfy. Our first technical contribution and key to the scalability of our approach is an efficient solution to this problem based on dynamic programming.
Let us define a natural order in which we want to generate directed formulas. They have two parameters: length, which is the number of partial symbols in the directed formula, and width, which is the maximum of the widths of the partial symbols in the directed formula. We consider the order based on summing these two parameters: (1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3), ... (We note that in practice, slightly more complicated orders on pairs are useful since we want to increase the length more often than the width.) Our enumeration algorithm works by generating all directed formulas of a given pair of parameters in a recursive fashion. Assuming that we already generated all directed formulas for the pair of parameters (ℓ, w), we define two procedures, one for generating the directed formulas for the parameters (ℓ + 1, w), and the other one for (ℓ, w + 1).
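The basic order on (length, width) pairs listed above can be produced by a short generator. This sketch covers only the plain sum-based order; the more refined orders mentioned in the parenthetical remark are not modelled, and the function name is an illustrative choice.

```python
from itertools import count, islice


def size_order():
    """Yield (length, width) pairs by increasing sum, longer formulas first."""
    for total in count(2):
        for width in range(1, total):
            yield (total - width, width)


print(list(islice(size_order(), 6)))
# [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3)]
```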
When we generate the directed formulas, we also keep track of which traces in the sample they satisfy by exploiting a dynamic programming table called LastPos. It has one entry LastPos(ϕ, t) for each directed formula ϕ and each trace t in S. The main benefit of LastPos is that it meshes well with directed formulas: it is algorithmically easy to compute its entries recursively on the structure of directed formulas.
A useful idea is to change the representation of the set of traces S by precomputing a lookup table Index, which has one entry for each trace t in S, partial symbol s, and position i in [1, |t|]. The table Index can be precomputed in linear time from S, and makes the dynamic programming algorithm easier to formulate.
Having defined the important ingredients, we now present the pseudocode (Algorithm 2) for increasing both the length and the width of a formula. For the length increase algorithm, we define two extension operators ∧_{=k} and ∧_{≥k} that "extend" the length of a directed formula ϕ by including a partial symbol s in the formula. Precisely, the operator s ∧_{=k} ϕ replaces the rightmost partial symbol s′ in ϕ with (s′ ∧ X^k s), while s ∧_{≥k} ϕ replaces s′ with (s′ ∧ F X^k s). For the width increase algorithm, we say that two directed formulas are compatible if they are equal except for partial symbols. For two compatible formulas, we define a pointwise-and (∧_•) operator that takes the conjunction of the corresponding partial symbols at the same positions.
The actual implementation of the algorithm refines the algorithms in certain places. For instance:
- Line 3: instead of considering all partial symbols, we restrict to those appearing in at least one positive trace.
- Line 13: some computations for ϕ_{≥j} can be made redundant; a finer data structure factorises the computations.
- Line 25: using a refined data structure, we only enumerate compatible directed formulas.
Lemma 1. Algorithm 2 generates all directed formulas and correctly computes the tables LastPos.
The dual point of view. We use the same algorithm to produce formulas in a dual fragment to directed LTL, which uses the X and G operators, the last predicate, as well as disjunctions and atomic propositions. The only difference is that we swap positive and negative traces in the sample. We obtain a directed formula from such a sample and take its negation, which is a formula of the dual fragment.

Boolean combinations of formulas
As explained in the previous section, we can efficiently generate directed formulas and dual directed formulas. Now we explain how to form a Boolean combination of these formulas in order to construct separating formulas, as illustrated in the introduction.
Boolean combination of formulas. Let us consider the following subproblem: given a set of formulas, does there exist a Boolean combination of some of the formulas that is a separating formula? We call this problem the Boolean subset cover, which is illustrated in Figure 1. In this example we have three formulas ϕ_1, ϕ_2, and ϕ_3, each satisfying subsets of u_1, u_2, u_3, v_1, v_2, v_3 as represented in the drawing. Inspecting the three subsets reveals that (ϕ_1 ∧ ϕ_2) ∨ ϕ_3 is a separating formula.
Algorithm 2 Generation of directed formulas for the set of traces S
 1: procedure Search directed formulas - length increase(ℓ, w)
 2:   for all directed formulas ϕ of length ℓ and width w do
 3:     for all partial symbols s of width at most w do
 4:       for all t ∈ S do
 5:         …
 6:         for all i ∈ I do
 7:           …
 8:           for all j ∈ J do
 9:             …
10:             add j to LastPos(ϕ_{=j}, t)
11:           end for
12:           for all j ≤ max(J) do
13-22:        …
23:   for all directed formulas ϕ of length ℓ and width w do
24:     for all directed formulas ϕ′ of length ℓ and width 1 do
25:       if ϕ and ϕ′ are compatible then
26:         …
27:         for all t ∈ S do
28:           LastPos(ϕ ∧_• ϕ′, t) ← LastPos(ϕ, t) ∩ LastPos(ϕ′, t)
29:         end for
30:       end if
31:     end for
32:   end for
33: end procedure
The Boolean subset cover problem is a generalization of the well-known and extensively studied subset cover problem, where we are given S_1, ..., S_m subsets of [1, n], and the goal is to find a subset I of [1, m] such that ⋃_{i∈I} S_i covers all of [1, n]; such a set I is called a cover. Indeed, it corresponds to the case where all formulas satisfy none of the negative traces: in that case, conjunctions are not useful, and we can ignore the negative traces. The subset cover problem is known to be NP-complete. However, there exists a polynomial-time log(n)-approximation algorithm called the greedy algorithm: it is guaranteed to construct a cover that is at most log(n) times larger than the minimum cover. This approximation ratio is optimal in the following sense [8]: there is no polynomial-time (1 − o(1)) log(n)-approximation algorithm for subset cover unless P = NP. Informally, the greedy algorithm for the subset cover problem does the following: it iteratively constructs a cover I by sequentially adding the most 'promising subset' to I, which is the subset S_i maximising how many more elements of [1, n] are covered by adding i to I.
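The greedy algorithm for the classical subset cover problem, as described informally above, can be sketched as follows. The function name and the choice to return None when no cover exists are illustrative; indices into the list of subsets are 0-based, whereas the text numbers the subsets from 1.

```python
def greedy_cover(subsets, n):
    """Greedily pick subsets of {1, ..., n} until everything is covered.

    Returns the list of chosen indices (0-based), or None if the subsets
    do not cover {1, ..., n} at all.
    """
    uncovered = set(range(1, n + 1))
    cover = []
    while uncovered:
        # Most promising subset: the one covering the most uncovered elements.
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            return None  # no subset makes progress, so no cover exists
        cover.append(best)
        uncovered -= subsets[best]
    return cover


subsets = [{1, 2, 3}, {3, 4}, {4, 5}, {1, 5}]
print(greedy_cover(subsets, 5))  # [0, 2]
```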
We introduce an extension of the greedy algorithm to the Boolean subset cover problem. The first ingredient is a scoring function, which takes into account both how close the formula is to being separating, and how large it is. The score rewards formulas that are close to being separating and penalises large ones, where the size |ϕ| of ϕ enters through √|ϕ|. The use of √· is empirical: it mitigates the importance of size relative to being separating.
The algorithm maintains a set of formulas B, which is initially the set of formulas given as input, and adds new formulas to B until it finds a separating formula. Let us fix a constant K, which in the implementation is set to 5. At each point in time, the algorithm chooses the K formulas ϕ_1, ..., ϕ_K with the highest score in B and constructs all disjunctions and conjunctions of ϕ_i with formulas in B. For each i, we keep the disjunction or conjunction with a maximal score, and add this formula to B if it has a higher score than ϕ_i. We repeat this procedure until we find a separating formula or no formula is added to B.
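The loop just described can be sketched in Python as follows. Several parts are assumptions made for illustration only: each formula is abstracted as a name, a size, and the set of traces it satisfies; the score used here (number of correctly classified traces divided by √size + 1) is a stand-in and not necessarily the score of the paper; and `max_rounds` is a safety bound added for the sketch.

```python
from math import sqrt
from typing import FrozenSet, NamedTuple


class Candidate(NamedTuple):
    text: str            # printable formula
    size: int            # size of its syntax tree
    sat: FrozenSet[str]  # names of the traces satisfying the formula


def score(c, pos, neg):
    # ASSUMED score: correctly classified traces, damped by sqrt(size).
    correct = len(c.sat & pos) + len(neg - c.sat)
    return correct / (sqrt(c.size) + 1)


def is_separating(c, pos, neg):
    return pos <= c.sat and not (neg & c.sat)


def combine(a, b):
    """Conjunction and disjunction of two candidates."""
    size = a.size + b.size + 1
    yield Candidate(f"({a.text} & {b.text})", size, a.sat & b.sat)
    yield Candidate(f"({a.text} | {b.text})", size, a.sat | b.sat)


def boolean_subset_cover(formulas, pos, neg, k=5, max_rounds=100):
    pool = list(formulas)
    for _ in range(max_rounds):
        separating = [c for c in pool if is_separating(c, pos, neg)]
        if separating:
            return min(separating, key=lambda c: c.size)
        top = sorted(pool, key=lambda c: score(c, pos, neg), reverse=True)[:k]
        new = []
        for phi in top:
            options = [c for other in pool if other is not phi
                       for c in combine(phi, other)]
            if not options:
                continue
            best = max(options, key=lambda c: score(c, pos, neg))
            # Keep the best combination only if it improves on phi itself.
            if score(best, pos, neg) > score(phi, pos, neg) and best not in pool + new:
                new.append(best)
        if not new:
            return None  # no formula was added: give up
        pool += new
    return None


# Hypothetical sample with ten positive and ten negative traces.
pos = frozenset(f"u{i}" for i in range(10))
neg = frozenset(f"v{i}" for i in range(10))
phi1 = Candidate("phi1", 2, frozenset([f"u{i}" for i in range(5)] + [f"v{i}" for i in range(5)]))
phi2 = Candidate("phi2", 2, frozenset([f"u{i}" for i in range(5)] + [f"v{i}" for i in range(5, 10)]))
phi3 = Candidate("phi3", 2, frozenset(f"u{i}" for i in range(5, 10)))

result = boolean_subset_cover([phi1, phi2, phi3], pos, neg)
print(result.text, result.size)
```

On this toy input the sketch first adds a conjunction of phi1 and phi2 and then its disjunction with phi3, mirroring the shape (ϕ_1 ∧ ϕ_2) ∨ ϕ_3 of the example in Figure 1. Whether a given input is solved depends on the assumed score.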
Another natural approach to the Boolean subset cover problem is to use decision trees: we use one variable for each trace and one atomic proposition for each formula to denote whether the trace satisfies the formula. We then construct a decision tree classifying all traces. We experimented with both approaches and found that the greedy algorithm is both faster and yields smaller formulas. We do not report on these experiments because the formulas output using the decision tree approach are prohibitively large and therefore not useful for explanations. Let us, however, remark that using decision trees we get a theoretical guarantee that if there exists a separating formula as a Boolean combination of the formulas, then the algorithm will find it.

Theoretical guarantees
The following result shows the relevance of our approach using directed LTL and Boolean combinations.
Theorem 1. Every formula of LTL(F, X, ∧, ∨) is equivalent to a Boolean combination of directed formulas. Equivalently, every formula of LTL(G, X, ∧, ∨) is equivalent to a Boolean combination of dual directed formulas.
The proof of Theorem 1 can be found in the appendix. To get an intuition, let us consider the formula F p ∧ F q, which is not directed, but equivalent to F(p ∧ F q) ∨ F(q ∧ F p). In the second formulation, there is a disjunction over the possible orderings of p and q. The formal proof generalises this rewriting idea.
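The equivalence used as intuition above can be checked by brute force on all short traces. The sketch below is a sanity check, not a proof: it assumes that F includes the current position, uses a tuple encoding of formulas chosen for illustration, and only explores traces up to a fixed length.

```python
from itertools import product


def holds(phi, t, i=0):
    """Evaluate a formula over atoms, and, or, F on trace t from position i."""
    op = phi[0]
    if op == "atom":
        return phi[1] in t[i]
    if op == "and":
        return holds(phi[1], t, i) and holds(phi[2], t, i)
    if op == "or":
        return holds(phi[1], t, i) or holds(phi[2], t, i)
    if op == "F":
        return any(holds(phi[1], t, j) for j in range(i, len(t)))
    raise ValueError(f"unknown operator: {op!r}")


p, q = ("atom", "p"), ("atom", "q")
lhs = ("and", ("F", p), ("F", q))                      # F p ∧ F q
p_then_q = ("F", ("and", p, ("F", q)))                 # F(p ∧ F q)
q_then_p = ("F", ("and", q, ("F", p)))                 # F(q ∧ F p)
rhs = ("or", p_then_q, q_then_p)

# All symbols over P = {p, q}.
SYMBOLS = [frozenset(), frozenset("p"), frozenset("q"), frozenset("pq")]


def agree_up_to(a, b, max_len):
    """True if a and b agree on every non-empty trace of length <= max_len."""
    return all(holds(a, t) == holds(b, t)
               for n in range(1, max_len + 1)
               for t in product(SYMBOLS, repeat=n))


print(agree_up_to(lhs, rhs, 4))       # True
print(agree_up_to(lhs, p_then_q, 2))  # False
```

The second check fails on the trace where q holds first and p second, which is exactly the ordering that the second disjunct accounts for.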
This implies the following properties for our algorithm:
- termination: given a bound on the size of formulas, the algorithm eventually generates all formulas of bounded size,
- correctness: if the algorithm outputs a formula, then it is separating,
- completeness: if there exists a separating formula in LTL(F, G, X, ∧, ∨) with no nesting of F and G, then the algorithm finds a separating formula.

Experimental evaluation
In this section, we answer the following research questions to assess the performance of our LTL learning algorithm.
RQ1: How effective are we in learning concise LTL formulas from samples?
RQ2: How much scalability do we achieve through our algorithm?
RQ3: What do we gain from the anytime property of our algorithm?
Experimental Setup. To answer the questions above, we have implemented a prototype of our algorithm in Python 3 in a tool named SCARLET (SCalable Anytime algoRithm for LEarning lTl). We run SCARLET on several benchmarks generated synthetically from LTL formulas used in practice. To answer each research question precisely, we choose different sets of LTL formulas. We discuss them in detail in the corresponding sections. Note, however, that we did not consider any formulas with the U-operator, since SCARLET is not designed to find such formulas.
To assess the performance of SCARLET, we compare it against two state-of-the-art tools for learning logic formulas from examples:
1. FLIE, developed by [22], infers minimal LTL formulas using a learning algorithm that is based on constraint solving (SAT solving).
2. SYSLITE, developed by [1], originally infers minimal past-time LTL formulas using an enumerative algorithm implemented in a tool called CVC4SY [25].
For our comparisons, we use a version of SYSLITE that we modified (which we refer to as SYSLITE_L) to infer LTL formulas rather than past-time LTL formulas. Our modifications include changes to the syntactic constraints generated by SYSLITE_L as well as changing the semantics from past-time LTL to ordinary LTL.
To obtain a fair comparison against SCARLET, we disabled the U-operator in both tools. Allowing the U-operator would only make the tools slower, since they would have to search through all formulas containing U.
All the experiments are conducted on a single core of a Debian machine with Intel Xeon E7-8857 CPU (at 3 GHz) using up to 6 GB of RAM. We set the timeout to be 900 s for all experiments. We include scripts to reproduce all experimental results in a publicly available artifact [24].

Table 1: Common LTL formulas used in practice
Absence: G(¬p), G(q → G(¬p))
Disjunction of patterns:
Sample generation. To provide a comparison among the learning tools, we follow the literature [22,26] and use synthetic benchmarks generated from real-world LTL formulas. For benchmark generation, earlier works rely on a fairly naive generation method. In this method, starting from a formula ϕ, a sample is generated by randomly drawing traces and categorizing them into positive and negative examples depending on the satisfaction with respect to ϕ. This method, however, often results in samples that can be separated by formulas much smaller than ϕ. Moreover, it often requires a prohibitively large amount of time to generate samples (for instance, for G p, where almost all traces satisfy a formula) and, hence, often does not terminate in a reasonable time.
To alleviate the issues in the existing method, we have designed a novel generation method for the quick generation of large samples. In our method, we first convert the starting formula into an equivalent DFA and then extract accepted and rejected words to obtain a sample of the desired size. We provide more details on this new generation method in the appendix.

RQ1: Performance Comparison
To address our first research question, we have compared all three tools on a synthetic benchmark suite generated from eight LTL formulas. These formulas originate from a study by Dwyer et al. [9], who have collected a comprehensive set of LTL formulas arising in real-world applications (see Table 1 for an excerpt). The selected LTL formulas have, in fact, also been used by FLIE for generating its benchmarks. While FLIE also considered formulas with the U-operator, we did not consider them for generating our benchmarks for the reasons mentioned in the experimental setup. Our benchmark suite consists of a total of 256 samples (32 for each of the eight LTL formulas) generated using our generation method. The number of traces in the samples ranges from 50 to 2 000, while the length of traces ranges from 8 to 15.
Figure 2a presents the runtime comparison of FLIE, SYSLITE_L and SCARLET on all 256 samples. From the scatter plots, we observe that SCARLET ran faster than FLIE on all samples. Likewise, SCARLET was faster than SYSLITE_L on all but eight (out of 256) samples. SCARLET timed out on only 13 samples, while FLIE and SYSLITE_L timed out on 85 and 36, respectively (see Figure 2b).
The good performance of SCARLET can be attributed to its efficient formula search technique. In particular, SCARLET only considers formulas that have a high potential of being a separating formula since it extracts Directed LTL formulas from the sample itself. FLIE and SYSLITE_L, on the other hand, search through arbitrary formulas (in order of increasing size), each time checking if the current one separates the sample.
Figure 2c presents the comparison of the size of the formulas inferred by each tool. On 170 out of the 256 samples, all tools terminated and returned an LTL formula with size at most 7. On 150 of these 170 samples, SCARLET, FLIE, and SYSLITE_L inferred formulas of equal size, while on the remaining 20 samples SCARLET inferred formulas that were larger. The latter observation indicates that SCARLET misses certain small separating formulas, in particular the ones which are not a Boolean combination of directed formulas.
However, it is important to highlight that the formulas learned by SCARLET are in most cases not significantly larger than those learned by FLIE and SYSLITE_L. This can be seen from the fact that the average size of formulas inferred by SCARLET (on benchmarks in which none of the tools timed out) is 3.21, while the average size of formulas inferred by FLIE and SYSLITE_L is 3.07.
Overall, SCARLET displayed significant speed-up over both FLIE and SYSLITE_L while learning a formula similar in size, answering question RQ1 in the positive.

RQ2: Scalability
To address the second research question, we investigate the scalability of SCARLET in two dimensions: the size of the sample and the size of the formula from which the samples are generated.
Scalability with respect to the size of the samples. For demonstrating the scalability with respect to the size of the samples, we consider two formulas ϕ_cov = F(a_1) ∧ F(a_2) ∧ F(a_3) and ϕ_seq = F(a_1 ∧ F(a_2 ∧ F a_3)), both of which appear commonly in robotic motion planning [11]. While the formula ϕ_cov describes the property that a robot eventually visits (or covers) three regions a_1, a_2, and a_3 in arbitrary order, the formula ϕ_seq describes that the robot has to visit the regions in the specific order a_1 a_2 a_3.
We have generated two sets of benchmarks for both formulas for which we varied the number of traces and their length, respectively. More precisely, the first benchmark set contains 90 samples of an increasing number of traces (5 samples for each number), ranging from 200 to 100 000, each consisting of traces of fixed length 10. On the other hand, the second benchmark set contains 90 samples of 200 traces, containing traces from length 10 to length 50. As the results on both benchmark sets are similar, we here discuss the results on the first set and refer the readers to the appendix for the second set.
Figure 3a shows the average runtime results of SCARLET, FLIE, and SYSLITE_L on the first benchmark set. We observe that SCARLET substantially outperformed the other two tools on all samples. This is because both ϕ_cov and ϕ_seq are of size eight and inferring formulas of such size is computationally challenging for FLIE and SYSLITE_L. In particular, FLIE and SYSLITE_L need to search through all formulas of size up to eight to infer the formulas, while SCARLET, due to its efficient search order (using length and width of a formula), infers them faster.
From Figure 3a, we further observe a significant difference between the run times of SCARLET on samples generated from formula ϕ_cov and from formula ϕ_seq. This is evident from the fact that SCARLET failed to infer formulas for samples of ϕ_seq starting at a size of 6 000, while it could infer formulas for samples of ϕ_cov up to a size of 50 000. Such a result is again due to the search order used by SCARLET: while ϕ_cov is a Boolean combination of directed formulas of length 1 and width 1, ϕ_seq is a directed formula of length 3 and width 1.
Scalability with respect to the size of the formula. To demonstrate the scalability with respect to the size of the formula used to generate samples, we have extended ϕ_cov and ϕ_seq to families of formulas (ϕ^n_cov)_{n∈N\{0}} with ϕ^n_cov = F(a_1) ∧ F(a_2) ∧ ... ∧ F(a_n) and (ϕ^n_seq)_{n∈N\{0}} with ϕ^n_seq = F(a_1 ∧ F(a_2 ∧ F(... ∧ F a_n))), respectively. These families of formulas describe properties similar to that of ϕ_cov and ϕ_seq, but the number of regions is parameterized by n ∈ N \ {0}. We consider formulas from the two families by varying n from 2 to 5 to generate a benchmark suite consisting of samples (5 samples for each formula) having 200 traces of length 10.
Figure 3b shows the average runtime of the tools on samples generated from formulas of increasing size. We observe a trend similar to Figure 3a: SCARLET performs better than the other two tools and infers formulas of the family $\varphi^n_{cov}$ faster than those of the family $\varphi^n_{seq}$. However, in contrast to the near-linear increase of the runtime with the number of traces, we notice an almost exponential increase of the runtime with the formula size.
Overall, our experiments show that SCARLET scales better than the other tools with respect to both sample size and formula size, answering RQ2 in the positive.

RQ3: Anytime Property
To answer RQ3, we list two advantages of the anytime property of our algorithm. We demonstrate these advantages with evidence from the runs of SCARLET on the benchmarks used in RQ1 and RQ2.
First, in the event of a timeout, our algorithm may still find a "concise" separating formula, while the other tools will not. In our experiments, we observed that for all benchmarks used in RQ1 and RQ2, SCARLET obtained a formula even when it timed out. In fact, on the samples from $\varphi^5_{cov}$ used in RQ2 (see Figure 3b), SCARLET obtained the exact original formula within one second (0.7 seconds on average), although it timed out later. The timeout occurred because SCARLET continued to search for smaller formulas even after finding the original formula.
Second, our algorithm can find the final formula well before it terminates. This is evident from the fact that, for the 243 samples in RQ1 on which SCARLET does not time out, the average time required to find the final formula is 10.8 seconds, while the average termination time is 25.17 seconds. Thus, even if one stops the algorithm before it terminates, there is a chance of still obtaining the final formula.
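The behaviour described for RQ3 follows the general pattern of an anytime search: keep the best separating formula found so far and return it when the time budget is exhausted. The sketch below illustrates only this generic pattern. The names `anytime_search`, `candidates`, `is_separating` and `size` are hypothetical, and the sketch is not SCARLET's actual search procedure.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def anytime_search(
    candidates: Iterable[str],
    is_separating: Callable[[str], bool],
    size: Callable[[str], int],
    time_budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[str], bool]:
    """Return (best formula found so far, whether the search timed out).

    The search keeps going after the first separating formula is found,
    looking for a smaller one, and hands back the current best on timeout.
    """
    deadline = clock() + time_budget_s
    best: Optional[str] = None
    for formula in candidates:
        if clock() > deadline:
            return best, True  # timed out, but a formula may still be available
        if is_separating(formula) and (best is None or size(formula) < size(best)):
            best = formula
    return best, False  # search space exhausted before the deadline
```

A caller can thus distinguish the two situations reported in the experiments: a formula returned together with a timeout flag, and a formula returned after the search has terminated.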
Our observations from the experiments clearly indicate the advantages of the anytime property for obtaining a concise separating formula, thus answering RQ3 in the positive.

Conclusion
We have proposed a new approach for learning temporal properties from examples, fleshing it out in an approximation anytime algorithm. We have shown in experiments that our algorithm outperforms existing tools in two ways: it scales to larger formulas and input samples, and even when it times out it often outputs a separating formula.
Our algorithm targets a strict fragment of LTL, restricting its expressivity in two aspects: it does not include the U ("until") operator, and we cannot nest the eventually and globally operators. We leave the extension of our algorithm to full LTL for future work.
An important open question concerns the theoretical guarantees offered by the greedy algorithm for the Boolean subset cover problem. It extends a well-known algorithm for the classic subset cover problem, for which it has been proved to yield an optimal log(n)-approximation. Do we have similar guarantees in our more general setting?
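For reference, the well-known greedy algorithm for the classic subset cover problem can be sketched as follows. This is the textbook procedure for the classic problem only, not the Boolean extension used in our algorithm, and the function name and input encoding are illustrative assumptions.

```python
from typing import Hashable, List, Sequence, Set


def greedy_set_cover(universe: Set[Hashable], subsets: Sequence[Set[Hashable]]) -> List[int]:
    """Return indices of subsets chosen greedily until the universe is covered."""
    uncovered = set(universe)
    chosen: List[int] = []
    while uncovered:
        # Pick the subset that covers the most elements not yet covered.
        best_idx = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        gained = subsets[best_idx] & uncovered
        if not gained:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best_idx)
        uncovered -= gained
    return chosen
```

On the universe {1, ..., 5} with subsets {1, 2, 3}, {2, 4}, {3, 4}, {4, 5}, the procedure first picks {1, 2, 3} and then {4, 5}. The open question above asks whether a comparable guarantee holds once the chosen sets may also be combined by intersection, as in the Boolean variant.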

A Proof of Theorem 1
For readability, in this section, we refer to Directed LTL as dLTL and the Boolean combination of Directed LTL as dLTL(∧, ∨).
In this section, we prove the first statement of Theorem 1, which can be restated as the theorem below. The remaining part of Theorem 1 is a consequence of the proof of the following theorem.
We first prove a lemma necessary for the proof of the above theorem.
Proof. To prove the lemma, we use induction on the structure of $\Delta_1 \wedge \Delta_2$ to show that it can be written as a disjunction of dLTL formulas. As induction hypothesis, we assume that all formulas $\Delta'_1 \wedge \Delta'_2$, where at least one of $\Delta'_1$ and $\Delta'_2$ is structurally smaller than $\Delta_1$ and $\Delta_2$, respectively, can be written as a disjunction of dLTL formulas.
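The base case of the induction is when either $\Delta_1$ or $\Delta_2$ is a partial symbol. In this case, $\Delta_1 \wedge \Delta_2$ is itself a dLTL formula by definition of dLTL formulas. The induction step proceeds via case analysis on the possible root operators of the formulas $\Delta_1$ and $\Delta_2$.
- Case: either $\Delta_1$ or $\Delta_2$ is of the form $s \wedge \Delta$ for some partial symbol $s$. Without loss of generality, let us say $\Delta_1 = s \wedge \Delta$. In this case, $\Delta_1 \wedge \Delta_2 = (s \wedge \Delta) \wedge \Delta_2 = s \wedge (\Delta \wedge \Delta_2)$. By hypothesis, $\Delta \wedge \Delta_2 = \bigvee_i \Gamma_i$ for some $\Gamma_i$ in dLTL. Thus, $\Delta_1 \wedge \Delta_2 = s \wedge \bigvee_i \Gamma_i = \bigvee_i (s \wedge \Gamma_i)$.
- Case: $\Delta_1$ is of the form $X \delta_1$ and $\Delta_2$ is of the form $F \delta_2$. In this case, $\Delta_1 \wedge \Delta_2 = (X \delta_1 \wedge \delta_2) \vee X(\delta_1 \wedge F \delta_2)$. By hypothesis, both formulas $(X \delta_1 \wedge \delta_2)$ and $(\delta_1 \wedge F \delta_2)$ can be written as a disjunction of dLTL formulas. Thus, $\Delta_1 \wedge \Delta_2$ can also be written as a disjunction of dLTL formulas.
- Case: $\Delta_1$ is of the form $F \delta_1$ and $\Delta_2$ is of the form $F \delta_2$. In this case, $\Delta_1 \wedge \Delta_2 = F(\delta_1 \wedge F \delta_2) \vee F(\delta_2 \wedge F \delta_1)$. By hypothesis, both formulas $\delta_1 \wedge F \delta_2$ and $\delta_2 \wedge F \delta_1$ can be written as a disjunction of dLTL formulas. Thus, $\Delta_1 \wedge \Delta_2$ can also be written as a disjunction of dLTL formulas.
Proof (Proof of theorem). The proof proceeds via induction on the structure of formulas $\varphi$ in LTL(F, X, ∧, ∨). As induction hypothesis, we assume that all formulas $\varphi'$ which are structurally smaller than $\varphi$ can be expressed in dLTL(∧, ∨).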
As the base case of the induction, we observe that the formulas p, for all p ∈ P, are dLTL formulas and are thus in dLTL(∧, ∨).
For the induction step, we perform a case analysis based on the root operator of ϕ.
Using Lemma 2, we can rewrite $\bigwedge_i \Delta_i$ as $\bigvee_i \Gamma_i$ for some $\Gamma_i$'s in dLTL. As a result, ϕ = j i i ). Thus, ϕ is in dLTL(∧, ∨).
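The case analysis in the proof of the lemma relies on rewriting a conjunction of temporal formulas as a disjunction. As a sanity check rather than a proof, the following sketch verifies two such equivalences by brute force on all short traces, with atomic propositions a and b standing in for $\delta_1$ and $\delta_2$. It assumes the usual finite-trace semantics in which F includes the current position and X is false at the last position. The tuple encoding of formulas and all function names are illustrative assumptions.

```python
from itertools import product


def holds(formula, trace, i=0):
    """Evaluate a formula (nested tuples) on a finite trace at position i."""
    op = formula[0]
    if op == "ap":
        return formula[1] in trace[i]
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "or":
        return holds(formula[1], trace, i) or holds(formula[2], trace, i)
    if op == "X":  # assumption: X is false at the last position
        return i + 1 < len(trace) and holds(formula[1], trace, i + 1)
    if op == "F":  # assumption: F includes the current position
        return any(holds(formula[1], trace, j) for j in range(i, len(trace)))
    raise ValueError(f"unknown operator: {op}")


def equivalent(f, g, props=("a", "b"), max_len=4):
    """Compare f and g on every non-empty trace of length at most max_len."""
    letters = [
        frozenset(p for p, bit in zip(props, bits) if bit)
        for bits in product([False, True], repeat=len(props))
    ]
    for n in range(1, max_len + 1):
        for trace in product(letters, repeat=n):
            if holds(f, trace) != holds(g, trace):
                return False
    return True


a, b = ("ap", "a"), ("ap", "b")
# X a & F b  versus  (X a & b) | X(a & F b)
LHS_XF = ("and", ("X", a), ("F", b))
RHS_XF = ("or", ("and", ("X", a), b), ("X", ("and", a, ("F", b))))
# F a & F b  versus  F(a & F b) | F(b & F a)
LHS_FF = ("and", ("F", a), ("F", b))
RHS_FF = ("or", ("F", ("and", a, ("F", b))), ("F", ("and", b, ("F", a))))
```

Calling `equivalent(LHS_XF, RHS_XF)` and `equivalent(LHS_FF, RHS_FF)` compares each pair on all traces of length at most 4 over the two propositions. In contrast, `F a & F b` and `F(a & b)` are distinguished by a trace in which a and b hold at different positions.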

B Sample generation method
To evaluate the performance of the tools FLIE, SYSLITE$_L$, and SCARLET effectively, we rely on a novel sample generation algorithm to generate benchmarks from LTL formulas. The outline of the generation algorithm is presented in Algorithm 3. The crux of the algorithm is to convert the LTL formula $\varphi$ into an equivalent DFA $A_\varphi$ and then extract random traces from the DFA to obtain a sample of the desired length and size.
To convert $\varphi$ into its equivalent DFA $A_\varphi$ (Line 3), we rely on the Python tool LTLf2DFA. Essentially, this tool converts $\varphi$ into an equivalent formula in first-order logic and then obtains a minimal DFA from the formula using the tool MONA [18].
For extracting random traces from the DFA (Lines 5 and 9), we use a procedure suggested by [3]. The procedure generates words by choosing letters that have a higher probability of leading to an accepting state. This requires assigning appropriate probabilities to the transitions of the DFA. In this step, we add our own modification to the procedure: we adjust the probabilities of the transitions to ensure that we obtain distinct words in each iteration.
Unlike existing sample generation methods, our method does not create random traces and then try to classify them as positive or negative. This results in a much faster generation of large, better-quality samples.
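The idea of drawing accepted words directly from a DFA, rather than classifying random traces, can be illustrated with a simple stand-in technique: uniform sampling based on counting accepted suffixes. This is not the procedure of [3] and it does not include the modification for distinct words. The DFA for F(a) is written by hand instead of being produced by LTLf2DFA, and all names are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Delta = Dict[Tuple[int, str], int]


def count_accepted_suffixes(
    delta: Delta, states: Sequence[int], accepting: Set[int],
    alphabet: Sequence[str], length: int,
) -> List[Dict[int, int]]:
    """counts[k][q] = number of words of length k leading from q to an accepting state."""
    counts = [{q: 1 if q in accepting else 0 for q in states}]
    for _ in range(length):
        prev = counts[-1]
        counts.append({q: sum(prev[delta[(q, a)]] for a in alphabet) for q in states})
    return counts


def sample_accepted_word(
    delta: Delta, states: Sequence[int], initial: int, accepting: Set[int],
    alphabet: Sequence[str], length: int, rng: random.Random,
) -> List[str]:
    """Draw a word of the given length uniformly among the accepted ones."""
    counts = count_accepted_suffixes(delta, states, accepting, alphabet, length)
    if counts[length][initial] == 0:
        raise ValueError("the DFA accepts no word of this length")
    word, state = [], initial
    for remaining in range(length, 0, -1):
        # Weight each letter by the number of accepted completions it allows.
        weights = [counts[remaining - 1][delta[(state, a)]] for a in alphabet]
        letter = rng.choices(alphabet, weights=weights, k=1)[0]
        word.append(letter)
        state = delta[(state, letter)]
    return word


# Hand-written DFA for F(a): "a" means a holds, "-" means it does not.
# State 0: a not seen yet; state 1: a already seen.
STATES = [0, 1]
ALPHABET = ["a", "-"]
DELTA = {(0, "a"): 1, (0, "-"): 0, (1, "a"): 1, (1, "-"): 1}
```

Positive traces are drawn with accepting set {1}, and negative traces with the complement automaton, that is, with accepting set {0}. Every drawn word is positive or negative by construction, so no separate classification step is needed.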

6:     P ← P ∪ {w}
7: end
8: Loop $n_P$ times
9:     w ← random accepted word of length $l$ from $A^c_\varphi$
10:     N ← N ∪ {w}
11: end
12: return S = (P, N)

C List of all formulas used for generating benchmarks

D Comparison of tools on existing benchmarks
To address our first research question RQ1 in the 'Experimental evaluation' section, we compared the performance of the three tools on an existing benchmark suite [14]. The benchmark suite was generated using a fairly naive generation method from the LTL formulas listed as Absence, Existence, Universality, and Disjunction of patterns in Table 2.
Figure 4 shows the runtime comparison of FLIE, SYSLITE$_L$, and SCARLET on 98 samples. From the scatter plots, we observe that SCARLET runs much faster than FLIE on all samples and faster than SYSLITE$_L$ on all but two samples. Also, SCARLET timed out on only 3 samples, while SYSLITE$_L$ timed out on 6 samples and FLIE on 15 samples.
Figure 5 presents the comparison of the formula sizes inferred by each tool. Of the 84 (out of 98) samples on which none of the tools timed out, SCARLET inferred on 65 samples a formula of the same size as the one inferred by SYSLITE$_L$ and FLIE. Further, on the samples where SCARLET learns larger formulas than the other tools, the size gap is not significant: the average formula size learned by SCARLET is 4.13, slightly higher than the 3.84 of FLIE and SYSLITE$_L$.

E Scalability on the benchmark having increasing trace lengths
To address our second research question RQ2 in 'Experimental evaluation', we evaluated the scalability of our algorithm on two sets of benchmarks generated from the formulas $\varphi_{cov}$ and $\varphi_{seq}$. While the first benchmark set contains 90 samples with an increasing number of traces of fixed length, the second benchmark set contains 90 samples of 200 traces with lengths ranging from 10 to 50. We provide the results for the second benchmark set here. Figure 6a depicts the results we obtained by running all three tools on it. The trends are similar to the ones we observe on the first benchmark set: SCARLET performs better on the samples from $\varphi_{cov}$ than on the samples from $\varphi_{seq}$. The reason remains the same: it is easier to find a Boolean combination of directed formulas of length 1 and width 1 than a directed formula of length 3 and width 1.
In contrast to the results on the first benchmark set, we observe that the runtime increases quadratically with the length of the traces. This explains why SCARLET times out on samples from $\varphi_{seq}$ with large trace lengths such as 50. For samples from $\varphi_{cov}$, however, SCARLET is able to scale well beyond length 50.

Fig. 2: Comparison of SCARLET, FLIE, and SYSLITE$_L$ on synthetic benchmarks. In Figure 2a, all times are in seconds and 'TO' indicates timeouts. The size of the bubbles in the figure indicates the number of samples for each datapoint.

Fig. 3: Comparison of SCARLET, FLIE, and SYSLITE$_L$ on synthetic benchmarks. In Figure 3a, all times are in seconds and 'TO' indicates timeouts.

Fig. 5: Comparison of SCARLET, FLIE, and SYSLITE$_L$ on existing benchmarks. In Figure 2a, all times are in seconds and 'TO' indicates timeouts. The size of the bubbles indicates the number of samples for each datapoint.

Fig. 6: Comparison of SCARLET, FLIE, and SYSLITE$_L$ on synthetic benchmarks. In Figure 6a, all times are in seconds and 'TO' indicates timeouts.