Learn a PDFA using (Balle et al., 2013)

PDFA Learning (Balle et al., 2013)

In this notebook, we will show how to use the implementation of PDFA learning, as described in [2].

In terms of APIs, the only things that change are the parameter algorithm to learn_pdfa (it must be Algorithm.BALLE) and the set of parameters. See pdfa_learning.learn_pdfa.balle.params.py for all the details.

Example

Utility functions to display SVGs.

%matplotlib inline

from pprint import pprint

from helpers import render_automaton
from pdfa_learning.learn_pdfa.base import learn_pdfa, Algorithm
from pdfa_learning.learn_pdfa.utils.generator import MultiprocessedGenerator, SimpleGenerator
from pdfa_learning.pdfa import PDFA

Example with 1 state.

Let's use the following automaton to generate samples.

p = 0.3
automaton = PDFA(
    nb_states=2,
    alphabet_size=2,
    transition_dict={
        0: {
            0: (0, p),
            1: (1, 1 - p),
        },
        1: {-1: (-1, 1.0)}
    }
)
render_automaton(automaton)

[2021-01-02 15:36:50,713][graphviz.files][DEBUG] write 195 bytes to '/tmp/tmp4krev1q6/output'
[2021-01-02 15:36:50,715][graphviz.backend][DEBUG] run ['dot', '-Kdot', '-Tsvg', '-O', 'output']

Now we will run the PAC learning algorithm to learn the above automaton.

With respect to the previous guide, we have the following parameters:

nb_samplesis the number of samples.
delta: is the probability of failure.
n: is the upperbound on the number of states.

generator = MultiprocessedGenerator(SimpleGenerator(automaton), nb_processes=8)

pdfa = learn_pdfa(
    algorithm=Algorithm.BALLE,
    sample_generator=generator,
    alphabet_size=automaton.alphabet_size,
    nb_samples=20000,
    delta=0.1,
    n=10,
)

[2021-01-02 15:36:54,341][pdfa_learning.learn_pdfa][INFO] Parameters: ("{'alphabet_size': 2,\n"
 " 'dataset_size': None,\n"
 " 'delta': 0.1,\n"
 " 'epsilon': 0.1,\n"
 " 'n': 10,\n"
 " 'nb_samples': 20000,\n"
 " 'sample_generator': "
 '<pdfa_learning.learn_pdfa.utils.generator.MultiprocessedGenerator object at '
 '0x7f5410c6bf50>,\n'
 " 'with_ground': False,\n"
 " 'with_infty_norm': True,\n"
 " 'with_smoothing': False}")
[2021-01-02 15:36:54,343][pdfa_learning.learn_pdfa][INFO] Generating the sample.
[2021-01-02 15:36:54,737][pdfa_learning.learn_pdfa][INFO] Average trace length: 2.4088.
[2021-01-02 15:36:54,738][pdfa_learning.learn_pdfa][INFO] Populate root multiset.
[2021-01-02 15:36:54,755][pdfa_learning.learn_pdfa][INFO] Iteration 0
[2021-01-02 15:36:54,756][pdfa_learning.learn_pdfa][INFO] Iteration 1
[2021-01-02 15:36:54,757][pdfa_learning.learn_pdfa][INFO] Iteration 2
[2021-01-02 15:36:54,758][pdfa_learning.learn_pdfa][INFO] Biggest multiset has cardinality 0, done

The learned automaton is:

print("Transitions: ")
pprint(pdfa.transitions)
render_automaton(pdfa)

[2021-01-02 15:36:57,832][graphviz.files][DEBUG] write 201 bytes to '/tmp/tmpmq4hjn0x/output'
[2021-01-02 15:36:57,834][graphviz.backend][DEBUG] run ['dot', '-Kdot', '-Tsvg', '-O', 'output']

Transitions: 
{(0, -1, 0.0, -1), (1, -1, 1.0, -1), (0, 0, 0.2852, 0), (0, 1, 0.7148, 1)}

Example with 2 states.

Now let's try to learn the following automaton:

p1 = 0.4
p2 = 0.7
automaton = PDFA(
    3,
    2,
    {
        0: {
            0: (1, p1),
            1: (2, 1 - p1),
        },
        1: {
            0: (2, 1 - p2),
            1: (1, p2),
        },
        2: {
            -1: (-1, 1.0)
        }
    },
)
render_automaton(automaton)

[2021-01-02 15:17:37,396][graphviz.files][DEBUG] write 248 bytes to '/tmp/tmpguqglne2/output'
[2021-01-02 15:17:37,397][graphviz.backend][DEBUG] run ['dot', '-Kdot', '-Tsvg', '-O', 'output']

generator = MultiprocessedGenerator(SimpleGenerator(automaton), nb_processes=8)

pdfa = learn_pdfa(
    algorithm=Algorithm.BALLE,
    sample_generator=generator,
    alphabet_size=automaton.alphabet_size,
    nb_samples=20000,
    delta=0.1,
    n=10,
)

[2021-01-02 15:37:10,974][pdfa_learning.learn_pdfa][INFO] Parameters: ("{'alphabet_size': 2,\n"
 " 'dataset_size': None,\n"
 " 'delta': 0.1,\n"
 " 'epsilon': 0.1,\n"
 " 'n': 10,\n"
 " 'nb_samples': 20000,\n"
 " 'sample_generator': "
 '<pdfa_learning.learn_pdfa.utils.generator.MultiprocessedGenerator object at '
 '0x7f53fd4d4590>,\n'
 " 'with_ground': False,\n"
 " 'with_infty_norm': True,\n"
 " 'with_smoothing': False}")
[2021-01-02 15:37:10,976][pdfa_learning.learn_pdfa][INFO] Generating the sample.
[2021-01-02 15:37:11,348][pdfa_learning.learn_pdfa][INFO] Average trace length: 2.4088.
[2021-01-02 15:37:11,348][pdfa_learning.learn_pdfa][INFO] Populate root multiset.
[2021-01-02 15:37:11,367][pdfa_learning.learn_pdfa][INFO] Iteration 0
[2021-01-02 15:37:11,367][pdfa_learning.learn_pdfa][INFO] Iteration 1
[2021-01-02 15:37:11,368][pdfa_learning.learn_pdfa][INFO] Iteration 2
[2021-01-02 15:37:11,368][pdfa_learning.learn_pdfa][INFO] Biggest multiset has cardinality 0, done

render_automaton(pdfa)

[2021-01-02 15:37:13,469][graphviz.files][DEBUG] write 201 bytes to '/tmp/tmp41alblmt/output'
[2021-01-02 15:37:13,471][graphviz.backend][DEBUG] run ['dot', '-Kdot', '-Tsvg', '-O', 'output']

References

[1] Balle, Borja, Jorge Castro, and Ricard Gavaldà. "Learning probabilistic automata: A study in state distinguishability." Theoretical Computer Science 473 (2013): 46-60.