Thursday, December 13, 2018

Playing with Android Realm

A long time ago I used to do a lot of Android development, up to Android Honeycomb and Gingerbread. I have used various client side databases, including

SQLite
SQLCipher
Couchbase-Lite

A couple of years ago I heard about Realm DB and wondered how well it works with Android. Realm database, by realm.io is an object database that seems to be growing in popularity, not only for Android development, but also for iOS, Xamarin and others. (It would be good to see a Unity3D SDK, but doesn't seem to be on the horizon). Creating a data model is very easy in Android, as all stroed Realm objects are mapped to Java objects, so it is just a matter of defining your objects as *POJO* Java classes.
I decided to make a sample Android application that uses Realm for local storage.

Impressions of Realm

I found using Realm to be very easy to setup and very easy to understand the API. Creating Model classes was simple and, even though my use case only required two very simple model classes, I think extending to more complicated applications would not cause many headaches.
To keep track of the realm connection I either wanted to create some singleton manager class, or just extend the Android base Application class and use the instance of that as a quasi-singleton. Creating the initial realm is simple:

Realm.init(this);
RealmConfiguration config = new RealmConfiguration.Builder()
                    .name("MasterData")
                    .encryptionKey(masterBytes).build();
Realm.setDefaultConfiguration(config);
mRealm = Realm.getDefaultInstance();

Realm Queries

I found querying Realm to be relatively straightforward. For instance getting al passwords is as easy as

RealmQuery<PasswordModel> pwQuery  = ((SecureXApplication) getApplication()).getRealm().where(PasswordModel.class);

Getting all notes associated with given password, given the password's id:

RealmQuery<NoteModel> query = mRealm.where(NoteModel.class);
            query.equalTo("passwordId", passwordId);
RealmResults<NoteModel> results = query.findAll();

Overall, it feels easier to use than SQLite on Android, especially if the amount of data is not large, or highly relational. I am not sure how Realm would fare trying to negotiate huge join queries on multiple tables.
Also, as with SQLCipher, Realm allows for data encryption, which is something I am always interested in. Of course, managing the database encryption key is another matter - one that I didn't solve with my demonstration application. Realm's specific encryption process uses AES-256 with CBC mode. As far as I know there is no possiblity to use any other encryption algorithm. If Realm could work with Unity3D on Android then it would be a useful way to store encrypted data on the device, especially data that I don't want the user looking at, or modifying.

Problems

I didn't really encounter many problems, although one issue was annoying.
What if I want to rekey my database? SQLCipher offers the functionality to rekey the database, which is sometimes useful, in cases where you suspect the user has access to the data. Or in my demo app's case, if the user wants to change his master password. That functionality is not a deal breaker, but is something that would put Realm up there with SQLCipher for encryption.

Wednesday, June 27, 2018

Falling into the GAP, Part 1

GAP is a computational algebra package, which is incredibly good for working with groups - as in Group Theory. Although Computer Algebra Packages are quite common, GAP seems to be unique in its specialization of Computational Group Theory. The only similar package I am aware of is Magma (which is proprietary and not free). GAP can run on Windows, MacOS, and Linux. Since I usually use Ubuntu Desktop (17.10), I installed GAP using

sudo apt-get install gap

First Play

To get a feel of GAP, let's go through some typical and simple commands. First, let's create a symmetric group on 8 elements (i.e. \(Sym_{8}\)). We know \(Sym_{8}\) can be generated by two permutations; one involution and one permutation of order 8.

gap> s8 := Group( (1,2), (1,2,3,4,5,6,7,8) );
Group([ (1,2), (1,2,3,4,5,6,7,8) ])

What are the conjugacy classes of \(Sym_{8}\)? First, let's quickly define what we mean by a conjugacy class. Let \(G\) be a finite group and let \(a,b \in G\). Then we say \(a\) and \(b\) are in the same conjugacy class if and only if there exists some \(c\) in \(G\) such that \(c^{-1}ac=b\). Conjugate elements form an equivalence class, and the number of conjugacy classes of \(G\) is a fundamental property of \(G\). Since conjugacy classes are equivalence classes, we can represent each class by a singular element. In Gap we can do:

gap> cc := ConjugacyClasses(s8);
[ ()^G, (1,4)(2,3)(5,7)(6,8)^G, (1,7,4,5)(2,6,3,8)^G, (1,4)(6,8)^G, 
  (1,6,4,8)(5,7)^G, (1,6,4,8)(2,3)(5,7)^G, (1,6,5,7,3,2,4,8)^G, 
  (1,4,5)(3,6,8)^G, (1,8,4,3,5,6)(2,7)^G, (2,7)^G, (1,4,5)(2,7)(3,6,8)^G, 
  (3,6,8)^G, (2,7)(3,8,6)^G, (1,4)(2,3)(6,8)^G, (1,6,4,8)^G, 
  (1,4)(2,5,3)(6,8)^G, (1,8,4,6)(2,3,5)^G, (1,2,8,3,5)^G, (1,3,2,5,8)(4,6)^G, 
  (1,8,4,3,5,6)^G, (1,8,3,4,2,5,7)^G, (1,3,2,5,8)(4,7,6)^G ]

The ConjugacyClass function gives a long list of permutations. Each permutation represents one conjugacy class. How many classes are there?

gap> Size(cc);
22

We get 22 conjugacy classes.

A closer Look

Now we have played a little with GAP, it would be good to try to understand more about it. The GAP package has been in development in some form since 1985. It seems there are many contributors to the main source code, and the project is a collaboration between several and other universities.

Core System

The core system of GAP has four parts.
1. A kernel, written in C, which gives memory management. A set of functions, operating on integers, fields, permutations. An interpreter for the GAP language. The language is, as you can probably guess, untyped, and supports some form of OO programming. Some system functions for handling IO etc, testing and debugging tools, and a REPL.
2. A large library of GAP functions. Users can also add their own functions.
3. A library of "Group Theoretical data". I am guessing this means it has some hardcoded data for small Groups, and also seems to have other data, e.g. character tables.
4. Documentation.

The Gap Website has a great deal of information about the GAP core system (here).

Groups

If you have any interest in Group Theory then playing around in GAP for a few minutes can be quite interesting. There are a large variety of inbuilt functions. For example, Lets start with a Cyclic Group (\(Cyc(6)\).

Group([ (1,2,3,4,5,6) ])
gap> Order(c6);
6
gap> IsAbelian(c6);
true
gap> IsSimple(c6);
false

Since Cyclic groups have a single generator, we only needed one permutation to generate (\(Cyc(6)\). Obviously it is abelian, and it is not simple, since (\(Cyc(2)\) and (\(Cyc(3)\) are normal subgroups.

Checking Isomorphism

It is possible to check whether two groups are isomorphic. Checking isomorphism gts complicated for larger and larger groups but an easy test case is to determine whether \(Cyc(4) \cong Cyc(2)\times Cyc(2)\) (Spoilers: they are not isomorphic).

gap> LoadPackage("sonata");
true
gap> c4 := Group((1,2,3,4));
Group([ (1,2,3,4) ])
gap> c2 := Group((1,2));
Group([ (1,2) ])
gap> c2xc2 := DirectProduct(c2,c2);
Group([ (1,2), (3,4) ])
gap> IsIsomorphicGroup(c2xc2,c4);
false

We loaded the "sonata" package, which contained the definition of the function IsIsomorphicGroup (see here).

Sporadic Simple Groups

Mathieu Groups

GAP has inbuilt support for generating well-known groups and Sporadic Simple Groups. The first discovered sporadic simple groups were the Mathieu groups. There are actually 5 true Mathieu groups - permutation groups on sets of size 11,12,22,23,24 respectively. GAP also includes groups acting on sets of size 9,10,21, which aren't strictly sporadic simple groups (e.g. the group \(M_{21}\) is actually a projective linear group, which can be generated from permutations of a steiner system \(S(2,5,21)\).
Regardless, we can generate these groups using the MathieuGroup function. Below, we generate \(M_{24}, M_{23}, M_{22}\) and also look at the stabilizer of a single point in the set of 24 points which \(M_{24}\) permutes, and the stabilizer of a two-point set. As it happens \(M_23\) is isomorphic to the stabilizer of a single point. \(M_{22}\) is isomorphic to the stabilizer of a single point in the stabilizer of a pair in \(M_{24}\). We cannot prove these, but we can at least see the orders of the groups are the correct size.

gap> m24 := MathieuGroup(24);
Group([ (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23), (3,17,
10,7,9)(4,13,14,19,5)(8,18,11,12,23)(15,20,22,21,16), (1,24)(2,23)(3,12)(4,16)
(5,18)(6,10)(7,20)(8,14)(9,21)(11,17)(13,22)(15,19) ])
gap> m23 := MathieuGroup(23);
Group([ (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23), (3,17,
10,7,9)(4,13,14,19,5)(8,18,11,12,23)(15,20,22,21,16) ])
gap> x := Stabilizer(m24, 1);
Group([ (3,17,10,7,9)(4,13,14,19,5)(8,18,11,12,23)(15,20,22,21,16), (2,24,16,
5,8)(4,22,21,15,17)(7,11,20,10,13)(12,18,19,23,14), (6,22,17)(8,11,13)
(9,15,21)(10,12,23)(14,20,18)(16,24,19) ])
gap> m22 := MathieuGroup(22);
Group([ (1,2,3,4,5,6,7,8,9,10,11)(12,13,14,15,16,17,18,19,20,21,22), (1,4,5,9,
3)(2,8,10,7,6)(12,15,16,20,14)(13,19,21,18,17), (1,21)(2,10,8,6)(3,13,4,17)
(5,19,9,18)(11,22)(12,14,16,20) ])
gap> y := Stabilizer(m24, (1,2));
<permutation group of size 887040 with 10 generators>
gap> Size(m23);
10200960
gap> Size(x);
10200960
gap> Size(m22);
443520
gap> Size(y);
887040

Orbits and Stabilizers

GAP has functions for calculating the orbits and stabilizers of permutations, acted on by a group.
Just to recall, the Orbit-Stabilizer Theorem states:
Let \(G,X\) be a group and a set on which the group acts, respectively.
For any \(x \in X\), the orbit of \(x\), \(Orb(x)\), that is, all the elements of \(X\) for which there is a group action of \(G\) mapping \(x\) to that element.
The stabilizer of some \(x \in X\), \(Stab(x)\), is the set of \(g \in G\) such that \(gx \rightarrow x\). i.e. the set of group elements which map \(x\) to itself.
Then \[ |Orb(x)| \times |Stab(x)| = |G|.\]

We can calculate orbits and stabilizers. For this example, let's use the Dihedral Group \(D_{8}\), the permutation group of a square.

gap> d8:= Group((1,3),(1,2,3,4));
Group([ (1,3), (1,2,3,4) ])
gap> Size(d8);
8
gap> Orbit(d8,(1,2));
[ (1,2), (2,3), (3,4), (1,4) ]

The Orbit of the pair \((1,2)\) is \((1,2), (2,3), (3,4), (1,4)\). Let's take a look at it's stabilizer.

gap> stab:=Stabilizer(d8,(1,2));
Group([ (1,2)(3,4) ])
gap> Size(stab);
2

We can confirm that the size of the orbit (4) multiplied by the size of the stabilizer (2) is indeed equal to the size of the group (8). Moving on from that trivial example, let's find the orbit of the permutation (1,2,3,4,5) in the Mathieu group \(M_{24}\).

gap> m24:=MathieuGroup(24);
gap> orbit:= Orbit(m24,(1,2,3,4,5));
gap> Size(orbit);
1020096
gap> stab:=Stabilizer(m24,(1,2,3,4,5));
<permutation group of size 240 with 4 generators>
gap> Size(orbit) * Size(stab) = Size(m24);
true

We know that \(M_{24}\) acts on a set of 24 points. What is the orbit of a set of a subset of those 24 points, of size 5, say?

gap> orb:= Orbit(m24,[1,2,3,4,5],OnSets);
gap> Size(orb);
42504

That is, the total number of 5-sets of a 24-set. i.e. \(M_{24}\) is a 5-transitive group; any 5 points are mapped to any other 5 points. (This is a consequence of \(M_{24}\) being transitive on the octads of the \(S(5,8,24)\) steiner system on which it acts.

That is it for the first part of this initial dive into GAP. I will write Part 2 soon, as there are many, many more things GAP is capable of.

Thursday, April 12, 2018

Implementing Goodstein Sequences in J

I recently read about Goodstein sequences for the first time, and was interested in writing a J verb that can generate a sequence. Goodstein sequences are defined here: definition A Goodstein sequence is defined in terms of hereditary representations of numbers. An example of hereditary representation of 100 using base 2. \[ 100 = 2^{6}+2^{5}+2^{2} = 2^{2^{2}+2}+2^{2^{2}+1}+2^{2}.\] Let the hereditary representation of \(n\) using base \(2\) be \(G_{0}(n)\), i.e. the start of the sequence. Then \(G_{1}(n)\) uses the hereditary representation of \(n\) base 2, but exchanges 3 for 2 in all places in this representation. Finally, we minus 1. We can continue, so in general, \[G_{m+1}(n) = H_{m+1}(G_{m}) - 1\] where \(H_{a}(x)\) is the base \(a\) hereditary representation of \(x\). For example: \[G_{0}(100) = 100 = 2^{6}+2^{5}+2^{2} = 2^{2^{2}+2}+2^{2^{2}+1}+2^{2}.\] \[G_{1}(100) = 3^{3^{3}+3}+3^{3^{3}+1}+3^{3} - 1 = 228767924549636.\] \[G_{2}(100) = 34860300617850752458892464995335199931446351133540222782081259753676586478191222...\] \[G_{3}(100) = 59814694315693446387155462718734091954899936833389952752466750303416092596593874...\] This sequence grows explosively fast for most values of \(n\). Interestingly, Goodstein's Theorem states that it will always terminate at 0 eventually. For all positive values of \(n\). Even though it is futile to attempt computing a Goodstein sequence for most values of \(n\), as the numbers involved become unmanageable very quickly, it is still interesting and worthwhile writing a verb to do such a thing. Here is one implementation in J:

goodstein=: 4 : 0"0 0

if. y <: 0 do. 0 return.
elseif. y = x do. x+1 return.
end.  
s=. I. (x:x) (|.@:((>:@:>.@:^. # [) #: ])) x:y
+/ (x+1) ^ (x: x) goodstein s
)

G=: <:@:goodstein`]@.(0&=@:])

NB. generates sequence
genSeq=: 3 : 0"1
'base val its'=. y
c=. 0
vals=. val
whilst. its > c=. c+1 do.
  val=. base G val
  vals=. vals,val
  base=. base+1
  if. val = 0 do. break.end.
end.
vals
)

There are three verbs here, goodstein, G, genSeq. goodstein is a recursive dyadic verb which calculates the hereditary representation of the right argument, with base the left argument, and then exchanges occurences of the base by \(base+1\). The reason for the recursion is that for each exponent of the base in the hereditary representation sum, we also need to calculate a hereditary representation. G merely subtracts one from the return value of goodstein.FInally, genSeq, is used to simply generate a sequence. It takes three arguments; the start base (which should be 2); the starting value; the number of iterations to calculate. Example:

genSeq 2 100 3

will return 100 228767924549636 34860300617850752458892464995...

Tuesday, January 23, 2018

Very Simple Intro to Tensorflow, Part 1

This is a very simple intro to using Tensorflow. We will seriously under-utilize it on a basic classification problem - the famous iris dataset. The dataset contains 4 features, and 3 classes of Iris species (Iris setosa, Iris virginica and Iris versicolor). There are only 150 datapoints in the dataset. It can be found here. The source code here is mostly based on, and modified from, the great book, Hands-on Machine Learning with Scikit-Learn and Tensorflow. To do this we need the following Python packages tensorflow scipy numpy sklearn pandas matplotlib ipython jupyter We will do all our work using a Jupyter notebook. After pip installing all the necessary packages, type

jupyter notebook

Go to the newly opened tab in you browser and create a new notebook.

import tensorflow as tf
import numpy as np
import pandas
import matplotlib
import sklearn as sk

Let's grab the dataset (from the local filesystem):

data=pandas.read_csv("/path/to/iris.csv")"

We then need to label the classes. In their raw form they are represented as string literls - the name sof the iris species. But we want some numeric label, well tensorflow does, anyway. Then we need to split our dataset into a training set and a test set. I found the easiest way to split the dataset was just to use scikit-learn.

data=pandas.read_csv("/path/to/iris.csv")
X=data.T[:4].T.as_matrix()
Y=data.T[4:].T
y=pandas.get_dummies(Y)
Xd=X
yd=y.as_matrix()
labels = np.argmax(yd, axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.33)

Now we have our training and test sets, we can get to the fun part. We will create a 3-layered neural network with the aim to classify the who iris dataset, into the correct iris species classes. The below diagram gives a simple outline of the network. We have 4 inputs (the green nodes). The first hidden layer will have 10 nodes. This number is chosen relatively arbitrarily, and is, in all likelihood superfluous. We could probably get away with fewer nodes. But for the purposes of this demonstration we will stick with 10. The activation function for each node will be the sigmoid activation function. Tensorflow allows the choice of several saturating and non-saturating activation functions - sigmoid, relu, atanh for example). The next hidden layer will have 5 nodes, and will also use sigmoid activation functions. Lastly the output layer has three nodes - one node for each class - and will use the softmax activation function, to choose a single node, or single class, for any given input.

Let's create a 3-layer neural network

Xvar = tf.placeholder(tf.float32, shape=(None, 4), name="Xvar")
yvar = tf.placeholder(tf.int64,   shape=(None), name="yvar")
lr=0.01
with tf.name_scope("iris_dnn"):
    hidden1 = fully_connected(Xvar,10,  activation_fn=tf.nn.sigmoid)
    hidden2 = fully_connected(hidden1,5,activation_fn=tf.nn.sigmoid)
    logits= fully_connected(hidden2,3, activation_fn=tf.nn.softmax)
    
with tf.name_scope("loss"):
    xentropy=tf.nn.sparse_softmax_cross_entropy_with_logits(labels= yvar, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(lr)
    training_op = optimizer.minimize(loss)
    
with tf.name_scope("pred"):
    correct = tf.nn.in_top_k(logits,yvar,1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init=tf.global_variables_initializer()
saver=tf.train.Saver()

Now let's run the algorithm. We will run 1000 epochs, and use mini-batches of size 5. For each step we select 5 random datapoints. We will keep track of the training set accuracy and the testing set accuracy so we can plot a graph of how accuracy changes with each epoch.

n_epochs=1000
batch_size=5

tr_acc=[]
te_acc=[]
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(30):
            ind = np.random.permutation(len(X_train))[:batch_size]
            X_batch = X_train[ind] 
            y_batch = y_train[ind] 
            sess.run(training_op, feed_dict={Xvar:X_batch, yvar : y_batch}) 
            acc_train = accuracy.eval(feed_dict={Xvar: X_batch, yvar: y_batch})
            acc_test = accuracy.eval(feed_dict={Xvar: X_test, yvar: y_test}) 
            tr_acc.append(acc_train)
            te_acc.append(acc_test)
            print(epoch, "training acc: ",acc_train, "Test acc: ",acc_test)
            save_path = saver.save(sess,"./irismodel.ckpt")

We can see that after 400 epochs of training, on the admittedly tiny datset, we reach a very high auccracy on both the training set and the test set. This example is a very simple introduction into using Tensorflow, and it really does under-utilize Tensorflow's power, which is more effective for deep neural networks, with recurrent layers, convolutional layers etc, and with GPU integration. We can now look at the accuracy plot. We will use Matplotlib, to plot two line graphs, one for training accuracy, the other for test set accuracy.

import matplotlib.pyplot as plt
r = range(334)
plt.plot(r,tr_acc,r,te_acc)
plt.ylabel('accuracy on training set and test set')
plt.xlabel("epochs")
plt.show()

The graph will look something like this:

That is it for the first part of this introduction. I will add some more simple "use-cases" of Tensorflow later, before moving on to more interesting and powerful uses.