Was the Merkle Tree forgotten in the University of Melbourne Blockchain project for student records?

Here, all the way down under in Australia, we don't have the constant joy of gossiping about the latest presidential drama, perhaps because our honorable prime minister doesn't amuse us as often as his U.S. counterpart. It's thus understandable that the latest favorite topic in our little Sydney Blockchain fan club is the University of Melbourne's announcement of a project using Blockchain for student records. I'll take this as an opportunity to comment on the broader identity topic, and I count the many 'likes' on my LinkedIn identity article from last year as justification for the effort of writing it :)

This is a two-part blog post, using the University of Melbourne project as an example throughout. The first part focuses on the basics: certificates and the Merkle-Tree. The second will be about Blockchain identity management.

The University of Melbourne project builds on the work of blockcerts. This is how I believe it works, from reading the blog posts of the MIT Media Lab, which built it:

  1. A student, Alice, installs an identity wallet, which generates her private and public keys. It also derives her bitcoin address.

  2. Alice gives the university her public key. It is crucial for the university to know that it really is Alice who is sharing the key, hence this has to be done in person.

  3. The university issues a certificate: it sends a transaction carrying the certificate's hash to her bitcoin address on the bitcoin Blockchain, and sends the certificate itself to her identity-wallet.

  4. With the certificate in her identity-wallet, she can display it to a third party, or send it to anyone who needs to validate it. The validator hashes the certificate and checks the result against the hash in the matching bitcoin transaction.
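Steps 3 and 4 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not blockcerts' code: the certificate is treated as raw bytes hashed with SHA-256, and a plain dictionary (the hypothetical `ledger`) stands in for the bitcoin Blockchain, mapping Alice's address to the hash recorded in the transaction sent to it. The address string is a placeholder.

```python
import hashlib

# Hypothetical stand-in for the bitcoin Blockchain: maps a recipient's
# bitcoin address to the certificate hash recorded in a transaction to it.
ledger: dict[str, str] = {}


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def issue(certificate: bytes, student_address: str) -> bytes:
    """Step 3: record the certificate's hash 'on chain', then hand the
    certificate itself to the student's identity-wallet."""
    ledger[student_address] = sha256_hex(certificate)
    return certificate  # the copy that ends up in Alice's identity-wallet


def validate(certificate: bytes, student_address: str) -> bool:
    """Step 4: hash the certificate that was shown and compare it with the
    hash recorded in the matching transaction."""
    recorded = ledger.get(student_address)
    return recorded is not None and recorded == sha256_hex(certificate)


# Placeholder address and certificate, for illustration only.
alice_address = "alice-address-placeholder"
cert = b"In 2017, Alice graduated from the University of Melbourne with a degree of Bachelor of Science."
wallet_copy = issue(cert, alice_address)

print(validate(wallet_copy, alice_address))                            # True
print(validate(wallet_copy.replace(b"2017", b"2015"), alice_address))  # False
```

Any change to the certificate, even a single digit, produces a different hash, so the altered copy fails validation.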

This simple explanation requires some elaboration.

What is a certificate?

In this context, a certificate is a description of the truth. This will do for a minimal certificate:

"In 2017, Alice graduated from the University of Melbourne with a degree of Bachelor of Science."

That's all of it. It wasn't that simple before Blockchain technology existed; a traditional certificate would contain the following:

𝑎. Alice's public key
𝑏. the description of truth ("Alice graduated....")
𝑐. the hash of 𝑎+𝑏.
𝑑. the issuer's signature on the hash 𝑐.
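To make parts 𝑎 to 𝑑 concrete, here is a sketch of a traditional certificate, assuming "𝑎+𝑏" means concatenation and using textbook RSA as the signature scheme. The key is a deliberately tiny toy built from two well-known Mersenne primes; it is insecure and only keeps the example self-contained. Real issuers would use a proper signature library and padding scheme. The field names (`a_public_key` and so on) are hypothetical.

```python
import hashlib

# Toy textbook-RSA key for the issuer (the university). The primes are the
# Mersenne primes 2**61 - 1 and 2**89 - 1: far too small and too well known
# to be secure; they only keep the example self-contained.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def digest(data: bytes) -> int:
    """SHA-256 of data as an integer, reduced mod N so the toy key can sign it."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def make_certificate(holder_public_key: bytes, truth: str) -> dict:
    c = digest(holder_public_key + truth.encode())  # c: hash of a + b
    return {
        "a_public_key": holder_public_key,
        "b_truth": truth,
        "c_hash": c,
        "d_signature": pow(c, D, N),  # d: issuer's signature on c
    }


def check_certificate(cert: dict) -> bool:
    """Recompute c from a + b, then check the issuer's signature d on it."""
    c = digest(cert["a_public_key"] + cert["b_truth"].encode())
    return c == cert["c_hash"] and pow(cert["d_signature"], E, N) == c


truth = "In 2017, Alice graduated from the University of Melbourne with a degree of Bachelor of Science."
cert = make_certificate(b"alice-public-key-placeholder", truth)
print(check_certificate(cert))  # True
cert["b_truth"] = truth.replace("Bachelor", "Master")
print(check_certificate(cert))  # False: c no longer matches a + b
```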

In blockcerts' implementation, a transaction to Alice's bitcoin address carries the following information:

𝑎. Alice's public key in the output script*
𝑏. The hash of the certificate (the whole of the previous chunk)
𝑐. The hash of 𝑎+𝑏 (in effect, the transaction's own hash)
𝑑. The signature on the hash 𝑐.

*A bitcoin expert will point out that it's usually the hash of the public key. Since this is a 'strange transaction' anyway, we can set our own rules.
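One way such a 'strange transaction' could carry parts 𝑎 and 𝑏 is sketched below: a pay-to-public-key output script (`<pubkey> OP_CHECKSIG`) holding Alice's raw public key, plus an `OP_RETURN` output holding the certificate hash. This output layout is an assumption for illustration, not blockcerts' actual transaction format. It only builds the script bytes; serializing and signing the transaction (parts 𝑐 and 𝑑) is out of scope, and the public key is a 33-byte placeholder, not a real curve point.

```python
import hashlib

OP_RETURN = 0x6A
OP_CHECKSIG = 0xAC


def push(data: bytes) -> bytes:
    """Direct data push: for 1 to 75 bytes the opcode is simply the length."""
    if not 1 <= len(data) <= 75:
        raise ValueError("this sketch only handles direct pushes of 1-75 bytes")
    return bytes([len(data)]) + data


def p2pk_script(public_key: bytes) -> bytes:
    """<pubkey> OP_CHECKSIG: pays to the raw public key (part a), rather than
    to its hash as in the more common pay-to-public-key-hash output."""
    return push(public_key) + bytes([OP_CHECKSIG])


def op_return_script(payload: bytes) -> bytes:
    """OP_RETURN <data>: an unspendable output that just carries data (part b)."""
    return bytes([OP_RETURN]) + push(payload)


alice_pubkey = bytes.fromhex("02" + "11" * 32)  # placeholder compressed-key bytes
cert_hash = hashlib.sha256(b"formatted certificate bytes (placeholder)").digest()

outputs = [p2pk_script(alice_pubkey), op_return_script(cert_hash)]
for script in outputs:
    print(script.hex())
```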

A bitcoin transaction serves the function of parts 𝑎, 𝑐 and 𝑑 of the certificate, leaving only one necessary component: the truth part. So a certificate is effectively reduced to the truth itself.

For engineering reasons, blockcerts' version of the truth is formatted. In our case, "In 2017, Alice graduated from the University of Melbourne with a degree of Bachelor of Science." looks like this after formatting:

{
"name" : "Alice",
"graduationYear": "2017",
"Issuer": "University of Melbourne",
"degree": "Bachelor of Science"
}
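Because the validator re-hashes the formatted truth, it must reproduce exactly the bytes the issuer hashed. One common way to guarantee that is a canonical serialization. The sketch below assumes sorted keys, compact separators and UTF-8; blockcerts' own normalization may differ and is not reproduced here.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """A deterministic serialization: sorted keys, no optional whitespace,
    UTF-8. (An assumption for this sketch, not blockcerts' exact rules.)"""
    return json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def certificate_hash(record: dict) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


issued = {"name": "Alice", "graduationYear": "2017",
          "Issuer": "University of Melbourne", "degree": "Bachelor of Science"}
# The same fields, as a validator might receive them, in a different order.
received = {"degree": "Bachelor of Science", "Issuer": "University of Melbourne",
            "graduationYear": "2017", "name": "Alice"}

print(certificate_hash(issued) == certificate_hash(received))  # True
# A naive serialization is fragile: key order alone changes the bytes.
naive_issued = hashlib.sha256(json.dumps(issued).encode()).hexdigest()
naive_received = hashlib.sha256(json.dumps(received).encode()).hexdigest()
print(naive_issued == naive_received)  # False
```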

The truth is fitted into a table of fields and values, and the format chosen by blockcerts is JSON. Format is often compared to taste. My poison is the good old ASN.1, a taste I acquired working with LDAP since 2004. But taste aside, the best format here is neither: it's the Merkle-Tree.

The Merkle-Tree

Merkle-Tree is the buzzword of Blockchain. If you walk into a job interview in the Blockchain space and the interviewer throws a problem at you, you can often start by saying: "Let's use a Merkle-Tree."

And such an answer would be right in this context! What's the best format for a certificate? Merkle-Tree!

Using a Merkle-Tree for certificates would provide additional privacy benefits. Since it's not actually used in the University of Melbourne project, I'll keep this blog post short by explaining the benefits it would bring, without explaining in detail how it works.

A Merkle-Tree is a data structure in which validating one part does not require access to the whole tree. By validation, I mean checking that the part has not been modified or corrupted.
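The core idea can be sketched in a few lines: hash every item into a leaf, then hash pairs of nodes level by level until a single root remains. The root summarizes every item, so changing any item changes the root. This is a minimal sketch assuming SHA-256 and the convention of pairing an odd node out with itself (the rule bitcoin uses; other conventions exist).

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(items: list[bytes]) -> bytes:
    """Hash each item into a leaf, then repeatedly hash adjacent pairs until
    one node, the root, remains. An odd node out is paired with itself."""
    if not items:
        raise ValueError("need at least one item")
    level = [h(item) for item in items]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pair the odd node out with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


records = [b"record 1", b"record 2", b"record 3"]
root = merkle_root(records)
print(merkle_root([b"record 1", b"record 2", b"record 3"]) == root)  # True
print(merkle_root([b"record 1", b"record X", b"record 3"]) == root)  # False
```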

A simple example is getting the telephone number of your clinic. Let us suppose the whole phone-book is a Merkle-Tree, of which you know the hash. It's so big that it is hosted on a website. If you ask for the clinic's phone number, the website provides it. But what if the website has been compromised, and its book is a phony book rather than the phone book you know? Remember, in Blockchain no one is trusted until they provide a proof.

The website can return the phone number together with its proof from the Merkle-Tree: the sibling hashes along the path from that entry up to the root. This answer is a bit longer than the phone number itself, but nowhere near as big as the whole phone-book.

The beauty is that, as long as you know the hash of the whole phone-book, you can tell whether the phone number returned to you is the one in that phone-book, without having access to the entire phone-book.
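The phone-book exchange can be sketched as follows, assuming SHA-256 and a binary tree that pairs an odd node out with itself. The website answers with the entry plus one sibling hash per level; you recompute the path up to the root and compare it with the root you already trust. For a book of n entries the proof holds ⌈log₂ n⌉ hashes, so for the hypothetical 1,024-entry book below that is 10 hashes of 32 bytes each, 320 bytes in total. The entries are made-up placeholders.

```python
import hashlib
import math


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(entries: list[bytes]) -> list[list[bytes]]:
    """All tree levels, leaves first and root last (odd node paired with itself)."""
    levels = [[h(e) for e in entries]]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        levels.append([h(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """The website's answer: for each level below the root, the sibling hash
    and whether that sibling sits on the left."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):  # odd node out: it was paired with itself
            sibling = index
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(entry: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Your side: rebuild the path from the entry to the root using the proof."""
    node = h(entry)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Hypothetical phone-book with 1,024 placeholder entries, the last being your clinic.
book = [f"Subscriber {i}: number {i:04d}".encode() for i in range(1023)]
book.append(b"Your Clinic: number 9999")
levels = build_levels(book)
root = levels[-1][0]  # the hash of the whole phone-book, which you already know

clinic = len(book) - 1
proof = prove(levels, clinic)
print(len(proof), math.ceil(math.log2(len(book))))       # 10 10
print(verify(book[clinic], proof, root))                  # True
print(verify(b"Your Clinic: number 0000", proof, root))   # False: phony number
```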

Let's see why this is relevant to our discussion. Alice, our protagonist, goes to a bar with the identity-wallet mentioned earlier, which holds her identity information:

{
"name" : "Alice",
"graduationYear": "2017",
"Issuer": "University of Melbourne",
"degree": "Bachelor of Science"
}

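Had the University of Melbourne decided to use the Merkle-Tree format, each field of this certificate would become a leaf of the tree, and only the hash of the whole tree (its root) would need to be recorded on the bitcoin Blockchain.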

To order a glass of wine (yes, we drink wine in Australia too), she needs to convince the bartender of her age, but she doesn't want to reveal her name. This would be possible if the certificate were in Merkle-Tree format: she would take just one piece of information, together with its proof, out of her certificate Merkle-Tree.

It is like taking a phone record out of the big phone-book, except that here the purpose is privacy protection rather than saving storage. This piece of the Merkle-Tree preserves data integrity, so the bartender (or rather his computer) can verify it against the bitcoin Blockchain the same way he would verify the certificate if it were presented in its entirety.
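Here is a sketch of that selective disclosure for Alice's four-field certificate, under several assumptions. Each field becomes one leaf, and with exactly four leaves the tree has two levels. Each leaf gets a random salt, an addition not in the description above: without it, the bartender could guess a short hidden value such as the name by hashing candidates and comparing them with the sibling hashes he receives. Revealing the graduation year is a hypothetical choice of field. All function names are illustrative, and only the root would be recorded on the bitcoin Blockchain.

```python
import hashlib
import json
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf(field: str, value: str, salt: bytes) -> bytes:
    """Leaf hash of one field; the salt hides values the holder keeps secret."""
    return h(salt + json.dumps([field, value]).encode())


def issue_tree(cert: dict):
    """University side: salted leaves, two parent nodes, and the root."""
    fields = sorted(cert)  # fixed leaf order
    if len(fields) != 4:
        raise ValueError("this sketch assumes exactly four fields")
    salts = {f: secrets.token_bytes(16) for f in fields}
    leaves = [leaf(f, cert[f], salts[f]) for f in fields]
    parents = [h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])]
    return fields, salts, leaves, parents, h(parents[0] + parents[1])


def disclose(cert, fields, salts, leaves, parents, name):
    """Alice side: reveal one field, its salt, and the two sibling hashes."""
    i = fields.index(name)
    proof = [(leaves[i ^ 1], i % 2 == 1),           # sibling leaf (left if i odd)
             (parents[(i // 2) ^ 1], i // 2 == 1)]  # sibling parent
    return name, cert[name], salts[name], proof


def verify(name, value, salt, proof, root) -> bool:
    """Bartender side: rebuild the root from the revealed field alone."""
    node = leaf(name, value, salt)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


cert = {"name": "Alice", "graduationYear": "2017",
        "Issuer": "University of Melbourne", "degree": "Bachelor of Science"}
fields, salts, leaves, parents, root = issue_tree(cert)

shown = disclose(cert, fields, salts, leaves, parents, "graduationYear")
print(shown[:2])             # ('graduationYear', '2017'): the name stays hidden
print(verify(*shown, root))  # True
```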

That's it: we talked about what a certificate is, how it is used on the bitcoin Blockchain, and how a Merkle-Tree structure could improve it. In the next blog post, we will examine the security risks facing students of the University of Melbourne.