Blockchain outstanding questions -

Last week we left a few unanswered questions while talking about blockchain technologies. I would like to address those unanswered questions this week. The first one that comes to mind is the problem of double spending. Double spending can occur with a digital currency or any digital payment processing system, unless all payments are authorized by a single, central authority. Blockchain does not have a single central authority so in the early stages of development, the main problem that had to be addressed was the ability for the same currency to be used in two transactions simultaneously, resulting in a double spend.

The double spend problem was solved by only allowing a single path in the chain, and making each link depend on the hash code of the prior link. If two efforts to spend on the same chain occurred at exactly the same time, only one transaction would be processed. The other would have an invalid hash code linking to the previous transaction and be ruled invalid. This solution raised the second question, “What is a hash?”

A hash is created by a computational function, called a hash function. A hash function maps data of any size to a fixed size value. There are three basic rules to a hash function. The first is that each time you encode, or hash, the data you get the same results. The second is that small changes, even a single character change in the data must result in a different hash. The third is that a hash function cannot be reversed, meaning that you cannot use the hash to recreate the original data. Two pieces of data with a large difference can result in the same hash. An example of a trivial hash function is a function that maps names to a two-digit number. John Smith is 02, Lisa Smith is 01, Sam Doe is 04, and Sandra Dee is also 02. We won’t get into exactly how the mapping takes place, because it is a very advanced topic. All we need to know to understand how a hash function works is that it maps input data to a given set of specific values, like the example maps names to numbers between 00 and 99.

We mentioned that the hash function is used to tie the links in the blockchain together. The links form a Merkle tree. In cryptography and computer science, a Merkle tree is a data structure that links data in a single direction from a leaf to parent. In a Merkle tree, each leaf node is labeled with the hash of the data it contains, and every parent node is labeled with the cryptographic hash of the labels of its child’s nodes. Merkle trees allow for efficient and secure verification of the contents of large data structures, like transactional databases used in blockchains.

Merkle trees are named for Ralph Merkle who patented the technology in 1979. They are used to verify any stored data that is handled and transferred in and between computers. They ensure that datablocks are not changed by other peers in a peer-to-peer network, either by accidental corruption or fake blocks created by malicious systems on the network. This makes it difficult, but not impossible, to introduce fake data into a blockchain as it merely requires creating a data block that matches the hash of the block you are replacing, in effect, corrupting the tree. However, generating the fake data block is a time-consuming process and likely to not be completed by the time the next real block is generated, making it impossible to inject your change.

Join me again next week for an overview of data-structures and their applications.