Hashing
What is Hashing?
In computer science, hashing is the process of producing a fixed-size output from an input of variable size. This is done with mathematical formulas known as hash functions, which are implemented as hashing algorithms.
Not all hash functions involve cryptography, but the so-called cryptographic hash functions are at the heart of cryptocurrencies. Thanks to them, blockchains and other distributed systems can reach high levels of data integrity and security. Both conventional and cryptographic hash functions are deterministic: as long as the input does not change, the algorithm will always produce the same output (also known as a digest or hash).
When it comes to cryptocurrencies, hashing algorithms are designed to be one-way functions, which means that they cannot be readily reversed. That is to say, creating an output from an input is easy, but going the other way (generating the input from the output alone) is very hard. As a general rule, the harder it is to find the input, the more secure the hashing algorithm is considered to be.
How does a hash function work?
Different hash functions produce outputs of different sizes, but the output size of any given hashing algorithm is always constant. For instance, the SHA-256 algorithm can only produce outputs of 256 bits, while SHA-1 will always generate a 160-bit digest.
To illustrate, let’s run the words “DeFiDocs” and “defidocs” through the SHA-256 hashing algorithm (the one used in Bitcoin).
SHA-256
| Input | Output (256 bits) |
| --- | --- |
| DeFiDocs | f1624fcc63b615ac0e95daf9ab78434ec2e8ffe402144dc631b055f711225191 |
| defidocs | 59bba357145ca539dcd1ac957abc1ec5833319ddcae7f5e8b5da0c36624784b2 |
Note that a minor change (the casing of a few letters) resulted in a totally different hash value. But since we are using SHA-256, the outputs will always have a fixed size of 256 bits (or 64 hexadecimal characters), regardless of the input size. Also, no matter how many times we run the two words through the algorithm, the two outputs will remain the same.
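The properties described above (determinism, sensitivity to small changes, and fixed output size) can be checked with a few lines of Python. This is a minimal sketch using the standard `hashlib` module; the helper name `sha256_hex` is hypothetical, and it assumes the words are encoded as UTF-8 before hashing, since hash functions operate on bytes rather than text.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as 64 hexadecimal characters.

    Assumption: the text is encoded as UTF-8 before hashing.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


for word in ("DeFiDocs", "defidocs"):
    digest = sha256_hex(word)
    # Every digest is 64 hex characters (256 bits), whatever the input length.
    print(f"{word}: {digest} ({len(digest)} hex characters)")

# Determinism: hashing the same input twice gives the same digest.
print(sha256_hex("DeFiDocs") == sha256_hex("DeFiDocs"))

# Changing only the casing yields a different digest.
print(sha256_hex("DeFiDocs") == sha256_hex("defidocs"))
```

The exact digest printed depends on the bytes fed to the function, so a different encoding or a trailing newline would produce a different value.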
By contrast, if we run the same inputs through the SHA-1 hashing algorithm, we get the following results:
SHA-1
| Input | Output (160 bits) |
| --- | --- |
| DeFiDocs | 7f0dc9146570c608ac9d6e0d11f8d409a1ee6ed1 |
| defidocs | e58605c14a76ff98679322cca0eae7b3c4e08936 |
Notably, the acronym SHA stands for Secure Hash Algorithms. It refers to a set of cryptographic hash functions that includes the SHA-0 and SHA-1 algorithms along with the SHA-2 and SHA-3 groups. SHA-256 is part of the SHA-2 group, along with SHA-512 and other variants. Currently, only the SHA-2 and SHA-3 groups are considered secure.
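To see how the fixed output size differs across the SHA family, the following sketch asks Python's `hashlib` for the digest length of a few algorithms. The selection of algorithms and the helper name `digest_bits` are choices made for this illustration; SHA-0 is not listed because `hashlib` does not provide it.

```python
import hashlib

# Algorithm names as exposed by Python's hashlib module.
ALGORITHMS = ("sha1", "sha256", "sha512", "sha3_256")


def digest_bits(algorithm: str) -> int:
    """Return the digest length of the named algorithm in bits."""
    # digest_size is reported in bytes, so multiply by 8.
    return hashlib.new(algorithm).digest_size * 8


for name in ALGORITHMS:
    print(f"{name}: {digest_bits(name)} bits")
```

Note that SHA-256 and SHA3-256 produce digests of the same length even though they belong to different groups and use different internal designs.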
Why do they matter?
Conventional hash functions have a wide range of use cases, including database lookups, analysis of large files, and data management. Cryptographic hash functions, in turn, are used extensively in information security, for example to authenticate messages and create digital fingerprints. In Bitcoin, cryptographic hash functions are an essential part of the mining process and also play a role in generating new addresses and keys.
The real power of hashing shows when dealing with enormous amounts of information. For example, one can run a large file or dataset through a hash function and then use the output to quickly verify the accuracy and integrity of the data. This works because hash functions are deterministic: the same input always results in the same condensed output (the hash). Such a technique removes the need to store and 'remember' large amounts of data.
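The integrity check described above can be sketched as follows. This is an illustrative example, not a prescribed procedure: the function name `file_sha256`, the chunk size, and the throwaway demo file are all assumptions. The file is read in chunks so that even a very large file never has to fit in memory at once.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file chunk by chunk and return the SHA-256 digest as hex."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Demo on a throwaway file (hypothetical data, roughly 1.6 MB).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload " * 100_000)
    demo_path = tmp.name

try:
    # Digest recorded while the file is known to be good.
    reference_digest = file_sha256(demo_path)
    print("unchanged file matches:", file_sha256(demo_path) == reference_digest)

    with open(demo_path, "ab") as handle:
        handle.write(b"!")  # simulate corruption or tampering with one extra byte
    print("modified file matches:", file_sha256(demo_path) == reference_digest)
finally:
    os.remove(demo_path)
```

Only the short reference digest needs to be stored or transmitted; anyone holding a copy of the file can recompute the hash and compare.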
Cryptographic Hash Functions
Again, a hash function that relies on cryptographic techniques is called a cryptographic hash function. In general, breaking a cryptographic hash function requires a huge number of brute-force attempts. To "revert" a cryptographic hash function, one would have to guess the input by trial and error until the corresponding output is produced. However, different inputs can also produce exactly the same output, in which case a "collision" occurs.
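The trial-and-error process can be illustrated with a toy example. The sketch below assumes the secret input is known to be a 4-digit PIN, so there are only 10,000 candidates to try; the function names and the PIN value are hypothetical. For inputs drawn from a large, unpredictable space, the same approach becomes computationally infeasible.

```python
import hashlib
import string
from itertools import product
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as hex."""
    return hashlib.sha256(data).hexdigest()


def brute_force_pin(target_digest: str, length: int = 4) -> Optional[str]:
    """Try every numeric string of the given length until one hashes to the target."""
    for digits in product(string.digits, repeat=length):
        candidate = "".join(digits)
        if sha256_hex(candidate.encode("ascii")) == target_digest:
            return candidate
    return None  # the input was not in the searched space


# The "attacker" only sees the digest, not the PIN itself.
secret_digest = sha256_hex(b"4821")
print(brute_force_pin(secret_digest))  # prints 4821
```

The search succeeds only because the input space is tiny. The hash function itself is never inverted; the attacker simply guesses until a guess matches.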
To be considered secure, a cryptographic hash function must meet three requirements. We can refer to these as collision resistance, preimage resistance, and second preimage resistance.
Before discussing each property, let's summarize their logic in three short sentences:
Collision resistance: it is infeasible to find any two distinct inputs that produce the same hash as output.
Preimage resistance: it is infeasible to "revert" the hash function (find the input from a given output).
Second-preimage resistance: it is infeasible to find a second input that collides with a specified input.
Collision Resistance
As mentioned, a collision occurs when two different inputs produce the same hash. A hash function is considered collision-resistant until someone finds a collision. Note that collisions will always exist for any hash function, because the number of possible inputs is infinite while the number of possible outputs is finite.
Put another way, a hash function is collision-resistant when the chance of finding a collision is so low that it would require millions of years of computation. So although no hash function is collision-free, some are strong enough to be considered resistant (e.g., SHA-256).
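To make the "finite outputs" argument concrete, the sketch below deliberately weakens SHA-256 by keeping only the first 2 bytes (16 bits) of each digest. This truncated function is a toy built for this illustration, not a real algorithm, and the names `truncated_hash` and `find_collision` are hypothetical. With only 65,536 possible outputs, a collision is guaranteed after at most 65,537 distinct inputs, and in practice one turns up much sooner.

```python
import hashlib
from typing import Dict, Tuple


def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """Toy hash: the first n_bytes of SHA-256. NOT secure, for illustration only."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2) -> Tuple[bytes, bytes]:
    """Hash the inputs b"0", b"1", b"2", ... until two share a truncated digest."""
    seen: Dict[bytes, bytes] = {}  # truncated digest -> first input that produced it
    counter = 0
    while True:
        message = str(counter).encode("ascii")
        digest = truncated_hash(message, n_bytes)
        if digest in seen:
            return seen[digest], message
        seen[digest] = message
        counter += 1


first, second = find_collision()
print(first, second, truncated_hash(first).hex())
# The full 256-bit digests of the same two inputs still differ.
print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
```

The same search against the full 256-bit output would need on the order of 2^128 attempts, which is why SHA-256 is treated as collision-resistant even though collisions must exist.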
Among the SHA algorithms, SHA-0 and SHA-1 are no longer considered secure because collisions have been found. Currently, the SHA-2 and SHA-3 groups are considered resistant to collisions.
Preimage Resistance
Preimage resistance is closely related to the concept of one-way functions. A hash function is considered preimage-resistant when there is a very low probability of someone finding the input that generated a particular output.
Note that this property differs from the previous one: here an attacker tries to work out what the input was by looking at a given output. A collision, by contrast, occurs when someone finds two different inputs that generate the same output, and it does not matter which inputs were used.
Preimage resistance is valuable for protecting data because a simple hash of a message can prove its authenticity without disclosing the message itself. In practice, many service providers and web applications store and use hashes generated from passwords rather than the passwords in plaintext.
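As a rough sketch of password storage along these lines, the example below keeps only a salt and a derived digest, never the password itself. Note the swapped-in technique: the text only says "hashes of passwords", while this sketch uses salted PBKDF2-HMAC-SHA256 from Python's standard library, which is a common choice for this purpose rather than a single bare hash. The function names, the iteration count, and the sample passwords are assumptions made for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor, chosen only for this illustration


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a digest from a password; returns (salt, digest) for storage."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# The service stores only the salt and the digest.
stored_salt, stored_digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", stored_salt, stored_digest))
print(verify_password("wrong guess", stored_salt, stored_digest))
```

Because of preimage resistance, someone who obtains the stored digest cannot directly recover the password, although weak passwords remain exposed to guessing.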
Second-Preimage Resistance
Simply put, second-preimage resistance sits somewhere between the other two properties. A second-preimage attack occurs when someone is able to find a specific input that generates the same output as another input they already know.
In other words, a second-preimage attack involves finding a collision, but instead of searching for any two inputs that generate the same hash, the attacker searches for an input that generates the same hash as another specific input.
Therefore, any hash function that is resistant to collisions is also resistant to second-preimage attacks, since a successful second-preimage attack always produces a collision. However, a preimage attack can still be attempted against a collision-resistant function, because it involves finding a single input from a single output.
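The difference between a collision search and a second-preimage search can be sketched with the same kind of deliberately weakened hash (the first 2 bytes of SHA-256). The toy function, the helper names, the candidate format, and the try limit are all assumptions made for this illustration. Here the target is fixed in advance: the search must match the hash of one specific, known input.

```python
import hashlib
from typing import Optional


def toy_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """Toy hash: the first n_bytes of SHA-256. NOT secure, for illustration only."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_second_preimage(
    known_input: bytes, n_bytes: int = 2, max_tries: int = 5_000_000
) -> Optional[bytes]:
    """Search for a different input whose toy hash equals that of known_input."""
    target = toy_hash(known_input, n_bytes)
    for counter in range(max_tries):
        candidate = b"candidate-" + str(counter).encode("ascii")
        if candidate != known_input and toy_hash(candidate, n_bytes) == target:
            return candidate
    return None  # gave up within the try limit


known = b"DeFiDocs"
second = find_second_preimage(known)
print(known, second)
if second is not None:
    print(toy_hash(known) == toy_hash(second))
```

Against an n-bit hash, a generic second-preimage search needs on the order of 2^n attempts, while a generic collision search needs only about 2^(n/2). This is why a function that resists collisions also resists second-preimage attacks.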
Closing Thoughts
When working with large amounts of data, hash functions are a must-have tool in the computer science toolbox. Hashing techniques, when used in conjunction with cryptography, can provide a wide range of security and authentication options. As a result, understanding the features and functioning principles of cryptographic hash functions is essential for everyone interested in blockchain technology.