Information Security | In the Internet Age, How to Establish Trust?

Last week, I gave a technical presentation in the front-end group. Since the company is promoting "full English in the office," this was also a fully English presentation. Speaking entirely in English for an hour and a half was a challenge for me, and I would rate my performance as more courageous than skilled 🐶.

In the internet era, how do we build trust❓ The foundation of building trust is of course to ensure that information transmission is secure, so that users dare to communicate, shop, and make payments online... Today, let's take a step-by-step look at the development of information security in the internet era!

The "Outline of this Article" is as follows:

The "Keywords" are cryptography, symmetric key systems, asymmetric key systems, hashing, digital signatures, digital certificates, SSL/TLS, SSH, iOS signing, OpenSSL, WireShark.

💡: Don’t worry, we won’t discuss complex mathematical operations here.

Introduction#

We have a lot of code related to information security in our projects, such as RSA, AES, HMAC... Every time I encounter them, I feel confused and even overwhelmed, so I want to understand what they actually do.
I previously stumbled upon issues with iOS signing, which led to my last article (iOS | Illustrated Principles Behind iOS Signing, which you can check out if you're interested). This time, since it’s a presentation for the front-end group, it’s an expansion based on the previous article.

These reasons led to the creation of this article, and I hope you in the internet era will find it interesting.

Objectives of this Article#

To answer and understand the following questions:

🥣 Why is symmetric encryption + asymmetric encryption generally used for information transmission? Can we not use just one of them?
🥣 Why is a digital signature needed for information security?
🥣 Why is hashing required before signing?
🥣 Why is a digital certificate necessary for information security?

Ultimate Goal: When we encounter cryptography-related issues, we will no longer feel fear or confusion.

What is Information Security?#

This is a relatively broad question, and here I would like to answer it through the three elements of information security (referred to as CIA).

Source: comtact

The components of CIA are as follows:

Confidentiality: Refers to the information not being disclosed to unauthorized users or entities during storage, transmission, and use.
Integrity: Refers to the information not being tampered with by unauthorized users during storage, transmission, and use, or preventing authorized users from making inappropriate modifications to the information.
Availability: Refers to ensuring that authorized users or entities can normally use information resources without being denied access, allowing them to reliably and timely access information resources.
- ➕Authentication: Also understood as non-repudiation, it refers to both parties in network communication being assured of the authenticity of the participants and the information provided, meaning that all participants cannot deny or repudiate their true identity, as well as the integrity of the information provided and the operations and commitments made.
- ➕Controllability: Refers to the degree of control over the network system and information within the scope of transmission and storage.

Reference: 5 Security Features of Information Security——51know

💡 Here are two points that need clarification:

Authentication and controllability are two additional elements, listed here under availability. I believe both of them serve availability, so I grouped them together.
This article will discuss the three elements: confidentiality, integrity, and authentication. We can treat them as three requirements, and the primary task of this article is to fulfill them. Additionally, the explanations above for these three elements are quite technical, so let me briefly introduce them in simpler terms:
1. ❗️Confidentiality: A sends a message to B and does not want C to see the message content.
2. ❗️Integrity: A sends a message to B and does not want C to modify the message content.
3. ❗️Authentication: A sends a message to B, and B can confirm that the sender's identity is A, not C.

Now, let's continue with our three requirements.

Why is Information Security Needed?#

Source: pixabay

Before discussing how to achieve it, let's first consider the reasons for needing information security. I can summarize it simply as "users need it, and companies must meet it." To elaborate, it can be divided into three points:

The need for information security is a common understanding, especially in fields involving money and user privacy, such as banking and e-commerce.
Analyzing from the perspective of internet users, if their information security cannot be guaranteed, how can they dare to shop, pay, take loans, or enter account passwords and other private information online?
For a company, if it cannot guarantee the information security of its users, it will lose the trust of its users, which is equivalent to losing users. What kind of development can such a company expect?

Therefore, in the internet era, information security is extremely necessary. Now, let's see how to achieve it!

How to Achieve Information Security#

❗️Remember our three requirements: Confidentiality, Integrity, Authentication.

Source: electronicdesign

When it comes to information security, we cannot overlook the role of cryptography, as its basic security goals (confidentiality, integrity, authentication) directly address the three requirements mentioned above.

First, we can learn about the history of cryptography from the following videos:

Brief overview: The History of Cryptography｜Explained For Beginners——Binance Academy, Youtube
Detailed overview:
- Secret Codes: A History of Cryptography (Part 1)——The Generalist Papers, Youtube
- More Secret Codes: A History of Cryptography (Part 2)——The Generalist Papers, Youtube

Now, you should have a general understanding of cryptography. Next, let's look at the three common algorithms in cryptography.

Three Common Cryptographic Algorithms#

They are symmetric key algorithms, asymmetric key algorithms, and hashing algorithms.

Symmetric Encryption Algorithm#

Source: preyproject

As shown in the yellow box in the image above, the process of symmetric encryption is that the sender encrypts the plaintext using a key (Secret Key), resulting in ciphertext, which is then sent to the receiver. The receiver decrypts the ciphertext using the same key to obtain the plaintext.

💡 The characteristic of symmetric encryption: The key used for encryption/decryption is the same (Same Key).
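To make the "same key" idea concrete, here is a minimal sketch in Python. It is a toy cipher I made up for illustration (XOR with a keystream derived from SHA-256), not AES or any of the real algorithms; the names `keystream` and `xor_cipher` are my own. It only shows that the one shared key both encrypts and decrypts.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


secret_key = secrets.token_bytes(16)             # shared by sender and receiver
plaintext = b"transfer 100 yuan to Bob"
ciphertext = xor_cipher(secret_key, plaintext)   # sender side
recovered = xor_cipher(secret_key, ciphertext)   # receiver side, same key
```

Anyone holding a different key gets garbage back, which is exactly why the key itself has to reach the receiver safely. In a real project you would rely on a vetted AES implementation instead of a sketch like this.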

Q1: Can you think of some common symmetric encryption algorithms?

DES, 3DES, AES, IDEA, SM1, SM4, RC2, RC4.

Among these, RC4 is a stream cipher (it encrypts/decrypts one bit or byte at a time), while the others are block ciphers (they split the data into N equal-sized blocks, encrypt each block, and then combine the results in order).

The purpose of listing these algorithm names is to help you clearly understand what they do when you encounter these terms, so you can explore the mathematical principles if you're interested.

Reference: Basics of Cryptography (I) Common Cryptographic Algorithm Classification——Blog

Q2: Looking at the image above, you might wonder: How should the key be sent to the receiver? The receiver needs it to decrypt the ciphertext.

This is a good question. In the real world, we could meet in secret to exchange the key, but in the internet world, hackers can easily intercept your communication. Therefore, the biggest challenge of symmetric encryption algorithms is the key distribution problem.

How to solve it? This leads us to the following asymmetric encryption algorithm.

Asymmetric Encryption Algorithm#

Source: preyproject

As shown in the yellow box in the image above, the process of asymmetric encryption is generally similar to that of symmetric encryption.

💡 The only difference is the characteristic of asymmetric encryption: The keys used for encryption/decryption are different (Different Key).

Just like the symmetric key in symmetric encryption, the private key (Secret Key / Private Key) in asymmetric encryption is very private and important and should not be shared with others, while the public key (Public Key) can be freely distributed.

If you want to communicate securely with the sender, you can send the public key to the sender for them to use for encryption. Then, when you (the receiver) receive the ciphertext, you can decrypt it using the private key.
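The public/private split can be sketched with textbook RSA and very small numbers. This is only an illustration under my own assumptions: the primes 61 and 53 are a common classroom choice, there is no padding, and a key this small can be broken instantly.

```python
# Textbook RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53
n = p * q                      # 3233, part of both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+)

public_key = (e, n)            # can be handed to anyone
private_key = (d, n)           # stays with the receiver

message = 65                             # a small number standing in for the plaintext
ciphertext = pow(message, e, n)          # sender encrypts with the PUBLIC key
recovered = pow(ciphertext, d, n)        # receiver decrypts with the PRIVATE key
```

Notice that the sender never needs `d`: knowing the public key is enough to encrypt, but not to decrypt.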

Q1: Can you list some common asymmetric encryption algorithms?

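RSA, ECC, DSA, ECDSA, SM2. (Strictly speaking, DSA and ECDSA can only be used for signing, not for encryption, but they belong to the same asymmetric key family.)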

Q2: Since asymmetric encryption solves the key distribution problem, does that mean symmetric encryption is no longer needed?

Not at all. Asymmetric encryption also has its drawbacks; its disadvantage is that the encryption speed is much slower than that of symmetric encryption (symmetric encryption is fundamentally bitwise operations, while asymmetric encryption involves exponentiation and modular operations).

Therefore, this leads us to the goal question 1: Why is symmetric encryption + asymmetric encryption generally used for information transmission? Can we not use just one of them?

We will discuss this further in the section on "Encryption Methods for Information Transmission." Next, let's introduce the last common cryptographic algorithm—hashing algorithm.

Hashing Algorithm#

Source: hackmd.io

As shown in the image above, a hashing algorithm can convert any data into a fixed-length code, which we generally refer to as a hash value or digest.

💡 Characteristics of hashing algorithms:

"Unique" Identification: The same input always produces the same output; different inputs will most likely produce different outputs. (Because it is "most likely," the uniqueness is in quotes.)
Irreversibility: The output cannot be used to deduce the input.

We can think of the hash value as the "fingerprint" of the original data. At a crime scene, although we cannot deduce what the corresponding person looks like from a fingerprint (irreversibility), we can compare the fingerprints found at the scene with those of suspects (or in a fingerprint database) to identify the perpetrator!
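Both characteristics are easy to observe with Python's standard `hashlib` module. The sketch below uses SHA-256; the inputs are arbitrary values I picked.

```python
import hashlib

digest_short = hashlib.sha256(b"hi").hexdigest()
digest_long = hashlib.sha256(b"hi" * 1_000_000).hexdigest()
digest_again = hashlib.sha256(b"hi").hexdigest()
digest_changed = hashlib.sha256(b"hj").hexdigest()

# Fixed length: 2 bytes or 2 million bytes, the digest is 64 hex characters.
print(len(digest_short), len(digest_long))
# Same input, same output.
print(digest_short == digest_again)
# A one-character change gives a different digest.
print(digest_short == digest_changed)
```

There is no function in `hashlib` that goes from a digest back to the input, which matches the irreversibility described above.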

💡 Combining the two characteristics above, hashing algorithms generally have two uses:

Verifying whether data has been modified

When we download certain software, we often see a hash value (MD5) attached near the download link. What is its purpose?

In fact, this hash value acts like a "fingerprint" of the original download package A. If the package we downloaded was altered during the download process and became a fake package B, we can calculate the hash value B-hash (MD5) of the fake package B and compare it with the original package A's hash value A-hash. If B-hash and A-hash do not match, then we can conclude that package B is "suspicious."
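The A-hash / B-hash comparison can be sketched as follows. I use SHA-256 here rather than MD5, and in-memory bytes instead of a real downloaded file; `file_digest` and `is_untampered` are hypothetical helper names of my own.

```python
import hashlib
import hmac


def file_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest ("fingerprint") of the package contents."""
    return hashlib.new(algorithm, data).hexdigest()


def is_untampered(data: bytes, published_digest: str, algorithm: str = "sha256") -> bool:
    """Compare the digest of what we downloaded with the published one."""
    return hmac.compare_digest(file_digest(data, algorithm), published_digest.lower())


package_a = b"installer v1.0"                  # the original package
a_hash = file_digest(package_a)                # published next to the download link

package_b = b"installer v1.0 + malware"        # what a tampered download might look like

print(is_untampered(package_a, a_hash))        # digests match
print(is_untampered(package_b, a_hash))        # digests differ, package is "suspicious"
```

Keep in mind that this check only helps if the published hash itself comes from a place the attacker cannot modify.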

Additionally, the subsequent digital signature technology will also use hash functions, which we will discuss later.

Storing User Privacy

A platform's database needs to store users' account names and passwords. If passwords are stored in plaintext, it can be dangerous; if the database is compromised, all passwords could be leaked.

Therefore, the database stores hash values of passwords. When a user inputs their password to log in, the platform only needs to compare the stored hash value with the hash value of the input password. This is why we can only reset our password on a platform if we forget it; the platform does not know our original password!

At this point, let's look at two small Q&As.

Q1: Can you list some common hashing algorithms?

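MD5, SHA-1, SHA-2, SHA-3, SM3, and HMAC (strictly speaking, HMAC is a keyed construction built on top of a hash function rather than a hash function itself).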

Q2: The SM series algorithms are recognized and published by our country. Do you know what their full name is?

Because they are recognized by our country (China), SM actually comes from pinyin: it stands for Shāngyòng Mìmǎ (商用密码), which means "commercial cryptography".

Returning to the second use of hashing algorithms (storing user privacy), there are some risks, namely 🌈 rainbow table attacks.

Source: ckd3

As shown in the rainbow table above, the hash values (MD5) of commonly used passwords are already well known. If hackers obtain such hash values, they can look up the original passwords directly. Therefore, we need additional measures to reduce the risk of rainbow table attacks. If you're interested, cmd5 is a website built on rainbow tables that you can check out.

For example, we can use salted hashing or HMAC. The former adds a random number (salt) to the plaintext before calculating the hash value, so a precomputed table no longer matches; the latter goes a step further and combines a secret key (a pre-shared symmetric key) with the plaintext before calculating the hash value, so the hash cannot be recomputed without the key.
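A minimal sketch of both ideas, using only Python's standard library. The function names `register` and `login`, the 16-byte salt, and the sample password are my own assumptions for illustration.

```python
import hashlib
import hmac
import secrets


def register(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in the database instead of the password."""
    salt = secrets.token_bytes(16)                       # fresh random salt per user
    digest = hashlib.sha256(salt + password.encode()).digest()
    return salt, digest


def login(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Hash the input with the stored salt and compare with the stored digest."""
    candidate = hashlib.sha256(salt + password.encode()).digest()
    return hmac.compare_digest(candidate, stored_digest)


salt_a, digest_a = register("123456")
salt_b, digest_b = register("123456")    # same password, different salt and digest

# HMAC variant: a secret key kept outside the database is mixed into the hash.
server_key = secrets.token_bytes(32)
tag = hmac.new(server_key, b"123456", hashlib.sha256).digest()
```

Because `digest_a` and `digest_b` differ even for the same password, a table precomputed for plain hashes is of no use. For real password storage, a deliberately slow function such as `hashlib.pbkdf2_hmac` or `hashlib.scrypt` is generally a better fit than a single SHA-256 call; the sketch keeps the single call to stay close to the description above.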

Now, we have covered the three common cryptographic algorithms. Next, based on these foundational algorithms, let's discuss how to achieve our three requirements. Do you remember them? Confidentiality, Integrity, Authentication.

Encryption Methods for Information Transmission#

❗️Achieving Requirement 1: Confidentiality.

🥣 Answering Goal Question 1: Why is symmetric encryption + asymmetric encryption generally used for information transmission? Can we not just use asymmetric encryption?

First, let's answer question 1. From our previous learning, we know that both encryption methods have their shortcomings but can complement each other.

🥣 Therefore, we combine the advantages of symmetric encryption (fast speed) and asymmetric encryption (ease of key distribution): asymmetric encryption is used to transmit the symmetric key, and the symmetric key is then used for all subsequent communication. This hybrid approach is the standard practice today.
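The hybrid flow can be sketched end to end with toy building blocks. Everything here is an assumption for illustration: the RSA key is built from two small known Mersenne primes and has no padding, and `xor_cipher` is a made-up stand-in for a real symmetric cipher such as AES.

```python
import hashlib
import secrets

# Receiver's toy RSA key pair (two known Mersenne primes; far too small for real use).
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537                          # public key, given to the sender
d = pow(e, -1, (p - 1) * (q - 1))            # private key, kept by the receiver


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256-derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Sender: pick a fresh symmetric session key and wrap it with the receiver's public key.
session_key = secrets.token_bytes(16)
wrapped_key = pow(int.from_bytes(session_key, "big"), e, n)    # slow step, tiny input
ciphertext = xor_cipher(session_key, b"a long message " * 100)  # fast step, bulk data

# Receiver: unwrap the session key with the private key, then decrypt the bulk data.
unwrapped = pow(wrapped_key, d, n).to_bytes(16, "big")
plaintext = xor_cipher(unwrapped, ciphertext)
```

The expensive asymmetric operation touches only 16 bytes, while the message itself, however long, goes through the fast symmetric path.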

PS: Regarding key exchange methods, in addition to the method based on asymmetric encryption mentioned above, there are actually two other ways to exchange keys.

Dedicated key exchange algorithms, such as DH(E), ECDH(E);
Pre-deployment methods, such as PSK, SRP.
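For the first of these, a dedicated key exchange, here is a toy Diffie-Hellman (DH) sketch. The prime and generator are values I chose for readability (2**127 - 1 is a known Mersenne prime); real deployments use standardized, much larger groups or elliptic curves.

```python
import hashlib
import secrets

# Public parameters, known to everyone, including eavesdroppers.
p = 2**127 - 1
g = 3

a = secrets.randbelow(p - 2) + 1      # A's private value, never sent
b = secrets.randbelow(p - 2) + 1      # B's private value, never sent

A_public = pow(g, a, p)               # sent from A to B in the clear
B_public = pow(g, b, p)               # sent from B to A in the clear

shared_by_a = pow(B_public, a, p)     # A combines B's public value with a
shared_by_b = pow(A_public, b, p)     # B combines A's public value with b

# Both sides now hold the same number and can derive a symmetric key from it.
session_key = hashlib.sha256(shared_by_a.to_bytes(16, "big")).digest()
```

The symmetric key itself never travels over the network; only `A_public` and `B_public` do.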

In summary, remember one thing: because symmetric encryption performs well, the large volume of frequently exchanged data is encrypted with a symmetric key. The keys exchanged by all of the methods above are symmetric keys too.

❗️With encryption methods in place, the confidentiality of information transmission is guaranteed, thus fulfilling our first requirement.

Now, how do we ensure the integrity of the information? This is where digital signatures come into play.

Digital Signatures#

❗️Achieving Requirement 2: Integrity.

🥣 Answering Goal Question 2: Why is a digital signature needed for information security?

🥣 Answering Goal Question 3: Why is hashing required before signing?

Digital signatures are generally included with the data to be transmitted to prevent data tampering. Next, let's discuss how they achieve this.

First, their underlying core is the hashing algorithm and asymmetric encryption algorithm mentioned earlier.

Generation (Signing)#

The signature is generated by the sender in the communication. The sender first hashes the data to be transmitted to obtain the data digest (which is the hash value), and then encrypts the digest with its private key, thus generating the signature for the data.

Verification (Signature Verification)#

Upon receiving the data and its signature, the receiver performs the following actions:

Data: Uses the same hashing algorithm as the sender to compute the actual data digest A;
Signature: Uses the public key corresponding to the sender's private key to compute the original data digest B from the signature.

By comparing digest A and digest B, we can tell whether the data is intact: if they are equal, the actual data has not been tampered with; otherwise, there is an issue with the data. (Isn't this somewhat similar to using hashing to verify data integrity? The difference is that a signature offers stronger protection, because only the holder of the private key can produce it.)
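Signing and verification can be sketched with the same kind of toy RSA key. The assumptions are mine: small Mersenne primes, no padding scheme, and the digest is simply reduced modulo n so that it fits the toy key. Real signatures use a proper padding scheme and much larger keys.

```python
import hashlib

# Sender's toy RSA key pair (two known Mersenne primes; illustration only).
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537                          # public key, used for verification
d = pow(e, -1, (p - 1) * (q - 1))            # private key, used for signing


def digest_as_int(data: bytes) -> int:
    """Hash the data and squeeze the digest below n (a toy shortcut, not real padding)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes) -> int:
    """Sender: hash first, then apply the private key to the digest."""
    return pow(digest_as_int(data), d, n)


def verify(data: bytes, signature: int) -> bool:
    """Receiver: compare digest A (from the data) with digest B (from the signature)."""
    digest_a = digest_as_int(data)
    digest_b = pow(signature, e, n)
    return digest_a == digest_b


message = b"pay 100 yuan to Bob"
signature = sign(message)

print(verify(message, signature))                  # untouched data passes
print(verify(b"pay 900 yuan to Bob", signature))   # modified data fails
```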

That is the entire process of generating and verifying a signature. Now, returning to Goal Questions 2 and 3, do you have the answers?

🥣2: Why is a digital signature needed for information security?

Simply put, it is to ensure the integrity of the information (❗️ thus fulfilling Requirement 2).

🥣3: Why is hashing required before signing?

First, consider two questions:

What is the purpose of hashing? It converts any data into a fixed-length code.
The essence of signing is asymmetric encryption. Does it have any shortcomings? Yes, it has low performance.

Therefore, signing a short, fixed-length digest is much faster than signing a large piece of data, and because a hash value can largely ensure the uniqueness of the data, signing the digest is as good as signing the data itself.

However, 1) speeding up the signing process is only part of the answer. If hashing is not performed before signing, there will also be 2) security risks:

Reordering. If the message to be transmitted is too long and exceeds the maximum length supported by the asymmetric encryption algorithm, the system can only perform segmented signing, resulting in multiple signatures. The receiver then verifies each signature individually. However, if this is the case, the order of the segmented messages cannot be guaranteed to remain unaltered, as each signature can still be verified (some may think that multiple signatures can be combined and then generate a single signature, but this may again be limited by the maximum length supported by the asymmetric encryption algorithm).
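Message Forgery. The public key is easily obtainable, so a hacker can take any value as a "signature", run the verification computation on it with the public key, and treat the result as the "message". This pair passes verification even though the hacker never had the private key. The forged message is usually meaningless, but it is still a valid forgery that can be assembled and used later. With hashing, the hacker would additionally have to find data whose hash equals that result, which irreversibility prevents.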
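The forgery risk is easy to reproduce with textbook RSA. In this sketch (toy key built from small Mersenne primes, my own function names) the attacker only ever uses the public key.

```python
import hashlib
import secrets

p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537           # the victim's PUBLIC key; the attacker never learns d


def verify_raw(message: int, signature: int) -> bool:
    """Signature check when the message itself (no hash) was signed."""
    return pow(signature, e, n) == message


def verify_hashed(data: bytes, signature: int) -> bool:
    """Signature check when the SHA-256 digest of the data was signed."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big") % n
    return pow(signature, e, n) == digest


# Attacker: pick any "signature" first, then derive the message it happens to match.
forged_signature = secrets.randbelow(n - 2) + 2
forged_message = pow(forged_signature, e, n)

raw_accepts = verify_raw(forged_message, forged_signature)

# With hashing, the same trick needs data whose digest equals forged_message.
forged_bytes = forged_message.to_bytes(19, "big")
hashed_accepts = verify_hashed(forged_bytes, forged_signature)
```

`raw_accepts` is true by construction, while `hashed_accepts` is false: the attacker would have to invert the hash function to find matching data.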

Reference: Why hash the message before signing it with RSA?——StackExchange

Now, we are left with one requirement❗️ and one goal question🥣.

First, let's think about a question:

Q: How does the receiver obtain the public key used for signature verification?

Please continue reading.

Digital Certificates#

❗️Achieving Requirement 3: Authentication.

🥣 Answering Goal Question 4: Why is a digital certificate necessary for information security?

Before understanding digital certificates, regarding the question "How does the receiver obtain the public key used for signature verification?" we might immediately think that the sender could simply attach the public key when sending the data and signature, and then everything would be in place. As shown in the image below:

However, this introduces a new question Q: How does the receiver confirm that the public key has not been maliciously replaced by someone else? In other words, the identity of the public key is unclear.

🎉 Ding ding ding! Now it's time for the digital certificate to come into play.

Components#

Let's first look at the components of a digital certificate. It consists of the public key, the identity information of the public key, and their signature (another signature).

Note:

The signature is produced with the private key of another public-private key pair, one that belongs to an authoritative Certificate Authority (CA). We certainly do not have the CA's private key.
How to understand CA? We can relate it to our ID card, which is issued by the government.

Public Key Wrapped in Digital Certificate Transmission#

Now, the content we send has changed slightly: public key → certificate.

This means that the original data + signature + public key has changed to data + signature + certificate.

The certificate contains the public key needed by the receiver to verify signature A, as well as the identity information of the public key and signature B:

The existence of identity information eliminates the risk of unclear public key identity (❗️ thus fulfilling Requirement 3: Authentication, 🥣 answering Goal Question 4: Why is a digital certificate necessary for information security?);
Signature B guarantees the integrity of the public key and its identity information.

However, because of signature B, we also need to verify the signature B when retrieving the public key:

The public key used to verify signature B belongs to the CA and is found in the CA certificate.

(Here we can review the components of the certificate: public key + identity information of the public key + their signature.)
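The three components and the check on signature B can be sketched as a tiny data structure. This is a made-up model, not the real X.509 format: the `Certificate` class, the helper names, the toy RSA keys, and the domain `example.com` are all my own assumptions.

```python
import hashlib
from dataclasses import dataclass, replace


def keypair(p: int, q: int, e: int = 65537):
    """Toy RSA key pair from two known primes: returns (public, private)."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)


@dataclass(frozen=True)
class Certificate:
    identity: str                  # identity information of the public key
    public_key: tuple[int, int]    # the public key being certified
    signature: int                 # signature B, made with the CA's private key


def content_digest(identity: str, public_key: tuple[int, int], modulus: int) -> int:
    content = f"{identity}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % modulus


def issue(identity: str, public_key: tuple[int, int], ca_private) -> Certificate:
    """CA side: sign the public key together with its identity information."""
    d, n = ca_private
    return Certificate(identity, public_key, pow(content_digest(identity, public_key, n), d, n))


def verify(cert: Certificate, ca_public) -> bool:
    """Receiver side: check signature B with the CA's public key."""
    e, n = ca_public
    return pow(cert.signature, e, n) == content_digest(cert.identity, cert.public_key, n)


ca_public, ca_private = keypair(2**61 - 1, 2**89 - 1)      # held by the CA
site_public, _ = keypair(2**107 - 1, 2**127 - 1)           # the sender's key pair
attacker_public, _ = keypair(2**13 - 1, 2**17 - 1)

cert = issue("example.com", site_public, ca_private)
swapped = replace(cert, public_key=attacker_public)        # attacker swaps in their key
```

`verify(cert, ca_public)` succeeds, while `verify(swapped, ca_public)` fails, because the attacker cannot produce a new signature B without the CA's private key.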

♻️ You might have another question Q: How does the receiver obtain this CA certificate? Even if they have the CA certificate, how do they verify the signature of this CA certificate? It seems to enter an infinite loop.

In fact, CA certificates are generally built into the system/software during installation, so we should trust them, right?

Next, let's learn about the trust chain of certificates.

Certificate Trust Chain#

Based on the position of the certificate in the trust chain, certificates can be divided into three types:

Root Certificate
Intermediate Certificate
Leaf Certificate

🌰 For example: My developer certificate A (Apple Development) is issued by intermediate certificate B (Apple Worldwide Developer Relations Certification Authority, built-in when installing Xcode), and intermediate certificate B is issued by root certificate C (Apple Root CA, built-in in the system), while root certificate C is signed by itself (self-signed), as C is at the top of the trust chain and has the final say.
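Walking such a chain can be sketched as a loop from leaf to root. As before, this is a simplified model under my own assumptions (toy RSA keys, a made-up `Cert` class, placeholder names "Developer A", "Intermediate B", "Root C"), not how Keychain or any real library stores certificates.

```python
import hashlib
from dataclasses import dataclass


def keypair(p: int, q: int, e: int = 65537):
    """Toy RSA key pair from two known primes: returns (public, private)."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    public_key: tuple[int, int]
    signature: int


def digest(subject: str, issuer: str, public_key, modulus: int) -> int:
    content = f"{subject}|{issuer}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % modulus


def issue(subject: str, public_key, issuer: str, issuer_private) -> Cert:
    d, n = issuer_private
    return Cert(subject, issuer, public_key, pow(digest(subject, issuer, public_key, n), d, n))


def signed_by(cert: Cert, issuer_public) -> bool:
    e, n = issuer_public
    return pow(cert.signature, e, n) == digest(cert.subject, cert.issuer, cert.public_key, n)


def verify_chain(chain: list, trusted_roots: dict) -> bool:
    """chain is ordered leaf -> intermediate -> root."""
    for cert, parent in zip(chain, chain[1:]):
        if cert.issuer != parent.subject or not signed_by(cert, parent.public_key):
            return False
    root = chain[-1]
    # The root is self-signed, so it is trusted only because it is preinstalled.
    return trusted_roots.get(root.subject) == root.public_key and signed_by(root, root.public_key)


root_pub, root_priv = keypair(2**61 - 1, 2**89 - 1)
mid_pub, mid_priv = keypair(2**107 - 1, 2**127 - 1)
leaf_pub, _ = keypair(2**521 - 1, 2**607 - 1)

root = issue("Root C", root_pub, "Root C", root_priv)              # self-signed
mid = issue("Intermediate B", mid_pub, "Root C", root_priv)
leaf = issue("Developer A", leaf_pub, "Intermediate B", mid_priv)
trusted = {"Root C": root_pub}                                     # built into the system
```

`verify_chain([leaf, mid, root], trusted)` passes only when every link is signed by the certificate above it and the root is already in the preinstalled set.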

Now, you can also check your Mac > Keychain Access > Certificates to deepen your understanding.

There is also an interesting question Q: Can certificates be tampered with?

Direct modification? First, consider directly modifying the certificate's content (public key and identity information). This is certainly not feasible, as hackers do not have the CA's private key and cannot re-sign the certificate content.
Swapping the whole certificate? This is also not possible. The certificate contains the identity information of the sender. For example, if I access a website (sender) through a browser (receiver), the website's certificate will contain domain information, which the browser can directly compare with the requested domain to determine whether the certificate has been swapped for someone else's.

Reference: Thoroughly Understand the Encryption Principles of HTTPS——Zhihu

Expansion (Specifications, Systems, Standards): Basics of Cryptography (II) Digital Certificates and Key Basics——Blog

At this point, we have fulfilled all three major requirements❗️ and answered all four goal questions🥣! Let's take a breather.

Finally, let's discuss some additional content:

Related technologies for information security (SSL/TLS, SSH, iOS signing)
Some practical applications (OpenSSL, WireShark).

Due to the length of this article, please refer to another article: Additional Content: Information Security | How to Build Trust in the Internet Era?.

Returning to the Initial Goals#

If you can only remember a little from this presentation 🤏, then try to understand the answers to the questions below!

Why is symmetric encryption + asymmetric encryption generally used for information transmission? Can we not just use asymmetric encryption?
Why is a digital signature needed?
Why is hashing required before signing?
Why is a digital certificate necessary?

Additionally, have you achieved the ultimate goal of this article? I look forward to your feedback!

Ultimate Goal: When we encounter cryptography-related issues, we will no longer feel fear or confusion.

——Written on a day of red rainstorm warning⛈️, when I decided to take a day off in Shenzhen