Problems with HTTP and Plain text
The Internet is an untrusted channel of communication. When you send or receive information from an old HTTP site such as http://www.example.com in your browser, a lot of things can happen to your packets along the way.
- A bad actor can intercept the communication and copy the data for themselves before passing it along to you or the server you were talking to. The information is compromised without either party knowing. We need to ensure that the communication is private.
- A bad actor can modify the information as it is being sent over the channel. Bob might have sent the message “x”, but Alice receives “y”, because a bad actor intercepted the message and modified it. In other words, the integrity of the message is compromised.
- Lastly, and most importantly, we need to ensure that the party we are talking to is indeed who they say they are. Going back to the example.com domain: how can we make sure that the server that replied to us is indeed the rightful holder of www.example.com? At any point in your network, you can be misdirected to another server. A DNS server somewhere is responsible for converting a domain name, such as www.example.com, into an IP address on the public Internet. But your browser has no way of verifying that the IP address it got back from DNS really belongs to the rightful holder of the domain.
The first two problems can be solved by encrypting the message before it is sent over the Internet to the server, that is to say, by switching over to HTTPS. (TLS, the protocol underneath HTTPS, also attaches an integrity check to every message, so tampering is detected.) However, the last problem, the problem of identity, is where a Certificate Authority comes into play.
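To make the integrity problem concrete, here is a minimal Python sketch. It assumes Bob and Alice already share a secret key (how two parties get one is the topic of the next section) and uses an HMAC tag from the standard library to detect a modified message. It only illustrates tamper detection and is not how TLS itself is implemented; the names shared_key, tag_message and is_untampered are made up for this example.

```python
import hashlib
import hmac
import secrets

# Assumption: both parties already hold this key; real sessions negotiate it.
shared_key = secrets.token_bytes(32)


def tag_message(key: bytes, message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag that travels alongside the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_untampered(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(tag_message(key, message), tag)


sent = b"x"
tag = tag_message(shared_key, sent)

# A bad actor swaps the message in transit but cannot forge a matching tag
# without knowing the shared key.
received = b"y"

print(is_untampered(shared_key, sent, tag))      # True: genuine message
print(is_untampered(shared_key, received, tag))  # False: modified message
```

The bad actor can still change the bytes on the wire, but the receiver now notices and can discard the message.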
Initiating Encrypted HTTP sessions
The main problem with encrypted communication over an insecure channel is “How do we start it?”
The very first step would involve the two parties, your browser and the server, exchanging encryption keys over the insecure channel. If you are unfamiliar with the term, think of a key as a really long, randomly generated password with which your data is encrypted before being sent over the insecure channel.
Well, if the keys are being sent over an insecure channel, anyone can listen in on that exchange and compromise the security of your HTTPS session in the future. Moreover, how can we trust that the key being sent by a server claiming to be www.example.com indeed comes from the actual owner of that domain name? We could be having an encrypted conversation with a malicious party masquerading as a legitimate site and not know the difference.
So, the problem of ensuring identity is important if we wish to ensure secure key exchange.
Certificate Authorities
You may have heard of LetsEncrypt, DigiCert, Comodo and a few other services that offer TLS certificates for your domain name. You can choose the one that fits your needs. Now, the person or organization that owns the domain has to prove in some way to their Certificate Authority that they indeed have control over the domain. This can be done either by creating a DNS record with a unique value in it, as requested by the Certificate Authority, or by adding a file to your web server with contents specified by the Certificate Authority. The CA can then read this record or file and confirm that you are the valid owner of the domain.
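The file-based check can be sketched roughly as follows. This is a simplified illustration, not the procedure any particular CA follows: the path CHALLENGE_PATH, the function names, and the injected fetch callable (which stands in for an HTTP request so the sketch runs without a network) are all assumptions made for this example.

```python
import secrets
from typing import Callable

# Hypothetical path; a real CA tells you exactly where the file must live.
CHALLENGE_PATH = "/.well-known/example-challenge.txt"


def issue_challenge() -> str:
    """CA side: create an unguessable value for the applicant to publish."""
    return secrets.token_urlsafe(32)


def domain_is_validated(
    domain: str,
    expected_token: str,
    fetch: Callable[[str], str],
) -> bool:
    """CA side: read the file from the domain and compare it to the token.

    `fetch` stands in for an HTTP GET so the sketch runs without a network.
    """
    url = f"http://{domain}{CHALLENGE_PATH}"
    try:
        body = fetch(url)
    except OSError:
        return False
    return secrets.compare_digest(body.strip(), expected_token)


# Simulated web: a dict of URL -> file contents. Only the real owner of
# www.example.com was able to publish the token on that domain.
token = issue_challenge()
published_files = {f"http://www.example.com{CHALLENGE_PATH}": token}


def fake_fetch(url: str) -> str:
    """Pretend HTTP client that serves files from `published_files`."""
    if url not in published_files:
        raise FileNotFoundError(url)
    return published_files[url]


print(domain_is_validated("www.example.com", token, fake_fetch))       # True
print(domain_is_validated("www.attacker.example", token, fake_fetch))  # False
```

The point of the random token is that only someone who controls the web server (or the DNS zone, in the DNS variant) can put it where the CA will look.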
Then you request a TLS certificate from the CA. You end up with a private key, which you keep secret, and a public TLS certificate issued to your domain, which contains the matching public key. Messages encrypted with one key of the pair can only be decrypted with the other. This is known as asymmetric encryption.
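As a rough illustration of how a key pair behaves, here is textbook RSA with deliberately tiny numbers. It is a toy: real keys are thousands of bits long and real systems add padding, so treat the values and the helper name apply_key as assumptions for this sketch only. It needs Python 3.8 or newer for the modular inverse via pow.

```python
# Toy RSA with tiny primes; real keys use primes hundreds of digits long.
p, q = 61, 53
n = p * q                  # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent: (e, n) is the public key
d = pow(e, -1, phi)        # private exponent: (d, n) is the private key


def apply_key(value: int, exponent: int, modulus: int) -> int:
    """RSA's core operation: modular exponentiation."""
    return pow(value, exponent, modulus)


message = 65

# Encrypt with the public key, decrypt with the private key.
ciphertext = apply_key(message, e, n)
assert apply_key(ciphertext, d, n) == message

# The other direction works like a signature: produced with the private
# key, checked by anyone who has the public key.
signature = apply_key(message, d, n)
assert apply_key(signature, e, n) == message

print(ciphertext, signature)
```

Anyone can know (e, n), which is what the certificate publishes; only the holder of d can undo what was encrypted with it.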
Client browsers like Firefox and Chrome (and sometimes the operating system itself) know about Certificate Authorities. This information is baked into the browser or device from the very beginning (that is to say, when it is installed), so it knows that it can trust certain CAs. Now, when the browser connects to www.example.com over HTTPS and sees a certificate issued by, say, DigiCert, it can verify that claim using the CA keys stored locally. There are a few more intermediary steps to it, but this is a good simplified overview of what’s happening.
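You can watch the same kind of check happen outside a browser with Python's standard ssl module, which by default loads the CA certificates trusted by your system. The sketch below is one way to do it; the function names build_verifying_context and fetch_certificate_issuer are made up for this example, and the second one needs network access, so it is defined but not called here.

```python
import socket
import ssl


def build_verifying_context() -> ssl.SSLContext:
    """Create a client context backed by the system's trusted CA store."""
    return ssl.create_default_context()


def fetch_certificate_issuer(hostname: str, port: int = 443) -> dict:
    """Connect over TLS and return the issuer fields of the server's cert.

    The handshake raises ssl.SSLCertVerificationError if the certificate
    does not chain to a trusted CA or does not match `hostname`.
    """
    context = build_verifying_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    # "issuer" is a tuple of relative distinguished names, each a tuple of
    # (field, value) pairs; flatten it into a plain dict.
    return {field: value for rdn in cert["issuer"] for field, value in rdn}


context = build_verifying_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True: a valid cert is mandatory
print(context.check_hostname)                    # True: name must match the cert
```

Calling fetch_certificate_issuer("www.example.com") on a machine with Internet access returns the issuer's fields, such as organizationName; which CA shows up depends on who issued the site's current certificate.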
Now that the certificate provided by www.example.com can be trusted, it is used to negotiate a unique symmetric encryption key, which is used between the client and the server for the remainder of their session. In symmetric encryption, one key is used for both encryption and decryption, and it is usually much faster than its asymmetric counterpart.
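The "one key for both directions" idea can be shown with a toy cipher built from the standard library. This is only a sketch: the keystream construction and the names keystream, xor_with_key and session_key are invented for this example and must not be used for real data. Actual TLS sessions use vetted ciphers such as AES-GCM or ChaCha20-Poly1305.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Stretch the key into `length` pseudo-random bytes (toy construction)."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return stream[:length]


def xor_with_key(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


session_key = secrets.token_bytes(32)   # the one key both sides share
plaintext = b"GET /account HTTP/1.1"

ciphertext = xor_with_key(session_key, plaintext)   # client encrypts
recovered = xor_with_key(session_key, ciphertext)   # server decrypts

print(recovered == plaintext)  # True
```

The same function and the same key are used on both ends, which is exactly why the key has to be agreed on securely first.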
Nuances
If the idea of TLS and Internet security appeals to you, you can look further into this topic by digging into LetsEncrypt and their free TLS CA. There’s a lot more minutiae to this entire rigmarole than stated above.
Other resources that I can recommend for learning more about TLS are Troy Hunt’s blog and the work done by the EFF, like HTTPS Everywhere and Certbot. All of these resources are free to access, and trying things out yourself is really cheap (you just have to pay for domain name registration and hourly VPS charges), so you can get hands-on experience.