# A Crash Course on Web Tech, or,

## "Everything\* that happens after you hit enter in the URL bar":

### Web basics for CS 3710

<div class="text-center">

_(\*not actually everything)_

</div>

===

## What happens after you hit enter?

---

## What happens after you hit enter?

You might take it for granted that after you type in `www.google.com` and hit `Enter`, your browser takes you directly to Google.

<div class="text-center">
<img src="../../img/web/google_homepage.png" style="max-height: 10em;">
</div>

But what's actually going on inside your computer?

notes:
Worth mentioning that this is a common interview question that students might need to answer someday.

---

## What happens after you hit enter?

<div class="container">
<div class="col">

_**Preface:**_ we could easily spend the entire semester covering every technology involved in fetching, serving, and rendering a webpage. But we won't.

If you want a more complete answer, I'd suggest taking a look at

<div class="text-center text-small">

[https://github.com/alex/what-happens-when](https://github.com/alex/what-happens-when)

</div>

for a start.

</div>
<div class="col">
<div class="text-center">
<img src="../../img/web/what_happens_when_toc.png">
</div>
</div>
</div>

===

## Networking: IP addresses and ports

---

## The client-server relationship

The World Wide Web is largely built on the _**client-server model**_, with machines classified as either *clients* or *servers*.

- _**Client:**_ wants to fetch some data from the internet
- _**Server:**_ services the client's request by responding with some data

<div class="text-center image-background">
<img src="../../img/web/Client-server-model.svg">
</div>

---

## The client-server relationship

Your computer (*the client*) needs to find the right server to handle its request for the domain `www.google.com`.

To do this, it first finds the _**IP address**_ for Google.

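
The client-server exchange described above can be sketched in a few lines of Python. This is only an illustration, assuming both ends run on the same machine: the helper name `handle_one_client`, the message contents, and the choice to let the operating system pick the port are all made up for this example and have nothing to do with how Google's servers work.

```python
import socket
import threading


def handle_one_client(server_sock: socket.socket) -> None:
    """Server side: accept one connection, read one line, send a response."""
    conn, _addr = server_sock.accept()
    with conn, conn.makefile("rb") as reader:
        request = reader.readline()
        conn.sendall(b"response to: " + request)


# Server: listen on this machine. Port 0 asks the OS to pick a free port.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 0))
server_sock.listen(1)
host, port = server_sock.getsockname()

server_thread = threading.Thread(target=handle_one_client, args=(server_sock,))
server_thread.start()

# Client: connect to the server, send a request, and read the response
# until the server closes the connection.
with socket.create_connection((host, port)) as client_sock:
    client_sock.sendall(b"hello\n")
    with client_sock.makefile("rb") as reader:
        reply = reader.read()

server_thread.join()
server_sock.close()
print(reply)
```

The client only needs to know *where* the server is; everything else is "send a request, wait for a response".
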
---

## IP addresses

An _**Internet Protocol (IP) address**_ is a label used to route packets from the server to the client, and vice versa.

An IPv4 address is a four-byte value, commonly written in the form

<div class="text-center">

`XXX.XXX.XXX.XXX`

</div>

where `XXX` is a number from `0` to `255`.

**Example:** `127.0.0.1` is a special "reserved" IP address that can be used to let your computer talk to itself.

---

## Aside: IP subnets

It's often useful to refer to *ranges* of IP addresses, instead of just individual IP addresses.

<div class="fragment">

For instance, IP addresses prefixed by `127` (i.e., all IP addresses from `127.0.0.0` - `127.255.255.255`) are used as *loopback addresses*, i.e., IP addresses your computer can use to talk to itself.

</div>
<div class="fragment">

To refer to these subnets, we usually use _**Classless Inter-Domain Routing**_ (_**CIDR**_) notation.

</div>

notes:
References:
- [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation)

---

## Aside: IP subnets

**Example:** here's how we refer to the subnet with all IP addresses prefixed by `127`:

<div class="text-center text-bold">

`127.0.0.0/8`

</div>
<div class="fragment">

The `/8` says "keep the first 8 bits fixed, the other 32 - 8 = 24 bits can be anything".

</div>
<div class="fragment">

The four bytes of the IP address are separated by `.`, so

- The first eight bits = `127`

</div>
<div class="fragment">

- The last 24 bits = `0.0.0`

</div>

---

## Aside: IP subnets

Some common IP subnets include:

<div class="fragment">

`127.0.0.0/8` (`127.0.0.0` - `127.255.255.255`): used for *loopback interfaces*, which allow your computer to communicate with itself.
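
The `/8` arithmetic for the loopback range can be checked with Python's standard `ipaddress` module. This is just a sketch: the variable name `loopback` and the outside address `128.0.0.1` are chosen here for illustration.

```python
import ipaddress

loopback = ipaddress.ip_network("127.0.0.0/8")

print(loopback.prefixlen)          # 8: the number of fixed leading bits
print(loopback.network_address)    # 127.0.0.0: first address in the range
print(loopback.broadcast_address)  # 127.255.255.255: last address in the range
print(loopback.num_addresses)      # 16777216, i.e. 2 ** (32 - 8)

# Membership test: is a given address inside the subnet?
inside = ipaddress.ip_address("127.0.0.1") in loopback
outside = ipaddress.ip_address("128.0.0.1") in loopback
print(inside, outside)             # True False
```
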
</div>
<div class="fragment">

Subnets used for private networks:

- `10.0.0.0/8` (range: `10.0.0.0` - `10.255.255.255`)
- `192.168.0.0/16` (range: `192.168.0.0` - `192.168.255.255`)
- `172.16.0.0/12` (range: `172.16.0.0` - `172.31.255.255`)

</div>

notes:
Reference:
- [Reserved IP addresses](https://en.wikipedia.org/wiki/Reserved_IP_addresses)

---

## DNS: converting a hostname to an IP address

How do we find the IP address corresponding to `www.google.com`? For this, we use the _**Domain Name System (DNS)**_.

<div class="fragment">

DNS provides a way of mapping human-readable domains like `www.google.com` to IP addresses.

<div class="image-background text-center">
<img src="../../img/web/ip_domain_mapping.drawio.svg">
</div>
</div>

---

## DNS: converting a hostname to an IP address

<div class="code-inline-bg">

On Linux, a nice tool for making DNS queries is `dig`.

</div>

<pre class="code-wrapper">
<code class="plaintext" data-trim data-noescape style="overflow: hidden;" data-line-numbers="1|3-20|14-15" data-fragment-index="0">
$ dig www.google.com

; <<>> DiG 9.16.27-Debian <<>> www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21716
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.google.com.                IN      A

;; ANSWER SECTION:
www.google.com.         178     IN      A       142.250.72.68

...
</code>
</pre>

<div class="r-stack">
<div class="fragment fade-in code-inline-bg" data-fragment-index="1">

We get an A record telling us that the IP address for `www.google.com` is `142.250.72.68`.

</div>
</div>

---

## Ports

Many servers need to run multiple services. In addition, clients often need to make many requests to different services at the same time.

To solve both of these problems, we use **ports**.

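
To make the idea concrete, here is a small sketch of two "services" sharing one IP address, using Python's standard `socket` module. The services are just bare listening sockets, the helper `open_listener` is a name invented for this example, and asking for port `0` (let the operating system choose a free port) is a choice made only to keep the sketch runnable anywhere.

```python
import socket


def open_listener() -> socket.socket:
    """Open a TCP listening socket on the loopback address, on an OS-chosen port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))  # port 0 = "pick any free port for me"
    sock.listen(1)
    return sock


service_a = open_listener()
service_b = open_listener()

ip_a, port_a = service_a.getsockname()
ip_b, port_b = service_b.getsockname()

# Same IP address, different ports: the port tells the two services apart.
print(f"service A listens on {ip_a}:{port_a}")
print(f"service B listens on {ip_b}:{port_b}")

# A client connecting to service A gets a port of its own on the client side.
client = socket.create_connection((ip_a, port_a))
client_ip, client_port = client.getsockname()
print(f"client talks from {client_ip}:{client_port}")

for sock in (client, service_a, service_b):
    sock.close()
```

The printed port numbers will differ from run to run, since the operating system picks them.
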
---

## Ports

<div class="r-stack">
<div class="fragment fade-out" data-fragment-index="0">
<figure>
<img src="../../img/web/ip_and_port_1.svg" style="height: 50vh;">
<figcaption>
</figcaption>
</figure>
</div>
<div class="fragment fade-in-then-out" data-fragment-index="0">
<figure>
<img src="../../img/web/ip_and_port_2.svg" style="height: 50vh;">
<figcaption>
</figcaption>
</figure>
</div>
<div class="fragment fade-in-then-out" data-fragment-index="1">
<figure>
<img src="../../img/web/ip_and_port_3.svg" style="height: 50vh;">
<figcaption>
</figcaption>
</figure>
</div>
<div class="fragment fade-in-then-out" data-fragment-index="2">
<figure>
<img src="../../img/web/ip_and_port_4.svg" style="height: 50vh;">
<figcaption>
</figcaption>
</figure>
</div>
<div class="fragment fade-in" data-fragment-index="3">
<figure>
<img src="../../img/web/ip_and_port_5.svg" style="height: 50vh;">
<figcaption>
</figcaption>
</figure>
</div>
</div>

---

## Common ports

In general, the ports used by client programs are randomly selected from a range of possible ports.

<div class="fragment">

Services can in theory be hosted on any port, but there are "standard" ports for many services:

- **Port 22:** SSH
- **Port 53:** DNS
- **Port 80:** HTTP
- **Port 443:** HTTPS

</div>

notes:
On Linux, the port range for client programs is controlled by `/proc/sys/net/ipv4/ip_local_port_range`.

===

## HTTP

---

## HTTP: interacting with the web server

At this point, we have the IP address for `www.google.com`, and we know what port we're going to use.

Now: how do we actually communicate with the server?

<div class="fragment">

We use the **Hypertext Transfer Protocol** (**HTTP**) to make a request to port 80 on `www.google.com`.
</div>

---

## Dissecting an HTTP request

<pre class="code-wrapper">
<code class="plaintext" data-trim data-noescape data-line-numbers="1-4|1|2-4" data-fragment-index="0">
GET /images/cat.jpg HTTP/1.1
Host: www.google.com
User-Agent: curl/7.74.0
Accept: */*
</code>
</pre>

<div class="r-stack">
<div class="fragment fade-out" data-fragment-index="0">

Here's a basic HTTP request. This data gets sent to Google as plaintext.

</div>
<div class="fragment fade-in-then-out" data-fragment-index="0">

`GET /images/cat.jpg HTTP/1.1`: request line, containing

- `GET`: the request method. `GET` is typically used to fetch a resource; other common methods are `POST` (to send data), `DELETE` (to delete a resource), etc.
- `/images/cat.jpg`: the resource we want to interact with
- `HTTP/1.1`: the HTTP protocol version to use

</div>
<div class="fragment fade-in" data-fragment-index="1">

Zero or more *request headers*, which are sent alongside the request:

- `Host: www.google.com`: the web host that we want to interact with (useful when a server hosts multiple domains)
- `User-Agent: curl/7.74.0`: identifies the browser or program we're using to communicate with the server
- `Accept: */*`: which content types (*MIME types*) the client is able to understand

</div>
</div>

notes:
MIME = "Multipurpose Internet Mail Extensions"

References:
- [Host header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/host)
- [User-Agent header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent)
- [Accept header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept)

---

## Dissecting an HTTP response

<pre class="code-wrapper">
<code class="plaintext" data-trim data-noescape data-line-numbers="1-9|1|2-9" data-fragment-index="0">
HTTP/1.1 200 OK
Date: Wed, 15 Jun 2022 20:43:24 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN

... (lots of HTML content) ...
</code>
</pre>

<div class="r-stack">
<div class="fragment fade-out" data-fragment-index="0">

The HTTP server returns some text to the client in response to the request.

</div>
<div class="fragment fade-in-then-out" data-fragment-index="0">

The *status line* contains the protocol version and the status of the HTTP request.

- `200 OK` = all good!
- `300 <= status < 400`: used to redirect requests
- `400 <= status < 500`: client error, request cannot be fulfilled

</div>
<div class="fragment fade-in" data-fragment-index="1">

The response also contains multiple headers. Here are some of the more important ones:

- `Cache-Control`: directives about whether and for how long the response may be cached
- `Content-Type`: the type of data sent in the response

</div>
</div>

notes:
References:
- [HTTP response codes](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
- [Cache-Control header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control)

---

## HTTP requests with `curl`

<pre class="code-wrapper">
<code class="plaintext" data-trim data-noescape data-line-numbers="1-9|2-3|4-7|1,10-14">
$ curl -s -v www.google.com
*   Trying 142.250.72.68:80...
* Connected to www.google.com (142.250.72.68) port 80 (#0)
> GET / HTTP/1.1
> Host: www.google.com
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Wed, 15 Jun 2022 20:43:24 GMT
< Expires: -1
< Cache-Control: private, max-age=0
< Content-Type: text/html; charset=ISO-8859-1
< P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
< Server: gws
< X-XSS-Protection: 0
< X-Frame-Options: SAMEORIGIN
...
</code>
</pre>

---

## What's the difference between HTTP and HTTPS?

<div class="fragment semi-fade-out" data-fragment-index="0">

This has been a high-level overview of HTTP; "under the hood" it's actually built on top of other protocols (e.g. TCP).
</div>
<div class="fragment fade-in-then-semi-out" data-fragment-index="0">

HTTPS is "HTTP over TLS", where TLS = *Transport Layer Security*. TLS *encrypts* network traffic to ensure that nobody in between you and the server can see the contents of your HTTP requests.

</div>
<div class="fragment" data-fragment-index="1">

We'll talk about TLS much later when we discuss networking and encryption.

</div>

===

## Frontend technology: HTML, CSS, and JavaScript

---

## HTML

_**HyperText Markup Language**_ (**HTML**) is a language for creating documents that are meant to be rendered in a browser.

HTML is composed of *elements*, written as tags enclosed in angle brackets (`<` and `>`), that define the structure of the page.

<div class="overlap">
<div class="fragment fade-out" data-fragment-index="1">
<pre>
<code class="html" data-trim>
<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <link rel="stylesheet" href="style.css">
  </head>
  <body>
    <h1>This is a heading</h1>
    <p>Hello, world!</p>
    <script type="application/javascript">
      console.log("hello, world!");
    </script>
  </body>
</html>
</code>
</pre>
</div>
<div class="text-center fragment" data-fragment-index="1">
<img src="../../img/web/basic_webpage.webp">
</div>
</div>

---

## CSS

_**Cascading Style Sheets**_ (**CSS**) adds style to the structure provided by HTML elements.

<div class="text-center">
<div class="container">
<div class="col">
<img src="../../img/web/css_zen_garden_1.webp">
</div>
<div class="col">
<img src="../../img/web/css_zen_garden_2.webp">
</div>
</div>
<p>

*Source: [CSS Zen Garden](http://www.csszengarden.com/)*

</p>
</div>

notes:
The page on the left and the page on the right use identical HTML, but they use CSS to style it in completely different ways.

---

## JavaScript

_**JavaScript**_ is a programming language for building interactive user interfaces in websites (although it can also be used for backend development via e.g. Node.js).

There are a few different ways of including JavaScript in a web document:

<div class="text-center overlap">
<div class="fragment fade-out" data-fragment-index="1">
<pre>
<code class="html" data-trim>
<html>
  ...
  <script type="application/javascript">
    console.log("hello, world!");
  </script>
  ...
</html>
</code>
</pre>

Embedding JavaScript directly in HTML

</div>
<div class="fragment" data-fragment-index="1">
<pre>
<code class="html" data-trim>
<html>
  ...
  <script type="application/javascript" src="script.js"></script>
  ...
</html>
</code>
</pre>

Referring to an external resource

</div>
</div>

---

## Cookies

<div class="container">
<div class="col">

_**Cookies**_ are small pieces of data that a website can ask the browser to store, and that the browser typically sends back alongside later requests to that website's server.

One common use for them that we'll be interested in is *authentication*: websites typically use cookies to store a token in your browser after you log in.

</div>
<div class="text-center col">
<img src="../../img/web/squirrel_cookie.webp" style="transform: rotate(90deg); max-height: 10em; margin-top: 3em;">
</div>
</div>
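
The login-token flow described above can be sketched with Python's standard `http.cookies` module. This is a rough illustration rather than how any particular website does it: the cookie name `session_token` and the value `abc123` are hypothetical placeholders, and a real token would be long and random.

```python
from http.cookies import SimpleCookie

# Server side: after a successful login, build a Set-Cookie header
# for the response. The name and value are made-up placeholders.
outgoing = SimpleCookie()
outgoing["session_token"] = "abc123"
outgoing["session_token"]["httponly"] = True  # hide the cookie from page scripts
outgoing["session_token"]["path"] = "/"
set_cookie_header = outgoing.output()
print(set_cookie_header)

# Browser side: later requests to the same site carry the stored cookie
# back in a Cookie request header.
cookie_header = "session_token=abc123"

# Server side again: parse the Cookie header to recover the token.
incoming = SimpleCookie()
incoming.load(cookie_header)
token = incoming["session_token"].value
print(token)
```

The server never has to ask "who are you?" again: the token that comes back with each request identifies the logged-in session.
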