How to Use URL Rewriting

URL rewriting is the process of changing the requested URL to something else, according to rules defined in the web server. Nginx implements it in ngx_http_rewrite_module, whose return and rewrite directives do most of the work. The map directive, defined in ngx_http_map_module, can also be used to rewrite URLs with ease. This guide explains the two main directives, return and rewrite, along with their flags, how they work, and where they are useful.
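
Although this guide concentrates on return and rewrite, a minimal sketch of the map approach may help to show what it looks like. The domain example.com, the variable name $new_uri, and the paths in the table are all assumptions made for illustration.

```nginx
# Hypothetical lookup table that maps an old path to its new home.
# map must be declared in the http context, outside of any server block.
map $uri $new_uri {
    default        "";
    /old-about     /about;
    /old-contact   /contact;
}

server {
    listen 80;
    server_name example.com;   # placeholder domain

    # An empty string evaluates to false, so only mapped paths are redirected.
    if ($new_uri) {
        return 301 $new_uri;
    }
}
```

A table like this tends to stay readable when the number of one-to-one redirects grows, since each new redirect is a single line in the map block.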

Prerequisites

This guide is written for Nginx 1.0.1 and above, so updating the existing Nginx instance to that version or later is highly recommended. Some of the commands and syntax may still work on earlier versions. Since URL rewriting is a somewhat advanced topic, the guide assumes the reader already knows how to install Nginx and doesn’t cover installation.

Return

Return is the most basic directive for URL rewriting and is simple to understand. It doesn’t take a regular expression itself, but its argument can include variables, including patterns captured by the enclosing location block. The return directive is usually used to redirect the request to a different location, so it is often paired with an HTTP status code such as 301 for a permanent redirect or 302 for a temporary one. The following code snippets demonstrate some use cases of the return directive.

The following code snippet redirects every request to Google.com. It can be placed either directly in the server block or in a location block, but make sure not to redirect to the same domain, in order to avoid a redirect loop.

return 301 https://google.com;

The following code snippet redirects the request to Nucuta.com and carries the path along. The previous example contains no path or parameters, so no matter which URL is typed in the address bar, the request is redirected to the root of Google.com. In the following example, the path and the parameters of the original request (everything after the domain name) are appended to the new domain. Alternatively, $is_args$args can be used, but then the $uri variable should be used instead of $request_uri, because $request_uri already contains the parameters of the URL. To redirect to a different directory on the same domain, use the $host variable instead of the domain name in the return directive; for instance, in the following example, replace nucuta.com with $host.

return 301 https://nucuta.com$request_uri;
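
The two alternatives described above can be sketched as follows. The domain example.com and the /old/ and /new paths are placeholders chosen for illustration, not part of the original setup.

```nginx
server {
    listen 80;
    server_name example.com;    # placeholder domain

    # Same-domain move: $host stands in for the domain name. The target
    # /new/old/... no longer starts with /old/, so there is no redirect loop.
    location /old/ {
        return 301 $scheme://$host/new$request_uri;
    }

    # Cross-domain redirect built from $uri plus $is_args$args instead of
    # $request_uri. $is_args is "?" when a query string exists, else empty.
    location / {
        return 301 https://nucuta.com$uri$is_args$args;
    }
}
```

Note that $uri holds the normalized form of the path, so its encoding can differ from the raw $request_uri that the client sent.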

The following code snippet redirects the incoming request to the path directory on the same domain, keeping the same scheme. For example, if it is used on http://Linux.com and a visitor makes a request to that site, the request is redirected to http://Linux.com/path. Because nothing is hard-coded, the snippet is useful when managing a large number of websites. Here $scheme holds the protocol of the request, either http or https, and $host holds the requested domain name with its extension, such as Google.com or Linux.Net. This snippet doesn’t change the protocol, so a redirect from HTTP to HTTPS has to be written explicitly, as in the second example (the if block).

return 301 $scheme://$host/path;

if ($scheme != "https") {
    return 301 https://$host$request_uri;
}
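
A commonly used alternative to the if block is to give plain HTTP its own server block whose only job is to redirect. The sketch below assumes placeholder domains and assumes that a separate server block listening on port 443 handles the HTTPS traffic.

```nginx
# Plain-HTTP server: every request is redirected to the HTTPS version
# of the same host and path.
server {
    listen 80;
    server_name example.com www.example.com;   # placeholder domains

    return 301 https://$host$request_uri;
}
```

Because this server block only ever sees plain HTTP requests, no condition on $scheme is needed.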

Another useful feature of the return directive is that it can use regex captures. The regular expression has to be specified in the location block and has to capture a pattern; the captured pattern can then be combined with the existing URL in the return directive to build the redirect target. In the following example, when a request is made for a text file, the location block captures the file’s name and passes it to the return directive, which combines it with the existing URL to redirect the request to another directory.

location ~* ^/([^/]+\.txt)$ {
    return 301 /chrome/$1;
}
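
The same idea can also be written with a named capture, which may be easier to read than a numbered one. This is only a sketch; the name textfile is an arbitrary choice.

```nginx
# Variant using a named capture instead of $1.
# The dot is escaped so that it matches a literal "." only.
location ~* ^/(?<textfile>[^/]+\.txt)$ {
    # /notes.txt is redirected to /chrome/notes.txt
    return 301 /chrome/$textfile;
}
```

The redirect target contains a second slash, so it doesn’t match this location again and no redirect loop occurs.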

Rewrite

Rewrite is a directive that rewrites URLs internally in the web server without exposing the underlying mechanism to the client (unless one of the redirection flags is used). As its syntax shows, it works with regular expressions. The basic syntax is as follows: regex is the regular expression to match against the request URL, replacement is the string that replaces the matched URL, and the optional flag controls the flow of execution. The flags available for the rewrite directive are last, break, redirect, and permanent.

rewrite regex replacement [flag];

Before proceeding to regular expressions, replacements, pattern capturing, and variables, it’s important to know how the flags affect the behavior of Nginx’s internal engine. As mentioned earlier, four flags are used with the rewrite directive. Among them, permanent and redirect can be grouped together because both perform the same function: redirection.

Redirect

The redirect flag tells the browser that the redirection is temporary (HTTP 302). It also helps search engine crawlers recognize that the page has been temporarily moved and will be reinstated at its original location some time later. When a page signals a 302, search engines don’t change their index, so visitors still see the original page in search results. In other words, the old page isn’t removed, and its qualities, such as page rank and link juice, are not passed to the new page.

location / {
    rewrite ^ http://155.138.XXX.XXX/path redirect;
}

Permanent

The permanent flag tells the browser that the redirection is permanent (HTTP 301). It also helps search engine crawlers recognize that the page has been permanently moved and, unlike a temporary move, will NOT be reinstated at its original location later. When a page signals a 301, search engines update their index, so visitors see the new page instead of the old one in search results. In other words, the old page is replaced with the new page, and its qualities, such as page rank and link juice, are passed to the new page.

location / {
    rewrite ^ http://155.138.XXX.XXX/path permanent;
}
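
For comparison, the same two kinds of redirect can be written with the return directive. This is a sketch under assumptions: old.example.com, new.example.com, and the /temporary and /permanent locations are placeholders, not the addresses used in this guide.

```nginx
server {
    listen 80;
    server_name old.example.com;   # placeholder domain

    location /temporary {
        # Sends a 302, like: rewrite ^ http://new.example.com/path redirect;
        return 302 http://new.example.com/path;
    }

    location /permanent {
        # Sends a 301, like: rewrite ^ http://new.example.com/path permanent;
        return 301 http://new.example.com/path;
    }
}
# One difference: rewrite keeps the original query string unless the
# replacement ends with "?", whereas return sends the URL exactly as written.
```

When no regular expression is needed, return is usually the simpler choice because Nginx doesn’t have to evaluate a pattern.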

Regular Expressions, Pattern Capturing, and Variables

Nginx uses regular expressions heavily with the rewrite directive, so knowing regular expressions comes in handy in this segment. There are multiple flavors of regular expressions, but Nginx uses Perl Compatible Regular Expressions, aka PCRE. A regular expression testing tool is useful for making sure that a pattern indeed works before using it in the Nginx configuration file. This guide recommends https://regex101.com/ as the tool, and all the following examples were tested thoroughly with it.

Regular Expressions

rewrite ^/fr/(.*)$ http://nucuta.com/$1 permanent;

A typical rewrite directive looks like the one above: the rewrite keyword comes first, followed by a space and the “pattern” as a regular expression, then a space and the “replacement”, and finally the “flag”. The rewrite directive can be placed anywhere within the server block, but it is recommended to keep it after the listen, server_name, root, and index directives. When a visitor makes a request to the server, a URL is sent along with the request. If that URL matches the regular expression pattern specified in the rewrite directive, it is rewritten according to the replacement, and the execution flow then continues according to the flag.

The regular expression pattern uses parentheses to mark a group. When the pattern matches the URL of the request, the sub-string matched by the group is extracted from the URL and assigned to a variable that can be used in the “replacement” of the rewrite directive. If there are multiple groups, the sub-string of each one is assigned in numerical order: the sub-string of the first group goes to the first variable ($1), that of the second group to the second variable ($2), and so on.
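
As a small illustration of two groups feeding two variables, consider the sketch below. The domain, the /docs path, and the lang parameter are assumptions made up for this example.

```nginx
server {
    listen 80;
    server_name example.com;    # placeholder domain

    # Request:  /fr/docs/install.html
    # $1 = "fr", $2 = "install.html"
    # Result:   /docs/install.html?lang=fr
    rewrite ^/(en|fr)/docs/(.*)$ /docs/$2?lang=$1 last;
}
```

The variables can appear in any order in the replacement, as $2 coming before $1 shows here.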

Of the four flags, two have already been explained in this guide; the remaining ones are last and break. Before looking at how they work, it’s important to understand how the Nginx engine processes rewrite directives. When a request arrives, the rewrite and return directives specified directly in the server block are executed sequentially. Nginx then tries to match the resulting URL with a location block and executes the rewrite and return directives inside that block, again sequentially. If the URL was rewritten there, Nginx searches for a location block again with the newly rewritten URL, and this loop repeats (up to 10 times) until the URL is no longer rewritten.
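
One way to watch this processing order in practice is the rewrite_log directive, which writes the result of each rewrite to the error log at the notice level. The sketch below assumes a placeholder domain, an illustrative document root, and an illustrative log path.

```nginx
server {
    listen 80;
    server_name example.com;       # placeholder domain
    root /var/www/html;            # assumed document root

    # Rewrite processing is logged at the "notice" level.
    error_log /var/log/nginx/rewrite.log notice;   # assumed log path
    rewrite_log on;

    # Step 1: server-level rewrite directives run first, in order.
    rewrite ^/browser/(.*)$ /chrome/$1;

    # Step 2: a location is chosen for the (possibly rewritten) URL.
    location /chrome {
        # Step 3: rewrite directives inside the chosen location run.
        # The URL changes here, so Nginx searches for a location again.
        rewrite ^/chrome/(.*)$ /$1;
    }

    location / {
        # Final location for /sample.txt; nothing rewrites it further,
        # so the file is served from the document root.
    }
}
```

Turning rewrite_log off again after debugging keeps the error log from growing on a busy server.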

The following URL is used to demonstrate how the two flags affect the execution flow of the Nginx engine with the rewrite directive. The following screenshot portrays the file structure of the web server.

http://155.138.XXX.XXX/browser/sample.txt (the URL sent as a request)

When No Flag Is Used

When no flag is used, both rewrite directives are executed sequentially; hence the first URL in the following list turns into the second, and the second turns into the last. So when the sample.txt file in the browser folder is requested, the web server actually serves the sample.txt file in the root folder. The rewriting is completely hidden from the browser, which sees no difference in how the file is served. This is unlike the return directive, which tells the browser through an HTTP status code that the request was redirected.

  1. http://155.138.XXX.XXX/browser/sample.txt
  2. http://155.138.XXX.XXX/chrome/sample.txt
  3. http://155.138.XXX.XXX/sample.txt

location / {
}
rewrite ^/browser/(.*)$ /chrome/$1;
rewrite ^/chrome/(.*)$ /$1;
location /chrome {
    try_files $uri $uri/ =404;
}

When Either Break, or Last Flag is Specified Outside of Location Block

When either the break or the last flag is specified outside of a location block, the rewrite directives after the matched rewrite directive are not processed at all. For instance, in the following example the request URL is rewritten to the second URL in the following list regardless of which flag is used, and that’s it.

  1. http://155.138.XXX.XXX/browser/sample.txt
  2. http://155.138.XXX.XXX/chrome/sample.txt

location / {
}
rewrite ^/browser/(.*)$ /chrome/$1 last;   # or: break
rewrite ^/chrome/(.*)$ /$1 last;           # or: break
location /chrome {
    try_files $uri $uri/ =404;
}

When Last Flag Is Used Inside of a Location Block

When the last flag is used inside a location block, Nginx stops processing any further rewrite directives in that location block and searches for a location block that matches the rewritten URL. If one is found, the rewrite directives inside it are executed in turn. In the following example, the request therefore passes through all three URLs in the list.

  1. http://155.138.XXX.XXX/browser/sample.txt
  2. http://155.138.XXX.XXX/chrome/sample.txt
  3. http://155.138.XXX.XXX/sample.txt

location / {
    rewrite ^/browser/(.*)$ /chrome/$1 last;
}
location /chrome {
    rewrite ^/chrome/(.*)$ /$1 last;
    try_files $uri $uri/ =404;
}

When Break Flag Is Used Inside of a Location Block

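The break flag, on the other hand, when used inside a location block, stops the processing of any further rewrite directives as soon as one rewrite directive matches the request URL, regardless of where the others are located. Nginx doesn’t search for a new location block; it continues handling the request in the current one and serves the content to the user. In the following example, the request is rewritten only once, to /chrome/sample.txt.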

location / {
    rewrite ^/browser/(.*)$ /chrome/$1 break;
}
location /chrome {
    rewrite ^/chrome/(.*)$ /$1 break;
    try_files $uri $uri/ =404;
}
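
The difference between the two flags matters most when the rewritten URL still matches the same location block and the same pattern. The sketch below uses hypothetical /files/ and /files/archive/ paths to illustrate that case.

```nginx
location /files/ {
    # /files/report.pdf becomes /files/archive/report.pdf.
    # break: the request stays in this location and the rewritten URL
    # is served without being matched against the pattern again.
    rewrite ^/files/(.*)$ /files/archive/$1 break;

    # With "last" instead of "break", the rewritten URL would match this
    # location and this pattern again on every pass; Nginx gives up after
    # 10 rounds and returns a 500 error.
}
```

As a rule of thumb, break is the safer flag for a rewrite whose result lands back in the location block that contains it.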

Conclusion

URL rewriting is the process of rewriting URLs within a web server. Nginx provides multiple directives, such as return, rewrite, and map, to make it possible. This guide demonstrated what the return and rewrite directives are and how they are used to rewrite URLs with ease. As shown in the examples, the return directive is suitable for telling the browser and search engine crawlers the whereabouts of a page, whereas the rewrite directive is useful for hiding the rewriting process so that the browser doesn’t know what is going on behind the scenes. This is quite useful when serving content through a CDN, a caching server, or a different location within the network. Users never know where the resource is coming from, as the browser only shows the URL given to them.
