https://www.example.com:443/products/shoes?utm_source=facebook&utm_campaign=summer_sale&fbclid=abc123#reviews
Protocol* Which protocol to use to communicate with the server (https, ftp, etc.)
Hostname* Name of the website, used to look up the IP address of the server
Port Network port to connect to (defaults to 443 for https and 80 for http)
Path The specific page or resource location on the server
Query Params Key-value pairs that pass data to the page
Fragment Links to a specific section within the page (not sent to server)
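
As a rough sketch, Python's standard `urllib.parse` module can split the example URL above into these same parts. The variable names here are only for illustration.

```python
from urllib.parse import urlsplit

url = (
    "https://www.example.com:443/products/shoes"
    "?utm_source=facebook&utm_campaign=summer_sale&fbclid=abc123#reviews"
)

parts = urlsplit(url)

print(parts.scheme)    # protocol
print(parts.hostname)  # hostname
print(parts.port)      # port (None when the URL does not spell one out)
print(parts.path)      # path
print(parts.query)     # raw query string, without the leading "?"
print(parts.fragment)  # fragment, without the leading "#"
```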

Query Parameters

Query parameters are the part of a URL used to encode data about a page in the URL itself. They are intended to be optional values, but most of the time developers don't actually read or remember the technical documentation and RFCs that outline how technologies are supposed to be used, and they put mandatory data in the optional parameters. [1]

?size=medium&color=light%20blue&utm_source=facebook
Structure
? Delimiter Marks the start of query parameters in a URL
Key The name of the data being passed
= Assignment Connects each key to its value
Value The actual data being passed
& Separator Separates multiple key-value pairs
Parameters in this example
size=medium tells the page to use a medium size
color=light%20blue special characters are percent-encoded: a % followed by two hexadecimal digits (%20 is a space)
utm_source=facebook this param was quietly added to the link when it was posted on Facebook
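
To make the structure concrete, here is a small sketch that takes the example query string apart and puts it back together using Python's standard library. It assumes the leading `?` has already been removed.

```python
from urllib.parse import parse_qsl, quote, urlencode

query = "size=medium&color=light%20blue&utm_source=facebook"

# Split on "&" and "=", decoding percent-encoded characters such as %20.
pairs = parse_qsl(query)
for key, value in pairs:
    print(f"{key} -> {value}")

# Going the other way: encode the key-value pairs back into a query string.
# quote_via=quote writes spaces as %20 (the default would write them as "+").
rebuilt = urlencode(pairs, quote_via=quote)
print(rebuilt)
```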

These parameters can be very useful for things like saying how many items should be returned with a query, what specific page the query is for, or for filters on a page itself.

limit=50, page=2, sort=latest, name=John%20Doe, etc.

If there are extra query parameters in a website's URL, it usually isn't harmful to the function of the website, because the website and its server can simply ignore the parameters they don't use.
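
A minimal sketch of why extra parameters tend to be harmless: a handler typically looks up only the keys it knows about. The function name `read_listing_options` and its default values are hypothetical, made up for this illustration.

```python
from urllib.parse import parse_qs, urlsplit


def read_listing_options(url: str) -> dict:
    """Hypothetical page handler: read only the parameters this page understands."""
    params = parse_qs(urlsplit(url).query)
    return {
        # parse_qs maps each key to a list of values, so take the first one.
        "limit": int(params.get("limit", ["20"])[0]),
        "page": int(params.get("page", ["1"])[0]),
        "sort": params.get("sort", ["latest"])[0],
    }


clean = read_listing_options("https://www.example.com/products?limit=50&page=2")
tagged = read_listing_options(
    "https://www.example.com/products?limit=50&page=2&utm_source=facebook&fbclid=abc123"
)

# The tracking parameters are never looked up, so both calls give the same options.
print(clean == tagged)
```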

Over time ad tech companies learned that they can take any arbitrary URL that is displayed on their website and just add their own query parameters to it.

A common format for these tracking parameters is UTM tracking codes.

| Query parameter | Description |
| --- | --- |
| utm_source | which site or platform sent you to the page |
| utm_medium | what type of link was used to get you to the site |
| utm_campaign | what specific promotion brought you here |
| utm_term | the search term used |
| utm_content | what specific page element was clicked to bring you to the page |
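
To illustrate how little effort it takes to attach these codes to an arbitrary link, here is a sketch of what a platform might do before displaying a URL. The function name `add_utm_tags` and the tag values are assumptions for this example, not any real company's code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm_tags(url: str, tags: dict) -> str:
    """Hypothetical helper: append tracking parameters to an existing URL."""
    parts = urlsplit(url)
    # Keep whatever parameters the link already had, then add the tags at the end.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.extend(tags.items())
    return urlunsplit(parts._replace(query=urlencode(pairs)))


tagged_url = add_utm_tags(
    "https://www.example.com/products/shoes?size=medium",
    {"utm_source": "facebook", "utm_medium": "social", "utm_campaign": "summer_sale"},
)
print(tagged_url)
```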

From there, the host site can use that data for analytics: tracking how users got to the site and how the site is being used. If the page also runs scripts or tooling served by those same ad tech companies, those scripts can harvest the data and send it back to the ad tech company to track user habits.

While UTM codes are probably the most common way that tracking information is added to links, they are not the only way and there is nothing stopping companies from using other techniques.

One such technique is through the use of URL shorteners. Not only can URL shorteners hide tracking query parameters behind a short, nice-looking redirect, the companies that run them can also log the IP address of every user who clicks a link and set cookies in your browser to track which sites you visit, down to the individual user.

So why is this bad?

explain why that is a bad thing for privacy, and personal life

  • you didn't consent to being tracked, you just wanted to open a webpage
  • it can be used to make targeted ads work better and to influence people's behaviors and beliefs
  • it can be used to track what you specifically are looking at on the internet
  • it can be used to build profiles of who knows whom

How can you help protect yourself and others

give example of tracking links and how to protect against them

  • delete the tracking parameters manually before opening or sharing a link
  • browser plugins that remove them
  • desktop apps that scan your clipboard for them
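
Whichever option you pick, the core step is the same: parse the link, drop the tracking parameters, and rebuild it. Below is a minimal sketch of that step. The parameter list is an assumption and far from exhaustive (`gclid` is not mentioned in this post), and the function names are hypothetical.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

# Assumed, non-exhaustive lists of tracking parameters to drop.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def is_tracking_param(name: str) -> bool:
    lowered = name.lower()
    return lowered in TRACKING_NAMES or lowered.startswith(TRACKING_PREFIXES)


def strip_tracking_params(url: str) -> str:
    """Return the URL with known tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not is_tracking_param(key)
    ]
    # quote_via=quote keeps spaces as %20 instead of "+".
    return urlunsplit(parts._replace(query=urlencode(kept, quote_via=quote)))


cleaned = strip_tracking_params(
    "https://www.example.com:443/products/shoes"
    "?utm_source=facebook&utm_campaign=summer_sale&fbclid=abc123#reviews"
)
print(cleaned)
```

Because the query string is decoded and re-encoded, unusual characters in the remaining parameters may come back in a slightly different (but equivalent) encoding. This only handles parameters in the link itself; a shortened link has to be resolved first before there is anything to strip.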

  1. While query parameters are not formally defined as optional, they are not part of the main path, which arguably should on its own define a unique, stable path to any given element. If a query parameter is needed to fetch a specific resource, then the path alone does not uniquely identify that resource. ↩︎