Partitioning your Digital Footprint

I want my online data to be in the form of a Rubik's cube, hard to assemble.

The internet has evolved from a web of networked hardware into a web of personal information. Every time you Google something or log in to a website, it causes a ripple of data that resonates in servers around the world.

This means it is now very hard to understand, or even prove, what governments, companies and other organisations hold on us. We can hope and rely on the ethics of companies, or on regulations like GDPR, to keep us protected. But with so many privacy and data breaches, we need to take responsibility ourselves and limit the data we leave scattered around the internet: our digital footprint.

A few years back I browsed the HaveIBeenPwned site. It made me aware of the data breaches involving an old email address of mine, and of how easy it was to connect multiple breaches with just one piece of information.

This prompted a change in my online use in the following ways.

Browser

It all starts with the browser. As a lifelong Mozilla user, I have made Firefox my go-to browser. However, it still needs some simple tweaks to make it more effective.

If you are on Chrome, Internet Explorer or Edge, I would recommend you ditch them ASAP and go for Firefox, or for Chromium if you don't want to switch to another layout.

Using an account container add-on with your browser is a must too, as it keeps your browsing separated between social media, eCommerce and other sites.

Ad-blocking

I won't dwell too much on this subject, but having an advertisement blocker, either in your browser or through a proxy, is a godsend: it minimises the personal data being passed between advertisement brokers and also muddies the water on analytics (what you are clicking, viewing etc.).

For browser add-ons there are many decent ones: Ghostery, Adblock Plus and uBlock.

If you want to block any advertisements before they even reach your machine, you can try out Pi-Hole.
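Pi-hole works as a DNS sinkhole: lookups for known advertising domains never get a real answer. The idea can be sketched in a few lines. This is only an illustration and not Pi-hole's actual code; the blocklist entries, the `resolve` function and the fake upstream resolver are all made up for the example.

```python
# Hypothetical blocklist; a real one holds many thousands of domains.
BLOCKLIST = {"ads.example.com", "tracker.example.net"}

# Address handed back for blocked names, so the request goes nowhere.
SINKHOLE = "0.0.0.0"

def resolve(hostname, upstream):
    """Answer with the sinkhole address for blocked names, else ask upstream."""
    name = hostname.rstrip(".").lower()
    if name in BLOCKLIST:
        return SINKHOLE
    return upstream(name)

def fake_upstream(name):
    """Stand-in for a real DNS resolver; always returns a documentation IP."""
    return "203.0.113.10"

print(resolve("ads.example.com", fake_upstream))
print(resolve("news.example.org", fake_upstream))
```

Because the blocking happens at lookup time, it covers every device that uses the sinkhole as its DNS server, not just one browser.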

Separate Email Addresses

Throughout my career in software I've always used email addresses as user identifiers due to their uniqueness and ease of use for logging in.

When analytics teams aggregate data from multiple sources, there is a high probability they will use the same email address to build referential links between all these facets of data they've gathered from places like Facebook or Google Analytics.

So the simplest way to make it hard for organisations to find out more about you is to use multiple email addresses.
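To see why a shared email address is so useful to an analytics team, here is a minimal sketch of joining two unrelated data sets on it. The records, field names and the `link_by_email` helper are all hypothetical; real pipelines are more elaborate, but the principle is the same.

```python
# Hypothetical records from two unrelated sources; all values are made up.
shop_orders = [
    {"email": "foo@gmail.com", "last_order": "running shoes"},
    {"email": "bar@gmail.com", "last_order": "guitar strings"},
]
forum_profiles = [
    {"email": "foo@gmail.com", "username": "foo_1985", "interests": "marathons"},
]

def link_by_email(left, right):
    """Merge records from two sources whenever the email addresses match."""
    right_by_email = {row["email"].lower(): row for row in right}
    linked = []
    for row in left:
        match = right_by_email.get(row["email"].lower())
        if match is not None:
            # One combined profile built from both sources.
            linked.append({**row, **match})
    return linked

profiles = link_by_email(shop_orders, forum_profiles)
print(profiles)
```

Only `foo@gmail.com` appears in both sources, so only that person ends up with a combined profile. A different address per source would have left nothing to join on.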

As outlined in the above diagram, having three Gmail addresses would work.

  • Layer 1: for your most important and dearest information. Keep this to a specific email address that is seldom used.
  • Layer 2: this will be your bread and butter email address, for all your order confirmations and receipts etc.
  • Layer 3 "Social layer": for all interactions with other users on social websites.
  • Layer 4: I don't have a dedicated email for this layer. Instead I use temporary or disposable addresses, e.g. for logging into a forum that requires sign-in before sharing content. For Layer 4 sites, I use Guerrilla Mail.

If you are thinking this will be a pain to manage, it need not be. It would take at most a few hours to set up the email accounts and update various websites to use the newly created addresses.

You can then collect all your mail in an email client (I use Thunderbird on the desktop), so your experience is not hampered. However, I would not recommend aggregating your emails through one main mail account, as that puts all your eggs in one basket if you get hacked.

Note on Gmail Aliases:

As you may be aware, you can use aliases with your Gmail account e.g.

foo@gmail.com
foo+amazon@gmail.com
foo+facebook@gmail.com

This is a very good way to keep track of your interactions with websites, and potentially of where your email is being shared with third parties. However, it is not enough to partition your internet usage, as data scientists can simply match the root of your email address against any permutations, e.g. with a regex:

^foo(\+[^@]*)?@gmail\.com$

I do not use aliases and prefer separate email accounts for this reason.
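Stripping the alias is just as easy as matching it. Here is a minimal sketch of how someone might normalise a list of addresses back to their root; the `strip_alias` helper is my own illustration, not taken from any particular tool.

```python
import re

# Matches a "+tag" that sits directly before the "@".
PLUS_TAG = re.compile(r"\+[^@]*(?=@)")

def strip_alias(address):
    """Return the address lowercased, with any "+tag" removed."""
    return PLUS_TAG.sub("", address.strip().lower())

addresses = ["foo@gmail.com", "foo+amazon@gmail.com", "Foo+facebook@gmail.com"]
roots = {strip_alias(a) for a in addresses}
print(roots)
```

All three addresses collapse to the same root, which is exactly the link the aliases were supposed to hide.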

Unique Passwords for all sites

In this day and age, if you are at least technically inclined, you should be using a password vault of some kind, whether it's a vault managed by the OS, the browser or a separate program.

Many sites still store passwords with weak hashing methods. Once such hashes leak, the same password can show up as the same value on every site, so hackers and internet sleuths can link accounts by password (if they can assume it is unique enough).

So your super secure password that you use everywhere might give up more information than desired.
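A minimal sketch of that linking, assuming two sites that both store a fast, unsalted hash. The usernames, the password and the leaked tables are made up, and SHA-256 is used here only because it is always available in Python's `hashlib`; any unsalted hash behaves the same way.

```python
import hashlib

def unsalted_hash(password):
    """Hash a password with no salt (deliberately weak, for illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical leaked tables from two unrelated sites; all values are made up.
site_a = {"foo_1985": unsalted_hash("MySuperSecurePassword!")}
site_b = {"anonymous_gamer": unsalted_hash("MySuperSecurePassword!")}

# Identical hashes mean identical passwords, which hints at the same person.
links = [
    (user_a, user_b)
    for user_a, hash_a in site_a.items()
    for user_b, hash_b in site_b.items()
    if hash_a == hash_b
]
print(links)
```

The two usernames have nothing in common, yet the shared hash ties them together without the password ever being cracked.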

Therefore a unique and strong password should be used for each website you log in to.

There are plenty of password managers out there, like KeePass, LastPass and Enpass.
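A password manager will generate these for you, but the idea is simple enough to sketch with Python's standard `secrets` module. The length of 20, the character set and the example site names are my own assumptions, not a recommendation from any particular tool.

```python
import secrets
import string

# Letters, digits and punctuation; some sites reject certain symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length=20):
    """Build a random password using a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

# One independent password per (hypothetical) site.
passwords = {site: generate_password() for site in ("shop.example", "forum.example")}
for site, password in passwords.items():
    print(site, password)
```

Since every password is drawn independently, a leak on one site reveals nothing that can be matched against another.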

Restrict OAuth Usage

OAuth is a great tool. It makes my life easier as a web developer when authenticating users, and it saves users the time of setting up a new account for each new website.

That ease of access comes at a price though: a big fat link to the Gmail, Facebook or other account you used to authenticate with.

So always try and register manually with your desired email address.

Do you need to give real information?

One time I was taking the train with a family member and I was watching them sign onto the "Free Wifi" onboard. Just before they hit submit I stopped them and pointed out that they didn't need to provide their real identity to get online.

So when you are confronted with a sign up page, it is good to develop a mental prompt that gets you to think, "do I really need to give away personal information here?".

This will limit trails and breadcrumbs which companies can capitalise on when crunching their big data.

e.g. "Bob used our free wifi on these days during off-peak hours, at these locations. Let's target an advert for discounted train tickets to him".

Sweeping your tracks

During the Rhodesian Bush War, there was a famous regiment in the Rhodesian army called the Selous Scouts. Their expertise was in tracking enemies on the ground over long distances, from which they could interpret information about the enemy's numbers, how much equipment they were carrying and whether any of them were injured.

This knowledge also allowed them to infiltrate enemy territory without being tracked back or detected.

Your old Yahoo and MySpace accounts are much the same: footprints on the internet. So it's worthwhile Googling yourself to find any old sites that you used in the past.

I would recommend searching old email addresses, usernames and combinations of your first name and birth year (for those accounts you made in a hurry with your birth year). Some examples using Google operators:

"foobar@gmail.com"

foo AND bar AND myspace
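If you have a handful of old addresses and usernames, a small script can build the list of queries for you. This is only a sketch; the `build_queries` helper and all of the sample values are hypothetical.

```python
from itertools import product

def build_queries(emails, usernames, first_names, birth_years, sites):
    """Build search query strings from old identifiers."""
    # Exact-match searches for each old email address.
    queries = [f'"{email}"' for email in emails]
    # Accounts made in a hurry often look like name + birth year, e.g. foo1985.
    handles = list(usernames) + [
        f"{name}{year}" for name, year in product(first_names, birth_years)
    ]
    # Pair every handle with every site worth checking.
    queries += [f'"{handle}" AND {site}' for handle, site in product(handles, sites)]
    return queries

queries = build_queries(
    emails=["foobar@gmail.com"],
    usernames=["foobar"],
    first_names=["foo"],
    birth_years=[1985],
    sites=["myspace", "yahoo"],
)
for query in queries:
    print(query)
```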

Then it's a case of trying to shut those accounts down. You can also try contacting site administrators to see if they can help.

Worst case scenario: you find a cringey post on your old MySpace account.

Conclusion

You can take the pessimistic outlook and say that the battle for user privacy is lost or take a realist view that it is still being waged. There's no denying that Google and Facebook already have a decent idea about who I am; that I am a software developer, who likes action movies and alternative music.

This information is important, but what I am more worried about is them collecting my habits. My habits will be different tomorrow, next month, next year and in five years. These habits are manifested in what I am clicking on in my browser.

Those clicks and keypresses are what data scientists, advertisers, hackers and even governments prize the most. The steps I have taken above will not fully prevent them from gathering information about me, but they will certainly make it a lot harder. In the time I wrote this piece, there have probably been 4100 terabytes of data created online. With that comes a lot of noise for those entities and a lot of effort on their part to crunch that information. So why make it easier for them to sift through it?

Currently there is a lot of talk about internet freedoms, whether it be spying, restricted networks or user freedoms. However, one freedom which we are not appreciating at the moment is the ability to be forgotten, like our ancestors pre-internet.

Our digital footprints will eventually become our digital ghosts. Wouldn't you at least like to try and minimise that?

Other Mentions:

These are a few other things I considered, but they proved either too clunky or too time consuming.

  • IP address masking: Use Tor or a private VPN
  • Sandboxing: run a fresh VM with Linux or Windows for each daily browsing session (overkill in my opinion)
  • Go full Richard Stallman