Applying Bayesian Reasoning to Data Retention

When statistics plays tricks on you.

In: Data Retention, Security, Policies, Privacy

If you have ever done system administration, what follows will probably remind you of the Bayesian spam filter on a Postfix mail server. A good old one.

The most recent consulting work I got was related to data retention: wiping old data in accordance with retention policies. The customer wanted to be sure that customer records older than 3 years were completely wiped from the corporate system. They needed an independent researcher to sign a document stating that all the records had been destroyed.

The team started with the standard procedure: moving all new records to new drives, wiping the data from the old ones, then physically destroying them. It was a routine job, waiting for my signature upon completion.

In a talk with the company CEO, I understood that it was of top importance for the data to be destroyed. It was probably a legal requirement or something similar. I did not dig too deep, but based on the amount they were ready to pay, the deadline and the nervous faces, it was clear they wanted that data gone.

Once the job was done, he looked over the list of completed tasks with a happy face. "Can we finalize the documentation?" he asked. Instead, I asked him to give me the names of two of his suppliers and the net turnover with them for the period whose data we were deleting.

Since it was a small industry, with everyone interconnected, I was able to do a complete calculation in just 3 days, reconstructing almost all of the customer records that had been deleted.
Tools used: public records of revenue and losses of each company within the industry.

There was only a single pattern that could support the turnover between the 3 companies within the industry, starting with the supply chain. Not only was I able to reconstruct the suppliers, I could also reconstruct the level of trade between the customers. He was stunned.

It took only two customers / suppliers, with the breakdown of trades. Without those, it would have been impossible to perform such an analysis. But with those 2 records it was relatively easy (OK, about 3 sleepless nights).
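To illustrate why so few records are enough, here is a minimal toy sketch in Python. It is not the analysis performed for the customer. The three companies, the figures in `SALES` and `PURCHASES`, the leaked records in `LEAKED` and the helper `consistent_trades` are all hypothetical. It enumerates every whole-number trade pattern that matches the public per-company totals, then shows how the candidates collapse once two trade records are known.

```python
from itertools import product


def consistent_trades(sales, purchases, known=None):
    """Enumerate trade patterns that match the public per-company totals.

    sales[i] is the total company i sold, purchases[j] the total company j
    bought. A pattern maps (seller, buyer) to a whole-number amount.
    `known` holds leaked records that every candidate must reproduce.
    """
    n = len(sales)
    known = known or {}
    # A company does not trade with itself, so skip the diagonal.
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    matches = []
    for values in product(range(max(sales) + 1), repeat=len(cells)):
        trades = dict(zip(cells, values))
        if any(trades[cell] != amount for cell, amount in known.items()):
            continue
        sold = [sum(trades[(i, j)] for j in range(n) if j != i) for i in range(n)]
        bought = [sum(trades[(i, j)] for i in range(n) if i != j) for j in range(n)]
        if sold == list(sales) and bought == list(purchases):
            matches.append(trades)
    return matches


# Hypothetical public totals for three companies (arbitrary units).
SALES = [5, 4, 3]
PURCHASES = [4, 3, 5]
# Hypothetical leaked records: company 0 sold 2 to company 1,
# and company 2 sold 1 to company 1.
LEAKED = {(0, 1): 2, (2, 1): 1}

all_candidates = consistent_trades(SALES, PURCHASES)
after_leak = consistent_trades(SALES, PURCHASES, LEAKED)
print(len(all_candidates), "patterns fit the public totals")
print(len(after_leak), "pattern(s) remain once the leaked records are applied")
```

In this toy case the totals alone leave several candidate patterns, and the leaked records single out one of them. A real industry has far more companies and noisier figures, so brute-force enumeration would have to give way to something smarter, but the principle is the same.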

Now let's get back to applied Bayesian theory. This is the diagram I presented to the company's DevOps team.

When 100% is not really that high.

If only 2 customer records are left somewhere (at the accountant's office, on an employee's USB flash drive, in a personal email), no wiping will save you from full reconstruction.
Eventually we did solve the problem, but the project was an extensive and lengthy one.

The point, in summary: if you are into data retention policies, plan in time, when the records get stored, not once they need to be deleted.
With every single piece of data left behind, the probability of an accurate reconstruction rises exponentially. 100% could easily become as low as 1%. Bayes has a lot more to offer in information security.
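As a rough illustration of that effect, the sketch below applies Bayes' rule to a deliberately simplified model. It is an assumption for illustration only, not the model behind the diagram: there are `n_candidates` equally likely trade patterns, the true pattern always agrees with a leaked record, and each wrong pattern agrees with a given record only with probability `match_prob`. The function name and all numbers are hypothetical.

```python
def posterior_true_pattern(n_candidates, match_prob, n_records):
    """Posterior probability of the true pattern after n_records leaked records.

    Assumes a uniform prior over n_candidates patterns, likelihood 1 for the
    true pattern, and likelihood match_prob per record for each wrong pattern.
    """
    prior = 1.0 / n_candidates
    true_weight = prior  # the true pattern explains every record
    wrong_weight = (n_candidates - 1) * prior * match_prob ** n_records
    return true_weight / (true_weight + wrong_weight)


# Hypothetical numbers: 1000 candidate patterns, and a wrong pattern has a
# 10% chance of agreeing with any single leaked record.
for records in range(6):
    p = posterior_true_pattern(1000, 0.1, records)
    print(f"{records} leaked record(s): P(true pattern) = {p:.4f}")
```

Under these assumptions the combined weight of the wrong patterns shrinks by a factor of `match_prob` with every additional record, which is why a handful of leftovers can move the attacker from a blind guess to near certainty.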

This also opens up another question (for a whole new topic). Assuming you possess some level of information about the trades of multiple entities within one country, can you reconstruct the whole trading scenario based only on yearly financial reports of revenue and loss? Can you build a tree of who is buying from whom, and in what amount? The answer is yes. The data is most accurate when analyzed at the end of the fiscal year. Afterwards there is a correction factor, and accuracy declines until the end of the next fiscal year, when it peaks again. Can this be used as a tool for preventing financial crime? Yes, a very powerful one. At the end of the combination and ranking process, there is only a single pattern that best describes the inter-trading with the highest probability. And thanks to mid-year financial reports, and those that are filed late, the correction factor gets really close to completing the puzzle very accurately.

How does this affect the GDPR? More importantly, does the GDPR address the issues it is supposed to address, or will there be workarounds? Keeping in mind that cloud providers possess serious amounts of data that could be cross-referenced, can the GDPR protect anything, or nothing at all?

Stefan Ćertić