February 1, 2023

How to reduce cloud costs by removing data older than seven years from backup files

Reduce your AWS/Azure/GCP bill


In this economy, where Big Tech companies (Meta, Amazon, Microsoft, Google) have laid off more than 50,000 employees, cloud costs are the next big expense companies are looking to cut. Before we discuss how to reduce cloud costs by removing data that is seven or more years old from backups, let's understand where the seven-year requirement comes from.

Data Retention Regulations 

Large financial institutions in the U.S. must comply with the Sarbanes-Oxley Act (SOX) (for public companies), the Gramm-Leach-Bliley Act (for financial companies), the Payment Card Industry Data Security Standard (PCI DSS) (for organizations that store, process, or transmit payment card data), SEC Rule 17a-4 (for those in the financial services industry), and local privacy regulations when operating in other countries.

Depending on the industry in which you operate, several established standards govern business data retention. Let's look at SOX, HIPAA, and PCI DSS.

SOX Retention Requirements – 7 Years 

The Sarbanes-Oxley Act of 2002 (SOX), through SEC rules adopted in 2003, requires relevant audit and review documents to be retained for seven years after the audit or review of the financial statements is concluded.

HIPAA Data Retention Requirements – 6 Years  

The Health Insurance Portability and Accountability Act (HIPAA) requires covered entities to keep HIPAA-related documents for a minimum of six years from the date the document was created or, for policies, six years from the date the policy was last in effect, whichever is later. This applies to “policies and procedures implemented to comply [with HIPAA] and records of any action, activity or assessment” (45 CFR §164.316(b)(1) and (2)), which includes HIPAA audit logs.

The Privacy Rule doesn’t expressly stipulate how long medical records should be retained. Covered entities and business associates (BAs) must refer to the state laws governing the retention of medical records.

PCI Data Retention Requirements – Variable 

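According to PCI DSS, payment card data should not be stored longer than necessary. The standard does not set a fixed retention period for the card data itself; each organization defines one based on its business, legal, and regulatory needs. The one-year minimum in PCI DSS applies to audit log history, not to payment card data.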

How is data archived?

To store old data, AWS S3 Glacier and Azure Archive Tier are popular cloud archiving solutions. It is common to store backups of databases, application logs, CSV files, and more in those archives. One popular backup format is the .bak file, which database applications such as SQL Server use to back up their databases. These backup files hold an organization's data for the last 'x' years, typically since the company was founded. So if a company was founded in 2010 and the current year is 2023, then under the seven-year SOX rule you would no longer need the data in those backups that dates from before 2016.
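The 2023 minus seven years arithmetic above can be sketched in a few lines of Python. This is only an illustration: the names RETENTION_YEARS, retention_cutoff, and is_past_retention are hypothetical, and counting the period from a record's own date is a simplifying assumption (SOX, for example, counts from the conclusion of the audit, so the real trigger date depends on the regulation).

```python
from datetime import date

# Retention periods in years, as described in the sections above.
# PCI DSS is left out because it has no single fixed period.
# (Hypothetical lookup table for illustration only.)
RETENTION_YEARS = {"SOX": 7, "HIPAA": 6}


def retention_cutoff(today: date, years: int) -> date:
    """Return the date that lies `years` years before `today`."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # `today` is Feb 29 and the target year is not a leap year.
        return today.replace(year=today.year - years, day=28)


def is_past_retention(record_date: date, today: date, regulation: str) -> bool:
    """True if the record is older than the retention period for `regulation`.

    Assumption: the period is counted from the record's own date.
    """
    return record_date < retention_cutoff(today, RETENTION_YEARS[regulation])


if __name__ == "__main__":
    today = date(2023, 2, 1)
    print(retention_cutoff(today, RETENTION_YEARS["SOX"]))
    print(is_past_retention(date(2015, 6, 30), today, "SOX"))
```

With a current date of February 1, 2023, the SOX cutoff falls on February 1, 2016, which matches the example in the paragraph above.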

These backups are dumped into an archive or offline storage tier to reduce costs. There is no good way to know which records are more than six or seven years old without looking into the data. The data may be relational database backups (possibly compressed or encrypted), JSON files, logs, or any of dozens of other formats.
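To make "looking into the data" concrete, here is a minimal Python sketch for the simplest case, a CSV export with a date column. It is not how Strac works internally. The column name created_at, the sample rows, and the function split_by_cutoff are all assumptions made for illustration, and a compressed or encrypted .bak file would first have to be restored and exported before a scan like this could run.

```python
import csv
import io
from datetime import date

# Hypothetical sample export; real data would come from a restored backup.
SAMPLE_CSV = """id,created_at,amount
1,2014-03-12,100.00
2,2016-02-01,250.50
3,2021-11-30,75.25
4,not-a-date,10.00
"""


def split_by_cutoff(csv_text: str, date_column: str, cutoff: date):
    """Split CSV rows into (retain, expired) lists using an ISO date column."""
    retain, expired = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            record_date = date.fromisoformat(row[date_column])
        except (KeyError, TypeError, ValueError):
            # Missing or unparseable date: keep the row rather than risk
            # deleting something that still has to be retained.
            retain.append(row)
            continue
        if record_date < cutoff:
            expired.append(row)
        else:
            retain.append(row)
    return retain, expired


if __name__ == "__main__":
    retain, expired = split_by_cutoff(SAMPLE_CSV, "created_at", date(2016, 2, 1))
    print("retain:", [row["id"] for row in retain])
    print("expired:", [row["id"] for row in expired])
```

Rows whose date cannot be read are kept by default, on the view that wrongly deleting a record is worse than storing it a little longer.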

How to identify which data needs to be retained from huge backups

Strac's goal is to identify any data. Along with the vast catalog of sensitive data it can discover, it can also find data in .bak files, CSV files, and JSON files. Clients can configure what kind of data they want to search for in backups, and Strac will search for data that matches those criteria. Strac will go through petabytes of data in different formats to find the data the client wants.

Once Strac identifies the data, the client knows exactly what data they can safely delete and what data they need to retain per data retention requirements. Deleting the data that no longer needs to be retained reduces storage costs dramatically for clients.

To learn more, please book a demo with us.

Founder, Strac. ex-Amazon Payments Infrastructure (Widget, API, Security) Builder for 11 years.
