Abstract

Normalization is a database technique for eliminating data redundancy and inconsistent dependencies. It is achieved by dividing larger tables into smaller ones and defining relationships between them. Normalization can yield performance gains, such as improved response times, but only up to a point: highly normalized databases are not performance-optimal. Optimal database performance is typically found at a sweet spot between the sizes of individual tables and the number of tables. A highly normalized database is therefore often denormalized to improve performance. Traditionally, denormalization is driven by user or developer intuition.
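As a minimal sketch of the table-splitting step, the following Python/SQLite example divides a redundant flat table into two related tables; the schema and data are hypothetical and chosen only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalized: customer name and city are duplicated on every order row.
cur.execute("""
    CREATE TABLE orders_flat (
        order_id      INTEGER PRIMARY KEY,
        customer_name TEXT,
        customer_city TEXT,
        item          TEXT
    )
""")
cur.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [(1, "Ada", "London", "keyboard"),
     (2, "Ada", "London", "mouse"),
     (3, "Lin", "Oslo",   "monitor")],
)

# Normalized: each customer is stored once; orders reference it by key.
cur.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        city        TEXT
    )
""")
cur.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        item        TEXT
    )
""")
cur.execute("""
    INSERT INTO customers (name, city)
    SELECT DISTINCT customer_name, customer_city FROM orders_flat
""")
cur.execute("""
    INSERT INTO orders (order_id, customer_id, item)
    SELECT f.order_id, c.customer_id, f.item
    FROM orders_flat f
    JOIN customers c
      ON c.name = f.customer_name AND c.city = f.customer_city
""")

# Reading the data back now requires a join -- the very cost that
# denormalization later trades away in exchange for faster reads.
for row in cur.execute("""
    SELECT o.order_id, c.name, o.item
    FROM orders o JOIN customers c USING (customer_id)
"""):
    print(row)
```

The final query illustrates the trade-off the abstract describes: normalization removes redundancy but adds join cost at read time.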

This disclosure describes a machine-learning model that optimally denormalizes a database based on factors such as typical database queries, query frequency, response times, and projected response times after denormalization. The techniques yield an optimal normal form for the database, in turn resulting in superior data integrity and performance.
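A possible shape of such a model is sketched below, assuming a simple regression over per-candidate workload features; the feature names, training data, and decision threshold are illustrative assumptions, not details from the disclosure:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical feature vector per candidate denormalization (e.g., a
# proposed merge of two frequently joined tables):
# [queries_per_hour, avg_joins_per_query, current_response_ms]
X = np.array([
    [1200.0, 3.0, 45.0],
    [ 300.0, 1.0, 12.0],
    [5000.0, 4.0, 80.0],
    [ 150.0, 2.0, 20.0],
])
# Response times (ms) observed after applying each candidate in past
# workloads; the model learns to project these for new candidates.
y = np.array([20.0, 11.5, 30.0, 15.0])

model = LinearRegression().fit(X, y)

# Rank new candidates: apply a denormalization only where the model
# projects a clear response-time improvement over the current layout.
candidates = np.array([[4000.0, 5.0, 90.0], [200.0, 1.0, 10.0]])
for feats, projected_ms in zip(candidates, model.predict(candidates)):
    current_ms = feats[2]
    verdict = "denormalize" if projected_ms < 0.8 * current_ms else "keep"
    print(f"current={current_ms:.1f} ms, "
          f"projected={projected_ms:.1f} ms -> {verdict}")
```

In this sketch, only candidates whose projected response time beats the current one by a margin are denormalized, keeping the rest of the schema in its normalized form.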

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
