How does Power BI manage memory and data compression?
The remarkable efficiency of Power BI in handling and analyzing data at impressive speeds can be attributed to its robust in-memory analysis engine, Vertipaq. At the core of this engine is a technology that operates within Analysis Services specifically designed for tabular models. It’s this innovative engine that empowers Power BI to deliver exceptional performance in data management and analysis.
The Vertipaq engine employs a memory-optimized ColumnStore index, a feature that it shares with the SQL Server engine. This indexing mechanism takes the form of non-clustered column indexes, a proven technology recognized for its efficiency in handling large datasets. The implementation of this ColumnStore index in Power BI ensures that the tool can rapidly process and analyze data, providing users with a seamless and responsive experience.
When you’re actively using Power BI Desktop, it’s essential to understand that the Analysis Services engine, with its Vertipaq in-memory analysis capabilities, operates quietly in the background. This engine is the anonymous hero that processes, optimizes, and manages the data, ensuring that our Power BI reports and visualizations are not only visually compelling but also backed by a powerful analytical engine that operates efficiently.
In essence, the synergy between Power BI and the Vertipaq in-memory analysis engine stresses the commitment to delivering a high-performance analytics platform. The integration of such cutting-edge technologies not only sets Power BI apart in terms of speed and efficiency but also highlights the dedication to providing users with a tool that seamlessly combines advanced analytics with user-friendly design.
How does Vertipaq work?
Let’s assume a data table. Each column consumes space depending on the data column type. An entire row could end up consuming, say, 100 bytes. So, for 100 million rows on the table, we will need 10 gigabytes of disk space.
Such a large volume of data is why traditional database technologies store their data on disk. Vertipaq uses Column-store technology combined with other performance-tuned technologies. In simple words, Vertipaq stores data column by column, and will first sort each column by its values. We will then create a mapping table for it with indexes at the beginning and end of each section. The point of this compression is that when we have a huge column, many of the values are repetitive in that column. So instead of storing that 100 million times, which consumes about 400 MB of space (4 bytes each value, multiplied by 100 million rows will equal 400 MB), it stores unique values and their start and end indexes.
This method is how the compression engine works in Vertipaq. Learning this means knowing that data will be compressed before Power BI loads it into memory. The size of the data in Power BI is probably not the same size as the data in our database or source file. We may have an Excel file that is 800 MB in size, and that data may consume only 8 MB after uploading to Power BI. Such compression obviously depends on many things, including the cardinality of the data. and the number of unique values in each column.
It is important to know that if we have a column with many unique values, then the compression of the Power BI engine suffers and the memory consumption of Power BI will be quite large. Examples are the ID of the fact tables (not the ID of the dimensions that were used in the fact table as a foreign key), and also certain timestamp columns created or updated that even sometimes contain dates and times in milliseconds.
Where is the data stored in memory?
Power BI models are always loaded into an Analysis Services engine. Even if we don’t have Analysis Services installed, it will be on our system when we use Power BI with an Import Data connection type.
To check and see that Analysis Services is actually installed, we go to Task Manager on the machine where we have opened Power BI Desktop with a data import mode connection. We will find on that machine that the Microsoft SQL Server Analysis Services engine is running.
This is where our data is stored in memory. Analysis Services keeps the file in memory. When we save and close our Power BI file, then that data will be persisted in a *.pbix file. The next time we open the file, the data will be loaded back into Analysis Services in an in-memory engine.
In conclusion, Power BI might be considered as an implementation of the engine of Analysis Services, that arrived to provide business users with powerful data management capabilities in the desktop environment.
Recent Posts
- Fabric Data Factory vs. Azure Data Factory: A Simple Comparison
- Understanding Fabric Warehouse: When to Choose It and How It Compares with Other Options in Fabric
- The Importance of Secure Cloud Architecture in Data Analytics Projects
- The Importance of Models in Machine Learning
- Business Intelligence (BI) Adoption: Causes of Low Adoption and Strategies to Improve Engagement