Any good thing taken to an extreme can turn bad, we've all lived through examples of this. Maybe you had one too many eggnogs at your company's holiday party. Or think about the cat lady living down the street, who has gone a bit cat-crazy and now lives with a hoard of felines.
Parallelism in the
Teradata Vantage SQL Engine is a very good thing, and generally more is better. Parallel units, called AMPs, are built into the foundation of the database and both own data and process it. The parallelism is highly-efficient, predictable, and is applied automatically to all work running on the platform. One basic premise of the parallelism is that all AMPs get the same amount of data and that is achieved with our automated data distribution using a hashing algorithm.
There is an exception to this all-parallelism-all-of-the-time approach. This exception occurs when a single row is requested from the database. Under specific conditions, the optimizer can choose to ignore the vast power of the configured parallelism and instead produce a query plan that only reaches out to a single AMP.
Two completely opposite means of pulling data out of the database are available to the
SQL Engine optimizer: All-AMP access or one-AMP access. So then what's the big deal about too much parallelism?
Consider a situation where a very small table doesn't have enough rows to spread across all AMPs. Envision a Promotion table that has only 10 rows when your system has 100 AMPs. 90% of your AMPs will have no Promotion rows to process, yet those empty AMPs will have to be engaged and will have to attempt to find a row when the table is scanned. The full parallelism of the platform is brought to bear, often unnecessarily so.
A new feature introduced in 2018, Teradata MAPS, offers a novel solution for storing and accessing tables that don't need all the platform's parallel horsepower. The MAPS feature can remove over-parallelization when scanning small tables by creating "sparse maps" composed of just a single or a few AMPs. Once you move a very small table to the sparse map you've cut out a lot of unneeded parallel processing whenever you read that table. By doing so, you've freed up system resources for other queries to use.
How does this work?
In the SQL Engine, a "hash map" describes the range of AMPs available for supporting and operating on a table's rows. All tables must belong to a hash map. By default, this primary hash map includes all the AMPs in the configuration and all tables belong to it. In our example above, we’d have the Promotion table distributed across the 100 AMPs, with the majority of AMPs not receiving a row.
The table's hash map not only specifies a range of AMPs, it also controls which AMP in that range a given row will belong to. When a row is inserted into the database, the hash map that the table belongs to directs the new row to one of its AMPs, based on a hashing algorithm.
By using sparse maps, Teradata customers can now distribute relatively small tables, that formerly were spread across all AMPs, to a smaller number of AMPs. The magic part is that after doing so, queries scanning those tables will get only the level of parallelism they actually require, and users, utilities, or administrators do not need to know anything about this change to properly interact with the system.
This means that our 10-row Promotion table can be moved into a single-AMP sparse map that is provided automatically. All 10 of the rows will then be located within a single data block on one AMP, and scanning that table is now a single-AMP, single I/O operation. This small efficiency can be a benefit for queries that scan the small table repetitively, but also for queries that join the small table to other larger tables.
That's clever, but why should I bother?
Here's one way that sparse maps could help you. Imagine that call center agents and sales reps frequently need to get a list of all currently-available promotions, and as a result the Promotion table is scanned hundreds of thousands of times a day. By transforming what was previously an all-AMP access step to a single-AMP access step, you can speed up query performance
and free up resources for other queries active on the system. The effort to make this change is trivial.
My grandmother got her family through the
Great Depression by being frugal and by pinching pennies. At grandma's house you always turned the light off when you walked out of a room, and you only telephoned her at night when the landline rates were cheap. Small savings do add up, and small savings are equally relevant today, whether in your home or in a large data warehouse. Think about that call center application, with sub-second, repetitive queries. The agent on the phone with a disgruntled customer is waiting for quick promotional suggestions to calm the conversation and prevent customer churn. The single-AMP access you've just created with your sparse map will support a more consistent, short response time for these queries, and will lower their resource usage.
So, while full and persistent parallelism is a key performance enabler, there are times when backing off the total power of parallelism on your SQL Engine platform can actually be the smart thing to do. When those times arise, think about utilizing sparse maps as a way to better manage your parallelism. Just like holding off on that third eggnog at your company's holiday party next year, moving your very small tables into a sparse map can be a win-win situation.
(Author):
Carrie Ballinger
Carrie Ballinger is a Vantage Certified Master with over 30 years with Teradata, currently working in performance optimizations within the engineering organization. During her tenure, she has focused extensively on workload management, MAPS, and other database technologies. Carrie is the author of more than 32 technical Orange Books, the most recent one on Native Object Store, and provides regular technical blog postings here on Teradata.com and on Teradata Community.
View all posts by Carrie Ballinger