Feb 06, 2020

As I was updating myself with Excel questions in Quora this morning, considering what to post in my Quora Excel VBA Expert Space, I came across this question: *What is the most useful Excel function that no one ever uses?* **I needed less than a second to shout out: SUMPRODUCT!**

The original, formal purpose of the SUMPRODUCT function is to combine multiplication of values in columns of each row in a range (table), with summing all row-level results into a single total. This is, in effect, the SUM of PRODUCTS of those columns in the table, hence the name of the function: SUMPRODUCT.

The following example illustrates SUMPRODUCT in its basic form:

What is happening in the background, is lining up all columns involved in memory, one next to the other, and looping through all rows of these columns (that must match in length) while calculating the product of each row-set and accumulating the results. This is how it looks like:

Unlike more standard functions you’re used to, SUMPRODUCT actually builds a whole table (or array) in memory, in order to implement its calculation. You can perform similar calculations with Excel Array Functions (also known as CSE functions, representing the Control+Shift+Enter keys that commits the formula to the cell). SUMPRODUCT is one of those functions that implicitly work as an array function, without you having to do anything about it – it’s by design of the SUMPRODUCT function.

Let’s combine a couple of observations:

- Why line-up only table columns for the multiplication? How about adding some helper-columns (that is, “Columns” not really part of the table) in-memory, for the SUMPRODUCT process to consider?
- Multiplying X by 1 result to X. Multiplying X by 0 results to 0.
- Boolean data type values (True/False) are implicitly converted to numbers (1 or 0, respectively) by Excel when in the context of a mathematical operation (Recall that in VBA True is minus 1, not just 1).
- Two minus operations result in a plus operation (e.g. 2 – (-3) = 5).

If we can now add another column to the process, holding 0’s and 1’s, in effect we can filter out some of the rows in our original table from the calculation. Let’s consider the following example:

We now sum-product only those courses with score registered in 2019. Algebra is not counted.

Here’s the arrangement in memory for processing this function:

Note we pass only one argument to the SUMPRODUCT function here, as opposed to the first example in which we passed two arguments (two columns). Arguments are separated by commas, of course. Our single argument, however, explicitly specifies the multiplication operation between all columns involved. I find this to be more consistent and robust when working with Boolean filters, as opposed to separating arguments for SUMPRODUCT with commas.

Also, note the double dashes preceding the Boolean (0 or 1) helper column. These are actually two subtraction operators, forcing the conversion of True to 1 and False to 0, without changing the value itself (--1 remains 1, --0 remains 0). However the multiplication operation should be enough to drive the conversion from Boolean into mathematical values, again, I find explicit conversion of each variable to produce consistent results in all circumstances.

Taking this one step further, SUMPRODUCT can be used to run aggregation queries on a table (or even more than one table), as you would use with a Database table using the SQL language. Well, you should not expect the full breadth of SQL to be implemented with SUMPRODUCT, but as you will see, surprising things can be achieved.

Consider the following use case: your company maintains all deals managed by all sales reps in a pipeline table. You would like to have a summary table showing the total of all closed deals, by sales rep, within a specified number of days since started.

The following presents one way of achieving this, using the SUMPRODUCT function:

Here's the full function code:

`=SUMPRODUCT(DealsTable[Deal Value]*--([@[Sales Rep]]=DealsTable[Sales Rep])*--(DealsTable[Pipeline]="Closed")*--((TODAY()-DealsTable[Start Date])<Days_Filter))`

That’s it! One function calculates and filters in real-time, where otherwise you’d need some Database functions and helper criteria regions to maintain.

Here’s one SUMPRODUCT implementation used in one of my projects delivered to a customer in Vancouver, Canada:

*=SUMPRODUCT(--(DataTable[Sales Stage] = INDEX(StagesTable[Stage],1))*--(DataTable[PM Name] = [@[PM Name]])*--(YEAR(DataTable[Start Date]) >= Pipeline_Filter_Year_Min)*--(YEAR(DataTable[Start Date]) <= Pipeline_Filter_Year_Max)*--(MONTH(DataTable[Start Date])=Selected_Month_Open_Opps_Filter)*--(MONTH(DataTable[Start Date]) >= Selected_Quarter_Open_Opps_Filter)*--(MONTH(DataTable[Start Date])<= Selected_Quarter_Open_Opps_Filter+2))+SUMPRODUCT(--(DataTable[Sales Stage] = INDEX(StagesTable[Stage],1))*--(DataTable[PM Name] = [@[PM Name]])*--(YEAR(DataTable[Start Date]) >= Pipeline_Filter_Year_Min)*--(YEAR(DataTable[Start Date]) <= Pipeline_Filter_Year_Max)*--(MONTH(DataTable[Start Date]) >= Selected_Quarter_Open_Opps_Filter)*--(MONTH(DataTable[Start Date])<= Selected_Quarter_Open_Opps_Filter+2)*--(Selected_Month_Open_Opps_Filter=0))+SUMPRODUCT(--(DataTable[Sales Stage] = INDEX(StagesTable[Stage],1))*--(DataTable[PM Name] = [@[PM Name]])*--(YEAR(DataTable[Start Date]) >= Pipeline_Filter_Year_Min)*--(YEAR(DataTable[Start Date]) <= Pipeline_Filter_Year_Max)*--(MONTH(DataTable[Start Date])=Selected_Month_Open_Opps_Filter)*--(Selected_Quarter_Open_Opps_Filter=99))+SUMPRODUCT(--(DataTable[Sales Stage] = INDEX(StagesTable[Stage],1))*--(DataTable[PM Name] = [@[PM Name]])*--(YEAR(DataTable[Start Date]) >= Pipeline_Filter_Year_Min)*--(YEAR(DataTable[Start Date]) <= Pipeline_Filter_Year_Max)*--(Selected_Quarter_Open_Opps_Filter=99)*--(Selected_Month_Open_Opps_Filter=0))*

Now, ignite your imagination…

The in-depth course, * Beyond Excel Boundaries with Databases & SQL*,

50% Complete

Once you submit your details, you'll receive an email with a confirmation link. That's it! you're subscribed!