The Power of SQL Applied to Excel Tables for Fast Results

If you ever needed to query Excel tables (and who hasn’t) from your VBA programs that contained thousands of rows, or join multiple tables in the process – this is the Blog post you’ve been waiting for * Write complex SQL queries over Excel tables as if they were stored in a relational database

Do SQL and Excel go together?

YES!

SQL is the most popular Databases language. It is robust, proven and works with extremely large tables stored in any common relational Database software. If you want to learn more about Databases and SQL – read my last week’s Blog post here.

Excel is often used to store data in a table-like fashion: columns with headers and many rows below the columns’ headers.

If we have Excel tables (or ranges of cells arranged like tables) and we have SQL that is designed to query tables – why not use them both for better results?

Achieve unheard-of results with SQL in your VBA programs

Did I just say better results? Well, that was a polite way of saying: new possibilities are now open for you.

Well, to be accurate, I must say that technically speaking, you can do anything you want with VBA over Excel tables. However, going about quickly aggregating a 4-table join, processing thousands or tens of thousands of records – is not something I recommend doing without SQL (yes, even if using smart data structures and programming techniques, such as arrays).

What can be achieved in a single, well crafted SQL statement, may require hundreds of lines of VBA code, and the SQL statement will most probably perform much better.

What makes SQL available in VBA

Excel inherited the Jet engine developed by Microsoft for MS-Access. The Jet is a fair Databases connectivity and SQL engine, albeit not fully compliant with the SQL ANSI standards. The Jet is no longer developed and its latest version is 4.0.

Since MS-Office 2007 (.xlsx file extension), the Ace engine replaced the Jet engine (latest version: 16.0). The main benefits of the Ace over the Jet engine, are support for MS-Office 64bit, support some complex data types, encryption and security, and SharePoint and MS-Outlook integration.

However the ACE engine is backward-compatible, some inconsistencies may arise. I always suggest to start off with the Ace engine version 16, revert down to Ace version 12, and then (if not on MS-Office 64bit) default to the Jet engine.

The Jet engine is part of the Windows installation package. If Ace isn’t, download it here: https://www.microsoft.com/en-us/download/details.aspx?id=54920 (chose either the 32bit or the 64bit version to match your MS-Office installation (I said MS-Office – not MS-Windows!).

What you need to know

Although I cover in great detail VBA, relational Databases and SQL in my online courses, I will assume here that you know SQL and that you know VBA.

The rules you need to know about using SQL over Excel tables through the Ace/Jet engines, are:

  • Data type of a column is determined by the engine based on the values in the first (8 by default) rows in that column. If you have sparsely filled out columns – explicitly fill them with something (zeros, for example, for numbers), or set the IMEX=1 property (more on that later).
  • Tables are either a range of Worksheet cells or a Worksheet.

Refer to a Worksheet (requires that the “UsedRange” property resolve to cells arranged as a table) named “Past Invoices” using this syntax:

[Past Invoices$]

Refer to a range of cells within the “Past Invoices” Worksheet like that:

[Past Invoices$B10:G700]

I always leave no room for interpretations and always specify the explicit range.

  • Column names should be enclosed in square brackets:

[Customer Name]

  • Nested joins of more than two tables must be enclosed in parenthesis:

FROM ((… JOIN … ON… ) JOIN … ON … )

  • LIMIT clause becomes the TOP qualifier in the SELECT clause:

SELECT TOP 5 …

  • GROUP BY must include ALL non-aggregated SELECT expressions. The first column is not enough.
  • ORDER BY cannot reference aliases. Use the complete column expression as defined in the SELECT clause, or specify ordinal index in the SELECT clause:

ORDER BY 2, 4

Setting up the Connection

As with any data source, we first need to establish a connection using either the Jet engine or the Ace engine.

I typically use the Ace engine and establish an early binding to the ADODB library by adding a reference to the Microsoft ActiveX Data Objects 6.1 Library (in the VBA editor, open: Tools -> References and tick this entry).

Next, we create an instance of the ADODB Connection object in a local variable and call the Open method in order to establish the connection. We need to provide the correct connection string to the Open method – more about that shortly.

Here’s a full example of a Subroutine that establishes a connection to the current Excel Workbook (“ThisWorkbook”):

Public ConnDB as ADODB.Connection

Sub ConnectExcelDB()

'Open the connection to the DB Provider, if not already open  

    Dim strProvider As String

    'strProvider = "Provider=Microsoft.Jet.OLEDB.4.0;" 'If using the old JET engine

    strProvider = "Provider=Microsoft.ACE.OLEDB.16.0;"

   

    On Error GoTo FailedConnection

   

    If (ConnDB Is Nothing) Then

        Set ConnDB = New ADODB.Connection

    End If

 

    If Not (ConnDB.State = 1) Then

    ConnDB.Open strProvider & _

    "Data Source=" & ThisWorkbook.FullName & ";" & _

        "Extended Properties=""Excel 12.0;HDR=Yes;"";"   'The JET engine only works with "Excel 8.0"

    End If

 

    Exit Sub

FailedConnection:

    MsgBox "Failed connecting to Excel as a DB. Please contact support.", vbOKOnly + vbCritical, "Database Error"

    Set ConnDB = Nothing

End Sub

 

Couple of comments on that code:

  • I typically define the connection variable (ConnDB) to be public, so that it is readily available to any function submitting SQL statements, as well as to the connecting Sub (above) and the disconnecting Sub.
  • Before opening the connection, I verify it is not already open (State = 1 means it is open).
About the Connection String

As you can see, the Connection String has a Provider part, a Data Source part, and an Extended Properties part.

The Provider part is the engine used.

The Data Source part points to the full path and file name of an accessible Workbook. In this example I’m just taking ThisWorkbook’s FullName property

The Extended Properties allows us some control over the behavior of the engine. Note how it is in itself a string enclosed within double quotes, therefore they are escaped (“”) as they are within an already constructed string (the Connection String itself).

Here are the important parameters for the Extended Properties, delimited by semicolon (;):

  • Excel file version. Excel 12.0 refers to an Excel 2007 or later. For earlier versions (used with the Jet engine) use: Excel 8.0
  • HDR specifies if the first row of the range contains column headers or not.
  • ReadOnly will open the Workbook in Read Only mode.
  • MaxScanRows allows to override the default 8 first rows in determining the columns’ data type. Specify a number between 1 and 16. Specify 0 to consider all rows in the column.
  • IMEX=1 avoids the guessing of the columns’ data types by the engine. All values will be interpreted as text (I prefer that!)

Submitting Queries

Yes, most SQL statements will work, including INSERT, UPDATE, DELETE.

However, the best dish, in my mind, is given to us in the SELECT command.

Here’s an example of constructing an SQL query that find the top 5 selling Body Care products.

We see a join between the Products table and the Order Lines table, with a SUM aggregate on the Total column in the Order Lines table.

The result is presented in the range with a top-left cell named Top_Five_Products_Anchor.

As the Ace engine does not know what is a Smart Table (ObjectList), I’m translating the tables’ range address into the proper notation valid for the Ace engine. Mainly, we need to strip off the “$” character that is reserved as the delimiter between the Worksheet name and the range itself, and enclose within square brackets.

Sub ExcelGetTopFiveProducts()

   

    Dim strTable1 As String

    Dim strTable2 As String

    Dim strSQL As String

   

    Dim rs As New ADODB.Recordset

 

    strTable1 = "[DB Data$" & Replace(ThisWorkbook.Sheets("DB Data").ListObjects("ExcelarateProductsTable").Range.Address, "$", "") & "]"

    strTable2 = "[DB Data$" & Replace(ThisWorkbook.Sheets("DB Data").ListObjects("ExcelarateOrdersLinesTable").Range.Address, "$", "") & "]"

 

    strSQL = "SELECT TOP 5 PRD.[Product Code], PRD.[Product Description], SUM(ORL.[Total])"

    strSQL = strSQL & " FROM " & strTable1 & " AS PRD"

    strSQL = strSQL & " INNER JOIN " & strTable2 & " AS ORL ON PRD.[Product Code]=ORL.[Product]"

    strSQL = strSQL & " WHERE PRD.[Product Group]='Body Care'"

    strSQL = strSQL & " GROUP BY PRD.[Product Code], PRD.[Product Description]"

    strSQL = strSQL & " ORDER BY SUM(ORL.[Total]) DESC"

   

    ConnectExcelDB

    rs.Open strSQL, ConnDB

       

    ThisWorkbook.Names("Top_Five_Products_Anchor").RefersToRange.CopyFromRecordset rs

   

    rs.Close

    DisconnectDB

End Sub

For simplification, I haven’t implemented error handling here (although we have that in the called Subs).

Disconnecting from the Database

Finally, here’s my version of the DisconnectDB Sub:

Sub DisconnectDB()

'Close the connection to the DB if open

 

    On Error GoTo FailedConnection

   

    If Not (ConnDB Is Nothing) Then

        If Not (ConnDB.State = 0) Then

            ConnDB.Close

        End If

    End If

 

    Exit Sub

FailedConnection:

    MsgBox "Failed closing the DB connection.", vbOKOnly + vbCritical, "Database Error"

    Set ConnDB = Nothing

End Sub

The above code examples are taken from my online course: Beyond Excel Boundaries with Databases and SQL: High-Preforming, Scalable Business Applications. More details in this link.

 

Become a Databases & SQL Expert Yourself!

The in-depth course, Beyond Excel Boundaries with Databases & SQL, is now open.

If you are serious about your career, or about understanding Databases, SQL and how to develop Excel VBA business-grade programs with Databases - this course is definitely for you.
Tell me More about the course
Close

50% Complete

Two Step

Once you submit your details, you'll receive an email with a confirmation link. That's it! you're subscribed!