Data Change Audit (a Revised Approach), Winning on Performance, and XS Tool Suite
“You can customize existing templates, saving them under new names that ‘become your own’. Templates embody a lot of experience to be shared and are one of the core features of XS Tool Suite.”
Today I’m announcing the availability of the latest version of SQL-Hero – err – “XS Tool Suite.” This is the 2018 Volume 1 release, with a Volume 2 release likely later this year, depending on demand. (We certainly have a healthy backlog of features.) This release has several enhancements, one of which improves the delivery of templates to end users, via a centralized service maintained on www.xskrape.com. In doing so, we’ve included some new and improved templates that are undoubtedly of interest to SQL developers. A goal here is to show how these templates can offer solutions to problems, which you can apply in your own situations with just a few keystrokes. You get the benefit of our tested experience, and I’ll show how in the case of the “audit template,” we can beat the performance of common alternatives in an empirical way (including Microsoft’s own solution of temporal tables).
Let’s start by covering some of the changes in XS Tool Suite, since the remainder of the article relies on some of these changes, either directly or indirectly. The most obvious change is the name: from SQL-Hero to XS Tool Suite. Why this change? To more accurately reflect the breadth of the product: it’s always had a heavy emphasis on SQL, yes, but the intention is to support creating solutions that go well beyond “just SQL.” One of the things to watch for in the next release is the inclusion of a Solutions Gallery, which is intended to support sharing end-to-end problem solutions – mostly in the data space, but really anything that has artifacts which can be dynamically created from metadata. More on this when I discuss templates below.
Other changes include a new “mode” on the “Execute In” tool that offers intelligent suggestions, partly driven by the source of your request – usually an editor window associated with a specific database alias. (For those not familiar, XS Tool Suite lets you set up connections to multiple databases across potentially many servers, using short aliases to identify connections uniquely at an enterprise level.) For example, if you have aliases called “FooApp-Development”, “FooApp-QA” and “FooApp-Production”, and you’re sitting on a window associated with FooApp-QA and perform an “Execute In”, the new suggestion mode sees “FooApp-Development” and “FooApp-Production” as likely targets based on “FooApp” in the name. It can promote these options near the top of the list – versus, say, an alphabetical list where you’re suddenly forced to “hunt” for them – a big benefit if you deal in dozens (or hundreds!) of databases. The tool also uses frequency and recency of use, as illustrated here:
Another feature that’s received some new attention is the “Editor Sorting” tool. Ordering is something that helps keep active work together, with less recent work sliding off to the right. More options have been added:
We’ve also added more “recently used” lists, such as alias pairs for compares:
We could go on and on about features – in fact, one thing we’ve added on xskrape.com is a browsable list of our spotlight features, which you can use to find something specific or just get a general sense of what you can accomplish by using the desktop tool. (Caveat: this list is not truly comprehensive – it’s more of a sample of features, with more to come.)
It’s fair to ask, “How are templates different from SQL Server Management Studio snippets?” In this case, it’s like comparing a Pinto to a Lamborghini! XS Tool Suite templates can include embedded logic including conditionals, iteration, scripting, and more – SSMS snippets have no such ability. XS templates are naturally aware of context (where you’re invoking them), have specific source object type awareness (can iterate tables, views, procedures, or other object types, even non-SQL ones), and can have rich interactive parameters (required vs. optional, Booleans implemented using checkboxes, lists sourced from things like the columns of your source table or view, etc.).
The output of what’s generated by templates can certainly be SQL – that’s common. However, we also have plenty of examples of generating other types of output, including C#. The latest release includes two new templates that generate C# entities to support another project we have available for public use, namely the open-source CodexMicroORM project on GitHub, an Entity Framework competitor that shines for its performance, parallel-friendliness, functionality and simplicity. We’ll be adding more templates, making them available to everyone instantly when available. What’ll be next? Generating SSIS packages from source files? Generating PowerShell from a combination of relational and web data? The possibilities are endless.
The latest release also offers a new, easier way to run templates. You can now create .shcmd files using a simple checkbox on template parameter screens:
When checked, you can specify the name of a .shcmd file to create, along with some settings:
With this file created, simply double-click it in Windows Explorer. In this example, we can see it’s launched the shcommand.exe command-line tool and overwritten a C# file as its output:
Templates can also be invoked within the desktop tool, after selecting one or more objects matching their source object type. For example, this “Merge view contents to a table” template is associated with views, so it’s available when we have a view selected:
How did we associate it with views? You can use the template editor within the desktop app as one way:
(Another way would be to directly edit the template using a text editor – it’s an XML file ultimately.) Above we can see the basic structure of the template language – use of markup tags is common, and virtually every place a markup parameter can be provided, parameters themselves can contain markup as well, leading to a fully-recursive language that opens the door to interesting solutions. Another sub-feature of templates is the ability to set and read user-defined properties (aka UDP’s) on many types of objects. UDP’s can either be sourced from extended properties or the XS Tool Suite central repository database (a free option, when you install the product). UDP’s offer a way to customize code generation based on settings. Templates can advertise what UDP’s they “need” as well, and UDP’s can be typed: pick lists, checkboxes for Booleans, etc. are supported:
The latest release makes the entire template engine free, with the setting of UDP’s (using the screen shown) being a premium feature. For a more complete list of premium features and an explanation of what that means, visit xskrape.com.
Diving much deeper into the template language is out-of-scope for this article, but I wanted to emphasize that this template engine has a lot of capability that other comparable tools either do not have or have limitations we’re trying to explicitly overcome. As scenarios present themselves, we’re evolving the language, and I’ll cover more details in the future (stay tuned by signing up for free notifications!).
Our earliest version of SQL-Hero (as it was called at the time) had a template called “CodexFramework – Audit and Procedures.” This template was tied to tables as its data source, where you could generate supporting SQL that would create (or update) stored procedures for table CRUD access (create-read-update-delete), add an optional history table, and add triggers to support moving data from your selected tables to their corresponding history tables. The intention was to let you model your base tables without limitations, and for tables that you deemed “audit-worthy,” you could declaratively apply some UDP settings and have the necessary support code generated to maintain a history table with a structure very similar to the base table, but containing time-windowed versions of the base data (more on this shortly).
Let’s take an example. Say you have a table like this:
CREATE TABLE dbo.Foo
(FooID int NOT NULL IDENTITY PRIMARY KEY,
FooBirthdate date NULL,
FooName varchar(100) NOT NULL)
Let’s say that it’s a requirement to track changes against this table, so if the name or birthdate changes, we want to know who did it and when they did it. There are multiple ways we could approach this including using:
- Change Data Capture (CDC) (SQL 2008+)
- Temporal Tables (SQL 2016+)
- Custom tracking using triggers
- Custom tracking using queries (instead of triggers)
- Application logic
Some factors that might influence a final choice include what the desired end-format looks like. For example, do you want a replica of the base table’s schema with time windowing added, or a pivot to name/value pairs of what has changed? Do you want to track who changed records based on an application-assigned field, or is using SQL’s SUSER_SNAME() sufficient in every case? Custom tracking using triggers was an obvious choice before CDC or temporal tables were even an option – such as when using SQL 2005. (That was the case when the “CodexFramework – Audit and Procedures” template was created!) Other considerations include simplicity and performance. It’s worth noting that I’ve written an implementation for several of these options and put the working SQL on GitHub – more on this later.
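For concreteness, here’s a minimal sketch of the temporal table option applied to a Foo-style table – the FooTemporal and FooTemporalHistory names are purely illustrative, not generated by any template:

-- Requires SQL 2016+; a minimal system-versioned (temporal) table sketch
CREATE TABLE dbo.FooTemporal
(FooID int NOT NULL IDENTITY PRIMARY KEY,
FooBirthdate date NULL,
FooName varchar(100) NOT NULL,
SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime))
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.FooTemporalHistory))

Note how the SysStartTime and SysEndTime period columns must live in the base table itself – a point I’ll come back to when comparing approaches.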
Your concept of an “ideal format” might differ from mine, but let me explain mine first. I’ve seen plenty of value in having history available on tables where history makes sense – that’s not every table, admittedly. For instance, an event table that’s only ever inserted to (or perhaps updated in limited cases, such as assigning cancelled attributes) is, in and of itself, historical. Reference tables and many other transactional tables can certainly benefit from knowing who made changes and when. Massive tables might be interesting to audit, but sometimes performance becomes a practical limitation. In a lot of applications I’ve dealt with, this equates to using LastUpdatedBy (varchar) and LastUpdatedDate (datetime) fields (the names are less important than the intended function). LastUpdatedBy could be a system key identifying a unique user – but a lot of systems prefer to relax this as a foreign key and go with a text field instead. (It’s also useful if some changes are made outside of the normal application flow – e.g. “bsmith_datafix_tfs1234” could reflect Bob Smith as a DBA doing something tracked by a TFS work item number, recorded in LastUpdatedBy.)
An additional field that the template will automatically add to your base table (if you let it) is “IsDeleted” (bit). This is used for a logical deletion that in turn triggers physical deletion in the base table. Why do this? So we can track who actually deleted a record: LastUpdatedBy is assigned and the IsDeleted flag set when deleting something, which gives us visibility to the application-driven username in the LastUpdatedBy field. This is another reason that CRUD stored procedures are generated: for tables with this type of audit, the delete stored procedure (up_tablename_d) is implemented using an UPDATE, setting IsDeleted (a sketch follows the revised table definition below). Again: this is determined based on declarative settings, so you could change your mind about how a given table is handled (e.g. using a physical DELETE if no history) and it’s all transparent to an application that’s simply calling procedures depending on the desire to insert, update or delete data. The generated CRUD procedures also support optimistic concurrency checking, based on LastUpdatedDate, as another “feature.”
In our Foo table example, this changes things a bit:
CREATE TABLE dbo.Foo
(FooID int NOT NULL IDENTITY PRIMARY KEY,
FooBirthdate date NULL,
FooName varchar(100) NOT NULL,
LastUpdatedBy varchar(50) NOT NULL DEFAULT (SUSER_SNAME()),
LastUpdatedDate datetime NOT NULL DEFAULT (GETUTCDATE()),
IsDeleted bit NOT NULL DEFAULT (0))
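As promised above, here’s a hedged sketch of what a generated delete procedure could look like for this table – illustrative only, since the template’s actual output differs in its details (up_Foo_d follows the up_tablename_d convention mentioned earlier):

CREATE PROCEDURE dbo.up_Foo_d
 @FooID int,
 @LastUpdatedBy varchar(50),
 @LastUpdatedDate datetime
AS
BEGIN
 SET NOCOUNT ON;

 -- Logical delete: set IsDeleted and let the audit trigger perform the physical delete
 UPDATE dbo.Foo
 SET IsDeleted = 1, LastUpdatedBy = @LastUpdatedBy, LastUpdatedDate = GETUTCDATE()
 WHERE FooID = @FooID
 AND LastUpdatedDate = @LastUpdatedDate;  -- optimistic concurrency check

 IF @@ROWCOUNT = 0
  RAISERROR('Record not found or concurrency violation.', 16, 1);
END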
Having a history table that mirrors the base table makes for easy automated construction – that’s what the Audit template is doing. In fact, the template can generate SQL that looks like this:
CREATE TABLE [History].[Foo] (
HistID bigint IDENTITY
, LastUpdatedBy varchar(50) NOT NULL
, LastUpdatedDate datetime NOT NULL
, IsDeleted bit NOT NULL
, RowExpiryDate datetime NOT NULL DEFAULT CONVERT(datetime, '31-Dec-9999')
, FooID int NOT NULL
, FooBirthdate date NULL
, FooName varchar(100) NOT NULL
)
(The easy way to do this in the editor? Select the table in the object list, right-click to get the context menu, go to Templates, find the Audit template, pick it, enter parameters and click OK – voila – generated SQL shows up in a new editor window!)
The HistID and RowExpiryDate fields above are history-table-specific. HistID is a machine key, and RowExpiryDate offers a way to identify the point-in-time state of the data. In the early versions of the template, if you ran these 3 statements:
INSERT dbo.Foo (FooName, FooBirthdate, LastUpdatedBy, LastUpdatedDate) VALUES ('Bobby Tables', '1/1/2000', 'inserter_guy', GETUTCDATE())
GO
UPDATE dbo.Foo SET FooName = 'Robert Tables', LastUpdatedBy = 'updater_guy', LastUpdatedDate = GETUTCDATE() WHERE FooID = 1
GO
UPDATE dbo.Foo SET IsDeleted = 1, LastUpdatedBy = 'deleter_guy', LastUpdatedDate = GETUTCDATE() WHERE FooID = 1
GO
… then your base table, dbo.Foo, would contain zero records after the IsDeleted flag is set – a physical delete goes along with that update. However, in the history table, History.Foo, you’d see three records:
HistID | LastUpdatedBy | LastUpdatedDate | IsDeleted | RowExpiryDate | FooID | FooBirthdate | FooName
------ | ------------- | ------------------------ | --------- | ------------------------ | ----- | -------------------- | --------------
1 | inserter_guy | 8/18/2018 8:50:29.087 PM | False | 8/18/2018 8:50:29.430 PM | 1 | 1/1/2000 12:00:00 AM | Bobby Tables
2 | updater_guy | 8/18/2018 8:50:29.430 PM | False | 8/18/2018 8:50:29.790 PM | 1 | 1/1/2000 12:00:00 AM | Robert Tables
3 | deleter_guy | 8/18/2018 8:50:29.790 PM | True | 12/31/9999 12:00:00 AM | 1 | 1/1/2000 12:00:00 AM | Robert Tables
Notice how LastUpdatedDate and RowExpiryDate box off the validity of each record version: a version is in effect from its LastUpdatedDate (inclusive) up to, but not including, its RowExpiryDate. This means if we ran a query like this:
SELECT *
FROM History.Foo f
WHERE '8/18/2018 20:50:29.450' >= f.LastUpdatedDate
AND '8/18/2018 20:50:29.450' < f.RowExpiryDate
… we’d get back the second row above, having identified the full state of the row at an exact point-in-time. (It’s also worth noting I cut out a bunch of supporting SQL from the template generation process: it includes indexing that makes for more efficient access of history data, as used both by user queries like the above and the triggers themselves, which are well optimized as I’ll demonstrate shortly.)
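To give a flavor of that supporting SQL, here’s a hedged, heavily simplified sketch of the kind of index and trigger involved – the template’s actual generated code is more complete and better optimized:

-- One plausible supporting index (the template's actual indexing differs)
CREATE INDEX IX_History_Foo_Window ON History.Foo (FooID, RowExpiryDate)
GO
-- A simplified audit trigger: multi-row safe, but omitting many details
CREATE TRIGGER dbo.tr_Foo_Audit ON dbo.Foo
AFTER INSERT, UPDATE
AS
BEGIN
 SET NOCOUNT ON;

 -- Expire the prior "current" history version for each affected key
 UPDATE h
 SET RowExpiryDate = i.LastUpdatedDate
 FROM History.Foo h
 JOIN inserted i ON i.FooID = h.FooID
 WHERE h.RowExpiryDate = CONVERT(datetime, '31-Dec-9999');

 -- Record the new version (RowExpiryDate defaults to '31-Dec-9999')
 INSERT History.Foo (LastUpdatedBy, LastUpdatedDate, IsDeleted, FooID, FooBirthdate, FooName)
 SELECT LastUpdatedBy, LastUpdatedDate, IsDeleted, FooID, FooBirthdate, FooName
 FROM inserted;

 -- A logical delete in the base table becomes a physical delete
 DELETE f
 FROM dbo.Foo f
 JOIN inserted i ON i.FooID = f.FooID
 WHERE i.IsDeleted = 1;
END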
Let me address the question of “why not track just fields of interest, why track entire tables?” There are four main reasons in my experience:
a) Users can be fickle: today they might want to track just FooName – but tomorrow they realize FooBirthdate would have been good to track too. If we never tracked FooBirthdate before in a name/value situation, we’d have no way to add this tracking retroactively. If we tracked the complete table, we could handle it retroactively: we’re really talking about a presentation format over base history data. We’ve got examples of how you can effectively pivot any fields of choice into a name/value type format (see the sketch after this list).
b) It’s easy to generate and keep in-sync using templates and automation (e.g. creating a .shcmd file).
c) It’s hard to get back to a point-in-time view from a name/value model.
d) It becomes possible to “selectively restore” data easily from history tables. For example, I’ve been asked multiple times to recover deleted records, and doing so when you have a replica in the same format as the original makes that easy.
That said, there’s no real reason you can’t track a subset of columns by customizing generated code (we already exclude computed columns).
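As promised in point (a), here’s a hedged sketch of pivoting full-row history into a name/value-style format for one field of interest – illustrative only, since the change tracking views the templates generate are more general:

SELECT cur.FooID,
 cur.LastUpdatedDate AS ChangedDate,
 cur.LastUpdatedBy AS ChangedBy,
 'FooName' AS FieldName,
 prev.FooName AS OldValue,
 cur.FooName AS NewValue
FROM History.Foo cur
LEFT JOIN History.Foo prev
 ON prev.FooID = cur.FooID
 AND prev.RowExpiryDate = cur.LastUpdatedDate  -- versions chain on the time window
WHERE prev.HistID IS NULL OR prev.FooName <> cur.FooName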
Some details about the generated code, such as “Why does it use a History schema? Can I do something different?”, are slightly out-of-scope for this article, but the template does offer a number of settings, including UDP’s that can be set at the database level, controlling whether you use a schema to house history tables, or even use a separate database. You can customize existing templates, too, saving them under new names that “become your own.” Templates embody a lot of experience to be shared and are one of the core features of XS Tool Suite.
With the latest XS Tool Suite release, a new audit template variation is available: “CodexFramework – Audit and Procedures – Light.” This is considered “light” because it emits one fewer trigger per table and organizes data in history slightly differently: it neither maintains the latest version in history, nor maintains a persisted RowExpiryDate. This makes querying historical data more complex, but in practice that’s more of an edge case, so it’s a fair trade for improved performance on DML operations. The template does generate a view that presents history in a format identical to the older way.
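As a sketch of the idea (not the template’s actual output), such a view can derive RowExpiryDate on the fly using LEAD() – with the simplification that the real generated view would also fold in the current row from the base table, since the “light” layout doesn’t keep the latest version in history:

-- Requires SQL 2012+ for LEAD(); view name is illustrative
CREATE VIEW History.FooHistoryView
AS
SELECT HistID, LastUpdatedBy, LastUpdatedDate, IsDeleted,
 ISNULL(LEAD(LastUpdatedDate) OVER (PARTITION BY FooID ORDER BY LastUpdatedDate),
  CONVERT(datetime, '31-Dec-9999')) AS RowExpiryDate,
 FooID, FooBirthdate, FooName
FROM History.Foo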
In general, I really like the newer template and see it as an evolution that’s tough to beat by other options.
Having looked in depth at a template that requires “hooks” like triggers to provide live tracking of changes using history tables, there’s another possibility: what if you want to track changes on a table whose schema you either don’t want to, or cannot, change – including the creation of triggers or enabling of temporal tables? This could happen if you have a third-party product with tables where business users have asked to track changes (e.g. auditing rate changes), but since it’s part of an application you did not write, you can’t start adding audit fields, etc. (As you might have guessed, I’ve had this exact request as part of an effort to get a handle on users making changes, with requests for reports showing who did what and when.)
If the requirements allow you to get away with non-real-time tracking – maybe detecting changes within 10-minute intervals is acceptable, for example – then it’s quite easy to construct a chunk of SQL that inspects a table and compares it against a history table, to detect and persist changes. In fact, the structure of such a history table can be almost identical to what I described for the SQL Audit Template scenario: we maintain a time window with a start and end date, with changes appearing as new rows in the table, including the “flagging” of deletion. Whether we can determine who changed a record is wholly dependent on whether the source system maintains a “last updated by” field.
The template that’s available to support this is called “Track Table Changes via Procedure Call (low overhead)” (under the “Patterns and Solutions” category). Selecting a table and running the template brings up its parameter screen:
You must select one or more natural key columns (defaulted based on any existing primary key); you can also adjust naming conventions, identify “record changed date” and “changed by” fields (if they exist), and even optionally specify which fields you’re interested in getting change tracking views for. (Change tracking views in this case pivot history data into “old / new value” columns, for fields of interest.) The generated history table to support this is updated only when you run the generated procedure, and then only if there are material changes detected when comparing the base table with the latest history version. You can schedule the running of this procedure at an interval you’re comfortable with; if multiple changes occur within your scheduled interval, you’ll only capture the latest version. The logic to achieve all this has a few wrinkles, so this is a good example of how you can benefit from our experience to get a working solution in minutes!
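As a taste of what’s involved, here’s a simplified sketch of detecting material changes by comparing a base table against the latest history version – names reuse the Foo example and are illustrative, and the generated procedure does considerably more (expiring prior versions, flagging deletions, etc.):

INSERT History.Foo (LastUpdatedBy, LastUpdatedDate, IsDeleted, FooID, FooBirthdate, FooName)
SELECT SUSER_SNAME(), GETUTCDATE(), 0, c.FooID, c.FooBirthdate, c.FooName
FROM (
 SELECT FooID, FooBirthdate, FooName FROM dbo.Foo
 EXCEPT  -- null-safe comparison: only materially changed (or new) rows survive
 SELECT FooID, FooBirthdate, FooName FROM History.Foo
 WHERE RowExpiryDate = CONVERT(datetime, '31-Dec-9999')
) c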
If you need transactional tracking, that can be a good case for using CDC instead in these third-party situations.
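For reference, enabling CDC on our example table amounts to two calls (SQL Server Agent must be running, and the appropriate sysadmin / db_owner rights apply):

EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
 @source_schema = N'dbo',
 @source_name = N'Foo',
 @role_name = NULL;  -- NULL means access to change data is not gated by a role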
What about other native alternatives, such as CDC and temporal tables? Some of the major differences compared to our trigger-based solution include:
- Temporal table data is somewhat hard to “get at” – a minor consideration, but be aware of special keyword use (e.g. FOR SYSTEM_TIME; see the example after this list).
- There’s quite a bit of infrastructure behind CDC, as described here. CDC and temporal tables attempt to hide their implementation as black boxes, whereas the trigger-based approach is something you can customize “in the open” – for example, adding indexes on specific columns of history tables or using partitioning, where it makes sense. An extension of this is that the trigger solution imposes fewer restrictions – e.g. with CDC you can’t TRUNCATE tracked tables.
- Temporal tables force some “extended audit” fields into your base table. For example, your “row expiry date” needs to be included, despite the fact it’s only a necessary artifact for historical data. There are also considerations about field use – for example, you must rely on the system assigning the “last updated date,” versus controlling it yourself, which can imply a need to round-trip this value out of the database (on insertions, for example), which can hurt OLTP performance. Ultimately it seems that temporal tables are best suited to business intelligence situations, not OLTP.
- Neither CDC nor temporal tables has a way to track the “who” on deleted records in the same way that the trigger template can.
- Handling schema changes is no less complicated with the native options – all of these approaches address schema changes, just in different ways.
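On that first point, querying temporal history means remembering syntax like this (shown against the hypothetical FooTemporal table sketched earlier):

SELECT *
FROM dbo.FooTemporal
FOR SYSTEM_TIME AS OF '2018-08-18 20:50:29.450'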
The notion that native options are simpler is not a strong argument against the trigger template, given that XS lets you generate all the SQL infrastructure with the double-click of an icon (if you were to, say, save your template parameters in a .shcmd file). Typically, I’ve run the template against a development database where history can be dropped and re-created, after which schema changes are moved forward as ALTER statements, offering non-destructive changes in QA and production environments. (This is something that XS Tool Suite can do for you as well, with the Schema Compare tool.)
If the thought is that performance must be better for CDC or temporal tables compared to the light audit template solution: think again. I’ll be doing a separate article that compares head-to-head performance in more detail, but I can say right now, based on an early look at performance, the newer SQL Audit Template does no worse, and “wins” in a couple of categories. This chart shows various DML operations carried out against tables using different history strategies:
The testing involved running several thousand operations on tables pre-populated with 100,000 rows. Multiple trials were run and averaged. “100%” represents a baseline against the procedure-based approach (i.e. no live tracking), with smaller values reflecting better performance. For most test types, there’s little statistical difference – but interestingly, even though the trade-off made for the new audit template was to disfavor the performance of querying history, it still wins handily against all the others there!
The results here are from empirical timings, so if it looks a bit odd, you can review my work published on GitHub. Your workloads can be different, obviously, so I’d be curious to hear about variances and use that to improve existing templates. The main point is there are tradeoffs to consider with each approach, but what we’ve released with XS is easy to use and performs very well.
We’ve got some other new and improved templates available with XS Tool Suite. For example, the “MERGE Statement for Table” template has some new parameters:
Notice we’ve defaulted in column names that are used by the Audit template, so if you’re sticking with our standards, there’s nothing to change. Out of this template you can now do things like have the UPDATE clause filtered to detect material changes (no need to update rows that are not actually different!) – and testing has shown this works well in a wide range of situations; a sketch of the idea follows. However, if you “save as” and customize the template, you can apply your own tweaks. That’s something we’d like you to share as well, if you choose – we’ll be doing more with versioning, system flags, authorship, publication, etc. in the next XS Tool Suite release.
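Here’s a hedged sketch of the material-change filter idea inside a MERGE – dbo.FooStaging is a hypothetical source, and the template’s actual generated statement is richer and parameter-driven:

MERGE dbo.Foo AS t
USING dbo.FooStaging AS s
 ON t.FooID = s.FooID
WHEN MATCHED AND EXISTS (
  SELECT s.FooBirthdate, s.FooName
  EXCEPT  -- null-safe test: update only rows that materially differ
  SELECT t.FooBirthdate, t.FooName)
 THEN UPDATE SET FooBirthdate = s.FooBirthdate,
  FooName = s.FooName,
  LastUpdatedBy = SUSER_SNAME(),
  LastUpdatedDate = GETUTCDATE()
WHEN NOT MATCHED BY TARGET
 THEN INSERT (FooBirthdate, FooName, LastUpdatedBy, LastUpdatedDate, IsDeleted)
  VALUES (s.FooBirthdate, s.FooName, SUSER_SNAME(), GETUTCDATE(), 0);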
Until then, stay tuned!