“This final example… illustrates sophisticated data cleansing and aggregation that can be accomplished using one (new) Excel function call. It’s as if we’ve brought a powerful query language like SQL straight into Excel!”
We've just dropped new bits for xSkrape for Excel (available to Excel desktop users from '97 to 2016). This release does some important things compared to the last release:
· Adds 12 new Excel functions:
CSExpression(cell, expression, throwError = false)
CSExpressionStr(cell, expression, throwError = false)
CSStatements(cell, statements, throwError = false)
CSStatementsStr(cell, statements, throwError = false)
CSExpressionSingleFromRange(cells, expression, throwError = false)
CSExpressionSingleFromRangeStr(cells, expression, throwError = false)
CSStatementsSingleFromRange(cells, statements, throwError = false)
CSStatementsSingleFromRangeStr(cells, statements, throwError = false)
CSExpressionRangeFromRange(cells, expression, minrows = 0, mincols = 0, throwError = false)
CSExpressionRangeFromRangeStr(cells, expression, minrows = 0, mincols = 0, throwError = false)
CSStatementsRangeFromRange(cells, statements, minrows = 0, mincols = 0, throwError = false)
CSStatementsRangeFromRangeStr(cells, statements, minrows = 0, mincols = 0, throwError = false)
· Moves to a new registration and pricing model.
I'll spend the remainder of this article covering these two topics in greater detail. The already-existing 7 functions that the add-in offered previously are still present, unchanged, with external documentation covering them in more detail. (In the final section I offer two new examples of using them to access external data in Google Sheets and JSON.) Given that Excel is one of the world's most popular "databases" (we can debate the merits, but fact are facts!) - the total capability we're adding to the product opens the door to many new possible solutions.
No, I'm not talking about VBA - these new functions offer a way to apply "snippets" of C# logic over cells and ranges in a way that narrows the gap between programmers and Excel users. Let's look at some real examples. Let’s start with a basic example. Let’s apply an expression on a cell-by-cell basis using:
=CSExpression(B2, "Regex.Match(cell.ToString(), @""\w+(?=\.$)"").Value + "" - "" + Regex.Match(cell.ToString(), @""^\w+"").Value")
If we have something like this:
Notice we have a formula in C2, where the first parameter refers to cell B2. The expression used here is using regular expressions, letting us leverage the power of the .NET base class library (BCL) from within worksheets, without much effort. The variable "cell" is the contents of the source reference, as a C# object. Notice we can simply copy-and-paste the formula from C2 to C3, and so on, getting the benefit of the calculation for the previous column on each row.
If we wanted to eliminate the call to "ToString", we could have used CSExpressionStr. This is the same as CSExpression except it assumes the cell should be treated like a C# string. No cast is needed for our same expression:
The CSExpression class of functions require us to compute a value and return it with effectively one line of code. If we'd like to use more than one line, the CSStatements class of functions can be used. The main difference is you need to assign the variable "value" prior to the end of the statement string. For example, here we're using a try / catch that surrounds the instantiation of a System.Net.Mail.MailAddress object - if it fails because of an invalid email address, "value" is assigned to false in the catch block, otherwise true is assigned.
=CSStatementsStr(B2, “try { new System.Net.Mail.MailAddress(cell); value = true; } catch { value = false; }”)
(This isn't a great way to detect valid email addresses, by the way - we can do better using regular expressions.)
The next example illustrates another use of CSExpressionStr. Here we've got a list of dates in an odd format - but it's one that .NET DateTime.ParseExact can handle easily if we use an appropriate format string:
=CSExpressionStr(B2, "DateTime.ParseExact(cell, ""dd/MM/yyyy HH:mm:ss.ffffff"", new CultureInfo(""fr-FR""))")
As a final step, I’ve reformatted column C to be a date – otherwise we have a numeric value that represents an “Excel date/time.”
Next, let's look at ranges. The first option is to take a range of any number of rows and columns as input and return a single value (also called a scalar). Let's use a range where we have various words in the cells and we'd like to find the shortest word for each row and return the longest of these among all rows. We can do this with the following C# statements:
=CSStatementsSingleFromRangeStr(D3:I5, "var best = (from ordered in (from r in cells select new { Row = r, MinLen = (from c in r where c.Length > 0 select c.Length).Min() }) orderby ordered.MinLen descending" & " select ordered).First(); value = string.Join("", "", (from c in best.Row where c.Length == best.MinLen select c).ToArray());")
This expression is using LINQ, a powerful deferred execution query tool that can work with objects. LINQ has been around in .NET languages for some time but our support for it with xSkrape lets you leverage this right from within Excel – no scripting needed. In fact, this makes Excel a great alternative for tools such as LINQPad, which lets you exercise LINQ expressions. Now you can do the same type of things using live data in Excel – something that’s especially useful if you want to try out LINQ concepts with live data as you learn the syntax. (The calculation engine of Excel, knowing about cell dependencies, automatically updates as you would expect when either the formula or source values change.) Keep in mind: the product is effectively free unless you start using it on a recurring basis – so learn away! 😊
As a final example, let’s look at accepting a range as input and returning a range as output. To do this, we’re going to use a LINQ expression again:
The key here is the formula we entered:
=CSStatementsRangeFromRangeStr(B2:C11, "double v = 0; value = (from p in (from r in cells let isvalid = double.TryParse(r[0], out v) && Regex.Replace(r[1], @""\W"", """").Length > 0 let Age = v " & "where isvalid && Age > 0 select new { Age, Pet = Regex.Replace(r[1], ""Dog"", ""Cat"", RegexOptions.IgnoreCase).ToUpper() }) " & "group p by p.Pet into g orderby g.Count() descending select new object[] { g.Key, g.Average((i) => i.Age), g.Count(), (from p in g where p.Age >= 20 select p).Any() ? ""*"" : """" });",10,4)
This one’s a bit more complex, so I’ll break it into smaller pieces:
o =CSStatementsRangeFromRangeStr(B2:C11 – this is going to require more than computing an in-line value: we’ll use two statements. The source range we’re using contains two columns: an “age” and a “(type of) pet”.
o double v = 0; - our first statement serves as a local variable that’s needed by the TryParse function we’ll see soon.
o value = - using statements requires us to assign the “value” variable prior to the end of our script – the result of our big LINQ expression is an enumerable list of rows and columns which xSkrape can turn into our output range.
o from r in cells let isvalid = double.TryParse(r[0], out v) && Regex.Replace(r[1], @""\W"", """").Length > 0 – this section of LINQ enumerates over our source range’s rows (each row is “r”). We’re calling “double.TryParse” (a BCL function) over the first column of the row (the “age”, in column “0” of the source range), where this function returns “false” if the value is not a valid number. We therefore omit rows from consideration where the age is not numeric – notice in our sample data, we might have “?” or “-“ used, possibly for various reasons we can’t control. Furthermore, the “&&” is a logical AND which also requires that the pet type (in column “1”) have at least one valid letter. It does this by stripping away all non-word characters and checking the resulting length of the string – if it’s zero, it’s not considered valid, and we’ll exclude these rows as well (e.g. row #7).
o let Age = v " & "where isvalid && Age > 0 – the “let” statement as part of LINQ lets you assign “work variables” and in this case, “v” as assigned from TryParse gets persisted to “Age” for later use. We also show an Excel string concatenation here within the formula itself, since our LINQ expression is longer than 255 characters: we can split it into smaller strings and concatenate them to work around that limitation. The “where isvalid” is where we exclude rows that don’t meet our age / pet type requirements, including any cases where the age might be negative (e.g. row #8).
o select new { Age, Pet = Regex.Replace(r[1], ""Dog"", ""Cat"", RegexOptions.IgnoreCase).ToUpper() }) – for each row we’ll return both the age and the pet type, where we’re treating all dogs as cats (sorry, dogs!), and converting the pet type to all upper-case (BCL function ToUpper), thus letting us consider “Cat” and “cat” as the “same” in later steps.
o group p by p.Pet into g orderby g.Count() descending select new object[] { g.Key, g.Average((i) => i.Age), g.Count(), (from p in g where p.Age >= 20 select p).Any() ? ""*"" : """" }); – some of the most interesting magic here has us taking the rows in the previous step and grouping them by the pet type. We’re then returning a transformed row (as an object array), with the pet type as the value grouped on, and the average age and count for all pets that qualified through our data filters. We’re also applying a sort (count, descending). The final column we’re returning is either blank or an “*” if there’s at least one pet aged 20 or more in the group (perhaps a signal to check for bad data).
o ,10,4 – because this is being returned as a range, we’re specifying the minimum range size (rows, columns). If we did not do this and our target range as set in Excel is larger than the data returned, Excel uses default values for empty cells – using an explicit size here has extra empty cells being filled with blank values instead of zeros.
How do we specify the returned range’s boundaries? This is something that’s not immediately obvious, but the easiest way is to select your target range area in Excel, enter the formula in the formula bar, then press Ctrl-Shift-Enter: now the entire range is covered by the single formula.
This final example was very contrived, yes, but it illustrates sophisticated data cleansing and aggregation that can be accomplished using one (new) Excel function call. It’s as if we’ve brought a powerful query language like SQL straight into Excel! Try adjusting the source data – you should see the aggregated output adjust accordingly, with all the parallel processing goodness of Excel. Now you have an effective “live data playground” without having to be a .NET developer – although it’s true, you will need to learn the parts of the BCL that matter to your requirements.
First, technically the product is free to install. You do, however, need to register it since actual usage after one (effectively) free month per machine starts to incur xSkrape credit usage. What are xSkrape credits? When you create an xSkrape.com account (free in itself), you have a repository for credits that you can purchase in credit packs. You currently can choose between a small pack (1,200 credits, unused portions expiring after 3 months) or a big pack (60,000 credits, unused portions expiring after a year). Your account also gets up to 250 free credits, each month (unused portions expire at the end of the period).
What this means is the first month of use for the add-in incurs 50 credits against your account. Since you got 250 free credits: you're using it for free! The next month you use the product on the same machine, you're charged 300 credits. Now you'll need some number of credits above the free amount, and you can choose how much you purchase based on expected usage. One low-cost scenario is $51.96 per year, covering two machines that use it every month.
Why do it this way? All our products can use one single billing model. You can also adjust your deployment size on-the-fly without needing to worry about fancy editions and such: it's all per machine credit usage. Any volume discounts we could give are accounted for in credit pack pricing. For example, with the larger credit pack, your per machine per month cost can be about 78 cents over 17 machines. Also – charges are only for actual usage: if you have 2,000 machines that might use it but only 34 do use it, then two big packs are probably enough to cover your needs.
The process of installing and registering is as follows:
· Visit our installation page – you can either use our web-based installer application or download and run the .MSI file directly (using the .MSI is also useful for automated deployments). The web-based installer handles any pre-requisites for you automatically (including ensuring you have .NET Framework 4.6.1). Once installation is complete, XSKRAPE should show on the ribbon bar in Excel:
· Sign up for an xSkrape.com account. It's free but you do need to provide a valid email address.
· Find the confirmation email - check your spam folder if needed. Click on the link it contains to confirm your registration.
· On the main menu, you should see My Account / Queries and Keys, select it:
· Click any of the links you see with key details - this should expose the actual key.
· Copy the key to your clipboard.
· Pick the “Register / Enter Key” option under XSKRAPE in Excel.
· Paste your key into the field for it:
· You now have a few options:
o You can choose to never store the client key locally. If you do this, you'll need to repeat the step of logging into your xSkrape.com account and copying your key every time you use Excel. In this case, you can enter your key as shown above. (This is where you'd go as well if you change your mind about how to store this key.)
o You can protect this key locally on your machine with a private key. This is quite secure since we've protected the private key behind obfuscation - but in theory is less secure than the next option.
o You can protect this key using a password. The password you provide is not recoverable since it's hashed and becomes the encryption key for your client key. This makes it quite secure since the key is tied to you, not code on your machine. To use xSkrape however, each time you open Excel, you'll need to use the Set Password option:
The add-in also supports the following functions:
· WebGetSingleFromTable(url, tableCriteriaQuery, tableColumnName, tableValueFilter = "", tableValueSort = "", additionalConfig = "", noDataValue = null)
· WebGetSingle(url, query, additionalConfig = "", noDataValue = null)
· WebGetMultiple(url, queryList, additionalConfig = "", noDataValue = null)
· WebGetTable(url, tableCriteriaQuery, includeHeaders = false, headerRenaming = "", additionalConfig = "", noDataValue = null, tableValueFilter = "", tableValueSort = "")
· WebGetDynamicTable(url, tableCriteriaQuery, includeHeaders = false, headerRenaming = "", additionalConfig = "", noDataValue = null, tableValueFilter = "", tableValueSort = "")
I’ll offer two new examples in this article of how they can be used within the desktop add-in. First, let’s look at a Google Sheets spreadsheet that’s publicly available:
As you can see, it contains tabular data that appears locked in Google Sheets – but it’s not! We can use this single xSkrape function to bring it into an Excel worksheet:
The “columnname=Date” helps xSkrape determine what tabular data we’re interested in – so if the format of the sheet changed by say adding additional header rows, our extract would still work as expected.
As another example of extracting tabular data, consider this URL which returns sample JSON data (it could just as easily been XML):
This type of data could just as easily have been returned by a request to a REST API. We could construct such a call right within Excel, using something like this:
Notice that the source JSON is not in a tabular format, but we’ve described the shape of the data we want using the XS.QL language and gotten a single, flattened table. XS.QL is open to growth – feel free to make enhancement requests, and sign up for blog post notifications to find out about new features and enhancements.
Did you like this article? Please rate it!