The Power of the xSkrape Vault
“The xSkrape Vault is a concept we introduced with the 1.1 release of
the xSkrape add-in. It lets you secure field-level
data in your spreadsheets. This article is a deep-dive into the ‘how’ and ‘why’
you'd want to use this powerful feature.”
Spreadsheets are used every day in ways that focus on sharing and collaboration. In doing so, we often rely on security using features that are built into our spreadsheet software, such as password protection, change tracking, and so on. However, one feature that’s frequently missing is the ability to hide data in a way that makes it impossible for others to see specific data values, without required privilege. For example, Excel offers “range protection” as a feature, but that’s related to who can edit values: it says nothing about redacting sensitive data from those who should not see it. There’s plenty of examples of this type of data: salary data, lists with private information such as social security numbers, phone numbers, contact names, and much more. (We can certainly debate the merits of not maintaining sensitive data at all in spreadsheets, favoring secure systems for example – but the truth is, it happens frequently!) There are other more complex scenarios such as where we’d like to hide individual data values but retain certain calculations since those might reflect final metrics that are less sensitive. We can try to address these cases using password protection on an entire workbook, but there are many times we might want to share a subset of data, without having to resort to copying data into multiple places, increasing maintenance pain.
The xSkrape add-in helps address the problems described above by securing ranges of values and formulas, using a simple concept: “lock” and “unlock” buttons. When you lock a range, there are five base options:
1. Remove values and formulas
2. Replace values and formulas with “#InVault”
3. Remove values and change formulas into values
4. Replace values with “#InVault” and change formulas into values
5. Replace values with obfuscated values and change formulas into values
Unlocking puts values and formulas back to their original state. A key question is, “where do values go that are removed or replaced?” We physically remove them from the spreadsheet and put them in our cloud storage vault. The storage is linked to your xSkrape.com account. From a “basics” standpoint, you have no need to think about the removed data when the range is locked and you can share a workbook with protected ranges with anyone else and they will have no way to get at original values (unless you want them to and they have the xSkrape add-in installed: more on that later when we cover protection levels).
The option to replace formulas with values supports some specific use cases, including one like this:
Here we see a list of people (column E) with what could possibly be their annual salaries (column F). Perhaps we’re okay with sharing the total in cell F40, but we don’t want anyone to see individuals’ salaries. We can protect all of column F using a secured range task shown on the right. Having selected “Change formulas into values (non-formulas use #InVault)”, when we click on the “Lock” button, the spreadsheet content changes to this:
Notice that individual salaries no longer show values, but the total is intact. In fact, the total has changed from the formula “=SUM(F35:F39)” to be the discrete value “402000”. When we click the “Unlock” button, the content changes back to what we saw previously, including our SUM as a formula.
We could easily share the workbook with the locked range with anyone, and they’d have no way to arrive at the individuals’ salaries. The vaulting concept is that simple: lock and unlock do what you’d expect logically.
Note: You can get this sample data to play with yourself using the 1.1 version of the xSkrape add-in. In the task pane view, select the Admin -> “Add Examples to Current Worksheet” menu option and sample data and tasks are added to your current worksheet.
The above scenario hides salary dollars for individual employees but retains the employee names and company-wide total. Another way to approach this problem, though, is to hide the employee names instead. We could do this with the blank or “#InVault” options, but if we wanted to retain a semblance of “reality” to the original spreadsheet, we might want to replace real employee names with fake names. The concept can be extended to addresses, company names, phone numbers, etc. In doing so, we can retain structure that might be important for the workings of the spreadsheet (e.g. a pivot table or chart that may use the name column, assuming non-blank values).
Using obfuscation is a two-step process. First we select this lock behavior – “use obfuscation pattern”:
Next, we must provide an obfuscation pattern. This is some text that can contain pre-defined words that are replaced with random data for each cell involved, or characters that are transformed into random characters, or we can include characters that are left “as is” in the final generated value. We offer several examples of these patterns here. So for example if we set our task’s obfuscation pattern to “firstname”, then clicking the lock button changes our data to look like this:
Notice that the original names are no longer the same! The concept extends to support anything that an obfuscation pattern can be written for.
Let’s consider a different example, and assume our original data looks like this:
After locking this data, we might see:
Notice that some values are repeating – and they correspond to the data that was repeating in the original data. Again: this can be important to maintain a “realistic fake-out” with our obfuscation, where the pattern of repetition is retained. Also of note, the generated data may not “make sense”: zip codes may not match actual states they normally fall into, and so on.
xSkrape offers four different protection levels for secured ranges. This is intended for use in an environment where you have multiple people who have the xSkrape add-in installed, using a common xSkrape.com account (to manage credit usage, etc.) – as would be encouraged in an enterprise setting. In this case, different people may have different levels of access to data, which translates into differences in who should be able to unlock secured ranges.
For example, assume the CEO should have access to all data, no matter the sensitivity – we call this “level 1 access” in xSkrape. Suppose that his assistant should have nearly complete access – she might use “level 2 access.” Someone else who should see less might have “level 3 access.” We further assume that anyone who should be able to lock or unlock data has the xSkrape add-in installed, and they have linked their add-in to the same xSkrape.com account. The owner of the xSkrape.com account is considered a super-user and a such they could establish each person’s access level, as illustrated here:
In this scenario, the CEO can unlock any secured range, regardless of its protection level. The executive assistant can unlock any ranges identified as levels 2, 3 or 4 – but not 1. The “Other User” can only unlock ranges identified as level 4. Ranges themselves have protection levels which supports the idea that different data can have different security requirements. The CEO may add a secured range identified as level 1 for something that only he should be able to unlock and see, whereas a range identified as level 4 may be something that anyone in the company can unlock to see, edit and relock – but perhaps the locked version is what can be shared outside of the company, as one example of usage. You can add secured ranges only at or below your own permission level.
By default, new client registrations for the add-in default to security level 4: the least privileged. This is appropriate since we should not assume all new users have the greatest privilege: you can adjust their privilege upward, as necessary using xSkrape.com. (You can change this default behavior, however, on the user profile page for the xSkrape.com account that’s being used.)
In terms of how the secured range task actually works, keep in mind:
- When you use a password for a range, we don’t store the password itself anywhere. We take the password and pass it to our Vault service (over SSL) which transforms it using an industry-standard algorithm (yes, with salting!) into not one, but two security features: an authentication key and it becomes part of the encryption key used to actually encrypt your data. The authentication key is used when it comes time to request an unlock of a range: if it does not match what you provide in the unlock request, the request fails. The implication is since the only way to get at your data from a public-facing standpoint is to use our services, the authentication key alone is very secure, when you actually do provide a password. The encryption of data is a second level of protection, which makes your data unreadable to us, and if our database were ever to be stolen, it’d unreadable by others without great effort. Also note, every cell has its own unique encryption key, which makes the process of loading and saving data slower, but is more secure.
- Suffice it to say, if you lose your password for any range – we cannot help you recover it, so please manage your passwords carefully. We have made the password optional, mainly to support those who are okay with protection that is offered based on secret keys that are linked to your client installation of the add-in, combined with a globally unique key for the range itself. Given that our communication is based on HTTPS, it’s very unlikely that an outsider could uncover these keys over the network, although if they had full access to your machine, they could in theory find these values with some considerable effort. This is why we suggest using a password for all ranges, but it is not required.
- We only store data that’s in the actual cell range specified. What this means is using the above example for salary data, our database would only see a bunch of (encrypted) numbers, with no correlation to names assuming you did not secure both names and salaries together. This concept offers yet another level of security for your data, since you have effectively secured a partial range, where important (but not sensitive) data is left in the workbook.
- Credit charges are applied based on the number of cells you lock and unlock – and also based on data size over 999 bytes of plaintext data. Current rates are always available here.
Q: If I don’t trust my
own people with certain data values, why should I trust you with them, in your
cloud storage?
A: As we’ve covered, there are multiple levels of protection afforded by our architecture, and we’re not that much different from other companies that offer secure cloud storage – search for “cloud vault” and you’ll see many examples. We currently use strong authentication when dealing with password protection, and a moderate level of symmetric encryption of actual values. In saying “moderate,” we are in no position to recover your data if you lose your password, but we also conform to the 5D992 NLR classification which is administered by the U.S. Department of Commerce's Bureau of Industry and Security. We’re open to pursuing stronger encryption levels if there’s sufficient customer demand. (Presently we don’t feel a need to add compliance overhead without the necessary demand.) We also encourage the vaulting of “partial ranges” as previously described. We use HTTPS for all communications with our API’s. All these factors taken together offer significant protection for your data. We’re happy to discuss any concerns you may have.
Q: I like the kind of
data generated with obfuscation. Can I generate some of this “sample data”
without using secured ranges?
A: Technically, you can use a secured range for an empty set of cells, with the following setting:
Locking with this will give you back unique random data in the desired format even when the original cells are blank. We’ll be looking at adding a specific task type to support random data generation, based on user feedback. As a side note: we offer comparable functionality in SQL-Hero today to create realistic sample data over complex relationships for SQL Server databases.
Q: What happens if I
change the structure of my worksheet that contains a secured range? For example,
I add a new row above the range.
A: You can adjust the “Target” address in the task appropriately. You would generally want to keep the size of the range constant, however, otherwise you’ll be only retrieving the smaller of what was actually stored during the lock operation and the current target range size. Your locked data is directly associated to the xSkrape secured range task through a key and so deleting the range means deleting your ability to recover locked data. This is why deleting a secured range task that is currently locked is a two-step process: unlock the range and only if that’s successful is the task removed from the task list. As such, if the unlock fails (e.g. due to permission issues), the task will not be deleted.
Q: What happens if there’s
data interruption, some catastrophe that stops your systems, etc.? Is my data
lost?
A: We’re looking at a possible “backup” feature that would let you download an encrypted file that when run through a utility would reconstitute your original data. We’d like to hear from you if this is an important feature for you. Also, please pay heed to your current credit balance when performing any lock operation: the cost to unlock is roughly the same as a lock and you’ll be unable to do this if your credit balance is too low.
Q: How long do you
retain locked data for?
A: Our present policy is to retain everything for at least two years. As long as you’ve used the product in the prior 12 months after that, we continue to retain your data. If you have not used the product (any feature) for over a year, we’ll make an effort to contact you by email multiple times and you’ll need to use the product in some way over the following three months or we many choose to archive your data. In archived form, we may choose to offer a way to restore this at an extra cost (and not in real-time), although we do not have details on this fleshed out as of 8/1/16 (the feature is brand new, after all!).
This is a powerful new feature that we’re excited about: it offers a much easier approach to cell-level security compared to alternatives today, and we offer it at a very reasonable price. In fact, with our limited time free usage tier, you can be locking and unlocking small data sets today for free. (Note: this is not a trial: it’s fully functional and continues indefinitely at the “free tier” level.)
Look for continued improvements to this task including new types of obfuscation patterns, and perhaps a Google Docs add-on version as well. Let us know what you think and by all means share this whitepaper to encourage continued growth of the feature.
Did you like this article? Please rate it!