Instant Blog: Just Add Water! – Live from Twitter: SaaS BI & Security
Recently (as in, yesterday afternoon) I fielded a flurry of SaaS BI-related questions on Twitter from fellow BI technologist Justin Swanhart about data safety and integration issues in SaaS BI. His questions were excellent (as usual), but I didn’t feel Twitter had enough bandwidth to address them properly. Consequently, I’d like to do so in what I call an “instablog”: a quick follow-up blog post motivated by the need to expand a discussion or Q&A session seeded on a social media site like Twitter. Once done, I will post this blog address on Twitter so that, hopefully, everyone can benefit from and comment on the exchange.
Justin’s initial tweet was “How can you be absolutely sure your data is secure in a SaaS environment?” To which I promptly replied “define absolutely”. I wasn’t trying to be a wisenheimer (well, OK, maybe just a little), but marketing hoopla notwithstanding, few things are really “absolutely sure” in the realm of security, and one engineer’s “absolutely” is another’s “maybe”. Luckily, Justin followed up with “How can I be reasonably sure? Is my data encrypted? How do I ensure SaaS employees don’t see my data, let alone malware?” Rephrased that way, the question is a tad easier to tackle, and I can certainly address it from a GoodData perspective.
Understandably, data protection and safekeeping are the number one concern most people have with SaaS. No security mechanism can ensure data safety 100%, no matter what certain vendors claim. The best-case scenario is a certain level of “reasonable” assurance. In my opinion, it’s all about managing risk levels and fostering trust among parties.
When you entrust information to a SaaS vendor, there are typically two vulnerability zones: when it is traveling on the wire, and when it is stored inside the cloud. In bodyguard parlance, those are called kill zones. GoodData protects information on the wire by encrypting it (as Justin mentions) using standard SSL, that is, HTTP over SSL (HTTPS). And once inside the EC2 cloud, data benefits from the enterprise-class protection mechanisms inherent to the Amazon cloud platform, which has recently attained SAS 70 Type II certification. So if you feel that EC2 provides “reasonable” assurance of data safety, then all is well on that front. Could Amazon EC2 get hacked? Sure, why not. But considering the level and sophistication of the techniques Amazon uses to secure its cloud, I’m willing to bet it’s a lot easier to hack your enterprise’s data silos than to punch a hole in EC2. In other words, your data is probably safer inside EC2 than inside your own organization.
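For the “on the wire” zone, here is a minimal sketch of what the client side of an encrypted connection can look like. It uses only Python’s standard library and is not GoodData code; the helper name `strict_context` is mine, purely for illustration. The point is simply that encryption only helps if the client also insists on verifying the server’s certificate and hostname.

```python
import ssl

def strict_context():
    """Return a TLS context that refuses unverified servers.

    Hypothetical helper for illustration. ssl.create_default_context()
    already enables certificate and hostname verification; this function
    just makes that expectation explicit.
    """
    context = ssl.create_default_context()
    if context.verify_mode != ssl.CERT_REQUIRED or not context.check_hostname:
        raise RuntimeError("TLS verification is not enforced")
    return context

context = strict_context()
```

A context like this can be passed to `urllib.request.urlopen(url, context=context)` so that a connection to a server with an invalid certificate fails instead of silently proceeding.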
That being said, a large percentage of enterprise security breaches are linked to malicious insider jobs, so Justin’s question about employee vetting is appropriate. At GoodData, every employee goes through a background check. This is standard fare (or at least should be) for all responsible SaaS vendors. More importantly, no customer data ever resides on any GoodData premises (as a matter of fact, we don’t own a single server in the company). Additionally, there are 24/7 internal mechanisms (both manual and automated) monitoring and logging all usage. We run dedicated pattern-matching software to flag unusual behavior and activity. And without going into excessive proprietary details for obvious reasons, I like to think it’s harder for suspicious activity to go unchallenged at GoodData than it is to find a good American Martini in Prague (been there, done that). Additional pertinent details regarding GoodData security are posted here.
Justin also followed up with “Is SaaS even an option when there are draconian security policies for specific pieces of data?” The honest answer to this, from my perspective, is a flat-out “no”. To me, “draconian” means there is some specific legal or industry regulation or compliance rule (government or corporate) that cannot be breached. In that case, you simply cannot use SaaS. It’s that simple. And trying to convince the public of the contrary is, in my opinion, disingenuous. Fact is, some data is simply not meant for the cloud, and I wish more vendors would admit that plainly.
Last but not least, Justin asked “And is there any reasonably easy way to integrate SaaS and OSBI tools if I can only ship some of my data around?” This is a great question that transcends the “religious” aspects of both approaches. Setting aside the fact that OSBI users are unlikely to use SaaS BI tools at the same time (and vice versa), is there a technical best practice for doing so? Personally, I’ve not encountered one, and thinking it through is challenging at best. I am still mulling it over.
At issue is querying both private and “clouded” islands of data simultaneously. But where should results live? Are aggregates as protected as detailed data? Should query content also be protected? Can results be pulled back from the cloud and then integrated with local ones? Can you slice and dice on both local and remote dimensions? Can this be done quickly enough in real time? Certainly, you can export GoodData reports into Excel and then use those results internally. Better yet, you can use the REST API to automatically grab report results (the actual data) in YAML and integrate those into your on-premise tool (whether OSS or not). So my inclination is to say that it is indeed possible to integrate some aspects of both platforms, albeit not easily (that’s the “reasonable” part), and likely with limitations. I’d love to see a set of requirements though, because this sounds like a really compelling, challenging, “front-line” project.
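To make the “pull results back and integrate locally” idea a bit more concrete, here is a rough sketch in Python. It deliberately skips the download step and shows no actual GoodData API call: it assumes the report results have already been fetched and parsed into a list of dictionaries, and every name in it (`join_on_dimension`, `region`, `pipeline_value`, `headcount`) is made up for illustration. It simply joins cloud-side aggregates with data that never left the building, on a dimension both sides share.

```python
from collections import defaultdict

def join_on_dimension(remote_rows, local_rows, key):
    """Inner-join two lists of dicts on a shared dimension column."""
    # Index the local rows by the shared dimension value.
    local_by_key = defaultdict(list)
    for row in local_rows:
        local_by_key[row[key]].append(row)

    joined = []
    for remote in remote_rows:
        # Only keep remote rows that have a local counterpart.
        for local in local_by_key.get(remote[key], []):
            merged = dict(local)
            merged.update(remote)
            joined.append(merged)
    return joined

# Hypothetical aggregate rows, as they might look once a report export is parsed.
cloud_report = [
    {"region": "EMEA", "pipeline_value": 120000},
    {"region": "APAC", "pipeline_value": 80000},
]
# Hypothetical detail data that stays on-premise.
local_detail = [
    {"region": "EMEA", "headcount": 12},
    {"region": "AMER", "headcount": 30},
]

combined = join_on_dimension(cloud_report, local_detail, "region")
```

Only the regions present on both sides survive the join, which illustrates one of the limitations mentioned above: slicing across local and remote data works only along dimensions the two islands actually have in common.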
I want to thank Justin for taking the time to think of and post these challenging questions on Twitter, and I look forward to “instablogging” many more in the future with your help. To look up the above-mentioned tweets, you can do a search on @gooddata, @jswanhart or @jeromepineau.
2 Comments
+1 for your comment that some data really doesn’t belong in the cloud. There are always trade-offs to consider, and sometimes it just doesn’t make the most sense. That said, the security of various cloud offerings is steadily improving, and standards are changing, as well… so the best answer in any given instance today could be different tomorrow.
For example, the Windows Azure cloud runs in Microsoft’s SAS 70 certified data centers. Additionally, all of Microsoft’s data centers are also ISO 27001 certified. Going beyond data encryption over the wire, last month Microsoft also announced Project Sydney for secure channel connections between the Windows Azure cloud and on-premise/corporate environments. Net-net: lots of evolution on this front.
BTW, if you have data originating from lots of different on-premises sources that you want to analyze, you might consider using Windows Azure’s Service Bus. It will let you route data from a diverse array of source systems (even if they are only occasionally on the network) to other systems, such as GoodData.
And finally, if you’re looking for data sources that you can stream to either GoodData or on-premise BI environments, check out Codename Dallas - a new “Data-as-a-Service” marketplace for both commercial and public (free) data sets.
@John, thank you for taking the time to read and comment on the post. You’re right that it’s a fast-moving target. I have to say that “glitches” like the Sidekick caper scare me more than malicious attempts at breaking & entering, although that was probably hyped as well. I think the bottom line is that security is a two-party dance. Yes, the host (SaaS vendor) needs to be airtight, but the buyer must also do their homework, get an education (or reinforce it), compare accordingly and insist on the fundamentals. Expecting the host to shoulder the entire process is, IMO, naive at best. Trust but verify.
Thanks
J.