Sunday, April 12, 2015

Learning about Cassandra and Coursera

I watched the presentation produced recently by DataStax and Coursera on Cassandra, covering how, and some of why, Coursera chose Cassandra over MongoDB and HBase to replace some of the MySQL behind their website.

My key takeaways were that Cassandra fitted their data model better than HBase or MongoDB, and that MySQL sharding was difficult for some of their needs.

Coursera has a geographically distributed customer base, and MySQL wouldn't handle multi-site information capture and distribution.

Cassandra can do rolling upgrades without downtime. This is important for Coursera: imagine filling out a set of responses to an assignment and then not being able to save them because the application was offline for a database maintenance window. Avoiding that was a key tenet for whatever replaced MySQL.
Now, if your geographic requirements are narrower, i.e. your customer base is local to Australia or Europe, then a single site of MySQL with Galera, or AWS RDS, may well produce suitable results. You can also use a message bus within your application to provide some resilience when the database is offline.
Rolling upgrades are difficult for most RDBMSs, and Cassandra is strong in this area.
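
To make the message bus idea a bit more concrete, here is a minimal sketch in Python of buffering writes while the database is offline and replaying them once it comes back. Everything in it is hypothetical and only for illustration: FakeStore stands in for a real database client, DatabaseOffline for whatever error that client raises, and the in-memory deque stands in for a real message bus. It is not how Coursera does it.

```python
from collections import deque


class DatabaseOffline(Exception):
    """Raised by the (hypothetical) store when it cannot accept writes."""


class FakeStore:
    """Stand-in for a real database client; can be toggled online or offline."""

    def __init__(self):
        self.online = True
        self.rows = []

    def save(self, row):
        if not self.online:
            raise DatabaseOffline("maintenance window")
        self.rows.append(row)


class BufferedWriter:
    """Queues writes while the store is offline and replays them in order."""

    def __init__(self, store):
        self.store = store
        self.pending = deque()

    def write(self, row):
        # Always enqueue first so ordering is preserved, then try to drain.
        self.pending.append(row)
        self.flush()

    def flush(self):
        # Replay queued rows oldest first; stop at the first failure.
        while self.pending:
            try:
                self.store.save(self.pending[0])
            except DatabaseOffline:
                return False
            self.pending.popleft()
        return True


store = FakeStore()
writer = BufferedWriter(store)

writer.write({"student": 1, "answer": "A"})   # stored straight away
store.online = False                          # maintenance window starts
writer.write({"student": 2, "answer": "B"})   # held in the queue
writer.write({"student": 3, "answer": "C"})   # held in the queue
pending_while_offline = len(writer.pending)
store.online = True                           # maintenance window ends
flushed = writer.flush()                      # queued writes are replayed
```

An in-memory queue like this loses its contents if the application restarts, so a real setup would want a durable message bus in its place, but the shape of the idea is the same: accept the write, hold it, and deliver it when the database is back.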

Like so much in this field, it's not about being cool, it's about knowing the problem you are solving so you can choose the right product. There is no bad database, only bad implementations using the wrong product in a given situation. MongoDB seems to cop a little flak, but this is probably more about a headlong rush to get away from a perceived problem with MySQL. Often early-stage web applications are built with some part treated as a key-value store or a document store. This is simple to handle from the web development position, but it ultimately won't scale in MySQL, or in other RDBMSs for that matter. So people start looking for alternative storage, i.e. other databases. This often solves the part of the database that really is a document, but causes problems with the relational parts of the application. Ultimately MongoDB ends up being a bad choice for solving the problem.

Cassandra will be a fantastic solution, as will MongoDB, to the right problems. Neither will solve poor design, a bad understanding of application requirements, or the business's need to access the data for reporting purposes. For this reason, think carefully about any decision to jump off an original choice of an RDBMS such as MySQL and into the arms of a NoSQL database. The NoSQL database you choose might leave a very sour taste in your mouth.

Disclaimer: I have taken courses from Coursera.


See ya round

Peter


