NoSQL was originally understood to mean “No SQL” at all, but it is now more or less accepted as “Not Only SQL”. It is a broad class of database management systems that differ from classic relational database management systems (RDBMS) in some significant ways. These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally.
Basic Concepts and Techniques
The design of most NoSQL systems revolves around a few concepts that have been much discussed in recent years.
CAP Theorem
- C – for “Consistency”: the ability of a system to remain in a consistent state after an update or an operation
- A – for “Availability”: the availability of a system even in the event of adversity or system issues
- P – for “Partition Tolerance”: the ability of a system to keep functioning in the presence of network partitions, even if partitions are added or removed
The CAP theorem, as formulated by Eric Brewer, states that
“Any form of distributed system with state, of which a distributed database is the canonical example, can exhibit at most two of the above-mentioned desirable properties: Consistency, Availability and Partition Tolerance”.
This means that, per the theorem, you choose the two properties you need and end up with a NoSQL system that is either “Consistent and Available (CA)”, “Consistent and Partition-Tolerant (CP)”, or “Available and Partition-Tolerant (AP)”.
The key assumption is that the system needs to persist data and/or hold state of some kind. If you do not need data persistence or state anywhere, you can get very close to having Consistency, Availability, and Partition Tolerance simultaneously.
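To make the trade-off concrete, here is a minimal sketch in Python of a toy two-replica store. The class names, the `mode` flag and the `partitioned` switch are all made up for illustration; real systems are far more involved. In "CP" mode the store refuses writes while the replicas cannot talk to each other, and in "AP" mode it accepts them and lets the replicas diverge.

```python
class Replica:
    """One node holding a copy of the data (hypothetical helper)."""
    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store; `mode` is "CP" or "AP" (illustrative names)."""
    def __init__(self, mode):
        self.mode = mode
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False  # True means a and b cannot communicate

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by giving up availability.
                raise RuntimeError("unavailable during partition")
            # AP: stay available; only the reachable replica is updated,
            # so the two replicas now disagree.
            self.a.data[key] = value
            return
        # No partition: both replicas are updated together.
        self.a.data[key] = value
        self.b.data[key] = value

    def read_from_b(self, key):
        return self.b.data.get(key)


ap = TwoNodeStore("AP")
ap.write("x", 1)
ap.partitioned = True
ap.write("x", 2)  # accepted, but replica b still holds the old value
```

Once the partition heals, an AP system still has to reconcile the diverged replicas, which is where the BASE ideas come in.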
ACID vs BASE
I am sure the ACID attributes are familiar to anyone working with relational databases, but just to summarize:
- A – Atomicity (a transaction either succeeds completely or fails completely)
- C – Consistency (the system always moves from one consistent state to another consistent state)
- I – Isolation (a transaction executes in isolation, and no external operation can access the transactional data being modified)
- D – Durability (committed transaction updates survive any kind of system failure)
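As a small illustration of atomicity, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the `transfer` helper and the sample balances are assumptions invented for this example. A transfer that would drive a balance negative fails on the second update, and the first update is rolled back with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(conn, "alice", "bob", 150)  # more than alice has
after_failure = balances(conn)
```

After the failed transfer, `after_failure` shows the same balances as before the call, which is the "completely successful or failure" behaviour described above.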
The BASE approach, according to Brewer, forfeits the ACID properties of consistency and isolation in favor of “availability, graceful degradation, and performance”:
- BA – Basically Available
- S – Soft state
- E – Eventual consistency
The choice of a NoSQL system will therefore revolve around the above parameters.
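Eventual consistency, the “E” in BASE, can be sketched roughly as follows. This is a toy model under simple assumptions (one primary, one replica, and a queue of pending updates); the class and method names are hypothetical and do not come from any particular NoSQL product.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy primary/replica pair with delayed replication."""
    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # updates not yet applied to the replica

    def put(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        # May return stale data until sync() has run.
        return self.replica.get(key)

    def sync(self):
        # Apply queued updates in order; the replica catches up.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore()
store.put("profile:1", "v1")
stale = store.read_replica("profile:1")   # replica has not seen the write yet
store.sync()
fresh = store.read_replica("profile:1")   # replica has caught up
```

Between `put` and `sync` the replica is in a “soft state”: it answers reads, but possibly with old data.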
Several data models have driven the design of NoSQL data stores. The key ones worth listing, alongside the relational model for comparison, are:
- Relational systems, which we have been using so far, supporting the ACID attributes and relationships/joins
- Key-Value systems, where the data is stored in key-value pairs and which basically support get, put, and delete operations based on a primary key
- Column-Oriented systems, which store data in tables and columns with no support for relationships or joins
- Document-Oriented systems, which store data in structured “documents” such as JSON/XML/YAML with no support for relationships/joins
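The key-value and document models above can be sketched with a few lines of Python. The `KeyValueStore` class, the key names and the sample order are invented for illustration; the point is only that a key-value store exposes get, put and delete by key, and that a document keeps related data nested together rather than spread across joined tables.

```python
import json


class KeyValueStore:
    """Minimal in-memory key-value store (illustrative only)."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:42", "Alice")

# A document holds the customer and the line items in one structure,
# so reading an order needs a single lookup and no join.
order_doc = {
    "order_id": 1001,
    "customer": {"name": "Alice", "city": "Springfield"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
store.put("order:1001", json.dumps(order_doc))

loaded = json.loads(store.get("order:1001"))
total_qty = sum(item["qty"] for item in loaded["items"])
```

In a relational design the same order would typically be split across customer, order and order-item tables and reassembled with joins.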
Why NoSQL Databases
- Make It Less Complex: It is a fact that not every enterprise or application needs the robust features and strict data consistency provided by traditional RDBMS systems. It is possible to come up with a system that is less complex, makes some compromises, and better fits current needs. For example, some applications can afford to compromise on reliability in exchange for better performance.
- Horizontal Scalability: With the growing volume of data that big companies like Google, Twitter, Facebook and others are dealing with, a good amount of scalability is required at the lowest possible cost. This is not to say that a traditional RDBMS cannot achieve this, but the question is at what cost. Can an affordable trade-off on one of the CAP parameters reduce cost considerably? The answer is probably yes.
- Economics: NoSQL databases typically use clusters of cheap commodity servers to manage the exploding data and transaction volumes, while RDBMS tends to rely on expensive proprietary servers and storage systems. The result is that the cost per gigabyte or transaction/second for NoSQL can be many times less than the cost for RDBMS, allowing you to store and process more data at a much lower price point.
- Minimize Expensive Object-Relational Mapping: It sounds a bit weird to argue against object-relational mapping after using it for so many years, but admittedly it is expensive compared to non-relational storage and data access. In particular, applications that do not benefit much from relational mapping certainly do not need the robust features of a traditional RDBMS.
- Compelling Cloud Requirements: One of the biggest requirements for moving to the cloud is data storage that is as horizontally scalable as possible and comes with the lowest administrative overhead. This need, once a one-off in a few bigger organizations, has now become a common problem with the advent of social networking, on both the personal and business fronts, across all layers.