What and Why NoSQL

NoSQL

"Not Only SQL" is a family of storage technologies that are not constrained to the traditional relational database model, which is widely considered the standard.

We want to store data in a form that naturally matches the information we are processing, instead of reshaping the data to fit a relational model. Note: this is not a database class, so we will not focus on those models.

Some NoSQL technologies:

  • MongoDB - Document-oriented database

  • Cassandra - Column-oriented database

  • Redis - Key/value database

  • Neo4j - Graph database

  • ArangoDB - Multi-model database

Components of MongoDB

Documents

Documents are roughly similar to JavaScript objects. Technically, Mongo stores its documents in BSON format, but on the surface they look like JavaScript/JSON.

Here’s an example document:

{
	_id: 1,
}

Data types

Mongo supports many data types natively through the use of its BSON encoding.
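
To see why an encoding richer than plain JSON matters, here is a minimal sketch in plain JavaScript (runnable in Node, no MongoDB needed). The field names are made up for illustration. A date pushed through JSON text comes back as a string, whereas BSON has a dedicated date type, so a stored date can stay a date.

```javascript
// A plain JavaScript object with a Date field (field names are illustrative).
const original = {
	name: 'Billy Jo Student',
	turnedIn: new Date('2018-09-17T19:53:00Z')
};

// Round-trip through JSON text, the way a JSON-only store would.
const roundTripped = JSON.parse(JSON.stringify(original));

console.log(original.turnedIn instanceof Date); // true
console.log(typeof roundTripped.turnedIn);      // "string"
console.log(roundTripped.turnedIn);             // "2018-09-17T19:53:00.000Z"
```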

IDs

You get a free ID with every document that enters the database. These IDs should always be unique for the system that created them. You will see a key on your document labeled _id, and it might look like the following:

{
	"_id" : ObjectId("5ba02407573f8bc9ee8a65a5")
}

That ObjectId is made up of the following:

  • 4 bytes representing the seconds since the UNIX epoch

  • 3 bytes representing the machine running mongo

  • 2 bytes representing the mongo process that added the document

  • 3 bytes representing an internal counter that started at a random number
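
The layout above can be sketched by slicing the hex string by hand. This is an illustration only: parseObjectId is a hypothetical helper, not part of any MongoDB API, and it assumes the 4/3/2/3 byte split listed above. Newer MongoDB releases document the middle five bytes as a single random value, so only the timestamp and counter fields should be read as meaningful there.

```javascript
// Split a 24-character hex ObjectId string into the four parts listed above.
// parseObjectId is a hypothetical helper written for these notes.
function parseObjectId(hex) {
	if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
		throw new Error('expected 24 hex characters');
	}
	return {
		// bytes 0-3: seconds since the UNIX epoch
		createdAt: new Date(parseInt(hex.slice(0, 8), 16) * 1000),
		// bytes 4-6: machine identifier (kept as hex)
		machine: hex.slice(8, 14),
		// bytes 7-8: process identifier
		process: parseInt(hex.slice(14, 18), 16),
		// bytes 9-11: counter
		counter: parseInt(hex.slice(18, 24), 16)
	};
}

const parts = parseObjectId('5ba02407573f8bc9ee8a65a5');
console.log(parts.createdAt.toISOString()); // 2018-09-17T22:00:39.000Z
console.log(parts.machine);                 // 573f8b
```

Inside the mongo shell you should not need a helper like this for the timestamp: ObjectId values there expose a getTimestamp() method.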

Collections

Collections are groupings of documents. You may think of a collection as being similar to a table in a traditional SQL-powered database engine. Unlike the relational model, collections do not impose a schema, and documents within a collection are not required to share the same structure. It is up to the client/programmer to ensure that documents within a system meet the specifications required for their use.
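
One way to take on that responsibility is to check each document in client code before inserting it. The sketch below is a hypothetical example, not a MongoDB feature: validateScore and its rules are assumptions, and the field names are borrowed from the scores examples later in these notes.

```javascript
// Hypothetical client-side check for a score document before it is inserted.
// Returns a list of problems; an empty list means the document looks usable.
function validateScore(doc) {
	const problems = [];
	if (typeof doc.uid !== 'string' || doc.uid.length === 0) {
		problems.push('uid must be a non-empty string');
	}
	if (typeof doc.assignmentTitle !== 'string') {
		problems.push('assignmentTitle must be a string');
	}
	if (typeof doc.assignmentScore !== 'number' || Number.isNaN(doc.assignmentScore)) {
		problems.push('assignmentScore must be a number');
	}
	return problems;
}

const good = { uid: '0000000000', assignmentTitle: 'Homework 1', assignmentScore: 78.5 };
const bad = { uid: '0000000001', assignmentScore: '34' };

console.log(validateScore(good)); // []
console.log(validateScore(bad));  // two problems: missing title, score is a string
```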

Administrative access

Start the server:

mkdir -p data
mongod --dbpath=./data

Keep this terminal running and, in a separate terminal, run:

mongo

This opens a prompt that accepts commands. The prompt is a JavaScript interpreter.

Query system

Mongo actually has three query engines. Foremost is the Aggregation Pipeline engine. There is support for MapReduce and ad-hoc commands as well. Usually, it is best to use pipelines for any data retrieval queries you wish to issue.

Here are some simple pipeline commands you may want to issue:

// Insert three score documents into the scores collection.
// Note: Date(...) without `new` ignores its argument and returns the current
// time as a string, so use new Date(...) with an ISO 8601 string instead.
db.scores.insertMany([
	{
		uid: '0000000000',
		name: 'Billy Jo Student',
		assignmentTitle: 'Homework 1',
		assignmentScore: 78.5,
		curve: 2.0,
		turnedIn: new Date("2018-09-17T19:53:00")
	},
	{
		uid: '0000000001',
		name: 'Jamie Jay Student',
		assignmentTitle: 'Homework 1',
		assignmentScore: 34.0,
		curve: 0.0,
		turnedIn: new Date("2018-09-19T13:43:00")
	},
	{
		uid: '0000000002',
		name: 'Bobby Tables Student',
		assignmentTitle: 'Homework 1',
		assignmentScore: 92.0,
		curve: 2.0,
		turnedIn: new Date("2018-09-15T23:23:00")
	}
]);
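// Average curved score per assignment.
db.scores.aggregate([
	// Keep only documents that have an assignmentTitle field.
	{$match: {
		"assignmentTitle": {$exists: true}
	}},
	// Rename the title and add the curve to the raw score.
	{$project: {
		title: "$assignmentTitle",
		score: {$add: ["$assignmentScore", "$curve"]}
	}},
	// One group per title, averaging the curved scores.
	{$group: {
		_id: "$title",
		avgScore: {$avg: "$score"}
	}},
	// Hide _id and expose the title as "assignment".
	{$project: {
		_id: 0,
		assignment: "$_id",
		avgScore: 1
	}}
]).pretty();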
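
To make the stages concrete, here is a rough plain-JavaScript equivalent of the pipeline above, run against an in-memory array instead of a collection. It is a sketch of what each stage does conceptually, not how Mongo executes it; the variable names are illustrative.

```javascript
// In-memory stand-in for the three documents inserted above.
const scores = [
	{ assignmentTitle: 'Homework 1', assignmentScore: 78.5, curve: 2.0 },
	{ assignmentTitle: 'Homework 1', assignmentScore: 34.0, curve: 0.0 },
	{ assignmentTitle: 'Homework 1', assignmentScore: 92.0, curve: 2.0 }
];

// $match: keep documents that have an assignmentTitle field.
const matched = scores.filter(doc => 'assignmentTitle' in doc);

// First $project: rename the title and add the curve to the raw score.
const projected = matched.map(doc => ({
	title: doc.assignmentTitle,
	score: doc.assignmentScore + doc.curve
}));

// $group: collect the scores for each title.
const groups = new Map();
for (const { title, score } of projected) {
	if (!groups.has(title)) groups.set(title, []);
	groups.get(title).push(score);
}

// $avg plus the final $project: average each group, expose title as "assignment".
const averages = [...groups].map(([title, list]) => ({
	assignment: title,
	avgScore: list.reduce((sum, s) => sum + s, 0) / list.length
}));

console.log(averages); // [ { assignment: 'Homework 1', avgScore: 69.5 } ]
```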
// Alternative layout: each student document embeds an array of assignments.
db.scores.insertMany([
	{
		uid: '0000000000',
		name: 'Billy Jo Student',
		assignments: [
			{
				assignmentTitle: 'Homework 2',
				assignmentScore: 78.5,
				curve: 2.0,
				turnedIn: new Date("2018-09-17T19:53:00")
			},
			{
				assignmentTitle: 'Homework 3',
				assignmentScore: 78.5,
				curve: 2.0,
				turnedIn: new Date("2018-09-17T19:53:00")
			}
		]
	},
	{
		uid: '0000000001',
		name: 'Jamie Jay Student',
		assignments: [
			{
				assignmentTitle: 'Homework 2',
				assignmentScore: 78.5,
				curve: 2.0,
				turnedIn: new Date("2018-09-17T19:53:00")
			},
			{
				assignmentTitle: 'Homework 3',
				assignmentScore: 78.5,
				curve: 2.0,
				turnedIn: new Date("2018-09-17T19:53:00")
			}
		]
	},
	{
		uid: '0000000002',
		name: 'Bobby Tables Student',
		assignments: [
			{
				assignmentTitle: 'Homework 2',
				assignmentScore: 78.5,
				curve: 2.0,
				turnedIn: new Date("2018-09-17T19:53:00")
			},
			{
				assignmentTitle: 'Homework 3',
				assignmentScore: 78.5,
				curve: 2.0,
				turnedIn: new Date("2018-09-17T19:53:00")
			}
		]
	}
]);
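// Average curved score per assignment when assignments are nested in an array.
db.scores.aggregate([
	// Emit one document per element of the assignments array.
	{$unwind: "$assignments"},
	// Keep only unwound documents whose assignment has a title.
	{$match: {
		"assignments.assignmentTitle": {$exists: true}
	}},
	// Pull the nested fields up and add the curve to the raw score.
	{$project: {
		title: "$assignments.assignmentTitle",
		score: {$add: ["$assignments.assignmentScore", "$assignments.curve"]}
	}},
	// One group per title, averaging the curved scores.
	{$group: {
		_id: "$title",
		avgScore: {$avg: "$score"}
	}},
	// Hide _id and expose the title as "assignment".
	{$project: {
		_id: 0,
		assignment: "$_id",
		avgScore: 1
	}}
]).pretty();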
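
The $unwind stage is the new piece here. As a rough mental model, it behaves like a flatMap over the array field. The plain-JavaScript sketch below illustrates that on trimmed, partly hypothetical data (the third document, with no assignments field, is made up to show that such documents produce no output by default). It does not cover every case the real stage handles.

```javascript
// Nested documents in the same shape as the data above, trimmed for brevity.
const students = [
	{ uid: '0000000000', assignments: [
		{ assignmentTitle: 'Homework 2', assignmentScore: 78.5, curve: 2.0 },
		{ assignmentTitle: 'Homework 3', assignmentScore: 78.5, curve: 2.0 }
	] },
	{ uid: '0000000001', assignments: [
		{ assignmentTitle: 'Homework 2', assignmentScore: 78.5, curve: 2.0 }
	] },
	{ uid: '0000000003' } // hypothetical document without an assignments field
];

// $unwind: one output document per array element, with the array field
// replaced by that element. Documents lacking the array produce no output.
const unwound = students.flatMap(doc =>
	(doc.assignments || []).map(a => ({ ...doc, assignments: a }))
);

console.log(unwound.length);                         // 3
console.log(unwound[1].assignments.assignmentTitle); // Homework 3
```

After this step, "assignments.assignmentTitle" refers to a single embedded document rather than an array, which is why the later stages can treat it like an ordinary field.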

We will cover the aggregation pipeline language in further detail next time.