In part one, I started a journey of implementing and setting up an initial database using MongoDB. At the end of that post, I had a single collection, called “projects”, with a defined schema and about 1,000 records populated in it.
In this post, I will explore how to query across multiple tables/collections, how to update data, and finally how to remove data.
To get started, I created a “charters” table in Postgres and gave the “projects” table a foreign key of charter_uuid:
CREATE TABLE public.charters
(
    uuid uuid NOT NULL,
    charter_id character varying(255) COLLATE pg_catalog."default" NOT NULL,
    creator character varying(255) COLLATE pg_catalog."default",
    sponsor character varying(255) COLLATE pg_catalog."default",
    expiration date,
    created_dt date,
    updated_dt date,
    CONSTRAINT charters_pkey PRIMARY KEY (uuid)
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

CREATE TABLE public.projects
(
    uuid uuid NOT NULL,
    project_name character varying(255) COLLATE pg_catalog."default" NOT NULL,
    status character varying(255) COLLATE pg_catalog."default" NOT NULL,
    created_dt date,
    updated_dt date,
    charter_uuid uuid,
    CONSTRAINT projects_pkey PRIMARY KEY (uuid),
    CONSTRAINT charters_uuid FOREIGN KEY (charter_uuid)
        REFERENCES public.charters (uuid) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;
Then, using Mockaroo, I generated dummy data that respected the new foreign key. You can do this by first generating the charters data, then using a “custom list” to enumerate the valid charter UUIDs in the project data generation.
I then exported both as CSV and imported them into Postgres and MongoDB.
mongoimport makes this very easy:
mongoimport --db project-next --collection charters --type csv --file charters.csv --headerline
Retrieving data from many tables/collections
So, for this example, let’s get a couple of fields from the projects table and a couple from the charters table and do a simple inner join:
select p.project_name, p.status, c.expiration, c.sponsor
from projects p, charters c
where p.charter_uuid = c.uuid;
While Postgres resolves the foreign key relationship for you on the server side, this isn’t true for MongoDB. Like other NoSQL databases, MongoDB is built for a high volume of read and write operations, and join operations are expensive for the server to handle, so it pushes that work to the client side. What this means is that we define the joins and verify their integrity within our application code.
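Before looking at how MongoDB’s aggregation pipeline helps, here is a minimal sketch of what a purely application-side inner join might look like. The sample documents are hypothetical, shaped like the projects and charters collections:

```javascript
// Hypothetical sample documents shaped like the two collections.
const charters = [
  { uuid: "c1", sponsor: "Wenonah Fossett", expiration: "2017-12-02" },
  { uuid: "c2", sponsor: "Another Sponsor", expiration: "2019-06-30" },
];

const projects = [
  { project_name: "Alpha", status: "In Progress", charter_uuid: "c1" },
  { project_name: "Beta", status: "Done", charter_uuid: "c3" }, // no matching charter
];

// Index charters by uuid so each project lookup is O(1).
const byUuid = new Map(charters.map((c) => [c.uuid, c]));

// Inner join: keep only projects whose charter_uuid resolves to a charter.
const joined = projects
  .map((p) => ({ ...p, charterInfo: byUuid.get(p.charter_uuid) }))
  .filter((p) => p.charterInfo !== undefined);

console.log(joined);
```

Note that the integrity check (dropping “Beta”, whose charter doesn’t exist) happens in our code, not in the database.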
db.projects.aggregate([
    { "$lookup": {
        "localField": "charter_uuid",
        "from": "charters",
        "foreignField": "uuid",
        "as": "charterInfo"
    } },
    { "$unwind": "$charterInfo" },
    { "$project": {
        "project_name": 1,
        "status": 1,
        "charterInfo.expiration": 1,
        "charterInfo.sponsor": 1
    } }
]).pretty()
Output:
{
    "_id" : ObjectId("5ac651379b6f1fb3b8ec1841"),
    "project_name" : "Stracke, Lubowitz and Rath",
    "status" : "In Progress",
    "charterInfo" : {
        "sponsor" : "Wenonah Fossett",
        "expiration" : "2017-12-02"
    }
}
There are parts to this query that need a bit more explanation.
The aggregate operation is the only place to do these kinds of joins. It implements a pipeline of post-query stages that shape the data sent back.
In the $lookup stage, you indicate the collection you’d like to join with and which keys to use. This performs a left outer join by default, so in our example you’d get back projects that had no charters… unless you add in $unwind.
The $unwind stage deconstructs the joined array: it removes any results that don’t have a charter, and creates additional single-record entries for projects that have more than one charter.
Finally, the $project stage indicates what data we would like returned; 1 means include the field. Alternatively, you can add expressions to present computed (but not stored) data on the fly.
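To make the unwind behavior concrete, here is a rough in-memory sketch of what $unwind does to the array field that $lookup produced. The input documents are hypothetical:

```javascript
// Sketch of $unwind's default behavior: emit one document per array element,
// and drop documents whose array is empty.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    for (const value of doc[field] || []) {
      out.push({ ...doc, [field]: value }); // array replaced by a single element
    }
  }
  return out;
}

// Hypothetical post-$lookup documents.
const afterLookup = [
  { project_name: "Alpha", charterInfo: [{ sponsor: "A" }, { sponsor: "B" }] },
  { project_name: "Beta", charterInfo: [] }, // no matching charter: dropped
];

console.log(unwind(afterLookup, "charterInfo"));
```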
Updating data
It’d be quite something if all our delayed projects were completed magically. Let’s pretend that happened and update the data.
select count(*) from projects where status = 'Delayed'; -- 253

update projects set status = 'Done' where status = 'Delayed'; -- UPDATE 253

select count(*) from projects where status = 'Delayed'; -- 0
For Mongo, we pass in a filter first, then what the data should be set to, and finally whether we should apply the change to all matching records:
> db.projects.find({"status":"Delayed"}).count()
253
> db.projects.update(
    {status: "Delayed"},
    {$set: {status: "Done"}},
    {multi: true}
)
WriteResult({ "nMatched" : 253, "nUpserted" : 0, "nModified" : 253 })
> db.projects.find({"status":"Delayed"}).count()
0
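The semantics of that multi-document update can be sketched in plain JavaScript over an in-memory array. The data and the helper are hypothetical, not MongoDB’s implementation:

```javascript
// In-memory sketch of a multi:true update: every document matching the
// filter gets the new values applied, and we count how many were modified.
function updateMany(docs, filter, set) {
  let nModified = 0;
  for (const doc of docs) {
    if (Object.entries(filter).every(([k, v]) => doc[k] === v)) {
      Object.assign(doc, set);
      nModified++;
    }
  }
  return { nModified };
}

// Hypothetical sample projects.
const projects = [
  { project_name: "Alpha", status: "Delayed" },
  { project_name: "Beta", status: "In Progress" },
  { project_name: "Gamma", status: "Delayed" },
];

const result = updateMany(projects, { status: "Delayed" }, { status: "Done" });
console.log(result);
```

Without the equivalent of {multi: true}, the shell’s legacy update() would stop after the first matching document.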
Deleting data
In our data set, there are a few projects that haven’t started that are tied to charters that have expired. Let’s see what’s needed to delete those:
select count(*) from projects
where status = 'Not Started'
and charter_uuid in (select uuid from charters where expiration < current_date); -- 17

delete from projects
where status = 'Not Started'
and charter_uuid in (select uuid from charters where expiration < current_date); -- DELETE 17

select count(*) from projects
where status = 'Not Started'
and charter_uuid in (select uuid from charters where expiration < current_date); -- 0
For Mongo, we first need to find the charters that have expired. We save that result in a variable called “cursor”, then loop through it, applying the additional filter on status:
var cursor = db.charters.find({"expiration": {$lte: new ISODate("2018-04-05T12:00:00Z")}})
cursor.forEach(function (doc) {
    db.projects.deleteMany({"status": "Not Started", "charter_uuid": doc.uuid})
});
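The cursor-and-loop pattern above can be sketched client-side over plain arrays. The data is hypothetical, and ISO-8601 date strings are used since they compare correctly as strings:

```javascript
// Hypothetical cutoff and sample documents; ISO date strings compare
// lexicographically in the same order as the dates they represent.
const cutoff = "2018-04-05";

const charters = [
  { uuid: "c1", expiration: "2017-12-02" }, // expired
  { uuid: "c2", expiration: "2019-06-30" },
];

let projects = [
  { project_name: "Alpha", status: "Not Started", charter_uuid: "c1" }, // deleted
  { project_name: "Beta", status: "Not Started", charter_uuid: "c2" },
  { project_name: "Gamma", status: "Done", charter_uuid: "c1" },
];

// For each expired charter, remove its not-started projects
// (the equivalent of the deleteMany call inside the cursor loop).
for (const charter of charters.filter((c) => c.expiration <= cutoff)) {
  projects = projects.filter(
    (p) => !(p.status === "Not Started" && p.charter_uuid === charter.uuid)
  );
}

console.log(projects.map((p) => p.project_name));
```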
Conclusion
MongoDB was never meant to be a relational database, so SQL-like queries are a bit more difficult to construct. It does map more cleanly to app languages like Node.js, though, making it more developer-friendly to integrate with. If you’re developing a single app to hit a MongoDB instance, I see no concern with keeping the structure and object model in application code.
As an aside, performance and scale weren’t something I touched on here, but this presentation does a great job comparing the two.