4. Mongo

4.1. ezdb.mongo.Mongo

Nemesyst MongoDB abstraction handler. It wraps some pymongo functionality to make a MongoDB database easier to use for our deep learning purposes.

4.2. Example usage

This unit test also briefly shows how to use GridFS by dumping tuple items of the form (dict(), object), where the dict becomes the file's metadata and the object is the data in some form that can be serialized into the database.

Warning

Mongo uses subprocess.Popen in init, start, and stop, since these processes would otherwise lock up ezdb.mongo.Mongo, and uses time.sleep() to wait for the database to start up and shut down. Larger databases take longer to start up and shut down, so depending on the size of your database you may need to lengthen the time.sleep() duration.
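
As an alternative to tuning a fixed sleep, one option is to poll the database port until it accepts connections. The following is a minimal sketch using only the standard library; wait_for_port is a hypothetical helper and not part of ezdb, and the host and port you pass are whatever your database was started with.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds or timeout passes.

    Hypothetical helper for illustration; not part of ezdb.
    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # a successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # refused or timed out: give up if the deadline has passed
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An open port only shows that a process is listening; it does not guarantee that the database has finished every startup step, so a short additional wait may still be needed.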

Setting up a basic database, and initializing it with a user.

    def setUp(self):
        """Predefined setUp function for preparing tests, in our case
        creating the database."""
        import os
        from ezdb.mongo import Mongo

        # the path and directory we want to use to store the database files
        db_path = "./unit_test_db"
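        # null_printer is assumed to be a no-op logger defined elsewhere in
        # the test module; it silences Mongo's log output during tests
        db = Mongo({"pylog": null_printer, "db_path": db_path,
                    "db_log_path": db_path})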
        # initialise the database files and create a basic user
        db.init()
        # start the database with authentication
        db.start()

        # for tests only to check db directory is created
        self.assertTrue(os.path.isdir(db_path))

Connecting to and dumping data to a database using normal mongodb requests.

    def test_dump(self):
        """Test/ example of dump and retrieve from a MongoDB database."""
        from ezdb.mongo import Mongo

        db = Mongo({"pylog": null_printer})
        self.assertIsInstance(db, Mongo)
        db.connect()
        db.dump(db_collection_name="test", data={"success": 1})
        cursor = db.getCursor(db_collection_name="test")
        for batch in db.getBatches(db_data_cursor=cursor):
            self.assertEqual(len(batch), 1)
            for doc in batch:
                self.assertEqual(doc["success"], 1)

Using GridFS to dump larger files to the database in chunks.

    def test_gridfs(self):
        """Test/ example of gridfs dump and retrieve from MongoDB."""
        from ezdb.mongo import Mongo

        db = Mongo({"pylog": null_printer})
        self.assertIsInstance(db, Mongo)
        db.connect()
        db.dump(db_collection_name="test", data=({"success": 1}, b'success'))
        cursor = db.getCursor(db_collection_name="test.files")
        for batch in db.getFiles(db_data_cursor=cursor):
            for grid in batch:
                # check ids match
                self.assertEqual(grid["_id"], grid["metadata"]["_id"])
                # read file and check is equal to what we put in
                self.assertEqual(grid["gridout"].read(), b'success')
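
To make "in chunks" concrete: GridFS stores a file as a series of fixed-size pieces in the .chunks collection, alongside one metadata document in the .files collection, with a default chunk size of 255 KiB. The sketch below only imitates the splitting step in plain Python; split_into_chunks is a hypothetical name, and the real work is done by the GridFS driver, not by this code.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size in bytes


def split_into_chunks(payload, chunk_size=DEFAULT_CHUNK_SIZE):
    """Return a list of {"n": index, "data": bytes} records for payload.

    Illustration only: mimics how a file is cut into numbered chunks.
    """
    return [
        {"n": n, "data": payload[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(payload), chunk_size))
    ]


# a 600000 byte payload needs two full chunks and one partial chunk
chunks = split_into_chunks(b"x" * 600000)
chunk_lengths = [len(chunk["data"]) for chunk in chunks]
```

Concatenating the data fields in order of n gives back the original bytes, which is what reading a stored file amounts to.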

Completely removing the database. This deletes all of your data.

    def tearDown(self):
        """Predefined tearDown function for cleaning up after tests,
        in our case deleting any generated db files."""
        import os
        import shutil
        from ezdb.mongo import Mongo

        db_path = "./unit_test_db"
        db = Mongo({"pylog": null_printer, "db_path": db_path,
                    "db_log_path": db_path})
        db.stop()
        if db_path is not None:
            shutil.rmtree(db_path)

        # for tests only to check db directory has been removed
        self.assertFalse(os.path.isdir(db_path))

4.3. API

class ezdb.mongo.Mongo(args: Optional[dict] = None, logger: Optional[print] = None)

Python 2/3 compatible MongoDB utility wrapper.

This wrapper saves its state in an internal, overridable dictionary so that you can adapt it to your requirements if you need to do something unique. The caveat is that the code becomes harder to read.

Parameters

args (dictionary) – Dictionary of overrides.
logger (function address) – Function address to print/ log to (default: print).

Example

Mongo({"db_user_name": "someUsername", "db_password": "somePassword"})

Example

Mongo()
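
The overridable-dictionary idea described above can be sketched as follows. This is a toy stand-in, not ezdb's actual code: the class name ConfigDemo, the resolve method, and the default values are all assumptions made for illustration. Only the key names are taken from the connect signature.

```python
class ConfigDemo:
    """Toy class that keeps its state in an overridable dictionary."""

    # made-up defaults for illustration; ezdb's real defaults may differ
    DEFAULTS = {"db_ip": "localhost", "db_port": "27017", "db_name": "test"}

    def __init__(self, args=None):
        # copy the defaults, then layer the caller's overrides on top
        self.args = dict(self.DEFAULTS)
        self.args.update(args or {})

    def resolve(self, **overrides):
        """Return the settings for one call.

        An override of None means "use the stored value", mirroring
        method signatures whose parameters all default to None.
        """
        merged = dict(self.args)
        merged.update({k: v for k, v in overrides.items() if v is not None})
        return merged


cfg = ConfigDemo({"db_name": "prod"})
settings = cfg.resolve(db_port="27018", db_ip=None)
```

Here the constructor override changes db_name for every later call, while the per-call override changes db_port for that call only and leaves the stored state untouched.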

connect(db_ip: Optional[str] = None, db_port: Optional[str] = None, db_authentication: Optional[str] = None, db_authentication_database=None, db_user_name: Optional[str] = None, db_password: Optional[str] = None, db_name: Optional[str] = None, db_replica_set_name: Optional[str] = None, db_replica_read_preference: Optional[str] = None, db_replica_max_staleness: Optional[str] = None, db_tls: Optional[bool] = None, db_tls_ca_file: Optional[str] = None, db_tls_certificate_key_file: Optional[str] = None, db_tls_certificate_key_file_password: Optional[str] = None, db_tls_crl_file: Optional[str] = None, db_collection_name: Optional[str] = None, db_server_selection_timeout=None) → pymongo.database.Database

Connect to a specific mongodb database.

This sets the internal db client, which is necessary to connect to and use the associated database. Without it, operations such as dump into the database will fail. This is replica set capable.

Parameters

db_ip (string) – Database hostname or ip to connect to.
db_port (string) – Database port to connect to.
db_authentication (string) – The authentication method to use on db.
db_user_name (string) – Username to use for authentication to db_name.
db_password (string) – Password for db_user_name in database db_name.
db_name (string) – The name of the database to connect to.
db_replica_set_name (string) – Name of the replica set to connect to.
db_replica_read_preference (string) – Which replica type to prefer reads from.
db_replica_max_staleness (string) – Maximum number of seconds a replica is allowed to lag behind.
db_tls (bool) – Use TLS for the db connection.
db_tls_ca_file (string) – Certificate authority file path.
db_tls_certificate_key_file (string) – Certificate and key file for tls.
db_tls_certificate_key_file_password (string) – Cert and key file pass.
db_tls_crl_file (string) – Certificate revocation list file path.
db_collection_name (string) – GridFS collection to use.

Returns

database client object

Return type

pymongo.database.Database
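
A call that overrides several of these parameters at once might look like the sketch below. The parameter names come from the signature above; the host, port, user name, file path, and the EZDB_PASSWORD environment variable are placeholders invented for this example.

```python
import os


def connect_over_tls(db):
    """Connect an ezdb.mongo.Mongo instance using TLS and password auth.

    db is assumed to be an ezdb.mongo.Mongo instance; every value below
    is a placeholder to be replaced with your own settings.
    """
    return db.connect(
        db_ip="db.example.org",
        db_port="27017",
        db_user_name="someUsername",
        # read the password from the environment rather than hard-coding it
        db_password=os.environ.get("EZDB_PASSWORD"),
        db_name="test",
        db_tls=True,
        db_tls_ca_file="/path/to/ca.pem",
    )
```

Any parameter left out keeps whatever value is stored in the instance's args dictionary, so only the settings that differ need to be passed.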

debug() → None

Log function to help track the internal state of the class.

Simply logs working state of args dict.

donate(other, other_collection, db_collection_name, db_data_cursor=None, sum: Optional[int] = None, frequency: Optional[int] = None, count: Optional[int] = None)

Donate documents to another db collection.

Like giving blood, we are not getting anything back to self, other than maybe gratification.

dump(db_collection_name: str, data: dict, db: Optional[pymongo.database.Database] = None) → None

Import data dictionary into database.

Parameters

db_collection_name (string) – Collection name to import into.
data (dictionary) – Data to import into database.
db (pymongo.database.Database) – Database to import data into.

Example

dump(db_collection_name="test", data={"subdict": {"hello": "world"}})

getBatches(db_batch_size: Optional[int] = None, db_data_cursor: Optional[pymongo.command_cursor.CommandCursor] = None) → list

Get database cursor data in batches.

Parameters

db_batch_size (integer) – The number of items to return in a single round.
db_data_cursor (command_cursor.CommandCursor) – The cursor to use to retrieve data from db.

Returns

yields a list of items requested.

Return type

list of dicts

Todo

Desperately needs a rewrite and correction of a bug: the last value always fails. I also want this in a magic function to make it easy.
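
The cause of the bug in the Todo above is not stated here, so the following is not a fix for getBatches itself. It is a general sketch of one common way to batch any iterable, including a cursor, so that the final, shorter batch is still yielded. The function name batches is hypothetical.

```python
from itertools import islice


def batches(iterable, batch_size):
    """Yield lists of up to batch_size items, including a final short batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice pulls at most batch_size items and stops early at the end
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# five items in batches of two: the last batch holds the single leftover item
result = list(batches(range(5), 2))
```

Because the loop ends only when islice returns nothing, a trailing partial batch is never dropped and an empty input yields no batches at all.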

getCursor(db: Optional[pymongo.database.Database] = None, db_pipeline: Optional[list] = None, db_collection_name: Optional[str] = None) → pymongo.command_cursor.CommandCursor

Use aggregate pipeline to get a data-cursor from the database.

This cursor is what mongodb provides to allow you to request the data from the database in a manner you control, instead of just getting a big dump from the database.

Parameters

db_pipeline (list of dicts) – Mongodb aggregate pipeline data to transform and retrieve the data as you request.
db_collection_name (str) – The collection name which we will pull data from using the aggregate pipeline.
db (pymongo.database.Database) – Database object to operate pipeline on.

Returns

Command cursor to fetch the data with.

Return type

pymongo.command_cursor.CommandCursor
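
For reference, an aggregate pipeline is a list of stage dictionaries that MongoDB applies in order. The sketch below builds a small pipeline from standard stages and passes it to getCursor; the collection name and field names are borrowed from the unit test earlier in this section, and fetch_successes is a hypothetical helper.

```python
# each stage is a dict; MongoDB runs the stages in list order
pipeline = [
    {"$match": {"success": 1}},               # keep documents where success == 1
    {"$project": {"_id": 0, "success": 1}},   # return only the success field
    {"$limit": 10},                           # stop after ten documents
]


def fetch_successes(db):
    """Return a cursor over the documents selected by pipeline.

    db is assumed to be an already connected ezdb.mongo.Mongo instance.
    """
    return db.getCursor(db_collection_name="test", db_pipeline=pipeline)
```

The returned cursor can then be handed to getBatches through its db_data_cursor parameter, as in the test_dump example.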

getFiles(db_batch_size: Optional[int] = None, db_data_cursor: Optional[pymongo.command_cursor.CommandCursor] = None, db_collection_name: Optional[str] = None, db: Optional[pymongo.database.Database] = None) → list

Get GridFS files from MongoDB by id, using a cursor over the .files collection.

Parameters

db_batch_size (integer) – The number of items to return in a single round.
db_data_cursor (command_cursor.CommandCursor) – The cursor to use to retrieve data from db.
db_collection_name (str) – The top-level collection name, not including .chunks or .files, where GridFS is to operate.
db (pymongo.database.Database) – Database object to operate pipeline on.

Returns

yields a list of tuples containing (item requested, metadata).

init(db_path: Optional[str] = None, db_log_path: Optional[str] = None, db_log_name: Optional[str] = None, db_config_path: Optional[str] = None) → None

Initialise the database.

This includes ensuring that the db path and db log path exist, generating the DB files, and adding an authentication user. All of this should be done on a localhost port so that the unprotected database is never exposed.

Parameters

db_path (string) – Desired directory of MongoDB database files.
db_log_path (string) – Desired directory of MongoDB log files.
db_log_name (string) – Desired name of log file.
db_config_path (string) – Config file to pass to MongoDB.

login(db_port: Optional[str] = None, db_user_name: Optional[str] = None, db_password: Optional[str] = None, db_name: Optional[str] = None, db_ip: Optional[str] = None) → None

Log in to the database. This interrupts execution, and is also available via the CLI.

Parameters

db_port (string) – Database port to connect to.
db_user_name (string) – Database user to authenticate as.
db_password (string) – User password to authenticate with.
db_name (string) – Database to authenticate to, the authentication db.
db_ip (string) – Database ip to connect to.

start(db_ip: Optional[str] = None, db_port: Optional[str] = None, db_path: Optional[str] = None, db_log_path: Optional[str] = None, db_log_name: Optional[str] = None, db_cursor_timeout: Optional[int] = None, db_config_path: Optional[str] = None, db_replica_set_name: Optional[str] = None) → subprocess.Popen

Launch an on-machine database with authentication.

Parameters

db_ip (list) – List of IPs to accept connections from.
db_port (string) – Port desired for database.
db_path (string) – Path to parent dir of database.
db_log_path (string) – Path to parent dir of log files.
db_log_name (string) – Desired base name for log files.
db_cursor_timeout (integer) – Set timeout time for unused cursors.
db_config_path (string) – Config file path to pass to MongoDB.

Return type

subprocess.Popen

Returns

Subprocess of running MongoDB.

stop(db_path=None) → subprocess.Popen

Stop a running local database.

Parameters

db_path (string) – The path to the database to shut down.

Returns

Subprocess of database closer.

Return type

subprocess.Popen

userAdd(username: str, password: str, roles: list) → None

Take new credentials and create a new user in the database.