4. Mongo
4.1. ezdb.mongo.Mongo
Nemesyst MongoDB abstraction/handler. This handler abstracts some pymongo functionality to make it easier to use a MongoDB database for deep learning purposes.
4.2. Example usage
This unit test also briefly shows how to use GridFS by dumping tuples of the form (dict(), object), where the dict becomes the file's metadata and the object is some form of the data that can be serialized into the database.
Warning
Mongo uses subprocess.Popen in init, start, and stop, since these would otherwise lock up ezdb.mongo.Mongo, and uses time.sleep() to wait for the database to start up and shut down. Depending on the size of your database, it may be necessary to lengthen the time.sleep() duration, as larger databases take longer to start up and shut down.
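A fixed time.sleep() has to be tuned by hand as the database grows. One possible alternative, sketched here as an illustration and not as something ezdb does, is to poll the database port until it accepts TCP connections. The helper name wait_for_port and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.25):
    """Return True once a TCP connection to (host, port) succeeds,
    or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # a successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # nothing is accepting connections yet; wait briefly and retry
            time.sleep(interval)
    return False
```

Under these assumptions, a call such as wait_for_port("127.0.0.1", 27017) after db.start() would return as soon as the server is reachable on MongoDB's default port, instead of always waiting the full sleep duration.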
Setting up a basic database, and initializing it with a user.
def setUp(self):
    """Predefined setUp function for preparing tests, in our case
    creating the database."""
    import os
    from ezdb.mongo import Mongo
    # the path and directory we want to use to store the database files
    db_path = "./unit_test_db"
    db = Mongo({"pylog": null_printer, "db_path": db_path,
                "db_log_path": db_path})
    # initialise the database files and create a basic user
    db.init()
    # start the database with authentication
    db.start()
    # for tests only to check db directory is created
    self.assertTrue(os.path.isdir(db_path))
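The snippets in this section pass null_printer as the "pylog" override, but its definition is not shown here. A minimal sketch, assuming it is simply a logger that discards whatever it is given, could look like this:

```python
def null_printer(*args, **kwargs):
    """Assumed no-op logger: accept print-style arguments and discard them."""
    return None
```

Any callable with a print-like signature should work in its place if you would rather see the log output.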
Connecting to a database and dumping data to it using normal MongoDB requests.
def test_dump(self):
    """Test/example of dump and retrieve from a MongoDB database."""
    from ezdb.mongo import Mongo
    db = Mongo({"pylog": null_printer})
    self.assertIsInstance(db, Mongo)
    db.connect()
    db.dump(db_collection_name="test", data={"success": 1})
    cursor = db.getCursor(db_collection_name="test")
    for batch in db.getBatches(db_data_cursor=cursor):
        self.assertEqual(len(batch), 1)
        for doc in batch:
            self.assertEqual(doc["success"], 1)
Using GridFS to dump larger files to the database in chunks.
def test_gridfs(self):
    """Test/example of GridFS dump and retrieve from MongoDB."""
    from ezdb.mongo import Mongo
    db = Mongo({"pylog": null_printer})
    self.assertIsInstance(db, Mongo)
    db.connect()
    db.dump(db_collection_name="test", data=({"success": 1}, b'success'))
    cursor = db.getCursor(db_collection_name="test.files")
    for batch in db.getFiles(db_data_cursor=cursor):
        for grid in batch:
            # check ids match
            self.assertEqual(grid["_id"], grid["metadata"]["_id"])
            # read file and check it is equal to what we put in
            self.assertEqual(grid["gridout"].read(), b'success')
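To give a feel for what "in chunks" means, the following sketch splits a bytes payload into pieces of at most 255 KiB, which is GridFS's default chunk size. It only illustrates the idea in plain Python; the actual chunking is done by the gridfs library, and split_into_chunks is a hypothetical helper.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size in bytes


def split_into_chunks(data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Split a bytes object into consecutive pieces of at most chunk_size."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# a 600 KiB payload becomes two full 255 KiB chunks plus a 90 KiB remainder
payload = b"x" * (600 * 1024)
chunks = split_into_chunks(payload)
```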
Completely removing the database. This deletes all your data.
def tearDown(self):
    """Predefined tearDown function for cleaning up after tests,
    in our case deleting any generated db files."""
    import os
    import shutil
    from ezdb.mongo import Mongo
    db_path = "./unit_test_db"
    db = Mongo({"pylog": null_printer, "db_path": db_path,
                "db_log_path": db_path})
    # shut the database down before deleting its files
    db.stop()
    if db_path is not None:
        shutil.rmtree(db_path)
    # for tests only to check db directory has been removed
    self.assertFalse(os.path.isdir(db_path))
4.3. API
- class ezdb.mongo.Mongo(args: Optional[dict] = None, logger: Optional[print] = None)
Python2/3 compatible MongoDB utility wrapper.
This wrapper saves its state in an internal, overridable dictionary, so you can adapt it to your requirements if you need to do something unique. The caveat is that this makes it harder to read.
- Parameters
args (dictionary) – Dictionary of overrides.
logger (function address) – Function address to print/log to (default: print).
- Example
Mongo({"db_user_name": "someUsername", "db_password": "somePassword"})
- Example
Mongo()
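The "overridable dictionary" idea can be sketched as a set of defaults merged with whatever the caller supplies. This is an illustration of the pattern only, not ezdb's actual implementation: the Handler class is hypothetical, and the default values shown for db_ip and db_port are assumptions.

```python
class Handler:
    """Hypothetical stand-in for a class that keeps its state in a dict."""

    def __init__(self, args=None, logger=None):
        # assumed defaults, for illustration only
        defaults = {
            "db_ip": "localhost",
            "db_port": "27017",
            "pylog": logger if logger is not None else print,
        }
        # start from the defaults, then let caller-supplied keys win
        self.args = dict(defaults)
        self.args.update(args or {})


handler = Handler({"db_port": "65535"})
```

Here handler.args["db_port"] holds the override while handler.args["db_ip"] keeps its default, which is the behaviour the args parameter describes.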
- connect(db_ip: Optional[str] = None, db_port: Optional[str] = None, db_authentication: Optional[str] = None, db_authentication_database=None, db_user_name: Optional[str] = None, db_password: Optional[str] = None, db_name: Optional[str] = None, db_replica_set_name: Optional[str] = None, db_replica_read_preference: Optional[str] = None, db_replica_max_staleness: Optional[str] = None, db_tls: Optional[bool] = None, db_tls_ca_file: Optional[str] = None, db_tls_certificate_key_file: Optional[str] = None, db_tls_certificate_key_file_password: Optional[str] = None, db_tls_crl_file: Optional[str] = None, db_collection_name: Optional[str] = None, db_server_selection_timeout=None) pymongo.database.Database
Connect to a specific MongoDB database.
This sets the internal db client, which is necessary to connect to and use the associated database. Without it, operations such as dump into the database will fail. This is replica set capable.
- Parameters
db_ip (string) – Database hostname or ip to connect to.
db_port (string) – Database port to connect to.
db_authentication (string) – The authentication method to use on db.
db_user_name (string) – Username to use for authentication to db_name.
db_password (string) – Password for db_user_name in database db_name.
db_name (string) – The name of the database to connect to.
db_replica_set_name (string) – Name of the replica set to connect to.
db_replica_read_preference (string) – Which replica type to prefer reads from.
db_replica_max_staleness (string) – Maximum number of seconds a replica is allowed to lag behind.
db_tls (bool) – use TLS for db connection.
db_tls_ca_file (string) – Certificate authority file path.
db_tls_certificate_key_file (string) – Certificate and key file for tls.
db_tls_certificate_key_file_password (string) – Cert and key file pass.
db_tls_crl_file (string) – Certificate revocation list file path.
db_collection_name (string) – GridFS collection to use.
- Returns
database client object
- Return type
pymongo.database.Database
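Several of these parameters correspond to pieces of a standard MongoDB connection string. The sketch below shows how such a URI could be assembled from them; it is an assumption about one reasonable mapping, not a description of what connect does internally, and build_mongo_uri is a hypothetical helper. The option names authSource, replicaSet, and tls are standard connection string options.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(db_ip, db_port, db_name, db_user_name=None,
                    db_password=None, db_authentication_database=None,
                    db_replica_set_name=None, db_tls=None):
    """Assemble a mongodb:// URI from connect-style parameters."""
    credentials = ""
    if db_user_name is not None and db_password is not None:
        # percent-encode credentials so characters like "@" stay unambiguous
        credentials = "{}:{}@".format(quote_plus(db_user_name),
                                      quote_plus(db_password))
    options = {}
    if db_authentication_database is not None:
        options["authSource"] = db_authentication_database
    if db_replica_set_name is not None:
        options["replicaSet"] = db_replica_set_name
    if db_tls is not None:
        options["tls"] = "true" if db_tls else "false"
    query = "?" + urlencode(options) if options else ""
    return "mongodb://{}{}:{}/{}{}".format(
        credentials, db_ip, db_port, db_name, query)
```

A URI built this way can be handed to pymongo.MongoClient directly if you ever need to bypass the wrapper.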
- debug() None
Log function to help track the internal state of the class.
Simply logs working state of args dict.
- donate(other, other_collection, db_collection_name, db_data_cursor=None, sum: Optional[int] = None, frequency: Optional[int] = None, count: Optional[int] = None)
Donate documents to another db collection.
Like giving blood, we are not getting anything back to self, other than maybe gratification.
- dump(db_collection_name: str, data: dict, db: Optional[pymongo.database.Database] = None) None
Import data dictionary into database.
- Parameters
db_collection_name (string) – Collection name to import into.
data (dictionary) – Data to import into database.
db (pymongo.database.Database) – Database to import data into.
- Example
dump(db_collection_name="test", data={"subdict": {"hello": "world"}})
- getBatches(db_batch_size: Optional[int] = None, db_data_cursor: Optional[pymongo.command_cursor.CommandCursor] = None) list
Get database cursor data in batches.
- Parameters
db_batch_size (integer) – The number of items to return in a single round.
db_data_cursor (command_cursor.CommandCursor) – The cursor to use to retrieve data from db.
- Returns
yields a list of items requested.
- Return type
list of dicts
- Todo
Desperately needs a rewrite and a bug fix: the last value always fails. I want this in a magic function too, to make it easy.
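For reference, batching an arbitrary iterable so that the final, possibly shorter, batch is still yielded can be done with itertools.islice. This is a generic sketch and not a fix to getBatches itself; iter_batches is a hypothetical name. Since pymongo cursors are iterable, the same approach should apply to them.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield lists of up to batch_size items, including a final short batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # take the next batch_size items, or fewer if the input runs out
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```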
- getCursor(db: Optional[pymongo.database.Database] = None, db_pipeline: Optional[list] = None, db_collection_name: Optional[str] = None) pymongo.command_cursor.CommandCursor
Use aggregate pipeline to get a data-cursor from the database.
This cursor is what mongodb provides to allow you to request the data from the database in a manner you control, instead of just getting a big dump from the database.
- Parameters
db_pipeline (list of dicts) – Mongodb aggregate pipeline data to transform and retrieve the data as you request.
db_collection_name (str) – The collection name which we will pull data from using the aggregate pipeline.
db (pymongo.database.Database) – Database object to operate pipeline on.
- Returns
Command cursor to fetch the data with.
- Return type
pymongo.command_cursor.CommandCursor
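As an illustration of the "list of dicts" shape that db_pipeline expects, the pipeline below uses two standard MongoDB aggregation stages, $match and $project. The field name "success" is taken from the unit test example; the commented-out call is assumed usage based on the signature above and needs a running database.

```python
# keep only documents where "success" equals 1,
# then return just the "success" field without "_id"
pipeline = [
    {"$match": {"success": 1}},
    {"$project": {"_id": 0, "success": 1}},
]

# Assumed usage, based on the getCursor signature:
# cursor = db.getCursor(db_collection_name="test", db_pipeline=pipeline)
```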
- getFiles(db_batch_size: Optional[int] = None, db_data_cursor: Optional[pymongo.command_cursor.CommandCursor] = None, db_collection_name: Optional[str] = None, db: Optional[pymongo.database.Database] = None) list
Get gridfs files from mongodb by id using cursor to .files.
- Parameters
db_batch_size (integer) – The number of items to return in a single round.
db_data_cursor (command_cursor.CommandCursor) – The cursor to use to retrieve data from db.
db_collection_name (str) – The top-level collection name, not including .chunks or .files, where GridFS is to operate.
db (pymongo.database.Database) – Database object to operate pipeline on.
- Returns
yields a list of tuples containing (item requested, metadata).
- init(db_path: Optional[str] = None, db_log_path: Optional[str] = None, db_log_name: Optional[str] = None, db_config_path: Optional[str] = None) None
Initialise the database.
This includes ensuring that the db path and db log path exist, creating the DB files, and adding an authentication user. All of this should be done on a localhost port so that the unprotected database is never exposed.
- Parameters
db_path (string) – Desired directory of MongoDB database files.
db_log_path (string) – Desired directory of MongoDB log files.
db_log_name (string) – Desired name of log file.
db_config_path (string) – Config file to pass to MongoDB.
- login(db_port: Optional[str] = None, db_user_name: Optional[str] = None, db_password: Optional[str] = None, db_name: Optional[str] = None, db_ip: Optional[str] = None) None
Log in to the database, interrupting the program, and make the database available via the CLI.
- Parameters
db_port (string) – Database port to connect to.
db_user_name (string) – Database user to authenticate as.
db_password (string) – User password to authenticate with.
db_name (string) – Database to authenticate to, the authentication db.
db_ip (string) – Database ip to connect to.
- start(db_ip: Optional[str] = None, db_port: Optional[str] = None, db_path: Optional[str] = None, db_log_path: Optional[str] = None, db_log_name: Optional[str] = None, db_cursor_timeout: Optional[int] = None, db_config_path: Optional[str] = None, db_replica_set_name: Optional[str] = None) subprocess.Popen
Launch an on-machine database with authentication.
- Parameters
db_ip (list) – List of IPs to accept connections from.
db_port (string) – Port desired for database.
db_path (string) – Path to parent dir of database.
db_log_path (string) – Path to parent dir of log files.
db_log_name (string) – Desired base name for log files.
db_cursor_timeout (integer) – Set timeout time for unused cursors.
db_config_path (string) – Config file path to pass to MongoDB.
- Return type
subprocess.Popen
- Returns
Subprocess of running MongoDB.
- stop(db_path=None) subprocess.Popen
Stop a running local database.
- Parameters
db_path (string) – The path to the database to shut down.
- Returns
Subprocess of database closer.
- Return type
subprocess.Popen
- userAdd(username: str, password: str, roles: list) None
Take new credentials and create a new user in the database.