Summary

The HDX Python Database Library provides utilities for connecting to databases in a standard way including through an ssh tunnel if needed. It is built on top of SQLAlchemy and simplifies its setup.

Information

This library is part of the Humanitarian Data Exchange (HDX) project. If you have humanitarian related data, please upload your datasets to HDX.

The code for the library is here. The library has detailed API documentation which can be found in the menu at the top.

Connecting to databases through an ssh tunnel is only available if this library is installed with extra sshtunnel:

pip install hdx-python-database[sshtunnel]

The wait_for_postgresql function is only available if this library is installed with extra postgresql:

pip install hdx-python-database[postgresql]

Breaking changes

From 1.2.8, the sshtunnel dependency is optional.

From 1.2.7, default table names are no longer plural. The camel case class name is converted to snake case, for example MyTable becomes my_table.

From 1.2.3, Base must be chosen from hdx.database.no_timezone (db_has_tz=False: the default) or hdx.database.with_timezone (db_has_tz=True).

From 1.2.2, database datetime columns are assumed to be timezoneless unless db_has_tz is set to True.

From 1.2.1, wait_for_postgresql takes connection URI not database parameters, get_params_from_sqlalchemy_url renamed to get_params_from_connection_uri and moved to dburi module, get_sqlalchemy_url renamed to get_connection_uri and moved to dburi module. New function remove_driver_from_uri in dburi module. Parameter driver replaced by dialect+driver. Supports Python 3.8 and later.

From 1.1.2, the postgres module is renamed postgresql and function wait_for_postgres is renamed wait_for_postgresql.

Versions from 1.0.8 support Python 3.6 and later.

Versions from 1.0.6 no longer support Python 2.7.

Description of Utilities

Database

Your SQLAlchemy database tables must inherit from Base in hdx.database.no_timezone or hdx.database.with_timezone eg. :

from hdx.database.no_timezone import Base
class MyTable(Base):
    my_col: Mapped[int] = mapped_column(Integer, ForeignKey(MyTable2.col2), primary_key=True)

A default table name is set which can be overridden: it is the camel case class name to converted to snake case, for example MyTable becomes my_table.

Then a connection can be made to a database as follows including through an SSH tunnel (which requires installing hdx-python-database[sshtunnel]):

# Get SQLAlchemy session object given database parameters and
# if needed SSH parameters. If database is PostgreSQL, will poll
# till it is up.
from hdx.database import Database
with Database(database="db", host="1.2.3.4", username="user",
              password="pass", dialect="dialect", driver="driver",
              ssh_host="5.6.7.8", ssh_port=2222, ssh_username="sshuser",
              ssh_private_key="path_to_key", db_has_tz=True,
              reflect=False) as session:
    session.query(...)

db_has_tz which defaults to False indicates whether database datetime columns have timezones. If db_has_tz is True, use Base from hdx.database.with_timezone, otherwise use Base from hdx.database.no_timezone. If db_has_tz is False, conversion occurs between Python datetimes with timezones to timezoneless database columns.

If reflect (which defaults to False) is True, classes will be reflected from an existing database and the reflected classes are returned in a variable reflected_classes in the returned Session object. Note that type annotation maps don't work with reflection and hence db_has_tz will be ignored ie. there will be no conversion between Python datetimes with timezones to timezoneless database columns.

Connection URI

There are functions to handle converting from connection URIs to parameters and vice-versa as well as a function to remove the driver string from a connection URI that contains both dialect and driver.

# Extract dictionary of parameters from database connection URI
result = get_params_from_connection_uri(
    "postgresql+psycopg://myuser:mypass@myserver:1234/mydatabase"
)

# Build database connection URI from dictionary of parameters
params_pg = {
    "database": "mydatabase",
    "host": "myserver",
    "port": 1234,
    "username": "myuser",
    "password": "mypass",
    "dialect": "postgresql",
    "driver": "psycopg",
}
result = get_connection_uri(**params_pg)

db_uri_nd = remove_driver_from_uri(
    "postgresql+psycopg://myuser:mypass@myserver:1234/mydatabase"
)

Views

The method to make views described here is available in this library.

This allows creating views like this:

class DBOrgType(Base):
    __tablename__ = "org_type"

    code: Mapped[str] = mapped_column(String(32), primary_key=True)
    description: Mapped[str] = mapped_column(String(512), nullable=False)


org_type_view = view(
    name="org_type_view",
    metadata=Base.metadata,
    selectable=select(*DBOrgType.__table__.columns),
)

class DBTestDate(Base):
    test_date: Mapped[datetime] = mapped_column(primary_key=True)

date_view_params = {
    "name": "date_view",
    "metadata": Base.metadata,
    "selectable": select(*DBTestDate.__table__.columns),
}

date_view = build_view(date_view_params)

date_view = build_views([date_view_params])[0]

PostgreSQL specific

There is a PostgreSQL specific call that only returns when the PostgreSQL database is available:

# Wait until PostgreSQL is up
# Library should be installed with hdx-python-database[postgresql]
wait_for_postgresql("mydatabase", "myserver", 5432, "myuser", "mypass")