MongoDf


A MongoDB to pandas DataFrame converter with a pandas-style filter syntax.

Example:

import mongodf

# create a dataframe from a MongoDB collection
df = mongodf.from_mongo("mongodb://mongo:27017", "DB", "Collection")

# filter values
df = df[(df["colA"] == "Test") & (df.ColB.isin([1, 2]))]

# filter columns
df = df[["colA", "colC"]]

# compute a pandas.DataFrame
df.compute()

Output:

|   | colA  | colC |
|---| ----- | ---- |
|0  | Test  |  NaN |
|1  | Test  |   12 |
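
To make the example above more concrete, the sketch below shows one way the row filter and column selection could be written as a MongoDB query document and projection. This is an assumption for illustration only: the source does not state how mongodf translates filters internally, and the `matches` and `project` helpers and the sample documents are hypothetical. The `$and`, `$eq` and `$in` operators are standard MongoDB query operators.

```python
# Hypothetical query document equivalent to:
#   (df["colA"] == "Test") & (df.ColB.isin([1, 2]))
query = {"$and": [{"colA": {"$eq": "Test"}}, {"ColB": {"$in": [1, 2]}}]}

# Hypothetical projection equivalent to df[["colA", "colC"]].
projection = {"colA": 1, "colC": 1, "_id": 0}


def matches(doc, query):
    """Evaluate the small operator subset used above against one document."""
    for key, cond in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif "$eq" in cond:
            if doc.get(key) != cond["$eq"]:
                return False
        elif "$in" in cond:
            if doc.get(key) not in cond["$in"]:
                return False
    return True


def project(doc, projection):
    """Keep only the projected fields; missing fields become None here
    (they appear as NaN in the DataFrame output above)."""
    return {k: doc.get(k) for k, keep in projection.items() if keep}


# Sample documents standing in for the collection (illustrative data only).
docs = [
    {"_id": 1, "colA": "Test", "ColB": 1},
    {"_id": 2, "colA": "Test", "ColB": 2, "colC": 12},
    {"_id": 3, "colA": "Other", "ColB": 1, "colC": 7},
    {"_id": 4, "colA": "Test", "ColB": 3, "colC": 5},
]

rows = [project(d, projection) for d in docs if matches(d, query)]
print(rows)
```

Against a real server, a query document and projection of this shape would typically be passed to PyMongo's `Collection.find(query, projection)`.
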

mongodf.from_mongo(host, database, collection, columns=None, filter={}, array_expand=True, cached_meta=True, dict_expand_level=0, meta_collection=None, show_id=False)

Fetch data from a MongoDB collection and return it as a DataFrame-like object.

Parameters:

host : str

The MongoDB host address.

database : str

The name of the MongoDB database.

collection : str

The name of the MongoDB collection.

columns : list, optional

A list of column names to include in the result. If None, columns will be inferred.

filter : dict, optional

The filter to apply, given as a mongodf.Filter object. If None or empty (the default is {}), no filter is applied.

array_expand : bool, optional

Whether to expand arrays found in the documents into separate rows. Default is True.

cached_meta : bool, optional

Whether to use cached metadata for inferring columns. Default is True.

dict_expand_level : int, optional

The level of dictionary expansion to perform. Default is 0.

meta_collection : str, optional

The name of the collection to use for cached metadata. If None, defaults to ‘__<collection>_meta’.
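
The effect of array_expand and dict_expand_level can be illustrated on a single document. The following is a minimal sketch in plain Python, not the library's implementation: the helper names `flatten` and `expand_arrays`, the dot-separated column names, and the handling of empty arrays are all assumptions made for illustration.

```python
def flatten(doc, level, prefix=""):
    """Flatten nested dicts up to `level` levels deep.

    Nested keys are joined with '.', which is an assumed naming scheme.
    With level=0 the document is returned unchanged (as a copy).
    """
    out = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict) and level > 0:
            out.update(flatten(value, level - 1, prefix=f"{name}."))
        else:
            out[name] = value
    return out


def expand_arrays(doc):
    """Return one row per array element.

    Several list-valued fields yield their cross product; an empty list
    is kept as a single row holding None (an assumption).
    """
    for key, value in doc.items():
        if isinstance(value, list):
            rows = []
            for item in value or [None]:
                rows.extend(expand_arrays({**doc, key: item}))
            return rows
    return [doc]


# Illustrative document (hypothetical field names).
doc = {"name": "a", "tags": ["x", "y"], "pos": {"lat": 1, "lon": {"deg": 2}}}

print(flatten(doc, 0))   # nested dicts left untouched
print(flatten(doc, 1))   # one level expanded
print(expand_arrays({"name": "a", "tags": ["x", "y"]}))
```
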

Returns:

DataFrame

A DataFrame-like object containing the data from the MongoDB collection.

Notes:

If cached_meta is True and columns is None, the function will attempt to retrieve column names from a meta collection (either specified by meta_collection or defaulting to ‘__<collection>_meta’). If no columns are found in the meta collection, it will then infer the columns by analyzing the collection’s documents. The dict_expand_level parameter controls how deeply nested dictionaries are expanded into separate columns.
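
The column-resolution order described in these notes can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `resolve_columns` is a hypothetical helper, the database is modelled as a plain dict mapping collection names to lists of documents, and the meta-collection schema (one document per column with a `name` field) is invented for the example.

```python
def resolve_columns(db, collection, columns=None, cached_meta=True,
                    meta_collection=None):
    """Pick column names in the order given in the notes.

    `db` is a plain dict {collection_name: [documents]} standing in for
    a MongoDB database (assumption for this sketch).
    """
    # 1. Explicit columns always win.
    if columns is not None:
        return list(columns)

    # 2. Try the cached meta collection, defaulting to '__<collection>_meta'.
    if cached_meta:
        meta_name = meta_collection or f"__{collection}_meta"
        # Hypothetical schema: each meta document names one column.
        cached = [doc["name"] for doc in db.get(meta_name, [])]
        if cached:
            return cached

    # 3. Fall back to inferring columns from the documents themselves,
    #    keeping first-seen order.
    seen = {}
    for doc in db.get(collection, []):
        for key in doc:
            seen.setdefault(key, None)
    return list(seen)


# Illustrative in-memory "database" (hypothetical data).
db = {
    "Collection": [{"colA": "Test", "ColB": 1}, {"colA": "Test", "colC": 12}],
    "__Collection_meta": [{"name": "colA"}, {"name": "colC"}],
}

print(resolve_columns(db, "Collection"))
print(resolve_columns(db, "Collection", cached_meta=False))
```
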
