MongoDf
A MongoDB to pandas DataFrame converter with pandas-style filtering.
Example:
import mongodf
# create a dataframe from a mongoDB collection
df = mongodf.from_mongo("mongodb://mongo:27017", "DB", "Collection")
# filter values
df = df[(df["colA"] == "Test") & (df.ColB.isin([1, 2]))]
# filter columns
df = df[["colA", "colC"]]
# compute a pandas.DataFrame
df.compute()
Output:
| | colA | colC |
|---| ----- | ---- |
|0 | Test | NaN |
|1 | Test | 12 |
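The pandas-style filter in the example above has to end up as a MongoDB query at some point. The following is a minimal sketch of how such a translation could work, not mongodf's actual implementation: `SketchColumn` and `SketchFilter` are hypothetical names invented for illustration, and only the query operators `$eq`, `$in`, `$and`, and `$or` are standard MongoDB.

```python
class SketchFilter:
    """Hypothetical wrapper around a MongoDB query document (not mongodf.Filter)."""

    def __init__(self, query):
        self.query = query

    def __and__(self, other):
        # `a & b` combines both conditions with MongoDB's $and operator.
        return SketchFilter({"$and": [self.query, other.query]})

    def __or__(self, other):
        # `a | b` combines both conditions with MongoDB's $or operator.
        return SketchFilter({"$or": [self.query, other.query]})


class SketchColumn:
    """Hypothetical lazy column: comparisons build queries instead of evaluating."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return SketchFilter({self.name: {"$eq": value}})

    def isin(self, values):
        return SketchFilter({self.name: {"$in": list(values)}})


# Same shape as the filter in the example above.
flt = (SketchColumn("colA") == "Test") & SketchColumn("ColB").isin([1, 2])
print(flt.query)
# {'$and': [{'colA': {'$eq': 'Test'}}, {'ColB': {'$in': [1, 2]}}]}
```

Building a query document instead of filtering in memory means nothing has to be fetched until `compute()` is called, which is presumably why the example ends with that call.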
- mongodf.from_mongo(host, database, collection, columns=None, filter={}, array_expand=True, cached_meta=True, dict_expand_level=0, meta_collection=None, show_id=False)
Fetch data from a MongoDB collection and return it as a DataFrame-like object.
Parameters:
- host : str
The MongoDB host address.
- database : str
The name of the MongoDB database.
- collection : str
The name of the MongoDB collection.
- columns : list, optional
A list of column names to include in the result. If None, columns will be inferred.
- filter : dict, optional
A mongodf.Filter object. If None, no filter is applied. The signature default is an empty dict.
- array_expand : bool, optional
Whether to expand arrays found in the documents into separate rows. Default is True.
- cached_meta : bool, optional
Whether to use cached metadata for inferring columns. Default is True.
- dict_expand_level : int, optional
The level of dictionary expansion to perform. Default is 0.
- meta_collection : str, optional
The name of the collection to use for cached metadata. If None, defaults to ‘__<collection>_meta’.
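To illustrate what expanding arrays into separate rows means for array_expand, here is a small pure-Python sketch. It is an assumption about the resulting row shape, not mongodf's code: `expand_arrays` is a hypothetical helper, and the choices to combine several array fields as a cross product and to map an empty array to a single None value are made up for this example.

```python
def expand_arrays(doc):
    """Return one row per array element for the list-valued fields of `doc`.

    Assumptions for this sketch: several array fields are combined as a
    cross product, and an empty array yields a single row holding None.
    """
    rows = [{}]
    for key, value in doc.items():
        if isinstance(value, list):
            values = value or [None]
        else:
            values = [value]
        # Duplicate every partial row once per value of the current field.
        rows = [{**row, key: v} for row in rows for v in values]
    return rows


rows = expand_arrays({"colA": "Test", "tags": [1, 2]})
print(rows)
# [{'colA': 'Test', 'tags': 1}, {'colA': 'Test', 'tags': 2}]
```

With array_expand set to False, the document would presumably stay a single row whose `tags` cell holds the whole list.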
Returns:
- DataFrame
A DataFrame-like object containing the data from the MongoDB collection.
Notes:
If cached_meta is True and columns is None, the function will attempt to retrieve column names from a meta collection (either specified by meta_collection or defaulting to ‘__<collection>_meta’). If no columns are found in the meta collection, it will then infer the columns by analyzing the collection’s documents. The dict_expand_level parameter controls how deeply nested dictionaries are expanded into separate columns.
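The lookup order described in these notes can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: `resolve_columns` and `flatten_keys` are hypothetical helpers, the meta collection is modelled as a list of documents with a `name` field, nested keys are joined with a dot, and one unit of dict_expand_level is taken to expand one level of nesting.

```python
def flatten_keys(doc, expand_level, prefix=""):
    """Yield column names, expanding nested dicts up to `expand_level` levels.

    Assumption: nested keys are joined with a dot, e.g. "nested.x".
    """
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict) and expand_level > 0:
            yield from flatten_keys(value, expand_level - 1, prefix=f"{name}.")
        else:
            yield name


def resolve_columns(columns, cached_meta, meta_docs, docs, dict_expand_level=0):
    """Pick column names: explicit list, then cached metadata, then inference."""
    # 1. An explicit column list is used as given.
    if columns is not None:
        return list(columns)
    # 2. Try the cached metadata (assumed format: one {"name": ...} per column).
    if cached_meta:
        cached = [meta["name"] for meta in meta_docs]
        if cached:
            return cached
    # 3. Fall back to scanning the documents, keeping first-seen order.
    seen = {}
    for doc in docs:
        for name in flatten_keys(doc, dict_expand_level):
            seen.setdefault(name, None)
    return list(seen)


docs = [{"colA": "Test", "nested": {"x": 1, "y": {"z": 2}}}, {"colC": 12}]
print(resolve_columns(None, True, [], docs, dict_expand_level=0))
# ['colA', 'nested', 'colC']
print(resolve_columns(None, True, [], docs, dict_expand_level=1))
# ['colA', 'nested.x', 'nested.y', 'colC']
```

Scanning every document is the expensive step, which is the motivation for consulting the cached metadata first.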