Status: Submitted
Categories: Data Federation
Created by: Guest
Created on: Feb 7, 2024

Simplified JSON support for $out to S3

The ability to $out to S3 from a federated database instance is a game-changer for those working with their own data warehouses and data lakes. One improvement would be to support simplified JSON for JSON exports. Currently, $out uses Extended JSON v2, which may not be compatible with systems that read from the destination S3 bucket and require simplified JSON (the format used by other tools such as the Kafka source connector).

Technically, it is possible to make this conversion yourself with clever use of the $toString aggregation operator in stages preceding $out. However, there are several challenges:

+ Increased computation time
+ The more general the solution needs to be (i.e., in cases where you don't know or cannot make assumptions about the schema), the more complex the aggregation stages become. One such solution would be to $objectToArray the document, $map over the resulting array, converting the v field conditionally, then $arrayToObject back and $replaceRoot to recompose the document. This is already complex enough for most MongoDB users; handling nested arrays and objects makes it vastly more complex.
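
For reference, the top-level workaround described above might look roughly like the sketch below, written as a Python pipeline definition. The helper name `build_simplify_pipeline`, the bucket, region, and filename values, and the choice to stringify only `objectId`, `date`, and `decimal` values are all assumptions made for illustration. It only touches top-level fields; nested documents and arrays pass through unchanged, which is exactly the limitation noted above.

```python
def build_simplify_pipeline(bucket, filename, region="us-east-1"):
    """Build a pipeline that stringifies selected top-level BSON types before $out.

    Hypothetical helper: bucket, filename, and region are placeholders.
    """
    # Convert a value to a string only if its BSON type is one we want simplified.
    convert_value = {
        "$cond": [
            {"$in": [{"$type": "$$field.v"}, ["objectId", "date", "decimal"]]},
            {"$toString": "$$field.v"},
            "$$field.v",  # leave every other type (including nested data) untouched
        ]
    }
    return [
        {
            "$replaceRoot": {
                "newRoot": {
                    "$arrayToObject": {
                        "$map": {
                            # Turn the document into [{k: ..., v: ...}, ...]
                            "input": {"$objectToArray": "$$ROOT"},
                            "as": "field",
                            "in": {"k": "$$field.k", "v": convert_value},
                        }
                    }
                }
            }
        },
        {
            "$out": {
                "s3": {
                    "bucket": bucket,
                    "region": region,
                    "filename": filename,
                    "format": {"name": "json"},
                }
            }
        },
    ]


pipeline = build_simplify_pipeline("my-example-bucket", "exports/orders")
```

The resulting list could be passed to an aggregate call against a federated database instance. Extending it to nested arrays and subdocuments would need additional, schema-specific stages.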
  • Guest
    Feb 7, 2024
    Rather than MongoDB users writing their own solution to convert the Extended JSON v2 format (which other tools do not natively support for direct consumption) into a simplified format, it would be much simpler for users if MongoDB natively supported the simplified JSON format in $out, alongside the other supported formats such as Extended JSON v2, CSV, and Parquet. Custom conversion using the aggregation pipeline gets really complicated with:
    1. Complex, hierarchically structured documents
    2. Collections whose documents have flexible schemas
    3. A solution that has to be rebuilt for every collection with a different schema structure
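
To make the difference between the two formats concrete, here is a minimal Python sketch of the conversion a downstream consumer would otherwise have to apply to each exported document. The function name `simplify` and the sample document are made up for illustration, and the set of wrappers handled is an assumption rather than a complete treatment of Extended JSON v2. Recursion handles nesting easily on the client side, which is the part that is hard to express generically in aggregation stages.

```python
import json

# Single-key Extended JSON v2 wrappers handled by this sketch (not exhaustive).
STRING_WRAPPERS = {"$oid", "$numberDecimal"}
NUMBER_WRAPPERS = {"$numberInt": int, "$numberLong": int, "$numberDouble": float}


def simplify(value):
    """Recursively replace Extended JSON v2 type wrappers with plain JSON values."""
    if isinstance(value, list):
        return [simplify(item) for item in value]
    if isinstance(value, dict):
        if len(value) == 1:
            (key, inner), = value.items()
            if key in STRING_WRAPPERS:
                return inner  # keep ObjectIds and decimals as strings
            if key in NUMBER_WRAPPERS:
                return NUMBER_WRAPPERS[key](inner)
            if key == "$date":
                # Relaxed form holds an ISO-8601 string; canonical form holds
                # {"$numberLong": "<millis>"}, which becomes an integer here.
                return simplify(inner)
        # Ordinary subdocument: simplify each field.
        return {k: simplify(v) for k, v in value.items()}
    return value


extended = json.loads(
    '{"_id": {"$oid": "65c3a1f2e4b0a1b2c3d4e5f6"},'
    ' "total": {"$numberDecimal": "19.99"},'
    ' "items": [{"shippedAt": {"$date": "2024-02-07T00:00:00Z"},'
    ' "qty": {"$numberInt": "2"}}]}'
)
simplified = simplify(extended)
print(json.dumps(simplified, indent=2))
```

Native support in $out would remove the need for this kind of post-processing step on every consumer.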