Simplified JSON support for $out to S3

The ability to $out to S3 from a federated database instance is a game-changer for those working with their own data warehouses and data lakes. One improvement would be to support simplified JSON for JSON exports. Currently, $out uses Extended JSON v2, which may not be compatible with systems that read from the destination S3 bucket and require simplified JSON (the format used by other tools, such as the Kafka source connector).

Technically, it is possible to make this conversion yourself with clever use of the $toString aggregation operator in stages preceding $out. However, there are several challenges:

+ Increased computation time.
+ The more general the solution needs to be (i.e., in cases where you don't know or cannot make assumptions about the schema), the more complex the aggregation stages become. One such solution would be to $objectToArray the document, $map over the resulting array, convert the v field conditionally, then $arrayToObject back and $replaceRoot to recompose the document. This is already complex enough for most MongoDB users; handling nested arrays and objects makes it vastly more complex.
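For illustration, the top-level workaround described above ($objectToArray, $map with a conditional $toString, $arrayToObject, $replaceRoot) might look roughly like the sketch below, written as plain Python dicts that could be passed to a driver's aggregate call. The function name build_simplify_stages and the choice of BSON types to stringify are assumptions made for this example, not part of any MongoDB API, and the sketch deliberately does not descend into nested documents or arrays.

```python
import json


def build_simplify_stages(types_to_stringify=("objectId", "date", "decimal")):
    """Return aggregation stages that stringify selected top-level BSON types.

    Hypothetical helper for illustration only: it handles top-level fields
    and leaves nested documents and arrays untouched.
    """
    # Convert the value only when its BSON type is one we want as a string;
    # otherwise pass it through unchanged.
    convert_value = {
        "$cond": [
            {"$in": [{"$type": "$$kv.v"}, list(types_to_stringify)]},
            {"$toString": "$$kv.v"},
            "$$kv.v",
        ]
    }
    return [
        {
            "$replaceRoot": {
                "newRoot": {
                    # Rebuild the document from the converted key/value pairs.
                    "$arrayToObject": {
                        "$map": {
                            # Break the document into [{k: ..., v: ...}, ...].
                            "input": {"$objectToArray": "$$ROOT"},
                            "as": "kv",
                            "in": {"k": "$$kv.k", "v": convert_value},
                        }
                    }
                }
            }
        }
    ]


if __name__ == "__main__":
    # Print the stages as JSON so they can be inspected or pasted elsewhere.
    print(json.dumps(build_simplify_stages(), indent=2))
```

These stages would be placed before the $out stage in the pipeline. Even this limited version has to rebuild every document, which is where the extra computation time comes from, and extending it to nested structures requires considerably more logic.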


Guest

Feb 7, 2024

Rather than having MongoDB users write their own solution to convert Extended JSON v2 (a format that other tools do not natively support for direct consumption) into a simplified format, it would be much easier for users if MongoDB natively supported simplified JSON in $out, alongside the other supported formats such as Extended JSON v2, CSV, and Parquet. Custom conversion using an aggregation pipeline gets really complicated because of:

1. Complex, hierarchically structured documents
2. Collections whose documents have flexible schemas
3. The need to rebuild the solution for every collection with a different schema structure
