Implement $bucket and $group on indexed values with sub-linear runtime

We noticed that sum $bucket and $group aggregations such as $min, $max, $count are unexpectedly slow even when fully covered by an index, (partially) because the DB scans through the entire index rather than employing optimization approaches such as binary search. An example pipeline that should return instantaneous but scans through the entire index (confirmed on v4.4 and v5): [ { $match: { status: "DELIVERED", }, }, { $group: { _id: { status: "$status", }, min: { $min: "$modify_time", }, }, }, ] with an index { status: 1, modify_time: 1} Another example is $bucket (same index): [ { $match: { status: "DELIVERED", }, }, { $bucket: { groupBy: "$creation_ts", boundaries: [ Date(0), ISODate("2020-01-01T00:00:30Z"), ISODate("2022-01-01T00:00:30Z"), ], default: "default", output: { count: { $sum: 1, }, status: { $first: "$status", }, }, }, }, ]

Attach files

Enter a subject

Please enter your email address

RELATED FEEDBACK

Implement $bucket and $group on indexed values with sub-linear runtime