Slow query performance: MongoDB with $lookup and $and/$or
I'm trying to find a way to create an engine that translates GraphQL query filters to MongoDB aggregations while keeping the performance. Our application has the requisite of limiting the results from collection A by applying filters to collection B, C and even D sometimes.
For better understanding, here's a sample about how a filter is translated to MongoDB.
This:
"filter":
"return": null,
"AND": [
"customer_WITH":
"OR": [
"code": "CUSTOMER NAME"
,
"commercialName_LIKE": "CUSTOMER NAME"
,
"corporateName_LIKE": "CUSTOMER NAME"
]
],
"OR": [
"dispatcher_WITH":
"company_WITH":
"corporateName_LIKE": "COMPANY NAME"
,
"redispatcher_WITH":
"company_WITH":
"corporateName_LIKE": "COMPANY NAME"
],
"reversal": null
Gets translated to this:
[
"$match":
"return":
"$eq": null
,
"reversal":
"$eq": null
,
"company":
"$eq": ObjectId("xxxxxxxxxxxxxxxxxxxxxxxx")
,
"$lookup":
"as": "dispatcher",
"from": "shippers",
"localField": "dispatcher",
"foreignField": "_id"
,
"$unwind":
"path": "$dispatcher",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "dispatcher.company",
"from": "companies",
"localField": "dispatcher.company",
"foreignField": "_id"
,
"$unwind":
"path": "$dispatcher.company",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "redispatcher",
"from": "shippers",
"localField": "redispatcher",
"foreignField": "_id"
,
"$unwind":
"path": "$redispatcher",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "redispatcher.company",
"from": "companies",
"localField": "redispatcher.company",
"foreignField": "_id"
,
"$unwind":
"path": "$redispatcher.company",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "customer",
"from": "customers",
"localField": "customer",
"foreignField": "_id"
,
"$match":
"$or": [
"dispatcher.company.corporateName":
"$regex": /sCOMPANYsNAME/
,
"redispatcher.company.corporateName":
"$regex": /sCOMPANYsNAME/
],
"$and": [
"$or": [
"customer.code":
"$eq": "CUSTOMER NAME"
,
"customer.commercialName":
"$regex": /CUSTOMERsNAME/
,
"customer.corporateName":
"$regex": /CUSTOMERsNAME/
]
]
,
"$unwind":
"path": "$customer",
"preserveNullAndEmptyArrays": true
,
"$group":
"_id": "$invoiceNo",
"__rootId":
"$first": "$_id"
,
"company":
"$first": "$company"
,
"customer":
"$first": "$customer._id"
,
"dispatcher":
"$first": "$dispatcher._id"
,
"redispatcher":
"$first": "$redispatcher._id"
,
"driverPlate":
"$first": "$driverPlate"
,
"key":
"$first": "$key"
,
"activities":
"$first": "$activities"
,
"serialNo":
"$first": "$serialNo"
,
"invoiceNo":
"$first": "$invoiceNo"
,
"incidents":
"$first": "$incidents"
,
"deliveries":
"$first": "$deliveries"
,
"return":
"$first": "$return"
,
"$project":
"_id": "$__rootId",
"company": "$company",
"customer": "$customer",
"dispatcher": "$dispatcher",
"redispatcher": "$redispatcher",
"driverPlate": "$driverPlate",
"key": "$key",
"activities": "$activities",
"serialNo": "$serialNo",
"invoiceNo": "$invoiceNo",
"incidents": "$incidents",
"deliveries": "$deliveries",
"return": "$return"
,
"$sort":
"invoiceNo": -1
,
"$limit": 51
]
The engine is smart enough to reallocate to the first position $match properties that don't require $lookups and right after $lookups if they do, however if they are within a $and/$or condition block, then they are reallocated after the last $lookup, regardless of what properties are there.
I could scan for what is used inside the $and and deconstruct it into new reallocated $match phases, but I need to figure how to handle the $or operator: I can't apply the same desconstruction idea on it because this would invalidate the condition.
So my question is: Is there an alternative way to use the phase $lookup along with $and/$or and improve the performance drastically?
Creating more indexes won't help because they're not used for the $lookup. Moving up $match phases, as the MongoDB team would suggest is also not possible because it would break the conditions. So I'm out of ideas now.
Best regards.
javascript mongodb performance graphql gql
add a comment |
I'm trying to find a way to create an engine that translates GraphQL query filters to MongoDB aggregations while keeping the performance. Our application has the requisite of limiting the results from collection A by applying filters to collection B, C and even D sometimes.
For better understanding, here's a sample about how a filter is translated to MongoDB.
This:
"filter":
"return": null,
"AND": [
"customer_WITH":
"OR": [
"code": "CUSTOMER NAME"
,
"commercialName_LIKE": "CUSTOMER NAME"
,
"corporateName_LIKE": "CUSTOMER NAME"
]
],
"OR": [
"dispatcher_WITH":
"company_WITH":
"corporateName_LIKE": "COMPANY NAME"
,
"redispatcher_WITH":
"company_WITH":
"corporateName_LIKE": "COMPANY NAME"
],
"reversal": null
Gets translated to this:
[
"$match":
"return":
"$eq": null
,
"reversal":
"$eq": null
,
"company":
"$eq": ObjectId("xxxxxxxxxxxxxxxxxxxxxxxx")
,
"$lookup":
"as": "dispatcher",
"from": "shippers",
"localField": "dispatcher",
"foreignField": "_id"
,
"$unwind":
"path": "$dispatcher",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "dispatcher.company",
"from": "companies",
"localField": "dispatcher.company",
"foreignField": "_id"
,
"$unwind":
"path": "$dispatcher.company",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "redispatcher",
"from": "shippers",
"localField": "redispatcher",
"foreignField": "_id"
,
"$unwind":
"path": "$redispatcher",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "redispatcher.company",
"from": "companies",
"localField": "redispatcher.company",
"foreignField": "_id"
,
"$unwind":
"path": "$redispatcher.company",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "customer",
"from": "customers",
"localField": "customer",
"foreignField": "_id"
,
"$match":
"$or": [
"dispatcher.company.corporateName":
"$regex": /sCOMPANYsNAME/
,
"redispatcher.company.corporateName":
"$regex": /sCOMPANYsNAME/
],
"$and": [
"$or": [
"customer.code":
"$eq": "CUSTOMER NAME"
,
"customer.commercialName":
"$regex": /CUSTOMERsNAME/
,
"customer.corporateName":
"$regex": /CUSTOMERsNAME/
]
]
,
"$unwind":
"path": "$customer",
"preserveNullAndEmptyArrays": true
,
"$group":
"_id": "$invoiceNo",
"__rootId":
"$first": "$_id"
,
"company":
"$first": "$company"
,
"customer":
"$first": "$customer._id"
,
"dispatcher":
"$first": "$dispatcher._id"
,
"redispatcher":
"$first": "$redispatcher._id"
,
"driverPlate":
"$first": "$driverPlate"
,
"key":
"$first": "$key"
,
"activities":
"$first": "$activities"
,
"serialNo":
"$first": "$serialNo"
,
"invoiceNo":
"$first": "$invoiceNo"
,
"incidents":
"$first": "$incidents"
,
"deliveries":
"$first": "$deliveries"
,
"return":
"$first": "$return"
,
"$project":
"_id": "$__rootId",
"company": "$company",
"customer": "$customer",
"dispatcher": "$dispatcher",
"redispatcher": "$redispatcher",
"driverPlate": "$driverPlate",
"key": "$key",
"activities": "$activities",
"serialNo": "$serialNo",
"invoiceNo": "$invoiceNo",
"incidents": "$incidents",
"deliveries": "$deliveries",
"return": "$return"
,
"$sort":
"invoiceNo": -1
,
"$limit": 51
]
The engine is smart enough to reallocate to the first position $match properties that don't require $lookups and right after $lookups if they do, however if they are within a $and/$or condition block, then they are reallocated after the last $lookup, regardless of what properties are there.
I could scan for what is used inside the $and and deconstruct it into new reallocated $match phases, but I need to figure how to handle the $or operator: I can't apply the same desconstruction idea on it because this would invalidate the condition.
So my question is: Is there an alternative way to use the phase $lookup along with $and/$or and improve the performance drastically?
Creating more indexes won't help because they're not used for the $lookup. Moving up $match phases, as the MongoDB team would suggest is also not possible because it would break the conditions. So I'm out of ideas now.
Best regards.
javascript mongodb performance graphql gql
Seems from the content of the question that you already asked this question elsewhere, being either MongoDB support or the MongoDB mailing list. I'm not sure why you would be expecting a different response here. Ideally the$match
block(s) ( other than the initial one ) would be directly after each$unwind
, however it's an$or
condition over multiple sub-document paths ( now linked and extracted ). Bottom line is that this structure probably should not be denormalized into separate collections in the first place. The modelling appears very relational, and that's very wrong.
– Neil Lunn
Nov 12 '18 at 20:46
Actually not, but I did my fair amount of research before posting it here, but I still didn't noticed any question that would fit 100% my criteria. Now about the relationships: if you're talking about how I should use embedded documents would that improve the performance greatly? If you say so I could run a function that overwrites each ObjectId reference with the document itself. That would take a bit of time and testing but I don't think it would be impossible to be worked out. However there are some few references that should be kept to prevent data duplication.
– Rubens Felipe
Nov 13 '18 at 0:10
Would that improve performance greatly? Well most likely so considering the data would already be in one collection and sort of "pre-joined" as a simple way of putting it. Point here is that if you use MongoDB just the same as an RDBMS then A. There was probably little point in not using an RDBMS. 2. You just end up with the same performance problems or indeed worse, considering the thing is not really designed to be "relational" in the first place.$lookup
is just "a way to get around things", and should not be used as a "central design point". This seems your basic problem.
– Neil Lunn
Nov 13 '18 at 0:15
Of course "How do you measure such performance?", well you test of course, and keep testing until you see the structure which actually gives the best performance/functionality split. Which is actually why you use something other than an RDBMS in the first place.
– Neil Lunn
Nov 13 '18 at 0:17
add a comment |
I'm trying to find a way to create an engine that translates GraphQL query filters to MongoDB aggregations while keeping the performance. Our application has the requisite of limiting the results from collection A by applying filters to collection B, C and even D sometimes.
For better understanding, here's a sample about how a filter is translated to MongoDB.
This:
"filter":
"return": null,
"AND": [
"customer_WITH":
"OR": [
"code": "CUSTOMER NAME"
,
"commercialName_LIKE": "CUSTOMER NAME"
,
"corporateName_LIKE": "CUSTOMER NAME"
]
],
"OR": [
"dispatcher_WITH":
"company_WITH":
"corporateName_LIKE": "COMPANY NAME"
,
"redispatcher_WITH":
"company_WITH":
"corporateName_LIKE": "COMPANY NAME"
],
"reversal": null
Gets translated to this:
[
"$match":
"return":
"$eq": null
,
"reversal":
"$eq": null
,
"company":
"$eq": ObjectId("xxxxxxxxxxxxxxxxxxxxxxxx")
,
"$lookup":
"as": "dispatcher",
"from": "shippers",
"localField": "dispatcher",
"foreignField": "_id"
,
"$unwind":
"path": "$dispatcher",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "dispatcher.company",
"from": "companies",
"localField": "dispatcher.company",
"foreignField": "_id"
,
"$unwind":
"path": "$dispatcher.company",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "redispatcher",
"from": "shippers",
"localField": "redispatcher",
"foreignField": "_id"
,
"$unwind":
"path": "$redispatcher",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "redispatcher.company",
"from": "companies",
"localField": "redispatcher.company",
"foreignField": "_id"
,
"$unwind":
"path": "$redispatcher.company",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "customer",
"from": "customers",
"localField": "customer",
"foreignField": "_id"
,
"$match":
"$or": [
"dispatcher.company.corporateName":
"$regex": /sCOMPANYsNAME/
,
"redispatcher.company.corporateName":
"$regex": /sCOMPANYsNAME/
],
"$and": [
"$or": [
"customer.code":
"$eq": "CUSTOMER NAME"
,
"customer.commercialName":
"$regex": /CUSTOMERsNAME/
,
"customer.corporateName":
"$regex": /CUSTOMERsNAME/
]
]
,
"$unwind":
"path": "$customer",
"preserveNullAndEmptyArrays": true
,
"$group":
"_id": "$invoiceNo",
"__rootId":
"$first": "$_id"
,
"company":
"$first": "$company"
,
"customer":
"$first": "$customer._id"
,
"dispatcher":
"$first": "$dispatcher._id"
,
"redispatcher":
"$first": "$redispatcher._id"
,
"driverPlate":
"$first": "$driverPlate"
,
"key":
"$first": "$key"
,
"activities":
"$first": "$activities"
,
"serialNo":
"$first": "$serialNo"
,
"invoiceNo":
"$first": "$invoiceNo"
,
"incidents":
"$first": "$incidents"
,
"deliveries":
"$first": "$deliveries"
,
"return":
"$first": "$return"
,
"$project":
"_id": "$__rootId",
"company": "$company",
"customer": "$customer",
"dispatcher": "$dispatcher",
"redispatcher": "$redispatcher",
"driverPlate": "$driverPlate",
"key": "$key",
"activities": "$activities",
"serialNo": "$serialNo",
"invoiceNo": "$invoiceNo",
"incidents": "$incidents",
"deliveries": "$deliveries",
"return": "$return"
,
"$sort":
"invoiceNo": -1
,
"$limit": 51
]
The engine is smart enough to reallocate to the first position $match properties that don't require $lookups and right after $lookups if they do, however if they are within a $and/$or condition block, then they are reallocated after the last $lookup, regardless of what properties are there.
I could scan for what is used inside the $and and deconstruct it into new reallocated $match phases, but I need to figure how to handle the $or operator: I can't apply the same desconstruction idea on it because this would invalidate the condition.
So my question is: Is there an alternative way to use the phase $lookup along with $and/$or and improve the performance drastically?
Creating more indexes won't help because they're not used for the $lookup. Moving up $match phases, as the MongoDB team would suggest is also not possible because it would break the conditions. So I'm out of ideas now.
Best regards.
javascript mongodb performance graphql gql
I'm trying to find a way to create an engine that translates GraphQL query filters to MongoDB aggregations while keeping the performance. Our application has the requisite of limiting the results from collection A by applying filters to collection B, C and even D sometimes.
For better understanding, here's a sample about how a filter is translated to MongoDB.
This:
"filter":
"return": null,
"AND": [
"customer_WITH":
"OR": [
"code": "CUSTOMER NAME"
,
"commercialName_LIKE": "CUSTOMER NAME"
,
"corporateName_LIKE": "CUSTOMER NAME"
]
],
"OR": [
"dispatcher_WITH":
"company_WITH":
"corporateName_LIKE": "COMPANY NAME"
,
"redispatcher_WITH":
"company_WITH":
"corporateName_LIKE": "COMPANY NAME"
],
"reversal": null
Gets translated to this:
[
"$match":
"return":
"$eq": null
,
"reversal":
"$eq": null
,
"company":
"$eq": ObjectId("xxxxxxxxxxxxxxxxxxxxxxxx")
,
"$lookup":
"as": "dispatcher",
"from": "shippers",
"localField": "dispatcher",
"foreignField": "_id"
,
"$unwind":
"path": "$dispatcher",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "dispatcher.company",
"from": "companies",
"localField": "dispatcher.company",
"foreignField": "_id"
,
"$unwind":
"path": "$dispatcher.company",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "redispatcher",
"from": "shippers",
"localField": "redispatcher",
"foreignField": "_id"
,
"$unwind":
"path": "$redispatcher",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "redispatcher.company",
"from": "companies",
"localField": "redispatcher.company",
"foreignField": "_id"
,
"$unwind":
"path": "$redispatcher.company",
"preserveNullAndEmptyArrays": true
,
"$lookup":
"as": "customer",
"from": "customers",
"localField": "customer",
"foreignField": "_id"
,
"$match":
"$or": [
"dispatcher.company.corporateName":
"$regex": /sCOMPANYsNAME/
,
"redispatcher.company.corporateName":
"$regex": /sCOMPANYsNAME/
],
"$and": [
"$or": [
"customer.code":
"$eq": "CUSTOMER NAME"
,
"customer.commercialName":
"$regex": /CUSTOMERsNAME/
,
"customer.corporateName":
"$regex": /CUSTOMERsNAME/
]
]
,
"$unwind":
"path": "$customer",
"preserveNullAndEmptyArrays": true
,
"$group":
"_id": "$invoiceNo",
"__rootId":
"$first": "$_id"
,
"company":
"$first": "$company"
,
"customer":
"$first": "$customer._id"
,
"dispatcher":
"$first": "$dispatcher._id"
,
"redispatcher":
"$first": "$redispatcher._id"
,
"driverPlate":
"$first": "$driverPlate"
,
"key":
"$first": "$key"
,
"activities":
"$first": "$activities"
,
"serialNo":
"$first": "$serialNo"
,
"invoiceNo":
"$first": "$invoiceNo"
,
"incidents":
"$first": "$incidents"
,
"deliveries":
"$first": "$deliveries"
,
"return":
"$first": "$return"
,
"$project":
"_id": "$__rootId",
"company": "$company",
"customer": "$customer",
"dispatcher": "$dispatcher",
"redispatcher": "$redispatcher",
"driverPlate": "$driverPlate",
"key": "$key",
"activities": "$activities",
"serialNo": "$serialNo",
"invoiceNo": "$invoiceNo",
"incidents": "$incidents",
"deliveries": "$deliveries",
"return": "$return"
,
"$sort":
"invoiceNo": -1
,
"$limit": 51
]
The engine is smart enough to reallocate to the first position $match properties that don't require $lookups and right after $lookups if they do, however if they are within a $and/$or condition block, then they are reallocated after the last $lookup, regardless of what properties are there.
I could scan for what is used inside the $and and deconstruct it into new reallocated $match phases, but I need to figure how to handle the $or operator: I can't apply the same desconstruction idea on it because this would invalidate the condition.
So my question is: Is there an alternative way to use the phase $lookup along with $and/$or and improve the performance drastically?
Creating more indexes won't help because they're not used for the $lookup. Moving up $match phases, as the MongoDB team would suggest is also not possible because it would break the conditions. So I'm out of ideas now.
Best regards.
javascript mongodb performance graphql gql
javascript mongodb performance graphql gql
edited Nov 12 '18 at 18:00
Rubens Felipe
asked Nov 12 '18 at 17:40
Rubens FelipeRubens Felipe
112
112
Seems from the content of the question that you already asked this question elsewhere, being either MongoDB support or the MongoDB mailing list. I'm not sure why you would be expecting a different response here. Ideally the$match
block(s) ( other than the initial one ) would be directly after each$unwind
, however it's an$or
condition over multiple sub-document paths ( now linked and extracted ). Bottom line is that this structure probably should not be denormalized into separate collections in the first place. The modelling appears very relational, and that's very wrong.
– Neil Lunn
Nov 12 '18 at 20:46
Actually not, but I did my fair amount of research before posting it here, but I still didn't noticed any question that would fit 100% my criteria. Now about the relationships: if you're talking about how I should use embedded documents would that improve the performance greatly? If you say so I could run a function that overwrites each ObjectId reference with the document itself. That would take a bit of time and testing but I don't think it would be impossible to be worked out. However there are some few references that should be kept to prevent data duplication.
– Rubens Felipe
Nov 13 '18 at 0:10
Would that improve performance greatly? Well most likely so considering the data would already be in one collection and sort of "pre-joined" as a simple way of putting it. Point here is that if you use MongoDB just the same as an RDBMS then A. There was probably little point in not using an RDBMS. 2. You just end up with the same performance problems or indeed worse, considering the thing is not really designed to be "relational" in the first place.$lookup
is just "a way to get around things", and should not be used as a "central design point". This seems your basic problem.
– Neil Lunn
Nov 13 '18 at 0:15
Of course "How do you measure such performance?", well you test of course, and keep testing until you see the structure which actually gives the best performance/functionality split. Which is actually why you use something other than an RDBMS in the first place.
– Neil Lunn
Nov 13 '18 at 0:17
add a comment |
Seems from the content of the question that you already asked this question elsewhere, being either MongoDB support or the MongoDB mailing list. I'm not sure why you would be expecting a different response here. Ideally the$match
block(s) ( other than the initial one ) would be directly after each$unwind
, however it's an$or
condition over multiple sub-document paths ( now linked and extracted ). Bottom line is that this structure probably should not be denormalized into separate collections in the first place. The modelling appears very relational, and that's very wrong.
– Neil Lunn
Nov 12 '18 at 20:46
Actually not, but I did my fair amount of research before posting it here, but I still didn't noticed any question that would fit 100% my criteria. Now about the relationships: if you're talking about how I should use embedded documents would that improve the performance greatly? If you say so I could run a function that overwrites each ObjectId reference with the document itself. That would take a bit of time and testing but I don't think it would be impossible to be worked out. However there are some few references that should be kept to prevent data duplication.
– Rubens Felipe
Nov 13 '18 at 0:10
Would that improve performance greatly? Well most likely so considering the data would already be in one collection and sort of "pre-joined" as a simple way of putting it. Point here is that if you use MongoDB just the same as an RDBMS then A. There was probably little point in not using an RDBMS. 2. You just end up with the same performance problems or indeed worse, considering the thing is not really designed to be "relational" in the first place.$lookup
is just "a way to get around things", and should not be used as a "central design point". This seems your basic problem.
– Neil Lunn
Nov 13 '18 at 0:15
Of course "How do you measure such performance?", well you test of course, and keep testing until you see the structure which actually gives the best performance/functionality split. Which is actually why you use something other than an RDBMS in the first place.
– Neil Lunn
Nov 13 '18 at 0:17
Seems from the content of the question that you already asked this question elsewhere, being either MongoDB support or the MongoDB mailing list. I'm not sure why you would be expecting a different response here. Ideally the
$match
block(s) ( other than the initial one ) would be directly after each $unwind
, however it's an $or
condition over multiple sub-document paths ( now linked and extracted ). Bottom line is that this structure probably should not be denormalized into separate collections in the first place. The modelling appears very relational, and that's very wrong.– Neil Lunn
Nov 12 '18 at 20:46
Seems from the content of the question that you already asked this question elsewhere, being either MongoDB support or the MongoDB mailing list. I'm not sure why you would be expecting a different response here. Ideally the
$match
block(s) ( other than the initial one ) would be directly after each $unwind
, however it's an $or
condition over multiple sub-document paths ( now linked and extracted ). Bottom line is that this structure probably should not be denormalized into separate collections in the first place. The modelling appears very relational, and that's very wrong.– Neil Lunn
Nov 12 '18 at 20:46
Actually not, but I did my fair amount of research before posting it here, but I still didn't noticed any question that would fit 100% my criteria. Now about the relationships: if you're talking about how I should use embedded documents would that improve the performance greatly? If you say so I could run a function that overwrites each ObjectId reference with the document itself. That would take a bit of time and testing but I don't think it would be impossible to be worked out. However there are some few references that should be kept to prevent data duplication.
– Rubens Felipe
Nov 13 '18 at 0:10
Actually not, but I did my fair amount of research before posting it here, but I still didn't noticed any question that would fit 100% my criteria. Now about the relationships: if you're talking about how I should use embedded documents would that improve the performance greatly? If you say so I could run a function that overwrites each ObjectId reference with the document itself. That would take a bit of time and testing but I don't think it would be impossible to be worked out. However there are some few references that should be kept to prevent data duplication.
– Rubens Felipe
Nov 13 '18 at 0:10
Would that improve performance greatly? Well most likely so considering the data would already be in one collection and sort of "pre-joined" as a simple way of putting it. Point here is that if you use MongoDB just the same as an RDBMS then A. There was probably little point in not using an RDBMS. 2. You just end up with the same performance problems or indeed worse, considering the thing is not really designed to be "relational" in the first place.
$lookup
is just "a way to get around things", and should not be used as a "central design point". This seems your basic problem.– Neil Lunn
Nov 13 '18 at 0:15
Would that improve performance greatly? Well most likely so considering the data would already be in one collection and sort of "pre-joined" as a simple way of putting it. Point here is that if you use MongoDB just the same as an RDBMS then A. There was probably little point in not using an RDBMS. 2. You just end up with the same performance problems or indeed worse, considering the thing is not really designed to be "relational" in the first place.
$lookup
is just "a way to get around things", and should not be used as a "central design point". This seems your basic problem.– Neil Lunn
Nov 13 '18 at 0:15
Of course "How do you measure such performance?", well you test of course, and keep testing until you see the structure which actually gives the best performance/functionality split. Which is actually why you use something other than an RDBMS in the first place.
– Neil Lunn
Nov 13 '18 at 0:17
Of course "How do you measure such performance?", well you test of course, and keep testing until you see the structure which actually gives the best performance/functionality split. Which is actually why you use something other than an RDBMS in the first place.
– Neil Lunn
Nov 13 '18 at 0:17
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53267388%2fslow-query-performance-mongodb-with-lookup-and-and-or%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53267388%2fslow-query-performance-mongodb-with-lookup-and-and-or%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Seems from the content of the question that you already asked this question elsewhere, being either MongoDB support or the MongoDB mailing list. I'm not sure why you would be expecting a different response here. Ideally the
$match
block(s) ( other than the initial one ) would be directly after each$unwind
, however it's an$or
condition over multiple sub-document paths ( now linked and extracted ). Bottom line is that this structure probably should not be denormalized into separate collections in the first place. The modelling appears very relational, and that's very wrong.– Neil Lunn
Nov 12 '18 at 20:46
Actually not, but I did my fair amount of research before posting it here, but I still didn't noticed any question that would fit 100% my criteria. Now about the relationships: if you're talking about how I should use embedded documents would that improve the performance greatly? If you say so I could run a function that overwrites each ObjectId reference with the document itself. That would take a bit of time and testing but I don't think it would be impossible to be worked out. However there are some few references that should be kept to prevent data duplication.
– Rubens Felipe
Nov 13 '18 at 0:10
Would that improve performance greatly? Well most likely so considering the data would already be in one collection and sort of "pre-joined" as a simple way of putting it. Point here is that if you use MongoDB just the same as an RDBMS then A. There was probably little point in not using an RDBMS. 2. You just end up with the same performance problems or indeed worse, considering the thing is not really designed to be "relational" in the first place.
$lookup
is just "a way to get around things", and should not be used as a "central design point". This seems your basic problem.– Neil Lunn
Nov 13 '18 at 0:15
Of course "How do you measure such performance?", well you test of course, and keep testing until you see the structure which actually gives the best performance/functionality split. Which is actually why you use something other than an RDBMS in the first place.
– Neil Lunn
Nov 13 '18 at 0:17