Query writing performance on neo4j with py2neo

























I am currently struggling to find a performant way to run multiple queries with py2neo. My problem is that I have a big list of write queries in Python that need to be written to Neo4j.



I have tried multiple ways to solve the issue so far. The approach that worked best for me was the following:



from py2neo import Graph

queries = ["create (n) return id(n)", "create (n) return id(n)", ...]  # list of queries
graph = Graph()
t = graph.begin(autocommit=False)
for idx, q in enumerate(queries):
    t.run(q)
    # commit whenever idx is a multiple of 100, then start a new transaction
    if idx % 100 == 0:
        t.commit()
        t = graph.begin(autocommit=False)
t.commit()  # commit whatever is left in the last transaction


It still takes too long to write the queries. I also tried the run many procedure from APOC without success; the query never finished. I also tried the same writing method with autocommit. Is there a better way to do this? Are there any tricks, like dropping indexes first and then adding them again after inserting the data?



-- Edit: Additional information:



I'm using Neo4j 3.4, Py2neo v4 and Python 3.7


































  • Are you creating a lot of the same thing, i.e. CREATE (n:Person {name: 'Joe'}) RETURN id(n) times 1000 different people? If you are not using indexes for the creates, do you need them later for querying? If so, you might as well leave them in; if not, do you even need them? Are you initializing a graph from scratch? If so, there might be other tools available to start it up more quickly.

    – jacob.mccrumb
    Nov 13 '18 at 19:32











  • I am sending a big variety of query types to Neo4j, including create for nodes, and merge and create for relationships. The queries can differ a lot. I need the indexes for querying later. No, unfortunately I am not building up the graph from scratch.

    – Bierbarbar
    Nov 14 '18 at 6:45











  • Fair -- what version of py2neo? More specifically, are you using bolt to connect (available/default as of v4) or http? Are you using the id(n) returns or are they just there for debugging/etc? Can you track run times of all queries and see if any in particular are going slow? Sorry for all the questions, query tuning a ton of queries is a complex thing :)

    – jacob.mccrumb
    Nov 14 '18 at 14:38











    Oh yeah, sorry, I forgot. I'm using py2neo v4 and the newest version of Neo4j from Docker Hub. There are no returns; the queries I gave are basically placeholders. I only have write queries, where I don't care about the return. I tracked run times: none of the queries is slow in particular; it is only sending so many queries that causes problems.

    – Bierbarbar
    Nov 15 '18 at 9:33











    Sounds good -- see @InverseFalcon's answer below. For creating a lot of similar things, use UNWIND + query parameters: UNWIND $list AS item MERGE (n:Node {foo: item.bar}), and call it with a parameter list holding the props for each node you want to create. And make sure you are using bolt to connect, not http.

    – jacob.mccrumb
    Nov 15 '18 at 22:04






















python neo4j graph-databases py2neo














edited Nov 15 '18 at 9:35







Bierbarbar

















asked Nov 12 '18 at 11:54









Bierbarbar
























1 Answer














You may want to read up on Michael Hunger's tips and tricks for fast batched updates.



The key trick is using UNWIND to transform list elements into rows, and then subsequent operations are performed per row.



There are supporting functions that can easily create lists for you, like range().



As an example, if you wanted to create 10k nodes and add a name property, then return the node name and its graph id, you could do something like this:



UNWIND range(1, 10000) AS index
CREATE (n:Node {name: 'Node ' + index})
RETURN n.name AS name, id(n) AS id


Likewise if you have a good amount of data to import, you can create a list of parameter maps, call the query, then UNWIND the list to operate on each entry at once, similar to how we process CSV files with LOAD CSV.
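
To make the parameter-list idea concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions: the label `Node`, the property `name`, the parameter name `rows`, the batch size of 1000, and the helper names `chunked` and `write_in_batches` are all made up for the example. The helper only needs an object with a `run(cypher, parameters)` method, which `py2neo.Graph` provides; adapt the Cypher to your own data.

```python
from itertools import islice

# Hypothetical query: the label `Node` and the property `name` are placeholders.
UNWIND_QUERY = """
UNWIND $rows AS row
CREATE (n:Node {name: row.name})
"""


def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def write_in_batches(runner, rows, batch_size=1000):
    """Send one UNWIND query per batch of rows; return the number of queries sent.

    `runner` is anything with a run(cypher, parameters) method,
    for example a py2neo Graph.
    """
    sent = 0
    for batch in chunked(rows, batch_size):
        runner.run(UNWIND_QUERY, {"rows": batch})
        sent += 1
    return sent


def example_with_py2neo():
    """Usage sketch; assumes py2neo is installed and a server is reachable."""
    from py2neo import Graph

    graph = Graph()  # default connection settings, as in the question
    rows = [{"name": "Node %d" % i} for i in range(10000)]
    # 10 queries of 1000 rows each instead of 10000 single-row queries
    return write_in_batches(graph, rows)
```

With this shape, each group of similar writes becomes one parameterized query per batch, so the number of round trips drops from one per row to one per batch. Queries that differ structurally would each need their own UNWIND template.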





























  • Thanks for your answer. I will try it next week, and if it works I will mark it as accepted.

    – Bierbarbar
    Nov 16 '18 at 7:46










answered Nov 15 '18 at 10:18









InverseFalcon











