Elasticsearch 6.3.1 returns different results on cloud and local despite using dfs_query_then_fetch (querying with Python's elasticsearch package)










I am using Elasticsearch to query data. I query a medical term and get the disease code back as output. Here is my sample query:



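from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a node reachable at the default localhost:9200

es.search(index="myindex", body={"query": {"match": {"text_field": "search_term"}}}, search_type="dfs_query_then_fetch")
# Expected output - ABC
# Local output - ABC
# Output on Amazon EMR - XYZ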


The problem is that when I run it on the cloud, the output is completely different.

I have exactly the same index on the cloud and locally, yet the results on the cloud are unexpected. We run Elasticsearch on an Amazon EMR instance, where I have even tried re-creating the index, but no luck.



Local OS - Ubuntu 16.04
OS on Amazon EMR - Amazon Linux



Any help would be really appreciated.










  • Same number of shards for the index locally and on cloud?
    – Russ Cam
    Nov 12 '18 at 10:18










  • Yes, Russ. My index is small, so I use the default number of shards and replicas. It's consistent on both the cloud and the local machine.
    – Sagar Dawda
    Nov 12 '18 at 10:33



































elasticsearch amazon-emr elasticsearch-py
















asked Nov 12 '18 at 7:05









Sagar Dawda























2 Answers














Thanks to everyone who responded to my question.



I figured out what the problem was.



A bootstrap script running on AWS starts the Elasticsearch service and, in parallel, runs my Python index-creation script.

Because the cluster takes some time to come up, a few requests time out during index creation. As a result, the index is only partially created, which explains the differing results.
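
One way to avoid this race is to have the index-creation script wait until the cluster is reachable before it sends any indexing requests. The following is only a minimal sketch of that idea, not the script I actually run: the function name `wait_for_cluster`, the retry count, the delay and the "yellow" health threshold are all assumptions chosen for illustration. It relies on the `ping()` and `cluster.health()` calls of the elasticsearch-py client.

```python
import time


def wait_for_cluster(client, attempts=30, delay=2.0, sleep=time.sleep):
    """Block until the cluster answers and reports at least 'yellow' health.

    `client` is expected to be an elasticsearch-py `Elasticsearch` instance.
    `attempts` and `delay` are illustrative defaults, not tuned values.
    Returns True once the cluster is ready, False if it never became ready.
    """
    for _ in range(attempts):
        try:
            # ping() returns False (rather than raising) when the node is unreachable.
            if client.ping():
                health = client.cluster.health(wait_for_status="yellow", timeout="10s")
                if not health.get("timed_out", False):
                    return True
        except Exception:
            # Connection or timeout errors while the node is still starting up.
            pass
        sleep(delay)
    return False


# Hypothetical usage at the top of the index-creation script:
#
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch()  # assumes a node on localhost:9200
#   if not wait_for_cluster(es):
#       raise SystemExit("Elasticsearch did not come up in time")
#   ... create the index and load the documents here ...
```

Running the index creator only after this returns True (or simply running it after the service start has finished, instead of in parallel) should keep the early requests from timing out.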



I hope this helps others running Elasticsearch on Amazon EMR.
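
If you suspect the same thing on your setup, comparing document counts between the two clusters is a quick way to spot a partially built index. This is a hedged sketch rather than something from my original script: `compare_doc_counts` is a hypothetical helper, "myindex" is the placeholder index name from the question, and both arguments are assumed to be elasticsearch-py clients, whose `count()` call returns a dict with a "count" key.

```python
def compare_doc_counts(local_client, cloud_client, index="myindex"):
    """Return (local_count, cloud_count, counts_match) for one index.

    Both clients are expected to be elasticsearch-py `Elasticsearch` instances.
    """
    # count() returns a dict whose "count" key holds the number of matching documents.
    local_count = local_client.count(index=index)["count"]
    cloud_count = cloud_client.count(index=index)["count"]
    return local_count, cloud_count, local_count == cloud_count


# Hypothetical usage (the host names are placeholders):
#
#   from elasticsearch import Elasticsearch
#   local_es = Elasticsearch()                   # local node on localhost:9200
#   cloud_es = Elasticsearch(["emr-host:9200"])  # node on the EMR instance
#   print(compare_doc_counts(local_es, cloud_es))
```

A lower count on the cloud side would point to documents that were never indexed, for example because the requests timed out.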



Cheers!




















    Try using the "preference" parameter when querying the data. Something like this:

    es.search(index="myindex",
              body={"query": {"match": {"text_field": "search_term"}}},
              preference="_primary_first")

    Update:
    Some preference values, such as "_primary_first", are deprecated as of Elasticsearch 6.x and will be removed completely in Elasticsearch 7.0.






    • Will run this query and let you know how it goes.
      – Sagar Dawda
      Nov 12 '18 at 10:32










    • This option is deprecated in version 6.x and will be removed in 7. Since I am using dfs_query_then_fetch, it should query all the shards anyway.
      – Sagar Dawda
      Nov 12 '18 at 12:29










        answered Nov 12 '18 at 18:19









        Sagar Dawda























            edited Nov 13 '18 at 7:10

























            answered Nov 12 '18 at 10:14









            Abhilash Bolla










