Can I issue a query rather than specify a table when using the BigQuery connector for Spark?









up vote
1
down vote

favorite
1












I have used the Use the BigQuery connector with Spark to extract data from a table in BigQuery by running the code on Google Dataproc. As far as I'm aware the code shared there:



conf = 
# Input Parameters.
'mapred.bq.project.id': project,
'mapred.bq.gcs.bucket': bucket,
'mapred.bq.temp.gcs.path': input_directory,
'mapred.bq.input.project.id': 'publicdata',
'mapred.bq.input.dataset.id': 'samples',
'mapred.bq.input.table.id': 'shakespeare',


# Output Parameters.
output_dataset = 'wordcount_dataset'
output_table = 'wordcount_output'

# Load data in from BigQuery.
table_data = sc.newAPIHadoopRDD(
'com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat',
'org.apache.hadoop.io.LongWritable',
'com.google.gson.JsonObject',
conf=conf)


copies the entirety of the named table into input_directory. The table I need to extract data from contains >500m rows and I don't need all of those rows. Is there a way to instead issue a query (as opposed to specifying a table) so that I can copy a subset of the data from a table?










share|improve this question

























    up vote
    1
    down vote

    favorite
    1












    I have used the Use the BigQuery connector with Spark to extract data from a table in BigQuery by running the code on Google Dataproc. As far as I'm aware the code shared there:



    conf = 
    # Input Parameters.
    'mapred.bq.project.id': project,
    'mapred.bq.gcs.bucket': bucket,
    'mapred.bq.temp.gcs.path': input_directory,
    'mapred.bq.input.project.id': 'publicdata',
    'mapred.bq.input.dataset.id': 'samples',
    'mapred.bq.input.table.id': 'shakespeare',


    # Output Parameters.
    output_dataset = 'wordcount_dataset'
    output_table = 'wordcount_output'

    # Load data in from BigQuery.
    table_data = sc.newAPIHadoopRDD(
    'com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat',
    'org.apache.hadoop.io.LongWritable',
    'com.google.gson.JsonObject',
    conf=conf)


    copies the entirety of the named table into input_directory. The table I need to extract data from contains >500m rows and I don't need all of those rows. Is there a way to instead issue a query (as opposed to specifying a table) so that I can copy a subset of the data from a table?










    share|improve this question























      up vote
      1
      down vote

      favorite
      1









      up vote
      1
      down vote

      favorite
      1






      1





      I have used the Use the BigQuery connector with Spark to extract data from a table in BigQuery by running the code on Google Dataproc. As far as I'm aware the code shared there:



      conf = 
      # Input Parameters.
      'mapred.bq.project.id': project,
      'mapred.bq.gcs.bucket': bucket,
      'mapred.bq.temp.gcs.path': input_directory,
      'mapred.bq.input.project.id': 'publicdata',
      'mapred.bq.input.dataset.id': 'samples',
      'mapred.bq.input.table.id': 'shakespeare',


      # Output Parameters.
      output_dataset = 'wordcount_dataset'
      output_table = 'wordcount_output'

      # Load data in from BigQuery.
      table_data = sc.newAPIHadoopRDD(
      'com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat',
      'org.apache.hadoop.io.LongWritable',
      'com.google.gson.JsonObject',
      conf=conf)


      copies the entirety of the named table into input_directory. The table I need to extract data from contains >500m rows and I don't need all of those rows. Is there a way to instead issue a query (as opposed to specifying a table) so that I can copy a subset of the data from a table?










      share|improve this question













      I have used the Use the BigQuery connector with Spark to extract data from a table in BigQuery by running the code on Google Dataproc. As far as I'm aware the code shared there:



      conf = 
      # Input Parameters.
      'mapred.bq.project.id': project,
      'mapred.bq.gcs.bucket': bucket,
      'mapred.bq.temp.gcs.path': input_directory,
      'mapred.bq.input.project.id': 'publicdata',
      'mapred.bq.input.dataset.id': 'samples',
      'mapred.bq.input.table.id': 'shakespeare',


      # Output Parameters.
      output_dataset = 'wordcount_dataset'
      output_table = 'wordcount_output'

      # Load data in from BigQuery.
      table_data = sc.newAPIHadoopRDD(
      'com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat',
      'org.apache.hadoop.io.LongWritable',
      'com.google.gson.JsonObject',
      conf=conf)


      copies the entirety of the named table into input_directory. The table I need to extract data from contains >500m rows and I don't need all of those rows. Is there a way to instead issue a query (as opposed to specifying a table) so that I can copy a subset of the data from a table?







      google-bigquery google-cloud-dataproc






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 9 at 13:49









      jamiet

      2,19812345




      2,19812345






















          1 Answer
          1






          active

          oldest

          votes

















          up vote
          0
          down vote













          Doesn't look like BigQuery supports any kind of filtering/querying for tables export at the moment:
          https://cloud.google.com/bigquery/docs/exporting-data
          https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.extract






          share|improve this answer




















            Your Answer






            StackExchange.ifUsing("editor", function ()
            StackExchange.using("externalEditor", function ()
            StackExchange.using("snippets", function ()
            StackExchange.snippets.init();
            );
            );
            , "code-snippets");

            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "1"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader:
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            ,
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );













             

            draft saved


            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53226978%2fcan-i-issue-a-query-rather-than-specify-a-table-when-using-the-bigquery-connecto%23new-answer', 'question_page');

            );

            Post as a guest






























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            0
            down vote













            Doesn't look like BigQuery supports any kind of filtering/querying for tables export at the moment:
            https://cloud.google.com/bigquery/docs/exporting-data
            https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.extract






            share|improve this answer
























              up vote
              0
              down vote













              Doesn't look like BigQuery supports any kind of filtering/querying for tables export at the moment:
              https://cloud.google.com/bigquery/docs/exporting-data
              https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.extract






              share|improve this answer






















                up vote
                0
                down vote










                up vote
                0
                down vote









                Doesn't look like BigQuery supports any kind of filtering/querying for tables export at the moment:
                https://cloud.google.com/bigquery/docs/exporting-data
                https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.extract






                share|improve this answer












                Doesn't look like BigQuery supports any kind of filtering/querying for tables export at the moment:
                https://cloud.google.com/bigquery/docs/exporting-data
                https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.extract







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Nov 9 at 15:27









                Igor Dvorzhak

                589313




                589313



























                     

                    draft saved


                    draft discarded















































                     


                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53226978%2fcan-i-issue-a-query-rather-than-specify-a-table-when-using-the-bigquery-connecto%23new-answer', 'question_page');

                    );

                    Post as a guest














































































                    Popular posts from this blog

                    Darth Vader #20

                    How to how show current date and time by default on contact form 7 in WordPress without taking input from user in datetimepicker

                    Ondo