Can I issue a query rather than specify a table when using the BigQuery connector for Spark?

I have followed the "Use the BigQuery connector with Spark" guide to extract data from a table in BigQuery by running the code on Google Dataproc. As far as I'm aware, the code shared there:

    # sc, project, bucket and input_directory are defined earlier in the sample.
    conf = {
        # Input Parameters.
        'mapred.bq.project.id': project,
        'mapred.bq.gcs.bucket': bucket,
        'mapred.bq.temp.gcs.path': input_directory,
        'mapred.bq.input.project.id': 'publicdata',
        'mapred.bq.input.dataset.id': 'samples',
        'mapred.bq.input.table.id': 'shakespeare',
    }

    # Output Parameters.
    output_dataset = 'wordcount_dataset'
    output_table = 'wordcount_output'

    # Load data in from BigQuery.
    table_data = sc.newAPIHadoopRDD(
        'com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat',
        'org.apache.hadoop.io.LongWritable',
        'com.google.gson.JsonObject',
        conf=conf)

copies the entirety of the named table into input_directory. The table I need to extract data from contains more than 500 million rows, and I don't need all of them. Is there a way to issue a query instead of specifying a table, so that I can copy only a subset of the table's data?

asked Nov 9 at 13:49

jamiet

google-bigquery google-cloud-dataproc

1 Answer

It doesn't look like BigQuery supports any kind of filtering or querying for table exports at the moment:
https://cloud.google.com/bigquery/docs/exporting-data
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.extract
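
One possible workaround, offered only as a sketch and not as something the linked docs or the connector provide directly: run the query first, write its result to an intermediate table, and then point the connector at that smaller table. The helper names `build_input_conf` and `materialize_subset`, the intermediate dataset and table names, and the use of the `google-cloud-bigquery` client library are all assumptions made for illustration.

```python
def build_input_conf(project, bucket, input_directory,
                     input_project, input_dataset, input_table):
    """Return the connector's input settings for one table (hypothetical helper).

    The keys are the same ones used in the question's conf dictionary.
    """
    return {
        'mapred.bq.project.id': project,
        'mapred.bq.gcs.bucket': bucket,
        'mapred.bq.temp.gcs.path': input_directory,
        'mapred.bq.input.project.id': input_project,
        'mapred.bq.input.dataset.id': input_dataset,
        'mapred.bq.input.table.id': input_table,
    }


def materialize_subset(sql, project, dataset, table):
    """Run `sql` and store its result in project.dataset.table (hypothetical helper).

    Assumes the google-cloud-bigquery package is installed, credentials are
    available, and `dataset` already exists in `project`.
    """
    from google.cloud import bigquery  # imported lazily; only needed here

    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(
        destination='{}.{}.{}'.format(project, dataset, table),
        write_disposition='WRITE_TRUNCATE',  # overwrite any previous subset
    )
    # Block until the query job has finished writing the destination table.
    client.query(sql, job_config=job_config).result()
```

On a cluster, one might call `materialize_subset` with a filtering query such as ``SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` WHERE corpus = 'hamlet'``, then pass `build_input_conf(project, bucket, input_directory, project, 'tmp_dataset', 'subset')` as `conf` to `sc.newAPIHadoopRDD`. The connector still exports a whole table, but the table now holds only the rows the query selected.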

answered Nov 9 at 15:27

Igor Dvorzhak
