Parquet binary UTF8 as string on Hive

There is a Parquet file with a binary (UTF8) column named abc in it.

Is it possible to create an external table in Hive that will contain the same column abc, but cast to string?

The structure of the Parquet file:

$ parquet-tools schema ~/Downloads/dataset.gz.parquet
message spark_schema {
  optional binary abc (UTF8);
}

  • Try select cast(abc as string) from mytable or similar? There's also the decode function, or base64. If you explain what is inside the binary data, that would be more useful.

    – Hitobat
    Nov 12 '18 at 18:24
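    As a minimal sketch of what this comment suggests, assuming the file is already exposed through a Hive table named mytable with abc declared as BINARY (both names are hypothetical):

    SELECT CAST(abc AS STRING)  AS abc_text,     -- reinterpret the bytes as UTF-8 text
           DECODE(abc, 'UTF-8') AS abc_decoded,  -- explicit charset decoding
           BASE64(abc)          AS abc_base64    -- readable even when the bytes are not valid UTF-8
    FROM mytable;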











  • @Hitobat I want to remove the burden of casting types from the users. Do you think it might be possible?

    – dmigo
    Nov 13 '18 at 12:40











  • Oh, I understand what you mean now. In this case, you can have an external table to read the data, and then define a view based on the table for your users to query from.

    – Hitobat
    Nov 13 '18 at 14:00
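    A minimal sketch of that table-plus-view layout, assuming hypothetical names (raw_dataset, dataset_v) and a placeholder LOCATION:

    -- External table that exposes the Parquet file; the column stays BINARY here.
    CREATE EXTERNAL TABLE raw_dataset (abc BINARY)
    STORED AS PARQUET
    LOCATION '/path/to/parquet/dir';

    -- View the users actually query; the cast to STRING happens once, here.
    CREATE VIEW dataset_v AS
    SELECT CAST(abc AS STRING) AS abc
    FROM raw_dataset;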















Tags: hive, parquet, sqldatatypes






asked Nov 12 '18 at 18:06 by dmigo; edited Nov 13 '18 at 14:23

2 Answers

There are three different types involved:



  • There is a SQL schema for the table in Hive. Each column has a type, like STRING or DECIMAL. Each table (or partition, in case it's a partitioned table) consists of multiple files that must be of the same file format, for example PLAINTEXT, AVRO or PARQUET.

  • Each of the individual files will have type information as well (except PLAINTEXT). In the case of Parquet, this means two further levels:

    • The physical type describes the storage size, for example INT32, BYTE_ARRAY or FIXED_LEN_BYTE_ARRAY.

    • The logical type tells applications how to interpret the data, for example UTF8 or DECIMAL.


The STRING column type of Hive is stored as a BYTE_ARRAY physical type (called binary in Parquet schema definitions) with the UTF8 logical type annotation.
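To make this mapping concrete, here is a minimal sketch going the other way (the table name t_demo is hypothetical): a Hive STRING column written out as Parquet ends up as the same binary/UTF8 pair shown in the question.

-- 1. SQL schema level: the column is declared as STRING in Hive.
CREATE TABLE t_demo (abc STRING) STORED AS PARQUET;
INSERT INTO t_demo VALUES ('hello');

-- Inspecting the written file with parquet-tools would then show something like
--   optional binary abc (UTF8);
-- i.e. 2. the physical type BYTE_ARRAY ("binary") and 3. the logical type UTF8 annotation.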






Apparently, you can simply specify string as the type of the column and it will be taken care of.

CREATE EXTERNAL TABLE `dataset`(
  `abc` string)
STORED AS parquet
LOCATION
  '...';
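With that definition in place, users can query the column as plain text; a quick check might look like this (table and column names as above):

-- abc comes back as STRING, no explicit cast needed.
SELECT abc FROM `dataset` LIMIT 10;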




