In Scala, how to read bytes from binary file delimited by characters?











In Scala, given a binary file, I would like to retrieve a list of Array[Byte] items.

For example, the items in the binary file are separated by the characters (bytes) 'my-delimiter'.

How can I get a list containing one Array[Byte] per item?










  • Did you check this? VecBinaryReader

    – Rcordoval
    Nov 15 '18 at 3:56











  • Is using external libraries allowed for the solution?

    – stack0114106
    Nov 15 '18 at 12:48











  • Yes, external libraries are fine.

    – codeshark
    Nov 15 '18 at 18:55















java scala apache-spark inputstream apache-flink






edited Nov 16 '18 at 3:18 by codeshark
asked Nov 15 '18 at 3:42 by codeshark












1 Answer
A functional solution, with the help of java.nio:



    import java.nio.file.{Files, Paths}

    object Main {

      private val delimiter = '\n'.toByte

      def main(args: Array[String]): Unit = {
        val byteArray = Files.readAllBytes(Paths.get(args(0)))

        // result: finished items (newest first); current: bytes of the item being built (reversed)
        case class Accumulator(result: List[List[Byte]], current: List[Byte])

        val items: List[Array[Byte]] = byteArray.foldLeft(Accumulator(Nil, Nil)) {
          case (Accumulator(result, current), nextByte) =>
            if (nextByte == delimiter)
              Accumulator(current :: result, Nil) // the delimiter closes the current item
            else
              Accumulator(result, nextByte :: current)
        } match {
          // add the last item, then restore the original order of the items and of their bytes
          case Accumulator(result, current) => (current :: result).reverse.map(_.reverse.toArray)
        }

        items.foreach(item => println(new String(item)))
      }
    }





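This solution will probably perform poorly, though: Files.readAllBytes loads the whole file into memory, and every item is built byte by byte as a linked List[Byte] before being converted to an array. How important is performance for you? How many files will you read, how large are they, and how often will you read them? If performance matters, it is better to use input streams and mutable collections: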



    import java.io.{BufferedInputStream, FileInputStream}

    import scala.collection.mutable.ArrayBuffer

    object Main {

      private val delimiter = '\n'.toByte

      def main(args: Array[String]): Unit = {
        val items = ArrayBuffer.empty[Array[Byte]]
        val item = ArrayBuffer.empty[Byte]
        val bis = new BufferedInputStream(new FileInputStream(args(0)))
        try {
          var nextByte: Int = bis.read()
          while (nextByte != -1) {
            // read() returns 0..255, so compare as a Byte to also match delimiters above 0x7F
            if (nextByte.toByte == delimiter) {
              items.append(item.toArray) // the delimiter closes the current item
              item.clear()
            } else {
              item.append(nextByte.toByte)
            }
            nextByte = bis.read()
          }
          items.append(item.toArray) // bytes after the last delimiter form the final item
        } finally {
          bis.close()
        }
        items.foreach(item => println(new String(item)))
      }
    }
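If the file is too large to keep every item in memory at once, the same loop could be wrapped in a lazy Iterator that buffers only the current item. This is a sketch under that assumption, not part of the answer above; the names DelimitedStream and items are hypothetical, and it uses java.io.ByteArrayOutputStream instead of ArrayBuffer[Byte] so that each item is stored as raw bytes rather than boxed references.

```scala
import java.io.{ByteArrayOutputStream, InputStream}

object DelimitedStream {

  // Hypothetical helper: lazily yields one Array[Byte] per delimiter-separated item read from `in`.
  // Only the current item is buffered; the caller is responsible for closing `in` afterwards.
  def items(in: InputStream, delimiter: Byte): Iterator[Array[Byte]] =
    new Iterator[Array[Byte]] {
      private var finished = false // becomes true once read() has returned -1

      def hasNext: Boolean = !finished

      def next(): Array[Byte] = {
        if (finished) throw new NoSuchElementException("no more items")
        val item = new ByteArrayOutputStream()
        var b = in.read()
        // read() returns 0..255 (or -1 at end of stream); compare as a Byte
        // so that delimiters above 0x7F match as well
        while (b != -1 && b.toByte != delimiter) {
          item.write(b)
          b = in.read()
        }
        if (b == -1) finished = true // end of stream: this was the last item
        item.toByteArray
      }
    }
}
```

It keeps the loop's semantics: the bytes after the last delimiter always form a final item, so a trailing delimiter yields an empty last item. Since read() is called once per byte, pass in a BufferedInputStream (as the loop does) and close it only after the iterator has been consumed.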








  • Thanks, Igor. If the delimiter were more than a single byte (several characters), how could this solution be extended or modified to handle that?

    – codeshark
    Nov 15 '18 at 19:32
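Regarding the follow-up about multi-byte delimiters: one option is to look for the whole delimiter sequence at each position instead of a single byte. The following is a minimal in-memory sketch, assuming the file has already been loaded into an Array[Byte] (for example with Files.readAllBytes, as in the functional solution); DelimitedSplit and splitOnDelimiter are hypothetical names, and the naive matching costs O(n * m) for n file bytes and an m-byte delimiter.

```scala
import scala.collection.mutable.ListBuffer

object DelimitedSplit {

  // Hypothetical helper: splits `bytes` on every (non-overlapping, left-to-right) occurrence
  // of the possibly multi-byte `delimiter`. Empty items are kept, so a trailing delimiter
  // yields an empty last item, matching the single-byte versions above.
  def splitOnDelimiter(bytes: Array[Byte], delimiter: Array[Byte]): List[Array[Byte]] = {
    require(delimiter.nonEmpty, "delimiter must not be empty")
    val items = ListBuffer.empty[Array[Byte]]
    var start = 0 // index where the current item begins
    var i = 0     // current scan position
    while (i <= bytes.length - delimiter.length) {
      if (bytes.startsWith(delimiter, i)) {
        items += bytes.slice(start, i) // the delimiter closes the current item
        i += delimiter.length
        start = i
      } else {
        i += 1
      }
    }
    items += bytes.slice(start, bytes.length) // bytes after the last delimiter
    items.toList
  }
}
```

For instance, DelimitedSplit.splitOnDelimiter(bytes, "my-delimiter".getBytes("UTF-8")) would return one array per item of the question's example format.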











edited Nov 15 '18 at 10:20
answered Nov 15 '18 at 10:04 by ygor











