In Scala, how to read bytes from binary file delimited by characters?
In Scala, given a binary file, I am interested in retrieving a list of Array[Byte] items.
For example, the binary file has items delimited by the characters/bytes 'my-delimiter'.
How can I get a list of Array[Byte] for each item?
java scala apache-spark inputstream apache-flink
Did you check this? VecBinaryReader
– Rcordoval
Nov 15 '18 at 3:56
Is using external libraries allowed for the solution?
– stack0114106
Nov 15 '18 at 12:48
Yes, external libraries are fine.
– codeshark
Nov 15 '18 at 18:55
edited Nov 16 '18 at 3:18
codeshark
asked Nov 15 '18 at 3:42
codeshark
1 Answer
A functional solution, with the help of java.nio:

    import java.nio.file.{Files, Paths}

    object Main {
      // Single-byte delimiter; a newline here.
      private val delimiter = '\n'.toByte

      def main(args: Array[String]): Unit = {
        val byteArray = Files.readAllBytes(Paths.get(args(0)))

        // result: completed items (newest first); current: bytes of the item in progress (reversed).
        case class Accumulator(result: List[List[Byte]], current: List[Byte])

        val items: List[Array[Byte]] =
          byteArray.foldLeft(Accumulator(Nil, Nil)) {
            case (Accumulator(result, current), nextByte) =>
              if (nextByte == delimiter) Accumulator(current :: result, Nil)
              else Accumulator(result, nextByte :: current)
          } match {
            // Both levels were built by prepending, so reverse each before converting.
            case Accumulator(result, current) =>
              (current :: result).reverse.map(_.reverse.toArray)
          }

        items.foreach(item => println(new String(item)))
      }
    }
This solution is expected to perform poorly, though. How important is that for you? How many files will you read, of what size, and how often? If performance is important, then you should use input streams and mutable collections instead:

    import java.io.{BufferedInputStream, FileInputStream}
    import scala.collection.mutable.ArrayBuffer

    object Main {
      // Single-byte delimiter; a newline here.
      private val delimiter = '\n'.toByte

      def main(args: Array[String]): Unit = {
        val items = ArrayBuffer.empty[Array[Byte]]
        val item = ArrayBuffer.empty[Byte]
        val bis = new BufferedInputStream(new FileInputStream(args(0)))
        try {
          var nextByte: Int = -1
          // Scala assignments evaluate to Unit, so the read and the
          // end-of-stream check share one block.
          while ({ nextByte = bis.read(); nextByte != -1 }) {
            if (nextByte == delimiter) {
              items.append(item.toArray)
              item.clear()
            } else {
              item.append(nextByte.toByte)
            }
          }
          // Keep whatever follows the last delimiter.
          items.append(item.toArray)
        } finally bis.close()

        items.foreach(item => println(new String(item)))
      }
    }
Thanks Igor. If the delimiter was more than a single byte with multiple characters, how can this solution be extended/modified to achieve that?
– codeshark
Nov 15 '18 at 19:32
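For a delimiter longer than one byte, such as 'my-delimiter' from the question, one possible extension is to search for the whole delimiter sequence rather than compare single bytes. The following is only a sketch and not part of the answer above: the object name `DelimitedBytes` and the helper `split` are made up for illustration, and it assumes the file fits in memory and that the delimiter is UTF-8 encoded text.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.collection.mutable.ArrayBuffer

object DelimitedBytes {
  // Hypothetical helper: splits `data` at every occurrence of `delimiter`.
  def split(data: Array[Byte], delimiter: Array[Byte]): List[Array[Byte]] = {
    require(delimiter.nonEmpty, "delimiter must not be empty")
    val needle: Seq[Byte] = delimiter.toSeq
    val items = ArrayBuffer.empty[Array[Byte]]
    var start = 0                               // first byte of the current item
    var idx = data.indexOfSlice(needle, start)  // next delimiter position, or -1
    while (idx != -1) {
      items += data.slice(start, idx)
      start = idx + delimiter.length            // skip past the delimiter
      idx = data.indexOfSlice(needle, start)
    }
    items += data.slice(start, data.length)     // bytes after the last delimiter
    items.toList
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the delimiter is the UTF-8 encoding of "my-delimiter".
    val delimiter = "my-delimiter".getBytes(StandardCharsets.UTF_8)
    val data = Files.readAllBytes(Paths.get(args(0)))
    split(data, delimiter).foreach(item => println(item.length))
  }
}
```

Like the single-byte versions, this keeps empty items between adjacent delimiters and always emits the trailing item. For files too large to load at once, a streaming variant would need to buffer partial delimiter matches, which this sketch does not attempt.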
edited Nov 15 '18 at 10:20
answered Nov 15 '18 at 10:04
ygor