Raw time series data storage after creation
I am designing a system that, among other things, will store Time Series data (until they somehow expire).
Data comes from different sensors that read vibrations at very high frequencies (10+ kHz). The system is designed for analytics, so reads will heavily dominate, especially range queries. Inserts will be rare: for each event, another middleware system records the metrics and saves them to separate .csv files, produced as output only after the event ends (so there is no possibility of "online" data ingestion).
Right now I am a little bit confused.
I first tried a NoSQL solution (Cassandra), using Pentaho DI as the ETL tool (I also have to add some extra info to each record/row, such as timestamp, sensor, etc.).
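To make the enrichment step concrete, here is a minimal Python sketch of what the transform has to do per row; the file contents, sensor ID, and event timestamp are made-up examples, not the actual data:

```python
import csv
import io

def enrich_rows(csv_text, sensor_id, event_ts):
    """Stream rows from one event's CSV, appending sensor id and event timestamp."""
    reader = csv.reader(io.StringIO(csv_text))
    for row in reader:
        # Each output row carries the metadata needed for later range queries.
        yield row + [sensor_id, event_ts]

# Hypothetical two-row vibration CSV produced by the middleware.
raw = "0.001,0.12\n0.002,0.15\n"
rows = list(enrich_rows(raw, "sensor-07", "2018-11-09T10:00:00Z"))
```

Streaming row by row like this keeps memory flat regardless of file size, which matters at 100M+ rows per day.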
For 1000M rows, the expected finish time was 32 (thirty-two) hours.
I am sure I messed something up while transforming the data, but I am still ready to bet that ETL + Cassandra is not the best fit for my problem (probably high-latency queries when reading).
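For scale, the quoted load time works out to a throughput in the single-digit thousands of rows per second, which is a level typical of row-at-a-time inserts rather than bulk loading. The arithmetic:

```python
# Back-of-the-envelope: 1000M rows in 32 hours.
rows = 1_000_000_000
seconds = 32 * 3600      # 115200 seconds
rate = rows / seconds    # roughly 8700 rows/s
```

A rate this low usually points at the pipeline (per-row round trips, no batching) rather than the storage engine itself.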
Cassandra is the best bet when the insertion rate is very high, and that is not my case.
What system should I prefer, then?
Just a quick recap of my constraints:
- High volume of data to be stored (100M rows per day)
- An ETL tool is needed (preferably open source)
- High read rate
- Low insertion rate (but large bulk loads during no-event periods)
- Not "powerful only when scaled" (I have only a single node available for this system)
- Strong data aging/retention policies
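As a sanity check on those constraints, a single-node, read-heavy workload with rare bulk inserts and time-based retention can be sketched against even a plain embedded store; this is not a recommendation, just a minimal illustration using SQLite with an index on the timestamp. The table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts REAL, sensor TEXT, value REAL)")
# An index on ts makes the read-heavy range queries cheap on a single node.
conn.execute("CREATE INDEX idx_ts ON readings (ts)")

# Rare, bulk insert: load a whole event's rows in one batch.
batch = [(t / 10000.0, "sensor-07", 0.1 * t) for t in range(1000)]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", batch)

# Range query over a time window.
hits = conn.execute(
    "SELECT COUNT(*) FROM readings WHERE ts >= ? AND ts < ?", (0.01, 0.02)
).fetchone()[0]

# Retention: age out everything older than a cutoff timestamp.
conn.execute("DELETE FROM readings WHERE ts < ?", (0.05,))
remaining = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

At real scale, partitioning by day (one table or file per day) makes retention a cheap drop of whole partitions instead of a large DELETE.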
EDIT FOR WHOEVER DOWNVOTED MY QUESTION
I certainly appreciate advice and corrections, but please don't just downvote questions without saying why!
database performance cassandra time-series etl
What exactly did you try with PDI? Could you share the KTRs/KJBs without exposing sensitive data? That could help a lot.
– Cristian Curti
14 hours ago
edited 15 hours ago
asked 19 hours ago
LucaF