Custom parser for file similar to json python
I'm attempting to create a parser that translates a "custom" file format into JSON so I can more easily manipulate its contents (for argument's sake, call the "custom" format .qwerty).
I've already created a Lexer which breaks the file down into individual lexemes (tokens), each structured as [token_type, token_value]. Now I am struggling to parse the lexemes into their correct dictionaries: it is difficult to insert data into a sub-sub-dictionary because the keys aren't constant, and equally difficult to insert data into arrays stored in dictionaries.
Note that I am attempting to parse the tokens sequentially into an actual Python object (nested dicts and lists) and then dump that object as JSON.
An example of the file can be seen below, along with what the end result is meant to resemble.
FILE: ABC.querty
    Dict_abc_1{
        Dict_abc_2{
            HeaderGUID="";
            Version_TPI="999";
            EncryptionType="0";
        }
        Dict_abc_3{
            FamilyName="John Doe";
        }
        Dict_abc_4{
            Array_abc{
                {TimeStamp="2018-11-07 01:00:00"; otherinfo="";}
                {TimeStamp="2018-11-07 01:00:00"; otherinfo="";}
                {TimeStamp="2018-11-07 01:00:00"; otherinfo="";}
                {TimeStamp="2018-11-07 02:53:57"; otherinfo="";}
                {TimeStamp="2018-11-07 02:53:57"; otherinfo="";}
            }
            Dict_abc_5{
                LastContact="2018-11-08 01:00:00";
                BatteryStatus=99;
                BUStatus=PowerOn;
                LastCallTime="2018-11-08 01:12:46";
                LastSuccessPoll="2018-11-08 01:12:46";
                CallResult=Successful;
            }
        }
        Code=999999;
    }
FILE: ABC.json
    {
        "Dict_abc_1":{
            "Dict_abc_2":{
                "HeaderGUID":"",
                "Version_TPI":"999",
                "EncryptionType":"0"
            },
            "Dict_abc_3":{
                "FamilyName":"John Doe"
            },
            "Dict_abc_4":{
                "Array_abc":[
                    {"TimeStamp":"2018-11-07 01:00:00", "otherinfo":""},
                    {"TimeStamp":"2018-11-07 01:00:00", "otherinfo":""},
                    {"TimeStamp":"2018-11-07 01:00:00", "otherinfo":""},
                    {"TimeStamp":"2018-11-07 02:53:57", "otherinfo":""},
                    {"TimeStamp":"2018-11-07 02:53:57", "otherinfo":""}
                ],
                "Dict_abc_5":{
                    "LastContact":"2018-11-08 01:00:00",
                    "BatteryStatus":99,
                    "BUStatus":"PowerOn",
                    "LastCallTime":"2018-11-08 01:12:46",
                    "LastSuccessPoll":"2018-11-08 01:12:46",
                    "CallResult":"Successful"
                }
            },
            "Code":999999
        }
    }
Additional token information:
Token types can be one of the following (with possible values):
IDENTIFIER: contains the name of the variable identifier
VARIABLE: contains the actual data belonging to the parent IDENTIFIER
OPERATOR: equals "="
OPEN_BRACKET: equals "{"
CLOSE_BRACKET: equals "}"
An example of ABC.querty's lexemes can be seen HERE
Fundamental logical extract of main.py:
    import json

    def main():
        content = open_file(file_name)    ## read file
        lexer = Lexer(content)            ## create lexer class
        tokens = lexer.tokenize()         ## create lexemes as seen in pastebin
        parser = Parser(tokens).parse()   ## create parser class given tokens, then parse
        print(json.dumps(parser, sort_keys=True, indent=4, separators=(',', ': ')))
parser.py:
    import re

    class Parser(object):
        def __init__(self, tokens):
            self.tokens = tokens
            self.token_index = 0
            self.json_object = {}
            self.current_object = {}
            self.path = [self.json_object]

        def parse(self):
            while self.token_index < len(self.tokens):
                token = self.getToken()
                token_type = token[0]
                token_value = token[1]
                print("%s \t %s" % (token_type, token_value))
                # Compare with ==; `x in "IDENTIFIER"` is a substring test
                if token_type == "IDENTIFIER":
                    self.increment()
                    identifier_type = self.getToken()
                    if identifier_type[0] == "OPEN_BRACKET":
                        identifier_two_type = self.getToken(1)
                        if identifier_two_type[0] in ["OPERATOR", "IDENTIFIER"]:
                            ## make dict in current dict
                            pass
                        elif identifier_two_type[0] == "OPEN_BRACKET":
                            ## make array in current dict
                            pass
                    elif identifier_type[0] == "OPERATOR":
                        ## insert data into current dict
                        pass
                if token_type == "CLOSE_BRACKET":
                    identifier_type = self.getToken()
                    if identifier_type[0] == "OPEN_BRACKET":
                        # still in array of current dict
                        pass
                    elif identifier_type[0] == "IDENTIFIER":
                        self.changeDirectory()
                    else:
                        # end script
                        pass
                self.increment()
            print(self.path)
            return self.json_object

        def changeDirectory(self):
            if len(self.path) > 0:
                # pop() mutates the list in place; assigning its result back
                # would replace the list with the popped element
                self.path.pop()
                self.current_object = -1

        def increment(self):
            if self.token_index < len(self.tokens):
                self.token_index += 1

        def getToken(self, x=0):
            return self.tokens[self.token_index + x]
Additional parse information:
Currently, I am trying to store the current dictionary in a path array so that I can insert into nested dictionaries and into arrays within dictionaries.
Any suggestions or solutions are very much appreciated.
Thanks.
python json algorithm parsing lexical-analysis
In querty files there doesn't seem to be a way to distinguish empty dictionaries from empty arrays. Maybe you want to fix that?
– Matt Timmermans
Nov 13 '18 at 4:56
Unfortunately, it's not my format @MattTimmermans but thanks for the heads up.
– AlexMika
Nov 13 '18 at 23:34
asked Nov 13 '18 at 0:38 by AlexMika
1 Answer
Last time I solved this problem, I found that a finite-state machine was very helpful. I want to recommend an approach for after you have the tokens, though I don't know what it's called in English (it closely resembles what is usually called shift-reduce parsing). The principle is: you go through the tokens and push them onto a stack one by one. After each push, you check the top of the stack against a set of rules. In this way you combine primitive tokens into expressions, which may in turn be parts of more complex expressions.
For example, take "FamilyName":"John Doe". The tokens are "FamilyName", : and "John Doe".
You push the first token onto the stack: stack = ["FamilyName"].
Rule 1: str_obj -> E. So you create Expression(type='str', value="FamilyName"), and the stack is now stack = [Expression].
Then you push the next token: stack = [Expression, ':']. There are no rules for ':', so move on.
stack = [Expression, ':', "John Doe"]. Rule 1 matches again, so the stack becomes stack = [Expression, ':', Expression]. Now another rule matches. Rule 2: E:E -> E. Apply it like Expression(type='kv_pair', value=(Expression, Expression)), and the stack becomes stack = [Expression].
If you describe all the rules this way, the parser will work like that. Hope it helps.
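As a rough illustration of the two rules walked through above, here is a minimal Python sketch. It is only one way the idea could look, not a complete parser for the question's format. The `Expression` namedtuple and the helper names `is_string_token`, `reduce_stack` and `parse` are made up for this example, and the tokens are assumed to be plain strings (with quoted strings keeping their quotes) rather than the [token_type, token_value] pairs produced by the question's Lexer.

```python
from collections import namedtuple

# Hypothetical node type; the answer only calls it "Expression".
Expression = namedtuple("Expression", ["type", "value"])


def is_string_token(item):
    """True for a raw, still-quoted string token such as '"John Doe"'."""
    return (
        isinstance(item, str)
        and len(item) >= 2
        and item[0] == '"'
        and item[-1] == '"'
    )


def is_str_expression(item):
    """True for an Expression that was produced by rule 1."""
    return isinstance(item, Expression) and item.type == "str"


def reduce_stack(stack):
    """Apply the rules to the top of the stack until none of them matches."""
    changed = True
    while changed:
        changed = False
        # Rule 1: str_obj -> E
        if stack and is_string_token(stack[-1]):
            stack[-1] = Expression("str", stack[-1][1:-1])  # strip the quotes
            changed = True
        # Rule 2: E : E -> E
        elif (
            len(stack) >= 3
            and is_str_expression(stack[-3])
            and stack[-2] == ":"
            and is_str_expression(stack[-1])
        ):
            value = stack.pop()
            stack.pop()  # discard the ':' token
            key = stack.pop()
            stack.append(Expression("kv_pair", (key, value)))
            changed = True


def parse(tokens):
    """Push each token onto the stack (shift), then try the rules (reduce)."""
    stack = []
    for token in tokens:
        stack.append(token)
        reduce_stack(stack)
    return stack


result = parse(['"FamilyName"', ":", '"John Doe"'])
print(result)
```

With these three tokens the stack should end up holding a single `kv_pair` expression. Supporting nested dictionaries and arrays would mean adding further rules in `reduce_stack`, for example one that folds a run of `kv_pair` expressions between an opening and a closing bracket into a dictionary expression.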
answered Nov 13 '18 at 1:09 by sashaaero