Compiler construction - Trying to understand the Lex syntax for Standard ML (ml-lex)


I'm writing a compiler. I'm at the first phase, trying to tokenize everything. I wrote it up, but I get an error. I've read the docs (SML/NJ) 3 or 4 times, and the error messages are not informative.

I think I must be messing up the state-change aspect of the program: it works fine for the rules that create tokens, but when I change state using YYBEGIN, it blows up.

Here is the lex file:

    type pos = int;
    type lexresult = Tokens.token;

    val linenum = ErrorMsg.lineNum;
    val linepos = ErrorMsg.linePos;
    val commentdepth = ref 0;

    fun inccom(cmdepth) = cmdepth := !cmdepth + 1;
    fun deccom(cmdepth) = cmdepth := !cmdepth - 1;

    fun err(p1,p2) = ErrorMsg.error p1;

    fun eof() = let val pos = hd(!linepos) in Tokens.EOF(pos,pos) end;

    %%
    digits=[0-9]+;
    %s COMMENT STRING;
    %%
    <INITIAL,COMMENT>\n  => (linenum := !linenum+1; linepos := yypos :: !linepos; continue());
    <INITIAL>"type"      => (Tokens.TYPE(yypos, yypos+4));
    <INITIAL>"var"       => (Tokens.VAR(yypos, yypos+3));
    <INITIAL>"function"  => (Tokens.FUNCTION(yypos, yypos+8));
    <INITIAL>"break"     => (Tokens.BREAK(yypos, yypos+5));
    <INITIAL>"of"        => (Tokens.OF(yypos, yypos+2));
    <INITIAL>"end"       => (Tokens.END(yypos, yypos+3));
    <INITIAL>"in"        => (Tokens.IN(yypos, yypos+2));
    <INITIAL>"nil"       => (Tokens.NIL(yypos, yypos+3));
    <INITIAL>"let"       => (Tokens.LET(yypos, yypos+3));
    <INITIAL>"do"        => (Tokens.DO(yypos, yypos+2));
    <INITIAL>"to"        => (Tokens.TO(yypos, yypos+2));
    <INITIAL>"for"       => (Tokens.FOR(yypos, yypos+3));
    <INITIAL>"while"     => (Tokens.WHILE(yypos, yypos+5));
    <INITIAL>"else"      => (Tokens.ELSE(yypos, yypos+4));
    <INITIAL>"then"      => (Tokens.THEN(yypos, yypos+4));
    <INITIAL>"if"        => (Tokens.IF(yypos, yypos+2));
    <INITIAL>"array"     => (Tokens.ARRAY(yypos, yypos+5));
    <INITIAL>":="        => (Tokens.ASSIGN(yypos, yypos+2));
    <INITIAL>"|"         => (Tokens.OR(yypos, yypos+1));
    <INITIAL>"&"         => (Tokens.AND(yypos, yypos+1));
    <INITIAL>">="        => (Tokens.GE(yypos, yypos+2));
    <INITIAL>">"         => (Tokens.GT(yypos, yypos+1));
    <INITIAL>"<="        => (Tokens.LE(yypos, yypos+2));
    <INITIAL>"<"         => (Tokens.LT(yypos, yypos+1));
    <INITIAL>"<>"        => (Tokens.NEQ(yypos, yypos+2));
    <INITIAL>"="         => (Tokens.EQ(yypos, yypos+1));
    <INITIAL>"/"         => (Tokens.DIVIDE(yypos, yypos+1));
    <INITIAL>"*"         => (Tokens.TIMES(yypos, yypos+1));
    <INITIAL>"-"         => (Tokens.MINUS(yypos, yypos+1));
    <INITIAL>"+"         => (Tokens.PLUS(yypos, yypos+1));
    <INITIAL>"."         => (Tokens.DOT(yypos, yypos+1));
    <INITIAL>"}"         => (Tokens.RBRACE(yypos, yypos+1));
    <INITIAL>"{"         => (Tokens.LBRACE(yypos, yypos+1));
    <INITIAL>"]"         => (Tokens.RBRACK(yypos, yypos+1));
    <INITIAL>"["         => (Tokens.LBRACK(yypos, yypos+1));
    <INITIAL>")"         => (Tokens.RPAREN(yypos, yypos+1));
    <INITIAL>"("         => (Tokens.LPAREN(yypos, yypos+1));
    <INITIAL>";"         => (Tokens.SEMICOLON(yypos, yypos+1));
    <INITIAL>":"         => (Tokens.COLON(yypos, yypos+1));
    <INITIAL>","         => (Tokens.COMMA(yypos, yypos+1));
    <INITIAL>{digits}    => (Tokens.INT(valOf(Int.fromString(yytext)), yypos, yypos + (size yytext)));
    <INITIAL>[a-z][a-z0-9_]* => (Tokens.ID(yytext, yypos, yypos + (size yytext)));
    <INITIAL>(").*(")    => (Tokens.STRING(yytext, yypos, yypos + (size yytext)));
    <INITIAL>"\""        => (YYBEGIN STRING; continue());
    <STRING>"\""         => (YYBEGIN INITIAL; continue());
    <INITIAL>"/*"        => (inccom commentdepth; YYBEGIN COMMENT; continue());
    <COMMENT>"/*"        => (inccom commentdepth; continue());
    <COMMENT>"*/"        => (print "other trace!\n"; deccom commentdepth; if !commentdepth <= 0 then YYBEGIN INITIAL else (); continue());
    <INITIAL,COMMENT>[\ \t]+ => (print "trace 22222\n"; continue());
    <INITIAL>.           => (ErrorMsg.error yypos ("illegal character " ^ yytext); continue());

And here is the source file I'm tokenizing:

var , 123 /* comment */ 234 "d" 

It doesn't handle comments, and it doesn't handle strings. Please help.

Edit: here is the updated lex file. I have pinpointed where it breaks. It detects the start of a new comment fine, switches to the comment state fine, and detects the space after the /* fine, but then it breaks and never gets to the point where it eats the int.

Comments are terminated by */, not *\ (<COMMENT>"*\\" =>). Also, surely you need a <COMMENT>. rule to deal with the text of the comment itself.
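A minimal sketch of what the comment rules could look like, assuming the %s COMMENT STRING; declaration and the commentdepth, inccom and deccom helpers from the lex file above. The new piece is the last rule, a <COMMENT>. catch-all that discards any single character of comment text. In ml-lex, . does not match a newline, so newlines inside comments still go to the existing <INITIAL,COMMENT>\n rule. Note also that ml-lex spells the built-ins YYBEGIN and INITIAL in upper case.

```sml
<COMMENT>"/*" => (inccom commentdepth; continue());
<COMMENT>"*/" => (deccom commentdepth;
                  if !commentdepth <= 0 then YYBEGIN INITIAL else ();
                  continue());
<COMMENT>.    => ((* any other character inside a comment is skipped *) continue());
```

ml-lex prefers the longest match, so */ still wins over the one-character catch-all; when two rules match the same length, the earlier rule wins, which is why the catch-all goes after the other comment rules.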

I don't see a lexical rule for the state <STRING>; if there isn't one, that's your problem with strings. Otherwise, I think the problem is in those rules.
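One way to handle strings, as a sketch: replace the (").*(") rule and the two existing "\"" rules with a small STRING-state machine that collects characters until the closing quote. strbuf and strstart are hypothetical names, not part of ml-lex; they would go in the user-declarations section, before the first %%. Escape sequences such as \n or \" inside literals are not handled here.

User declarations:

```sml
val strbuf = ref "";   (* hypothetical: characters of the current literal *)
val strstart = ref 0;  (* hypothetical: position of the opening quote *)
```

Rules:

```sml
<INITIAL>"\"" => (strbuf := ""; strstart := yypos; YYBEGIN STRING; continue());
<STRING>"\""  => (YYBEGIN INITIAL; Tokens.STRING(!strbuf, !strstart, yypos+1));
<STRING>\n    => (ErrorMsg.error yypos "unterminated string";
                  linenum := !linenum+1; linepos := yypos :: !linepos;
                  YYBEGIN INITIAL; continue());
<STRING>.     => (strbuf := !strbuf ^ yytext; continue());
```

Because the "\"" rule comes before the catch-all in the STRING state, a closing quote ends the literal instead of being appended, and the STRING token is returned only once the whole literal has been read. Appending with ^ one character at a time is quadratic in the literal's length, which is usually acceptable for a first lexer.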

Edit, based on the edited question (not the best use of SO, IMHO):

I'm not an expert in SML lexing, but it seems to me that you need rules to deal with the contents of comments and strings (as I said above in the first paragraph). In other words, there is no rule to apply in state <COMMENT> or state <STRING> when a character other than the terminating sequence (or, in the case of comments, whitespace) is encountered.

