Explain Stream Tokenizer



Stream Tokenizer :

StreamTokenizer is a direct subclass of the Object class and is part of the java.io package. The StreamTokenizer class takes an input stream and parses it into "tokens", allowing the tokens to be read one at a time. The parsing process is controlled by a syntax table and a number of flags that can be set to various states. The type of each token is reported through integer constants such as TT_WORD, TT_EOF and TT_EOL. The StreamTokenizer can recognize identifiers, numbers, quoted strings and various comment styles. Tokenizing is a common step in compilers, interpreters and parsers.
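As a rough illustration of reading tokens one at a time, the sketch below runs a StreamTokenizer with its default settings over a short string. The class name TokenDump, the describe helper and the sample input are assumptions made for this example and are not part of java.io.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; only StreamTokenizer and StringReader come from java.io.
class TokenDump {

    // Reads every token from the text and returns a readable label for each one.
    static List<String> describe(String text) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    out.add("WORD(" + st.sval + ")");      // word text is in sval
                    break;
                case StreamTokenizer.TT_NUMBER:
                    out.add("NUMBER(" + st.nval + ")");    // numeric value is in nval
                    break;
                case '"':
                case '\'':
                    out.add("STRING(" + st.sval + ")");    // quoted body is in sval
                    break;
                default:
                    out.add("CHAR(" + (char) st.ttype + ")"); // ordinary single character
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String token : describe("total = 42 \"two words\"")) {
            System.out.println(token);
        }
    }
}
```

With the default settings this should print WORD(total), CHAR(=), NUMBER(42.0) and STRING(two words), one per line.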

A stream can contain three types of tokens.

  • Words (that is, multi-character tokens)
  • Single-character tokens
  • Whitespace (including C/C++/Java-style comments)
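The three categories above can be seen in a small sketch: words and single characters come back as tokens, while whitespace and comments are skipped silently. The class name TokenKinds, the count helper and the sample text are assumptions for illustration. Recognition of // and /* */ comments is switched on explicitly here with slashSlashComments and slashStarComments.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Hypothetical demo class used only for this illustration.
class TokenKinds {

    // Returns {wordCount, singleCharCount}; whitespace and comments yield no tokens.
    static int[] count(String text) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.slashSlashComments(true);   // skip // comments
        st.slashStarComments(true);    // skip /* ... */ comments
        int words = 0;
        int singles = 0;
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                words++;
            } else if (st.ttype >= 0 && st.ttype != '"' && st.ttype != '\'') {
                // A non-negative ttype is the character itself; quote
                // characters mark quoted strings, so they are excluded.
                singles++;
            }
        }
        return new int[] { words, singles };
    }

    public static void main(String[] args) throws IOException {
        int[] result = count("x /* note */ + y // tail\n= z");
        System.out.println("words=" + result[0] + " single characters=" + result[1]);
    }
}
```

For the sample text the words are x, y and z, and the single-character tokens are + and =, so the program should report 3 words and 2 single characters.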

Some constants, defined in StreamTokenizer, used as flags to identify the tokens :

int TT_EOL : A constant indicating that the end of the line has been read.

int TT_EOF : A constant indicating that the end of the stream has been read.

int TT_WORD : A constant indicating that a word token has been read.

int ttype : A field, not a constant. After a call to the nextToken method, this field contains the type of the token just read.
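A minimal sketch of how ttype can be compared against these constants after each call to nextToken is given here. The class name TokenTypes and the name and trace helpers are hypothetical. Note that TT_EOL is only returned when eolIsSignificant(true) has been called; otherwise line ends are treated as ordinary whitespace.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Hypothetical demo class used only for this illustration.
class TokenTypes {

    // Maps a ttype value to the name of the matching constant.
    static String name(int ttype) {
        switch (ttype) {
            case StreamTokenizer.TT_EOF:    return "TT_EOF";
            case StreamTokenizer.TT_EOL:    return "TT_EOL";
            case StreamTokenizer.TT_WORD:   return "TT_WORD";
            case StreamTokenizer.TT_NUMBER: return "TT_NUMBER";
            default:                        return "'" + (char) ttype + "'";
        }
    }

    // Returns the sequence of token types in the text, ending with TT_EOF.
    static String trace(String text) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.eolIsSignificant(true);   // report line ends as TT_EOL tokens
        StringBuilder sb = new StringBuilder();
        do {
            st.nextToken();          // updates st.ttype
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(name(st.ttype));
        } while (st.ttype != StreamTokenizer.TT_EOF);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(trace("one two\nthree"));
    }
}
```

For the input "one two\nthree" this should print TT_WORD TT_WORD TT_EOL TT_WORD TT_EOF.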

Aim : To count the number of words in a file using StreamTokenizer, with whitespace as the delimiter. The file name is passed as a command-line argument.

A sample program using StreamTokenizer follows :

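import java.io.*;

public class StreamTokenizerDemo {
    static int words = 0;

    public static void wordCount(Reader r) throws IOException {
        StreamTokenizer st = new StreamTokenizer(r);
        // treat every character with code 33 to 255 as a word character
        st.wordChars(33, 255);

        // read tokens until the end of the stream
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            // count only word tokens; a purely numeric token
            // is reported as TT_NUMBER and is not counted here
            if (st.ttype == StreamTokenizer.TT_WORD)
                words++;
        }
    }

    public static void main(String args[]) throws IOException {
        // the file name is passed as a command-line argument;
        // try-with-resources closes the reader when done
        try (FileReader fr = new FileReader(args[0])) {
            wordCount(fr);
        }
        System.out.println("Total words in file : " + words);
    }
}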

Method signature of wordChars() :

public void wordChars(int low, int high);

Specifies that all characters in the range low to high, inclusive, are word constituents. A word token consists of a word constituent followed by zero or more word constituents or number constituents.
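The effect of wordChars can be sketched by comparing the default syntax table with one where the underscore has been added as a word constituent. The class name WordCharsDemo, the words helper and the sample input are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class used only for this illustration.
class WordCharsDemo {

    // Collects the word tokens in the text, optionally treating '_' as a word character.
    static List<String> words(String text, boolean underscoreIsWordChar) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        if (underscoreIsWordChar) {
            st.wordChars('_', '_');   // a range containing only the underscore
        }
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                out.add(st.sval);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(words("user_name abc123", false));
        System.out.println(words("user_name abc123", true));
    }
}
```

By default the underscore is an ordinary character, so the first call should yield [user, name, abc123]; after wordChars('_', '_') the second call should yield [user_name, abc123]. In both cases abc123 stays a single word, because number constituents may follow a word constituent.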
