c++ - Is there any way to get "\n" from streams?


I'm trying to read a file and convert it into a kind of data structure (the text is an "array" of paragraphs, a paragraph is an "array" of sentences, and a sentence is an "array" of words, which are char*).

To make it easy for myself I'm using data streams (ifstream, to be exact), and one of the problems I've met is detecting where a paragraph ends (two consecutive '\n' characters are considered the end of a paragraph). The simple way would be to go through the text char by char and check whether each one is a space or '\n', but that's long and kind of painful.

The code looks like this:

    std::ifstream fd(filename);
    char buffer[128];

    while (fd >> buffer)
    {
        /* code in here does things with buffer */
    }

And, well, it works, but it ignores paragraphs completely. fd.get(buffer, 128, '\n') doesn't work as needed either: it stops after reading once.
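Both symptoms have a known cause: `operator>>` skips all leading whitespace, newlines included, so the paragraph breaks never reach `buffer`; and `get(buffer, 128, '\n')` stops *before* the delimiter and leaves the `'\n'` in the stream, so the next call extracts zero characters and sets `failbit`, which ends the loop. A minimal sketch of a line reader built on `get` that consumes the delimiter itself (`readLineWithGet` is a hypothetical name, not a standard function):

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying the function out

// Hypothetical helper: reads one line into `buf` with istream::get and then
// consumes the '\n' that get() leaves behind. An empty line yields buf == "".
// Lines longer than size - 1 characters are returned in pieces.
// Returns false once there is nothing left to read.
bool readLineWithGet(std::istream &in, char *buf, std::streamsize size)
{
    buf[0] = '\0';
    if (!in.good())
        return false;

    in.get(buf, size, '\n');      // stops before '\n' and does not extract it
    if (in.fail()) {
        if (in.eof())
            return false;         // nothing extracted and no input left
        in.clear();               // empty line: get() stored nothing and set failbit
    }
    if (in.peek() == '\n')
        in.ignore(1);             // consume the delimiter ourselves
    return true;
}
```

Reading `"first\n\nsecond\n"` this way gives `"first"`, then `""` (the blank line, i.e. a paragraph break), then `"second"`, and then `false`.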

So, is there an easier way than reading char by char? I can't use getline() since the task forbids the use of vectors or strings.
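If only the paragraph breaks are the problem, one option (a sketch, not the only way) is to keep `>>` for the words and look only at the whitespace between them: peek at each whitespace character, count the `'\n'`s, and treat two or more as a paragraph break. Both function names below are hypothetical, and `in.width()` is set so `>>` cannot overflow the fixed 128-byte buffer:

```cpp
#include <cctype>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying the functions out

// Hypothetical helper: skips the whitespace before the next word and returns
// how many '\n' characters were skipped. Two or more means a blank line.
// Only whitespace is read char by char; words are still read with >>.
int skipSpaceCountNewlines(std::istream &in)
{
    int newlines = 0;
    std::istream::int_type c;
    while ((c = in.peek()) != std::istream::traits_type::eof()
           && std::isspace(c)) {
        if (c == '\n')
            ++newlines;
        in.get();                          // discard the whitespace character
    }
    return newlines;
}

// Hypothetical example: counts paragraphs while reading words into a char array.
int countParagraphs(std::istream &in)
{
    char word[128];
    int paragraphs = 0;
    bool newParagraph = true;
    while (true) {
        if (skipSpaceCountNewlines(in) >= 2)
            newParagraph = true;           // blank line seen: next word starts a paragraph
        in.width(sizeof word);             // keep >> within the buffer
        if (!(in >> word))
            break;
        if (newParagraph) {
            ++paragraphs;
            newParagraph = false;
        }
        // ... store `word` into the current sentence/paragraph here ...
    }
    return paragraphs;
}
```

Because `'\r'` is also whitespace, this counts `"\r\n\r\n"` as a blank line too.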

Update:

So it seems std::istream::getline may do the trick for me, but it still doesn't behave quite as expected. It reads, well, the first line, and after that something weird happens.

The code looks like this:

    std::ifstream fd(fl);
    char buffer[128];

    fd.getline(buffer, 128);
    std::cout << "555 - [" << buffer << "]" << std::endl;
    std::cout << fd.gcount() << std::endl;

    fd.getline(buffer, 128);
    std::cout << "777 - [" << buffer << "]" << std::endl;
    std::cout << fd.gcount() << std::endl;

And the output looks like this:

    ]55 - [text file
    23
    ]77 - [
    2

And, yeah, I don't think I understand what's going on.
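A likely explanation for that output (an assumption, since the input file isn't shown) is that the file has Windows-style `\r\n` line endings and no translation is happening, so `getline` stops at `'\n'` but keeps the `'\r'` at the end of `buffer`. Printing `'\r'` moves the cursor back to the start of the line, so the closing `]` overwrites the first `5`, and the second `gcount()` of 2 is the lone `'\r'` plus the extracted `'\n'`. A minimal sketch that strips the stray character (`getlineNoCR` is a hypothetical name):

```cpp
#include <cstring>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying the function out

// Hypothetical helper: like istream::getline, but also drops a trailing '\r'
// left behind by Windows-style "\r\n" line endings.
bool getlineNoCR(std::istream &in, char *buf, std::streamsize size)
{
    if (!in.getline(buf, size))
        return false;                 // end of input, or the line did not fit in buf
    std::size_t len = std::strlen(buf);
    if (len > 0 && buf[len - 1] == '\r')
        buf[len - 1] = '\0';
    return true;
}
```

With this, an empty `buf` after a successful call means a blank line, i.e. a paragraph break.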

From what I understand, you may not use any of the std containers.

So I think it's possible to:

  1. Read the entire file into a buffer
  2. Tokenize the buffer into paragraphs
  3. Tokenize each paragraph into sentences
  4. Tokenize each sentence into words

For the first part, you may use:

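    #include <fstream>

    //! Reads a whole file into a NUL-terminated buffer; the caller must delete[] it afterwards.
    char* readfile(const char *filename)
    {
        std::ifstream ifs(filename, std::ifstream::binary);
        if (!ifs.good())
            return nullptr;

        ifs.seekg(0, ifs.end);
        std::streamoff len = ifs.tellg();
        ifs.seekg(0, ifs.beg);
        if (len < 0)                         // tellg() reports -1 on failure
            return nullptr;

        // new[] throws std::bad_alloc instead of returning null, so no null check is needed;
        // the extra byte holds the terminator that the tokenizers (strstr, strlen) rely on
        char *buffer = new char[len + 1];
        if (!ifs.read(buffer, len)) {        // read() returns the stream, not a byte count
            delete[] buffer;
            return nullptr;
        }
        buffer[len] = '\0';
        return buffer;                       // ifs is closed by its destructor
    }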

With that function ready, we need to use it and tokenize the resulting string. For that, we must define our types (based on linked lists, using a C-like coding style):

    struct word {
        char *contents;
        word *next;
    };

    struct sentence {
        word *first;
        sentence *next;
    };

    struct paragraph {
        sentence *first;
        paragraph *next;
    };

    struct text {
        paragraph *first;
    };
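The nodes built from these types are allocated with `new` and never released in the example. A minimal cleanup sketch, assuming every node comes from `new` and every `contents` string from `new[]` (`destroytext` is a hypothetical name); it walks the lists with plain loops rather than recursion:

```cpp
struct word      { char *contents; word *next; };
struct sentence  { word *first; sentence *next; };
struct paragraph { sentence *first; paragraph *next; };
struct text      { paragraph *first; };

// Hypothetical cleanup helper: releases every word, sentence and paragraph
// node, then the text itself. Safe to call with nullptr.
void destroytext(text *t)
{
    if (!t) return;
    paragraph *p = t->first;
    while (p) {
        sentence *s = p->first;
        while (s) {
            word *w = s->first;
            while (w) {
                word *nextWord = w->next;
                delete[] w->contents;        // allocated with new[]
                delete w;
                w = nextWord;
            }
            sentence *nextSentence = s->next;
            delete s;
            s = nextSentence;
        }
        paragraph *nextParagraph = p->next;
        delete p;
        p = nextParagraph;
    }
    delete t;
}
```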

With the types defined, we can start reading our text:

    #include <cstring>

    //! Splits a sentence into as many word elements as possible
    void readsentence(const char *buffer, word **target)
    {
        if (!buffer) return;
        buffer += std::strspn(buffer, " \t\r\n");   // skip separators so no empty words are created
        if (*buffer == '\0') return;

        *target = new word;
        (*target)->next = nullptr;

        const char *end = std::strpbrk(buffer, " \t\r\n");
        std::size_t n = end ? (std::size_t)(end - buffer) : std::strlen(buffer);

        (*target)->contents = new char[n + 1];        // new[] so it can be freed with delete[]
        std::memcpy((*target)->contents, buffer, n);
        (*target)->contents[n] = '\0';

        if (end != nullptr)
            readsentence(end + 1, &(*target)->next);
    }

    //! Splits a paragraph into as many sentences as possible
    void readparagraph(const char *buffer, sentence **target)
    {
        if (!buffer) return;
        buffer += std::strspn(buffer, " \t\r\n");
        if (*buffer == '\0') return;

        *target = new sentence;
        (*target)->first = nullptr;
        (*target)->next = nullptr;

        const char *end = std::strpbrk(buffer, ".;:?!");
        if (end != nullptr) {
            std::size_t n = (std::size_t)(end - buffer) + 1;   // keep the punctuation mark
            char *t = new char[n + 1];
            std::memcpy(t, buffer, n);
            t[n] = '\0';
            readsentence(t, &(*target)->first);
            delete[] t;

            readparagraph(end + 1, &(*target)->next);
        }
        else {
            readsentence(buffer, &(*target)->first);
        }
    }

    //! Splits the text buffer into as many paragraphs as possible
    void readtext(const char *buffer, paragraph **target)
    {
        if (!buffer) return;
        buffer += std::strspn(buffer, " \t\r\n");   // drop extra blank lines between paragraphs
        if (*buffer == '\0') return;

        *target = new paragraph;
        (*target)->first = nullptr;
        (*target)->next = nullptr;

        // "\n\n" marks the end of a paragraph; pass the part before it to the sentence parser
        const char *end = std::strstr(buffer, "\n\n");
        if (end != nullptr) {
            std::size_t n = (std::size_t)(end - buffer);
            char *t = new char[n + 1];
            std::memcpy(t, buffer, n);
            t[n] = '\0';
            readparagraph(t, &(*target)->first);
            delete[] t;

            readtext(end + 2, &(*target)->next);
        }
        else {
            readparagraph(buffer, &(*target)->first);
        }
    }

    text* createtext(const char *contents)
    {
        text *result = new text;
        result->first = nullptr;
        readtext(contents, &result->first);
        return result;
    }
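One caveat, under the assumption that the input may come from Windows: `readfile` opens the file in binary mode, so `\r\n` line endings are kept and a blank line shows up as `"\r\n\r\n"`, which `strstr(buffer, "\n\n")` never matches. A small sketch that strips the `'\r'` characters in place (`removecarriagereturns` is a hypothetical helper); call it on the buffer returned by `readfile` before passing that buffer to `createtext`:

```cpp
#include <cstddef>

// Hypothetical helper: removes every '\r' from a NUL-terminated buffer in
// place, so "\r\n\r\n" becomes "\n\n". Returns the new length.
std::size_t removecarriagereturns(char *buffer)
{
    if (!buffer) return 0;
    std::size_t out = 0;
    for (std::size_t in = 0; buffer[in] != '\0'; ++in) {
        if (buffer[in] != '\r')
            buffer[out++] = buffer[in];   // shift kept characters left
    }
    buffer[out] = '\0';
    return out;
}
```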

As an example, you may use this:

    #include <iostream>

    int main()
    {
        char *buffer = readfile("mytext.txt");
        if (!buffer) {
            std::cerr << "could not read mytext.txt" << std::endl;
            return 1;
        }
        text *doc = createtext(buffer);
        delete[] buffer;   // the words hold their own copies, so the buffer can go

        for (paragraph *p = doc->first; p != nullptr; p = p->next) {
            for (sentence *s = p->first; s != nullptr; s = s->next) {
                for (word *w = s->first; w != nullptr; w = w->next) {
                    std::cout << w->contents << " ";
                }
            }
            std::cout << std::endl << std::endl;
        }

        return 0;
    }

Please keep in mind that this code might or might not work, since I did not test it.
