C++ - Is there any way to get "\n" from streams?
I'm trying to work with a file and convert it into a kind of data structure (the text is an "array" of paragraphs, a paragraph is an "array" of sentences, and a sentence is an "array" of words, which are char*).
To make it easy for myself I'm using data streams (ifstream, to be exact), and one of the problems I've met is defining where a paragraph ends (two '\n' in a row are considered the end of a paragraph). The simple way would be to go through the text char by char and check whether each one is a space or '\n', but that's long and kind of painful.
The code looks like this:
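    std::ifstream fd(filename);
    char buffer[128];
    // std::setw(128) limits each read to 127 chars plus '\0' (needs <iomanip>)
    while (fd >> std::setw(128) >> buffer) {
        /* code in here does things with buffer */
    }

Well, it works, but it ignores paragraphs completely, since operator>> treats '\n' like any other whitespace. fd.get(buffer, 128, '\n') doesn't work as needed either: it cuts off after reading one time.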
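The reason fd.get(buffer, 128, '\n') seems to stop after one read is that istream::get stops *before* the delimiter and leaves the '\n' in the stream. The next call finds '\n' immediately, stores no characters and sets failbit, after which every further read fails. The member istream::getline avoids this because it extracts and discards the delimiter. If you want to stay with get(), a minimal sketch of consuming the delimiter yourself could look like this (read_line_with_get is a hypothetical name, and a std::istringstream stands in for the file):

```cpp
#include <cstring>  // std::strcmp, used in the usage example
#include <istream>
#include <sstream>  // std::istringstream, used in the usage example

// Hypothetical helper: reads one line into buf (at most size-1 chars) with
// istream::get, then discards the '\n' that get() leaves in the stream.
// Returns false when no more input could be read.
bool read_line_with_get(std::istream& in, char* buf, std::streamsize size)
{
    in.get(buf, size, '\n');     // stops before '\n' and does not extract it
    if (in.fail()) {
        // get() sets failbit when it stores no characters, e.g. on an empty
        // line; at end of input we are done, otherwise clear the flag.
        if (in.eof())
            return false;
        in.clear();
    }
    if (in.peek() == '\n')
        in.ignore(1);            // consume the delimiter ourselves
    return true;
}
```

With input "first\n\nsecond", successive calls yield "first", an empty string (the blank line that marks a paragraph break), "second", and then false. A line longer than size-1 characters is simply returned in pieces on consecutive calls.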
So, is there a way that's easier than reading char by char? I can't use std::getline() since the task forbids the use of vectors or strings.
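Note that only the free function std::getline needs a std::string; the member std::istream::getline writes into a char array, so it fits a no-strings rule. A minimal sketch of detecting paragraph breaks with it, assuming '\n' line endings and lines shorter than 128 characters (getline sets failbit on a longer line, which ends the loop). count_paragraphs is a hypothetical name; in a real parser the `++paragraphs` spot is where a new paragraph node would be started:

```cpp
#include <istream>
#include <sstream> // only for the std::istringstream usage example

// Hypothetical helper: counts paragraphs, where a paragraph is a run of
// non-blank lines and one or more blank lines ("\n\n" in the raw text)
// separate paragraphs. Uses the member std::istream::getline, which fills
// a char array, so no std::string is involved.
int count_paragraphs(std::istream& in)
{
    char line[128];
    int paragraphs = 0;
    bool inside = false;              // currently inside a paragraph?
    while (in.getline(line, sizeof line)) {
        if (line[0] == '\0') {
            inside = false;           // a blank line ends the paragraph
        } else if (!inside) {
            inside = true;            // first line of a new paragraph
            ++paragraphs;
        }
    }
    return paragraphs;
}
```

For example, "One. Two.\n\nThree.\nStill three.\n\n\nFour." counts as three paragraphs: several blank lines in a row still form a single break.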
Update
So it seems std::istream::getline may do the trick for me, but it's still not quite what I expected. It reads, well, the first line, and after that something weird happens.
The code looks like this:
    std::ifstream fd(fl);
    char buffer[128];
    fd.getline(buffer, 128);
    std::cout << "555 - [" << buffer << "]" << std::endl;
    std::cout << fd.gcount() << std::endl;
    fd.getline(buffer, 128);
    std::cout << "777 - [" << buffer << "]" << std::endl;
    std::cout << fd.gcount() << std::endl;

And the output looks like this:
    ]55 - [text file
    23
    ]77 - [
    2

And, yeah, I don't think I understand what's going on.
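This output is consistent with the file having Windows (CRLF, "\r\n") line endings while being read as if it had plain '\n' endings (for example, a file created on Windows and read on Linux or macOS). That is an inference from the output, not something the question confirms. getline stops at '\n' but keeps the '\r' in the buffer. When the terminal prints '\r' it returns the cursor to the start of the line, so the closing ] overwrites the first 5 or 7. It also explains why gcount() reports 2 for the seemingly empty second line: '\r' plus '\n'. A minimal sketch that strips the trailing '\r' after each read (getline_no_cr is a hypothetical name):

```cpp
#include <cstring>
#include <istream>
#include <sstream> // only for the std::istringstream usage example

// Hypothetical helper: like istream::getline, but also removes a trailing
// '\r' left behind when a CRLF ("\r\n") file is read with '\n' as delimiter.
// Returns false when getline fails (end of input or line too long).
bool getline_no_cr(std::istream& in, char* buf, std::streamsize size)
{
    if (!in.getline(buf, size))
        return false;
    std::size_t n = std::strlen(buf);
    if (n > 0 && buf[n - 1] == '\r')
        buf[n - 1] = '\0';            // drop the carriage return
    return true;
}
```

gcount() still reports what was extracted from the stream (including '\r' and '\n'); only the buffer contents are cleaned up.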
From what I understood, you may not use any of the std containers.
So I think this is possible:
- Read the entire file into a buffer
- Tokenize the buffer into paragraphs
- Tokenize each paragraph into sentences
- Tokenize each sentence into words
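One detail affects the second step: if the file uses Windows line endings and is read in binary mode, paragraphs are separated by "\r\n\r\n" rather than "\n\n", so a search for "\n\n" finds nothing. A minimal sketch that removes every '\r' from a NUL-terminated buffer in place, assuming a lone '\r' never carries meaning in the input (remove_cr is a hypothetical name; it could be called on the buffer right after reading the file and before splitting it):

```cpp
#include <cstddef>

// Hypothetical helper: deletes all '\r' characters from a NUL-terminated
// buffer in place, so CRLF line endings become plain '\n'.
// Returns the new length of the string.
std::size_t remove_cr(char* buf)
{
    char* out = buf;
    for (const char* in = buf; *in != '\0'; ++in) {
        if (*in != '\r')
            *out++ = *in;             // keep every character except '\r'
    }
    *out = '\0';
    return static_cast<std::size_t>(out - buf);
}
```

Because the output never runs ahead of the input, the compaction is safe to do in the same buffer without a second allocation.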
For the first part, you may use:
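    #include <cstddef>
    #include <fstream>

    //! Reads a whole file into a NUL-terminated buffer, which must be delete[]d afterwards.
    //! Returns nullptr if the file cannot be opened or read.
    char* readfile(const char *filename)
    {
        std::ifstream ifs(filename, std::ifstream::binary);
        if (!ifs.good())
            return nullptr;
        ifs.seekg(0, ifs.end);
        std::streamoff len = ifs.tellg();
        ifs.seekg(0, ifs.beg);
        if (len < 0)
            return nullptr;
        // new[] throws std::bad_alloc on failure rather than returning null
        char* buffer = new char[static_cast<std::size_t>(len) + 1];
        if (!ifs.read(buffer, len)) { // check that the entire file was read
            delete[] buffer;
            return nullptr;
        }
        buffer[len] = '\0'; // strstr/strpbrk in the parser rely on the terminator
        return buffer;      // ifs is closed by its destructor
    }

With this function ready, we need to use it and tokenize the string. For that, we must define our types (based on linked lists, written in a C style):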
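    // A text is a list of paragraphs, a paragraph is a list of sentences,
    // and a sentence is a list of words (each word is a C string).
    struct word      { char *contents; word *next; };
    struct sentence  { word *first; sentence *next; };
    struct paragraph { sentence *first; paragraph *next; };
    struct text      { paragraph *first; };

With the types defined, we can start reading our text: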
    #include <cstring>

    //! Splits a sentence into as many word elements as possible.
    void readsentence(char *buffer, std::size_t len, word **target)
    {
        if (!buffer || *buffer == '\0' || len == 0)
            return;
        // skip leading separators so that no empty words are created
        buffer += std::strspn(buffer, " \t\r\n");
        if (*buffer == '\0')
            return;
        *target = new word;
        (*target)->next = nullptr;
        char *end = std::strpbrk(buffer, " \t\r\n");
        if (end != nullptr) {
            (*target)->contents = new char[end - buffer + 1];
            std::strncpy((*target)->contents, buffer, end - buffer);
            (*target)->contents[end - buffer] = '\0';
            readsentence(end + 1, std::strlen(end + 1), &(*target)->next);
        } else {
            // last word: portable replacement for the MSVC-only _strdup
            (*target)->contents = new char[std::strlen(buffer) + 1];
            std::strcpy((*target)->contents, buffer);
        }
    }

    //! Splits a paragraph buffer into as many sentences as possible.
    void readparagraph(char *buffer, std::size_t len, sentence **target)
    {
        if (!buffer || *buffer == '\0' || len == 0)
            return;
        *target = new sentence;
        (*target)->first = nullptr; // stays null if the sentence has no words
        (*target)->next = nullptr;
        char *end = std::strpbrk(buffer, ".;:?!");
        if (end != nullptr) {
            // copy the sentence including its closing punctuation
            char *t = new char[end - buffer + 2];
            std::strncpy(t, buffer, end - buffer + 1);
            t[end - buffer + 1] = '\0';
            readsentence(t, (std::size_t)(end - buffer + 1), &(*target)->first);
            delete[] t;
            readparagraph(end + 1, len - (end - buffer + 1), &(*target)->next);
        } else {
            readsentence(buffer, len, &(*target)->first);
        }
    }

    //! Splits a text buffer into as many paragraphs as possible.
    //! Paragraphs are separated by "\n\n"; a CRLF file read in binary mode
    //! uses "\r\n\r\n" instead, so its '\r' characters would need removing first.
    void readtext(char *buffer, paragraph **target)
    {
        if (!buffer || *buffer == '\0')
            return;
        *target = new paragraph;
        (*target)->first = nullptr;
        (*target)->next = nullptr;
        char *end = std::strstr(buffer, "\n\n");
        // now we have a pointer to the end of the paragraph; pass it to the sentence parser
        if (end != nullptr) {
            char *t = new char[end - buffer + 1];
            std::strncpy(t, buffer, end - buffer);
            t[end - buffer] = '\0';
            readparagraph(t, (std::size_t)(end - buffer), &(*target)->first);
            delete[] t;
            readtext(end + 2, &(*target)->next);
        } else {
            readparagraph(buffer, std::strlen(buffer), &(*target)->first);
        }
    }

    text* createtext(char *contents)
    {
        text *t = new text; // not named "text": that would hide the type in "new text"
        t->first = nullptr;
        readtext(contents, &t->first);
        return t;
    }

As an example, you may use it like this:
    #include <iostream>

    int main()
    {
        char *buffer = readfile("mytext.txt");
        if (buffer == nullptr) {
            std::cerr << "could not read mytext.txt" << std::endl;
            return 1;
        }
        text *txt = createtext(buffer);
        delete[] buffer; // safe: every word was copied into its own array
        for (paragraph* p = txt->first; p != nullptr; p = p->next) {
            for (sentence* s = p->first; s != nullptr; s = s->next) {
                for (word* w = s->first; w != nullptr; w = w->next) {
                    std::cout << w->contents << " ";
                }
            }
            std::cout << std::endl << std::endl;
        }
        return 0;
    }

Please keep in mind that this code might or might not work, since I did not test it.
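The example above never releases the nodes it allocates. A minimal sketch of a cleanup routine for the same linked-list types, assuming every word's contents was allocated with new[] (a string from strdup/_strdup would need free() instead). freewords, freesentences and freetext are hypothetical names, and the struct definitions are repeated only so the snippet compiles on its own; drop them when pasting it next to the code above:

```cpp
// Same node types as in the answer (repeated so this compiles on its own).
struct word      { char *contents; word *next; };
struct sentence  { word *first; sentence *next; };
struct paragraph { sentence *first; paragraph *next; };
struct text      { paragraph *first; };

// Hypothetical cleanup helpers: walk each list iteratively and delete every
// node, assuming word contents were allocated with new[].
void freewords(word *w)
{
    while (w != nullptr) {
        word *next = w->next;
        delete[] w->contents;
        delete w;
        w = next;
    }
}

void freesentences(sentence *s)
{
    while (s != nullptr) {
        sentence *next = s->next;
        freewords(s->first);
        delete s;
        s = next;
    }
}

void freetext(text *t)
{
    if (t == nullptr)
        return;
    for (paragraph *p = t->first; p != nullptr; ) {
        paragraph *next = p->next;
        freesentences(p->first);
        delete p;
        p = next;
    }
    delete t;
}
```

Calling freetext on the result of createtext just before return 0 would release everything. The loops are iterative rather than recursive, so a very long text cannot exhaust the stack during cleanup.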