I thought I'd make this available, as someone mentioned the horror of ebook formatting in the opie-reader thread. By the way, one of the best features of opie-reader: reading gzipped text. I have everything in that format.
textbath zaurus executable
It's my Textbath program for re-formatting text files. It's getting pretty fat now. The only thing missing is that it doesn't fully convert HTML/XML entities yet (it only replaces about three of them at the moment).
At its most basic, it will remove all of the single line-breaks and replace multi-line breaks with a single one, indenting new paragraphs if defined. Thus the text comes out much cleaner and easier to read in word-wrapping applications. It can decide when to add a line-break or paragraph based on the length of the current line, start-of-line string matching, and capitalization. It can also re-join hyphenated lines, remove tabs and HTML/XML tags (and convert some HTML/XML entities), add <p></p> and <br> tags, convert all non-ASCII characters into pure ASCII (e.g. the © symbol into (C), that annoying binary apostrophe into an ASCII one, and all others), and display file stats.
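For anyone curious what the basic pass amounts to, here is a rough Python sketch of the idea. It is not Textbath's actual code: the function name `reflow` and the `indent` parameter are made up for illustration, and it only covers joining wrapped lines, collapsing blank-line runs, and re-joining hyphenated words. The line-length, start-of-line, and capitalization checks are left out.

```python
import re

def reflow(text, indent="    "):
    """Join hard-wrapped lines into paragraphs (hypothetical helper, not Textbath itself)."""
    # Normalize DOS and old Mac line endings to Unix so the rules below see only "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A run of one or more blank (or whitespace-only) lines marks a paragraph boundary.
    paragraphs = re.split(r"\n(?:[ \t]*\n)+", text.strip())
    out = []
    for para in paragraphs:
        # Re-join words hyphenated across a line break: "for-\nmatting" -> "formatting".
        para = re.sub(r"(\w)-\n[ \t]*(\w)", r"\1\2", para)
        # Turn each remaining single line break into one space.
        para = re.sub(r"[ \t]*\n[ \t]*", " ", para)
        out.append(indent + para.strip())
    # One line per paragraph, each starting with the chosen indent.
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    sample = "This is a for-\nmatted line\nof text.\n\n\nSecond para-\ngraph here.\n"
    print(reflow(sample, indent="  "), end="")
```

Note that the hyphen rule here is naive: a genuine compound that happens to break at the hyphen (like "well-" / "known") would be glued together too, which is one reason a real tool needs more checks than this.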
You can also use it to convert files between Unix/DOS/OldMac text formats without any editing. I don't know what I would have done without it.
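The line-ending conversion on its own is simple enough to sketch in a few lines of Python. Again, this is only an illustration of the idea, not how Textbath does it; `convert_line_endings` and its `target` argument are names I made up for the example.

```python
def convert_line_endings(data: bytes, target: str = "unix") -> bytes:
    """Convert between Unix (LF), DOS (CRLF) and old Mac (CR) line endings.

    Hypothetical helper for illustration; works on bytes so no other
    character in the file is touched.
    """
    endings = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}
    newline = endings[target]  # raises KeyError for an unknown target
    # First bring everything to LF. CRLF must be handled before lone CR,
    # otherwise each DOS line ending would turn into two line breaks.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Then expand LF to whichever ending was asked for.
    return unified.replace(b"\n", newline)

if __name__ == "__main__":
    mixed = b"first\r\nsecond\rthird\n"
    print(convert_line_endings(mixed, "dos"))
```

Working on raw bytes means a gzipped or otherwise binary file should not be fed to it directly; unpack it first.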
It needs to be run from the terminal. Let me know if you find it useful, or have any problems with it.