Author Topic: E-book Formatter Available  (Read 2944 times)

Fromwithin

  • Jr. Member
  • **
  • Posts: 59
    • http://fromwithin.com/
E-book Formatter Available
« on: November 01, 2005, 04:08:40 pm »
I thought I'd make this available, as someone mentioned the horror of ebook formatting in the opie-reader thread. By the way, one of the best features of opie-reader: reading gzipped text. I have everything in that format.

textbath zaurus executable

It's my Textbath program for re-formatting text files. It's getting pretty fat now. The only thing missing is that it doesn't convert HTML/XML entities fully yet (it will only replace about three of them at the moment).

At its most basic, it will remove all of the line-breaks and replace multi-line breaks with a single one, indenting new paragraphs if an indent is defined. Thus the text comes out much cleaner and easier to read in word-wrapping applications. It can decide when to add a line-break or paragraph based on the length of the current line, start-of-line string matching, and capitalization. It can also re-join hyphenated lines, remove tabs and HTML/XML tags (and convert some HTML/XML entities), add <p></p> and <br> tags, convert all non-ASCII characters into pure ASCII (e.g. the © symbol into (C), that annoying binary apostrophe into an ASCII one, and all others), and display file stats.
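
For anyone curious what the most basic pass might look like, here is a rough Python sketch. It is not Textbath's actual code: the function name reflow and its indent parameter are made up for illustration, and it only covers joining wrapped lines, collapsing multi-line breaks, indenting paragraphs, and re-joining hyphenated lines.

```python
import re


def reflow(text, indent="    "):
    """Join hard-wrapped lines into paragraphs (hypothetical helper)."""
    # Normalise line endings first so the paragraph split is reliable.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank lines (possibly holding spaces or tabs) end a paragraph.
    blocks = re.split(r"\n[ \t]*\n+", text.strip())
    paragraphs = []
    for block in blocks:
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        if not lines:
            continue
        joined = lines[0]
        for ln in lines[1:]:
            if joined.endswith("-") and ln[0].islower():
                # Re-join a word that was hyphenated across a line break.
                joined = joined[:-1] + ln
            else:
                joined += " " + ln
        # Each paragraph ends up on one line, optionally indented.
        paragraphs.append(indent + joined)
    return "\n".join(paragraphs) + "\n"


if __name__ == "__main__":
    sample = "The quick brown\nfox jumps over the la-\nzy dog.\n\n\nSecond para-\ngraph here.\n"
    print(reflow(sample, indent="  "), end="")
```

Note that the hyphen rule here is naive: it would also glue together a genuine compound such as "well-" / "known" split at the hyphen, which is presumably why the real program also looks at line length and capitalization.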

You can also use it to convert files between Unix/DOS/OldMac text formats without any editing. I don't know what I would have done without it.
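
As an illustration of the line-ending conversion (again a sketch, not how Textbath itself does it), the idea can be written in a few lines of Python. The function name convert_newlines and the target names "unix", "dos" and "oldmac" are assumptions chosen for this example.

```python
def convert_newlines(data: bytes, target: str = "unix") -> bytes:
    """Convert between Unix (LF), DOS (CRLF) and old Mac (CR) line endings."""
    endings = {"unix": b"\n", "dos": b"\r\n", "oldmac": b"\r"}
    newline = endings[target]
    # Collapse everything to LF first; CRLF must be handled before a lone CR,
    # otherwise each DOS line ending would turn into two line breaks.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", newline)


if __name__ == "__main__":
    mixed = b"dos line\r\nold mac line\runix line\n"
    print(convert_newlines(mixed, "dos"))
```

Working on bytes rather than decoded text means the file's character encoding is left untouched.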

It needs to be run from the terminal. Let me know if you find it useful, or have any problems with it.
« Last Edit: November 03, 2005, 02:49:01 pm by Fromwithin »

Fromwithin

« Reply #1 on: November 03, 2005, 02:43:54 pm »
I've just updated it a bit. It now converts almost all numeric HTML entities, and the most common text-based ones (&quot; &lt; &gt; and all that). Use the same link as above.
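
For comparison, entity decoding of this kind is available in Python's standard library through html.unescape, which handles both numeric and named references. This is only a point of reference, not a description of Textbath's internals; the wrapper name decode_entities is made up for the example.

```python
import html


def decode_entities(text: str) -> str:
    """Replace numeric (&#169;, &#xA9;) and named (&quot;) references."""
    return html.unescape(text)


if __name__ == "__main__":
    print(decode_entities("&quot;Tom &amp; Jerry&quot; &lt;1&gt; &#169;"))
    # prints: "Tom & Jerry" <1> ©
```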

Fromwithin

« Reply #2 on: November 19, 2005, 06:10:47 pm »
Updated again. It will now convert all HTML/XML entities that are in the CP-1252 or ISO 8859-1 range, and all non-ASCII chars will now convert to their nearest match or a relevant string. Previously, it would use a space when it didn't understand a character.
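
To give an idea of what "nearest match or relevant string" could mean in practice, here is a small Python sketch. It is an assumption about the general approach, not Textbath's actual table: the REPLACEMENTS mapping, the to_ascii name and the "?" fallback are all invented for illustration.

```python
import unicodedata

# Hypothetical table of characters that need a whole string, not a single letter.
REPLACEMENTS = {
    "\u00a9": "(C)",   # copyright sign
    "\u00ae": "(R)",   # registered sign
    "\u2018": "'",     # left single quote
    "\u2019": "'",     # right single quote / curly apostrophe
    "\u201c": '"',     # left double quote
    "\u201d": '"',     # right double quote
    "\u2013": "-",     # en dash
    "\u2026": "...",   # ellipsis
}


def to_ascii(text, fallback="?"):
    """Map every non-ASCII character to an ASCII stand-in."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in REPLACEMENTS:
            out.append(REPLACEMENTS[ch])
        else:
            # Decompose accented letters (e.g. e-acute into e plus a combining
            # accent) and keep only the ASCII part.
            base = unicodedata.normalize("NFKD", ch).encode("ascii", "ignore").decode("ascii")
            out.append(base or fallback)
    return "".join(out)


if __name__ == "__main__":
    print(to_ascii("caf\u00e9 \u00a9 2005 \u2019tis"))
    # prints: cafe (C) 2005 'tis
```

Characters with no decomposition and no table entry fall through to the fallback, so a fuller table is what makes the difference between a "nearest match" and a placeholder.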

Also, when converting from HTML, it will look for certain tags as hints to format the text, thus retaining formatting.

There's not much else I can think of doing to it now. The only thing left on the list is to allow line breaks only on certain punctuation. After that, I don't know, so if anyone has any ideas, let me know.

Give it a whirl.

kurochka

  • Sr. Member
  • ****
  • Posts: 301
« Reply #3 on: December 01, 2005, 11:53:36 am »
I will try it tonight. Hopefully, it is simple enough for non-programmers.
SL-C3100 (from PriceJapan.com): modified Sharp Rom (couldn't make Japanese input work in Cacko Rom)

ex-SL-C3000; ex-SL-5600; ex-Simpad