> E-book Formatter Available, Textbath for download
Fromwithin
post Nov 1 2005, 01:08 PM
Post #1





Group: Members
Posts: 59
Joined: 17-February 04
From: Wirral, UK
Member No.: 1,907



I thought I'd make this available, as someone mentioned the horror of ebook formatting in the opie-reader thread. By the way, one of the best features of opie-reader: reading gzipped text. I have everything in that format.

textbath zaurus executable

It's my Textbath program for re-formatting text files. It's getting pretty fat now. The only thing missing is that it doesn't convert HTML/XML entities fully yet (it will only replace about three of them at the moment).

At its most basic, it will remove all of the single line breaks and replace multi-line breaks with a single one, indenting new paragraphs if an indent is defined. Thus the text comes out much cleaner and easier to read in word-wrapping applications. It can decide when to add a line break or paragraph based on the length of the current line, start-of-line string matching, and capitalization. It can also re-join hyphenated lines, remove tabs and HTML/XML tags (and convert some HTML/XML entities), add <p></p> and <br> tags, convert all non-ASCII characters into pure ASCII (e.g. the © symbol into (C), that annoying binary apostrophe into an ASCII one, and all others), and display file stats.
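For anyone curious what the basic reflow amounts to, here is a minimal Python sketch. It is not Textbath's code: the function name `reflow` and its `indent` parameter are made up for illustration, and it only covers joining wrapped lines, collapsing runs of blank lines, and re-joining hyphenated words. It does not attempt the line-length, start-of-line, or capitalization heuristics described above.

```python
import re

def reflow(text: str, indent: str = "") -> str:
    """Join hard-wrapped lines into paragraphs (hypothetical helper)."""
    # Normalise line endings so the logic below only has to deal with "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Any run of blank (or whitespace-only) lines counts as one paragraph break.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    out = []
    for para in paragraphs:
        lines = [ln.strip() for ln in para.split("\n") if ln.strip()]
        joined = ""
        for ln in lines:
            if joined.endswith("-"):
                # Re-join a word that was hyphenated across a line break.
                joined = joined[:-1] + ln
            elif joined:
                joined += " " + ln
            else:
                joined = ln
        out.append(indent + joined)
    # One line per paragraph; a word-wrapping reader does the rest.
    return "\n".join(out)

sample = "The quick brown\nfox jumps over the la-\nzy dog.\n\n\n\nSecond para-\ngraph here.\n"
print(reflow(sample, indent="  "))
```

Note that this naive version also merges genuine hyphens that happen to fall at the end of a line (such as "well-" followed by "known"), so a real tool needs a smarter rule there.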

You can also use it to convert files between Unix/DOS/OldMac text formats without any editing. I don't know what I would have done without it.
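The line-ending conversion on its own is simple enough to sketch in a few lines of Python. Again, this is only an illustration and not how Textbath does it; `convert_newlines` and the target names `unix`, `dos`, and `oldmac` are invented here.

```python
def convert_newlines(data: bytes, target: str = "unix") -> bytes:
    """Convert between LF (Unix), CRLF (DOS) and CR (old Mac) line endings."""
    endings = {"unix": b"\n", "dos": b"\r\n", "oldmac": b"\r"}
    newline = endings[target]
    # Collapse everything to LF first. CRLF must be handled before a lone CR,
    # otherwise each DOS line ending would turn into two line breaks.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", newline)

mixed = b"a\r\nb\rc\n"
print(convert_newlines(mixed, "dos"))
```

Working on bytes rather than decoded text means the file's character encoding is left alone, which is safe for ASCII-compatible encodings.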

It needs to be run from the terminal. Let me know if you find it useful, or have any problems with it.
Fromwithin
post Nov 3 2005, 11:43 AM
Post #2








I've just updated it a bit. It now converts almost all numeric HTML entities, and the most common text-based ones (&quot; &lt; &gt; and all that). Use the link as above.
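For comparison, if you are working in Python rather than with Textbath, the standard library's `html.unescape` decodes both numeric and named entities. The wrapper name `decode_entities` below is just for illustration.

```python
import html

def decode_entities(text: str) -> str:
    """Decode numeric (&#169;, &#x2019;) and named (&quot;, &lt;) entities."""
    return html.unescape(text)

print(decode_entities("&quot;x&quot; &lt;b&gt; &#169; &#x2019;"))
```

The result still contains non-ASCII characters such as the copyright sign, so converting those to plain ASCII is a separate step.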
Fromwithin
post Nov 19 2005, 03:10 PM
Post #3








Updated again. It will now convert all HTML/XML entities that are in the CP-1252 or ISO 8859-1 range, and all non-ASCII characters will now convert to their nearest match or a relevant string. Previously, it would use a space when it didn't understand a character.
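The "nearest match or relevant string" idea can be approximated in Python with a small lookup table plus Unicode decomposition. This is a rough sketch under my own assumptions, not Textbath's actual table: `REPLACEMENTS`, `to_ascii`, and the `?` fallback are all hypothetical, and the table is nowhere near complete.

```python
import unicodedata

# Hypothetical, deliberately short table of "relevant strings".
REPLACEMENTS = {
    "\u00a9": "(C)", "\u00ae": "(R)", "\u2122": "(TM)",
    "\u2018": "'", "\u2019": "'",      # curly single quotes / apostrophe
    "\u201c": '"', "\u201d": '"',      # curly double quotes
    "\u2013": "-", "\u2014": "--",     # en dash, em dash
    "\u2026": "...",                   # ellipsis
    "\u00df": "ss", "\u00e6": "ae", "\u00c6": "AE",
}

def to_ascii(text: str, fallback: str = "?") -> str:
    """Replace each non-ASCII character with an ASCII approximation."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in REPLACEMENTS:
            out.append(REPLACEMENTS[ch])
        else:
            # NFKD splits e.g. an accented "e" into "e" plus a combining accent;
            # encoding with errors="ignore" keeps only the ASCII part.
            base = unicodedata.normalize("NFKD", ch).encode("ascii", "ignore").decode("ascii")
            out.append(base or fallback)
    return "".join(out)

print(to_ascii("Caf\u00e9 \u00a9 2005 \u2013 it\u2019s na\u00efve\u2026"))
```

Characters with no table entry and no ASCII decomposition fall through to the fallback string, which is the case where an explicit table entry would be needed.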

Also, when converting from HTML, it will look for certain tags as hints to format the text, thus retaining formatting.
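To give a feel for what using tags as formatting hints can look like, here is a small regex-based Python sketch. Which tags Textbath actually looks at is not stated above, so the choice of `<br>`, `<p>`, `<div>`, and headings here is purely an assumption, as is the function name `html_to_text`. Regexes are fine for tidy e-book markup but are not a real HTML parser.

```python
import re

def html_to_text(html: str) -> str:
    """Strip tags, using a few of them as line/paragraph hints (assumed set)."""
    # <br> becomes a single line break.
    text = re.sub(r"<br\s*/?>", "\n", html, flags=re.IGNORECASE)
    # Opening or closing block-level tags mark a paragraph boundary.
    text = re.sub(r"</?(p|div|h[1-6])\b[^>]*>", "\n\n", text, flags=re.IGNORECASE)
    # Every remaining tag is simply dropped.
    text = re.sub(r"<[^>]+>", "", text)
    # Tidy up: squeeze spaces, trim around breaks, cap blank-line runs.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r" ?\n ?", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

print(html_to_text("<h1>Title</h1><p>First <b>bold</b> line<br>second line</p><p>Next</p>"))
```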

There's not much else I can think of doing to it now. The only thing left on the list is to only allow line breaks on certain punctuation. After that, I don't know, so if anyone has any ideas let me know.

Give it a whirl.
kurochka
post Dec 1 2005, 08:53 AM
Post #4





Group: Members
Posts: 303
Joined: 6-February 04
Member No.: 1,740



I will try it tonight. Hopefully, it is simple enough for non-programmers.