Nov 1 2005, 01:08 PM
Post
#1
Group: Members Posts: 59 Joined: 17-February 04 From: Wirral, UK Member No.: 1,907
I thought I'd make this available, as someone mentioned the horror of ebook formatting in the opie-reader thread. By the way, one of the best features of opie-reader: reading gzipped text. I have everything in that format.
textbath zaurus executable

It's my Textbath program for re-formatting text files. It's getting pretty fat now. The only thing missing is that it doesn't convert HTML/XML entities fully yet (it will only replace about three of them at the moment).

At its most basic, it removes all single line breaks and replaces each run of multiple line breaks with a single one, indenting new paragraphs if an indent is defined. Thus the text comes out much cleaner and easier to read in word-wrapping applications. It can decide when to add a line break or paragraph based on the length of the current line, start-of-line string matching, and capitalization. It can also re-join hyphenated lines, remove tabs and HTML/XML tags (and convert some HTML/XML entities), add <p></p> and <br> tags, convert all non-ASCII characters into pure ASCII (e.g. the © symbol into (C), that annoying binary apostrophe into an ASCII one, and all others), and display file stats. You can also use it to convert files between Unix/DOS/OldMac text formats without any editing.

I don't know what I would have done without it. It needs to be run from the terminal. Let me know if you find it useful, or have any problems with it.
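For anyone curious how this kind of reflow can work, here is a minimal Python sketch. It is not Textbath's actual code, and the names `reflow`, `indent` and `short_line` are made up for illustration. It assumes that blank lines separate paragraphs, that a line shorter than `short_line` characters ends a paragraph, and that a trailing hyphen followed by a lowercase letter marks a word split across two lines.

```python
def reflow(text, indent="  ", short_line=50):
    """Join hard-wrapped lines into one line per paragraph (illustrative only)."""
    # Normalise DOS (\r\n) and old Mac (\r) line endings to Unix (\n).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = []
    current = ""
    for raw in text.split("\n"):
        # Tabs become spaces; leading/trailing whitespace is dropped.
        line = raw.replace("\t", " ").strip()
        if not line:
            # One or more blank lines close the current paragraph.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-") and line[0].islower():
            # Re-join a word that was hyphenated across the line break.
            current = current[:-1] + line
        elif current:
            current += " " + line
        else:
            current = line
        # A short line (that is not a hyphenated split) ends the paragraph.
        if len(line) < short_line and not line.endswith("-"):
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return "\n".join(indent + p for p in paragraphs)


if __name__ == "__main__":
    wrapped = "A hard-wrapped para-\ngraph that runs over\nthree lines.\n\n\nNext one."
    print(reflow(wrapped, short_line=15))
```

One known weakness of this heuristic: a genuine compound such as "well-" / "known" split at a line end gets joined into "wellknown", so a real tool would need a smarter check there.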
Nov 3 2005, 11:43 AM
Post
#2
Group: Members Posts: 59 Joined: 17-February 04 From: Wirral, UK Member No.: 1,907
I've just updated it a bit: it now converts almost all numeric HTML entities, and the most common text-based ones (&quot; &lt; &gt; and all that). Use the same link as above.
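As an illustration of the entity conversion described here, a small Python sketch follows. It is an assumption about how such a step could be written, not the program's real implementation; `decode_entities` and the `NAMED` table are hypothetical names, and the table only covers a handful of named entities.

```python
import re

# A deliberately small table of named entities (illustrative, not complete).
NAMED = {"quot": '"', "amp": "&", "lt": "<", "gt": ">", "apos": "'"}

# Matches &#123; (decimal), &#x7B; (hex) and &name; forms.
_ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text):
    """Replace numeric and a few named HTML/XML entities with their characters."""
    def repl(match):
        body = match.group(1)
        if body.startswith("#"):
            # Numeric reference: hex if it starts with #x, otherwise decimal.
            code = int(body[2:], 16) if body[1] in "xX" else int(body[1:])
            try:
                return chr(code)
            except (ValueError, OverflowError):
                return match.group(0)  # out of range: leave untouched
        # Named reference: unknown names are left as they are.
        return NAMED.get(body, match.group(0))
    return _ENTITY.sub(repl, text)


if __name__ == "__main__":
    print(decode_entities("&quot;a &lt; b&quot; &amp; &#169; 2005"))
```

Because the substitution is a single pass, `&amp;lt;` comes out as the literal text `&lt;` rather than being decoded twice. In Python specifically, the standard library's `html.unescape` already covers the full set of named entities, so a hand-written table like this is only useful for showing the idea.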
Nov 19 2005, 03:10 PM
Post
#3
Group: Members Posts: 59 Joined: 17-February 04 From: Wirral, UK Member No.: 1,907
Updated again. It will now convert all HTML/XML entities that are in the CP-1252 or ISO 8859-1 range, and all non-ASCII characters now convert to their nearest match or a relevant string. Previously, it would use a space when it didn't understand a character.
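To show what "nearest match or relevant string" can look like in practice, here is a hedged Python sketch. It is not how Textbath does it: the `SPECIAL` table, the `to_ascii` name and the `?` fallback are all assumptions for illustration. The idea is a small hand-made table for symbols that need a string, then Unicode decomposition to strip accents from everything else.

```python
import unicodedata

# Hand-picked replacements for characters that need a string, not a letter.
SPECIAL = {
    "\u00a9": "(C)", "\u00ae": "(R)", "\u2122": "(TM)",
    "\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"',
    "\u2013": "-", "\u2014": "--", "\u2026": "...",
    "\u00e6": "ae", "\u00c6": "AE", "\u00df": "ss",
}


def to_ascii(text, unknown="?"):
    """Map every non-ASCII character to an ASCII approximation."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in SPECIAL:
            out.append(SPECIAL[ch])
        else:
            # Decompose (e.g. e-acute becomes e + combining accent) and
            # keep only the ASCII part of the result.
            decomposed = unicodedata.normalize("NFKD", ch)
            stripped = "".join(c for c in decomposed if ord(c) < 128)
            out.append(stripped or unknown)
    return "".join(out)


if __name__ == "__main__":
    # Bytes in CP-1252 are decoded to text first, then flattened to ASCII.
    print(to_ascii(b"It\x92s a caf\xe9 \xa9 2005".decode("cp1252")))
```

Characters with no decomposition and no table entry (for example ø) fall through to the `unknown` placeholder, so the table needs extending for any text that relies on them.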
Also, when converting from HTML, it will look for certain tags as hints for formatting the text, so the original formatting is retained. There's not much else I can think of doing to it now. The only thing left on the list is to allow line breaks only on certain punctuation. After that, I don't know, so if anyone has any ideas, let me know. Give it a whirl.
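The tag-hint idea can be sketched in Python with the standard library's `html.parser`. This is only a guess at the general approach, not the program's code: which tags count as hints is an assumption, and `TextExtractor`, `PARAGRAPH_TAGS` and `html_to_text` are hypothetical names. Block-level tags become paragraph breaks, `<br>` becomes a line break, and every other tag is dropped.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    # Tags treated as paragraph boundaries (an illustrative choice).
    PARAGRAPH_TAGS = {"p", "div", "li", "blockquote",
                      "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive decoded
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")
        elif tag in self.PARAGRAPH_TAGS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in self.PARAGRAPH_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        # Source line breaks inside HTML text are just whitespace.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(markup):
    """Strip tags, keeping paragraph and line breaks hinted at by the markup."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces
    text = re.sub(r" ?\n ?", "\n", text)    # trim spaces around breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line
    return text.strip()


if __name__ == "__main__":
    print(html_to_text("<h1>Title</h1><p>First line<br>second line</p>"))
```

This sketch does not skip the contents of `<script>` or `<style>` elements, so real pages would need that handled as well.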
Dec 1 2005, 08:53 AM
Post
#4
Group: Members Posts: 303 Joined: 6-February 04 Member No.: 1,740
I will try it tonight.