With a short line length (30 characters) a word-wrapped paragraph might look like this:
<< Emma Woodhouse, handsome,
clever, and rich, with a
comfortable home and happy
disposition, seemed to unite
some of the best blessings
of existence; and had lived
nearly twenty-one years in
the world with very little
to distress or vex her. >>
With a longer line length (50 characters) the same word-wrapped paragraph would look like this:
<< Emma Woodhouse, handsome, clever, and rich, with
a comfortable home and happy disposition, seemed
to unite some of the best blessings of existence;
and had lived nearly twenty-one years in the world
with very little to distress or vex her. >>
By 'reflowing' a paragraph I mean transforming a block of text in memory from the first to the second example above simply by specifying a new maximum line length (known as the 'the measure' in type-setting).
It turned out to be particularly hard because of the convention of representing line-ends with two ASCII bytes: CR+LF. This convention means that the total number of characters in a paragraph changes when you reflow it since one-byte spaces become two-byte line endings and two-byte line endings become one-byte spaces. This means that when reflowing to a narrower measure the paragraph occupies more bytes than before (because it gains line endings) and when reflowing
to a wider measure the paragraph occupies fewer bytes than before (because it loses line endings).
This is a pain for memory management--you have to leave some space after the paragraph for it to grow--and it makes the programming fiddly. If only there were a single character to mark the end of a line the programming for reflowing would be easy and the paragraph would stay the same length in terms of bytes: all you'd do is change some spaces for line-ends and vice versa.
Then I remembered that Apple Macintosh computers use a single character for line endings (whereas MS-DOS/Windows machines use CR+LF) and I got to wondering if this is connected to the first Apple Macs being considered amazing because their GUIs could reflow paragraphs as you altered the size of the window the text was in. Until then, all that personal computers could do was 'reflow' as a command, not as a dynamic and visible on-screen GUI activity. Maybe, I thought, Apple made the decision to have one character line endings precisely so they could have fast, simple reflowing of paragraphs in their GUI
Through the kind offices of his publisher, O'Reilly Books, I was able to ask Andy Hertzfeld, the designer of the system software on the original Macintosh, whether this consideration was part of his thinking, and in fact it wasn't. He explained that the choice of a single byte line terminator as a standard on the Mac was inherited from the Apple Lisa computer, which itself inherited the standard from the USCD Pascal system used to develop the Lisa and Mac. He expressed a preference for one-byte line termination in any case, since "it's simpler and smaller (one character vs two) and cleaner and better defined (no opportunity for ambiguity -- in the two character approach, what happens if one character is missing?)"
My paragraph reflower is copied below. It's a subroutine that, for the purposes of testing and showing to others, is here accompanied by other subroutines (using CP/M BDOS system calls) for entering the paragraph at the keyboard and for displaying it after it has been reflowed. My reflower takes three passes through the paragraph: the first strips out the existing line endings and puts spaces in their places thus shortening the byte-count), the second applies the new measure by changing some of those spaces to single-byte line terminators (just a CR), and then the third pass turns these one-byte CR line terminators into two-byte CR+LF terminators, thus increasing the byte-count.
I reckon that with some thought I could combine the first two passes, stripping out the existing line endings and applying the new ones on-the-fly. But I believe that the third pass has to remain a separate operation because it needs to work backwards from the end of the paragraph, since the paragraph may grow. (At least, that's essential if the reflower works on the paragraph 'in-situ', as mine done, rather than making a copy of it elsewhere in memory.)
I'd be interested in critiques of this code, and in particular:
1) Is there a more efficient algorithm I could have used? I don't mean better reflowing in itself, as in Donald Knuth's 'Minimum raggedness' routine that counts the sum of the squares of the number of spaces at the ends of the lines. I mean just a smarter way to achieve the simple 'Minimum number of lines' result used in my program.
2) Does the program commit any stylistic errors that a beginner should seek to avoid? I've tried to be scrupulous about each subroutine leaving every register the way it found it and may have overdone the PUSHing and POPping needed to achieve this. To make sure I didn't overflow the CP/M allocated stack I set up my own 64-byte stack as the guide I'm using, 'Soul of CP/M' by Michael Waite and Robert Lafore, recommends doing that.
Last thought: am I wrong to think that the CP/M BDOS routines are really annoying in the way that they trash the registers each time you call them? Surely good programming practice would have been to avoid doing this? I've had to put a lot of work into PUSHing and POPping registers every time I call a BDOS routine because of this problem. CP/M is brilliant and all, but isn't this bad practice? Or is there a good reason that they do it that I haven't though of?
Regards
Gabriel Egan
- Code: Select all
;---------------------------------------------------------------------
; IN-SITU PARAGRAPH REFLOWER. Parameters are passed to this
; subroutine thus:
;
; On entry, BC must contain the address of the start of the paragraph
; (hereafter called SOP)
;
; On entry DE must contains the new width ('measure') to be applied
;
; On entry, the paragraph must have ASCII CR/LF at the end of every
; line and CR/LF/CR/LF to mark the end of the paragraph. The same
; formatting will structure the reflowed paragraph on exit.
;
; During execution the passed parameters are stored in
; the subroutine's memory locations SOP and MEAS (allocated
; by the assembler), registers BC and DE get used as a pair of
; pointers to locations within the paragraph, and HL gets
; used to count the number of CRs inserted to reflow the
; paragraph. The use of 16-bit values for 'measure'
; and CR-count means there is no limit (other than the
; processor's own) on the width or length of the
; paragraph. Because two-byte line-endings (CR/LF) are
; necessarily changed to one-byte spaces and vice versa to
; perform reflowing, the resulting paragraph will be shorter
; in total bytes (if reflowed to a wider 'measure') or
; longer in total bytes (if reflowed to a narrower 'measure')
; than the original paragraph. It is the user's responsibility
; to ensure that enough memory locations are free directly
; after the end of the original paragraph to allow for any
; expansion.
;
; On exit, all registers are restored to their on-entry values.
;
; CP/M's stack pointer is saved in memory and another, larger stack
; is created for this program (using the 64 bytes after the program's
; end), and then CP/M's stack pointer is restored on exit
;
;--------------------------------------------------------------------
;--------------------------------------------------------------------
; Assign some symbols to make the code more legible
;--------------------------------------------------------------------
BDOS equ 05h ; The jump vector into BDOS
GETCHAR equ 01h ; BDOS code for getting one char from keyboard
WRTCHAR equ 02h ; BDOS code for writing a char to console
WRTLINE equ 09h ; BDOS code for writing a line to the console
CR equ 0Dh ; ASCII carriage return character
LF equ 0Ah ; ASCII line feed character
SPACE equ 20h ; ASCII space character
CTRLC equ 03h ; ASCII Control-C character
;----------------------------------------------------------------
; Here a short bit of code to call the REFLOW routine to show
; it works. First call GETKEY to have the user enter a
; paragraph, then call REFLOW to reflow it, and then call
; DISPlay to show the reflowed paragraph
;----------------------------------------------------------------
org 0100h
; Save all registers we came in with
PUSH PSW ; First lets save all the registers
PUSH B ; on the stack so we can restore
PUSH D ; them upon exitting the routine
PUSH H
; Set up our own stack starting 64 bytes after this program
LXI H, 0h ; Init HL to 0
DAD SP ; HL = HL + SP (so LH = SP)
SHLD OLDSP ; Stash away the CP/M stack pointer
LXI SP, STKTOP ; And put the address of our stack into SP
; Set the variables for REFLOW, call DISP, then REFLOW, then DISP again
LXI B, 8000h ; Put address of string-start into BC
LXI D, 30h ; Put desired measure (here 48 decimal) into DE
CALL GETKEY ; Let use type in a paragraph
CALL REFLOW ; Call the subroutine for flowing
CALL DISP ; Throw the flowed paragraph onto console
; Put back CP/M stack
LHLD OLDSP ; Put into HL the 16-bit address of CP/M's
; stack pointer we stashed away earlier
SPHL ; And put in the SP register
; Restore all the registers we came in with
POP H ; Before we Return lets get back all the registers
POP D
POP B
POP PSW
RET ; go back to CP/M
;-------------------------------------------------------------------------
; END OF MAIN PROGRAM. The rest is subroutines called by above.
;-------------------------------------------------------------------------
;--------------------------------------------------------------------------
; REFLOW actually starts here
;-----------------------------------------------------------------
REFLOW PUSH PSW ; First lets save all the registers
PUSH B ; on the stack so we can restore
PUSH D ; them upon exitting the routine
PUSH H
; Now save in memory the two variables (SOP and 'measure')
; passed to the subroutine via regs BC and DE
LXI H, SOP ; Put the addresss of the variable SOP in HL
MOV M, B ; and store there the contents of reg B
LXI H, SOP+1 ; Put the address of second btye of SOP in HL
MOV M, C ; and store there the contents of reg C
LXI H, MEAS ; Do the same for 'measure' passed in DE
MOV M, D
LXI H, MEAS+1
MOV M, E
;-------------------------------------------------------------------
; Because we've saved their contents, we can use BC and DE as pointers
; and HL as the count of CRs. BC already contains the address of the
; start-of-block we wish to reflow
;--------------------------------------------------------------------
;------------------------------------------------------------------
; FIRST PASS. Turn all CR/LF pairs (except the two at the paragraph
; end) into single spaces, contracting the paragraph as necessary.
; BC will be the lo-pointer and DE the hi-pointer above it that
; skips over LFs we don't want to copy down to the lo-pointer
; location. In this pass we work our way up thru memory.
;------------------------------------------------------------------
MOV D, B ; Init. hi-pointer DE by
MOV E, C ; copying lo-pointer BC
LOOP1 LDAX D ; Copy one character from hi-pointer to
STAX B ; lo-pointer
CPI CR ; Is the char we just moved a CR?
JNZ NOTCR ; If not, skip CR-handling
INX D ; Ok, we hit a CR, so is it the beginning of
INX D ; a CR/LR/CR/LF run? Look two chars ahead
LDAX D ; and see if that is a CR
CPI CR ;
JNZ LINEND ; No, then skip ahead to line-end handler
; Yes, so we are at paragraph-end so let's tidy
; up the the end of new paragraph at lo-pointer.
; We already put a CR at lo-pointer to start
MVI A, LF ; the CR/LF/CR/LF run at lo-pointer,
INX B ; so increment lo-pointer and
STAX B ; put a LR there
MVI A, CR ; Then increment lo-pointer again and
INX B ; put a CR there
STAX B ;
MVI A, LF ; Then do the same to put in the
INX B ; last LF of CR/LF/CR/LF
STAX B ;
JMP PASS2 ; The para-end is now tidy so proceed to PASS2
LINEND ; If we get here the current char is CR but
; it's not the start of a CR/LF/CR/LF run.
MVI A, SPACE ; We just placed at lo-pointer a CR that we
STAX B ; need to make a space, so do that.
DCX D ; We moved hi-pointer up 2 bytes to check for
DCX D ; CR/LF/CR/LF so lets put it back, and then
INX D ; move it up one to jump over the unwanted LF
NOTCR INX B ; Increment lo-pointer
INX D ; and hi-pointer
JMP LOOP1 ; And go back for next character
;---------------------------------------------------------------------------------
; SECOND PASS. We apply the new measure by turning the space after the
; last-word-that-fits (=the space before the first-word-that-won't-fit)
; on each line into a CR. (In the third pass we will turn these CRs into
; CR/LFs.) In this pass, BC will be lo-pointer, there is no hi-pointer,
; and DE (=space left) will start out holding the passed 'measure' and
; be decremented each time we accept a character and when it hits
; zero we've run out of room on the present line. DE (= space left)
; will be reset to 'measure' each time we start to process another
; HL will hold a count of the CRs we insert in this pass, which info
; is needed for the final pass.
;---------------------------------------------------------------------------------
PASS2 LXI H, SOP ; Lets get SOP pointer back into BC (since we used
MOV B, M ; BC as lo-pointer in PASS1) by recovering it
LXI H, SOP+1 ; from memory locations SOP and SOP+1
MOV C, M
LXI H, MEAS ; And do the same for DE to hold Measure (since
MOV D, M ; we used DE as hi-pointer in PASS1)
LXI H, MEAS+1 ;
MOV E, M
LXI H, 0 ; Set HL (counter of CRs inserted) to zero
LOOP2 LDAX B ; Get the char. currently lo-pointed to.
CPI CR ; Is it a CR? If so we've reach end-of-para
JZ PASS3 ; and should proceed to PASS3
; So, it's not a CR. Do we need to put in a
; line-break because DE (=space left) = 0?
MOV A, D ; Put MSB of DE (=space left) into accumm.
ORA E ; and OR with LSB of DE. If the result is not
JNZ GOTROOM ; zero then DE (space left) still has room
; Okay, DE (=space left) is zero so we need to
; back up one char at a time to find a space
; and put a CR there
NOTSPC DCX B ; Move lo-pointer back one character
LDAX B ; Get the char. pointed to
CPI SPACE ; Is it a space?
JNZ NOTSPC ; No, go around again to back up another char
MVI A, CR ; Yes, it's a space so lets put a CR here,
STAX B ; overwriting that space
LDA MEAS ; Since we are thus starting a new line we
MOV D, A ; need to restore DE (=space left)
LDA MEAS+1 ; to the passed value of Measure
MOV E, A
INX B ; Move lo-pointer up on to first byte after
; the CR we just placed, so it points to
; the zeroeth char of this new line
INX H ; Increment count of how many CRs we inserted
GOTROOM INX B ; Point to next character in paragraph
DCX D ; Decrement DE (= space left) to allow for
; the char we just accepted
JMP LOOP2 ; Go back and process the next char.
;-----------------------------------------------------------------------
; THIRD PASS. Turn the single CRs that mark line-endings in the reflowed
; paragraph into CR/LF pairs, expanding the paragraph as necessary.
; We will use BC as lo-pointer and DE as hi-pointer. BC already
; points to the CR in the paragraph-ending CR/LF/CR/LF string and
; HL already contains the count of how many CRs we inserted in PASS2.
; We will work backwards in memory, starting with hi-pointer =
; lo-pointer + 3 (to allow for LR/CR/LF) + HL (to allow for one
; extra byte for each CR that we have to expand to CR/LF).
;-----------------------------------------------------------------------
PASS3 MOV D, B ; Init point hi-pointer DE by copying
MOV E, C ; lo-pointer BC
INX D ; Move hi-pointer DE up 3 bytes up so that it
INX D ; it points to the last byte in the
INX D ; paragraph (= last LF of the CR/LR/CR/LF run)
XCHG ; Add HL (count of CRs) to hi-pointer, bearing
DAD D ; in mind that DAD (which adds DE to HL)
XCHG ; leaves its result in HL and we want it in
; DE instead, so we swap HL<>DE before we do
; the addition and swap them again after
MVI A, LF ; Now write out the CR/LF/CR/LF string that ends
STAX D ; the new reflowed paragraph. In reverse order
MVI A, CR ; we put LF, CR, LF, CR into hi-pointer,
DCX D ; hi-pointer-1, hi-pointer-2, and hi-pointer-3,
STAX D ; by decrementing hi-pointer each time
MVI A, LF
DCX D
STAX D
MVI A, CR
DCX D
STAX D
; lo-pointer BC now points to the first CR at
; the end of the old paragraph and hi-pointer DE
; points to the first CR at the end of the new
; longer paragraph
LOOP3 DCX B ; Move both pointers down one byte
DCX D ; to get ready to process next char.
LDAX B ; Get the BE lo-pointer character into the accum.
STAX D ; Put it in the DE hi-pointer location
CPI 0Dh ; Was it a CR?
JNZ LOOP3 ; No, then go back for next char
MVI A, LF ; Yes, so we have CR->CR/LF handling to do.
; First, put an LF in the hi-pointer location to
STAX D ; overwrite the CR we just stored there
DCX D ; Then back up one byte to the location before
MVI A, CR ; that LF and put an CR there
STAX D ;
DCX H ; We've just handled a CR->CR/LF so lower the
; count of line-ends still to do
MOV A, H ; Is HL=0? Find out by putting MSB of HL
ORA L ; into accumm and ORing with LSB of HL
JNZ LOOP3 ; If the result not 0 we've more lines to do
; If result is zero, we've just done the last
; CR->CR/LF expansion and are done
; Write out closing message "Reflow done."
LXI D, DONE ; Load DE with the address of the start
; of the prompt string
MVI C, WRTLINE ; Load C with the BDOS call number for
; outputting a string to console
CALL BDOS ; Make the BDOS call
POP H ; Before we Return lets get back all the registers
POP D
POP B
POP PSW
RET
;---------------------------------------------------------------------
; End of REFLOW subroutine
;---------------------------------------------------------------------
;---------------------------------------------------------------------
; DISPlay. This subroutine displays the paragraph that begins at the
; location pointed to by BC on entry. We will use the CP/M
; BDOS call "Console Output" in which the character to be displayed
; must be put in register E and the value 02h must be put in
; register C before CALLing BDOS (at location 0005h). All registers
; are restored before return
;---------------------------------------------------------------------
DISP PUSH PSW ; First lets save all the registers
PUSH B ; on the stack so we can restore
PUSH D ; them upon exitting the routine
PUSH H
LOOP0 PUSH B ; Save BC (because BDOS call will trash it)
LDAX B ; Get the character pointed to by BC
MOV E, A ; Put it into E
MVI C, WRTCHAR ; Put 02h into C (thereby trashing BC)
CALL BDOS ; Make the call that displays this character
POP B ; Restore BC
LDAX B ; Get the character we just displayed
INX B ; Point to next character
CPI LF ; Was it a line-feed we just displayed?
JNZ LOOP0 ; No, so go back for the next character
; Yes, so lets see if it was the LF at
; the end of a CR/LF/CR/LF run
DCX B ; Back up three memory locations
DCX B ; (not two because we already incremented BC)
DCX B ;
LDAX B ; Get that character
INX B ;
INX B ; Put BC back where it was
INX B ;
CPI LF ; Is that also a LF?
JNZ LOOP0 ; No, then merely a line-end, so go round again
; Yes, so we've displayed the entire paragraph
; and can stop displaying
POP H ; Before we Return lets get back all the registers
POP D
POP B
POP PSW
RET
;-----------------------------------------------------------------------
; End of DISPlay subroutine
;-----------------------------------------------------------------------
;-----------------------------------------------------------------------
; GETKEY subroutine. This uses the get-one-char BDOS call
;-----------------------------------------------------------------------
GETKEY PUSH PSW ; First lets save all the registers
PUSH B ; on the stack so we can restore
PUSH D l; them upon exitting the routine
PUSH H
LOOP PUSH B ; Save BC because BDOS call will trash it
MVI C, 01h ; Load up C with the BDOS call number
CALL 05h ; Make BDOS call to get a char and display it
POP B ; Get back the BC that BDOS just trashed
CPI 03h ; Did the user just hit CTRL-C?
JZ CTC ; Yes, then we're done
STAX B ; No, then store at end-of-block the incoming char
CPI 0Dh ; Was it a carriage-return?
JNZ NOTLE ; No, then skip line-end handler
INX B ; Yes, then point BC to next free memory location
MVI A, 0Ah ; Get a line-feed
STAX B ; Save it after the CR we just save
PUSH B ; We need another BDOS call to show this LF
MOV E, A ; so save BC, put the LF into E,
MVI C, 02h ; load up C with the BDOS call number,
CALL 05h ; and make the BDOS call to display this LF
POP B ; Get back the BC that the BDOS call trashed
NOTLE INX B ; Point BC to the next free memory location
JMP LOOP ; Go back for another character from keyboard
CTC POP H ; Before we Return lets get back all the registers
POP D
POP B
POP PSW
RET
;------------------------------------------------------------------------
; End of GETKEY subroutine
;------------------------------------------------------------------------
;------------------------------------------------------------------------
; Data Area
;------------------------------------------------------------------------
DONE db CR,LF,'Reflow done.',CR,LF,CR,LF,'$'
SOP DS 2 ; Memory location SOP stores passed-B,
; and memory location SOP+1 stores passed-C
MEAS DS 2 ; MEAS stores passed-D, MEAS+1 store passed-E
OLDSP DS 2 ; Two bytes to stash CP/M's stack pointer
DS 64 ; Reserve 64 bytes for our stack
STKTOP: ; Set this label to the address at the top of
; our stack