AltairClone.com

by **mail@gabrielegan.com** » May 4th, 2016, 4:05 pm

Well that was harder than I expected! I've been working on an assembler routine to 'reflow' a text paragraph, by which I mean changing the maximum allowed length of a line while word-wrapping so that line-breaks occur only between words.

With a short line length (30 characters) a word-wrapped paragraph might look like this:

<< Emma Woodhouse, handsome,
clever, and rich, with a
comfortable home and happy
disposition, seemed to unite
some of the best blessings
of existence; and had lived
nearly twenty-one years in
the world with very little
to distress or vex her. >>

With a longer line length (50 characters) the same word-wrapped paragraph would look like this:

<< Emma Woodhouse, handsome, clever, and rich, with
a comfortable home and happy disposition, seemed
to unite some of the best blessings of existence;
and had lived nearly twenty-one years in the world
with very little to distress or vex her. >>

By 'reflowing' a paragraph I mean transforming a block of text in memory from the first to the second example above simply by specifying a new maximum line length (known as the 'the measure' in type-setting).

It turned out to be particularly hard because of the convention of representing line-ends with two ASCII bytes: CR+LF. This convention means that the total number of characters in a paragraph changes when you reflow it since one-byte spaces become two-byte line endings and two-byte line endings become one-byte spaces. This means that when reflowing to a narrower measure the paragraph occupies more bytes than before (because it gains line endings) and when reflowing
to a wider measure the paragraph occupies fewer bytes than before (because it loses line endings).

This is a pain for memory management--you have to leave some space after the paragraph for it to grow--and it makes the programming fiddly. If only there were a single character to mark the end of a line the programming for reflowing would be easy and the paragraph would stay the same length in terms of bytes: all you'd do is change some spaces for line-ends and vice versa.

Then I remembered that Apple Macintosh computers use a single character for line endings (whereas MS-DOS/Windows machines use CR+LF) and I got to wondering if this is connected to the first Apple Macs being considered amazing because their GUIs could reflow paragraphs as you altered the size of the window the text was in. Until then, all that personal computers could do was 'reflow' as a command, not as a dynamic and visible on-screen GUI activity. Maybe, I thought, Apple made the decision to have one character line endings precisely so they could have fast, simple reflowing of paragraphs in their GUI

Through the kind offices of his publisher, O'Reilly Books, I was able to ask Andy Hertzfeld, the designer of the system software on the original Macintosh, whether this consideration was part of his thinking, and in fact it wasn't. He explained that the choice of a single byte line terminator as a standard on the Mac was inherited from the Apple Lisa computer, which itself inherited the standard from the USCD Pascal system used to develop the Lisa and Mac. He expressed a preference for one-byte line termination in any case, since "it's simpler and smaller (one character vs two) and cleaner and better defined (no opportunity for ambiguity -- in the two character approach, what happens if one character is missing?)"

My paragraph reflower is copied below. It's a subroutine that, for the purposes of testing and showing to others, is here accompanied by other subroutines (using CP/M BDOS system calls) for entering the paragraph at the keyboard and for displaying it after it has been reflowed. My reflower takes three passes through the paragraph: the first strips out the existing line endings and puts spaces in their places thus shortening the byte-count), the second applies the new measure by changing some of those spaces to single-byte line terminators (just a CR), and then the third pass turns these one-byte CR line terminators into two-byte CR+LF terminators, thus increasing the byte-count.

I reckon that with some thought I could combine the first two passes, stripping out the existing line endings and applying the new ones on-the-fly. But I believe that the third pass has to remain a separate operation because it needs to work backwards from the end of the paragraph, since the paragraph may grow. (At least, that's essential if the reflower works on the paragraph 'in-situ', as mine done, rather than making a copy of it elsewhere in memory.)

I'd be interested in critiques of this code, and in particular:

1) Is there a more efficient algorithm I could have used? I don't mean better reflowing in itself, as in Donald Knuth's 'Minimum raggedness' routine that counts the sum of the squares of the number of spaces at the ends of the lines. I mean just a smarter way to achieve the simple 'Minimum number of lines' result used in my program.

2) Does the program commit any stylistic errors that a beginner should seek to avoid? I've tried to be scrupulous about each subroutine leaving every register the way it found it and may have overdone the PUSHing and POPping needed to achieve this. To make sure I didn't overflow the CP/M allocated stack I set up my own 64-byte stack as the guide I'm using, 'Soul of CP/M' by Michael Waite and Robert Lafore, recommends doing that.

Last thought: am I wrong to think that the CP/M BDOS routines are really annoying in the way that they trash the registers each time you call them? Surely good programming practice would have been to avoid doing this? I've had to put a lot of work into PUSHing and POPping registers every time I call a BDOS routine because of this problem. CP/M is brilliant and all, but isn't this bad practice? Or is there a good reason that they do it that I haven't though of?

Regards

Gabriel Egan

Code: Select all: ;--------------------------------------------------------------------- ; IN-SITU PARAGRAPH REFLOWER. Parameters are passed to this ; subroutine thus: ; ; On entry, BC must contain the address of the start of the paragraph ; (hereafter called SOP) ; ; On entry DE must contains the new width ('measure') to be applied ; ; On entry, the paragraph must have ASCII CR/LF at the end of every ; line and CR/LF/CR/LF to mark the end of the paragraph. The same ; formatting will structure the reflowed paragraph on exit. ; ; During execution the passed parameters are stored in ; the subroutine's memory locations SOP and MEAS (allocated ; by the assembler), registers BC and DE get used as a pair of ; pointers to locations within the paragraph, and HL gets ; used to count the number of CRs inserted to reflow the ; paragraph. The use of 16-bit values for 'measure' ; and CR-count means there is no limit (other than the ; processor's own) on the width or length of the ; paragraph. Because two-byte line-endings (CR/LF) are ; necessarily changed to one-byte spaces and vice versa to ; perform reflowing, the resulting paragraph will be shorter ; in total bytes (if reflowed to a wider 'measure') or ; longer in total bytes (if reflowed to a narrower 'measure') ; than the original paragraph. It is the user's responsibility ; to ensure that enough memory locations are free directly ; after the end of the original paragraph to allow for any ; expansion. ; ; On exit, all registers are restored to their on-entry values. ; ; CP/M's stack pointer is saved in memory and another, larger stack ; is created for this program (using the 64 bytes after the program's ; end), and then CP/M's stack pointer is restored on exit ; ;-------------------------------------------------------------------- ;-------------------------------------------------------------------- ; Assign some symbols to make the code more legible ;-------------------------------------------------------------------- BDOS equ 05h ; The jump vector into BDOS GETCHAR equ 01h ; BDOS code for getting one char from keyboard WRTCHAR equ 02h ; BDOS code for writing a char to console WRTLINE equ 09h ; BDOS code for writing a line to the console CR equ 0Dh ; ASCII carriage return character LF equ 0Ah ; ASCII line feed character SPACE equ 20h ; ASCII space character CTRLC equ 03h ; ASCII Control-C character ;---------------------------------------------------------------- ; Here a short bit of code to call the REFLOW routine to show ; it works. First call GETKEY to have the user enter a ; paragraph, then call REFLOW to reflow it, and then call ; DISPlay to show the reflowed paragraph ;---------------------------------------------------------------- org 0100h ; Save all registers we came in with PUSH PSW ; First lets save all the registers PUSH B ; on the stack so we can restore PUSH D ; them upon exitting the routine PUSH H ; Set up our own stack starting 64 bytes after this program LXI H, 0h ; Init HL to 0 DAD SP ; HL = HL + SP (so LH = SP) SHLD OLDSP ; Stash away the CP/M stack pointer LXI SP, STKTOP ; And put the address of our stack into SP ; Set the variables for REFLOW, call DISP, then REFLOW, then DISP again LXI B, 8000h ; Put address of string-start into BC LXI D, 30h ; Put desired measure (here 48 decimal) into DE CALL GETKEY ; Let use type in a paragraph CALL REFLOW ; Call the subroutine for flowing CALL DISP ; Throw the flowed paragraph onto console ; Put back CP/M stack LHLD OLDSP ; Put into HL the 16-bit address of CP/M's ; stack pointer we stashed away earlier SPHL ; And put in the SP register ; Restore all the registers we came in with POP H ; Before we Return lets get back all the registers POP D POP B POP PSW RET ; go back to CP/M ;------------------------------------------------------------------------- ; END OF MAIN PROGRAM. The rest is subroutines called by above. ;------------------------------------------------------------------------- ;-------------------------------------------------------------------------- ; REFLOW actually starts here ;----------------------------------------------------------------- REFLOW PUSH PSW ; First lets save all the registers PUSH B ; on the stack so we can restore PUSH D ; them upon exitting the routine PUSH H ; Now save in memory the two variables (SOP and 'measure') ; passed to the subroutine via regs BC and DE LXI H, SOP ; Put the addresss of the variable SOP in HL MOV M, B ; and store there the contents of reg B LXI H, SOP+1 ; Put the address of second btye of SOP in HL MOV M, C ; and store there the contents of reg C LXI H, MEAS ; Do the same for 'measure' passed in DE MOV M, D LXI H, MEAS+1 MOV M, E ;------------------------------------------------------------------- ; Because we've saved their contents, we can use BC and DE as pointers ; and HL as the count of CRs. BC already contains the address of the ; start-of-block we wish to reflow ;-------------------------------------------------------------------- ;------------------------------------------------------------------ ; FIRST PASS. Turn all CR/LF pairs (except the two at the paragraph ; end) into single spaces, contracting the paragraph as necessary. ; BC will be the lo-pointer and DE the hi-pointer above it that ; skips over LFs we don't want to copy down to the lo-pointer ; location. In this pass we work our way up thru memory. ;------------------------------------------------------------------ MOV D, B ; Init. hi-pointer DE by MOV E, C ; copying lo-pointer BC LOOP1 LDAX D ; Copy one character from hi-pointer to STAX B ; lo-pointer CPI CR ; Is the char we just moved a CR? JNZ NOTCR ; If not, skip CR-handling INX D ; Ok, we hit a CR, so is it the beginning of INX D ; a CR/LR/CR/LF run? Look two chars ahead LDAX D ; and see if that is a CR CPI CR ; JNZ LINEND ; No, then skip ahead to line-end handler ; Yes, so we are at paragraph-end so let's tidy ; up the the end of new paragraph at lo-pointer. ; We already put a CR at lo-pointer to start MVI A, LF ; the CR/LF/CR/LF run at lo-pointer, INX B ; so increment lo-pointer and STAX B ; put a LR there MVI A, CR ; Then increment lo-pointer again and INX B ; put a CR there STAX B ; MVI A, LF ; Then do the same to put in the INX B ; last LF of CR/LF/CR/LF STAX B ; JMP PASS2 ; The para-end is now tidy so proceed to PASS2 LINEND ; If we get here the current char is CR but ; it's not the start of a CR/LF/CR/LF run. MVI A, SPACE ; We just placed at lo-pointer a CR that we STAX B ; need to make a space, so do that. DCX D ; We moved hi-pointer up 2 bytes to check for DCX D ; CR/LF/CR/LF so lets put it back, and then INX D ; move it up one to jump over the unwanted LF NOTCR INX B ; Increment lo-pointer INX D ; and hi-pointer JMP LOOP1 ; And go back for next character ;--------------------------------------------------------------------------------- ; SECOND PASS. We apply the new measure by turning the space after the ; last-word-that-fits (=the space before the first-word-that-won't-fit) ; on each line into a CR. (In the third pass we will turn these CRs into ; CR/LFs.) In this pass, BC will be lo-pointer, there is no hi-pointer, ; and DE (=space left) will start out holding the passed 'measure' and ; be decremented each time we accept a character and when it hits ; zero we've run out of room on the present line. DE (= space left) ; will be reset to 'measure' each time we start to process another ; HL will hold a count of the CRs we insert in this pass, which info ; is needed for the final pass. ;--------------------------------------------------------------------------------- PASS2 LXI H, SOP ; Lets get SOP pointer back into BC (since we used MOV B, M ; BC as lo-pointer in PASS1) by recovering it LXI H, SOP+1 ; from memory locations SOP and SOP+1 MOV C, M LXI H, MEAS ; And do the same for DE to hold Measure (since MOV D, M ; we used DE as hi-pointer in PASS1) LXI H, MEAS+1 ; MOV E, M LXI H, 0 ; Set HL (counter of CRs inserted) to zero LOOP2 LDAX B ; Get the char. currently lo-pointed to. CPI CR ; Is it a CR? If so we've reach end-of-para JZ PASS3 ; and should proceed to PASS3 ; So, it's not a CR. Do we need to put in a ; line-break because DE (=space left) = 0? MOV A, D ; Put MSB of DE (=space left) into accumm. ORA E ; and OR with LSB of DE. If the result is not JNZ GOTROOM ; zero then DE (space left) still has room ; Okay, DE (=space left) is zero so we need to ; back up one char at a time to find a space ; and put a CR there NOTSPC DCX B ; Move lo-pointer back one character LDAX B ; Get the char. pointed to CPI SPACE ; Is it a space? JNZ NOTSPC ; No, go around again to back up another char MVI A, CR ; Yes, it's a space so lets put a CR here, STAX B ; overwriting that space LDA MEAS ; Since we are thus starting a new line we MOV D, A ; need to restore DE (=space left) LDA MEAS+1 ; to the passed value of Measure MOV E, A INX B ; Move lo-pointer up on to first byte after ; the CR we just placed, so it points to ; the zeroeth char of this new line INX H ; Increment count of how many CRs we inserted GOTROOM INX B ; Point to next character in paragraph DCX D ; Decrement DE (= space left) to allow for ; the char we just accepted JMP LOOP2 ; Go back and process the next char. ;----------------------------------------------------------------------- ; THIRD PASS. Turn the single CRs that mark line-endings in the reflowed ; paragraph into CR/LF pairs, expanding the paragraph as necessary. ; We will use BC as lo-pointer and DE as hi-pointer. BC already ; points to the CR in the paragraph-ending CR/LF/CR/LF string and ; HL already contains the count of how many CRs we inserted in PASS2. ; We will work backwards in memory, starting with hi-pointer = ; lo-pointer + 3 (to allow for LR/CR/LF) + HL (to allow for one ; extra byte for each CR that we have to expand to CR/LF). ;----------------------------------------------------------------------- PASS3 MOV D, B ; Init point hi-pointer DE by copying MOV E, C ; lo-pointer BC INX D ; Move hi-pointer DE up 3 bytes up so that it INX D ; it points to the last byte in the INX D ; paragraph (= last LF of the CR/LR/CR/LF run) XCHG ; Add HL (count of CRs) to hi-pointer, bearing DAD D ; in mind that DAD (which adds DE to HL) XCHG ; leaves its result in HL and we want it in ; DE instead, so we swap HL<>DE before we do ; the addition and swap them again after MVI A, LF ; Now write out the CR/LF/CR/LF string that ends STAX D ; the new reflowed paragraph. In reverse order MVI A, CR ; we put LF, CR, LF, CR into hi-pointer, DCX D ; hi-pointer-1, hi-pointer-2, and hi-pointer-3, STAX D ; by decrementing hi-pointer each time MVI A, LF DCX D STAX D MVI A, CR DCX D STAX D ; lo-pointer BC now points to the first CR at ; the end of the old paragraph and hi-pointer DE ; points to the first CR at the end of the new ; longer paragraph LOOP3 DCX B ; Move both pointers down one byte DCX D ; to get ready to process next char. LDAX B ; Get the BE lo-pointer character into the accum. STAX D ; Put it in the DE hi-pointer location CPI 0Dh ; Was it a CR? JNZ LOOP3 ; No, then go back for next char MVI A, LF ; Yes, so we have CR->CR/LF handling to do. ; First, put an LF in the hi-pointer location to STAX D ; overwrite the CR we just stored there DCX D ; Then back up one byte to the location before MVI A, CR ; that LF and put an CR there STAX D ; DCX H ; We've just handled a CR->CR/LF so lower the ; count of line-ends still to do MOV A, H ; Is HL=0? Find out by putting MSB of HL ORA L ; into accumm and ORing with LSB of HL JNZ LOOP3 ; If the result not 0 we've more lines to do ; If result is zero, we've just done the last ; CR->CR/LF expansion and are done ; Write out closing message "Reflow done." LXI D, DONE ; Load DE with the address of the start ; of the prompt string MVI C, WRTLINE ; Load C with the BDOS call number for ; outputting a string to console CALL BDOS ; Make the BDOS call POP H ; Before we Return lets get back all the registers POP D POP B POP PSW RET ;--------------------------------------------------------------------- ; End of REFLOW subroutine ;--------------------------------------------------------------------- ;--------------------------------------------------------------------- ; DISPlay. This subroutine displays the paragraph that begins at the ; location pointed to by BC on entry. We will use the CP/M ; BDOS call "Console Output" in which the character to be displayed ; must be put in register E and the value 02h must be put in ; register C before CALLing BDOS (at location 0005h). All registers ; are restored before return ;--------------------------------------------------------------------- DISP PUSH PSW ; First lets save all the registers PUSH B ; on the stack so we can restore PUSH D ; them upon exitting the routine PUSH H LOOP0 PUSH B ; Save BC (because BDOS call will trash it) LDAX B ; Get the character pointed to by BC MOV E, A ; Put it into E MVI C, WRTCHAR ; Put 02h into C (thereby trashing BC) CALL BDOS ; Make the call that displays this character POP B ; Restore BC LDAX B ; Get the character we just displayed INX B ; Point to next character CPI LF ; Was it a line-feed we just displayed? JNZ LOOP0 ; No, so go back for the next character ; Yes, so lets see if it was the LF at ; the end of a CR/LF/CR/LF run DCX B ; Back up three memory locations DCX B ; (not two because we already incremented BC) DCX B ; LDAX B ; Get that character INX B ; INX B ; Put BC back where it was INX B ; CPI LF ; Is that also a LF? JNZ LOOP0 ; No, then merely a line-end, so go round again ; Yes, so we've displayed the entire paragraph ; and can stop displaying POP H ; Before we Return lets get back all the registers POP D POP B POP PSW RET ;----------------------------------------------------------------------- ; End of DISPlay subroutine ;----------------------------------------------------------------------- ;----------------------------------------------------------------------- ; GETKEY subroutine. This uses the get-one-char BDOS call ;----------------------------------------------------------------------- GETKEY PUSH PSW ; First lets save all the registers PUSH B ; on the stack so we can restore PUSH D l; them upon exitting the routine PUSH H LOOP PUSH B ; Save BC because BDOS call will trash it MVI C, 01h ; Load up C with the BDOS call number CALL 05h ; Make BDOS call to get a char and display it POP B ; Get back the BC that BDOS just trashed CPI 03h ; Did the user just hit CTRL-C? JZ CTC ; Yes, then we're done STAX B ; No, then store at end-of-block the incoming char CPI 0Dh ; Was it a carriage-return? JNZ NOTLE ; No, then skip line-end handler INX B ; Yes, then point BC to next free memory location MVI A, 0Ah ; Get a line-feed STAX B ; Save it after the CR we just save PUSH B ; We need another BDOS call to show this LF MOV E, A ; so save BC, put the LF into E, MVI C, 02h ; load up C with the BDOS call number, CALL 05h ; and make the BDOS call to display this LF POP B ; Get back the BC that the BDOS call trashed NOTLE INX B ; Point BC to the next free memory location JMP LOOP ; Go back for another character from keyboard CTC POP H ; Before we Return lets get back all the registers POP D POP B POP PSW RET ;------------------------------------------------------------------------ ; End of GETKEY subroutine ;------------------------------------------------------------------------ ;------------------------------------------------------------------------ ; Data Area ;------------------------------------------------------------------------ DONE db CR,LF,'Reflow done.',CR,LF,CR,LF,'$' SOP DS 2 ; Memory location SOP stores passed-B, ; and memory location SOP+1 stores passed-C MEAS DS 2 ; MEAS stores passed-D, MEAS+1 store passed-E OLDSP DS 2 ; Two bytes to stash CP/M's stack pointer DS 64 ; Reserve 64 bytes for our stack STKTOP: ; Set this label to the address at the top of ; our stack

by **AltairClone** » May 6th, 2016, 7:38 pm

Great work! The best way to improve your coding skills and style is to simply start writing and debugging code - which you've done. There are a number of things in the code that could be changed for speed and/or space efficiency, but that doesn't really matter yet. Great start with a fairly complex program!

Mike

AltairClone.com

Reflowing a paragraph

Reflowing a paragraph

Re: Reflowing a paragraph

Who is online