Marty, Yep, I did get your patch before I spoke with Paul Mackerras. I tried it and it completely broke TFTP. The reason, as it turns out, is that the udelay(300) in ntulip_transmit is there to prevent the routine from returning before the packet is sent out. Your patch didn't make provision for this and it is needed as the actual packet body to be transmitted will get trashed by the upper level code once this routine returns. The solution is to wait on the txd.status until it changes (requires declaring the txd structure "volatile" - this is now causing compile-time warnings which I haven't fixed yet, but the end result is still what is required). Your point about this driver supporting a variety of cards is definitely important. I don't believe that the changes Paul and I have made would cause problems for any card which is using a tulip or clone chipset, however, stranger things have happened. Thanks for offering to look into it more closely. The card we are using is the TRENDnet TE100-PCIA (www.trendware.com) which is using a genuine Intel 21143-PD chipset and appears to be based on the Intel reference design (ie. it uses the MII instead of the built-in 10Mbps interface). The patch is attached. I hope it is formatted correctly. Cheers, Bob Edwards. Marty Connor wrote: > > On 2/21/2000 2:53 AM Bob Edwards Robert.Edwards@anu.edu.au wrote: > >Thanks to all who replied to my enquiry about the suspect speed of > >the ntulip driver. Paul Mackerras of Linuxcare's OZLabs (and the guy who > >ported Linux to the PPC) sat down with me earlier in the day and helped > >me figure out what was wrong (he has contributed to the de4x5 driver). > > Glad you had some success, Bob! Many thanks to Paul for his efforts! > > >Basically, there were three main problems: > > - udelay was broken. > > Well, yes, udelay was/is broken (as the comments in the source code > indicated, it is for very gross delays only), but, udelay was only used > in two sections. First, it was used in the EEPROM reading code. This > happened well before the frames were transmitted, and thus, by the time > the Etherboot code printed the card's MAC address on the screen (a > fraction of a second into booting) this code was long over. > > The other use of udelay was at the end of ntulip_transmit, which I > immediately (and emphatically) suggested removing in my patch, as it was > clearly not needed. > > > - ntulip_transmit doesn't need to stop and start the transmitter for > > each frame > > True, it probably doesn't; But the delay introduced by this code amounts > to a couple of bus cycles (outl calls), which is negligable. In the > environment we're in (pre-boot, supporting a variety of cards), turning > off and on the transmitter, though over-kill (for your card) may be more > robust for some cards. Probably a NOOP in most cases. Remember, we're > supporting a number of cards here, and defensive programming, though > marginally slower, may be more robust. > > > - ntulip_transmit doesn't need to wait 300us (actually ~3800us) before > > returning, it can simply wait until the frame has gone. > > >From your message (and my analysis), this seemed to be the real > time-stealer, and my patch removed this call to udelay, which probably > got back 3800/3900 or over 97.4% of the delay you were seeing. I guess > you did not receive my email that described this fix in a timely fashion. > (Maybe your network connection is infrequent? I did reply to your first > message with a patch within 8 hours of your message appearing at my mail > server on Sunday morning.) > > >With these changes to ntulip.c, we have now got the 5.4MB download time > >down from 42+ seconds to 10.5 seconds. Still not as good as the eepro100 > >driver (at 3.6 seconds), but a lot better (over 30 seconds less per boot). > >I will forward the patches onto Ken as soon as I have them formatted > >correctly. > > There are a few other things that I noticed that may be contributing to > the performance difference you are seeing with you 21143s. As soon as I > see your patches, I'd like to suggest a couple of other optimizations for > your particular card that might give you even better performance. I > suspect part of the initialization code being run for your card is not > optimal. > > What is the brand and model of your Ethernet cards? If I can find one > locally, I'd be happy to try to optimize the driver to lower latency and > increase performance for you. > > Remember that ntulip.c supports a number of Tulip clone cards, and some > of the code which might appear not applicable to the 21143 may be > necessary for other cards to be reliable, thus we may need to > conditionalize your optimizations. The Linux driver, for example says it > support over 100 tulip models and clones, and is heavily conditionalized. > The general execution path is sometimes not optimal for any one card, but > can be optimized by conditionalization where the performance/reliability > improvement warrants. > > In any case, I'm glad you're seeing better performance, and hope that my > efforts contributed to the improvement. > > Regards, > > Marty > > --- > Name: Martin D. Connor > US Mail: Entity Cyber, Inc.; P.O. Box 391827; Cambridge, MA 02139; USA > Voice: (617) 491-6935, Fax: (617) 491-7046 > Email: mdc@thinguin.org > Web: http://www.thinguin.org/
--- ntulip.c~ Sat Feb 5 17:45:17 2000 +++ ntulip.c Mon Feb 21 12:25:08 2000 @@ -112,7 +112,7 @@ #define le16_to_cpu(val) (val) /* Calibration constant for udelay loop. Very approximate */ -#define UADJUST 870 +#define UADJUST 16 /* transmit and receive descriptor format */ struct txrxdesc { @@ -131,12 +131,12 @@ longword divisable */ /* transmit descriptor and buffer */ -static struct txrxdesc txd; +static volatile struct txrxdesc txd; static unsigned char txb[BUFLEN]; /* receive descriptor(s) and buffer(s) */ #define NRXD 4 -static struct txrxdesc rxd[NRXD]; +static volatile struct txrxdesc rxd[NRXD]; static int rxd_tail = 0; static unsigned char rxb[NRXD][BUFLEN]; @@ -179,7 +179,7 @@ static void udelay(unsigned long usec) { unsigned long i; - for (i=((usec*UADJUST)/33)+1; i>0; i--) + for (i=usec*33/UADJUST+1; i>0; i--) (void) inl(ioaddr + CSR0); } @@ -313,7 +313,7 @@ txd.buf2addr = &txb[0]; /* just in case */ txd.buf1sz = 192; /* setup packet must be 192 bytes */ txd.buf2sz = 0; - txd.control = 0x020; /* setup packet */ + txd.control = 0x028; /* setup packet */ txd.status = 0x80000000; /* give ownership to device */ /* construct perfect filter frame with mac address as first match @@ -391,7 +391,6 @@ /* Reset the chip, holding bit 0 set at least 50 PCI cycles. */ outl(0x00000001, ioaddr + CSR0); udelay(50000); - udelay(50000); /* turn off reset and set cache align=16lword, burst=unlimit */ outl(0x01A08000, ioaddr + CSR0); @@ -445,6 +444,11 @@ outl(csr6, ioaddr + CSR6); outl(csr6 | 0x00002000, ioaddr + CSR6); outl(csr6 | 0x00002002, ioaddr + CSR6); + + /* Wait for the setup frame to go out. */ + while (txd.status & 0x80000000) + ; + } @@ -461,9 +465,6 @@ whereami("ntulip_transmit\n"); - /* Stop Tx */ - outl(inl(ioaddr + CSR6) & ~0x00002000, ioaddr + CSR6); - /* setup ethernet header */ memcpy(ehdr, d, ETHER_ADDR_SIZE); memcpy(&ehdr[ETHER_ADDR_SIZE], nic->node_addr, ETHER_ADDR_SIZE); @@ -471,20 +472,16 @@ ehdr[ETHER_ADDR_SIZE*2+1] = t & 0xff; /* setup the transmit descriptor */ - memset(&txd, 0, sizeof(struct txrxdesc)); txd.buf1addr = &ehdr[0]; /* ethernet header */ txd.buf1sz = ETHER_HDR_SIZE; txd.buf2addr = p; /* packet to transmit */ txd.buf2sz = s; txd.control = 0x00000188; /* LS+FS+TER */ - txd.status = 0x80000000; /* give it the device */ - - /* Point to transmit descriptor */ - outl((unsigned long)&txd, ioaddr + CSR4); + txd.status = 0x80000000; /* give it to the device */ - /* Start Tx */ - outl(inl(ioaddr + CSR6) | 0x00002000, ioaddr + CSR6); - udelay(300); + /* Wait for packet to go out */ + while (txd.status & 0x80000000) + ; } /*********************************************************************/
For requests or suggestions regarding this mailing list archive please write to netboot@gkminix.han.de.