Enabling and Disabling Interrupts

Device drivers may need to block interrupt delivery for some time, for various reasons. Usually, interrupts are blocked to avoid deadlocks, for example while holding a spinlock that the interrupt handler also acquires. Disabling an interrupt line that is shared with other devices is not advised. The Linux kernel provides these functions for enabling and disabling a single interrupt line:
void enable_irq(int irq);
void disable_irq(int irq);
void disable_irq_nosync(int irq);
Enabling or disabling an interrupt through these functions updates the interrupt mask for the specified irq in the PIC (Programmable Interrupt Controller). The change is visible to all processors, so these functions can be used on SMP systems. disable_irq waits for any currently executing handler for that interrupt to complete before returning, so it must not be called while holding a resource the handler needs. disable_irq_nosync, by contrast, returns immediately; it is faster but may leave the driver open to race conditions.

Enabling/Disabling Interrupts on the Current Processor

In the 2.6 kernel, there is no way to disable all interrupts globally across the system; interrupts can only be disabled on the current processor. The Linux kernel provides these functions for this purpose:
void local_irq_save(unsigned long flags);
void local_irq_disable(void);
local_irq_save saves the current interrupt state into flags and then disables interrupt delivery on the current processor. Note that flags is passed directly, not by pointer, because local_irq_save is implemented as a macro. local_irq_disable turns off interrupt delivery on the current processor without saving the state. Interrupts disabled this way are re-enabled with these functions:
void local_irq_restore(unsigned long flags);
void local_irq_enable(void);
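local_irq_restore restores the interrupt state saved in flags by local_irq_save, so interrupts end up enabled only if they were enabled when local_irq_save was called. local_irq_enable enables interrupts unconditionally, so the local_irq_disable/local_irq_enable pair should only be used when you know interrupts were not already disabled elsewhere.

Reference: Linux Device Drivers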


Registering Interrupt Handler

Hardware devices inform the processor, through signals called interrupts, whenever they require its attention. Interrupt handlers are registered using request_irq and unregistered using free_irq:
int request_irq(
    unsigned int irq,       /* Requested interrupt number */
    irqreturn_t (*handler)(int, void *, struct pt_regs *),
                            /* Interrupt handler */
    unsigned long flags,    /* Flags */
    const char *dev_name,   /* Device name -- used in /proc/interrupts */
    void *dev_id);          /* Device id */

void free_irq(
    unsigned int irq,       /* Interrupt number */
    void *dev_id);          /* Device id */

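Flags:
SA_INTERRUPT - Indicates a "fast" interrupt handler. Fast interrupt handlers are executed with interrupts disabled on the current processor.
SA_SHIRQ - The interrupt can be shared between devices.
SA_SAMPLE_RANDOM - The device may be considered a source of random events, so its interrupts can contribute to the kernel's random number generator.

These are the flag names used by 2.6-era kernels and by Linux Device Drivers; later kernels renamed them IRQF_DISABLED, IRQF_SHARED and IRQF_SAMPLE_RANDOM, and also dropped the struct pt_regs * argument from the handler prototype.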

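Because an interrupt line can be shared and the device id is unique for each device, free_irq takes both the interrupt number and the device id; the device id tells the kernel which of the handlers on a shared line to remove.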

Interrupt handlers can be registered either at module initialization or when the device is first opened (before the hardware is told to generate interrupts). Correspondingly, free_irq is called either at module unload or when the device is last closed.

Reference: Linux Device Drivers


Network Alias - Interface with 2 IP addresses

It is possible to create a network alias, that is, to assign two IP addresses to a single NIC, under FreeBSD.

My setup:

lnc0 - IP : 192.168.1.1/255.255.255.0
lnc0 alias  - IP : 192.168.1.5/255.255.255.255

Note: Because the alias is in the same subnet as the primary address, its netmask must be different (255.255.255.255); otherwise you will get this error:

ifconfig: ioctl (SIOCAIFADDR): File exists

A) From the command line, use ifconfig as follows:

# ifconfig lnc0 192.168.1.5 netmask 255.255.255.255 alias

B) To make the alias persistent, so that FreeBSD creates it for you the next time it comes up (after a restart or shutdown), append the following line to /etc/rc.conf:

ifconfig_lnc0_alias0="192.168.1.5 netmask 255.255.255.255"

C) Restart the FreeBSD network service using the following script:

# /etc/netstart

D) Display the real and alias IP addresses with the ifconfig lnc0 command:

# ifconfig lnc0
lnc0: flags=8843 mtu 1500
inet6  fe80::20c:29ff:fe01:ddbd%lnc0 prefixlen 64 scopeid 0x1
inet 192.168.1.2  netmask 0xffffff00 broadcast 192.168.1.255
inet 192.168.1.5 netmask 0xffffffff

Source: http://ipucu.enderunix.org/view.php?id=1127&lang=en


TCP Tuning - sysctls

net.inet.tcp.recvspace and net.inet.tcp.sendspace control how much buffer space is allotted to each socket connection, per direction. This is how much data the kernel will cache on a socket while the application processes it. For data streams over a slow link (even 10Mb Ethernet) this won't matter, as the processor can keep up rather well; on really fast links such as gigabit, it is essential. The setting defaults to 32K each way, but raising it lets the system cache more data, ready for the application if it falls behind the network traffic.

As a test, I raised this to 250KB in each direction on my local machine and on another machine on my local gigabit network. Transfers went from 12MB/s to 25MB/s. Thinking I could get more, I then transferred data from both hard drives on the remote machine. Sure enough, the remote hard drive was the limiting factor, and the rate jumped to 35MB/s.

Before changing these settings, especially in sysctl.conf, be very sure to raise kern.ipc.maxsockbuf to at least the total of both settings. If you do not, the kernel will attempt to allocate the buffer space for the two buffers and run out of space, preventing essential services such as NetInfo from making a loopback connection and thus preventing startup and/or login.

net.inet.tcp.mssdflt sets the default Maximum Segment Size, the largest size the system will use for the data portion of a TCP packet. By default, many BSD derivatives set this to 512 bytes, which is too small. Ethernet supports a frame size of 1,500 bytes on links up to 100Mb, or 9,000 bytes on 1000Mb links, so if you are connected to either, raise this value to the size of your frame minus the size of a TCP header and options (60 bytes). For 10/100Mb links, set it to 1,440; for 1000Mb links that have been explicitly configured to use jumbo frames on all participating computers, set it to 8,940.
Remember: if you set your frames larger than your router allows (on a large LAN), it could fragment the packets when re-sending them, or crash outright, and either outcome is bad. If you miscalculate and the segment size exceeds the hardware frame size, every non-trivial packet you send will be fragmented, slowing down your network. Be careful.

net.inet.tcp.sockthreshold sets the number of open sockets needed before the system actually obeys your sendspace and recvspace values. While the number of open sockets is below this threshold, the buffers are set to 64K, regardless of what you've set them to. With a threshold of 256, this means the system allocates 64K of buffer space to each socket connection direction, or 128K in total, until 256 sockets are open, at which point the buffers are set to 32K (the default send/recvspace values). Set the threshold to 0 to disable this behavior and always use your custom sizes (not really a good idea).

kern.ipc.somaxconn controls the size of the connection listen queue and typically only needs to be adjusted in high-performance server environments. The default value of 128 is more than adequate for a home or work machine and most workgroup servers. If, however, you are running a high-volume server and connections are being refused at the TCP level, you want to increase it. In such a case this setting needs careful tuning: too high, and you'll run into resource problems as the kernel notifies the server of a large number of connections and many remain pending; too low, and you'll get refused connections.

Source: Adam Knight


INTx implementation in FreeBSD

FreeBSD's PCI interrupt routing code attempts to provide a machine independent framework that machine dependent code can hook into where necessary. First, FreeBSD uses cookie values defined by machine dependent code for SYS_RES_IRQ resources. This provides a way to handle interrupts in machine independent code and interfaces. Second, when the PCI bus needs to route an interrupt it passes the request up the device tree until it reaches a level where the request can be handled.

All interrupt resources in FreeBSD drivers are managed as SYS_RES_IRQ resources. When a driver wants to use an interrupt, it allocates a SYS_RES_IRQ resource in much the same way it allocates memory or I/O space. The driver can then attach an interrupt handler to that resource. When a PCI device attempts to allocate an INTx interrupt, the PCI bus first routes it to an IRQ value that is used to create the SYS_RES_IRQ resource. It does this by asking its parent device, which is either a Host-PCI or a PCI-PCI bridge, to look up the IRQ for the given PCI interrupt. The different interrupt routing algorithms are implemented in the individual Host-PCI and PCI-PCI bridge drivers.

The simplest PCI bridge driver is the PCI-PCI bridge driver. This driver's interrupt routing routine implements the swizzle defined in Section 5.1 by calculating the corresponding slot and pin on the upstream side of the bridge and passing the request up to the PCI bridge driver for the upstream PCI bus. Thus, routing requests for interrupts on busses that are not part of the main chassis will bubble up through the device tree until they hit a bridge for a PCI bus that is part of the main chassis.

Interrupt routing for PCI busses that are part of the main chassis is handled by machine dependent PCI bridge drivers. For example, if ACPI is enabled, then ACPI will probe and attach to all the PCI bridges in the ACPI namespace. When an interrupt routing request reaches a PCI bridge with an ACPI driver, it will use the _PRT for the corresponding PCI bus to determine the GSI for the PCI interrupt. It then maps the GSI to a SYS_RES_IRQ cookie value which it returns. Thus, the machine dependent code is responsible for mapping platform-specific interrupts to SYS_RES_IRQ cookies in the PCI bridge drivers. Then in the top-level root, or nexus, devices in the device tree, the machine dependent code is responsible for mapping the SYS_RES_IRQ resources back to the platform-specific interrupts.

FreeBSD does allow the user to override the IRQ for any given PCI interrupt via a tunable. The format of this tunable is hw.pci<bus>.<slot>.INT<pin>.irq, where <bus> is the PCI bus number, <slot> is the PCI slot number, and <pin> is the interrupt pin (A, B, C, or D). The value of the tunable is the IRQ to use for the specified PCI interrupt. This tunable should only be used as a last resort, when more specific tunables (such as the PCI link tunables) are not available. One case in which it is useful is correcting hard-wired routing to I/O APIC intpins caused by a broken MP Table or _PRT entry. For example, to route the PCI interrupt for bus 0, slot 16, INTA# to IRQ 24, set the loader tunable hw.pci0.16.INTA.irq=24.

IRQs are Yummy Cookies

For the x86 platforms, FreeBSD models the mapping of IRQ values to platform interrupts on the Global System Interrupt (GSI) approach from ACPI. In fact, when using ACPI, FreeBSD uses the GSI values directly as IRQs. FreeBSD also always maps IRQ values 0 through 15 to the sixteen ISA IRQs. The only remaining case is when the MP Table is used to enumerate APICs and route interrupts. For this case, the MP Table code simulates the GSI approach by assigning suitable base IRQ values to each I/O APIC, similar to the base GSI values used by ACPI. The MP Table code calculates the base IRQs by adding the number of input pins on each I/O APIC to the base IRQ of the current I/O APIC to determine the base IRQ of the next I/O APIC. Thus, if you have a system with three I/O APICs where the first two I/O APICs have 24 pins and the third I/O APIC has 16 pins, the first I/O APIC would be assigned IRQs 0-23, the second I/O APIC would be assigned IRQs 24-47, and the last I/O APIC would be assigned IRQs 48-63.

The x86 platforms use a global array indexed by the IRQ value to map the IRQs to platform interrupts. Each entry in the array is a pointer to an interrupt source object. Interrupt source objects consist of a struct intsrc which contains a pointer to a group of function pointers in a struct pic. One can think of struct intsrc and struct pic as abstract base classes. Each interrupt controller driver provides its own extended versions of struct pic and struct intsrc. The extended versions contain the base structure as the first member and add driver-specific data after that. For example, the I/O APIC code defines a struct ioapic which extends struct pic. Each instance of struct ioapic contains functions for managing I/O APIC input pins in its method table. It also defines a struct ioapic_intsrc which extends struct intsrc to add I/O APIC-specific data such as which I/O APIC input pin an interrupt source represents. The interrupt controller drivers determine the IRQ values for each interrupt source object. Thus, they must ensure the IRQ properly matches up with the IRQ value used for any PCI interrupts routed to that interrupt source.

IDT Vectors on x86

Once the operating system has mapped a PCI interrupt to an interrupt source, the only remaining step for x86 platforms is mapping the interrupt source to an IDT vector. IDT vectors range from 0 to 255, and IDT vectors 0-31 are reserved for CPU faults and exceptions and NMIs. In addition, FreeBSD uses vectors 240-255 for IPIs, vector 239 for the local APIC timer interrupt, and vector 128 for system calls. That leaves vectors 32-127 and 129-238 for device interrupts.

The 8259As each require 8 contiguous IDT vectors. They each can also interrupt the CPU even when all input pins are masked if a spurious interrupt occurs. Thus, vectors 32-47 are reserved for the 8259As, even when APICs are used instead of the 8259As.

The rest of the device interrupts are allocated on an as-needed basis to active interrupt sources. For example, I/O APIC input pins allocate an IDT vector the first time an interrupt handler is registered. Most I/O APIC input pins are never used, so this strategy avoids reserving IDT vectors for interrupt sources that will never trigger.

Source: John Baldwin


MSI implementation in FreeBSD

FreeBSD implements MSI messages as SYS_RES_IRQ interrupts similar to the legacy INTx interrupts. The driver visible differences include different resource IDs (legacy INTx interrupt is rid 0, MSI messages start at rid 1) for SYS_RES_IRQ resources and new APIs for allocating and releasing MSI messages.

Behind the scenes, the PCI bus driver is responsible for programming the various MSI registers. It also allocates the IRQs that MSI messages are mapped onto, via requests to the parent bridge. These requests pass up through the various PCI bridge drivers (much as PCI interrupt routing requests do) until they reach a device that can allocate IRQs for MSI messages. Similar requests are forwarded up the device tree to release MSI IRQs that are no longer in use and to compute the address and data register values for an MSI IRQ.

For the x86 platforms, the PCI bridge requests bubble up through the device tree until they arrive at the nexus0 device. This device's driver proxies the requests over to the x86 MSI code. The x86 MSI code uses interrupt source objects to manage IRQs for MSI messages. It provides a single struct pic shared by all MSI interrupt sources. The MSI interrupt sources are created on the fly when a request is made by a driver to allocate MSI messages.

Each MSI interrupt source is assigned an IRQ value in the range 256 - 383. These IRQ values are used to avoid conflicting with the IRQs used for legacy INTx interrupts which use a range of 0 - 255. Once an MSI interrupt source is created, it is never destroyed, but it may be reused by a different device if it is released by a driver and another driver makes a subsequent allocation request.

When an MSI interrupt source is allocated, it is assigned an IDT vector; when it is released, the IDT vector is returned to the system. If a driver requests multiple MSI messages, care must be taken to ensure that the group of messages uses an aligned, contiguous range of IDT vectors. MSI-X, an extension to MSI, removes this limitation because it provides separate address and data registers for each message.

Source: John Baldwin


MSI support by a loaded driver

It is possible to change the type of interrupt a driver will use before it is loaded (if the driver supports this). Even after a driver is loaded, it is possible to find out which type of interrupt it is using.

Run "vmstat -i". It lists each interrupt source with its IRQ number and driver name, the total number of interrupts taken since system start, and the average rate. If the IRQ number is 256 or higher, the driver is using MSI/MSI-X interrupts.

 FBSD66# vmstat -i
 interrupt                          total       rate
 irq1: atkbd0                          20          0
 irq12: psm0                            8          0
 irq14: ata0                        63496          0
 irq15: ata1                           47          0
 irq256: nxge0                   32028634        219
 irq23: rl0 ehci0                  125340          0
 cpu0: timer                    292259075       2000
 Total                          324476620       2220
