openwrt/staging/blogic.git
Neil Horman [Tue, 22 Sep 2009 23:43:36 +0000 (16:43 -0700)]
kmod: fix race in usermodehelper code

The user mode helper code has a race in it.  call_usermodehelper_exec()
takes an allocated subprocess_info structure and passes it to a
workqueue; the work function passes it on to a kernel thread that it
creates, and then calls complete() to signal to the caller of
call_usermodehelper_exec() that it can free the subprocess_info struct.

But since the created thread still uses that structure, we can't call
complete() from __call_usermodehelper(), which is where we create the
kernel thread.  We need to call complete() from within the kernel thread
and then not use subprocess_info afterward in the case of UMH_WAIT_EXEC.
Tested successfully by me.
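
A userspace analogue of the fix, as a minimal sketch: it uses POSIX threads instead of the kernel's workqueue, kthread and completion APIs, and every name in it (struct helper_info, complete_sketch(), run_helper() and so on) is hypothetical.  What it shows is the ordering rule described above: the helper thread copies what it needs out of the shared structure before signalling completion and never dereferences it afterwards, so the waiter may free it the moment it wakes.

```c
#include <pthread.h>
#include <stdlib.h>

/* Minimal completion: a flag plus a condition variable.  The caller
 * owns it and keeps it alive until the helper thread has been joined. */
struct completion_sketch {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int done;
};

/* Hypothetical stand-in for subprocess_info: heap-allocated and
 * freed by the caller as soon as it is woken. */
struct helper_info {
    int arg;
    struct completion_sketch *complete;
};

static int helper_result;    /* written by the helper after completing */

static void complete_sketch(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = 1;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void wait_sketch(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

static void *helper_thread(void *p)
{
    struct helper_info *info = p;
    /* Copy everything needed out of info BEFORE signalling. */
    int arg = info->arg;
    struct completion_sketch *c = info->complete;

    complete_sketch(c);       /* info may be freed from this point on */
    helper_result = arg * 2;  /* only local copies are used now */
    return NULL;
}

/* Start the helper, wait for its signal, then free info immediately. */
int run_helper(int arg, struct completion_sketch *c, pthread_t *tid)
{
    struct helper_info *info = malloc(sizeof(*info));

    if (!info)
        return -1;
    info->arg = arg;
    info->complete = c;
    if (pthread_create(tid, NULL, helper_thread, info) != 0) {
        free(info);
        return -1;
    }
    wait_sketch(c);
    free(info);               /* safe: the helper no longer touches info */
    return 0;
}
```

The completion object itself is kept alive by the caller (here by joining the thread before it goes away); only the per-call info structure is freed early.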

Signed-off-by: Neil Horman <[email protected]>
Cc: Rusty Russell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Nick Black [Tue, 22 Sep 2009 23:43:33 +0000 (16:43 -0700)]
Move magic numbers into magic.h

Move various magic-number definitions into magic.h.

Signed-off-by: Nick Black <[email protected]>
Acked-by: Pekka Enberg <[email protected]>
Cc: Al Viro <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Casey Schaufler <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Dave Young [Tue, 22 Sep 2009 23:43:33 +0000 (16:43 -0700)]
printk: add printk_delay to make messages readable for some scenarios

When syslog is not possible and no serial/net console is available
either, it is hard to read the printk messages, for example
oops/panic/warning messages in the shutdown phase.

Add a printk delay feature so that each printk message can be delayed
by a number of milliseconds.

The delay is set through the proc/sysctl interface:
/proc/sys/kernel/printk_delay

The value ranges from 0 to 10000; the default value is 0.
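
From userspace the delay is an ordinary sysctl file.  A minimal C sketch, assuming only the path and the 0 to 10000 range given above; set_printk_delay() and printk_delay_valid() are hypothetical helpers, and writing the file normally requires root:

```c
#include <stdio.h>

#define PRINTK_DELAY_PATH "/proc/sys/kernel/printk_delay"
#define PRINTK_DELAY_MAX  10000   /* upper bound stated in the commit */

/* Return 1 if ms lies in the documented range 0..10000. */
int printk_delay_valid(long ms)
{
    return ms >= 0 && ms <= PRINTK_DELAY_MAX;
}

/* Set the per-message delay in milliseconds.  Returns 0 on success,
 * -1 if the value is out of range or the file cannot be written. */
int set_printk_delay(long ms)
{
    FILE *f;

    if (!printk_delay_valid(ms))
        return -1;
    f = fopen(PRINTK_DELAY_PATH, "w");
    if (!f)
        return -1;
    if (fprintf(f, "%ld\n", ms) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling set_printk_delay(500) before reproducing a shutdown-time oops has the same effect as writing 500 to /proc/sys/kernel/printk_delay from a shell.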

[[email protected]: fix a few things]
Signed-off-by: Dave Young <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Dave Young [Tue, 22 Sep 2009 23:43:31 +0000 (16:43 -0700)]
printk boot_delay: rename printk_delay_msec to loops_per_msec

Rename `printk_delay_msec' to `loops_per_msec', because the patch "printk:
add printk_delay to make messages readable for some scenarios" wishes to
more appropriately use the `printk_delay_msec' identifier.

[[email protected]: add a comment]
Signed-off-by: Dave Young <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Guillaume Knispel [Tue, 22 Sep 2009 23:43:30 +0000 (16:43 -0700)]
poll/select: avoid arithmetic overflow in __estimate_accuracy()

__estimate_accuracy() was prone to integer overflow, for example if *tv ==
{2147, 483648000} on a 32 bit computer (or even for delays as small as
{429, 500000000} if the task is niced).

Because the result was already clamped to between 0 and 100ms, the
effect of the overflow was not too problematic, but the hrtimer range
feature was not used optimally in overflow cases.

This patch ensures that no integer overflow can occur in this
function.
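
The arithmetic can be checked with a standalone model.  This is a sketch, not the kernel function: int32_t stands in for a 32-bit long, the niced condition is passed as a flag, estimate_slack32() is a hypothetical name, and the divisors 1000 (200 when niced) are assumed because they reproduce both overflow examples above.  The remedy it shows is to compare tv_sec against the cap by division before multiplying, so the product can never exceed the 100ms cap.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000
#define MAX_SLACK    (100 * 1000000)   /* 100 ms in nanoseconds */

/* Slack = timeout / divfactor, clamped to [0, MAX_SLACK], computed
 * without overflowing a 32-bit signed integer. */
int32_t estimate_slack32(int32_t tv_sec, int32_t tv_nsec, int niced)
{
    int32_t divfactor = niced ? 200 : 1000;   /* assumed divisors */
    int32_t per_sec = NSEC_PER_SEC / divfactor;
    int32_t slack;

    if (tv_sec < 0)
        return 0;
    /* Division first: tv_sec * per_sec itself could overflow. */
    if (tv_sec > MAX_SLACK / per_sec)
        return MAX_SLACK;

    /* tv_sec * per_sec <= MAX_SLACK here, so the sum fits. */
    slack = tv_nsec / divfactor + tv_sec * per_sec;
    if (slack < 0)
        return 0;
    return slack > MAX_SLACK ? MAX_SLACK : slack;
}
```

Without the early return, the first example computes 2147 * 1000000 + 483648 = 2147483648 = 2^31, one more than a 32-bit signed long can hold.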

Signed-off-by: Guillaume Knispel <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
M. Mohan Kumar [Tue, 22 Sep 2009 23:43:29 +0000 (16:43 -0700)]
kprobes: use do_IRQ() in lkdtm

The current lkdtm code puts a probe on __do_IRQ for some of the kdump
test cases.  Since __do_IRQ is deprecated, change lkdtm to probe do_IRQ
instead.

Signed-off-by: M. Mohan Kumar <[email protected]>
Cc: Ankita Garg <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Roel Kluin [Tue, 22 Sep 2009 23:43:28 +0000 (16:43 -0700)]
smbfs: read buffer overflow

This function uses signed integers for unix_date and its local
variables.  If a negative number is supplied and the leap-year condition
is not met, month will be 0, leading to a read of day_n[-1].
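
The hazard is a table lookup whose index can reach zero before being decremented.  A hypothetical illustration, not the smbfs code: day_n here is an assumed table of cumulative days before each month, and month_for_day() rejects out-of-range days so that day_n[month - 1] can never be evaluated with month == 0.

```c
/* Assumed table: cumulative days before each month of a non-leap
 * year (index 0 = January). */
static const int day_n[12] = {
    0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334
};

/* Map a 0-based day of the year to a 1-based month (1..12).
 * Returns -1 for days outside 0..364: with a negative day the scan
 * would stop at month 0, and a later day_n[month - 1] would read
 * day_n[-1]. */
int month_for_day(int nl_day)
{
    int month;

    if (nl_day < 0 || nl_day > 364)
        return -1;
    for (month = 0; month < 12; month++)
        if (day_n[month] > nl_day)
            break;
    return month;   /* >= 1, since day_n[0] == 0 <= nl_day */
}

/* 1-based day of the month, or -1 if nl_day is out of range. */
int day_of_month(int nl_day)
{
    int month = month_for_day(nl_day);

    if (month < 0)
        return -1;
    return nl_day - day_n[month - 1] + 1;
}
```

Validating the input before the scan keeps the later day_n[month - 1] access in bounds for every caller, whatever the sign of the original timestamp.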

Signed-off-by: Roel Kluin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Andrew Morton [Tue, 22 Sep 2009 23:43:27 +0000 (16:43 -0700)]
include/linux/kmemcheck.h: fix a trillion warnings

of the form

include/net/inet_sock.h:208: warning: ISO C90 forbids mixed declarations and code

Cc: Johannes Berg <[email protected]>
Acked-by: Vegard Nossum <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Linus Torvalds [Tue, 22 Sep 2009 15:11:04 +0000 (08:11 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_event, powerpc: Fix compilation after big perf_counter rename

Linus Torvalds [Tue, 22 Sep 2009 15:07:54 +0000 (08:07 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/vegard/kmemcheck

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck:
  kmemcheck: add missing braces to do-while in kmemcheck_annotate_bitfield
  kmemcheck: update documentation
  kmemcheck: depend on HAVE_ARCH_KMEMCHECK
  kmemcheck: remove useless check
  kmemcheck: remove duplicated #include

Linus Torvalds [Tue, 22 Sep 2009 14:54:33 +0000 (07:54 -0700)]
Merge branch 'for-2.6.32' of git://linux-nfs.org/~bfields/linux

* 'for-2.6.32' of git://linux-nfs.org/~bfields/linux: (68 commits)
  nfsd4: nfsv4 clients should cross mountpoints
  nfsd: revise 4.1 status documentation
  sunrpc/cache: avoid variable over-loading in cache_defer_req
  sunrpc/cache: use list_del_init for the list_head entries in cache_deferred_req
  nfsd: return success for non-NFS4 nfs4_state_start
  nfsd41: Refactor create_client()
  nfsd41: modify nfsd4.1 backchannel to use new xprt class
  nfsd41: Backchannel: Implement cb_recall over NFSv4.1
  nfsd41: Backchannel: cb_sequence callback
  nfsd41: Backchannel: Setup sequence information
  nfsd41: Backchannel: Server backchannel RPC wait queue
  nfsd41: Backchannel: Add sequence arguments to callback RPC arguments
  nfsd41: Backchannel: callback infrastructure
  nfsd4: use common rpc_cred for all callbacks
  nfsd4: allow nfs4 state startup to fail
  SUNRPC: Defer the auth_gss upcall when the RPC call is asynchronous
  nfsd4: fix null dereference creating nfsv4 callback client
  nfsd4: fix whitespace in NFSPROC4_CLNT_CB_NULL definition
  nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel
  sunrpc/cache: simplify cache_fresh_locked and cache_fresh_unlocked.
  ...

Linus Torvalds [Tue, 22 Sep 2009 14:51:45 +0000 (07:51 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/trivial

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  trivial: fix typo in aic7xxx comment
  trivial: fix comment typo in drivers/ata/pata_hpt37x.c
  trivial: typo in kernel-parameters.txt
  trivial: fix typo in tracing documentation
  trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
  trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
  trivial: remove unnecessary semicolons
  trivial: Fix duplicated word "options" in comment
  trivial: kbuild: remove extraneous blank line after declaration of usage()
  trivial: improve help text for mm debug config options
  trivial: doc: hpfall: accept disk device to unload as argument
  trivial: doc: hpfall: reduce risk that hpfall can do harm
  trivial: SubmittingPatches: Fix reference to renumbered step
  trivial: fix typos "man[ae]g?ment" -> "management"
  trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
  trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
  trivial: fix missing printk space in amd_k7_smp_check
  trivial: fix typo s/ketymap/keymap/ in comment
  trivial: fix typo "to to" in multiple files
  trivial: fix typos in comments s/DGBU/DBGU/
  ...

Linus Torvalds [Tue, 22 Sep 2009 14:51:28 +0000 (07:51 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: Remove duplicate Kconfig entry
  HID: consolidate connect and disconnect into core code
  HID: fix non-atomic allocation in hid_input_report

David Härdeman [Tue, 22 Sep 2009 00:04:53 +0000 (17:04 -0700)]
input: add a driver for the Winbond WPCD376I Consumer IR hardware

Add a driver for the Consumer IR (CIR) functionality of the Winbond
WPCD376I chipset (found on e.g. Intel DG45FC motherboards).

Signed-off-by: David Härdeman <[email protected]>
Reviewed-by: Jesse Barnes <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
David Härdeman [Tue, 22 Sep 2009 00:04:52 +0000 (17:04 -0700)]
pnp: add a shutdown method to pnp drivers

The shutdown method is used by the winbond cir driver to set up the
hardware for wake-from-S5.

Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: David Härdeman <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Henrik Rydberg [Tue, 22 Sep 2009 00:04:50 +0000 (17:04 -0700)]
hwmon: applesmc: restore accelerometer and keyboard backlight on resume

On resume from suspend, the driver currently resets the logical state as
if the machine had been brought up from halt.  This patch uses the
dev_pm_ops.resume/restore methods to synchronize the hardware with the
memorized logical state, in effect bringing the accelerometer and
backlight back to their state prior to suspend.  Works for both
suspend-to-RAM and hibernation.  The patch has no effect on the running
state.

Signed-off-by: Henrik Rydberg <[email protected]>
Cc: Nicolas Boichat <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Roel Kluin [Tue, 22 Sep 2009 00:04:48 +0000 (17:04 -0700)]
hwmon: fix freeing of gpio_data and irq

On error, gpio_data and irq should be freed if they have already been
requested.
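
The usual kernel remedy is a goto-based unwind ladder that releases resources in reverse order of acquisition.  A userspace sketch of that pattern; the *_sketch acquire and release functions are hypothetical stand-ins that only count what is currently held, and the fail_* flags force each step to fail:

```c
#include <stdlib.h>

/* Counters standing in for real GPIO and IRQ ownership. */
static int gpio_held, irq_held;

/* Hypothetical acquire/release pairs; a non-zero fail forces an error. */
static int request_gpio_sketch(int fail)
{
    if (fail)
        return -1;
    gpio_held++;
    return 0;
}
static void free_gpio_sketch(void) { gpio_held--; }

static int request_irq_sketch(int fail)
{
    if (fail)
        return -1;
    irq_held++;
    return 0;
}
static void free_irq_sketch(void) { irq_held--; }

/* Probe-style setup: each error label undoes only what was acquired
 * before the failing step, in reverse order. */
int probe_sketch(int fail_gpio, int fail_irq, int fail_alloc, void **out)
{
    void *data;
    int ret;

    ret = request_gpio_sketch(fail_gpio);
    if (ret)
        goto err;
    ret = request_irq_sketch(fail_irq);
    if (ret)
        goto err_free_gpio;
    data = fail_alloc ? NULL : malloc(64);
    if (!data) {
        ret = -1;
        goto err_free_irq;
    }
    *out = data;
    return 0;

err_free_irq:
    free_irq_sketch();
err_free_gpio:
    free_gpio_sketch();
err:
    return ret;
}
```

Falling through the labels means a failure at any step releases exactly the resources taken before it, and nothing else.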

Signed-off-by: Roel Kluin <[email protected]>
Acked-by: Jonathan Cameron <[email protected]>
Cc: David Brownell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Michael Abbott [Tue, 22 Sep 2009 00:04:47 +0000 (17:04 -0700)]
drivers/hwmon/adm1021.c: add low_power support for adm1021 driver

Occasionally it is helpful to be able to turn a temperature sensor off
(for example, if it is making unwanted electrical noise).  This patch
adds a sysfs node to put any adm1021-compatible device into low power mode.
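
Such a node typically maps to a single configuration-register bit.  A sketch of the read-modify-write, assuming (from the ADM1021 data sheet, not from this commit) that bit 6 (0x40) of the configuration register selects standby; adm1021_config_with_low_power() is a hypothetical name.  Only that bit is changed, so the other configuration bits survive:

```c
#include <stdint.h>

#define ADM1021_CONFIG_STANDBY 0x40   /* assumed: config bit 6 = standby */

/* Return the configuration byte with the standby bit set or cleared,
 * leaving every other bit exactly as read from the chip. */
uint8_t adm1021_config_with_low_power(uint8_t config, int low_power)
{
    config &= (uint8_t)~ADM1021_CONFIG_STANDBY;
    if (low_power)
        config |= ADM1021_CONFIG_STANDBY;
    return config;
}
```

From userspace the node would then be written like any other sysfs attribute, e.g. "echo 1 > .../low_power", assuming the attribute is named low_power; the device directory depends on the I2C bus and address.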

Signed-off-by: Michael Abbott <[email protected]>
Cc: Jean Delvare <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Michael Abbott [Tue, 22 Sep 2009 00:04:46 +0000 (17:04 -0700)]
drivers/hwmon/adm1021.c: support high precision ADM1023 remote sensor

The ADM1023 temperature sensor supports higher resolution for its external
sensor (sensitivity of 1/8 deg C).  This patch makes this higher
resolution available through the appropriate temperature sysfs nodes.

Curiously, this functionality was available in the 2.4 kernel driver (but
formatted in a less helpful manner).
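
Conversion to the hwmon sysfs convention (millidegrees Celsius) can be sketched as follows.  The register layout is an assumption taken from the ADM1023 data sheet rather than from this commit: a two's-complement high byte of whole degrees plus a low byte whose top three bits hold eighths of a degree.  The function name is hypothetical.

```c
#include <stdint.h>

/* Combine the remote-sensor high byte (two's-complement whole degrees)
 * and low byte (assumed: bits 7..5 = 1/8 degree steps) into
 * millidegrees Celsius. */
int adm1023_remote_millideg(uint8_t high, uint8_t low)
{
    int whole = (int8_t)high;   /* sign-extend the integer part */
    int eighths = low >> 5;     /* 0..7, always non-negative */

    return whole * 1000 + eighths * 125;
}
```

Because the fraction is added to a two's-complement integer part, negative readings also come out right: 0xFF/0xE0 is -1 + 7/8 = -0.125 deg C, i.e. -125.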

Cc: Jean Delvare <[email protected]>
Signed-off-by: Michael Abbott <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Daniel Mack [Tue, 22 Sep 2009 00:04:45 +0000 (17:04 -0700)]
lis3_spi: code cleanups

Signed-off-by: Daniel Mack <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Cc: Eric Piel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Daniel Mack [Tue, 22 Sep 2009 00:04:44 +0000 (17:04 -0700)]
lis3: add power management functions

This enables power management functions for the SPI transport layer of
the lis3 devices.  The device's suspend mode is only entered if no wakeup
threshold has been given.  If a threshold has been given, the device is
supposed to wake up the system and must therefore not be put into deep
sleep.

[[email protected]: fix lis3-spi for CONFIG_PM=n]
Signed-off-by: Daniel Mack <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Cc: Eric Piel <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Daniel Mack [Tue, 22 Sep 2009 00:04:43 +0000 (17:04 -0700)]
lis3: add free-fall/wakeup function via platform_data

This offers a way for platforms to define flags and thresholds for the
free-fall/wakeup functions of the lis302d chips.

More registers needed to be separated as they are specific to the

Signed-off-by: Daniel Mack <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Cc: Eric Piel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Daniel Mack [Tue, 22 Sep 2009 00:04:42 +0000 (17:04 -0700)]
lis3: fix typo

Bit 0x80 in CTRL_REG3 is an ACTIVE_LOW rather than an ACTIVE_HIGH
function; I got that wrong in my last change.

Signed-off-by: Daniel Mack <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Cc: Eric Piel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Michael Riepe [Tue, 22 Sep 2009 00:04:41 +0000 (17:04 -0700)]
drivers/hwmon/coretemp.c: enable the Intel Atom

Enable the coretemp driver on an Intel Atom.

I'm not sure whether the readings are correct, however: on my 330, the
driver reports values between 27 and 41 °C (with core1 being about 8 °C
hotter than core0 under the same load).  Maybe the maximum temperature of
100 °C is wrong for Atom CPUs.
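
The doubt about 100 °C matters because the driver does not read an absolute temperature: it reads how far below TjMax the core is, so a wrong TjMax shifts every reading by the same error.  A sketch of that computation; the bit field (bits 22:16 of the thermal status MSR) is an assumption based on Intel's documentation, and the function name is hypothetical:

```c
#include <stdint.h>

/* Temperature in millidegrees C from the low 32 bits of the thermal
 * status MSR and an assumed TjMax (also in millidegrees).  Bits 22:16
 * hold the distance below TjMax in whole degrees. */
int coretemp_millideg(uint32_t therm_status_lo, int tjmax_mdeg)
{
    int below_tjmax = (int)((therm_status_lo >> 16) & 0x7f);

    return tjmax_mdeg - below_tjmax * 1000;
}
```

With the same raw reading, assuming a TjMax of 90 °C instead of 100 °C lowers the result by exactly 10 °C, so the reported 27 to 41 °C are only trustworthy if 100 °C is the right TjMax for this CPU.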

Cc: Arjan van de Ven <[email protected]>
Cc: Rudolf Marek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Mike Frysinger [Tue, 22 Sep 2009 00:04:40 +0000 (17:04 -0700)]
checkpatch: add some common Blackfin checks

Add checks for Blackfin-specific issues that seem to crop up from time to
time.  In particular, we have helper macros to break a 32-bit address into
the hi/lo parts, and we want to make sure people use the csync/ssync
variant that includes fun anomaly workarounds.

Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Bryan Wu <[email protected]>
Cc: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Andy Whitcroft [Tue, 22 Sep 2009 00:04:39 +0000 (17:04 -0700)]
checkpatch: version 0.29

Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Andy Whitcroft [Tue, 22 Sep 2009 00:04:38 +0000 (17:04 -0700)]
checkpatch: limit sN/uN matches to actual bit sizes

Limit our type matcher to the s/u/le/be etc. sizes that actually exist
to prevent miscategorising s2 as a type.  Also fix the spelling of the
error message.
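
The rule can be restated outside Perl.  This C sketch illustrates the idea rather than checkpatch's actual regular expression: it assumes the fixed-size type names are s/u with 8, 16, 32 or 64 bits and le/be with 16, 32 or 64 bits, optionally prefixed by "__"; is_sized_int_type() is a hypothetical name.

```c
#include <string.h>

/* Return 1 if name is a fixed-size integer type such as u8, __s32 or
 * __le16, and 0 for look-alikes such as s2, le8 or u128. */
int is_sized_int_type(const char *name)
{
    static const char *const sizes[] = { "8", "16", "32", "64" };
    size_t i;

    if (strncmp(name, "__", 2) == 0)
        name += 2;
    if (name[0] == 's' || name[0] == 'u') {
        name += 1;
        i = 0;                  /* signed/unsigned: 8 bits allowed */
    } else if (strncmp(name, "le", 2) == 0 || strncmp(name, "be", 2) == 0) {
        name += 2;
        i = 1;                  /* endian types assumed to start at 16 */
    } else {
        return 0;
    }
    for (; i < sizeof(sizes) / sizeof(sizes[0]); i++)
        if (strcmp(name, sizes[i]) == 0)
            return 1;
    return 0;
}
```

Matching against an explicit list of sizes, rather than "any digits", is what keeps s2 from being taken for a type.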

Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Andy Whitcroft [Tue, 22 Sep 2009 00:04:38 +0000 (17:04 -0700)]
checkpatch: format strings should not have brackets in macros

We should not recommend brackets for the following:

    #define pr_fmt(fmt)    "%s: " fmt, __func__

Allow macro values containing double-quoted strings to avoid this check.
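
Why brackets would be harmful is easy to demonstrate: pr_fmt() is meant to expand into two arguments, a concatenated format string and __func__.  A self-contained C sketch; format_probe_msg() is a hypothetical caller:

```c
#include <stdio.h>

/* Prefix each message with the calling function's name.  The macro
 * deliberately expands to TWO arguments; do not wrap it in brackets. */
#define pr_fmt(fmt) "%s: " fmt, __func__

/* Expands to: snprintf(buf, len, "%s: " "probe ok, irq %d", __func__, irq) */
int format_probe_msg(char *buf, size_t len, int irq)
{
    return snprintf(buf, len, pr_fmt("probe ok, irq %d"), irq);
}
```

If the macro were written as ("%s: " fmt, __func__), the comma would become the comma operator and the whole format string would be replaced by __func__ alone, so recommending brackets here would break every caller.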

Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Hannes Eder [Tue, 22 Sep 2009 00:04:37 +0000 (17:04 -0700)]
checkpatch: make -f alias --file, add --help, more verbose help message

Impact:
  - More verbose help/usage message.
  - Make the option -f an alias for --file.
  - On -h, --help, and --version display help message and exit(0).
  - With no FILE(s) given, exit(1) with "no input files".
  - On invalid options display help/usage and exit(1).

Based on a patch by Pavel Machek.

Signed-off-by: Hannes Eder <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Andy Whitcroft [Tue, 22 Sep 2009 00:04:36 +0000 (17:04 -0700)]
checkpatch: indent checks -- stop when we run out of continuation lines

Ensure we terminate when there are no further continuation lines when
trying to determine the relative indent of conditionals and their blocks.

Reported-by: John Daiker <[email protected]>
Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Daniel Walker [Tue, 22 Sep 2009 00:04:35 +0000 (17:04 -0700)]
checkpatch: handle C99 comments correctly (performance issue)

This fixes the sanitization process in checkpatch.pl so that it blocks
out the text after a C99-style comment the same way it does with block
comments.  This prevents the text from being processed as regular code.
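
The idea can be sketched in C for a single source line.  This is an illustration, not checkpatch's Perl: sanitize_c99_comment() is a hypothetical name, and it replaces the comment body with spaces (keeping the line length) while skipping string and character literals, so a URL such as "http://x" is left alone.

```c
#include <string.h>

/* Blank out the body of a C99 "//" comment in one source line, keeping
 * the "//" marker and the line length, so later checks never see the
 * comment text as code.  Literals are skipped so "http://x" survives. */
void sanitize_c99_comment(char *line)
{
    char quote = 0;    /* current literal delimiter, or 0 outside one */

    for (char *p = line; *p; p++) {
        if (quote) {
            if (*p == '\\' && p[1])
                p++;                 /* skip the escaped character */
            else if (*p == quote)
                quote = 0;           /* end of the literal */
        } else if (*p == '"' || *p == '\'') {
            quote = *p;              /* start of a literal */
        } else if (p[0] == '/' && p[1] == '/') {
            memset(p + 2, ' ', strlen(p + 2));
            return;
        }
    }
}
```

The sketch only handles single-line "//" comments; a full sanitizer also has to track block comments that span several lines.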

Signed-off-by: Daniel Walker <[email protected]>
Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Andy Whitcroft [Tue, 22 Sep 2009 00:04:34 +0000 (17:04 -0700)]
checkpatch: possible types -- else cannot start a type

An else cannot start a type; it would have to be within a block after
the else.  This can trigger false modifier matching.

Signed-off-by: Andy Whitcroft <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
David Rientjes [Tue, 22 Sep 2009 00:04:33 +0000 (17:04 -0700)]
flex_array: add missing kerneldoc annotations

Add kerneldoc annotations for the function parameters of type struct
flex_array and gfp_t, which are currently missing.

Signed-off-by: David Rientjes <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
David Rientjes [Tue, 22 Sep 2009 00:04:33 +0000 (17:04 -0700)]
flex_array: introduce DEFINE_FLEX_ARRAY

FLEX_ARRAY_INIT(element_size, total_nr_elements) cannot determine if
either parameter is valid, so flex arrays which are statically allocated
with this interface can easily become corrupted or reference memory
beyond their allocation.

This removes FLEX_ARRAY_INIT() as a struct flex_array initializer since no
initializer may perform the required checking.  Instead, the array is now
defined with a new interface:

DEFINE_FLEX_ARRAY(name, element_size, total_nr_elements)

This may be prefixed with `static' for file scope.

This interface includes compile-time checking of the parameters to ensure
they are valid.  Since the validity of both element_size and
total_nr_elements depends on FLEX_ARRAY_BASE_SIZE and FLEX_ARRAY_PART_SIZE,
the kernel build will fail if either of these predefined values changes
such that the array parameters are no longer valid.

Since BUILD_BUG_ON() requires compile time constants, several of the
static inline functions that were once local to lib/flex_array.c had to be
moved to include/linux/flex_array.h.
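
The same compile-time idea can be shown in plain C11, with _Static_assert in place of BUILD_BUG_ON().  Everything here is hypothetical: the sizes, the FA_SKETCH_* names and the struct layout only mimic the constraints described above (one element must fit in a part, and the base page must hold enough part pointers for the requested total); they are not the kernel's definitions.  Because the asserts follow the object definition, the macro can still be prefixed with static.

```c
#include <stddef.h>

#define FA_SKETCH_BASE_SIZE 4096   /* hypothetical FLEX_ARRAY_BASE_SIZE */
#define FA_SKETCH_PART_SIZE 4096   /* hypothetical FLEX_ARRAY_PART_SIZE */
/* Part pointers that fit in the base page after two int fields. */
#define FA_SKETCH_NR_PTRS \
    ((FA_SKETCH_BASE_SIZE - 2 * sizeof(int)) / sizeof(void *))
#define FA_SKETCH_MAX_ELEMENTS(esz) \
    (FA_SKETCH_NR_PTRS * (FA_SKETCH_PART_SIZE / (esz)))

struct fa_sketch {
    int element_size;
    int total_nr_elements;
    void *parts[FA_SKETCH_NR_PTRS];
};

/* Define an array and reject invalid parameters at compile time. */
#define DEFINE_FA_SKETCH(name, esz, total)                            \
    struct fa_sketch name = { .element_size = (esz),                  \
                              .total_nr_elements = (total) };         \
    _Static_assert((esz) > 0 && (esz) <= FA_SKETCH_PART_SIZE,         \
                   #name ": element_size does not fit in one part");  \
    _Static_assert((size_t)(total) <= FA_SKETCH_MAX_ELEMENTS(esz),    \
                   #name ": too many elements for the base page")

/* Valid at file scope, with or without a static prefix. */
static DEFINE_FA_SKETCH(pid_table, sizeof(int), 10000);

/* Would fail to build: an 8192-byte element cannot fit in one part.
 * static DEFINE_FA_SKETCH(too_big, 8192, 1);
 */
```

If FA_SKETCH_PART_SIZE were later shrunk below an existing element size, every affected definition would stop compiling, which is the property the commit describes for the real constants.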

Signed-off-by: David Rientjes <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
David Rientjes [Tue, 22 Sep 2009 00:04:31 +0000 (17:04 -0700)]
flex_array: add flex_array_shrink function

Add a new function to the flex_array API:

int flex_array_shrink(struct flex_array *fa)

This function will free all unused second-level pages.  Since elements are
now poisoned if they are not allocated with __GFP_ZERO, it's possible to
identify parts that consist solely of unused elements.

flex_array_shrink() returns the number of pages freed.
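
A minimal model of how poisoning makes shrinking possible.  It is a sketch, not lib/flex_array.c: parts are tiny heap buffers, the poison byte 0x6c is assumed to mirror FLEX_ARRAY_FREE, a zero flag plays the role of __GFP_ZERO, and all names are hypothetical.  A part is freed only if every byte still holds the poison value.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SHRINK_PART_SIZE 64     /* tiny part, for illustration only */
#define SHRINK_NR_PARTS  4
#define SHRINK_POISON    0x6c   /* assumed to mirror FLEX_ARRAY_FREE */

struct shrink_sketch {
    unsigned char *parts[SHRINK_NR_PARTS];
};

/* Allocate a part.  Unless zeroed memory was requested (the analogue
 * of __GFP_ZERO), fill it with poison so it reads as "all free". */
unsigned char *shrink_alloc_part(bool zero)
{
    unsigned char *p = malloc(SHRINK_PART_SIZE);

    if (p)
        memset(p, zero ? 0 : SHRINK_POISON, SHRINK_PART_SIZE);
    return p;
}

static bool part_is_free(const unsigned char *p)
{
    for (size_t i = 0; i < SHRINK_PART_SIZE; i++)
        if (p[i] != SHRINK_POISON)
            return false;
    return true;
}

/* Free every part that still holds nothing but poison; return how many. */
int shrink_sketch_shrink(struct shrink_sketch *fa)
{
    int freed = 0;

    for (int i = 0; i < SHRINK_NR_PARTS; i++) {
        if (fa->parts[i] && part_is_free(fa->parts[i])) {
            free(fa->parts[i]);
            fa->parts[i] = NULL;
            freed++;
        }
    }
    return freed;
}
```

Zeroed parts are never freed by this check, matching the rule that __GFP_ZERO elements count as in use; conversely, stored data whose bytes all equal the poison value would be misread as free, which this sketch does not guard against.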

Signed-off-by: David Rientjes <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
David Rientjes [Tue, 22 Sep 2009 00:04:31 +0000 (17:04 -0700)]
flex_array: poison free elements

Newly initialized flex_arrays and/or flex_array_parts are now poisoned
with a new poison value, FLEX_ARRAY_FREE.  Its value is similar to
POISON_FREE used in the various slab allocators, but differs so that
memory poisoned by a flex array can be distinguished from memory
poisoned by a slab allocator.

This will allow us to identify flex_array_parts that only contain free
elements (and free them with an addition to the flex_array API).  This
could also be extended in the future to identify `get' uses on elements
that have not been `put'.

If __GFP_ZERO is passed for a part's gfp mask, the poisoning is avoided.
These elements are considered to be in-use since they have been
initialized.

Signed-off-by: David Rientjes <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
David Rientjes [Tue, 22 Sep 2009 00:04:30 +0000 (17:04 -0700)]
flex_array: add flex_array_clear function

Add a new function to the flex_array API:

int flex_array_clear(struct flex_array *fa,
unsigned int element_nr)

This function will zero the element at element_nr in the flex_array.

Although this is equivalent to using flex_array_put() and passing a
pointer to zeroed memory, flex_array_clear() does not require such a
pointer to memory, which would most likely need to be allocated on the
caller's stack and could be significantly large depending on
element_size.
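
The addressing behind such a function is simple arithmetic: with P = part_size / element_size elements per part, element n lives in part n / P at byte offset (n % P) * element_size.  A sketch under that assumption; the struct, the part size, the function name and the error codes are hypothetical and not the kernel's definitions:

```c
#include <errno.h>
#include <string.h>

#define CLEAR_PART_SIZE 4096   /* hypothetical bytes per part */
#define CLEAR_NR_PARTS  8

struct clear_sketch {
    size_t element_size;       /* assumed non-zero */
    unsigned int total_nr_elements;
    unsigned char *parts[CLEAR_NR_PARTS];
};

/* Zero one element in place: no caller-supplied zero buffer needed. */
int clear_sketch_element(struct clear_sketch *fa, unsigned int element_nr)
{
    size_t per_part = CLEAR_PART_SIZE / fa->element_size;
    size_t part_nr = element_nr / per_part;
    unsigned char *part;

    if (element_nr >= fa->total_nr_elements || part_nr >= CLEAR_NR_PARTS)
        return -ENOSPC;
    part = fa->parts[part_nr];
    if (!part)
        return -EINVAL;        /* nothing was ever stored in this part */
    memset(part + (element_nr % per_part) * fa->element_size, 0,
           fa->element_size);
    return 0;
}
```

The only memory touched is the element itself, which is the advantage over building a zeroed copy of element_size bytes on the stack.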

Signed-off-by: David Rientjes <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Marcin Slusarz [Tue, 22 Sep 2009 00:04:29 +0000 (17:04 -0700)]
vsprintf: use WARN_ON_ONCE

Signed-off-by: Marcin Slusarz <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Joe Perches [Tue, 22 Sep 2009 00:04:27 +0000 (17:04 -0700)]
MAINTAINERS: move ARM lists to infradead

Signed-off-by: Joe Perches <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Krzysztof Halasa <[email protected]>
Cc: Russell King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Joe Perches [Tue, 22 Sep 2009 00:04:26 +0000 (17:04 -0700)]
MAINTAINERS: integrate P:/M: lines

A couple of new uses of separate "P: name" and "M: address" lines are
converted to the single-line form "M: name <address>".

Signed-off-by: Joe Perches <[email protected]>
Cc: Anil Ravindranath <[email protected]>
Cc: Kalle Valo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Felipe Contreras [Tue, 22 Sep 2009 00:04:25 +0000 (17:04 -0700)]
MAINTAINERS: omap: fix regex

Otherwise 'arch/arm/*omap*/foo.c' wouldn't match

Signed-off-by: Felipe Contreras <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Felipe Contreras [Tue, 22 Sep 2009 00:04:24 +0000 (17:04 -0700)]
MAINTAINERS: acpi: add 'include/acpi'

Signed-off-by: Felipe Contreras <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Joe Perches [Tue, 22 Sep 2009 00:04:24 +0000 (17:04 -0700)]
scripts/get_maintainer.pl: add maintainers in order listed in matched section

Previous behavior was "bottom-up" in each section, starting from the
"F:" pattern entry that matched.  Now information is entered into the
various lists in the "as entered" order for each matched section.

This also allows F: entries to be put anywhere in a section, not just as
the last entries in the section.

And a couple of improvements:

Don't alphabetically sort before outputting the matched scm, status,
subsystem and web sections.

Ignore content after a single email address so these entries are acceptable
M: name <address> whatever other comment

And a fix:

Make an M: entry without a name again use the name from an immediately
preceding P: line if it exists.

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Joe Perches [Tue, 22 Sep 2009 00:04:22 +0000 (17:04 -0700)]
scripts/get_maintainer.pl: add --remove-duplicates

Allow control over the elimination of duplicate email names and addresses

--remove-duplicates will use the first email name or address presented
--noremove-duplicates will emit all names and addresses

--remove-duplicates is enabled by default

For instance:

$ ./scripts/get_maintainer.pl -f drivers/char/tty_ioctl.c
Greg Kroah-Hartman <[email protected]>
Alan Cox <[email protected]>
Mike Frysinger <[email protected]>
Alexey Dobriyan <[email protected]>
[email protected]

$ ./scripts/get_maintainer.pl -f --noremove-duplicates drivers/char/tty_ioctl.c
Greg Kroah-Hartman <[email protected]>
Alan Cox <[email protected]>
Alan Cox <[email protected]>
Alan Cox <[email protected]>
Mike Frysinger <[email protected]>
Alexey Dobriyan <[email protected]>
[email protected]

Using --remove-duplicates could eliminate multiple maintainers that
share the same name but not the same email address.
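
One way to read the rule, sketched in C rather than Perl: an entry is dropped if either its name or its address has already been emitted, so the first occurrence wins.  This is an illustration, not get_maintainer.pl's code; struct maint and remove_duplicates() are hypothetical names.  It also reproduces the caveat above: two different people who share a name collapse into one entry.

```c
#include <string.h>

struct maint {
    const char *name;
    const char *addr;
};

/* Compact the list in place, keeping the first entry for each name and
 * for each address.  Returns the new length. */
size_t remove_duplicates(struct maint *list, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        int dup = 0;

        for (size_t j = 0; j < kept && !dup; j++)
            dup = strcmp(list[i].name, list[j].name) == 0 ||
                  strcmp(list[i].addr, list[j].addr) == 0;
        if (!dup)
            list[kept++] = list[i];
    }
    return kept;
}
```

The quadratic scan is fine for the handful of maintainers a single file usually matches; a hash set would only pay off for much longer lists.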

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Joe Perches [Tue, 22 Sep 2009 00:04:21 +0000 (17:04 -0700)]
scripts/get_maintainer.pl: using --separator implies --nomultiline

If a person sets a separator, it's only used if --nomultiline is set.
Don't require the command line to also include --nomultiline in that case.

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Joe Perches [Tue, 22 Sep 2009 00:04:21 +0000 (17:04 -0700)]
scripts/get_maintainer.pl: add .mailmap use, shell and email cleanups

Add reading and using .mailmap file if it exists
Convert address entries in .mailmap to first encountered address
Don't terminate shell commands with \n
Strip characters found after sign-offs by: name <address> [stripped]

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Joe Perches [Tue, 22 Sep 2009 00:04:20 +0000 (17:04 -0700)]
scripts/get_maintainer.pl: better email routines, use perl not shell where possible

Added format_email and parse_email routines to reduce inline use.

Added email_address_inuse to eliminate multiple maintainer entries
for the same email address; the first name encountered is used.

Used internal perl equivalents of shell cmd use of grep|cut|sort|uniq

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Joe Perches [Tue, 22 Sep 2009 00:04:17 +0000 (17:04 -0700)]
scripts/get_maintainer.pl: add --pattern-depth

--pattern-depth is used to control how many levels of directory traversal
should be performed to find maintainers.  The default is 0 (all directory levels).

For instance:

MAINTAINERS currently has multiple M: and F: entries that match
net/netfilter/ipvs/ip_vs_app.c

IPVS
M: Wensong Zhang <[email protected]>
M: Simon Horman <[email protected]>
M: Julian Anastasov <[email protected]>
[...]
F: net/netfilter/ipvs/

NETFILTER/IPTABLES/IPCHAINS
[...]
M: Patrick McHardy <[email protected]>
[...]
F: net/netfilter/

NETWORKING [GENERAL]
M: "David S. Miller" <[email protected]>
[...]
F: net/

THE REST
M: Linus Torvalds <[email protected]>
[...]
F: */

Using this command will return all of those maintainers
(except Linus, unless --git-chief-maintainers is specified):

$ ./scripts/get_maintainer.pl --nogit -nol \
-f net/netfilter/ipvs/ip_vs_app.c
Julian Anastasov <[email protected]>
Simon Horman <[email protected]>
Wensong Zhang <[email protected]>
Patrick McHardy <[email protected]>
David S. Miller <[email protected]>

Adding --pattern-depth=1 will match only at the deepest level:
$ ./scripts/get_maintainer.pl --nogit -nol --pattern-depth=1 \
-f net/netfilter/ipvs/ip_vs_app.c
Julian Anastasov <[email protected]>
Simon Horman <[email protected]>
Wensong Zhang <[email protected]>

Adding --pattern-depth=2 will match at the deepest level and one level higher:
$ ./scripts/get_maintainer.pl --nogit -nol --pattern-depth=2 \
-f net/netfilter/ipvs/ip_vs_app.c
Julian Anastasov <[email protected]>
Simon Horman <[email protected]>
Wensong Zhang <[email protected]>
Patrick McHardy <[email protected]>

and so on.
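The depth rule in the examples above can be modeled as a small C predicate. This is a sketch under stated assumptions, not the script's Perl code: it assumes the F: pattern has already matched the file, counts a pattern's depth as its number of "/" characters (plus one for a file pattern that does not end in "/"), and measures the file by the "/" count of its path. `keep_section()` and its helpers are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

static int count_slashes(const char *s)
{
    int n = 0;
    for (; *s; s++)
        if (*s == '/')
            n++;
    return n;
}

/* Depth of an F: pattern: its '/' count, plus one for a file pattern. */
static int pattern_depth_of(const char *pattern)
{
    size_t len = strlen(pattern);
    int d = count_slashes(pattern);
    if (len > 0 && pattern[len - 1] != '/')
        d++;
    return d;
}

/* depth == 0 keeps every matching section; otherwise keep a section only
 * if its pattern sits fewer than `depth` levels above the file. */
static bool keep_section(const char *file, const char *pattern, int depth)
{
    int levels_above = count_slashes(file) - pattern_depth_of(pattern);
    return depth == 0 || levels_above < depth;
}
```

For net/netfilter/ipvs/ip_vs_app.c this keeps only "net/netfilter/ipvs/" at depth 1 and adds "net/netfilter/" at depth 2, matching the output shown above.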

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agoscripts/get_maintainer.pl: add sections in pattern match depth order
Joe Perches [Tue, 22 Sep 2009 00:04:14 +0000 (17:04 -0700)]
scripts/get_maintainer.pl: add sections in pattern match depth order

Before this change, matched sections were added in the order
of appearance in the normally alphabetic section order of
the MAINTAINERS file.

For instance, finding the maintainer for drivers/scsi/wd7000.c
would first find "SCSI SUBSYSTEM", then "WD7000 SCSI SUBSYSTEM",
then "THE REST".

before patch:

$ ./scripts/get_maintainer.pl --nogit -f drivers/scsi/wd7000.c
James E.J. Bottomley <[email protected]>
Miroslav Zagorac <[email protected]>
[email protected]
[email protected]

get_maintainer.pl now selects matched sections by longest pattern match.
Length is measured by the number of "/"s, with a specific file pattern
counting as longer than its directory.

This changes the example output order to whatever is selected in
"WD7000 SCSI SUBSYSTEM", then "SCSI SUBSYSTEM", then "THE REST".

after patch:

$ ./scripts/get_maintainer.pl --nogit -f drivers/scsi/wd7000.c
Miroslav Zagorac <[email protected]>
James E.J. Bottomley <[email protected]>
[email protected]
[email protected]
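The new ordering can be sketched as a sort on pattern length. The F: patterns used for the two SCSI sections in the usage check ("drivers/scsi/" and "drivers/scsi/wd7000.c") are assumptions for illustration; "*/" for THE REST is taken from the MAINTAINERS excerpt earlier in this log. `struct section_match` and `sort_sections()` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct section_match {
    const char *name;     /* MAINTAINERS section title */
    const char *pattern;  /* the F: pattern that matched */
    int order;            /* position in MAINTAINERS, used to break ties */
};

/* Longer match = more '/'s; a specific file outranks its directory. */
static int pattern_weight(const char *p)
{
    int w = 0;
    size_t len = strlen(p);
    for (size_t i = 0; i < len; i++)
        if (p[i] == '/')
            w++;
    if (len > 0 && p[len - 1] != '/')
        w++;
    return w;
}

static int by_weight_desc(const void *a, const void *b)
{
    const struct section_match *x = a, *y = b;
    int wx = pattern_weight(x->pattern), wy = pattern_weight(y->pattern);
    if (wx != wy)
        return wy - wx;           /* deepest match first */
    return x->order - y->order;   /* otherwise keep file order */
}

static void sort_sections(struct section_match *m, size_t n)
{
    qsort(m, n, sizeof(*m), by_weight_desc);
}
```

The explicit `order` tie-breaker is there because qsort() is not stable.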

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agoscripts/get_maintainer.pl: add --git-blame
Joe Perches [Tue, 22 Sep 2009 00:04:13 +0000 (17:04 -0700)]
scripts/get_maintainer.pl: add --git-blame

Julia Lawall suggested that get_maintainer.pl should have the
ability to include signatories of commits that are modified by
a particular patch.

Vegard Nossum did something similar once.
http://lkml.org/lkml/2008/5/29/449

The modified script looks up the commits for all lines in the
patch, and includes the "-by:" signatories for those commits.
It uses the same git-min-percent, git-max-maintainers, and
git-min-signatures options.  git-since is ignored.

It can be used independently of the --git default, so
        ./scripts/get_maintainer.pl --nogit --git-blame <patch>
or
        ./scripts/get_maintainer.pl --nogit --git-blame -f <file>
is acceptable.

If used with -f <file>, all lines/commits for the file are
checked.

--git-blame can be slow if used with -f <file>
--git-blame does not work with -f <directory>
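Once the blamed commits are known, the "-by:" signatories have to be picked out of each commit message. A minimal C sketch of such a line filter, assuming a signature line is an alphabetic/hyphenated tag ending in "-by:" (e.g. "Signed-off-by:", "Acked-by:"); `is_signature_line()` is a hypothetical name, not the script's code.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True for lines like "Signed-off-by: ...", "Acked-by: ...",
 * "Reviewed-by: ..."; leading blanks are ignored. */
static bool is_signature_line(const char *line)
{
    while (*line == ' ' || *line == '\t')
        line++;

    const char *by = strstr(line, "-by:");
    if (by == NULL || by == line)
        return false;

    /* The tag before "-by:" may contain only letters and hyphens. */
    for (const char *p = line; p < by; p++)
        if (!isalpha((unsigned char)*p) && *p != '-')
            return false;
    return true;
}
```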

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agoMAINTAINERS: add IPVS include files
Hannes Eder [Tue, 22 Sep 2009 00:04:12 +0000 (17:04 -0700)]
MAINTAINERS: add IPVS include files

Signed-off-by: Hannes Eder <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: "David S. Miller" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agouml: fix order of pud and pmd_free()
Roel Kluin [Tue, 22 Sep 2009 00:04:11 +0000 (17:04 -0700)]
uml: fix order of pud and pmd_free()

If pmd_alloc() fails, we should free only the previously allocated pud;
if pte_alloc_map() fails, we should free the pmd as well.
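The correct unwind order can be illustrated with the usual goto-based cleanup pattern. This is a userspace sketch: `alloc_level()` and `free_level()` are hypothetical stand-ins for pud_alloc()/pmd_alloc()/pte_alloc_map() and their frees, and `fail_at` is a test hook that forces a failure at a chosen level.

```c
#include <stdlib.h>

struct level { int dummy; };

static int fail_at;  /* 0: never fail; 1: pud, 2: pmd, 3: pte (test hook) */
static int live;     /* number of levels currently allocated */

static struct level *alloc_level(int which)
{
    if (which == fail_at)
        return NULL;
    struct level *l = malloc(sizeof(*l));
    if (l != NULL)
        live++;
    return l;
}

static void free_level(struct level *l)
{
    free(l);
    live--;
}

/* On failure, release only what was already allocated, innermost first. */
static int map_three_levels(void)
{
    struct level *pud, *pmd, *pte;

    pud = alloc_level(1);
    if (pud == NULL)
        goto out;
    pmd = alloc_level(2);
    if (pmd == NULL)
        goto out_pud;        /* only the pud exists */
    pte = alloc_level(3);
    if (pte == NULL)
        goto out_pmd;        /* free the pmd, then the pud */

    /* Success; a real caller would keep these, the sketch releases them. */
    free_level(pte);
    free_level(pmd);
    free_level(pud);
    return 0;

out_pmd:
    free_level(pmd);
out_pud:
    free_level(pud);
out:
    return -1;
}
```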

Signed-off-by: Roel Kluin <[email protected]>
Cc: Jeff Dike <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agoum: convert to asm-generic/hardirq.h
Christoph Hellwig [Tue, 22 Sep 2009 00:04:10 +0000 (17:04 -0700)]
um: convert to asm-generic/hardirq.h

Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Jeff Dike <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agocpuidle: menu governor: reduce latency on exit
Corrado Zoccolo [Tue, 22 Sep 2009 00:04:09 +0000 (17:04 -0700)]
cpuidle: menu governor: reduce latency on exit

Move the state residency accounting and statistics computation off the hot
exit path.

On exit, the need to recompute statistics is recorded, and new statistics
will be computed when menu_select is called again.

The expected effect is to reduce processor wakeup latency from sleep
(C-states).  We are speaking of a reduction of a few hundred cycles out of
a latency of several microseconds (determined by the hardware transition),
so it is difficult to measure.
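The deferral described above amounts to a "dirty flag": the exit path only records the sample, and the next select folds it into the statistics. A minimal C sketch; `struct gov_state`, `gov_reflect()`, `gov_select()` and the 7/8 running average are hypothetical placeholders, not the menu governor's actual statistics.

```c
#include <stdbool.h>

struct gov_state {
    bool needs_update;            /* set on exit, consumed on next select */
    unsigned int last_residency;  /* sample recorded on exit */
    unsigned int avg_residency;   /* statistic recomputed lazily */
    unsigned int updates;         /* number of recomputations */
};

/* Hot exit path: record what happened and nothing more. */
static void gov_reflect(struct gov_state *s, unsigned int residency)
{
    s->last_residency = residency;
    s->needs_update = true;
}

static void gov_update(struct gov_state *s)
{
    /* placeholder running average: new = (7 * old + sample) / 8 */
    s->avg_residency = (7 * s->avg_residency + s->last_residency) / 8;
    s->needs_update = false;
    s->updates++;
}

/* Select path: catch up on pending statistics before deciding. */
static unsigned int gov_select(struct gov_state *s)
{
    if (s->needs_update)
        gov_update(s);
    return s->avg_residency;
}
```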

Signed-off-by: Corrado Zoccolo <[email protected]>
Cc: Venkatesh Pallipadi <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Adam Belay <[email protected]>
Acked-by: Arjan van de Ven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agocpuidle: fix the menu governor to boost IO performance
Arjan van de Ven [Tue, 22 Sep 2009 00:04:08 +0000 (17:04 -0700)]
cpuidle: fix the menu governor to boost IO performance

Fix the menu idle governor, which balances power savings, energy efficiency
and performance impact.

The reason for a reworked governor is that there have been serious
performance issues reported with the existing code on Nehalem server
systems.

I'm sure Andrew wants to see benchmark results, so here they are
(the benchmark is "fio"; "no cstates" uses "idle=poll"):

            no cstates   current linux   new algorithm
1 disk       107 Mb/s        85 Mb/s        105 Mb/s
2 disks      215 Mb/s       123 Mb/s        209 Mb/s
12 disks     590 Mb/s       320 Mb/s        585 Mb/s

In various power benchmark measurements, no degradation was found by our
measurement & diagnostics team.  Obviously a small percentage more power was
used in the "fio" benchmark, due to the much higher performance.

While it would be a novel idea to describe the new algorithm in this
commit message, I cheaped out and described it in comments in the code
instead.

[changes since first post: spelling fixes from akpm, review feedback,
folded menu-tng into menu.c]

Signed-off-by: Arjan van de Ven <[email protected]>
Cc: Venkatesh Pallipadi <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yanmin Zhang <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agom68k: convert to asm-generic/hardirq.h
Christoph Hellwig [Tue, 22 Sep 2009 00:04:07 +0000 (17:04 -0700)]
m68k: convert to asm-generic/hardirq.h

Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agom68k: convert to use arch_gettimeoffset()
john stultz [Tue, 22 Sep 2009 00:04:05 +0000 (17:04 -0700)]
m68k: convert to use arch_gettimeoffset()

Convert m68k to use GENERIC_TIME via the arch_gettimeoffset() infrastructure,
reducing the amount of arch specific code we need to maintain.

I've taken my best swing at converting this, but I'm not 100% confident
I got it right. My cross-compiler is now out of date (gcc4.2) so I
wasn't able to check if it compiled. Any assistance from arch
maintainers or testers to get this merged would be great.

Signed-off-by: John Stultz <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Roman Zippel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agom32r: convert to asm-generic/hardirq.h
Christoph Hellwig [Tue, 22 Sep 2009 00:04:04 +0000 (17:04 -0700)]
m32r: convert to asm-generic/hardirq.h

Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Hirokazu Takata <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agom32r: convert to use arch_gettimeoffset()
john stultz [Tue, 22 Sep 2009 00:04:04 +0000 (17:04 -0700)]
m32r: convert to use arch_gettimeoffset()

Convert m32r to use GENERIC_TIME via the arch_gettimeoffset() infrastructure,
reducing the amount of arch specific code we need to maintain.

I also noted that m32r doesn't seem to be taking the xtime write lock
before calling do_timer()!  That looks like a pretty bad bug to me.  If
folks agree, let me know and I can move the lock grab to the correct spot.

Signed-off-by: John Stultz <[email protected]>
Cc: Hirokazu Takata <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agom32r: remove redundant tests on unsigned
Roel Kluin [Tue, 22 Sep 2009 00:04:03 +0000 (17:04 -0700)]
m32r: remove redundant tests on unsigned

`off' and `max_cpus' are unsigned, so a "negative" value wraps around to
a large positive one and is caught by the other test.
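Why the `< 0` test is redundant can be shown in a few lines of C. `off_is_valid()` is a hypothetical helper, not the m32r code: with an unsigned operand, a value that was "negative" has already wrapped to a huge number, which the upper-bound check rejects.

```c
#include <stdbool.h>

/* For unsigned off, "off < 0" is always false; the single upper-bound
 * test also rejects wrapped "negative" values such as (unsigned)-1. */
static bool off_is_valid(unsigned int off, unsigned int limit)
{
    return off < limit;
}
```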

Signed-off-by: Roel Kluin <[email protected]>
Cc: Hirokazu Takata <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agoalpha: convert to asm-generic/hardirq.h
Christoph Hellwig [Tue, 22 Sep 2009 00:04:02 +0000 (17:04 -0700)]
alpha: convert to asm-generic/hardirq.h

Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agoalpha: use printk_once
Marcin Slusarz [Tue, 22 Sep 2009 00:04:01 +0000 (17:04 -0700)]
alpha: use printk_once

Signed-off-by: Marcin Slusarz <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Richard Henderson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agoarch/alpha/boot/tools/objstrip.c: wrong variable tested after open()
Roel Kluin [Tue, 22 Sep 2009 00:04:01 +0000 (17:04 -0700)]
arch/alpha/boot/tools/objstrip.c: wrong variable tested after open()

The wrong variable is tested after open(): fd belongs to another open()
and has already been tested.
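The general rule is to check each descriptor right after the open() that produced it. A minimal sketch; `open_pair()` and its parameters are hypothetical and not taken from objstrip.c.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open an input and an output file, testing each result immediately.
 * Returns 0 on success, -1 on failure (nothing is left open). */
static int open_pair(const char *in_path, const char *out_path,
                     int *fd_in, int *fd_out)
{
    *fd_in = open(in_path, O_RDONLY);
    if (*fd_in < 0) {
        perror(in_path);
        return -1;
    }

    *fd_out = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (*fd_out < 0) {          /* test the new descriptor, not *fd_in */
        perror(out_path);
        close(*fd_in);
        *fd_in = -1;
        return -1;
    }
    return 0;
}
```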

Signed-off-by: Roel Kluin <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Richard Henderson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agoalpha: convert to use arch_gettimeoffset()
john stultz [Tue, 22 Sep 2009 00:04:00 +0000 (17:04 -0700)]
alpha: convert to use arch_gettimeoffset()

Converts alpha to use GENERIC_TIME via the arch_gettimeoffset()
infrastructure, reducing the amount of arch specific code we need to
maintain.

I suspect the alpha arch could be improved even further to provide an
rpcc() based clocksource, but not having the hardware, I don't feel
comfortable attempting the more complicated conversion (but I'd be glad to
help if anyone else is interested).

[[email protected]: fix build]
Signed-off-by: John Stultz <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agoh8300: convert to asm-generic/hardirq.h
Christoph Hellwig [Tue, 22 Sep 2009 00:03:58 +0000 (17:03 -0700)]
h8300: convert to asm-generic/hardirq.h

Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agonommu: add support for Memory Protection Units (MPU)
Bernd Schmidt [Tue, 22 Sep 2009 00:03:57 +0000 (17:03 -0700)]
nommu: add support for Memory Protection Units (MPU)

Some architectures (like the Blackfin arch) implement some of the
"simpler" features that one would expect out of an MMU, such as memory
protection.

In our case, we actually get read/write/exec protection down to the page
boundary, so processes can't stomp on each other, let alone the kernel.

There is a performance decrease, however (which depends greatly on the
workload), as the hardware/software interaction was not optimized at
design time.

Signed-off-by: Bernd Schmidt <[email protected]>
Signed-off-by: Bryan Wu <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Acked-by: David Howells <[email protected]>
Acked-by: Greg Ungerer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agopcmcia: cleanup/fixup patch for sa1100_jornada_pcmcia driver
Kristoffer Ericson [Tue, 22 Sep 2009 00:03:56 +0000 (17:03 -0700)]
pcmcia: cleanup/fixup patch for sa1100_jornada_pcmcia driver

Clean up the drivers/pcmcia/sa1100_jornada.c file with respect to
formatting.  It also changes a build warning into a code comment (since
it's a pain to see it on every build, and I haven't seen any problems
with the driver in 3.5 years).

Signed-off-by: Kristoffer Ericson <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agopcmcia: switch /proc/bus/pccard/drivers to seq_file
Alexey Dobriyan [Tue, 22 Sep 2009 00:03:55 +0000 (17:03 -0700)]
pcmcia: switch /proc/bus/pccard/drivers to seq_file

Signed-off-by: Alexey Dobriyan <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agopcmcia: fix read buffer overflow
Roel Kluin [Tue, 22 Sep 2009 00:03:54 +0000 (17:03 -0700)]
pcmcia: fix read buffer overflow

If count > 0, dev->rlen == dev->rpos and dev->proto == 0, then we read
and write dev->rbuf[-1].
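One way the described condition can produce index -1 is when the last byte of a chunk is addressed as `n - 1` and `n` (the bytes left to read) is zero. The following is a hypothetical model, not the driver's code: only the field names rlen, rpos, proto and rbuf come from the text above, and the guard shows the kind of check that prevents the out-of-bounds access.

```c
#include <stdbool.h>
#include <stddef.h>

struct reader {
    size_t rlen;              /* bytes in rbuf */
    size_t rpos;              /* bytes already consumed (rpos <= rlen) */
    int proto;                /* 0 selects the path that touches the last byte */
    unsigned char rbuf[64];
};

/* Model: a proto-0 read of `count` bytes touches rbuf[n - 1], where n is
 * the number of bytes actually available.  Returns false when no byte may
 * be touched; without the n == 0 check, n - 1 would mean rbuf[-1]. */
static bool last_touched_index(const struct reader *r, size_t count,
                               size_t *idx)
{
    size_t avail = r->rlen - r->rpos;
    size_t n = count < avail ? count : avail;

    if (r->proto != 0 || n == 0)
        return false;
    *idx = n - 1;
    return true;
}
```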

Signed-off-by: Roel Kluin <[email protected]>
Cc: Harald Welte <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agopcmcia: yenta: add missing __devexit marking
Mike Frysinger [Tue, 22 Sep 2009 00:03:53 +0000 (17:03 -0700)]
pcmcia: yenta: add missing __devexit marking

The remove member of the pci_driver yenta_cardbus_driver uses
__devexit_p(), so the remove function itself should be marked with
__devexit.  This matters all the more since the probe function is marked
with __devinit.

Signed-off-by: Mike Frysinger <[email protected]>
Cc: Daniel Ritz <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agomm: reduce atomic use on use_mm fast path
Michael S. Tsirkin [Tue, 22 Sep 2009 00:03:52 +0000 (17:03 -0700)]
mm: reduce atomic use on use_mm fast path

When the mm being switched to matches the active mm, we don't need to
increment and then drop the mm count.  In a simple benchmark this happens
about 50% of the time.  Making that conditional reduces contention on that
cacheline on SMP systems.

Acked-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agomm: move use_mm/unuse_mm from aio.c to mm/
Michael S. Tsirkin [Tue, 22 Sep 2009 00:03:51 +0000 (17:03 -0700)]
mm: move use_mm/unuse_mm from aio.c to mm/

Anyone who wants to copy to/from user space from a kernel thread needs
use_mm (like what fs/aio has).  Move that into mm/, to make reusing and
exporting easier down the line, and make aio use it.  The next intended
user, besides aio, will be vhost-net.

Acked-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agoshmem: initialize struct shmem_sb_info to zero
Pekka Enberg [Tue, 22 Sep 2009 00:03:50 +0000 (17:03 -0700)]
shmem: initialize struct shmem_sb_info to zero

Fixes the following kmemcheck false positive (the compiler is using
a 32-bit mov to load the 16-bit sbinfo->mode in shmem_fill_super):

[    0.337000] Total of 1 processors activated (3088.38 BogoMIPS).
[    0.352000] CPU0 attaching NULL sched-domain.
[    0.360000] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (9f8020fc)
[    0.361000] a44240820000000041f6998100000000000000000000000000000000ff030000
[    0.368000]  i i i i i i i i i i i i i i i i u u u u i i i i i i i i i i u u
[    0.375000]                                                          ^
[    0.376000]
[    0.377000] Pid: 9, comm: khelper Not tainted (2.6.31-tip #206) P4DC6
[    0.378000] EIP: 0060:[<810a3a95>] EFLAGS: 00010246 CPU: 0
[    0.379000] EIP is at shmem_fill_super+0xb5/0x120
[    0.380000] EAX: 00000000 EBX: 9f845400 ECX: 824042a4 EDX: 8199f641
[    0.381000] ESI: 9f8020c0 EDI: 9f845400 EBP: 9f81af68 ESP: 81cd6eec
[    0.382000]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[    0.383000] CR0: 8005003b CR2: 9f806200 CR3: 01ccd000 CR4: 000006d0
[    0.384000] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    0.385000] DR6: ffff4ff0 DR7: 00000400
[    0.386000]  [<810c25fc>] get_sb_nodev+0x3c/0x80
[    0.388000]  [<810a3514>] shmem_get_sb+0x14/0x20
[    0.390000]  [<810c207f>] vfs_kern_mount+0x4f/0x120
[    0.392000]  [<81b2849e>] init_tmpfs+0x7e/0xb0
[    0.394000]  [<81b11597>] do_basic_setup+0x17/0x30
[    0.396000]  [<81b11907>] kernel_init+0x57/0xa0
[    0.398000]  [<810039b7>] kernel_thread_helper+0x7/0x10
[    0.400000]  [<ffffffff>] 0xffffffff
[    0.402000] khelper used greatest stack depth: 2820 bytes left
[    0.407000] calling  init_mmap_min_addr+0x0/0x10 @ 1
[    0.408000] initcall init_mmap_min_addr+0x0/0x10 returned 0 after 0 usecs

Reported-by: Ingo Molnar <[email protected]>
Analysed-by: Vegard Nossum <[email protected]>
Signed-off-by: Pekka Enberg <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agomm: remove duplicate asm/mman.h files
Arnd Bergmann [Tue, 22 Sep 2009 00:03:48 +0000 (17:03 -0700)]
mm: remove duplicate asm/mman.h files

A number of architectures have identical asm/mman.h files so they can all
be merged by using the new generic file.

The remaining asm/mman.h files are substantially different from each
other.

Signed-off-by: Arnd Bergmann <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agohugetlb: add MAP_HUGETLB example
Eric B Munson [Tue, 22 Sep 2009 00:03:48 +0000 (17:03 -0700)]
hugetlb: add MAP_HUGETLB example

Add an example of how to use the MAP_HUGETLB flag to the vm documentation
directory and a reference to the example in hugetlbpage.txt.

Signed-off-by: Eric B Munson <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Adam Litke <[email protected]>
Cc: David Gibson <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agohugetlb: add MAP_HUGETLB for mmaping pseudo-anonymous huge page regions
Eric B Munson [Tue, 22 Sep 2009 00:03:47 +0000 (17:03 -0700)]
hugetlb: add MAP_HUGETLB for mmaping pseudo-anonymous huge page regions

Add a flag for mmap that will be used to request a huge page region that
will look like anonymous memory to userspace.  This is accomplished by
using a file on the internal vfsmount.  MAP_HUGETLB is a modifier of
MAP_ANONYMOUS and so must be specified with it.  The region will behave
the same as a MAP_ANONYMOUS region using small pages.
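A minimal userspace sketch of requesting such a region, assuming Linux with glibc. Two values are assumptions: the 2 MiB length (the huge page size on x86-64; the length should be a multiple of the system's huge page size) and the fallback MAP_HUGETLB value 0x40000 used only if the headers lack the flag. The call fails (for example with ENOMEM) if no huge pages are reserved via /proc/sys/vm/nr_hugepages, so the sketch reports errors rather than assuming success. `try_hugetlb_mapping()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000   /* assumed asm-generic value for old headers */
#endif

#define HUGE_LEN (2UL * 1024 * 1024)   /* assumed 2 MiB huge page size */

/* Map, touch and unmap a pseudo-anonymous huge-page region.
 * Returns 0 on success or -errno if huge pages were not available. */
static int try_hugetlb_mapping(void)
{
    /* MAP_HUGETLB modifies MAP_ANONYMOUS; it is not used on its own. */
    void *p = mmap(NULL, HUGE_LEN, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED)
        return -errno;

    memset(p, 0xab, HUGE_LEN);   /* used like ordinary anonymous memory */
    munmap(p, HUGE_LEN);
    return 0;
}
```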

[[email protected]: fix arch definitions of MAP_HUGETLB]
Signed-off-by: Eric B Munson <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Adam Litke <[email protected]>
Cc: David Gibson <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agomm: add MAP_HUGETLB for mmaping pseudo-anonymous huge page regions
Arnd Bergmann [Tue, 22 Sep 2009 00:03:45 +0000 (17:03 -0700)]
mm: add MAP_HUGETLB for mmaping pseudo-anonymous huge page regions

Add a flag for mmap that will be used to request a huge page region that
will look like anonymous memory to user space.  This is accomplished by
using a file on the internal vfsmount.  MAP_HUGETLB is a modifier of
MAP_ANONYMOUS and so must be specified with it.  The region will behave
the same as a MAP_ANONYMOUS region using small pages.

The patch also adds the MAP_STACK flag, which was previously defined only
on some architectures but not on others.  Since MAP_STACK is meant to be a
hint only, architectures can define it without assigning a specific
meaning to it.

Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Eric B Munson <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agohugetlbfs: allow the creation of files suitable for MAP_PRIVATE on the vfs internal...
Eric B Munson [Tue, 22 Sep 2009 00:03:43 +0000 (17:03 -0700)]
hugetlbfs: allow the creation of files suitable for MAP_PRIVATE on the vfs internal mount

This patchset adds a flag to mmap that allows the user to request that an
anonymous mapping be backed with huge pages.  This mapping will borrow
functionality from the huge page shm code to create a file on the kernel
internal mount and use it to approximate an anonymous mapping.  The
MAP_HUGETLB flag is a modifier to MAP_ANONYMOUS and will not work without
both flags being present.

A new flag is necessary because there is no other way to hook into huge
pages without creating a file on a hugetlbfs mount which wouldn't be
MAP_ANONYMOUS.

To userspace, this mapping will behave just like an anonymous mapping
because the file is not accessible outside of the kernel.

This patchset is meant to simplify the programming model.  Presently there
is a large chunk of boilerplate code, contained in libhugetlbfs, required
to create private, hugepage backed mappings.  This patch set would allow
use of hugepages without linking to libhugetlbfs or having hugetlbfs
mounted.

Unification of the VM code would provide these same benefits, but it has
been resisted each time that it has been suggested for several reasons: it
would break PAGE_SIZE assumptions across the kernel, it makes page-table
abstractions really expensive, and it would incur fast path penalties on
architectures that do not support huge pages without providing them any
benefit.

This patch:

There are two means of creating mappings backed by huge pages:

        1. mmap() a file created on hugetlbfs
        2. Use shm, which creates a file on an internal mount and essentially
           maps it MAP_SHARED

The internal mount is only used for shared mappings but there is very
little that stops it being used for private mappings. This patch extends
hugetlbfs_file_setup() to deal with the creation of files that will be
mapped MAP_PRIVATE on the internal hugetlbfs mount. This extended API is
used in a subsequent patch to implement the MAP_HUGETLB mmap() flag.

Signed-off-by: Eric Munson <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Adam Litke <[email protected]>
Cc: David Gibson <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agommap: save some cycles for the shared anonymous mapping
Huang Shijie [Tue, 22 Sep 2009 00:03:41 +0000 (17:03 -0700)]
mmap: save some cycles for the shared anonymous mapping

shmem_zero_setup() does not change vm_start, pgoff or vm_flags; only some
drivers change them (such as drivers/video/bfin-t350mcqb-fb.c).

Move this code to a more appropriate place to save cycles for shared
anonymous mappings.

Signed-off-by: Huang Shijie <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agommap: avoid unnecessary anon_vma lock acquisition in vma_adjust()
Lee Schermerhorn [Tue, 22 Sep 2009 00:03:40 +0000 (17:03 -0700)]
mmap: avoid unnecessary anon_vma lock acquisition in vma_adjust()

We noticed very erratic behavior [throughput] with the AIM7 shared
workload running on recent distro [SLES11] and mainline kernels on an
8-socket, 32-core, 256GB x86_64 platform.  On the SLES11 kernel
[2.6.27.19+] with Barcelona processors, as we increased the load [10s of
thousands of tasks], the throughput would vary between two "plateaus"--one
at ~65K jobs per minute and one at ~130K jpm.  The simple patch below
causes the results to smooth out at the ~130K plateau.

But wait, there's more:

We do not see this behavior on smaller platforms--e.g., 4 socket/8 core.
This could be the result of the larger number of cpus on the larger
platform--a scalability issue--or it could be the result of the larger
number of interconnect "hops" between some nodes in this platform and how
the tasks for a given load end up distributed over the nodes' cpus and
memories--a stochastic NUMA effect.

The variability in the results is less pronounced [on the same platform]
with Shanghai processors and with mainline kernels.  With 2.6.31-rc6 on
Shanghai processors and 288 file systems on 288 fibre attached storage
volumes, the curves [jpm vs load] are both quite flat with the patched
kernel consistently producing ~3.9% better throughput [~80K jpm vs ~77K
jpm] than the unpatched kernel.

Profiling indicated that the "slow" runs were incurring high[er]
contention on an anon_vma lock in vma_adjust(), apparently called from the
sbrk() system call.

The patch:

A comment in mm/mmap.c:vma_adjust() suggests that we don't really need the
anon_vma lock when we're only adjusting the end of a vma, as is the case
for brk().  The comment questions whether it's worthwhile to optimize for
this case.  Apparently, on the newer, larger x86_64 platforms, with
interesting NUMA topologies, it is worthwhile--especially considering
that the patch [if correct!] is quite simple.

We can detect this condition--no overlap with next vma--by noting a NULL
"importer".  The anon_vma pointer will also be NULL in this case, so
simply avoid loading vma->anon_vma to avoid the lock.

However, we DO need to take the anon_vma lock when we're inserting a vma
['insert' non-NULL] even when we have no overlap [NULL "importer"], so we
need to check for 'insert', as well.  And Hugh points out that we should
also take it when adjusting vm_start (so that rmap.c can rely upon
vma_address() while it holds the anon_vma lock).
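The resulting rule ("skip the anon_vma lock only when just vm_end changes") can be written as a small predicate. `struct adjust_ctx` and `needs_anon_vma_lock()` are a hypothetical, simplified view of what vma_adjust() knows, not kernel types.

```c
#include <stdbool.h>

struct adjust_ctx {
    bool has_anon_vma;               /* vma->anon_vma != NULL */
    bool has_importer;               /* overlap with the next vma */
    bool has_insert;                 /* a vma is being inserted */
    unsigned long old_start, new_start;
};

/* Lock is needed for an overlap, an insert, or a change of vm_start;
 * adjusting only vm_end (as brk() does) can skip it. */
static bool needs_anon_vma_lock(const struct adjust_ctx *c)
{
    if (!c->has_anon_vma)
        return false;
    return c->has_importer || c->has_insert ||
           c->new_start != c->old_start;
}
```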

akpm: Zhang Yanmin reports a 150% throughput improvement with aim7, so it
might be -stable material even though this isn't a regression: "this
issue is not clear on dual socket Nehalem machine (2*4*2 cpu), but is
severe on large machine (4*8*2 cpu)"

[[email protected]: test vma start too]
Signed-off-by: Lee Schermerhorn <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Eric Whitney <[email protected]>
Tested-by: "Zhang, Yanmin" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agotmpfs: depend on shmem
Hugh Dickins [Tue, 22 Sep 2009 00:03:37 +0000 (17:03 -0700)]
tmpfs: depend on shmem

CONFIG_SHMEM off gives you (ramfs masquerading as) tmpfs, even when
CONFIG_TMPFS is off: that's a little anomalous, and I'd intended to make
more sense of it by removing CONFIG_TMPFS altogether, always enabling its
code when CONFIG_SHMEM; but so many defconfigs have CONFIG_SHMEM on and
CONFIG_TMPFS off that we'd better leave that as is.

But there is no point in asking for CONFIG_TMPFS if CONFIG_SHMEM is off:
make TMPFS depend on SHMEM, which also prevents TMPFS_POSIX_ACL
shmem_acl.o being pointlessly built into the kernel when SHMEM is off.

And a selfish change, to prevent the world from being rebuilt when I
switch between CONFIG_SHMEM on and off: the only CONFIG_SHMEM in the
header files is mm.h shmem_lock() - give that a shmem.c stub instead.

Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Matt Mackall <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agommap: remove unnecessary code
Huang Shijie [Tue, 22 Sep 2009 00:03:36 +0000 (17:03 -0700)]
mmap: remove unnecessary code

If (flags & MAP_LOCKED) is true, vm_flags already contains the VM_LOCKED
bit, which is set by calc_vm_flag_bits().

So there is no need to set it again; just remove that code.
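Why the removed line was a no-op can be shown with a small flag-translation sketch. The SK_* constants are hypothetical values chosen only to show the MAP_LOCKED to VM_LOCKED relationship, and `trans_bit()` only mimics the idea behind the kernel's bit-translation helper.

```c
/* Hypothetical flag values; only the translation matters here. */
#define SK_MAP_LOCKED 0x0001UL
#define SK_VM_LOCKED  0x0100UL

/* Map one bit from the mmap() flag space to the vm_flags space. */
static unsigned long trans_bit(unsigned long flags, unsigned long from,
                               unsigned long to)
{
    return (flags & from) ? to : 0;
}

static unsigned long calc_vm_flag_bits_sketch(unsigned long map_flags)
{
    return trans_bit(map_flags, SK_MAP_LOCKED, SK_VM_LOCKED);
}

static unsigned long mmap_vm_flags(unsigned long map_flags)
{
    unsigned long vm_flags = calc_vm_flag_bits_sketch(map_flags);
    /* A later "if (map_flags & MAP_LOCKED) vm_flags |= VM_LOCKED;"
     * would change nothing: the bit is already set above. */
    return vm_flags;
}
```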

Signed-off-by: Huang Shijie <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agomm: move highest_memmap_pfn
Hugh Dickins [Tue, 22 Sep 2009 00:03:35 +0000 (17:03 -0700)]
mm: move highest_memmap_pfn

Move highest_memmap_pfn __read_mostly from page_alloc.c next to zero_pfn
__read_mostly in memory.c: to help them share a cacheline, since they're
very often tested together in vm_normal_page().

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agomm: ZERO_PAGE without PTE_SPECIAL
Hugh Dickins [Tue, 22 Sep 2009 00:03:34 +0000 (17:03 -0700)]
mm: ZERO_PAGE without PTE_SPECIAL

Reinstate anonymous use of ZERO_PAGE to all architectures, not just to
those which __HAVE_ARCH_PTE_SPECIAL: as suggested by Nick Piggin.

Contrary to how I'd imagined it, there's nothing ugly about this, just a
zero_pfn test built into one or another block of vm_normal_page().

But the MIPS ZERO_PAGE-of-many-colours case demands is_zero_pfn() and
my_zero_pfn() inlines.  Reinstate its mremap move_pte() shuffling of
ZERO_PAGEs we did from 2.6.17 to 2.6.19?  Not unless someone shouts for
that: it would have to take vm_flags to weed out some cases.

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Rik van Riel <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Ralf Baechle <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
16 years agomm: hugetlbfs_pagecache_present
Hugh Dickins [Tue, 22 Sep 2009 00:03:33 +0000 (17:03 -0700)]
mm: hugetlbfs_pagecache_present

Rename hugetlbfs_backed() to hugetlbfs_pagecache_present()
and add more comments, as suggested by Mel Gorman.

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Hugh Dickins [Tue, 22 Sep 2009 00:03:32 +0000 (17:03 -0700)]
mm: m(un)lock avoid ZERO_PAGE

I'm still reluctant to clutter __get_user_pages() with another flag, just
to avoid touching ZERO_PAGE count in mlock(); though we can add that later
if it shows up as an issue in practice.

But when mlocking, we can test page->mapping slightly earlier, to avoid
the potentially bouncy rescheduling of lock_page on ZERO_PAGE - mlock
didn't lock_page in olden ZERO_PAGE days, so we might have regressed.

And when munlocking, it turns out that FOLL_DUMP coincidentally does
what's needed to avoid all updates to ZERO_PAGE, so use that here also.
Plus add comment suggested by KAMEZAWA Hiroyuki.

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Nick Piggin <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Hugh Dickins [Tue, 22 Sep 2009 00:03:31 +0000 (17:03 -0700)]
mm: FOLL flags for GUP flags

__get_user_pages() has been taking its own GUP flags, then processing
them into FOLL flags for follow_page().  Though oddly named, the FOLL
flags are more widely used, so pass them to __get_user_pages() now.
Sorry, VM flags, VM_FAULT flags and FAULT_FLAGs are still distinct.

(The patch to __get_user_pages() looks peculiar, with both gup_flags
and foll_flags: the gup_flags remain constant; but as before there's
an exceptional case, out of scope of the patch, in which foll_flags
per page have FOLL_WRITE masked off.)
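
A sketch of the gup_flags/foll_flags split, as we understand it: the caller's flags stay constant for the whole walk, and a per-page copy may have FOLL_WRITE cleared. The flag values, the function name and the `cow_already_broken` condition (our reading of the exceptional case, where a forced write fault has already broken COW on a mapping without VM_WRITE) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; treat the numbers as assumptions. */
#define FOLL_WRITE 0x01u
#define FOLL_TOUCH 0x02u
#define FOLL_GET   0x04u

/* Hypothetical per-page step: gup_flags is constant for the whole call,
 * while foll_flags is derived per page and may lose FOLL_WRITE. */
static unsigned int foll_flags_for_page(unsigned int gup_flags,
                                        bool cow_already_broken)
{
    unsigned int foll_flags = gup_flags;

    /* Exceptional case: a forced write fault has already broken COW on a
     * read-only mapping, so look the page up again without FOLL_WRITE. */
    if (cow_already_broken)
        foll_flags &= ~FOLL_WRITE;
    return foll_flags;
}
```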

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Hugh Dickins [Tue, 22 Sep 2009 00:03:30 +0000 (17:03 -0700)]
mm: reinstate ZERO_PAGE

KAMEZAWA Hiroyuki has observed customers of earlier kernels taking
advantage of the ZERO_PAGE: which we stopped do_anonymous_page() from
using in 2.6.24.  And there were a couple of regression reports on LKML.

Following suggestions from Linus, reinstate do_anonymous_page() use of
the ZERO_PAGE; but this time avoid dirtying its struct page cacheline
with (map)count updates - let vm_normal_page() regard it as abnormal.

Use it only on arches which define __HAVE_ARCH_PTE_SPECIAL (x86, s390, sh32,
most powerpc): that's not essential, but minimizes additional branches
(keeping them in the unlikely pte_special case); and incidentally
excludes mips (some models of which needed eight colours of ZERO_PAGE
to avoid costly exceptions).

Don't be fanatical about avoiding ZERO_PAGE updates: get_user_pages()
callers won't want to make exceptions for it, so increment its count
there.  Changes to mlock and migration?  Happily, none seem to be needed.

In most places it's quicker to check pfn than struct page address:
prepare a __read_mostly zero_pfn for that.  Does get_dump_page()
still need its ZERO_PAGE check?  Probably not, but keep it anyway.
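
A toy model of the reinstated behaviour, not kernel code: a read fault on anonymous memory maps the shared zero page read-only, a write fault allocates a private page, and a vm_normal_page()-style filter compares pfns against `zero_pfn` so the zero page's struct page is never touched. `struct toy_pte`, the pfn values and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: one pte per virtual page. */
struct toy_pte { unsigned long pfn; bool present, writable; };

static const unsigned long zero_pfn = 7;   /* assumed pfn of the zero page */
static unsigned long next_free_pfn = 100;  /* toy page allocator */

/* Anonymous fault in the spirit of do_anonymous_page(): a read fault maps
 * the shared zero page read-only (no allocation, no count update), while
 * a write fault allocates a private page. */
static void toy_anon_fault(struct toy_pte *pte, bool write)
{
    if (write) {
        pte->pfn = next_free_pfn++;
        pte->writable = true;
    } else {
        pte->pfn = zero_pfn;
        pte->writable = false;
    }
    pte->present = true;
}

/* vm_normal_page()-style filter that compares pfns, not struct page
 * addresses: the zero page is "abnormal", so callers get no page (0)
 * and never update its (map)count. */
static unsigned long toy_normal_pfn(const struct toy_pte *pte)
{
    if (!pte->present || pte->pfn == zero_pfn)
        return 0;
    return pte->pfn;
}
```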

Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Hugh Dickins [Tue, 22 Sep 2009 00:03:29 +0000 (17:03 -0700)]
mm: fix anonymous dirtying

do_anonymous_page() has been wrong to dirty the pte regardless.
If it's not going to mark the pte writable, then it won't help
to mark it dirty here, and clogs up memory with pages which will
need swap instead of being thrown away.  Especially wrong if no
overcommit is chosen, and this vma is not yet VM_ACCOUNTed -
we could exceed the limit and OOM despite no overcommit.
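
The change can be sketched with pte bits as plain flags. The bit values and helper names below are hypothetical, not any architecture's layout; the point is that the dirty bit now travels with the write bit instead of being set unconditionally.

```c
#include <assert.h>

/* Illustrative bit values, not a real architecture's pte layout. */
#define PTE_PRESENT 0x1u
#define PTE_WRITE   0x2u
#define PTE_DIRTY   0x4u
#define VM_WRITE    0x2u

/* Before: dirty regardless; writable only if the vma allows writes. */
static unsigned int anon_pte_before(unsigned int vm_flags)
{
    unsigned int entry = PTE_PRESENT | PTE_DIRTY;
    if (vm_flags & VM_WRITE)
        entry |= PTE_WRITE;
    return entry;
}

/* After: dirty only when the pte is also made writable, so a page in a
 * read-only vma stays clean and can be thrown away instead of swapped. */
static unsigned int anon_pte_after(unsigned int vm_flags)
{
    unsigned int entry = PTE_PRESENT;
    if (vm_flags & VM_WRITE)
        entry |= PTE_WRITE | PTE_DIRTY;
    return entry;
}
```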

Signed-off-by: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Hugh Dickins [Tue, 22 Sep 2009 00:03:27 +0000 (17:03 -0700)]
mm: follow_hugetlb_page flags

follow_hugetlb_page() shouldn't be guessing about the coredump case
either: pass the foll_flags down to it, instead of just the write bit.

Remove that obscure huge_zeropage_ok() test.  The decision is easy,
though unlike the non-huge case - here vm_ops->fault is always set.
But we know that a fault would serve up zeroes, unless there's
already a hugetlbfs pagecache page to back the range.

(Alternatively, since hugetlb pages aren't swapped out under pressure,
you could save more dump space by arguing that a page not yet faulted
into this process cannot be relevant to the dump; but that would be
more surprising.)

Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Hugh Dickins [Tue, 22 Sep 2009 00:03:26 +0000 (17:03 -0700)]
mm: FOLL_DUMP replace FOLL_ANON

The "FOLL_ANON optimization" and its use_zero_page() test have caused
confusion and bugs.  Why does it test VM_SHARED?  For the very good but
unsatisfying reason that VMware crashed without it.  As we look to maybe
reinstating anonymous use of the ZERO_PAGE, we need to sort this out.

Easily done: it's silly for __get_user_pages() and follow_page() to
be guessing whether it's safe to assume that they're being used for
a coredump (which can take a shortcut snapshot where other uses must
handle a fault) - just tell them with GUP_FLAGS_DUMP and FOLL_DUMP.

get_dump_page() doesn't even want a ZERO_PAGE: an error suits fine.
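
A sketch of the decision this enables when no page is mapped, assuming (as the follow_hugetlb_page() change implies for the non-huge case) that a vma without a `->fault` handler would only ever fault in zeroes. The flag value and names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

#define FOLL_DUMP 0x08u   /* illustrative value */

enum absent_action { FAULT_IN, LEAVE_HOLE };

/* Hypothetical follow_page()-style choice for an address with no page. */
static enum absent_action on_absent_page(unsigned int flags,
                                         bool vma_has_fault_handler)
{
    /* A coredump caller says so explicitly. If faulting could only yield
     * zeroes (no ->fault handler), report an error so the dumper leaves a
     * hole rather than allocating pages and page tables. */
    if ((flags & FOLL_DUMP) && !vma_has_fault_handler)
        return LEAVE_HOLE;
    return FAULT_IN;
}
```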

Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Hugh Dickins [Tue, 22 Sep 2009 00:03:25 +0000 (17:03 -0700)]
mm: add get_dump_page

In preparation for the next patch, add a simple get_dump_page(addr)
interface for the CONFIG_ELF_CORE dumpers to use, instead of calling
get_user_pages() directly.  They're not interested in errors: they
just want to use holes as much as possible, to save space and make
sure that the data is aligned where the headers said it would be.

Oh, and don't use that horrid DUMP_SEEK(off) macro!
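
How a dumper can use holes, sketched in user space with POSIX `lseek()` and `ftruncate()`: NULL entries stand for pages that a get_dump_page()-style helper declined to return. `dump_pages()` and its arguments are hypothetical, and it does not call get_dump_page() itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write n pages to fd; a NULL entry means "no page here". Holes are
 * skipped with lseek() so they need not be allocated on disk, and every
 * page still lands at offset i * page_size, where the headers put it. */
static int dump_pages(int fd, const char *const pages[], size_t n,
                      size_t page_size)
{
    off_t off = 0;

    for (size_t i = 0; i < n; i++) {
        if (pages[i]) {
            if (write(fd, pages[i], page_size) != (ssize_t)page_size)
                return -1;
        } else if (lseek(fd, (off_t)page_size, SEEK_CUR) == (off_t)-1) {
            return -1;
        }
        off += (off_t)page_size;
    }
    /* Seeking alone does not extend the file over a trailing hole. */
    return ftruncate(fd, off);
}
```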

Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Hugh Dickins [Tue, 22 Sep 2009 00:03:24 +0000 (17:03 -0700)]
mm: remove unused GUP flags

GUP_FLAGS_IGNORE_VMA_PERMISSIONS and GUP_FLAGS_IGNORE_SIGKILL were
flags added solely to prevent __get_user_pages() from doing some of
what it usually does, in the munlock case: we can now remove them.

Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Hugh Dickins [Tue, 22 Sep 2009 00:03:23 +0000 (17:03 -0700)]
mm: munlock use follow_page

Hiroaki Wakabayashi points out that when mlock() has been interrupted
by SIGKILL, the subsequent munlock() takes unnecessarily long because
its use of __get_user_pages() insists on faulting in all the pages
which mlock() never reached.

It's worse than slowness if mlock() is terminated by Out Of Memory kill:
the munlock_vma_pages_all() in exit_mmap() insists on faulting in all the
pages which mlock() could not find memory for; so innocent bystanders are
killed too, and perhaps the system hangs.

__get_user_pages() does a lot that's silly for munlock(): so remove the
munlock option from __mlock_vma_pages_range(), and use a simple loop of
follow_page()s in munlock_vma_pages_range() instead; ignoring absent
pages, and not marking present pages as accessed or dirty.
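
A toy version of the new munlock loop, assuming a flat array as the page table: absent slots are skipped rather than faulted in, and present pages only have their mlocked state cleared, never their accessed or dirty state. `struct toy_page` and both helpers are hypothetical stand-ins for follow_page() and munlock_vma_pages_range().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page; a NULL table slot means "never faulted in". */
struct toy_page { bool mlocked, accessed, dirty; };

/* Stand-in for follow_page(): a pure lookup that never faults and
 * never marks the page accessed or dirty. */
static struct toy_page *toy_follow_page(struct toy_page *table[], size_t i)
{
    return table[i];
}

/* munlock in the spirit of the patch: walk the range, ignore absent
 * pages, and clear only the mlocked state. Returns pages unlocked. */
static size_t toy_munlock_range(struct toy_page *table[], size_t start,
                                size_t end)
{
    size_t cleared = 0;

    for (size_t i = start; i < end; i++) {
        struct toy_page *page = toy_follow_page(table, i);
        if (!page)
            continue;           /* absent: nothing to unlock, no fault */
        if (page->mlocked) {
            page->mlocked = false;
            cleared++;
        }
    }
    return cleared;
}
```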

(Change munlock() to only go so far as mlock() reached?  That does not
work out, given the convention that mlock() claims complete success even
when it has to give up early - in part so that an underlying file can be
extended later, and those pages locked which earlier would give SIGBUS.)

Signed-off-by: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Mel Gorman <[email protected]>
Reviewed-by: Hiroaki Wakabayashi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Minchan Kim [Tue, 22 Sep 2009 00:03:21 +0000 (17:03 -0700)]
mm: fix NUMA accounting in numastat.txt

Documentation/numastat.txt confused me.  For example, suppose the
system has two nodes, 0 and 1:

barrios:~$ cat /proc/zoneinfo | egrep 'numa|zone'
Node 0, zone DMA
    numa_hit     33226
    numa_miss    1739
    numa_foreign 27978
..
..
Node 1, zone DMA
    numa_hit     307
    numa_miss    46900
    numa_foreign 0

1) On node 0, NUMA_MISS counts pages that were intended for node 1
but were allocated from node 0.

2) On node 0, NUMA_FOREIGN counts pages that were intended for node 0
but were allocated from node 1.

But numastat.txt currently describes MISS and FOREIGN the other way
round.  Fix it to describe them from the zone's point of view.
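
The corrected meaning can be written as a small accounting routine. This sketch keeps per-node counters for brevity (the kernel keeps them per zone) and only mirrors the description above; the names are hypothetical, though the logic is roughly what we understand zone_statistics() to do.

```c
#include <assert.h>

#define NR_NODES 2

struct numa_counters { unsigned long hit, miss, foreign; };
static struct numa_counters node_stat[NR_NODES];

/* Account one allocation that preferred node `want` but was satisfied
 * from node `got`, from the allocating zone's point of view. */
static void account_alloc(int want, int got)
{
    if (want == got) {
        node_stat[got].hit++;
    } else {
        node_stat[got].miss++;      /* served a page meant for another node */
        node_stat[want].foreign++;  /* its own request went elsewhere */
    }
}
```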

Signed-off-by: Minchan Kim <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Mel Gorman [Tue, 22 Sep 2009 00:03:20 +0000 (17:03 -0700)]
page-allocator: maintain rolling count of pages to free from the PCP

When round-robin freeing pages from the PCP lists, empty lists may be
encountered.  In the event one of the lists has more pages than another,
there may be numerous checks for list_empty() which is undesirable.  This
patch maintains a count of pages to free which is incremented when empty
lists are encountered.  The intention is that more pages will then be
freed from fuller lists than the empty ones reducing the number of empty
list checks in the free path.
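
A user-space model of the round-robin free loop with the rolling count, based on our reading of the patch; `struct toy_pcp`, the list lengths and the probe counter are illustrative. Each empty list skipped raises `batch_free`, so the next non-empty list releases that many pages in one visit.

```c
#include <assert.h>

#define MIGRATE_PCPTYPES 3   /* assumed: unmovable, reclaimable, movable */

/* Toy PCP: only the number of pages on each per-type list. */
struct toy_pcp { int len[MIGRATE_PCPTYPES]; };

/* Free `count` pages round-robin across the lists and return how many
 * list-emptiness probes the list-selection loop made.
 * Assumes count <= total pages queued, or the loop would never end. */
static int toy_free_bulk(struct toy_pcp *pcp, int count)
{
    int migratetype = 0, batch_free = 0, probes = 0;

    while (count) {
        /* Move to the next non-empty list; each skip grows the batch. */
        do {
            batch_free++;
            if (++migratetype == MIGRATE_PCPTYPES)
                migratetype = 0;
            probes++;
        } while (pcp->len[migratetype] == 0);

        /* Free up to batch_free pages from this list. */
        do {
            pcp->len[migratetype]--;
        } while (--count && --batch_free && pcp->len[migratetype]);
    }
    return probes;
}
```

With all six pages on the last list, this makes 8 probes; freeing one page per visit would take 17 (2 for the first page, then 3 for each of the other five).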

[[email protected]: coding-style fixes]
Signed-off-by: Mel Gorman <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Christoph Lameter <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Cc: Pekka Enberg <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Mel Gorman [Tue, 22 Sep 2009 00:03:19 +0000 (17:03 -0700)]
page-allocator: split per-cpu list into one-list-per-migrate-type

The following two patches remove searching in the page allocator fast-path
by maintaining multiple free-lists in the per-cpu structure.  At the time
the search was introduced, increasing the per-cpu structures would waste a
lot of memory as per-cpu structures were statically allocated at
compile-time.  This is no longer the case.

The patches are as follows. They are based on mmotm-2009-08-27.

Patch 1 adds multiple lists to struct per_cpu_pages, one per
migratetype that can be stored on the PCP lists.

Patch 2 notes that the pcpu drain path checks empty lists multiple times.
The patch reduces the number of checks by maintaining a count of empty
lists encountered.  Lists containing pages will then free multiple
pages in a batch.

The patches were tested with kernbench, netperf udp/tcp, hackbench and
sysbench.  The netperf tests were not bound to any CPU in particular and
were run enough times to give 99% confidence that the reported
results are within 1% of the estimated mean.  sysbench was run with a
postgres background and read-only tests.  Similar to netperf, it was run
multiple times to give 99% confidence that the results are within 1%.  The
patches were tested on x86, x86-64 and ppc64 as follows:

x86: Intel Pentium D 3GHz with 8G RAM (no-brand machine)
kernbench - No significant difference, variance well within noise
netperf-udp - 1.34% to 2.28% gain
netperf-tcp - 0.45% to 1.22% gain
hackbench - Small variances, very close to noise
sysbench - Very small gains

x86-64: AMD Phenom 9950 1.3GHz with 8G RAM (no-brand machine)
kernbench - No significant difference, variance well within noise
netperf-udp - 1.83% to 10.42% gains
netperf-tcp - Not conclusive until buffer >= PAGE_SIZE
  4096  +15.83%
  8192  +0.34% (not significant)
  16384 +1%
hackbench - Small gains, very close to noise
sysbench - 0.79% to 1.6% gain

ppc64: PPC970MP 2.5GHz with 10GB RAM (it's a terrasoft powerstation)
kernbench - No significant difference, variance well within noise
netperf-udp - 2-3% gain for almost all buffer sizes tested
netperf-tcp - losses on small buffers, gains on larger buffers
  possibly indicates some bad caching effect.
hackbench - No significant difference
sysbench - 2-4% gain

This patch:

Currently the per-cpu page allocator searches the PCP list for pages of
the correct migrate-type to reduce the possibility of pages being
inappropriately placed from a fragmentation perspective.  This search is
potentially expensive in a fast-path and undesirable.  Splitting the
per-cpu list into multiple lists increases the size of a per-cpu structure
and this was potentially a major problem at the time the search was
introduced.  This problem has been mitigated, as now only the necessary
number of structures is allocated for the running system.

This patch replaces a list search in the per-cpu allocator with one list
per migrate type.  The potential snag with this approach is when bulk
freeing pages.  We round-robin free pages based on migrate type which has
little bearing on the cache hotness of the page and potentially checks
empty lists repeatedly in the event the majority of PCP pages are of one
type.
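
A sketch contrasting the two fast paths, with lists reduced to arrays and counts: the old layout scans one mixed list for a page of the wanted type, while the new one indexes straight into a per-type list. `MIGRATE_PCPTYPES` is set to 3 here as an assumption, and the struct and function names are hypothetical.

```c
#include <assert.h>

#define MIGRATE_PCPTYPES 3   /* assumed number of PCP migrate types */
#define MAX_PAGES 16

/* Old layout (sketch): one list of pages tagged with a migratetype. */
struct old_pcp { int type[MAX_PAGES]; int count; };

/* New layout (sketch): one list per migratetype, here just a length. */
struct new_pcp { int count[MIGRATE_PCPTYPES]; };

/* Old fast path: scan for a page of the wanted type, O(n) worst case.
 * Returns its index or -1; *scanned reports how many entries were read. */
static int old_find(const struct old_pcp *pcp, int mt, int *scanned)
{
    *scanned = 0;
    for (int i = 0; i < pcp->count; i++) {
        (*scanned)++;
        if (pcp->type[i] == mt)
            return i;
    }
    return -1;
}

/* New fast path: go straight to the right list, O(1). */
static int new_alloc(struct new_pcp *pcp, int mt)
{
    if (pcp->count[mt] == 0)
        return -1;   /* a real allocator would refill from the buddy lists */
    pcp->count[mt]--;
    return 0;
}
```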

Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Nick Piggin <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Pekka Enberg <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:03:16 +0000 (17:03 -0700)]
oom: fix oom_adjust_write() input sanity check

Andrew Morton pointed out that oom_adjust_write() has very strange EIO
and newline handling.  This patch fixes it.
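
A user-space sketch of sane input handling for such a file: strip surrounding whitespace (including the newline `echo` appends), require the whole remaining string to be a number, and accept only OOM_DISABLE or a value in OOM_ADJUST_MIN..OOM_ADJUST_MAX. The constants are the values we believe the kernel used at the time (-17, -16 and 15); the buffer size, the `-EINVAL` error choice and `parse_oom_adj()` itself are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define OOM_DISABLE    (-17)
#define OOM_ADJUST_MIN (-16)
#define OOM_ADJUST_MAX 15

/* Parse a write of `count` bytes from buf. Returns 0 and stores the value
 * in *out, or -EINVAL for junk, empty or out-of-range input. */
static int parse_oom_adj(const char *buf, size_t count, int *out)
{
    char tmp[16];
    char *start, *end;
    long val;

    if (count > sizeof(tmp) - 1)
        count = sizeof(tmp) - 1;
    memcpy(tmp, buf, count);
    tmp[count] = '\0';

    /* Trim trailing, then leading, whitespace. */
    end = tmp + strlen(tmp);
    while (end > tmp && isspace((unsigned char)end[-1]))
        *--end = '\0';
    start = tmp;
    while (isspace((unsigned char)*start))
        start++;
    if (*start == '\0')
        return -EINVAL;              /* this sketch rejects empty writes */

    val = strtol(start, &end, 0);
    if (*end != '\0')
        return -EINVAL;              /* trailing junk such as "3x" */
    if ((val < OOM_ADJUST_MIN || val > OOM_ADJUST_MAX) && val != OOM_DISABLE)
        return -EINVAL;
    *out = (int)val;
    return 0;
}
```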

Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:03:15 +0000 (17:03 -0700)]
oom: oom_kill doesn't kill vfork parent (or child)

The current oom_kill kills not only the victim process but also all
tasks that share the same mm.  This means a vfork parent will be killed.

This is definitely incorrect.  Another process has its own oom_adj, and
we shouldn't ignore it (it might be OOM_DISABLE).

The following caller hits this minefield:

===============================
        switch (constraint) {
        case CONSTRAINT_MEMORY_POLICY:
                oom_kill_process(current, gfp_mask, order, 0, NULL,
                                "No available memory (MPOL_BIND)");
                break;

Note: force_sig(SIGKILL) sends SIGKILL to all threads in the process,
so we don't need to care about multiple threads here.

Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
KOSAKI Motohiro [Tue, 22 Sep 2009 00:03:14 +0000 (17:03 -0700)]
oom: make oom_score to per-process value

The OOM killer kills a process, not a task, so oom_score should be
calculated per-process too.  This makes it more consistent and speeds
up select_bad_process().

Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>