openwrt/staging/blogic.git
11 years agomissing bits of "splice: fix racy pipe->buffers uses"
Al Viro [Fri, 11 Apr 2014 16:01:03 +0000 (12:01 -0400)]
missing bits of "splice: fix racy pipe->buffers uses"

that commit has fixed only the parts of that mess in fs/splice.c itself;
there had been more in several other ->splice_read() instances...

Signed-off-by: Al Viro <[email protected]>
11 years agocifs: fix the race in cifs_writev()
Al Viro [Thu, 3 Apr 2014 14:27:17 +0000 (10:27 -0400)]
cifs: fix the race in cifs_writev()

O_APPEND handling there hadn't been completely fixed by Pavel's
patch; it checks the right value, but it's racy - we can't really
do that until i_mutex has been taken.

Fix by switching to __generic_file_aio_write() (open-coding
generic_file_aio_write(), actually) and pulling mutex_lock() above
inode_size_read().

Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
11 years agoceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
Al Viro [Fri, 4 Apr 2014 02:44:19 +0000 (22:44 -0400)]
ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure

ceph_osdc_put_request(ERR_PTR(-error)) oopses.  What we want there
is break, not goto out.

Signed-off-by: Al Viro <[email protected]>
11 years agokill generic_file_buffered_write()
Al Viro [Wed, 12 Feb 2014 03:36:48 +0000 (22:36 -0500)]
kill generic_file_buffered_write()

Signed-off-by: Al Viro <[email protected]>
11 years agoocfs2_file_aio_write(): switch to generic_perform_write()
Al Viro [Wed, 12 Feb 2014 03:34:52 +0000 (22:34 -0500)]
ocfs2_file_aio_write(): switch to generic_perform_write()

Signed-off-by: Al Viro <[email protected]>
11 years agoceph_aio_write(): switch to generic_perform_write()
Al Viro [Wed, 12 Feb 2014 03:28:43 +0000 (22:28 -0500)]
ceph_aio_write(): switch to generic_perform_write()

Signed-off-by: Al Viro <[email protected]>
11 years agoxfs_file_buffered_aio_write(): switch to generic_perform_write()
Al Viro [Wed, 12 Feb 2014 03:25:22 +0000 (22:25 -0500)]
xfs_file_buffered_aio_write(): switch to generic_perform_write()

Signed-off-by: Al Viro <[email protected]>
11 years agoexport generic_perform_write(), start getting rid of generic_file_buffer_write()
Al Viro [Wed, 12 Feb 2014 02:34:08 +0000 (21:34 -0500)]
export generic_perform_write(), start getting rid of generic_file_buffer_write()

Signed-off-by: Al Viro <[email protected]>
11 years agogeneric_file_direct_write(): get rid of ppos argument
Al Viro [Wed, 12 Feb 2014 01:58:20 +0000 (20:58 -0500)]
generic_file_direct_write(): get rid of ppos argument

always equal to &iocb->ki_pos.

Signed-off-by: Al Viro <[email protected]>
11 years agobtrfs_file_aio_write(): get rid of ppos
Al Viro [Wed, 12 Feb 2014 00:31:06 +0000 (19:31 -0500)]
btrfs_file_aio_write(): get rid of ppos

Signed-off-by: Al Viro <[email protected]>
11 years agokill the 5th argument of generic_file_buffered_write()
Al Viro [Sun, 9 Feb 2014 18:37:49 +0000 (13:37 -0500)]
kill the 5th argument of generic_file_buffered_write()

same story - it's &iocb->ki_pos in all cases

Signed-off-by: Al Viro <[email protected]>
11 years agokill the 4th argument of __generic_file_aio_write()
Al Viro [Sun, 9 Feb 2014 17:58:52 +0000 (12:58 -0500)]
kill the 4th argument of __generic_file_aio_write()

It's always equal to &iocb->ki_pos, where iocb is the value of the 1st
argument.

Signed-off-by: Al Viro <[email protected]>
11 years agolustre: don't open-code kernel_recvmsg()
Al Viro [Sun, 9 Feb 2014 17:07:46 +0000 (12:07 -0500)]
lustre: don't open-code kernel_recvmsg()

Signed-off-by: Al Viro <[email protected]>
11 years agoocfs2: don't open-code kernel_recvmsg()
Al Viro [Sun, 9 Feb 2014 02:09:07 +0000 (21:09 -0500)]
ocfs2: don't open-code kernel_recvmsg()

Signed-off-by: Al Viro <[email protected]>
11 years agodrbd: don't open-code kernel_recvmsg()
Al Viro [Sun, 9 Feb 2014 02:07:38 +0000 (21:07 -0500)]
drbd: don't open-code kernel_recvmsg()

Signed-off-by: Al Viro <[email protected]>
11 years agoconstify blk_rq_map_user_iov() and friends
Al Viro [Sun, 9 Feb 2014 01:42:52 +0000 (20:42 -0500)]
constify blk_rq_map_user_iov() and friends

sg_iovec array passed to it can be const

Signed-off-by: Al Viro <[email protected]>
11 years agolustre: switch to kernel_sendmsg()
Al Viro [Sat, 8 Feb 2014 18:44:43 +0000 (13:44 -0500)]
lustre: switch to kernel_sendmsg()

(casts are due to misannotations in lustre; it uses iovec where kvec would be
correct type; too much noise to properly annotate right now).

Signed-off-by: Al Viro <[email protected]>
11 years agoocfs2: don't open-code kernel_sendmsg()
Al Viro [Sat, 8 Feb 2014 18:27:03 +0000 (13:27 -0500)]
ocfs2: don't open-code kernel_sendmsg()

Signed-off-by: Al Viro <[email protected]>
11 years agotake iov_iter stuff to mm/iov_iter.c
Al Viro [Thu, 6 Feb 2014 00:11:33 +0000 (19:11 -0500)]
take iov_iter stuff to mm/iov_iter.c

Signed-off-by: Al Viro <[email protected]>
11 years agoprocess_vm_access: tidy up a bit
Al Viro [Wed, 5 Feb 2014 18:25:32 +0000 (13:25 -0500)]
process_vm_access: tidy up a bit

saner variable names, update linuxdoc comments, etc.

Signed-off-by: Al Viro <[email protected]>
11 years agoprocess_vm_access: don't bother with returning the amounts of bytes copied
Al Viro [Wed, 5 Feb 2014 18:15:28 +0000 (13:15 -0500)]
process_vm_access: don't bother with returning the amounts of bytes copied

we can calculate that in the caller just fine, TYVM

Signed-off-by: Al Viro <[email protected]>
11 years agoprocess_vm_rw_pages(): pass accurate amount of bytes
Al Viro [Wed, 5 Feb 2014 17:55:11 +0000 (12:55 -0500)]
process_vm_rw_pages(): pass accurate amount of bytes

... makes passing the amount of pages unnecessary

Signed-off-by: Al Viro <[email protected]>
11 years agoprocess_vm_access: take get_user_pages/put_pages one level up
Al Viro [Wed, 5 Feb 2014 17:44:24 +0000 (12:44 -0500)]
process_vm_access: take get_user_pages/put_pages one level up

... and trim the fuck out of process_vm_rw_pages() argument list.

Signed-off-by: Al Viro <[email protected]>
11 years agoprocess_vm_access: switch to copy_page_to_iter/iov_iter_copy_from_user
Al Viro [Wed, 5 Feb 2014 17:14:11 +0000 (12:14 -0500)]
process_vm_access: switch to copy_page_to_iter/iov_iter_copy_from_user

... rather than open-coding those.  As a side benefit, we get much saner
loop calling those; we can just feed entire pages, instead of the "copy
would span the iovec boundary, let's do it in two loop iterations" mess.

Signed-off-by: Al Viro <[email protected]>
11 years agoprocess_vm_access: switch to iov_iter
Al Viro [Wed, 5 Feb 2014 16:51:53 +0000 (11:51 -0500)]
process_vm_access: switch to iov_iter

instead of keeping its pieces in separate variables and passing
pointers to all of them...

Signed-off-by: Al Viro <[email protected]>
11 years agountangling process_vm_..., part 4
Al Viro [Wed, 5 Feb 2014 15:40:15 +0000 (10:40 -0500)]
untangling process_vm_..., part 4

instead of passing vector size (by value) and index (by reference),
pass the number of elements remaining.  That's all we care about
in these functions by that point.

Signed-off-by: Al Viro <[email protected]>
11 years agountangling process_vm_..., part 3
Al Viro [Wed, 5 Feb 2014 15:28:58 +0000 (10:28 -0500)]
untangling process_vm_..., part 3

lift iov one more level out - from process_vm_rw_single_vec to
process_vm_rw_core().  Same story as with the previous commit.

Signed-off-by: Al Viro <[email protected]>
11 years agountangling process_vm_..., part 2
Al Viro [Wed, 5 Feb 2014 15:22:50 +0000 (10:22 -0500)]
untangling process_vm_..., part 2

move iov to caller's stack frame; the value we assign to it on the
next call of process_vm_rw_pages() is equal to the value it had
when the last time we were leaving process_vm_rw_pages().

drop lvec argument of process_vm_rw_pages() - it's not used anymore.

Signed-off-by: Al Viro <[email protected]>
11 years agountangling process_vm_..., part 1
Al Viro [Wed, 5 Feb 2014 15:14:01 +0000 (10:14 -0500)]
untangling process_vm_..., part 1

we want to massage it to use of iov_iter.  This one is an equivalent
transformation - just introduce a local variable mirroring
lvec + *lvec_current.

Signed-off-by: Al Viro <[email protected]>
11 years agoread_code(): go through vfs_read() instead of calling the method directly
Al Viro [Wed, 5 Feb 2014 00:08:21 +0000 (19:08 -0500)]
read_code(): go through vfs_read() instead of calling the method directly

... and don't skip on sanity checks.  It's *not* a hot path, TYVM
(a couple of calls per a.out execve(), for pity sake) and headers of
random a.out binary are not to be trusted.

Signed-off-by: Al Viro <[email protected]>
11 years agofold cifs_iovec_read() into its (only) caller
Al Viro [Tue, 4 Feb 2014 19:19:48 +0000 (14:19 -0500)]
fold cifs_iovec_read() into its (only) caller

Signed-off-by: Al Viro <[email protected]>
11 years agocifs_iovec_read: keep iov_iter between the calls of cifs_readdata_to_iov()
Al Viro [Tue, 4 Feb 2014 19:07:43 +0000 (14:07 -0500)]
cifs_iovec_read: keep iov_iter between the calls of cifs_readdata_to_iov()

... we are doing them on adjacent parts of file, so what happens is that
each subsequent call works to rebuild the iov_iter to exact state it
had been abandoned in by previous one.  Just keep it through the entire
cifs_iovec_read().  And use copy_page_to_iter() instead of doing
kmap/copy_to_user/kunmap manually...

Signed-off-by: Al Viro <[email protected]>
11 years agoswitch vmsplice_to_user() to copy_page_to_iter()
Al Viro [Mon, 3 Feb 2014 23:19:51 +0000 (18:19 -0500)]
switch vmsplice_to_user() to copy_page_to_iter()

I've switched the sanity checks on iovec to rw_copy_check_uvector();
we might need to do a local analog, if any behaviour differences are
not actually bugfixes here...

Signed-off-by: Al Viro <[email protected]>
11 years agoswitch pipe_read() to copy_page_to_iter()
Al Viro [Tue, 4 Feb 2014 00:11:42 +0000 (19:11 -0500)]
switch pipe_read() to copy_page_to_iter()

Signed-off-by: Al Viro <[email protected]>
11 years agocifs_iovec_read(): resubmit shouldn't restart the loop
Al Viro [Tue, 4 Feb 2014 18:47:26 +0000 (13:47 -0500)]
cifs_iovec_read(): resubmit shouldn't restart the loop

... by that point the request we'd just resent is in the
head of the list anyway.  Just return to the beginning of
the loop body...

Signed-off-by: Al Viro <[email protected]>
11 years agointroduce copy_page_to_iter, kill loop over iovec in generic_file_aio_read()
Al Viro [Mon, 3 Feb 2014 22:07:03 +0000 (17:07 -0500)]
introduce copy_page_to_iter, kill loop over iovec in generic_file_aio_read()

generic_file_aio_read() was looping over the target iovec, with loop over
(source) pages nested inside that.  Just set an iov_iter up and pass *that*
to do_generic_file_aio_read().  With copy_page_to_iter() doing all work
of mapping and copying a page to iovec and advancing iov_iter.

Switch shmem_file_aio_read() to the same and kill file_read_actor(), while
we are at it.

Signed-off-by: Al Viro <[email protected]>
11 years agoiov_iter: Move iov_iter to uio.h
Kent Overstreet [Thu, 28 Nov 2013 00:29:46 +0000 (16:29 -0800)]
iov_iter: Move iov_iter to uio.h

Signed-off-by: Kent Overstreet <[email protected]>
11 years agodo_shmem_file_read(): call file_read_actor() directly
Al Viro [Mon, 3 Feb 2014 03:18:22 +0000 (22:18 -0500)]
do_shmem_file_read(): call file_read_actor() directly

Signed-off-by: Al Viro <[email protected]>
11 years agocallers of iov_copy_from_user_atomic() don't need pagecache_disable()
Al Viro [Mon, 3 Feb 2014 03:10:25 +0000 (22:10 -0500)]
callers of iov_copy_from_user_atomic() don't need pagecache_disable()

... it does that itself (via kmap_atomic())

Signed-off-by: Al Viro <[email protected]>
11 years agoswitch ->is_partially_uptodate() to saner arguments
Al Viro [Mon, 3 Feb 2014 02:16:54 +0000 (21:16 -0500)]
switch ->is_partially_uptodate() to saner arguments

Signed-off-by: Al Viro <[email protected]>
11 years agopipe: kill ->map() and ->unmap()
Al Viro [Mon, 3 Feb 2014 02:09:54 +0000 (21:09 -0500)]
pipe: kill ->map() and ->unmap()

all pipe_buffer_operations have the same instances of those...

Signed-off-by: Al Viro <[email protected]>
11 years agofuse/dev: use atomic maps
Al Viro [Mon, 3 Feb 2014 02:03:40 +0000 (21:03 -0500)]
fuse/dev: use atomic maps

Signed-off-by: Al Viro <[email protected]>
11 years agoVFS: Make delayed_free() call free_vfsmnt()
David Howells [Fri, 24 Jan 2014 12:17:54 +0000 (12:17 +0000)]
VFS: Make delayed_free() call free_vfsmnt()

Make delayed_free() call free_vfsmnt() so that we don't have two functions
doing the same job.  This requires the calls to mnt_free_id() in free_vfsmnt()
to be moved into the callers of that function.

Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>
11 years agomn10300: kmap_atomic() returns void *, not unsigned long...
Al Viro [Sun, 2 Feb 2014 11:31:19 +0000 (06:31 -0500)]
mn10300: kmap_atomic() returns void *, not unsigned long...

Signed-off-by: Al Viro <[email protected]>
11 years agocifs: ->rename() without ->lookup() makes no sense
Al Viro [Sat, 1 Feb 2014 13:26:29 +0000 (08:26 -0500)]
cifs: ->rename() without ->lookup() makes no sense

Signed-off-by: Al Viro <[email protected]>
11 years agoget rid of pointless checks for NULL ->i_op
Al Viro [Sat, 1 Feb 2014 09:43:32 +0000 (04:43 -0500)]
get rid of pointless checks for NULL ->i_op

Signed-off-by: Al Viro <[email protected]>
11 years agontfs: don't put NULL into ->i_op/->i_fop
Al Viro [Sat, 1 Feb 2014 09:41:36 +0000 (04:41 -0500)]
ntfs: don't put NULL into ->i_op/->i_fop

Signed-off-by: Al Viro <[email protected]>
11 years agonew helper: readlink_copy()
Al Viro [Fri, 14 Mar 2014 17:42:45 +0000 (13:42 -0400)]
new helper: readlink_copy()

Signed-off-by: Al Viro <[email protected]>
11 years agolustre: generic_readlink() is just fine there, TYVM...
Al Viro [Fri, 14 Mar 2014 16:54:25 +0000 (12:54 -0400)]
lustre: generic_readlink() is just fine there, TYVM...

Signed-off-by: Al Viro <[email protected]>
11 years agoget rid of files_defer_init()
Al Viro [Fri, 14 Mar 2014 16:45:29 +0000 (12:45 -0400)]
get rid of files_defer_init()

the only thing it's doing these days is calculation of
upper limit for fs.nr_open sysctl and that can be done
statically

Signed-off-by: Al Viro <[email protected]>
11 years agonamei.c: move EXPORT_SYMBOL to corresponding definitions
Al Viro [Fri, 14 Mar 2014 16:20:17 +0000 (12:20 -0400)]
namei.c: move EXPORT_SYMBOL to corresponding definitions

Signed-off-by: Al Viro <[email protected]>
11 years agoget_write_access() is inlined, exporting it is pointless
Al Viro [Fri, 14 Mar 2014 16:09:36 +0000 (12:09 -0400)]
get_write_access() is inlined, exporting it is pointless

Signed-off-by: Al Viro <[email protected]>
11 years agotidy do_dentry_open() up a bit
Al Viro [Fri, 14 Mar 2014 13:43:29 +0000 (09:43 -0400)]
tidy do_dentry_open() up a bit

Signed-off-by: Al Viro <[email protected]>
11 years agomark struct file that had write access grabbed by open()
Al Viro [Fri, 14 Mar 2014 16:02:47 +0000 (12:02 -0400)]
mark struct file that had write access grabbed by open()

new flag in ->f_mode - FMODE_WRITER.  Set by do_dentry_open() in case
when it has grabbed write access, checked by __fput() to decide whether
it wants to drop the sucker.  Allows to stop bothering with mnt_clone_write()
in alloc_file(), along with fewer special_file() checks.

Signed-off-by: Al Viro <[email protected]>
11 years agofold __get_file_write_access() into its only caller
Al Viro [Fri, 14 Mar 2014 14:40:46 +0000 (10:40 -0400)]
fold __get_file_write_access() into its only caller

Signed-off-by: Al Viro <[email protected]>
11 years agoget rid of DEBUG_WRITECOUNT
Al Viro [Fri, 14 Mar 2014 14:06:32 +0000 (10:06 -0400)]
get rid of DEBUG_WRITECOUNT

it only makes control flow in __fput() and friends more convoluted.

Signed-off-by: Al Viro <[email protected]>
11 years agodon't bother with {get,put}_write_access() on non-regular files
Al Viro [Fri, 14 Mar 2014 14:56:20 +0000 (10:56 -0400)]
don't bother with {get,put}_write_access() on non-regular files

it's pointless and actually leads to wrong behaviour in at least one
moderately convoluted case (pipe(), close one end, try to get to
another via /proc/*/fd and run into ETXTBUSY).

Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
11 years agoncpfs: switch to sockfd_lookup()/sockfd_put()
Al Viro [Thu, 6 Mar 2014 22:41:01 +0000 (17:41 -0500)]
ncpfs: switch to sockfd_lookup()/sockfd_put()

Signed-off-by: Al Viro <[email protected]>
11 years agoswitch nbd to sockfd_lookup/sockfd_put
Al Viro [Thu, 6 Mar 2014 01:41:36 +0000 (20:41 -0500)]
switch nbd to sockfd_lookup/sockfd_put

Signed-off-by: Al Viro <[email protected]>
11 years agovhost: don't open-code sockfd_put()
Al Viro [Thu, 6 Mar 2014 01:39:00 +0000 (20:39 -0500)]
vhost: don't open-code sockfd_put()

Signed-off-by: Al Viro <[email protected]>
11 years agousbip: don't open-code sockfd_lookup/sockfd_put
Al Viro [Thu, 6 Mar 2014 01:33:08 +0000 (20:33 -0500)]
usbip: don't open-code sockfd_lookup/sockfd_put

Signed-off-by: Al Viro <[email protected]>
11 years agoreduce m_start() cost...
Al Viro [Thu, 27 Feb 2014 19:40:10 +0000 (14:40 -0500)]
reduce m_start() cost...

Signed-off-by: Al Viro <[email protected]>
11 years agosmarter propagate_mnt()
Al Viro [Thu, 27 Feb 2014 14:35:45 +0000 (09:35 -0500)]
smarter propagate_mnt()

The current mainline has copies propagated to *all* nodes, then
tears down the copies we made for nodes that do not contain
counterparts of the desired mountpoint.  That sets the right
propagation graph for the copies (at teardown time we move
the slaves of removed node to a surviving peer or directly
to master), but we end up paying a fairly steep price in
useless allocations.  It's fairly easy to create a situation
where N calls of mount(2) create exactly N bindings, with
O(N^2) vfsmounts allocated and freed in process.

Fortunately, it is possible to avoid those allocations/freeings.
The trick is to create copies in the right order and find which
one would've eventually become a master with the current algorithm.
It turns out to be possible in O(nodes getting propagation) time
and with no extra allocations at all.

One part is that we need to make sure that eventual master will be
created before its slaves, so we need to walk the propagation
tree in a different order - by peer groups.  And iterate through
the peers before dealing with the next group.

Another thing is finding the (earlier) copy that will be a master
of one we are about to create; to do that we are (temporary) marking
the masters of mountpoints we are attaching the copies to.

Either we are in a peer of the last mountpoint we'd dealt with,
or we have the following situation: we are attaching to mountpoint M,
the last copy S_0 had been attached to M_0 and there are sequences
S_0...S_n, M_0...M_n such that S_{i+1} is a master of S_{i},
S_{i} mounted on M{i} and we need to create a slave of the first S_{k}
such that M is getting propagation from M_{k}.  It means that the master
of M_{k} will be among the sequence of masters of M.  On the
other hand, the nearest marked node in that sequence will either
be the master of M_{k} or the master of M_{k-1} (the latter -
in the case if M_{k-1} is a slave of something M gets propagation
from, but in a wrong peer group).

So we go through the sequence of masters of M until we find
a marked one (P).  Let N be the one before it.  Then we go through
the sequence of masters of S_0 until we find one (say, S) mounted
on a node D that has P as master and check if D is a peer of N.
If it is, S will be the master of new copy, if not - the master of S
will be.

That's it for the hard part; the rest is fairly simple.  Iterator
is in next_group(), handling of one prospective mountpoint is
propagate_one().

It seems to survive all tests and gives a noticably better performance
than the current mainline for setups that are seriously using shared
subtrees.

Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
11 years agoswitch mnt_hash to hlist
Al Viro [Fri, 21 Mar 2014 01:10:51 +0000 (21:10 -0400)]
switch mnt_hash to hlist

fixes RCU bug - walking through hlist is safe in face of element moves,
since it's self-terminating.  Cyclic lists are not - if we end up jumping
to another hash chain, we'll loop infinitely without ever hitting the
original list head.

[fix for dumb braino folded]

Spotted by: Max Kellermann <[email protected]>
Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
11 years agodon't bother with propagate_mnt() unless the target is shared
Al Viro [Fri, 21 Mar 2014 14:14:08 +0000 (10:14 -0400)]
don't bother with propagate_mnt() unless the target is shared

If the dest_mnt is not shared, propagate_mnt() does nothing -
there's no mounts to propagate to and thus no copies to create.
Might as well don't bother calling it in that case.

Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
11 years agokeep shadowed vfsmounts together
Al Viro [Fri, 21 Mar 2014 00:34:43 +0000 (20:34 -0400)]
keep shadowed vfsmounts together

preparation to switching mnt_hash to hlist

Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
11 years agoresizable namespace.c hashes
Al Viro [Fri, 28 Feb 2014 18:46:44 +0000 (13:46 -0500)]
resizable namespace.c hashes

* switch allocation to alloc_large_system_hash()
* make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
* switch mountpoint_hashtable from list_head to hlist_head

Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
11 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 29 Mar 2014 22:01:09 +0000 (15:01 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "A late breaking fix from John.  (The bug fixed has a hard lockup
  potential, but that was not observed, warnings were)"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Revert to calling clock_was_set_delayed() while in irq context

11 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph...
Linus Torvalds [Sat, 29 Mar 2014 22:00:27 +0000 (15:00 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

Pull Ceph fix from Sage Weil:
 "This drops a bad assert that a few users have been hitting but we've
  only recently been able to track down"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: drop an unsafe assertion

11 years agorbd: drop an unsafe assertion
Alex Elder [Tue, 25 Mar 2014 13:36:02 +0000 (15:36 +0200)]
rbd: drop an unsafe assertion

Olivier Bonvalet reported having repeated crashes due to a failed
assertion he was hitting in rbd_img_obj_callback():

    Assertion failure in rbd_img_obj_callback() at line 2165:
rbd_assert(which >= img_request->next_completion);

With a lot of help from Olivier with reproducing the problem
we were able to determine the object and image requests had
already been completed (and often freed) at the point the
assertion failed.

There was a great deal of discussion on the ceph-devel mailing list
about this.  The problem only arose when there were two (or more)
object requests in an image request, and the problem was always
seen when the second request was being completed.

The problem is due to a race in the window between setting the
"done" flag on an object request and checking the image request's
next completion value.  When the first object request completes, it
checks to see if its successor request is marked "done", and if
so, that request is also completed.  In the process, the image
request's next_completion value is updated to reflect that both
the first and second requests are completed.  By the time the
second request is able to check the next_completion value, it
has been set to a value *greater* than its own "which" value,
which caused an assertion to fail.

Fix this problem by skipping over any completion processing
unless the completing object request is the next one expected.
Test only for inequality (not >=), and eliminate the bad
assertion.

Tested-by: Olivier Bonvalet <[email protected]>
Signed-off-by: Alex Elder <[email protected]>
Reviewed-by: Sage Weil <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
11 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Fri, 28 Mar 2014 22:09:37 +0000 (15:09 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) We've discovered a common error in several networking drivers, they
    put VLAN offload features into ->vlan_features, which would suggest
    that they support offloading 2 or more levels of VLAN encapsulation.
    Not only do these devices not do that, but we don't have the
    infrastructure yet to handle that at all.

    Fixes from Vlad Yasevich.

 2) Fix tcpdump crash with bridging and vlans, also from Vlad.

 3) Some MAINTAINERS updates for random32 and bonding.

 4) Fix late reseeds of prandom generator, from Sasha Levin.

 5) Bridge doesn't handle stacked vlans properly, fix from Toshiaki
    Makita.

 6) Fix deadlock in openvswitch, from Flavio Leitner.

 7) get_timewait4_sock() doesn't report delay times correctly, fix from
    Eric Dumazet.

 8) Duplicate address detection and addrconf verification need to run in
    contexts where RTNL can be obtained.  Move them to run from a
    workqueue.  From Hannes Frederic Sowa.

 9) Fix route refcount leaking in ip tunnels, from Pravin B Shelar.

10) Don't return -EINTR from non-blocking recvmsg() on AF_UNIX sockets,
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (28 commits)
  vlan: Warn the user if lowerdev has bad vlan features.
  veth: Turn off vlan rx acceleration in vlan_features
  ifb: Remove vlan acceleration from vlan_features
  qlge: Do not propaged vlan tag offloads to vlans
  bridge: Fix crash with vlan filtering and tcpdump
  net: Account for all vlan headers in skb_mac_gso_segment
  MAINTAINERS: bonding: change email address
  MAINTAINERS: bonding: change email address
  ipv6: move DAD and addrconf_verify processing to workqueue
  tcp: fix get_timewait4_sock() delay computation on 64bit
  openvswitch: fix a possible deadlock and lockdep warning
  bridge: Fix handling stacked vlan tags
  bridge: Fix inabillity to retrieve vlan tags when tx offload is disabled
  vhost: validate vhost_get_vq_desc return value
  vhost: fix total length when packets are too short
  random32: avoid attempt to late reseed if in the middle of seeding
  random32: assign to network folks in MAINTAINERS
  net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset
  core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
  vlan: Set hard_header_len according to available acceleration
  ...

11 years agoMerge branch 'vlan_offloads'
David S. Miller [Fri, 28 Mar 2014 21:17:16 +0000 (17:17 -0400)]
Merge branch 'vlan_offloads'

Vlad Yasevich says:

====================
Audit all drivers for correct vlan_features.

Some drivers set vlan acceleration features in vlan_features.  This causes
issues with Q-in-Q/802.1ad configurations.

Audit all the drivers for correct vlan_features.  Fix broken ones.
Add a warning to vlan code to help catch future offenders.
====================

Signed-off-by: David S. Miller <[email protected]>
11 years agovlan: Warn the user if lowerdev has bad vlan features.
Vlad Yasevich [Fri, 28 Mar 2014 02:14:49 +0000 (22:14 -0400)]
vlan: Warn the user if lowerdev has bad vlan features.

Some drivers incorrectly assign vlan acceleration features to
vlan_features thus causing issues for Q-in-Q vlan configurations.
Warn the user of such cases.

Signed-off-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agoveth: Turn off vlan rx acceleration in vlan_features
Vlad Yasevich [Fri, 28 Mar 2014 02:14:48 +0000 (22:14 -0400)]
veth: Turn off vlan rx acceleration in vlan_features

For completeness, turn off vlan rx acceleration in vlan_features so
that it doesn't show up on q-in-q setups.

Signed-off-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agoifb: Remove vlan acceleration from vlan_features
Vlad Yasevich [Fri, 28 Mar 2014 02:14:47 +0000 (22:14 -0400)]
ifb: Remove vlan acceleration from vlan_features

Do not include vlan acceleration features in vlan_features as that
precludes correct Q-in-Q operation.

Signed-off-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agoqlge: Do not propaged vlan tag offloads to vlans
Vlad Yasevich [Fri, 28 Mar 2014 02:14:46 +0000 (22:14 -0400)]
qlge: Do not propaged vlan tag offloads to vlans

qlge driver turns off NETIF_F_HW_CTAG_FILTER, but forgets to
turn off HW_CTAG_TX and HW_CTAG_RX on vlan devices.  With the
current settings, q-in-q will only generate a single vlan header.
Remember to mask off CTAG_TX and CTAG_RX features in vlan_features.

CC: Shahed Shaikh <[email protected]>
CC: Jitendra Kalsaria <[email protected]>
CC: Ron Mercer <[email protected]>
Signed-off-by: Vlad Yasevich <[email protected]>
Acked-by: Jitendra Kalsaria <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agobridge: Fix crash with vlan filtering and tcpdump
Vlad Yasevich [Fri, 28 Mar 2014 01:51:18 +0000 (21:51 -0400)]
bridge: Fix crash with vlan filtering and tcpdump

When the vlan filtering is enabled on the bridge, but
the filter is not configured on the bridge device itself,
running tcpdump on the bridge device will result in a
an Oops with NULL pointer dereference.  The reason
is that br_pass_frame_up() will bypass the vlan
check because promisc flag is set.  It will then try
to get the table pointer and process the packet based
on the table.  Since the table pointer is NULL, we oops.
Catch this special condition in br_handle_vlan().

Reported-by: Toshiaki Makita <[email protected]>
CC: Toshiaki Makita <[email protected]>
Signed-off-by: Vlad Yasevich <[email protected]>
Acked-by: Toshiaki Makita <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agonet: Account for all vlan headers in skb_mac_gso_segment
Vlad Yasevich [Thu, 27 Mar 2014 21:26:18 +0000 (17:26 -0400)]
net: Account for all vlan headers in skb_mac_gso_segment

skb_network_protocol() already accounts for multiple vlan
headers that may be present in the skb.  However, skb_mac_gso_segment()
doesn't know anything about it and assumes that skb->mac_len
is set correctly to skip all mac headers.  That may not
always be the case.  If we are simply forwarding the packet (via
bridge or macvtap), all vlan headers may not be accounted for.

A simple solution is to allow skb_network_protocol to return
the vlan depth it has calculated.  This way skb_mac_gso_segment
will correctly skip all mac headers.

Signed-off-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agoMAINTAINERS: bonding: change email address
Veaceslav Falico [Thu, 27 Mar 2014 17:43:50 +0000 (18:43 +0100)]
MAINTAINERS: bonding: change email address

Signed-off-by: Veaceslav Falico <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agoMAINTAINERS: bonding: change email address
Jay Vosburgh [Thu, 27 Mar 2014 17:33:44 +0000 (10:33 -0700)]
MAINTAINERS: bonding: change email address

Update my email address.

Signed-off-by: Jay Vosburgh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agoMerge branch 'akpm' (patches from Andrew Morton)
Linus Torvalds [Fri, 28 Mar 2014 20:57:13 +0000 (13:57 -0700)]
Merge branch 'akpm' (patches from Andrew Morton)

Merge two fixes from Andrew Morton:
 "The x86 fix should come from x86 guys but they appear to be
  conferencing or otherwise distracted.

  The ocfs2 fix is a bit of a mess - the code runs into an immediate
  NULL deref and we're trying to work out how this got through test and
  review, but we haven't heard from Goldwyn in the past few days.
  Sasha's patch fixes the oops, but the feature as a whole is probably
  broken.  So this is a stopgap for 3.14 - I'll aim to get the real
  fixes into 3.14.x"

* emailed patches from Andrew Morton [email protected]>:
  x86: fix boot on uniprocessor systems
  ocfs2: check if cluster name exists before deref

11 years agox86: fix boot on uniprocessor systems
Artem Fetishev [Fri, 28 Mar 2014 20:33:39 +0000 (13:33 -0700)]
x86: fix boot on uniprocessor systems

On x86 uniprocessor systems topology_physical_package_id() returns -1
which causes rapl_cpu_prepare() to leave rapl_pmu variable uninitialized
which leads to GPF in rapl_pmu_init().

See arch/x86/kernel/cpu/perf_event_intel_rapl.c.

It turns out that physical_package_id and core_id can actually be
retreived for uniprocessor systems too.  Enabling them also fixes
rapl_pmu code.

Signed-off-by: Artem Fetishev <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
11 years agoocfs2: check if cluster name exists before deref
Sasha Levin [Fri, 28 Mar 2014 20:33:38 +0000 (13:33 -0700)]
ocfs2: check if cluster name exists before deref

Commit c74a3bdd9b52 ("ocfs2: add clustername to cluster connection") is
trying to strlcpy a string which was explicitly passed as NULL in the
very same patch, triggering a NULL ptr deref.

  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: strlcpy (lib/string.c:388 lib/string.c:151)
  CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G        W     3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274
  RIP:  strlcpy (lib/string.c:388 lib/string.c:151)
  Call Trace:
   ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350)
   ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396)
   user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679)
   dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503)
   vfs_mkdir (fs/namei.c:3467)
   SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472)
   tracesys (arch/x86/kernel/entry_64.S:749)

akpm: this patch probably disables the feature.  A temporary thing to
avoid triviel oopses.

Signed-off-by: Sasha Levin <[email protected]>
Cc: Goldwyn Rodrigues <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
11 years agoipv6: move DAD and addrconf_verify processing to workqueue
Hannes Frederic Sowa [Thu, 27 Mar 2014 17:28:07 +0000 (18:28 +0100)]
ipv6: move DAD and addrconf_verify processing to workqueue

addrconf_join_solict and addrconf_join_anycast may cause actions which
need rtnl locked, especially on first address creation.

A new DAD state is introduced which defers processing of the initial
DAD processing into a workqueue.

To get rtnl lock we need to push the code paths which depend on those
calls up to workqueues, specifically addrconf_verify and the DAD
processing.

(v2)
addrconf_dad_failure needs to be queued up to the workqueue, too. This
patch introduces a new DAD state and stop the DAD processing in the
workqueue (this is because of the possible ipv6_del_addr processing
which removes the solicited multicast address from the device).

addrconf_verify_lock is removed, too. After the transition it is not
needed any more.

As we are not processing in bottom half anymore we need to be a bit more
careful about disabling bottom half out when we lock spin_locks which are also
used in bh.

Relevant backtrace:
[  541.030090] RTNL: assertion failed at net/core/dev.c (4496)
[  541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.10.33-1-amd64-vyatta #1
[  541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  541.031146]  ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
[  541.031148]  0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
[  541.031150]  0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
[  541.031152] Call Trace:
[  541.031153]  <IRQ>  [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17
[  541.031180]  [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180
[  541.031183]  [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0
[  541.031185]  [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0
[  541.031189]  [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90
[  541.031198]  [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
[  541.031208]  [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0
[  541.031212]  [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
[  541.031216]  [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6]
[  541.031219]  [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
[  541.031223]  [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
[  541.031226]  [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
[  541.031229]  [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
[  541.031233]  [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6]
[  541.031241]  [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50
[  541.031244]  [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6]
[  541.031247]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
[  541.031255]  [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90
[  541.031258]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]

Hunks and backtrace stolen from a patch by Stephen Hemminger.

Reported-by: Stephen Hemminger <[email protected]>
Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agotcp: fix get_timewait4_sock() delay computation on 64bit
Eric Dumazet [Thu, 27 Mar 2014 14:19:19 +0000 (07:19 -0700)]
tcp: fix get_timewait4_sock() delay computation on 64bit

It seems I missed one change in get_timewait4_sock() to compute
the remaining time before deletion of IPV4 timewait socket.

This could result in wrong output in /proc/net/tcp for tm->when field.

Fixes: 96f817fedec4 ("tcp: shrink tcp6_timewait_sock by one cache line")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agoopenvswitch: fix a possible deadlock and lockdep warning
Flavio Leitner [Thu, 27 Mar 2014 14:05:34 +0000 (11:05 -0300)]
openvswitch: fix a possible deadlock and lockdep warning

There are two problematic situations.

A deadlock can happen when is_percpu is false because it can get
interrupted while holding the spinlock. Then it executes
ovs_flow_stats_update() in softirq context which tries to get
the same lock.

The second sitation is that when is_percpu is true, the code
correctly disables BH but only for the local CPU, so the
following can happen when locking the remote CPU without
disabling BH:

       CPU#0                            CPU#1
  ovs_flow_stats_get()
   stats_read()
 +->spin_lock remote CPU#1        ovs_flow_stats_get()
 |  <interrupted>                  stats_read()
 |  ...                       +-->  spin_lock remote CPU#0
 |                            |     <interrupted>
 |  ovs_flow_stats_update()   |     ...
 |   spin_lock local CPU#0 <--+     ovs_flow_stats_update()
 +---------------------------------- spin_lock local CPU#1

This patch disables BH for both cases fixing the deadlocks.
Acked-by: Jesse Gross <[email protected]>
=================================
[ INFO: inconsistent lock state ]
3.14.0-rc8-00007-g632b06a #1 Tainted: G          I
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/0/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
(&(&cpu_stats->lock)->rlock){+.?...}, at: [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810f973f>] __lock_acquire+0x68f/0x1c40
[<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
[<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
[<ffffffffa05dd9e4>] ovs_flow_stats_get+0xc4/0x1e0 [openvswitch]
[<ffffffffa05da855>] ovs_flow_cmd_fill_info+0x185/0x360 [openvswitch]
[<ffffffffa05daf05>] ovs_flow_cmd_build_info.constprop.27+0x55/0x90 [openvswitch]
[<ffffffffa05db41d>] ovs_flow_cmd_new_or_set+0x4dd/0x570 [openvswitch]
[<ffffffff816c245d>] genl_family_rcv_msg+0x1cd/0x3f0
[<ffffffff816c270e>] genl_rcv_msg+0x8e/0xd0
[<ffffffff816c0239>] netlink_rcv_skb+0xa9/0xc0
[<ffffffff816c0798>] genl_rcv+0x28/0x40
[<ffffffff816bf830>] netlink_unicast+0x100/0x1e0
[<ffffffff816bfc57>] netlink_sendmsg+0x347/0x770
[<ffffffff81668e9c>] sock_sendmsg+0x9c/0xe0
[<ffffffff816692d9>] ___sys_sendmsg+0x3a9/0x3c0
[<ffffffff8166a911>] __sys_sendmsg+0x51/0x90
[<ffffffff8166a962>] SyS_sendmsg+0x12/0x20
[<ffffffff817e3ce9>] system_call_fastpath+0x16/0x1b
irq event stamp: 1740726
hardirqs last  enabled at (1740726): [<ffffffff8175d5e0>] ip6_finish_output2+0x4f0/0x840
hardirqs last disabled at (1740725): [<ffffffff8175d59b>] ip6_finish_output2+0x4ab/0x840
softirqs last  enabled at (1740674): [<ffffffff8109be12>] _local_bh_enable+0x22/0x50
softirqs last disabled at (1740675): [<ffffffff8109db05>] irq_exit+0xc5/0xd0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&cpu_stats->lock)->rlock);
  <Interrupt>
    lock(&(&cpu_stats->lock)->rlock);

 *** DEADLOCK ***

5 locks held by swapper/0/0:
 #0:  (((&ifa->dad_timer))){+.-...}, at: [<ffffffff810a7155>] call_timer_fn+0x5/0x320
 #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81788a55>] mld_sendpack+0x5/0x4a0
 #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8175d149>] ip6_finish_output2+0x59/0x840
 #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8168ba75>] __dev_queue_xmit+0x5/0x9b0
 #4:  (rcu_read_lock){.+.+..}, at: [<ffffffffa05e41b5>] internal_dev_xmit+0x5/0x110 [openvswitch]

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I  3.14.0-rc8-00007-g632b06a #1
Hardware name:                  /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
 0000000000000000 0fcf20709903df0c ffff88042d603808 ffffffff817cfe3c
 ffffffff81c134c0 ffff88042d603858 ffffffff817cb6da 0000000000000005
 ffffffff00000001 ffff880400000000 0000000000000006 ffffffff81c134c0
Call Trace:
 <IRQ>  [<ffffffff817cfe3c>] dump_stack+0x4d/0x66
 [<ffffffff817cb6da>] print_usage_bug+0x1f4/0x205
 [<ffffffff810f7f10>] ? check_usage_backwards+0x180/0x180
 [<ffffffff810f8963>] mark_lock+0x223/0x2b0
 [<ffffffff810f96d3>] __lock_acquire+0x623/0x1c40
 [<ffffffff810f5707>] ? __lock_is_held+0x57/0x80
 [<ffffffffa05e26c6>] ? masked_flow_lookup+0x236/0x250 [openvswitch]
 [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffffa05dcc64>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch]
 [<ffffffff810f93f7>] ? __lock_acquire+0x347/0x1c40
 [<ffffffffa05e3bea>] ovs_vport_receive+0x2a/0x30 [openvswitch]
 [<ffffffffa05e4218>] internal_dev_xmit+0x68/0x110 [openvswitch]
 [<ffffffffa05e41b5>] ? internal_dev_xmit+0x5/0x110 [openvswitch]
 [<ffffffff8168b4a6>] dev_hard_start_xmit+0x2e6/0x8b0
 [<ffffffff8168be87>] __dev_queue_xmit+0x417/0x9b0
 [<ffffffff8168ba75>] ? __dev_queue_xmit+0x5/0x9b0
 [<ffffffff8175d5e0>] ? ip6_finish_output2+0x4f0/0x840
 [<ffffffff8168c430>] dev_queue_xmit+0x10/0x20
 [<ffffffff8175d641>] ip6_finish_output2+0x551/0x840
 [<ffffffff8176128a>] ? ip6_finish_output+0x9a/0x220
 [<ffffffff8176128a>] ip6_finish_output+0x9a/0x220
 [<ffffffff8176145f>] ip6_output+0x4f/0x1f0
 [<ffffffff81788c29>] mld_sendpack+0x1d9/0x4a0
 [<ffffffff817895b8>] mld_send_initial_cr.part.32+0x88/0xa0
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff8178e301>] ipv6_mc_dad_complete+0x31/0x50
 [<ffffffff817690d7>] addrconf_dad_completed+0x147/0x220
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff8176934f>] addrconf_dad_timer+0x19f/0x1c0
 [<ffffffff810a71e9>] call_timer_fn+0x99/0x320
 [<ffffffff810a7155>] ? call_timer_fn+0x5/0x320
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff810a76c4>] run_timer_softirq+0x254/0x3b0
 [<ffffffff8109d47d>] __do_softirq+0x12d/0x480

Signed-off-by: Flavio Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agobridge: Fix handling stacked vlan tags
Toshiaki Makita [Thu, 27 Mar 2014 12:46:56 +0000 (21:46 +0900)]
bridge: Fix handling stacked vlan tags

If a bridge with vlan_filtering enabled receives frames with stacked
vlan tags, i.e., they have two vlan tags, br_vlan_untag() strips not
only the outer tag but also the inner tag.

br_vlan_untag() is called only from br_handle_vlan(), and in this case,
it is enough to set skb->vlan_tci to 0 here, because vlan_tci has already
been set before calling br_handle_vlan().

Signed-off-by: Toshiaki Makita <[email protected]>
Acked-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agobridge: Fix inabillity to retrieve vlan tags when tx offload is disabled
Toshiaki Makita [Thu, 27 Mar 2014 12:46:55 +0000 (21:46 +0900)]
bridge: Fix inabillity to retrieve vlan tags when tx offload is disabled

Bridge vlan code (br_vlan_get_tag()) assumes that all frames have vlan_tci
if they are tagged, but if vlan tx offload is manually disabled on bridge
device and frames are sent from vlan device on the bridge device, the tags
are embedded in skb->data and they break this assumption.
Extract embedded vlan tags and move them to vlan_tci at ingress.

Signed-off-by: Toshiaki Makita <[email protected]>
Acked-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agovhost: validate vhost_get_vq_desc return value
Michael S. Tsirkin [Thu, 27 Mar 2014 10:53:37 +0000 (12:53 +0200)]
vhost: validate vhost_get_vq_desc return value

vhost fails to validate negative error code
from vhost_get_vq_desc causing
a crash: we are using -EFAULT which is 0xfffffff2
as vector size, which exceeds the allocated size.

The code in question was introduced in commit
8dd014adfea6f173c1ef6378f7e5e7924866c923
    vhost-net: mergeable buffers support

CVE-2014-0055

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agovhost: fix total length when packets are too short
Michael S. Tsirkin [Thu, 27 Mar 2014 10:00:26 +0000 (12:00 +0200)]
vhost: fix total length when packets are too short

When mergeable buffers are disabled, and the
incoming packet is too large for the rx buffer,
get_rx_bufs returns success.

This was intentional in order for make recvmsg
truncate the packet and then handle_rx would
detect err != sock_len and drop it.

Unfortunately we pass the original sock_len to
recvmsg - which means we use parts of iov not fully
validated.

Fix this up by detecting this overrun and doing packet drop
immediately.

CVE-2014-0077

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agorandom32: avoid attempt to late reseed if in the middle of seeding
Sasha Levin [Fri, 28 Mar 2014 16:38:42 +0000 (17:38 +0100)]
random32: avoid attempt to late reseed if in the middle of seeding

Commit 4af712e8df ("random32: add prandom_reseed_late() and call when
nonblocking pool becomes initialized") has added a late reseed stage
that happens as soon as the nonblocking pool is marked as initialized.

This fails in the case that the nonblocking pool gets initialized
during __prandom_reseed()'s call to get_random_bytes(). In that case
we'd double back into __prandom_reseed() in an attempt to do a late
reseed - deadlocking on 'lock' early on in the boot process.

Instead, just avoid even waiting to do a reseed if a reseed is already
occuring.

Fixes: 4af712e8df99 ("random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized")
Signed-off-by: Sasha Levin <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Fri, 28 Mar 2014 20:03:00 +0000 (13:03 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input subsystem fixes from Dmitry Torokhov:
 "Updates to Synaptics touchpad to better cope with devices in Lenovo
  laptops, and a couple more fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: synaptics - add manual min/max quirk for ThinkPad X240
  Input: synaptics - add manual min/max quirk
  Input: cypress_ps2 - don't report as a button pads
  Input: da9052_onkey - use correct register bit for key status
  Input: adp5588-keys - get value from data out when dir is out

11 years agorandom32: assign to network folks in MAINTAINERS
Sasha Levin [Thu, 27 Mar 2014 06:01:34 +0000 (02:01 -0400)]
random32: assign to network folks in MAINTAINERS

lib/random32.c was split out of the network code and is de-facto
still maintained by the almighty net/ gods.

Make it a bit more official so that people who aren't aware of
that know where to send their patches.

Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
11 years agoMerge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Fri, 28 Mar 2014 17:58:10 +0000 (10:58 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "I didn't want these to wait for stable cycle.

  The nouveau and radeon ones are the same problem, where the runtime pm
  stuff broke non-runtime pm managed secondary GPUs.

  The udl fix is for an oops on unplug, and the i915 fix is for a
  regression on Sandybridge even though it may break haswell (regression
  wins)"

Daniel Vetter comments:
 "My apologies for the i915 regression fumble, that thing somehow fell
  through the cracks here for almost half a year :( Imo that's more than
  enough flailing to just go ahead with the revert, and the re-broken
  hsw should get peoples attention ..."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: Undo gtt scratch pte unmapping again
  drm/radeon: fix runtime suspend breaking secondary GPUs
  drm/nouveau: fail runtime pm properly.
  drm/udl: take reference to device struct for dma-bufs

11 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Fri, 28 Mar 2014 17:55:44 +0000 (10:55 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c build fix from Wolfram Sang:
 "The build fix from my last request unveiled another build problem
  which is fixed with this patch"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: cpm: Fix build by adding of_address.h and of_irq.h

11 years agoMerge tag 'stable/for-linus-3.14-rc8-tag' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Fri, 28 Mar 2014 17:52:05 +0000 (10:52 -0700)]
Merge tag 'stable/for-linus-3.14-rc8-tag' of git://git./linux/kernel/git/xen/tip

Pull Xen bugfixes from David Vrabel:
 "Fix two bugs that cause x86 PV guest crashes.

  1. Ballooning a 32-bit guest would eventually crash it.

  2. Revert a broken fix for a regression with NUMA_BALACING.  The bad
     fix caused PV guests to crash after migration.  This is not ideal
     but unpicking the madness that is _PAGE_NUMA == _PAGE_PROTNONE will
     take a while longer"

* tag 'stable/for-linus-3.14-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  Revert "xen: properly account for _PAGE_NUMA during xen pte translations"
  xen/balloon: flush persistent kmaps in correct position

11 years agoInput: synaptics - add manual min/max quirk for ThinkPad X240
Hans de Goede [Fri, 28 Mar 2014 08:01:38 +0000 (01:01 -0700)]
Input: synaptics - add manual min/max quirk for ThinkPad X240

This extends Benjamin Tissoires manual min/max quirk table with support for
the ThinkPad X240.

Cc: [email protected]
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
11 years agoInput: synaptics - add manual min/max quirk
Benjamin Tissoires [Fri, 28 Mar 2014 07:43:00 +0000 (00:43 -0700)]
Input: synaptics - add manual min/max quirk

The new Lenovo Haswell series (-40's) contains a new Synaptics touchpad.
However, these new Synaptics devices report bad axis ranges.
Under Windows, it is not a problem because the Windows driver uses RMI4
over SMBus to talk to the device. Under Linux, we are using the PS/2
fallback interface and it occurs the reported ranges are wrong.

Of course, it would be too easy to have only one range for the whole
series, each touchpad seems to be calibrated in a different way.

We can not use SMBus to get the actual range because I suspect the firmware
will switch into the SMBus mode and stop talking through PS/2 (this is the
case for hybrid HID over I2C / PS/2 Synaptics touchpads).

So as a temporary solution (until RMI4 land into upstream), start a new
list of quirks with the min/max manually set.

Signed-off-by: Benjamin Tissoires <[email protected]>
CC: [email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
11 years agotime: Revert to calling clock_was_set_delayed() while in irq context
John Stultz [Thu, 27 Mar 2014 23:30:49 +0000 (16:30 -0700)]
time: Revert to calling clock_was_set_delayed() while in irq context

In commit 47a1b796306356f35 ("tick/timekeeping: Call
update_wall_time outside the jiffies lock"), we moved to calling
clock_was_set() due to the fact that we were no longer holding
the timekeeping or jiffies lock.

However, there is still the problem that clock_was_set()
triggers an IPI, which cannot be done from the timer's hard irq
context, and will generate WARN_ON warnings.

Apparently in my earlier testing, I'm guessing I didn't bump the
dmesg log level, so I somehow missed the WARN_ONs.

Thus we need to revert back to calling clock_was_set_delayed().

Signed-off-by: John Stultz <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
11 years agodrm/i915: Undo gtt scratch pte unmapping again
Daniel Vetter [Wed, 26 Mar 2014 19:10:09 +0000 (20:10 +0100)]
drm/i915: Undo gtt scratch pte unmapping again

It apparently blows up on some machines. This functionally reverts

commit 828c79087cec61eaf4c76bb32c222fbe35ac3930
Author: Ben Widawsky <[email protected]>
Date:   Wed Oct 16 09:21:30 2013 -0700

    drm/i915: Disable GGTT PTEs on GEN6+ suspend

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64841
Reported-and-Tested-by: Brad Jackson <[email protected]>
Cc: [email protected]
Cc: Takashi Iwai <[email protected]>
Cc: Paulo Zanoni <[email protected]>
Cc: Todd Previte <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>