[10:36:24]  there's no obvious way that i can get the kernel to lower the scheduling priority of a process on the other end of a socket from me, is there
[10:36:34]  i mean, besides renicing it myself.
[10:37:18]  the context here is throttling rendering apps at the production side, since anywhere later is useless
[10:37:42]  if i make X just ignore their socket for a bit then the protocol will just buffer up in the kernel
[10:37:55]  so you can still race ahead by arbitrary amounts
[10:39:17]  hrm.
[10:41:18]  that doesn't make sense to me.
[10:42:48]  X clients can submit rendering requests without waiting for any kind of reply to know that they happened.
[10:43:08]  no, i mean, it doesn't make sense to me what you want the kernel to do.
[10:43:23]  reduce your priority if the requests are buffering?
[10:44:08]  maybe?  beyond some watermark that might work, if you ignore putimage.
[10:44:52]  or actually, what about this
[10:45:42]  once X reads from a socket, anything it does until the next select is logically accounted against the timeslice of the process on the other end of that socket
[10:46:23]  narf.
[10:46:33]  gross, i admit.
[10:46:35]  that's... interesting.
[10:47:02]  can we just invent something to use besides unix sockets? :)
[10:47:13]  if you make them as fast as unix sockets, sure.
[10:47:30]  well, it obviously only applies to local connections, soo....
[10:47:54]  i didn't think unix sockets were all that fast.
[10:48:01]  i'm sure we could do better.
[10:48:02]  the bottom end of my transport layer is abstracted out, apps don't see it.  we already switched it to use abstract namespace sockets.
[10:48:15]  why the fuck am i getting eight million copies?
[10:48:25]  they're approximately as fast as pipes, believe it or not
[10:48:42]  (of my mailing list messages.)
[10:49:22]  oh, i guess i haven't really cared since pre-splice.
[10:49:50]  (i should have probably checked to see if we actually use it before saying that. heh.)
[10:49:53]  we did try a shared memory ring buffer scheme at one point, it was marginally faster for throughput but latency got a little worse since you had to do a pipe dance to get select to wake up
[10:50:14]  but that was like ten years ago, eventfd might make it a proper win.
[10:50:34]  that was what i was thinking.
[10:50:59]  * kylem wonders what wayland does.
[10:51:06]  unix sockets, iirc
[10:51:31]  given that krh was talking about having implemented dnd with fd passing
[10:51:44]  haha.
[10:52:09]  beats the shit out of how X does it, copies ahoy
[10:52:24]  yeah, just a fucking weird thing to say.
[10:59:42]  so what i have in X right now, at least for intel, is that every so often i do a partial gpu command queue drain if i detect that i'm behind in rendering
[11:00:05]  if anything was submitted 20ms ago and still isn't done, is what the heuristic works out to
[11:00:37]  but that blows because it blocks all of X, so any ipc you're trying to do with the non-greedy client also gets blocked
[11:01:32]  i could kind of fix that by tagging command submits with a client id, and ignoring those sockets, but that just pushes the queueing to the unix socket; the greedy client still monopolizes my time for as far ahead as i queue
[11:02:06]  right.
[11:02:08]  hrm.
[11:02:49]  gah, i wish i was in boston so we could whiteboard this.
[11:04:31]  although, that's an option too
[11:05:32]  setsockopt(fd, PF_UNIX, SO_RCVQLEN, { 4096 })
[11:05:59]  and then the producer would block in socket write
[11:06:04]  ***  half_ine_ is now known as halfline.
[11:07:07]  heh. i wonder.
[11:07:18]  all this is abstracted in xlib on the client-side right?
[11:07:35]  xtrans and/or xcb, but yeah.
[11:07:50]  oh hey, i do have SO_RCVBUF
[11:09:11]  ooh, and SO_RCVLOWAT, although i doubt that matters much
[11:12:23]  yeah, lowat doesn't do anything useful.  but rcvbuf might.
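[editor's note: the "pipe dance" above means waking a select()/poll() loop from a shared-memory ring writer by writing a byte down a side-channel pipe; eventfd(2) collapses that into a single pollable counter. a minimal sketch of that substitution, not code from the conversation:]

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Writer side: after placing data in the shared ring, kick the reader
 * by adding 1 to the eventfd counter.  The fd becomes poll-readable. */
static void ring_kick(int efd)
{
    uint64_t one = 1;
    write(efd, &one, sizeof(one));
}

/* Reader side: sleep until kicked, then drain the counter so the fd
 * stops polling readable until the next kick.  Returns how many kicks
 * were coalesced while we slept. */
static uint64_t ring_wait(int efd)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    poll(&pfd, 1, -1);
    uint64_t kicks = 0;
    read(efd, &kicks, sizeof(kicks));  /* resets the counter to zero */
    return kicks;
}
```

[unlike the pipe trick, repeated kicks coalesce into one counter value instead of one queued byte each, so a slow reader costs one wakeup rather than a backlog of reads.]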
[11:13:13]  ajax: we really do need a way for a server to tell the kernel when it's doing work on behalf of a particular client
[11:13:31]  that would solve the whole dbus auditing problem steve grubb has too
[11:13:54]  maybe something to talk about at plumbers...
[11:14:49]  solaris had an X extension for this, apparently?
[11:14:50]  https://bugs.freedesktop.org/show_bug.cgi?id=2192
[11:15:30]  i forget most of the details, i think that one was mostly about bumping client priority to reduce frontend latency, not about throttling stupid shit
[11:16:22]  hmm too hard to grok that in passing
[11:17:19]  ***  pjones has left chat #fedora-kernel (Changing host).
[11:17:19]  ***  pjones (~pjones@fedora/pjones) has joined chat #fedora-kernel.
[11:18:08]  basically i think we need something like prctl(PR_SET_CLIENT, client_fd)
[11:18:57]  and from that point on the kernel knows the process is doing work on behalf of the process on the other side of the fd
[11:19:08]  i think that's too much overhead for X, but for audit sure.
[11:19:24]  oh i see
[11:19:27]  i really can't afford two more syscalls per request
[11:19:36]  well we could do it in a smarter way
[11:19:40]  well, okay, per soft-ctxsw, but sure.
[11:19:54]  fcntl
[11:20:48]  maybe.  if you make it an fcntl then obviously you enter a client context once you do read()
[11:20:56]  but when do you exit that client's context?
[11:21:06]  write?  next read/select?
[11:21:12]  Optimizing Unix Resource Scheduling for User Interaction
[11:21:12]  Steve Evans, Kevin Clarke, Dave Singleton, Bart Smaalders
[11:21:13]  SunSoft Inc.
[11:21:16]  that's a blast from the past. :)
[11:21:47]  ajax: hmm
[11:22:04]  yeah, there's been plenty of work on this in the past. remember when reading it that HZ used to be 100.
[11:22:15]  which is _way_ too slow for graphics
[11:22:52]  yeah, i can't think of many ways to do this that aren't a completely horrid hack.
[11:23:04]  i'll read the paper after lunch.
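[editor's note: PR_SET_CLIENT does not exist; it is being proposed here. as pseudocode only, with every name hypothetical, the server loop they are sketching would look something like this, which also shows why "when do you exit the client's context?" is the awkward part:]

```c
/* HYPOTHETICAL API -- PR_SET_CLIENT / PR_CLEAR_CLIENT are not real prctl ops */
for (;;) {
    poll(fds, nfds, -1);
    for (/* each ready client_fd */) {
        prctl(PR_SET_CLIENT, client_fd);   /* enter client context: cpu time */
        handle_requests(client_fd);        /* and audit records accounted to */
                                           /* the process on the other end   */
        prctl(PR_CLEAR_CLIENT, 0);         /* exit explicitly at end of      */
    }                                      /* dispatch -- the two extra      */
}                                          /* syscalls per client ajax balks at */
```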
[11:23:18]  i think rcvbuf does actually get me a lot of the way to where i want to be
[11:24:15]  i don't know what my typical readq is for a greedy client, but i can find that out, that's just numerology
[11:28:27]  so we could have a magic futex associated with each client
[11:28:40]  when you hold it, the kernel knows you're serving that client
[11:29:40]  hmm but the kernel doesn't get involved when it's uncontended i guess, reading the man page
[11:30:47]  i guess the point is, you can't add any new calls at all
[11:30:55]  it has to be implicitly figured out from what you already do
[11:30:57]  i wonder if we could do something awesome with fuse.
[11:31:02]  since what you already do is performance critical
[11:31:53]  unless it was something really fast, like writing to a special address in memory?
[11:33:07]  hrm.
[11:33:17]  we could do the prctl thing with a vdso. that would be relatively fast.
[11:33:51]  oh like the gettimeofday hack?
[11:33:58]  yeh.
[11:33:59]  ***  drago01 (~linux@chello062178124135.3.13.univie.teleweb.at) has joined chat #fedora-kernel.
[11:34:03]  and getpid.
[11:34:12]  (wait, did we ever put getpid in there?)
[11:34:28]  pretty sure we did
[11:34:36]  i thought that only worked for reading from the kernel, not writing to the kernel?
[11:34:49]  back in a bit.
[11:35:00]  halfline, we can't write to kernel space, but we can put something somewhere the kernel can easily get.
[11:35:01]  vdso is readonly right now, yeah
[11:35:28]  a writeable vdso segment isn't _that_ much logically different from a futex
[11:35:46]  it's just a bunch of predefined futexes..
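[editor's note: measuring the "typical readq" for a greedy client doesn't need numerology; the FIONREAD ioctl (SIOCINQ is the same value on Linux) reports how many bytes sit unread in a socket's receive queue. a small self-contained sketch, using a socketpair to stand in for an X client connection:]

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the number of bytes queued unread in fd's receive buffer,
 * i.e. how far ahead the peer has gotten since our last read(). */
static int readq_depth(int fd)
{
    int queued = 0;
    ioctl(fd, FIONREAD, &queued);
    return queued;
}
```

[sampled at the top of the dispatch loop, this gives the watermark data the rcvbuf sizing needs: e.g. after a client write()s 64 bytes we have not read, `readq_depth()` on our end reports 64.]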
[11:37:10]  okay i don't know much about them
[11:40:03]  i think we just need some writeable mapped memory, a single integer where we write to the kernel "i'm handling this client now" and afterwards "i'm done"
[11:40:43]  that doesn't help scheduling
[11:41:03]  accounting, sure, because that will read the ctx value out when it needs it
[11:41:29]  but once the kernel has the accounting information, it can perform scheduling tweaks for you
[11:41:36]  but the scheduler assumes ctx transitions happen at scheduling, you'd have to tell it more explicitly
[11:41:41]  the write itself won't trigger anything
[11:41:44]  unless it's a pagefault
[11:41:58]  and page faults are very expensive
[11:43:24]  anyway, something to gnaw on.
[11:44:13]  maybe the answer is what you said originally, force all apps to only deal with one fd per iteration of poll
[11:44:37]  and mark that one fd ahead as one to use for accounting
[11:44:49]  accounting ends on next poll
[11:45:05]  maybe.
[11:45:18]  but i mean, that's sort of secondary?
[11:45:30]  or maybe accounting ends on next poll or on next read of some other fd in the fdset
[11:45:35]  the problem that attempts to solve is the scheduler making bad decisions
[11:46:03]  and i don't think it's making bad decisions.  there appears to be more work to do so it's doing it.
[11:46:34]  well the issue is, the kernel can only make decisions based on the available information
[11:46:38]  if i want to influence that i should make it look like there's nothing to do.
[11:46:50]  and for a server, one important piece of information is which clients it's serving and when
[11:46:54]  but the kernel doesn't have that information
[11:46:58]  ***  adamw has left chat #fedora-kernel (Quit: Coyote finally caught me).
[11:47:16]  that's why i'm saying try shrinking the receive buffer
[11:47:28]  ajax: your recvbuf thing will probably be "good enough" for your specific issue
[11:47:33]  client write()s, it blocks because the buffer is full.
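[editor's note: the recvbuf idea reduces to a few lines: shrink the server-side receive buffer so a client that races too far ahead blocks in write() (or gets EAGAIN if non-blocking). a sketch, again with a socketpair standing in for the client connection; note the kernel doubles and clamps SO_RCVBUF values, so 4096 is approximate, and for AF_UNIX sockets Linux has historically enforced the limit from the sender's SO_SNDBUF rather than the receiver's SO_RCVBUF, so the effective cap may come from the send side:]

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Ask for a small receive queue on the server's end of the connection.
 * The kernel doubles the value for bookkeeping and clamps it to a
 * minimum, so the effective limit is approximate. */
static void shrink_rcvbuf(int fd, int bytes)
{
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}

/* Play the greedy client: write until the kernel buffers fill.
 * Returns how many bytes it managed to queue ahead of the server. */
static long fill_until_blocked(int fd)
{
    char chunk[512] = { 0 };
    long total = 0;
    fcntl(fd, F_SETFL, O_NONBLOCK);  /* so we observe EAGAIN, not a hang */
    for (;;) {
        ssize_t n = write(fd, chunk, sizeof(chunk));
        if (n < 0)  /* a blocking client would sleep here instead */
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? total : -1;
        total += n;
    }
}
```

[the point is the bound: a client that stops being read can only get a fixed number of bytes ahead before its own write() throttles it, which is the backpressure the chat is after.]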
[11:47:35]  and that's fine
[11:48:00]  i suspect fixing that will actually make it so accounting tricks aren't even needed though
[11:48:06]  was just hoping to kill some other birds with a new stone
[11:48:27]  well i'd like to fix the sgrubb auditing problem too
[11:49:03]  but i guess you probably don't want an audit entry for every draw operation in the x server so...
[11:49:10]  maybe i'm shoehorning where i shouldn't be
[11:49:11]  yeah.  i think those end up being different enough problems that you don't want to conflate them.
[11:49:19]  good thought and all, but.
[11:52:53]  part of the issue is, the accounting has to be very fast to be something X could make use of for improved scheduling, which probably means it has to be deduced implicitly
[11:53:12]  but the accounting has to be very trustworthy and accurate for it to be something audit could make use of
[11:53:22]  which probably means it can't be deduced implicitly
[11:53:34]  yeah.  make it work, make it good, then make it fast.
[11:54:44]  prctl seems entirely reasonable for audit's needs and if i ever think i need it for X i can probably make it work
[11:54:59]  like, right now we don't do any estimation of request cost
[11:55:21]  which is lame.  i've got a ton of information about that.
[11:55:36]  and i try to drain multiple reqs per read(), so.
[11:56:07]  i could amortize the prctl across multiple reqs and only fire it if i think the next timeslice is going to be expensive
[11:57:14]  yea maybe something like that would work