[16:18:11]  halfline: nu?
[16:20:34]  rmcgrath: hey
[16:20:45]  i don't know what nu means, but nu right back at ya
[16:21:05]  rmcgrath: let me page back into my brain why i pinged you
[16:21:49]  rmcgrath: so i'm looking at this code: http://pastebin.mandriva.com/18195
[16:22:07]  starting around line 966
[16:22:29]  and on line 998 i see a comment
[16:22:37]  about a small optimization
[16:23:02]  but i can't make heads or tails of how the optimization mentioned there would actually be an optimization
[16:23:12]  unless the if condition was inverted
[16:23:28]  am i missing something?
[16:24:24]  one detail is line 975 isn't supposed to be commented out
[16:25:15]  it's basically: fork(), create a new session, attach the tty to that session, run rc.sysinit in that session, and then afterward it gets to that optimization
[16:26:40]  you expect sysvinit code to make sense?
[16:27:25]  to me the optimization says "if the process group attached to the terminal is no longer the one associated with the session created before then bail else detach the tty from everything"
[16:27:35]  seems like the intent is to make the ctty cease to be associated with the current process.  if the tty pgrp is already not the current process, it seems to think that means the ctty is already disassociated.
[16:29:38]  why would the comment say "stealing tty back"?
[16:29:59]  if it doesn't care if something else has it?
[16:30:34]  notting: well i know sysvinit is yesterday's init system, but mandriva still uses it
[16:30:39]  it doesn't make all that much sense to me.  i think the extra fork-child there doing setsid before setting its ctty might eject that tty from everything in other sessions, but i'd have to check
[16:31:35]  well yea it would, but that code will never get run right?
[16:31:40]  because of the "optimization"?
[16:32:01]  but i'm not entirely clear on why it doesn't just use TIOCNOTTY.  i guess the idea is there might be some other processes left around with that ctty.
[16:32:52]  but if some other process has that ctty then tcgetpgrp(f) won't be equal to getpid()
[16:33:26]  no, only if that other process is in the foreground pgrp
[16:33:57]  i don't suppose it's possible to find whoever wrote that code and ask them what they thought they were doing
[16:34:39]  Miquel van Smoorenburg is a name i've only ever seen in the top of header files and source code before
[16:38:27]  rmcgrath: so the real problem i'm hitting is mandriva has this setup where they start gdm from rc.sysinit
[16:38:29]  and wait for X to start
[16:38:56]  and after rc.sysinit exits, that TIOCNOTTY ioctl kicks in
[16:39:03]  and makes X go bonkers
[16:39:33]  you mean TIOCSCTTY on line 1017?
[16:39:34]  but only if plymouth was running when the first TIOCNOTTY before rc.sysinit is run gets executed
[16:39:43]  yes
[16:40:01]  well there are two TIOCSCTTY calls, and you need both of them for this mandriva bug to happen
[16:40:18]  but there are no TIOCNOTTY calls
[16:40:19]  the first one is run before rc.sysinit, and the second after rc.sysinit
[16:40:34]  sorry copy and pasted the wrong jumble of letters
[16:40:45]  TIOCSCTTY
[16:41:46]  if plymouth is running when the first TIOCSCTTY ioctl happens then rc.sysinit runs and X starts, and the second TIOCSCTTY call causes X to go into a 100% cpu loop
[16:41:53]  getting EIO when it talks to the tty
[16:42:39]  well, it seems like the intent is to make sure nothing run by rc.sysinit is still connected to the console afterwards.  so that is in fundamental conflict with rc.sysinit starting up something that should continue to run after rc.sysinit finishes and will use the console
[16:42:43]  when that first TIOCSCTTY ioctl happens, plymouth notices the hangup on its tty fd
[16:42:44]  and reopens it
[16:43:44]  rmcgrath: well sure, on the other hand their start-X-from-rc.sysinit thing works without plymouth in the picture
[16:44:00]  and they want to do the minimal changes necessary to keep it working with plymouth in the picture
[16:44:24]  i'm not going to make a judgement call on the merits of starting gdm from rc.sysinit
[16:44:37]  but i would like to understand why plymouth makes it break
[16:44:59]  this whole thing is done for runlevels *#sS, doesn't that mean X should not be started?
[16:45:26]  one of those s's is rc.sysinit i think
[16:46:24]  one s is sysinit, the other s is single user mode, pretty sure
[16:48:33]  oh there's another weird thing, if they force X to start on vt8 instead of vt7 it works (they run plymouth on vt7, but plymouth is quit before X is started)
[16:50:06]  if console_dev is /dev/tty7 then that makes sense
[16:50:29]  nah console_dev is /dev/console i think
[16:50:35]  or maybe /dev/tty0
[16:51:14]  my guess is when plymouth isn't run, that second TIOCSCTTY ioctl never fires
[16:51:21]  because of the optimization
[16:51:41]  but when plymouth runs that optimization starts failing for some reason
[16:52:16]  and this code that probably hardly ever gets exercised is suddenly getting run, causing their X to nosedive
[16:53:27]  maybe when plymouth reopens its tty after the first TIOCSCTTY it ends up stealing the controlling terminal away
[16:54:01]  but it opens with O_NOCTTY so i don't think that should play a role
[16:54:09]  well, the "optimization" certainly seems like it could be timing-sensitive depending on what is going on
[16:55:16]  well it's all very reliably failing. he added a sleep 1m after X started before leaving rc.sysinit and it still triggers as soon as rc.sysinit exits
[16:55:23]  hmm
[16:56:02]  i did look at the X server and notice it creates its socket before opening its tty so there is a race there
[16:56:20]  but not a 1 minute race
[16:56:39]  this may require more thought than i'm capable of this early in the morning. ;-)  probably becomes somewhat clear given a stap trace of all the proc_clear_tty/proc_set_tty calls
[16:57:13]  hmm i wonder if they have systemtap
[16:57:29]  oh, you aren't debugging this yourself?
[16:57:38]  no, this is all mandriva
[16:57:54]  fcrozat trying to get plymouth working with "speedboot"
[17:02:18]  i'll get him to poke more on monday morning (well afternoon his time)
[17:17:14]  i bet tcgetpgrp() is returning -1
[17:18:58]  checking if that is true + a ps command showing all process groups and sessions in play should be enough to figure this out
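
The check proposed in the last two messages could be sketched in C roughly as follows. This is a minimal illustration and not code from sysvinit: `classify_tty_owner`, `report_tty_state` and the `tty_owner` enum are hypothetical names made up here. It relies only on the fact that `tcgetpgrp()` returns -1 on failure, a value that also compares unequal to `getpid()`, so a bare `tcgetpgrp(f) != getpid()` test cannot tell "another group is in the foreground" apart from "the call failed".

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for illustration only; none of this is sysvinit code. */
enum tty_owner {
    TTY_OWNER_SELF,   /* foreground pgrp of the tty is this process */
    TTY_OWNER_OTHER,  /* some other process group is in the foreground */
    TTY_OWNER_ERROR   /* tcgetpgrp() failed and returned -1 */
};

/* Split the "tcgetpgrp(fd) != getpid()" comparison into its three cases. */
enum tty_owner classify_tty_owner(int fd)
{
    pid_t fg = tcgetpgrp(fd);

    if (fg == (pid_t)-1)
        return TTY_OWNER_ERROR;
    return fg == getpid() ? TTY_OWNER_SELF : TTY_OWNER_OTHER;
}

/* Print the ids needed to compare against ps output, plus the errno
 * text when tcgetpgrp() fails (for example ENOTTY when fd is not the
 * caller's controlling terminal, or EBADF for an invalid descriptor). */
void report_tty_state(int fd, FILE *out)
{
    pid_t fg;
    int saved_errno;

    errno = 0;
    fg = tcgetpgrp(fd);
    saved_errno = errno;

    fprintf(out, "pid=%ld pgrp=%ld sid=%ld tcgetpgrp=%ld",
            (long)getpid(), (long)getpgrp(), (long)getsid(0), (long)fg);
    if (fg == (pid_t)-1)
        fprintf(out, " (%s)", strerror(saved_errno));
    fputc('\n', out);
}
```

Calling `report_tty_state(f, stderr)` right before the comparison, with `f` being the console descriptor, would show whether the -1 guess holds. The process groups and sessions it prints can then be lined up with something like `ps -eo pid,pgid,sid,tty,comm`.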