[12:45:05] sure
[12:46:51] it seems like it'd definitely be worth at least an explanatory comment in the glib sources too
[12:48:58] i don't see it, looking in gutf8.c in master
[12:51:53] me?
[12:52:03] behdad: if so sure
[12:52:48] err
[12:59:41] behdad: if it's easier an email is great too, i can forward to dbus list if you're not subscribed
intermittent problems I was having
[13:08:36] walters: I prefer 2 actually
[13:09:01] I don't remember the reason, but the FDD0..FDEF is also non-characters
[13:09:08] no valid text includes them
[13:09:31] the crash should be fixed in other ways. there's no legitimate reason for sending those
[13:10:28] you guys did see havoc's reply on that thread right?
[13:10:35] walters: for 2), you just need to replace both instances of 0xFFFF with 0xFFFE on the last line.
[13:10:43] nope
[13:10:44] * behdad checks
[13:10:51] was basically
[13:10:59] - code was ruthlessly stolen from glib
[13:11:07] - glib later had a "fix" for that macro
[13:11:12] - dbus never got the same fix
[13:11:15] with the implication that
[13:11:20] - dbus should get the same fix
[13:11:24] right, he supports my conclusion
[13:11:37] yea
[13:11:37] Thiago looks really crazy now
[13:11:40] wtf
[13:11:56] walters: the range is Non-Characters
[13:12:00] lemme find you the reference
[13:13:13] walters: search for noncharacter here: http://www.unicode.org/Public/UNIDATA/PropList.txt
[13:13:21] that's essentially the canonical place
[13:13:29] * behdad checks the book for the reason
[13:13:33] I'm guessing legacy reasons
[13:14:25] walters: yet another resolution is to remove the check completely.
[13:14:31] why is dbus validating at all?
[13:14:36] unless it uses the text, it shouldn't.
[13:14:41] halfline: hmm but...that change doesn't affect 0xFDD0, does it?
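For readers following the macro discussion above, here is a minimal sketch in C of what a noncharacter check along these lines could look like. It is not copied from the glib or dbus sources; the function names are hypothetical and the structure is only an assumption about the kind of check being discussed. It illustrates two points from the log: masking with 0xFFFE instead of 0xFFFF makes the comparison catch both xxFFFE and xxFFFF in every plane, and that mask change alone does not cover FDD0..FDEF, which needs its own range test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: surrogates U+D800..U+DFFF are reserved for UTF-16
 * and are never valid Unicode scalar values. */
static bool is_surrogate(uint32_t c)
{
    return (c & 0xFFFFF800u) == 0xD800u;
}

/* Hypothetical helper: true for the Unicode noncharacters.
 * - U+FDD0..U+FDEF needs an explicit range test.
 * - The last two code points of every plane (xxFFFE, xxFFFF) are caught by
 *   masking with 0xFFFE, which ignores the lowest bit. Masking with 0xFFFF
 *   and comparing against 0xFFFF would only catch xxFFFF. */
static bool is_noncharacter(uint32_t c)
{
    if (c >= 0xFDD0u && c <= 0xFDEFu)
        return true;
    return (c & 0xFFFEu) == 0xFFFEu;
}

/* Strict policy sketch: in range, not a surrogate, not a noncharacter. */
static bool is_valid_for_interchange(uint32_t c)
{
    return c < 0x110000u && !is_surrogate(c) && !is_noncharacter(c);
}
```

Under this sketch, `is_noncharacter(0xFFFE)`, `is_noncharacter(0x1FFFF)` and `is_noncharacter(0xFDD0)` are all true, while `is_noncharacter(0xFFFD)` (the replacement character) is false. Whether a transport should apply the strict policy at all is exactly what the rest of the conversation argues about.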
[13:15:00] the idea is if you get a utf-8 string from dbus you can put it in pango essentially
[13:15:21] there's a bunch of other validations they need to do anyway
[13:15:46] pango validates
[13:15:47] it may be a bad design decision in retrospect perhaps
[13:15:56] the rule should be simple: if you got it from outside, you validate
[13:15:58] (and use)
[13:16:13] does it check for non-minimal utf8 encoding?
[13:16:15] do you know?
[13:16:28] nope
[13:16:32] ok
[13:16:41] relevant function is dbus/dbus-string.c:_dbus_string_validate_utf8
[13:17:20] got the git address offhand?
[13:17:23] git+ssh
[13:17:43] * behdad sets up a git url scheme for fdo
[13:17:50] ssh://git.freedesktop.org/git/dbus/dbus
[13:18:00] thanks
[13:18:03] i'll be back in a few mins, late lunch
[13:18:08] k
[13:18:25] well, operating under the assumption i can find my wallet
[13:19:01] walters: if you can't, feel free to take one of the indian foods from my desk
[13:21:06] walters: looks sane otherwise
[13:21:31] the problem with not passing invalid utf8/text through is that in real world apps, you simply can't reject service
[13:22:05] walters: imagine, a wrong byte in a html page, how would you feel if your browser refused to show anything at all and gave an error message only?
[13:22:10] well, that's what gedit does, pisses me off
[13:22:35] walters: pango accepts invalid utf8, renders a crossed box for each bogus byte
[13:22:56] to be able to get something like that working, your entire stack should be able to pass through junk just fine
[13:23:05] it's not the transport's job to decide what to do with it
[13:23:11] so unstable might have that
[13:23:14] well there are two ways you can send data through dbus
[13:23:24] using the "utf-8 string" type
[13:23:25] yeah, it's a glib design bug
[13:23:27] that does validation etc
[13:23:33] or you can send an array of bytes
[13:23:34] I don't seem to be able to convince mclasen though
[13:23:35] that doesn't
[13:23:57] halfline: it *is* supposedly a string of text.
[13:24:03] if you don't know for sure you're dealing with a valid utf-8 string, probably safer to send array of bytes
[13:24:15] just happens that sometimes there's junk in there (the 0.00001% of times)
[13:24:38] if you mean 99.99999% of texts should be passed in as array-of-bytes, I'd say that's broken design
[13:25:04] halfline: it's not really about being sure even. sometimes you're sure it's NOT valid. but you still want to get it on the screen.
[13:25:17] going down that route, the UTF-8 type should be deprecated and removed!
[13:25:48] maybe there could be a validating getter, but I don't think that should be the default
[13:25:57] well your argument is certainly one that can be made
[13:25:58] but
[13:26:02] if you look at gtk for instance
[13:26:11] it does g_return_if_fail g_utf8_validate
[13:26:16] all over the place
[13:26:26] halfline: who said gtk is right?
;)
[13:26:32] I've been working from pango up
[13:26:44] right now, I've got as far as the pango_layout_set_text() api
[13:27:06] sure, like i said you can make the argument that the way it's been historically done is wrong
[13:27:06] gtk's unicode handling was designed 10 years ago
[13:27:12] we've learned new things in the mean time
[13:27:13] but the history provides justification for the design
[13:27:32] sure. I'm sure havoc made that decision very consciously
[13:27:39] I'm just explaining to walters why I think it's wrong :)
[13:30:16] one thing's for sure
[13:30:22] it's not consistent right now
[13:30:53] a lot of apps have the "call g_utf8_validate in a loop and replace with FFFE" function
[13:31:07] at least i think it's fffe
[13:31:18] whatever the replacement character is that looks like the question mark
[13:32:48] FFFD apparently
[13:32:53] �
[13:35:40] yeah
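The "validate in a loop and replace" helper mentioned at the end of the log can be sketched roughly as follows. This is an illustration only, not code from any of the apps or libraries discussed: to stay self-contained it uses a small hand-written checker (`utf8_seq_len`, a hypothetical name) in place of g_utf8_validate, and `utf8_make_valid` is likewise a made-up name. It substitutes U+FFFD (bytes EF BF BD) for each bogus byte, similar in spirit to the one-box-per-bogus-byte rendering described for pango. The checker rejects non-minimal (overlong) encodings, surrogates and values above U+10FFFF; as an assumption of this sketch it lets noncharacters through.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a real validator: returns the byte length of the
 * well-formed UTF-8 sequence at s (n bytes available), or 0 if malformed. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    size_t len, i;
    unsigned long cp, min;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                                   /* plain ASCII */
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; min = 0x10000; }
    else
        return 0;                                   /* stray continuation or invalid lead byte */
    if (n < len)
        return 0;                                   /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                               /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min)
        return 0;                                   /* non-minimal (overlong) encoding */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                                   /* out of range or surrogate */
    return len;
}

/* Copies in_len bytes from in to out, replacing each bogus byte with U+FFFD.
 * out must have room for 3 * in_len + 1 bytes. Returns the number of bytes
 * written, not counting the terminating NUL. */
static size_t utf8_make_valid(const char *in, size_t in_len, char *out)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, o = 0;

    while (i < in_len) {
        size_t len = utf8_seq_len(s + i, in_len - i);
        if (len > 0) {
            memcpy(out + o, s + i, len);            /* keep the valid sequence as-is */
            i += len;
            o += len;
        } else {
            memcpy(out + o, "\xEF\xBF\xBD", 3);     /* U+FFFD REPLACEMENT CHARACTER */
            i += 1;
            o += 3;
        }
    }
    out[o] = '\0';
    return o;
}
```

For example, passing the three bytes 'a', 0xFF, 'z' should produce the five bytes 'a', EF BF BD, 'z'. A version built on GLib would presumably let g_utf8_validate report where the valid prefix ends instead of using the hand-written checker.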