[12:45:05] sure
[12:46:51] it seems like it'd definitely be worth at least an explanatory comment in the glib sources too
[12:48:58] i don't see it, looking in gutf8.c in master
[12:51:53] me?
[12:52:03] behdad: if so sure
[12:52:48] err
[12:59:41] behdad: if it's easier an email is great too, i can forward to dbus list if you're not subscribed
[13:08:36] walters: I prefer 2 actually
[13:09:01] I don't remember the reason, but FDD0..FDEF are also non-characters
[13:09:08] no valid text includes them
[13:09:31] the crash should be fixed in other ways. there's no legitimate reason for sending those
[13:10:28] you guys did see havoc's reply on that thread right?
[13:10:35] walters: for 2), you just need to replace both instances of 0xFFFF with 0xFFFE on the last line.
[13:10:43] nope
[13:10:44] * behdad checks
[13:10:51] was basically
[13:10:59] - code was ruthlessly stolen from glib
[13:11:07] - glib later had a "fix" for that macro
[13:11:12] - dbus never got the same fix
[13:11:15] with the implication that
[13:11:20] - dbus should get the same fix
[13:11:24] right, he supports my conclusion
[13:11:37] yea
[13:11:37] Thiago looks really crazy now
[13:11:40] wtf
[13:11:56] walters: the range is Non-Characters
[13:12:00] lemme find you the reference
[13:13:13] walters: search for noncharacter here: http://www.unicode.org/Public/UNIDATA/PropList.txt
[13:13:21] that's essentially the canonical place
[13:13:29] * behdad checks the book for the reason
[13:13:33] I'm guessing legacy reasons
[13:14:25] walters: yet another resolution is to remove the check completely.
[13:14:31] why is dbus validating at all?
[13:14:36] unless it uses the text, it shouldn't.
[13:14:41] halfline: hmm but...that change doesn't affect 0xFDD0, does it?
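The "option 2" change is easiest to see side by side. The sketch below assumes the dbus copy of glib's `UNICODE_VALID` macro has this four-clause shape; the exact text in `dbus-string.c` may differ, and `UNICODE_VALID_OLD` / `UNICODE_VALID_NEW` are illustrative names only. Masking with `0xFFFE` clears bit 0, so both U+xxFFFE and U+xxFFFF in every plane are rejected, while the separate U+FDD0..U+FDEF clause is not touched by the change.

```c
/* Assumed shape of the macro before the change (illustrative, not copied
 * from dbus): the last clause only rejects U+xxFFFF, so U+xxFFFE passes. */
#define UNICODE_VALID_OLD(Char)                  \
    ((Char) < 0x110000 &&                        \
     (((Char) & 0xFFFFF800) != 0xD800) &&        \
     ((Char) < 0xFDD0 || (Char) > 0xFDEF) &&     \
     ((Char) & 0xFFFF) != 0xFFFF)

/* "Option 2": both 0xFFFF constants on the last line become 0xFFFE.
 * (Char) & 0xFFFE equals 0xFFFE for both U+xxFFFE and U+xxFFFF,
 * so both noncharacters at the end of every plane are rejected. */
#define UNICODE_VALID_NEW(Char)                  \
    ((Char) < 0x110000 &&                        \
     (((Char) & 0xFFFFF800) != 0xD800) &&        \
     ((Char) < 0xFDD0 || (Char) > 0xFDEF) &&     \
     ((Char) & 0xFFFE) != 0xFFFE)
```

With these definitions, U+FFFE and U+1FFFE pass the old check but fail the new one, U+FFFF fails both, and U+FFFD (the replacement character) passes both.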
[13:15:00] the idea is if you get a utf-8 string from dbus you can put it in pango essentially
[13:15:21] there's a bunch of other validations they need to do anyway
[13:15:46] pango validates
[13:15:47] it may be a bad design decision in retrospect perhaps
[13:15:56] the rule should be simple: if you got it from outside, you validate
[13:15:58] (and use)
[13:16:13] does it check for non-minimal utf8 encoding?
[13:16:15] do you know?
[13:16:28] nope
[13:16:32] ok
[13:16:41] relevant function is dbus/dbus-string.c:_dbus_string_validate_utf8
[13:17:20] got the git address offhand?
[13:17:23] git+ssh
[13:17:43] * behdad sets up a git url scheme for fdo
[13:17:50] ssh://git.freedesktop.org/git/dbus/dbus
[13:18:00] thanks
[13:18:03] i'll be back in a few mins, late lunch
[13:18:08] k
[13:18:25] well, operating under the assumption i can find my wallet
[13:19:01] walters: if you can't, feel free to take one of the indian foods from my desk
[13:21:06] walters: looks sane otherwise
[13:21:31] the problem with not passing invalid utf8/text through is that in real world apps, you simply can't refuse service
[13:22:05] walters: imagine a wrong byte in an html page: how would you feel if your browser refused to show anything at all and only gave an error message?
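On the question of non-minimal encodings, here is a hypothetical standalone validator; it is not `_dbus_string_validate_utf8` itself, and `utf8_valid_prefix` is an illustrative name. Like `g_utf8_validate()` with its `end` argument, it reports how far the input is valid. It rejects overlong forms (e.g. `C0 80` for NUL), UTF-16 surrogates and code points above U+10FFFF, and deliberately leaves the noncharacter policy discussed earlier to a separate check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the dbus or glib API): returns the length of the
 * longest valid UTF-8 prefix of buf[0..len). A return value equal to len
 * means the whole buffer is valid. */
size_t utf8_valid_prefix(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t n;      /* total bytes in this sequence */
        uint32_t cp;   /* decoded code point */
        uint32_t min;  /* smallest code point legal for length n */

        if (b < 0x80) { i++; continue; }                    /* ASCII */
        else if ((b & 0xE0) == 0xC0) { n = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { n = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { n = 4; cp = b & 0x07; min = 0x10000; }
        else return i;          /* stray continuation byte or 0xF8..0xFF */

        if (len - i < n) return i;                          /* truncated */
        for (size_t k = 1; k < n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return i;      /* not 10xxxxxx */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < min) return i;                     /* non-minimal (overlong) */
        if (cp > 0x10FFFF) return i;                /* beyond Unicode */
        if (cp >= 0xD800 && cp <= 0xDFFF) return i; /* UTF-16 surrogate */
        i += n;
    }
    return i;
}
```

Any return value smaller than `len` is the offset of the first byte that starts an invalid or truncated sequence. Note that `EF BF BE` (U+FFFE) is accepted here, since it is well-formed UTF-8; whether to reject noncharacters is the separate policy question from the macro above.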
[13:22:10] well, that's what gedit does, pisses me off
[13:22:35] walters: pango accepts invalid utf8, renders a crossed box for each bogus byte
[13:22:56] to be able to get something like that working, your entire stack should be able to pass through junk just fine
[13:23:05] it's not the transport's job to decide what to do with it
[13:23:11] so unstable might have that
[13:23:14] well there are two ways you can send data through dbus
[13:23:24] using the "utf-8 string" type
[13:23:25] yeah, it's a glib design bug
[13:23:27] that does validation etc
[13:23:33] or you can send an array of bytes
[13:23:34] I don't seem to be able to convince mclasen though
[13:23:35] that doesn't
[13:23:57] halfline: it *is* supposedly a string of text.
[13:24:03] if you don't know for sure you're dealing with a valid utf-8 string, probably safer to send an array of bytes
[13:24:15] just happens that sometimes there's junk in there (the 0.00001% of times)
[13:24:38] if you mean 99.99999% of texts should be passed in as array-of-bytes, I'd say that's broken design
[13:25:04] halfline: it's not really about being sure even. sometimes you're sure it's NOT valid. but you still want to get it on the screen.
[13:25:17] going down that route, the UTF-8 type should be deprecated and removed!
[13:25:48] maybe there could be a validating getter, but I don't think that should be the default
[13:25:57] well your argument is certainly one that can be made
[13:25:58] but
[13:26:02] if you look at gtk for instance
[13:26:11] it does g_return_if_fail g_utf8_validate
[13:26:16] all over the place
[13:26:26] halfline: who said gtk is right? ;)
[13:26:32] I've been working from pango up
[13:26:44] right now, I've got as far as the pango_layout_set_text() api
[13:27:06] sure, like i said you can make the argument that the way it's been historically done is wrong
[13:27:06] gtk's unicode handling was designed 10 years ago
[13:27:12] we've learned new things in the meantime
[13:27:13] but the history provides justification for the design
[13:27:32] sure. I'm sure havoc made that decision very consciously
[13:27:39] I'm just explaining to walters why I think it's wrong :)
[13:30:16] one thing's for sure
[13:30:22] it's not consistent right now
[13:30:53] a lot of apps have the "call g_utf8_validate in a loop and replace with FFFE" function
[13:31:07] at least i think it's fffe
[13:31:18] whatever the replacement character is that looks like the question mark
[13:32:48] FFFD apparently
[13:32:53] �
[13:35:40] yeah
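The "validate in a loop and replace" helper that many apps carry can be sketched as follows. This is a hypothetical standalone function (`utf8_make_valid_sketch` is an illustrative name), not a glib API; newer GLib releases ship `g_utf8_make_valid()` for a similar job. It substitutes U+FFFD, encoded as `EF BF BD`, for each byte that does not start a valid sequence, which mirrors pango drawing one box per bogus byte.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Length of the valid UTF-8 sequence at s (at most `avail` bytes),
 * or 0 if the bytes there are not valid UTF-8. Rejects overlongs,
 * surrogates and code points above U+10FFFF. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    size_t n;
    uint32_t cp, min;
    if (s[0] < 0x80) return 1;
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; min = 0x10000; }
    else return 0;
    if (avail < n) return 0;
    for (size_t k = 1; k < n; k++) {
        if ((s[k] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (s[k] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return n;
}

/* Hypothetical helper: returns a malloc'd, NUL-terminated copy of
 * buf[0..len) where every byte that does not start a valid UTF-8
 * sequence is replaced by U+FFFD. Caller frees; NULL on allocation failure. */
char *utf8_make_valid_sketch(const char *buf, size_t len)
{
    const unsigned char *in = (const unsigned char *)buf;
    /* Worst case: every input byte becomes a 3-byte U+FFFD. */
    char *out = malloc(len * 3 + 1);
    size_t o = 0;
    if (out == NULL) return NULL;
    for (size_t i = 0; i < len; ) {
        size_t n = utf8_seq_len(in + i, len - i);
        if (n > 0) {                 /* valid sequence: copy it through */
            memcpy(out + o, in + i, n);
            o += n;
            i += n;
        } else {                     /* bogus byte: emit U+FFFD, skip one byte */
            memcpy(out + o, "\xEF\xBF\xBD", 3);
            o += 3;
            i += 1;
        }
    }
    out[o] = '\0';
    return out;
}
```

Replacing one byte at a time keeps the surrounding valid text intact, so a single stray byte costs one replacement character rather than the whole string. An embedded NUL byte is valid UTF-8 and is copied through, so callers treating the result as a C string will see it end there.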