[10:59:56] (CR) QChris: "I had a quick look, but I have no clue about vagrant, wikimetrics, or puppet. So I'll remove myself as reviewer" (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/105893 (owner: Stefan.petrea)
[16:20:04] (CR) Ottomata: [ready for review] Added jumpstart (2 comments) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/105893 (owner: Stefan.petrea)
[18:08:14] * brainwane looks around, does not see milimetric
[18:22:02] average: do you think I lost my chance to catch milimetric?
[18:25:56] brainwane: try e-mail
[18:26:23] got it
[19:01:24] nuria: hola
[19:02:16] DarTar: got a minute to help out me and nuria?
[19:02:28] hey
[19:02:33] sure, what's up
[19:03:08] can you send an e-mail out to analytics, mobile-tech & eventlogging-alerts announcing the imminent deployment of the UA change?
[19:03:15] yay!
[19:03:21] it won't show up in the data yet, this just adds it to the varnishes
[19:03:23] on that note, DarTar, check yer email re the UA thread
[19:03:34] we should cabal it up to standardise on UA needs.
[19:03:40] some data loss possible, we'll follow up with details post-deployment
[19:03:41] just saw the code review comments, that's awesome
[19:04:06] so, should we expect some discrepancy between the raw logs and the SQL tables?
[19:04:17] it won't even show up in the raw logs at first
[19:04:27] we're doing this at the producer side first and artificially scrubbing it out of the stream for now
[19:04:27] ok
[19:04:39] otherwise it's too many moving pieces
[19:04:45] yup
[19:05:28] Ironholds: is that the same thread with qchris et al?
[19:05:50] if qchris is Christian, sure.
[19:05:55] yes he is
[19:05:59] otherwise, who knows. I'm not a proper analytics bod so I don't know everyone ;p
[19:06:02] then yeah
[19:06:08] the one Nuria kicked off with the awesome research
[19:06:15] ori: to summarize:
[19:06:54] - heads up that UA support is being added globally + link to bz ticket and gerrit changeset
[19:07:17] - data not available yet
[19:07:31] - potential data loss for a window of ... hours?
[19:07:54] let's say 1 hr.
[19:08:01] the plan is for 0 data loss, but you know.
[19:08:03] computers.
[19:08:04] ori: what happened to your -l btw?
[19:08:23] it was itchy so i picked at it and it fell off
[19:08:32] ny resolution?
[19:08:35] ori, data loss?
[19:08:39] from the varnishncsa reload?
[19:08:55] ottomata: yeah, the format will change so the receiver could freak out and throw errors
[19:08:57] k, sending out a line or 2
[19:09:00] it shouldn't, but you know.
[19:09:02] computers.
[19:10:34] ottomata: just waiting for nuria to sign on, she had to switch spots
[19:11:01] ottomata: do you think we should combine this with removing gandolinium as relay?
[19:11:03] k
[19:11:08] hmmm
[19:11:19] let me ssh into one of the esams varnishes and confirm that they can send data to vanadium
[19:11:24] yeahhh
[19:11:32] was about to do that, but you got it
[19:13:02] yeah, works
[19:13:16] i'll submit a patch, sec
[19:13:18] k
[19:16:10] qchris, think you can help me with some java hadoop camus version problems?
[19:16:38] ottomata: let's find out.
[19:16:51] Is there code I can pull?
[19:17:07] you can use linkedin/camus master branch
[19:17:09] its all merged nicely
[19:17:12] ok.
[19:17:20] but, i'm trying to compile with and use this on hadoop 2 with yarn, cdh4
[19:17:25] i think i've got the poms all set properly
[19:17:34] that change is not in master, so you'll have to change that with me
[19:17:41] but, its weird
[19:17:42] it worked in labs
[19:17:46] but now its giving me errors in production
[19:17:51] Ok.
[19:17:56] better if we hangout i think
[19:18:02] * qchris boots google machine.
[19:18:17] am here https://plus.google.com/hangouts/_/76cpj4u4jnklnknpgtoc9ipetc
[19:20:17] Sesame open!
[19:20:24] Meh ... it does not let me in
[19:21:01] ori, ottomata: mail sent, let me know if anything's missing
[19:21:40] ottomata: I sent to the private list before I saw your message, should I cross-post to the public analytics list?
[19:21:55] DarTar: i haven't gotten it
[19:22:01] huh
[19:22:03] nuria: hey
[19:22:09] hola
[19:22:18] ok, so quick recap
[19:23:01] Ironholds: did you get my mail with subject "Sanitized UA coming to EventLogging" ?
[19:23:07] via analytics-internal
[19:23:12] there are bits varnishes in eqiad (ashburn, virginia), ulsfo (sf, ca; not active currently) and esams (amsterdam)
[19:23:13] I didn't get any bounce
[19:23:41] the esams data center used to not have a way to connect to the private intranet in eqiad
[19:23:55] however, vanadium, the eventlogging data receiver (which is in eqiad), only has a private interface
[19:24:18] as a result, eqiad bits varnishes were configured to route event.gif logs directly to vanadium
[19:24:40] but esams varnishes were configured to send the data to another machine, gandolinium, which had a public interface and could act as a relay
[19:25:10] however, we now have a private link between esams and eqiad, so gandolinium is not required
[19:25:32] i figured we might as well do that change too (-- remove gandolinium, that is) and ottomata agreed
[19:26:11] DarTar, I did
[19:26:17] ok
[19:26:28] i verified it by connecting to vanadium, running 'nc -ul 30123' (listen on udp port 30123), and then connecting to one of the esams varnishes (cp3020) and running 'cat /etc/motd | nc -u vanadium.eqiad.wmnet 30123'
[19:27:03] it worked, so i submitted this patch: https://gerrit.wikimedia.org/r/#/c/105987/
[19:28:02] otto merged it, but we don't automatically deploy puppet changes (there has to be an explicit merge on the puppetmaster) so otto is waiting for us before he proceeds
[19:28:15] i wouldn't have merged it if I hadn't already merged the other one :p
[19:28:34] * ori nods
[19:29:23] nuria: meanwhile DarTar agreed to send out an e-mail notifying folks of the upcoming deployment and specifying that some data loss is possible
[19:29:35] very well
[19:29:41] it should be in your inbox
[19:30:03] sent to the 3 lists you requested and asked people to chime in in this channel if they have questions/concerns
[19:30:32] ok, i must not be subscribed to those. which are the lists?
[19:31:04] i haven't gotten it either
[19:32:14] DarTar: i'm sure it's fine so +2 from me
[19:32:22] nuria: do you have an ssh session open to vanadium?
[19:32:47] yes
[19:32:57] weird, I sent to analytics-internal, eventlogging-alerts and mobile-tech
[19:33:31] very slowww session but open though
[19:33:41] nuria: okay, so remember: the fully-sanitized, multiplexed stream is streamed on port 8600
[19:34:02] i have a super simple command-line tool that subscribes to a zeromq publisher and prints to stdout
[19:34:26] so 'zsub localhost:8600' (don't run that yet) gets you the full event stream
[19:35:12] 'zsub localhost:8600 | grep cp30' will stream events coming in from the esams varnishes
[19:35:12] DarTar, I got your email
[19:35:22] ok thx
[19:35:31] ya me too, great
[19:35:48] that's because the recvFrom field specifies the varnish host that the event came from
[19:35:56] ok
[19:35:57] and the esams varnishes are cp30*
[19:36:27] so i propose we start out by being irresponsibly optimistic and just run 'zsub localhost:8600 | grep cp30'
[19:36:32] the zeromq publisher is udp ?
[19:36:38] nuria: no, tcp
[19:37:10] if we continue to see data, it means both changes worked
[19:37:22] we're getting events, so adding the field did not cause everything to be invalidated
[19:37:33] and we're getting esams events, so sending the data directly to vanadium works
[19:37:57] if it doesn't work we can improvise and blame DarTar
[19:38:07] by all means
[19:38:41] i see the data events but
[19:39:10] shouldn't we see also for php events the user agent in the capsule (assuming those other changes got deployed too)
[19:40:34] no, because we pop it out of the event
[19:40:45] in eventlogging-processor
[19:41:33] ok so since we're being irresponsibly optimistic and testing for both changes working, we should probably make the change on an esams varnish first
[19:41:46] ah ok, this is the output of the processor we are looking at just before it gets inserted on the db
[19:42:11] before it gets published to all consumers, not just the db writer
[19:42:40] so just a last note, if we *don't* see cp30* events using zsub localhost:8600 | grep cp30
[19:42:51] it could be because the events aren't validating, or because esams can't reach vanadium
[19:43:04] we can determine that by running zsub localhost:8422 | grep cp30
[19:43:20] 8422 is the publisher that streams the raw data coming in from the varnishes
[19:43:34] but there are plenty
[19:43:40] ottomata hasn't deployed yet
[19:43:43] "recvFrom": "cp3019.esams.wikim
[19:44:06] they're still coming in via gandolinium because the change isn't deployed yet
[19:44:18] well, when I merge on puppetmaster
[19:44:21] it won't immediately deploy either
[19:44:24] puppet needs to run
[19:44:27] * ori nods
[19:44:33] so i can merge and we can run puppet on cp3019 manually
[19:44:47] and then race the puppet clock to get a fix in if it breaks things :p
[19:45:04] sounds good to me. nuria, ready?
[19:45:05] but really, this doesn't have the risk of breaking anything but the eventlogging varnishncsa instance
[19:45:09] so no biggie
[19:45:24] very well
[19:45:30] ok gimme the caps, ori!
[19:45:54] LET'S DO IT!!!111oneoneone
[19:45:57] hehe
[19:46:13] * zz_yuvipanda gives ori some spare shift keys
[19:46:31] running puppet on cp3019
[19:46:32] zz_yuvipanda: hey! we need to catch up!
[19:46:44] ori: yes! :)
[19:46:48] nuria: so have zsub localhost:8600 | grep cp3019 running
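
The verification ori walks nuria through above boils down to roughly the following shell session; zsub here is the small command-line zeromq subscriber ori mentions, and the hosts, ports and grep patterns are the ones quoted in the log:

    # on vanadium: listen on an arbitrary UDP port to confirm esams can reach it
    nc -ul 30123

    # on an esams bits varnish (e.g. cp3020): send a few bytes at vanadium's private interface
    cat /etc/motd | nc -u vanadium.eqiad.wmnet 30123

    # on vanadium: watch the sanitized, validated stream (port 8600) for esams events
    zsub localhost:8600 | grep cp30

    # if nothing shows up there, check the raw varnish feed (port 8422) to tell a
    # validation failure apart from a connectivity problem
    zsub localhost:8422 | grep cp30
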
[19:47:15] ori: I've to sleep now, though. Hard fought sleep cycle fix :) I'll poke you tomorrow when you are not in the middle of an overoptimistic deploy that will obviously go well :)
[19:47:16] night
[19:47:22] ok, there it goes
[19:47:55] 91.198.174.89:56804 10.64.21.123:8422 ESTABLISHED 29568/varnishncsa
[19:47:58] ottomata: did it restart already?
[19:48:00] yes
[19:48:05] puppet reloads it on config change
[19:48:28] nuria: seeing events from cp3019?
[19:49:55] there are plenty
[19:49:59] * ori nods
[19:50:15] anything on 8600 passed validation, too, so that worked as well
[19:50:33] we don't see the UA because we're not parsing for it, but try running zsub localhost:8422 | grep cp3019
[19:50:47] (8422 == raw data from varnishes)
[19:51:10] see the UA?
[19:53:10] nuria: ^
[19:53:39] I see the UA!
[19:53:42] \o/
[19:53:46] right but ...
[19:53:50] if it is raw
[19:54:04] why does it get published on zeromq format?
[19:54:16] ah wait
[19:54:39] DarTar: ?
[19:54:43] the userAgent data I see in the raw logs is still from the ad-hoc instrumentation, right?
[19:54:55] the user-agent data in the logs is the raw user-agent coming in on the request
[19:55:06] we will process it before it is broadcast to subscribers
[19:55:12] like https://meta.wikimedia.org/wiki/Schema:PageContentSaveComplete
[19:55:16] this actually relates to nuria's question, so let me back up for a second
[19:55:23] yes, sorry
[19:55:48] so zeromq publishers are a good pattern for solving a particular problem
[19:56:08] you have data streaming in to your host on some port, and something that is consuming that stream and doing something with it
[19:56:18] ah no it is not
[19:56:36] sorry I was confusing matters
[19:56:44] right is a pub/sub
[19:56:44] you want to test something against the stream, or develop concurrently while keeping production up
[19:56:47] w/o the json
[19:56:50] * ori nods
[19:57:11] yeah, so the first thing we do is just take the udp feed and rebroadcast it using zmq without applying *any* transformations
[19:57:24] i assumed the queue had formatted data but no, it is anything really
[19:58:01] so varnish --- udp 8422 ----> vanadium(eventlogging-forwarder) ----> zmq 8422 ----> vanadium(eventlogging-processor) ----> zmq 8600 ----> rest of the world
[19:58:22] yeah, 8422 is not the public stream
[19:59:54] ottomata: i can do a puppet run on the eqiad varnishes, could you do the remaining esams ones?
[20:00:16] ok so we are set then
[20:00:18] enough time has elapsed that a manual run may not be required, but just to make sure
[20:00:32] one more thing
[20:00:49] there is a similar graph for php events right?
[20:01:03] php events come in on port 8421
[20:01:11] so zsub localhost:8421 to see just the raw php-generated events
[20:01:17] also as zmq?
[20:01:31] ori, sure, which ones?
[20:01:52] nuria: they come in as udp, and are republished as zmq
[20:01:57] ottomata: cp30*
[20:02:01] ok ua is not there yet
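
The forwarder pattern ori describes above, taking the UDP feed and rebroadcasting it over zeromq without applying any transformation, can be sketched in a few lines of Python. This is only an illustration of the pattern, not the actual eventlogging-forwarder code; the port number follows the varnish -> forwarder -> processor diagram quoted above:

    import socket
    import zmq

    UDP_PORT = 8422  # raw varnishncsa feed arrives here (per the diagram above)

    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.bind(('0.0.0.0', UDP_PORT))

    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.bind('tcp://*:%d' % UDP_PORT)  # republished on the same port number, over TCP

    while True:
        # one log line per datagram; rebroadcast it untouched to any subscriber
        data, addr = udp.recvfrom(65535)
        pub.send(data)
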
[20:02:13] all varnishes? thought it was just bits, right?
[20:02:14] nuria: the comments in http://git.wikimedia.org/blob/operations%2Fpuppet.git/e8e2c27f1c88121d0131a3f882264d8a509f4e64/manifests%2Frole%2Feventlogging.pp are useful, i think
[20:02:21] ottomata: yes, that's what i meant; sorry
[20:02:32] i'm a desktop-chauvinist
[20:02:44] 19-22
[20:02:44] k
[20:03:06] all right, that was my next question whether all the ports were documented in puppet
[20:03:10] nuria: it hasn't rolled out to prod yet
[20:03:30] nuria: it's documented at https://wikitech.wikimedia.org/wiki/EventLogging#Data_flow as well
[20:03:39] running puppet on 3020-3022 now
[20:05:09] ottomata: cool, i'm doing cp1056-57 & 69-70
[20:05:25] ha, puppet had already beat me to it on those :p
[20:07:02] and ori, how do we know when our change will be deployed (I know you deploy your own stuff but what about the automatic deployment?)
[20:08:09] it depends on when the branch was cut and when our change got merged
[20:08:31] to be honest i am always confused by it, but i chalk it up to me being disorganized
[20:08:43] okeis
[20:09:07] if you have the production branch cloned
[20:09:15] you can do git submodule update --init extensions/EventLogging
[20:09:21] and then cd to extensions/EventLogging and run git log -1
[20:09:25] to see what patch it's on
[20:13:16] ok, will check that tomorrow. leaving now. BIG THANKS!!
[20:13:39] my pleasure! thanks for absorbing all that info. and thanks for the help ottomata!!
[20:13:55] yup!
[20:13:59] glad it works!
[20:14:02] thanks for absorbing like 10% of the info you mean
[20:14:52] nuria: it'll be fine, don't stress about it :)
[20:15:59] DarTar: so we would have had a few seconds of data loss at most, while the log streaming program on the varnishes was restarting
[20:16:22] ok
[20:16:35] as for where the UA is collected and processed, we figured:
[20:17:05] the user-agent header is sent with every request whether we like it or not, so if we pack it into the event query string on the client-side, it still gets sent with the request
[20:17:27] right
[20:17:29] so there is something slightly silly and not credible in saying that we anonymize UAs on the client
[20:17:57] and we'd also have to have a matching implementation for the server, because client-side code doesn't cover server-side events
[20:18:15] absolutely, that was the point I made early in the internal thread
[20:18:58] so basically what nuria proposed is that we have the raw UAs stream to the eventlogging-processor, but that we apply the canonicalization / anonymization as part of the parsing that precedes validation / broadcast to subscribers
[20:20:03] yes, meaning that we're making it hard for us to join the data with the raw request logs
[20:20:07] so just like we take the raw query string and transform it into a nice json object, we'd be taking the UA and transforming it into something anonymized prior to sending it out to all receivers
[20:20:21] we should detail this on the MW page
[20:20:24] right, all the relevant eventlogging data consumers would be downstream relative to that
[20:23:16] i think it's the best approach because a) we can't convince the client not to send the raw UA, whether we like it or not; b) we can have the UA anonymizer keep a running count of UAs it encounters and use that to simply exclude unique / very rare UAs (transforming them into "Unknown / other" or whatever)
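
The rare-UA idea ori floats just above, keeping a running count of user agents in the processor and folding unique or very rare ones into a generic bucket, could look roughly like this. It is a sketch only, not the code that was deployed, and the threshold value is made up for illustration:

    from collections import Counter

    class UserAgentAnonymizer(object):
        """Fold user agents seen fewer than `threshold` times into a generic bucket."""

        def __init__(self, threshold=10):  # 10 is an arbitrary example cut-off
            self.threshold = threshold
            self.counts = Counter()

        def anonymize(self, user_agent):
            self.counts[user_agent] += 1
            if self.counts[user_agent] < self.threshold:
                # too rare to be safely distinguishable; report a generic value
                return 'Unknown / other'
            return user_agent
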
[20:23:43] ori: do you mind if I copy and paste the above (or an edited version) and share it with luis to give them an overview of the approach, before we document it in detail?
[20:23:47] you can't do that on the client because the client can only apply some algorithmic transformation to the data, it can't then check whether the result of that process is sufficiently generic
[20:24:16] I can clarify the anonymizing page but this info is there in ori's 1st commit
[20:24:22] and the wiki
[20:24:43] "Historically, the approach of EventLogging was to leave user-agent logging and processing to developers and analysts.
[20:24:43] The problem with this approach is that the anonymization of user-agent fields is not uniform. When it is done at all, it is often done inconsistently. This often ends up being a barrier for releasing performance-related datasets that would greatly benefit from additional scrutiny. The solution is to centralize the processing of user-agents at the time of data collection. This way we can
[20:24:43] be confident that all UAs logged via EventLogging are adequately anonymized."
[20:25:12] right, I think what's missing is the disclosure about raw request logs, which we can't prevent
[20:25:46] Ok, please edit the page so we have 1 place with the info.
[20:25:55] DarTar: coordinate with nuria plz
[20:25:55] let me do this
[20:26:00] sure
[20:26:12] i should really pull back and restrict myself to deployment help rather than design / product
[20:27:04] All right now logging off for real, do send me e-mail. I will answer our internal thread first and later take a look at comments on the BZ
[20:27:23] at 1st sight i think most devs do not know that ua is used for two different things
[20:27:33] 1) identify device 2) identify browser
[20:27:39] in mobile you need both
[20:27:44] in desktop only the 2nd one
[20:28:20] (PS1) Ottomata: Updating to cdh4.3.1, upping snapshot version to wmf-2 [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/106006
[20:28:23] But both of you are right that we need more details
[20:28:42] sorry " a more detailed" page
[20:28:50] nuria: I'll hack the page and send it to both of you for review
[20:29:17] if that's ok with you :)
[20:29:35] After dario edits it to mention what he considers is important i shall add more info/guidelines from our internal e-mail thread
[20:29:40] please edittt awaaayyyy
[20:29:47] no need to send review
[20:30:12] good
[20:30:30] THANKS for your patience again.
[20:36:25] (CR) Ottomata: [C: 2 V: 2] Updating to cdh4.3.1, upping snapshot version to wmf-2 [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/106006 (owner: Ottomata)
[21:14:11] hi terrrydactyl
[21:14:18] hi brainwane
[21:16:59] wb milimetric
[21:17:11] irc was having dns issues
[21:17:16] at least for us East Coasters
[21:17:33] just had to use google's DNS (8.8.8.8)
[21:17:37] Heh
[21:17:38] cool, so how can I help
[21:17:50] milimetric: Read my email?
[21:17:56] oh, reading
[21:18:07] * terrrydactyl waves
[21:18:16] Salut terrrydactyl!
[21:18:19] hi marktraceur
[21:18:20] How are things?
[21:18:21] :D
[21:18:28] working with milimetric with some wikimetrics stuff
[21:18:29] :)
[21:18:31] Oh cool
[21:18:43] yeah, how are you doing?
[21:18:49] oh hi terrrydactyl :)
[21:18:54] (for context, I'm going to bother milimetric about the high cost to start with analysing EventLogging data)
[21:18:54] you weren't kidding about being subscribed to a bunch of irc channels
[21:19:14] right, so you are not the first one to make that observation marktraceur
[21:19:16] terrrydactyl: I'm doing well, working on loads of new projects and generally being very frantic about it
[21:19:34] marktraceur: keeping busy isn't always a bad thing.
don't get too frantic!
[21:19:39] True
[21:19:55] milimetric: You're sure you're not thinking of me saying it (counting) 3 times now? :)
[21:20:03] :)
[21:20:04] no
[21:20:06] 'kay
[21:20:11] aaron, dario, etc.
[21:20:15] oliver
[21:20:18] Basically I'm wondering if this has changed, or is on the way to changing
[21:20:20] everyone's having to re-invent the wheel
[21:20:28] Ah, but who listens to what Ironholds says anyway.
[21:20:37] :)
[21:20:39] milimetric: Is there a way I can help instead of being part of the problem, at least?
[21:20:40] everyone?
[21:20:56] immediately, I don't think so
[21:21:12] Really? That sorta sucks.
[21:21:13] we're wrapping up some small work on Event Logging
[21:21:16] terrrydactyl: what are you digging into, in analytics world, right now?
[21:21:38] and another small MW oauth thing
[21:21:46] but in the very short term, yes
[21:22:00] basically the idea is that we have to make the following pipeline smoother
[21:22:09] brainwane: just some bug fixes. little annoyances that need to be fixed. trying to do something with grant making and jaime but was waiting for milimetric to come back. seems like there's some uploading problems for them.
[21:22:13] gather -> analyze -> present
[21:22:23] and you're in the gather stage
[21:22:40] Well, midway through analyze
[21:22:43] so what you could do to help I guess is give me your perspective on exactly what you're doing so we can keep it in mind
[21:23:17] milimetric: Well, I have two EL schemata that are being archived on prod. Those are nice, and they have cool data, but it's not in a useful format.
[21:23:19] analyze is more like, you've got all the data in all the ways that you want and you're performing high level statistics on it
[21:23:46] what would be a useful format?
[21:24:02] milimetric: So I have to take each event, one by one, and treat it in whatever way to add it to the aggregate, and separate aggregates based on time frames (days right now), then create CSVs out of my aggregate data.
[21:24:09] CSV, basically
[21:24:34] Then on top of that I need to define the dashboard, which honestly seems like it will be the least aggravating part, but I could be terribly horribly wrong
[21:24:58] *but*, I defined all manner of restrictions on the data that was getting submitted to EL in the first place
[21:25:02] so let's focus for now on the "one by one, and treat it in whatever way"
[21:25:23] so you mean you have like some case statement in your sql that branches out based on the event or something else?
[21:25:31] Well
[21:25:57] also, do you have some sql that you're working on?
[21:26:08] you could throw it into an etherpad or somewhere we can look
[21:26:16] I have an SQL query that gets me SOME_NUMBER of events at a time, then I run through each event and...hm, I could probably do that with more SQL than JS.
[21:26:30] Maybe part of the problem here is that I'm not that great at SQL
[21:26:38] could be
[21:26:53] but we're thinking about that too
[21:27:03] one of the ideas that is dear to us is a "community schema"
[21:27:21] I could probably whip up a few SQL queries that do this instead of screwing around with node
[21:27:22] this would be some editable schema for a virtual database of analytics-pregnant data
[21:27:35] anyway, that's long term
[21:27:44] so let's look at your sql, I can definitely help with that
[21:28:03] I don't have any, I've sprinted down the wrong road twice now so I'm stuck with a lot of JavaScript
[21:28:15] I'll open a new etherpad and we'll see what I can do
[21:28:20] ok, great
[21:28:28] http://etherpad.wikimedia.org/p/multimedia-analytics-queries
[21:28:30] blank slate is probably easier honestly
[21:28:32] we can chat in the etherpad
[21:30:45] someone pinged me
[21:30:47] what did I do?
[21:31:15] Ironholds: Nothing, you just keep it up over there
[21:31:37] lies
[21:31:49] cakes
[22:39:39] milimetric: One thing - I don't want the row that has information about today, because today isn't over. I know LIMIT, but can I get all but the last row?
[22:39:52] (I want them sorted in ascending order, so that's not an option AFAIK
[22:40:04] (LIMIT isn't))
[22:40:30] you can just do where timestamp < today at 00:00
[22:40:38] Hm
[22:41:13] Will it be smart enough to transform the dumb 14-char mediawiki timestamps into whatever necessary format for comparison with today?
[22:41:49] no, i think you'll have to convert timestamp to a date or today to a string
[22:41:59] probably today to a string is best
[22:42:10] Hm
[22:42:17] hm, no, but you're already getting substring
[22:42:31] just compare datestring < (hang on :))
[22:42:37] But substring, not subnumber
[22:42:46] http://www.w3schools.com/sql/func_date_format.asp
[22:43:17] Aha.
[22:43:17] yeah, timestamp strings are alphabetic though
[22:43:57] so you can date_format(today(), '%Y%m%d%H')
[22:44:02] Yeah
[22:44:12] I see curdate() in the mysql docs but not today()
[22:44:53] you're right, this page seems to have a lot of examples: http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html
[22:44:59] yeah, it's something like that, i always forget
[22:45:05] oh!
[22:45:07] now()
[22:45:31] But I'll want less than today at midnight
[22:45:52] So date_format(curdate(), '%Y%m%d') + '000000'
[22:46:42] yep
[22:46:59] 'kay
[22:50:18] Hrm, I guess the fileType and fileSize distributions will need to be run on the whole dataset every time
[22:50:21] That's annoying
[22:50:41] But they aren't sorted by the date like the rest
[23:15:24] milimetric: Hm, it's saying there's a syntax error in the select statement near "union"
[23:15:43] Oh, trailing comma, ignore me
[23:16:47] qchris: saw your email about bingle - i'm investigating now
[23:16:58] awjr: Thanks!
[23:17:04] np
[23:20:28] * awjr facepalm
[23:20:47] Traceback (most recent call last):
[23:20:48] File "/data/project/bingle/code/bingle-analytics/bingle.py", line 76, in
[23:20:48] bugCard, bug.get('summary'), bug.get('id'))
[23:20:48] File "/data/project/bingle/code/bingle-analytics/lib/mingle.py", line 37, in findCardByName
[23:20:48] print name
[23:20:49] UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 11: ordinal not in range(128)
[23:21:11] qchris: something's choking on the name of that bug you referenced (https://bugzilla.wikimedia.org/show_bug.cgi?id=59645) because of the quote marks
[23:21:33] Ha!
[23:21:33] which i presume is causing bingle to exit abnormally before it gets the bugs added to mingle
[23:21:45] i encountered a similar issue a long time ago and thought i fixed it
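
The traceback awjr pastes above is the classic Python 2 failure mode of print-ing a unicode object that contains non-ASCII characters (here the curly quote u'\u201c') to a stream whose default encoding is ASCII. One common way to make such a print safe is to encode explicitly; this is a sketch of the general fix, not necessarily what the maintained bingle branch does:

    # -*- coding: utf-8 -*-
    # Python 2: encoding explicitly avoids UnicodeEncodeError no matter what the
    # terminal or pipe reports as its encoding; 'replace' keeps it from raising.
    name = u'Bug title with \u201ccurly\u201d quotes'
    print name.encode('utf-8', 'replace')
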
[23:21:47] No nice unicode quotation marks for me then.
[23:21:51] :)
[23:22:03] Ok. I'll try to keep it 7bit safe then.
[23:22:05] Thanks.
[23:22:10] i thought i told bingle to ignore ascii errors
[23:22:20] i may have missed something tho, lemme take a look
[23:22:42] Don't bother. 7bit is fine :-)
[23:23:03] hahah
[23:23:18] you guys appear to be using an outdated version of bingle oto
[23:23:20] *too
[23:23:47] actually, a very outdated version
[23:23:53] Mhmm.
[23:24:04] milimetric: For some reason the scripts aren't outputting anything in my test environment, which is odd
[23:24:10] I tried connecting to the labs instance but wasn't allowed to log in.
[23:24:29] eh it looks like diederik set it up using his own bingle branch
[23:24:41] qchris: milimetric should have access now
[23:24:45] Ok.
[23:24:54] Can we just switch to the maintained branch?
[23:25:16] im not sure if there's anything actually unique in diederik's branch - im pretty sure we merged most if not all of his changes back into master
[23:25:41] but there have been a number of bug fixes - one of which might actually fix the problem you're facing now
[23:26:07] Ok. I see.
[23:26:15] I'll ping mili metric tomorrow about it.
[23:26:24] right on
[23:26:25] Thanks for your help awjr!
[23:26:28] np :)
[23:32:15] milimetric: Got it - string concat in mysql doesn't work with +
[23:36:23] milimetric: Ah, but, I can't make the utility tables I wanted on the analytics slave, so I have to figure out the last-date-found a different way
[23:46:11] Hey folks, re: UA sanitization in EL, why are we doing this again? Doesn't our privacy policy allow us to store the full user agent?
[23:51:32] milimetric: Any thoughts on how I could keep track of the last date I found in prod?
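
Pulling together the pieces of the timestamp discussion earlier (14-character MediaWiki timestamps compare correctly as plain strings, curdate()/date_format() for "today at midnight", and CONCAT() rather than + for string concatenation, as marktraceur notes above), a per-day aggregate that excludes the current, incomplete day could look roughly like this; the table and column names are placeholders, not the real schema:

    -- MediaWiki timestamps are 14-character strings (YYYYMMDDHHMMSS), so a
    -- lexical comparison against today's midnight excludes today's rows.
    SELECT
      SUBSTRING(timestamp, 1, 8) AS day,
      COUNT(*)                   AS events
    FROM SomeSchema_1234567  -- placeholder EventLogging table name
    WHERE timestamp < CONCAT(DATE_FORMAT(CURDATE(), '%Y%m%d'), '000000')
    GROUP BY day
    ORDER BY day ASC;
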