[01:32:53] madhuvishy: still here?
[01:34:18] YuviPanda: just walked out the front door
[01:34:27] Sup?
[01:34:41] madhuvishy: ah ok.
[01:34:57] madhuvishy: was going to ask how to check if my eventlogging event came through
[01:35:00] but is ok, you go
[01:35:11] Oh. Beta or prod?
[01:35:39] Okay I'll ping when I get home
[01:37:33] madhuvishy: prod
[01:37:35] ofc
[01:40:53] YuviPanda, the bug I have been chasing: https://github.com/halfak/python-sphinx-break-stuff
[01:41:29] * halfak turns the readme to markdown
[01:43:58] {{done}}
[02:00:41] YuviPanda: kafkacat -b kafka1012:9092 -t eventlogging_SchemaName to listen, on stat1002
[02:01:05] You can also do -o -10 for last 10 messages
[02:01:23] o for offset
[02:01:44] madhuvishy: ah
[02:01:49] madhuvishy: does it show non-valid messages too?
[02:01:54] madhuvishy: I'm pretty sure I'm failing validation
[02:02:03] and I can't find new logs on deployment-prep
[02:02:05] No
[02:02:12] there's /var/log/upstart/eventlogging_processor-client-side-events.log.1.gz
[02:02:17] but not .log
[02:02:18] Logstash should have it
[02:02:25] Beta might be broken
[02:02:34] oh I see
[02:03:45] Errors on eventlogging_Event Error for all schemas, split by schema, should be there in logstash
[02:04:22] YuviPanda: deployment-eventlogging03 is the right one, but it might be broken
[02:26:42] madhuvishy: ya
[02:30:26] madhuvishy: I can't see it
[02:42:00] https://meta.wikimedia.org/beacon/event?%7B%22revision%22%3A+15237188%2C+%22schema%22%3A+%22ToolsJobManipulation%22%2C+%22wiki%22%3A+%22enwiki%22%2C+%22event%22%3A+%7B%22hostname%22%3A+%22tools-bastion-01.tools.eqiad.wmflabs%22%2C+%22command%22%3A+%22jsub%22%2C+%22username%22%3A+%22yuvipanda%22%2C+%22commandline%22%3A+%22jsub%22%7D%7D
[02:42:03] is the URL
[02:42:05] for hitting it
[02:44:25] and nothing in logstash either
[02:44:27] hmm
[02:54:45] madhuvishy: haha, the reason the error logs weren't working was because there were no errors
[02:54:47] it works
[02:54:49] wheeeeeeeee
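(Editor's aside on the beacon URL pasted at 02:42:00: its query string looks like a JSON object passed through plus-style URL encoding. The sketch below rebuilds and decodes such a URL using only the Python standard library. It assumes the whole query string is the encoded JSON, which is inferred from the pasted URL rather than from any EventLogging documentation; the helper names `build_beacon_url` and `decode_beacon_url` are hypothetical. Nothing is sent over the network.)

```python
import json
from urllib.parse import quote_plus, unquote_plus

# Endpoint copied from the URL in the log above.
BEACON_ENDPOINT = "https://meta.wikimedia.org/beacon/event"


def build_beacon_url(event_capsule):
    """Hypothetical helper: serialize the capsule to JSON and URL-encode it.

    Assumption: the beacon takes the encoded JSON as its entire query string.
    """
    return BEACON_ENDPOINT + "?" + quote_plus(json.dumps(event_capsule))


def decode_beacon_url(url):
    """Hypothetical helper: recover the JSON capsule from a beacon URL."""
    _, _, query = url.partition("?")
    return json.loads(unquote_plus(query))


# Field values taken from the URL pasted in the log.
event_capsule = {
    "revision": 15237188,
    "schema": "ToolsJobManipulation",
    "wiki": "enwiki",
    "event": {
        "hostname": "tools-bastion-01.tools.eqiad.wmflabs",
        "command": "jsub",
        "username": "yuvipanda",
        "commandline": "jsub",
    },
}

url = build_beacon_url(event_capsule)
decoded = decode_beacon_url(url)
```

(Decoding a URL this way before sending it is a cheap check that the payload is at least well-formed JSON with the expected fields; it does not replace schema validation on the server side.)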
[02:54:51] thanks
[02:56:00] It didn't show up when you tailed Kafka?
[05:55:24] madhuvishy: no I didn't do that but was looking at db
[05:55:47] madhuvishy: and when I fixed my code I was only looking at error logs
[05:55:58] since I was sure that clearly, the first fix couldn't have made it work
[05:56:01] apparently it did
[14:28:57] Hi everybody :)
[14:30:27] o/ CristianCantoro
[14:30:48] Hi halfak! :)
[14:36:52] CristianCantoro, did you see the notes re. citation data from the dev summit?
[14:37:21] no, sorry...
[14:38:07] * halfak searches
[14:39:11] Hmmm... Can't figure out where the notes went.
[14:39:26] harej, when you log on. ^
[14:39:40] DarTar and harej were organizers.
[15:43:35] harej: Did you set up the LAMP stack for librarybase?
[16:46:05] halfak: ping
[16:53:32] Hey Guerillero
[17:28:03] Can you please reply to my email?
[17:28:13] kjschiroo: heya! :)
[17:28:32] YuviPanda: Hey
[17:28:33] kjschiroo: mw syntax is notoriously terrible, highly recommend against using regexes to try to parse it
[17:28:38] ends in tears. every time.
[17:29:04] I wouldn't want to do the whole thing. Just two parts of it.
[17:29:15] How about headings? Where does it trip up?
[17:29:30] for example
[17:29:32] if you hit a line
[17:29:44] = something something = bah
[17:29:49] that isn't actually a heading I think
[17:30:00] == Hello = Goodbye ==
[17:30:08] Let me test it quick
[17:30:11] kjschiroo: what's the problem with using mwparserfromhell?
[17:30:15] Guerillero, just finished a short meeting. Will do right now
[17:30:46] kjschiroo: a lot of work has been put into it to fix a lot of edge cases and stuff. and it's even available in debian packages
[17:32:31] Guerillero, looking at the maps and thinking...
[17:32:43] ok :)
[17:32:52] It would be cool if we could get this up on the wiki and make an announcement on the research mailing lists.
[17:33:06] YuviPanda: my objection to it is that it seems rather heavy for what we need it for. It only has two uses right now.
[17:33:07] One option for that is taking advantage of Extension:Graph
[17:33:21] Well, I really need to find an Open Source Database for geocoding
[17:33:35] the one I am currently using has an angry TOU
[17:34:07] Gotcha. Maybe that's something we can provide via the WMF.
[17:34:17] I know that Ironholds has been working with geocoding.
[17:34:36] YuviPanda, with headings it seems like they have a pretty strict rule that it is, start, n equal signs, stuff, n equal signs, end
[17:34:38] Ironholds, do we have a sharable access to a geocoding DB with a non-angry TOU?
[17:34:45] that seems like a regex would do fine.
[17:35:08] kjschiroo: hmm, it starts that way and you end up in tears at the end.
[17:35:12] it's happened every single time...
[17:35:35] ^ can confirm.
[17:35:41] He was but I kinda went against his suggestion to geocode only to the country level because I wanted to test kernel densities
[17:36:03] it always starts this way too
[17:36:08] 'I only want two things'
[17:36:13] Is mwparserfromhell failing?
[17:36:22] At the moment
[17:36:28] Why not patch that instead?
[17:36:36] earwig is working on it
[17:36:37] Too complex?
[17:36:58] Oh! Cool. So, we'd use mwparserfromhell for everything except headings for a short time?
[17:37:20] huh?
[17:37:32] Guerillero, what was the argument against more granular geo-coding?
[17:37:58] kjschiroo, are you proposing to drop mwparserfromhell entirely, or just for heading detection?
[17:38:42] halfak, mainly for heading detection, but after that we only have a single use of it.
[17:39:38] kjschiroo, seems like it's a core use of this library. I'm lost on why we'd want to move away from that and implement an island grammar.
[17:39:48] We're very unlikely to get better performance (I know. I've tried)
[17:42:29] * halfak tries to familiarize himself with the bug.
[17:42:33] Can you link me to it?
[17:43:12] Is it this?
https://github.com/earwig/mwparserfromhell/issues/55
[17:43:18] kjschiroo, ^
[17:43:58] halfak: https://github.com/kjschiroo/WikiChatter/issues/16
[17:44:35] https://github.com/earwig/mwparserfromhell/issues/137
[17:44:44] halfak ^
[17:46:29] What parses wrong in that rev_id?
[17:46:42] I assume it's from enwiki, is that right?
[17:47:05] halfak, == How to link any file like video or picture, on wikipedia's article? ==
[17:47:13] yeah, enwiki
[17:47:44] is enwiki important for mwparserfromhell?
[17:48:15] yes
[17:48:25] IMO
[17:48:33] So, I see "{{UTelleroftheunknowns|AsifRasheed]]" in that section
[17:48:49] Is that what is causing the problem that earwig is talking about?
[17:48:54] yeah
[17:49:02] "improper parsing of nested invalid templates"
[17:49:12] OK. So we're going to solve that problem with a regex?
[17:49:51] halfak, I think we more ignore that problem with a regex
[17:51:54] I'm guessing the only definition of mw syntax is in the code itself?
[17:52:25] kjschiroo, in mwparserfromhell's parser? Probably
[17:52:37] * halfak looks at how mwparserfromhell handles that
[17:52:41] It actually looks OK.
[17:52:51] I'll have to dig deeper to know where the problem is.
[17:53:05] halfak, I mean by wikimedia, they were the ones to define it right?
[17:53:22] Oh! Um. Ha. The definition is in MediaWiki
[17:53:30] Whatever MediaWiki does is Right(tm)
[17:53:34] And it's a big ball of regexes
[17:53:42] Hence YuviPanda's concerns.
[17:53:54] Regretfully, it's not a context-free grammar
[17:54:17] There's really no spec.
[17:56:15] So, I'm not getting `len(sec.filter_headings()) > 1` for that section
[17:56:22] Maybe I need to parse the whole revision
[17:57:00] YuviPanda, I can see how this could get ugly quickly with regexes.
[17:57:48] can you apply a regex to wikicode?
[17:58:05] kjschiroo, I've done that a lot for *extremely narrow* use-cases.
[17:58:15] E.g. find me all the links that look like links in this markup.
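(Editor's aside on the heading rule floated at 17:34:36, "start, n equal signs, stuff, n equal signs, end": a minimal Python sketch of that rule as a line-level regex, tried against the two example lines from 17:29:44 and 17:30:00. This encodes only the rule as stated in the chat, not MediaWiki's actual parser behaviour, for which the participants note there is no spec. The names `HEADING_RE` and `parse_heading` are hypothetical, and the cap of six equals signs is an assumption.)

```python
import re

# Start of line, n equals signs (assumed 1..6), some text, the same n equals
# signs again (backreference \1), optional trailing whitespace, end of line.
HEADING_RE = re.compile(r"^(={1,6})\s*(.+?)\s*\1\s*$")


def parse_heading(line):
    """Hypothetical helper: return (level, title) or None if not a heading."""
    match = HEADING_RE.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2)


# The two example lines from the discussion, plus the heading from the bug.
trailing_text = parse_heading("= something something = bah")
inner_equals = parse_heading("== Hello = Goodbye ==")
bug_heading = parse_heading(
    "== How to link any file like video or picture, on wikipedia's article? =="
)
```

(Under this rule the line with trailing text after the closing equals sign is rejected, and the line with a stray inner equals sign is accepted as a level 2 heading. The regex works on a single line in isolation, so it knows nothing about comments, nowiki blocks, or templates around it, which is where the "ends in tears" cases tend to come from.)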
[17:58:42] In that case, it was primarily for an analysis. Not a tool that would support someone else's stuff.
[17:58:55] Also, I didn't care if the links appeared in a comment or not.
[17:59:03] This is where I get shy on the use of wikicode. I understand how strings work, wikicode is still opaque to me. Right now that is how signatures get detected.
[18:00:45] I'll look more into it to see about just using wikicode
[18:01:01] Ha! I see the issue now that I parsed the whole page.
[18:01:34] yep
[18:02:35] kjschiroo, do you use a regex to detect signatures?
[18:02:48] yeah
[18:02:56] I think that's not too crazy.
[18:03:05] Since signature is not a part of wikitext
[18:03:13] Rather, it's configured by the wiki
[18:03:48] It would be great if that were totally true. I wish they didn't offer custom signatures.
[18:03:58] * halfak runs to meeting
[18:15:11] hi ellery. just checking: you will be around for the meeting with the Reading team in 15 min, right? asking since the meeting time got changed.
[23:02:52] halfak
[23:02:58] o/
[23:04:16] Hi halfak, how much time left for meeting?
[23:04:39] Sorry. Not sure I understand.
[23:05:04] I thought it to be thursday...
[23:05:55] this is 4.30 pm utc?
[23:06:47] tarrow: yes, I set up the LAMP stack
[23:08:17] halfak: notes are https://etherpad.wikimedia.org/p/source_metadata
[23:08:49] Thanks harej
[23:10:46] Diligent13, I just sent you some private messages. Do you see them?
[23:36:26] g