[00:16:02] (PS2) Madhuvishy: [WIP] Report RESTBase traffic metrics to Graphite [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234453 (https://phabricator.wikimedia.org/T109547)
[00:51:57] (PS3) Madhuvishy: [WIP] Report RESTBase traffic metrics to Graphite [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234453 (https://phabricator.wikimedia.org/T109547)
[08:52:23] Analytics-Tech-community-metrics, ECT-September-2015: Labeling some bots (active in Git/Gerrit) as bots - https://phabricator.wikimedia.org/T110545#1591978 (jgbarah) The list of bots is currently maintained in the Sorting Hat (identities) database, in table profiles. If the field "is_bot" is 1, the identi...
[08:53:17] Analytics-Kanban, operations, Monitoring, Patch-For-Review: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#1591979 (fgiunchedi) >>! In T83580#1571705, @Ottomata wrote: > Natively share the dict? Hm. Just quickly tried this, and I get an immediate segfault: err, what I meant is to kee...
[09:00:45] Analytics-Tech-community-metrics, ECT-September-2015: Labeling some bots (active in Git/Gerrit) as bots - https://phabricator.wikimedia.org/T110545#1591981 (jgbarah) Added Contributors | Bots subsection at the [[ https://www.mediawiki.org/wiki/Community_metrics | Community Metrics wiki ]] with this info.
[09:20:56] Analytics-Tech-community-metrics, ECT-September-2015: Labeling some bots (active in Git/Gerrit) as bots - https://phabricator.wikimedia.org/T110545#1592030 (jgbarah) From the list in [[ https://gerrit.wikimedia.org/r/#/admin/groups/4,members | Group Non-Interactive Users ]], I tagged gerritpatchuploader@g...
[10:47:52] Analytics-Tech-community-metrics, ECT-September-2015: Automated generation of (Git) repositories for Korma - https://phabricator.wikimedia.org/T110678#1592276 (Qgil)
[11:59:17] Analytics-Tech-community-metrics, ECT-September-2015: Labeling some bots (active in Git/Gerrit) as bots - https://phabricator.wikimedia.org/T110545#1592408 (Aklapper) >>! In T110545#1592030, @jgbarah wrote: > I couldn't find @jdlrobson+frankie@gmail.com in our database. Maybe it didn't act? Probably, mig...
[12:39:20] o/ joal & milimetric
[12:39:32] I think we should skip the meeting this morning.
[12:39:43] I just got back from a trip, so I don't have any updates.
[12:55:16] Hey halfak :)
[12:55:20] No worries
[12:55:44] I think milimetric is overbusy, and I also have stuff to do :)
[12:56:07] halfak: only one question: have you tried a run of sorted JSON?
[12:57:38] :) I was just pinging in research saying I could use the time
[12:58:28] milimetric: how are you doing? If you need some brain cycles or whatever, please ask :)
[12:59:49] no worries joal, the data point stuff is winding down. I just need to catch myself up on other stuff now
[13:00:03] and thank you :)
[13:00:36] You're very welcome
[13:23:52] (CR) Joal: "Hey Madhu," [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234453 (https://phabricator.wikimedia.org/T109547) (owner: Madhuvishy)
[13:26:50] joal, still no attempt at the sorted JSON.
[13:26:56] For enwiki
[13:27:33] halfak: ok, thanks for the update :)
[13:27:52] halfak: Can I skeep this evening's meeting or do you need me?
[13:28:10] skeep
[13:28:12] :)
[13:28:16] Great :)
[13:28:24] Thanks!
[13:28:42] * joal likes skeeping
[13:48:07] o/ ottomata
[13:48:15] sorry to have missed the last meeting about the events system
[13:49:23] One Q. I've been troubled by csteipp's concerns re. events that must be deleted after they are emitted. I wonder if you guys have been thinking about how to address that?
[13:49:47] E.g. I think we'll need to be able to re-write (or at least censor) individual records in a Kafka stream.
[13:51:48] Analytics-Kanban, operations, Monitoring, Patch-For-Review: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#1592846 (Ottomata) When you say 'queue' and 'dict' do you mean the same thing? I had thought you were suggesting just using a regular old dict shared between the child (varnishlog...
[13:53:41] halfak: eh?
[13:53:51] events in a kafka stream will be deleted within a week max
[13:54:28] Oh. So no event stream for my use cases then :/
[13:56:29] what's up?
[13:56:48] halfak: i don't have any context. what needs deleting?
[13:57:11] e.g. a page is moved, but is accidentally moved to a title that matches my social security number.
[13:57:20] So the event itself would need to be deleted.
[14:01:51] halfak: there's no way to delete individual events from a topic, but the topic messages are temporary. you can set a shorter or longer lifetime for messages in a topic if you like
[14:02:00] so, if you have a sensitive topic, you can set it to only keep data for a day
[14:02:04] or maybe even an hour, not sure
[14:02:23] but, then you would not easily be able to recover if your service stops consuming for a while
[14:02:37] Maybe we could have a filter node in place that would know which events to not re-emit?
[14:03:11] halfak: i guess, your consumer would have to do that, i think. if there is custom logic needed for particular message types
[14:03:16] that would have to be done on the consumer side
[14:03:40] Indeed. But that consumer would be yet another producer.
[14:03:53] Of filtered events.
[14:05:23] I guess we'd want it to be more of a proxy than an actual kafka producer.
[14:05:36] Since it wouldn't make sense to store the stream in such a node.
[14:07:52] ?
[14:15:53] hello everyboooodyyyyy
[14:15:57] HIIII
[14:16:10] nuria !!!!!!! HELLOOOooooOOO :D
[14:16:10] ottomata, not sure where I lost you
[14:16:16] Hi nuria !
[14:16:19] Welcome back!
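Editor's note on the retention discussion above: the per-topic message lifetime ottomata describes maps to Kafka's `retention.ms` topic configuration. A minimal sketch of the values involved — the topic names are hypothetical, chosen only to illustrate the "a day, or maybe even an hour, default about a week" trade-off from the conversation:

```python
from datetime import timedelta

def retention_ms(lifetime: timedelta) -> int:
    """Convert a desired message lifetime into Kafka's retention.ms value."""
    return int(lifetime.total_seconds() * 1000)

# Hypothetical per-topic retention settings matching the discussion:
# a sensitive topic kept for a day (or even an hour), a week otherwise.
topic_retention = {
    "sensitive-events": retention_ms(timedelta(days=1)),
    "very-sensitive-events": retention_ms(timedelta(hours=1)),
    "default-events": retention_ms(timedelta(weeks=1)),
}
```

These values would be applied with Kafka's topic configuration tooling; as noted in the conversation, shorter retention reduces the exposure window for sensitive data but also shrinks the recovery window if a consumer stops reading for a while.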
[14:17:25] halfak: not sure what producer you are talking about
[14:17:29] oh
[14:17:37] hm
[14:17:39] like
[14:17:50] have a volatile topic with unfiltered, possibly sensitive data
[14:18:05] then have a consumer that filters and produces back to a more stable topic
[14:18:08] and use that one for apps?
[14:18:22] yeah. Use the latter for as much as possible.
[14:18:36] I'm not sure what you mean by "apps"
[14:18:54] The consumer would need to filter on demand.
[14:19:13] Filtering is a state-y activity and could happen right before the data is requested.
[14:19:31] * halfak has been grumbling about how to address this for a long time.
[14:26:32] by apps i mean whatever the consumer is
[14:26:36] your thing you are building
[14:26:37] or whatever
[14:27:08] ottomata, this shouldn't just be *me*. It should be *everyone* and every database that stores sensitive events.
[14:27:27] So HDFS, consumers of events, etc.
[14:27:35] halfak: how do you identify a sensitive event in, say, possibly thousands of schemas?
[14:27:48] ottomata, that's a good question. Still open.
[14:28:00] Right now, sensitive events are handled internally in MediaWiki.
[14:28:15] halfak: i think there is no way to solve that one. this system will be used for more than just mediawiki type events
[14:28:16] a row in the logging/revision/recentchanges tables will actually be dropped.
[14:28:28] kafka is a temporary buffer
[14:28:29] ottomata, sure, but it will also be used for MediaWiki events.
[14:28:41] Oh. So the system is just a temporary buffer?
[14:28:43] yes
[14:28:59] temporary in the timespan of days
[14:29:19] So, it won't write events anywhere?
[14:29:25] no, that's for consumers to do
[14:29:26] That's *something else*
[14:29:28] whatever you want to do
[14:29:28] yeah
[14:29:46] this system is a scalable and standardized way to produce and consume events
[14:29:53] Seems like this is becoming a very narrow slice.
[14:29:58] Which is probably best.
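The filter-then-republish pattern sketched in the exchange above (consume a volatile topic, drop or censor sensitive records, produce to a stable topic) hinges on a filtering predicate. A minimal, hypothetical sketch of that predicate for the SSN-shaped-title example raised earlier, with the Kafka plumbing deliberately omitted — the field name `new_title` and the pattern are illustrative only, not a real event schema:

```python
import re

# Hypothetical pattern: a US-SSN-shaped string (e.g. "123-45-6789") in a page title.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def should_reemit(event: dict) -> bool:
    """Decide whether a page-move event is safe to republish to the stable topic.

    `event` is a hypothetical dict with a 'new_title' field; real event
    schemas would differ.
    """
    return not SSN_PATTERN.search(event.get("new_title", ""))

# Events that pass the filter get produced to the filtered topic; the rest
# simply age out of the volatile topic via its short retention.
events = [
    {"new_title": "Main_Page"},
    {"new_title": "123-45-6789"},  # accidental SSN-shaped title: drop it
]
safe = [e for e in events if should_reemit(e)]
```

As the conversation notes, such a filter node is "yet another producer" (or a proxy): it would sit between the short-retention unfiltered topic and the longer-lived topic that downstream apps consume.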
[14:30:13] Won't replay or store -- just buffer.
[14:30:21] With brief replay
[14:32:09] * halfak wants to think about a whole system consisting of such coherent parts.
[14:32:55] halfak: ya that's pretty much what kafka is
[14:33:07] this project is about building standardized tooling around producing and consuming events from it
[14:35:29] Yeah. Seems like privacy enforcement would be part of "standardized tooling". No?
[14:38:59] no, i don't think so, i think that is out of scope. this system isn't about making the applications of the system
[14:39:06] this is an event bus; what is done with the data is up to the apps
[14:39:35] I don't think that privacy enforcement is "done with the data by the apps"
[14:39:42] the tooling includes:
[14:39:43] - schema store (and hopefully evolution)
[14:39:43] - schema validation
[14:39:43] - event production
[14:39:43] - event consumption
[14:39:44] It would need to be some middle system
[14:40:08] it's similar to eventlogging except more scalable and for more than just analytics type events
[14:40:20] this system can't do any logic processing
[14:40:22] Indeed. This is a problem with event logging.
[14:41:06] if there is to be a 'standardize mediawiki events, and make it possible to filter out sensitive data' project, then that will likely use this system
[14:41:10] but it isn't part of building this system
[14:41:20] Analytics-Kanban, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1593071 (Milimetric)
[14:41:52] halfak: we are basically building a foundation-wide pub/sub schema-based message queue. that's it really
[14:41:53] ottomata, many event logging schemas suffer from this issue. Basically we're talking about all analytics logging -- not just MediaWiki events.
[14:42:14] ottomata, OK. Then I guess we are talking about the scope and the definition of an app.
[14:42:17] yes
[14:42:28] I'm also pointing out a need that's not being addressed now or in planned systems.
[14:43:04] halfak: add to use cases here?
[14:43:04] https://etherpad.wikimedia.org/p/scalable_events_system
[14:43:21] ottomata, you just told me it is out of scope. Also, this isn't a use-case.
[14:44:18] I don't want to *use* the system to filter out deleted events. I want "the system" itself to be able to handle deleted events.
[14:44:55] halfak: it is a requirement for a use case of this system
[14:45:29] think of this system as a pub sub buffer of events
[14:45:39] the events will only be available for consumption for max a week
[14:45:44] It's a characteristic of any use-case that consumes events that might be deleted for privacy reasons.
[14:46:09] sure, and if your use of this system is storing data somewhere
[14:46:14] then that will need to be handled
[14:46:32] ottomata, OK. This is all very reasonable. I'm just saying that this isn't an "app" or a use-case. It is a piece of middleware that we'll need in order to do most of the things we're already doing with event logging.
[14:47:09] it is a use case of this system. the use case is: automatic event filtering during consumption
[14:47:10] Analytics-Kanban, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1593093 (Milimetric)
[14:47:22] halfak: high volume consumers of events can and probably will consume directly from kafka
[14:47:41] we might even be keeping consumption out of scope for the MVP of this thing
[14:47:54] since there is already a lot to think about on the production side: schemas, formats, security, services, etc.
[14:48:30] in the future, we may want to build a nicer way to consume where you don't need a kafka client, but that may just add more complexity
[14:48:45] e.g. whatever realtime streaming framework we go with in the future
[14:48:47] ottomata, OK. I get it. You're building something right above the hardware and I want to talk about how people are going to be using it.
[14:48:53] there is likely to already be lots of kafka support in it
[14:49:19] I'm out of scope until you are ready to talk about that.
[14:49:24] so, maybe mw event processing will be in a realtime streaming system. then we can build some generic support into processing mw events and auto-filtering deleted stuff
[14:49:26] :)
[14:50:07] Analytics-Kanban, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1593100 (Milimetric)
[14:50:08] I would just encourage you not to see my requirements and concerns as an edge or individual use-case, but as one that applies to almost all.
[14:50:26] And allow the problem to trouble you for an evening or two as it has troubled me for many.
[14:51:00] haha, ok. i will keep it in mind, but halfak, i think it's unlikely that i will be very involved in building the part of the system you are talking about. i'm building infrastructure to support such things
[14:52:09] Analytics-Engineering, Community-Tech: [AOI] Add page view statistics to page information pages (action=info) - https://phabricator.wikimedia.org/T110147#1593104 (Milimetric) @kaldari, that's what we're hoping to do with the API, for sure. The endpoint you would need to feed a chart like that is planned...
[14:52:44] ottomata, my question to you about dropping events in a kafka replay-log-thing is relevant though, no?
[14:52:58] Your infra for buffering is also the infra we need to work around for privacy.
[14:53:17] So your design decisions will affect the future of my work for better or worse.
[14:58:16] halfak: not sure. there's no way to delete events in kafka, except by waiting for them to be old
[14:58:38] i guess we'll keep in mind supporting different topic expiration policies, because kafka supports that
[14:59:21] yeah. So that design decision might make it hard for us to work with events that contain sensitive information. E.g. no apps that use events could consume directly from kafka, as they might expose these events.
[14:59:35] Or we could plan to engineer around the limitation.
[14:59:39] I like plans :)
[14:59:53] Or just to make sure that people know we're signing up for some hard work later.
[15:00:30] Anyway, IMO good privacy enforcement exists in the infra.
[15:02:49] aye
[15:03:28] ottomata: i thought about that a bit. I was thinking we could filter out the private events and restrict access to the unfiltered topics.
[15:10:59] milimetric: until kafka supports some more built-in acls or something, it will be hard to restrict stuff like that
[15:12:15] yeah, but the work on that looked in full swing. I guess by spring next year we can count on 0.9 coming with authentication
[15:15:34] aye
[15:19:30] a-team, electricity issue at my place --> internet will cut soon
[15:19:41] I will possibly miss standup
[15:20:25] Updates: CR for Madhu, meeting with ops on pageview (Dan knows), spark job for pageview API data
[15:24:26] joal, I owe you a CR, will do that when I finish the task I have in hand, unless it's urgent, and I'll do it after standup
[15:24:41] mforns: no urgency :)
[15:24:50] joal, ok
[15:32:07] nuria: Standup :)
[15:33:15] madhuvishy: can you send me the batcave link?
[15:33:22] nuria: https://plus.google.com/hangouts/_/wikimedia.org/a-batcave
[15:45:40] Analytics-Tech-community-metrics, ECT-September-2015: Tech community KPIs for the WMF metrics meeting - https://phabricator.wikimedia.org/T107562#1593373 (Aklapper) >>! In T107562#1579585, @Qgil wrote: > About adding MediaWiki core data (see above), I think it is a good idea. https://www.mediawiki.org/w/...
[15:49:45] Analytics-Kanban, RESTBase-API: create RESTBase endpoints [21 pts] {slug} - https://phabricator.wikimedia.org/T107053#1593389 (madhuvishy) The most recent code for this is here: https://github.com/madhuvishy/restbase/tree/test_projectview
[15:50:17] ottomata, ops question?
[15:52:12] sure, doing a few things at once, but ask!
[15:53:00] ottomata, so, at least in THEORY we have a hard block on requests with null IP addresses
[15:53:15] err
[15:53:16] user agents
[15:53:18] thank you brain
[15:53:24] in practice this is total bullshit and I see thousands upon thousands of ludicrous automated requests with a user agent of "-"
[15:53:46] are you aware of where the code for rejecting requests based on the UA lives? Did we remove it without telling anyone? What?
[15:56:24] rejecting requests?
[15:56:42] i'm not aware of any request rejection based on UA
[15:56:51] what do you mean rejecting?
[16:02:46] ottomata, telling them to sod off and read the API guidelines
[16:02:59] See https://meta.wikimedia.org/wiki/User-Agent_policy
[16:03:06] or does that simply mean 'you must provide the header'?
[16:03:30] Analytics-Tech-community-metrics, ECT-September-2015, Patch-For-Review: Fine tune "Code Review overview" metrics page in Korma - https://phabricator.wikimedia.org/T97118#1593442 (Aklapper)
[16:08:02] Ironholds: i don't know anything about this. is this done at the mw level? if so, that would mean that varnish would still log these requests
[16:08:21] as they would be served, but maybe mw would return them an http error?
[16:09:37] Analytics-Backlog: Provide the Wikimedia DE folks with Hive access/training {flea} [8 pts] - https://phabricator.wikimedia.org/T106042#1593482 (Nuria) a:Ottomata>Nuria
[16:09:43] ottomata, aha. huh. Okay; i'll throw out a wikitech thread! Thanks :)
[16:10:39] k!
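For reference on the policy discussed above (https://meta.wikimedia.org/wiki/User-Agent_policy): a minimal sketch of what the missing enforcement could look like. The function name and rules are illustrative only — this is not the actual MediaWiki or Varnish code, which the conversation establishes nobody could locate — it merely rejects the degenerate user agents Ironholds describes seeing:

```python
def user_agent_acceptable(ua) -> bool:
    """Return False for the degenerate user agents seen in the request logs:
    a missing header (None), an empty string, or the literal "-"."""
    if ua is None:
        return False
    return ua.strip() not in ("", "-")
```

Krinkle's curl tests later in the log (`curl -A '-'` and `curl -A ''` both succeeding against api.php) show that no such check was active at the time.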
[16:25:20] Analytics-Tech-community-metrics, ECT-September-2015: Automated generation of (Git) repositories for Korma - https://phabricator.wikimedia.org/T110678#1593524 (Aklapper) a:Dicortazar
[16:28:09] Analytics-Tech-community-metrics, ECT-September-2015: Present most basic community metrics from T94578 on one page - https://phabricator.wikimedia.org/T100978#1593534 (Aklapper) a:Aklapper
[16:29:40] Analytics-Tech-community-metrics, ECT-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1593535 (Aklapper) a:Dicortazar
[16:30:27] Analytics-Tech-community-metrics, ECT-October-2015: "Tickets" (defunct Bugzilla) vs "Maniphest" sections on korma are confusing - https://phabricator.wikimedia.org/T106037#1593538 (Aklapper)
[16:32:07] milimetric: ready when you are
[17:05:23] Ironholds: ottomata: 2-3 years ago there was definitely request rejection if User-Agent was unspecified or empty. I recall api.php requests generated from simple php code not working unless ini_set(user_agent, ..) was called first
[17:05:42] but I can't reproduce it now
[17:05:53] $ curl -A '-' --include -v 'https://en.wikipedia.org/w/api.php?action=query&format=json'
[17:05:54] works fine
[17:06:01] $ curl -A '' --include -v 'https://en.wikipedia.org/w/api.php?action=query&format=json'
[17:06:02] too
[17:06:09] which omits the header entirely (not sent as an empty string)
[17:06:14] so it's just not working. ack. Mind replying in-thread to verify?
[17:06:19] * Ironholds sighs
[17:06:48] Ironholds: which list?
[17:07:30] Krinkle, the wikitech thread :)
[17:10:29] Ironholds: OK :) there's only so many lists one can try before asking – http://i.imgur.com/9H4KILU.png
[17:10:37] ahahah
[17:18:11] sent
[17:36:17] ta!
[18:26:15] milimetric, madhuvishy: i cannot ssh into labs now, was there a config update that i need?
[18:26:57] nuria: yes! right, they made them more secure, one sec
[18:28:04] nuria: this is what you need:
[18:28:07] https://www.irccloud.com/pastebin/tfoAsBci/
[18:28:29] I'm pretty sure everything else stayed the same
[18:28:32] milimetric: in ssh config?
[18:28:36] yes,
[18:30:00] milimetric: I never added this - tried it before and got Bad SSH2 cipher spec 'chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr'
[18:30:22] weird...
[18:30:29] if i don't add it my connections don't work
[18:30:45] madhuvishy: right, got that too
[18:31:06] oh weird... now it works without that too
[18:31:13] I can connect without that. nuria what error did you get first?
[18:31:21] maybe it's different on mac
[18:32:00] https://www.irccloud.com/pastebin/rrRCe5RZ/
[18:32:10] mmm.. maybe i just need to delete known_hosts
[18:32:15] yeah
[18:32:45] nah, got denied again
[18:32:55] same error?
[18:33:18] no
[18:33:22] https://www.irccloud.com/pastebin/PyAzfgsh/
[18:33:43] ssh-add ~/.ssh/id_rsa
[18:34:16] nuria: it's usually this ^
[18:35:03] still nothing, ay ay ...
[18:35:16] nuria: no /home/nuria on wikimetrics1 on labs
[18:35:22] So no ssh key I guess
[18:35:30] aah
[18:36:10] joal: does that mean that i cannot ssh at all?
[18:36:26] I think it does mean you can't ssh to that machine
[18:36:37] But I don't know how homes get created on labs :S
[18:36:51] ok, let's see if ottomata knows
[18:37:45] nuria: Same on another machine (cassandra-dev.analytics.eqiad.wmflabs) --> no /home/nuria
[18:37:54] joal: there is some tool through which you give people access
[18:38:02] on wiki
[18:38:22] hm -> On cassandra-dev (I created that one), I didn't do anything
[18:38:48] joal: i think this creates home dirs: https://wikitech.wikimedia.org/wiki/Nova_Resource:Analytics
[18:38:57] for people that are analytics admins
[18:39:04] yeah, but you are in it
[18:39:17] joal: rather, it "tells" you what homedirs should exist
[18:39:29] joal: how those come to be i do not know, ok asking on labs
[18:39:42] madhuvishy: right, i am on that page
[18:40:51] milimetric: did the name of the machine change to "Wikimetrics1.analytics.eqiad.wmflabs"?
[18:41:34] I connect to wikimetrics1.eqiad.wmflabs
[18:42:02] but I think they did change something with the hostnames, don't remember what
[18:42:12] milimetric: ok, something changed here, will ask on labs
[18:42:14] it didn't affect my connection
[18:42:36] yeah, sorry I forget
[18:42:47] i think this is normal, the project name is displayed with the hostname, but you can skip it when you log in
[18:43:42] i think home gets created if you are in the labs project
[18:44:27] hm you are in this project
[18:44:35] nuria might want to ask in #labs
[18:44:40] maybe some of those homes got deleted with the NFS problems?
[18:45:03] they might have restored ours but forgot nuria's... maybe
[18:45:03] ottomata: did bastion change?
[18:45:13] ottomata: k, will ask on labs
[18:49:34] nuria: from your paste it looks like you are getting into bastion
[18:49:39] try just sshing into that to be sure
[18:50:07] ottomata: right, i was wondering if there was a new bastion now
[18:50:22] on bastion i have nothing though
[18:50:35] in my homedir
[18:53:49] joal: I can't run a query through HiveContext - keep getting some error!
[18:54:06] :(
[18:54:15] yarn mode ?
[18:54:24] how many executors / how much memory ?
[18:54:30] hmmm i was just trying the shell
[18:54:41] like spark-shell with no options
[18:54:50] the shell can run in both local and yarn mode
[18:54:59] then you were running local mode
[18:55:22] joal: yeah I guess
[18:55:30] try: spark-shell --master yarn --num-executors 2 --executor-memory 2g
[18:55:39] And let's see
[18:55:53] nuria: they've mostly disabled nfs homedirs, so ja
[18:56:08] joal: okay, I like that the parquet data load works great though. The job finishes in a minute
[18:56:13] the fact that you don't have a home dir on the labs instances seems to be the problem
[18:56:14] not the bastion
[19:00:06] ottomata: all good now
[19:00:14] ottomata: i was missing ahem nuria@blah
[19:00:27] ottomata: but i will ask for my homedir
[19:02:29] ah, heh
[19:25:28] Analytics-Backlog: Create white list for pageview data {hawk} [8 pts] - https://phabricator.wikimedia.org/T110061#1594289 (Nuria)
[20:00:11] (PS4) Madhuvishy: [WIP] Report RESTBase traffic metrics to Graphite [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234453 (https://phabricator.wikimedia.org/T109547)
[20:01:12] joal: in case you're still around, I think the parquet load is simpler, and seems to work well
[20:02:45] ottomata: still around?
[20:03:55] yes
[20:03:57] hiya
[20:04:30] ottomata: do you want to give me a brief tour of what you guys are doing with EL? the changes so far and such
[20:08:15] nuria: there is a diagram here - https://phabricator.wikimedia.org/T102225 for an overview if you haven't seen it already
[20:08:54] madhuvishy: nice, thank you
[20:09:29] madhuvishy: ok :)
[20:10:02] madhuvishy: and how far along are we on that one?
[20:10:17] madhuvishy: do we have a multi-instance processor?
[20:10:24] nuria: most of the code has been written
[20:11:00] nuria: yeah, we can run multiple processors in parallel
[20:11:49] madhuvishy: and is data flowing into hadoop per schema?
[20:11:59] nuria: Yup :)
[20:12:08] there are some glitches though
[20:12:52] ottomata and I were gonna start deploying the new system, but ran into a bug with the library we are using to spin up these parallel processors
[20:12:57] madhuvishy: what db does it flow into?
[20:13:16] nuria: with you in a few...
[20:13:21] lemme finish up my thought process
[20:13:29] ottomata: by all means...
[20:13:29] (i'm working on the bug madhu mentions)
[20:13:55] nuria: hmmm, don't know if we have external tables on top of them, let me see if I can find them on hadoop
[20:14:20] madhuvishy: i see, they are plain files on hdfs?
[20:15:46] madhuvishy: is the code in the EventLogging repo or elsewhere?
[20:16:01] nuria: /mnt/hdfs/wmf/data/raw/eventlogging
[20:16:23] nuria: it's all in eventlogging, you can see new handlers for reading and writing from kafka
[20:17:04] madhuvishy: and are you guys deploying this to the EL machine we have or using the spare one to test?
[20:17:10] the puppet configs have new kafka-based urls for reading and writing
[20:17:19] nuria: http://grafana.wikimedia.org/#/dashboard/db/eventlogging
[20:17:27] WIP :)
[20:17:35] it's running on a spare node
[20:17:38] madhuvishy: right, i mean the physical machine where new changes are going onto
[20:17:38] eventlog1001 has not been changed yet
[20:17:45] nuria: aah
[20:17:45] ottomata: sorry, i see
[20:17:47] yes
[20:19:56] nuria: so the old zeromq system is running now, we are gonna attempt to switch out the pieces and replace them with the new code - hopefully we'll have this bug resolved for that
[20:21:48] milimetric: Finally !
[20:21:53] milimetric: https://github.com/jobar/analytics-refinery-source/tree/api_data_creation
[20:22:10] Managed to get the daily run working (fast, but with a lot of ram)
[20:27:04] madhuvishy: got it, the files on hadoop are snappy compressed?
[20:27:26] So, running 64 workers with 4g ram each --> 1 hour took 1.5 min, 1 day took 12 mins (project, article, access, agent dimensions, aggregation on all-access, all-agents, all-access-agents)
[20:28:05] milimetric: I'm gonna head off, so if you have time tomorrow, let's discuss !
[20:28:14] a-team have a good night !
[20:28:23] good night joal!
[20:28:37] joal: nite, we'll talk tomorrow
[20:28:53] ok milimetric :)
[20:28:55] Thx !
[20:30:34] nuria: I think so, not super sure
[20:31:05] milimetric: and we know that EL is honoring "do not track" right? https://gerrit.wikimedia.org/r/#/c/182995/3/modules/ext.eventLogging.core.js
[20:31:13] yes
[20:31:33] madhuvishy: i think i got it
[20:31:33] https://github.com/Parsely/pykafka/pull/245
[20:31:36] ottomata: you submitted a patch
[20:31:41] yeah I just saw :)
[20:31:42] nuria hi!
[20:31:44] haha
[20:31:53] nuria: want to batcave? or just IRC?
[20:32:03] ottomata: did you add tests / run existing ones?
[20:32:04] ottomata: batcave for 5 mins?
[20:32:08] sure
[20:32:36] milimetric: ok, good. Now we should do similar changes on varnish right?
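On the Do-Not-Track check referenced above (the linked ext.eventLogging.core.js patch): honoring DNT on the client means skipping event collection entirely when the browser asserts the signal. A hedged Python sketch of that decision — not the actual JavaScript implementation, and the accepted values are an assumption based on how browsers have historically reported `navigator.doNotTrack` ("1", with some older browsers using "yes"):

```python
def dnt_enabled(dnt_value) -> bool:
    """True if the Do-Not-Track signal is asserted.

    Browsers have historically reported the signal as "1" (and some
    older ones as "yes"); anything else means no stated preference.
    """
    return dnt_value in ("1", "yes")

def should_log_event(dnt_value) -> bool:
    # Honoring DNT means never collecting the event in the first place,
    # matching the "never even collected" wording discussed in the channel.
    return not dnt_enabled(dnt_value)
```

The follow-up question in the log — whether the same check should apply at the varnish layer — is about applying this decision server-side to request logs, which milimetric later notes had not been decided yet.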
[20:32:45] nuria: am there
[20:32:55] madhuvishy: i tried to run tests, wasn't successful
[20:33:00] many dependencies, kafka test setup, etc.
[20:33:01] i think
[20:33:06] ottomata: aah right
[20:33:53] nuria: yeah, that makes sense. Especially the new DNT wording is very explicit that people's information should never even be collected in the first place
[20:36:36] nuria: mforns and I have also been working on {tick}, which is the project to clean up all EL schemas' data in the mysql datastores, to make sure we aren't storing any sensitive/PII data beyond 90 days
[20:36:54] :]
[20:37:59] nuria, there's still a couple of to-dos for tick
[20:38:00] we need to work on applying those rules to the data in hadoop too, when we move to the new EL - there's a ticket for that - https://phabricator.wikimedia.org/T106253
[20:52:49] ottomata: lost ya
[20:53:03] ottomata: but it's ok, i have all my questions answered now
[20:53:28] mforns, madhuvishy so many things!
[20:53:43] aha
[20:53:47] mforns, madhuvishy: i will write a ticket for us to check dnt on varnish then?
[20:53:49] nuria: :) Yeah!
[20:54:00] cc milimetric
[20:54:20] nuria: sorry internet hiccup
[20:54:22] we good ja?
[20:54:38] agreed, dnt on varnish checking
[20:54:41] ottomata1: yessssir
[20:54:54] nuria, dnt on varnish?
[20:55:03] milimetric: what project do i put the ticket under?
[20:55:23] nuria: this would be oryx, general EL
[20:55:23] nuria, does it have to do with tick?
[20:55:29] ok ok
[20:56:37] milimetric: ah wait, i was thinking of all data but maybe that will interfere with pageview?
[20:56:41] cc mforns
[21:00:12] (PS5) Madhuvishy: [WIP] Report RESTBase traffic metrics to Graphite [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234453 (https://phabricator.wikimedia.org/T109547)
[21:01:22] (CR) Milimetric: "FYI: I'm waiting to hear from iOS folks before I look over the patch. If they're not going to chime in, let me know." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234938 (owner: BearND)
[21:01:33] (CR) Milimetric: "FYI: I'm waiting to hear from iOS folks before I look over the patch. If they're not going to chime in, let me know." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/234938 (owner: BearND)
[21:02:20] nuria: I don't think we made the decision to use DNT that way yet. It's being discussed in the scope of unique ids, but I haven't heard one way or another
[21:02:28] milimetric: k
[21:07:38] madhuvishy: can i run EL tests on master?
[21:08:28] mforns: do we deploy dashiki to a different domain now?
[21:08:39] nuria: yes
[21:08:42] nuria: yeah I think so, let me check once :)
[21:08:52] it lives in /var/lib/dashiki on limn1
[21:08:55] milimetric: can you paste the url?
[21:09:05] and we are about to puppetize the setup of all the dashboards under it
[21:09:26] nuria: there are two, and there's about to be another
[21:09:30] https://vital-signs.wmflabs.org/#projects=ruwiki,itwiki,dewiki,frwiki,enwiki,eswiki,jawiki/metrics=
[21:09:32] oops
[21:09:35] https://vital-signs.wmflabs.org/
[21:09:41] here is the task: https://phabricator.wikimedia.org/T110351
[21:09:44] and https://edit-analysis.wmflabs.org/compare/
[21:10:34] milimetric: did we merge the code for VE back into dashiki?
[21:10:52] yes, dashiki is fully modular and everything
[21:11:03] it all works on a standard data format called TimeseriesData
[21:11:14] milimetric: nice!
[21:11:16] all the data converters convert to that and all the visualizers understand it
[21:11:28] milimetric: did we also put in stats to count usage?
[21:11:45] nuria: yes, there's actually a --piwik flag when you build with gulp now
[21:11:52] you tell it what piwik to connect to
[21:12:07] milimetric: i see, is it the piwik on labs?
[21:12:18] piwik.wmflabs.org has vital signs, wikimetrics, and VE dashboard data
[21:12:20] yes
[21:14:46] madhuvishy: let me know if you can run tests for EL on master.
[21:16:19] nuria: yeah, I'm just running them
[21:17:11] madhuvishy: with tox -e py27?
[21:17:15] in the test directory?
[21:17:55] nuria: yeah. i do get 1 error, need to look into it
[21:18:18] madhuvishy: ok, no worries, just was making sure my setup still worked
[21:19:04] cool, yeah it should be the same, although if you want to test on vagrant for things like the processor, you need to have zookeeper and kafka running
[21:19:54] madhuvishy: ok, can you get those two running on vagrant?
[21:20:19] nuria: yup! let me find how i did it
[21:21:20] Analytics-Kanban: Use formatversion=2 API to fetch EventLogging schemas - https://phabricator.wikimedia.org/T110450#1594780 (Milimetric)
[21:21:37] nuria: here - http://docs.confluent.io/1.0/quickstart.html I used confluent - which is a system that the Kafka team built, it comes with kafka and zookeeper so it's pretty easy to install using this
[21:21:40] nuria: brb
[21:22:15] madhuvishy: ok
[21:41:12] nuria: doing the first 3 steps should get you set up and running kafka and zookeeper
[21:55:03] Analytics-Kanban, Patch-For-Review: Use formatversion=2 API to fetch EventLogging schemas - https://phabricator.wikimedia.org/T110450#1594941 (Milimetric) @Legoktm: this new patch fixes the other place where EL gets schemas (the python processor that runs on eventlog1001.eqiad.wmnet). We'll merge and depl...
[22:06:44] Analytics-EventLogging, MediaWiki-API, Patch-For-Review: Mediawiki API is returning empty strings for 'required' boolean fields - https://phabricator.wikimedia.org/T97487#1594989 (Milimetric) The solution in https://phabricator.wikimedia.org/T110450 is probably better than any work-around here. We ca...
[22:08:38] Analytics-Kanban, Patch-For-Review: Use formatversion=2 API to fetch EventLogging schemas - https://phabricator.wikimedia.org/T110450#1594991 (Milimetric) By the way, for history's sake, this is the schema revision that caused these problems: https://meta.wikimedia.org/w/index.php?title=Schema:CentralNotic...
[22:08:48] Analytics-Kanban, Patch-For-Review: Use formatversion=2 API to fetch EventLogging schemas - https://phabricator.wikimedia.org/T110450#1595000 (Milimetric) a:Milimetric