[00:41:12] Quarry, Security, Vuln-MissingAuthz: Quarry: Query edit restriction is enforced in UI, not API - https://phabricator.wikimedia.org/T134699#2494444 (Bawolff) Open>Resolved a:Bawolff per yuvi on irc, this can be public now
[00:41:31] Quarry, Security, Vuln-MissingAuthz: Quarry: Query edit restriction is enforced in UI, not API - https://phabricator.wikimedia.org/T134699#2494447 (Bawolff)
[07:13:52] Analytics-Kanban: Eventbus POST event failures after kafka 0.9 upgrade - https://phabricator.wikimedia.org/T141336#2494752 (elukey)
[07:19:42] ahhh this one is related to https://phabricator.wikimedia.org/T138265#2493664
[07:21:22] Analytics-Kanban: Eventbus POST event failures after kafka 0.9 upgrade - https://phabricator.wikimedia.org/T141336#2494771 (elukey) Related https://phabricator.wikimedia.org/T138265#2493664
[08:20:35] elukey: o/
[08:21:33] morning!
[08:21:49] I changed a bit https://grafana.wikimedia.org/dashboard/db/aqs-cassandra-compaction
[08:21:55] the first two graphs
[08:22:27] (PS1) Addshore: betafeatures, stop using temp table [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301074
[08:22:27] elukey: Just saw that :)
[08:22:32] super :)
[08:22:55] elukey: for what purpose? Having the textual number below?
[08:24:08] yep, also sorted
[08:24:23] elukey: sorry, wrongly said: graphs look good, main difference is about numbers being written
[08:24:27] elukey: great :)
[08:24:47] elukey: caffeine level still too low ;)
[08:25:05] morning joal! Any luck with deploying the ooozzie job?
[08:25:24] elukey: compactions will be done sometime this morning, I'll wait for your restart before starting a new loading job
[08:25:27] addshore: Hi !
[08:25:49] joal: super! I saw that the third month seems to be taking more or less as long as the second one
[08:25:49] addshore: I wanted to apologize, I just completely forgot to deploy yesterday, and I'm doing it now :)
[08:25:52] good news?
[08:25:59] elukey: I hope so !
[08:26:05] \o/
[08:26:20] joal: I also completely forgot yesterday, until about 9pm ;)
[08:26:21] elukey: I'm trying to make sense in my head of the data-size pattern at load time
[08:26:31] addshore: ok, not too bad then
[08:26:35] addshore: doing it now
[08:26:39] awesome!
[08:27:05] elukey: my question is about sstable levels (since we are using leveled compaction)
[08:29:36] addshore: maaaan, we're unlucky
[08:30:09] addshore: I think the gerrit upgrade yesterday changed some stuff with the new deploy code
[08:30:19] addshore: config for deploy is not looking good
[08:30:26] hahahaaaaa :D
[08:30:43] addshore: I need to wait for madhuvishy to help with that (she's the one who wrote that code, and we're not familiar with it)
[08:30:59] that's fine :) I'll try and remember to remind you if you forget! :)
[08:31:26] addshore: sounds good, my brain is full of holes ;)
[08:32:13] joal: as in the maven release form shows the wrong version?
[08:32:19] madhuvishy: hey !
[08:32:23] madhuvishy: not asleep ?
[08:32:31] womaaaaaan
[08:32:38] madhuvishy: correct, versions are NaN
[08:32:41] Hey :) I just got off a flight and happen to be awake
[08:32:57] You can just fill in the right numbers
[08:32:58] Or
[08:33:07] You can trigger a build
[08:33:07] madhuvishy: I didn't mean to spy, just that it's probably very late for you :)
[08:33:18] joal: https://grafana.wikimedia.org/dashboard/db/kafka (bottom graphs)
[08:33:20] From the build with parameters thing on the side
[08:33:23] madhuvishy: o/
[08:33:43] Basically, the parameters are populated based on the job's workspace
[08:34:05] And the workspace is either empty, or is outdated
[08:34:35] Triggering a build updates it. No matter what, if you enter the right version numbers that are desired, it will work
[08:34:47] Because the release job will handle scm
[08:34:59] madhuvishy: If I trigger a build, it'll update the workspace values, and then I'll be able to perform the release, right?
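On joal's question about sstable levels under leveled compaction (08:27:05): a minimal sketch of how much data each level is expected to hold. It assumes the default 160 MB `sstable_size_in_mb` and a fixed fan-out of 10 between levels; the function names are hypothetical and the AQS tables' real compaction options should be checked before reading anything into the numbers.

```python
# Hedged sketch: approximate per-level target sizes under Cassandra's
# leveled compaction strategy (LCS). Assumes the default 160 MB sstable
# size and a fan-out of 10; these are illustrations, not measured values.


def level_capacity_mb(level: int, sstable_mb: int = 160, fanout: int = 10) -> int:
    """Target size of level `level` (L1 and above) in MB."""
    if level < 1:
        raise ValueError("L0 has no size target here; use level >= 1")
    return sstable_mb * fanout ** level


def levels_needed(data_mb: int, sstable_mb: int = 160, fanout: int = 10) -> int:
    """Lowest level whose target size can hold `data_mb` of data."""
    level = 1
    while level_capacity_mb(level, sstable_mb, fanout) < data_mb:
        level += 1
    return level


if __name__ == "__main__":
    for size_mb in (1_000, 50_000, 500_000):
        print(size_mb, "MB ->", "L%d" % levels_needed(size_mb))
```

Because each level is ten times larger than the one below it, loading another month of data can push a table into a new, much larger level, which is one possible reason load-time data size does not grow linearly.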
[08:35:10] It just sometimes doesn't know the right thing, based on the current state of the workspace
[08:35:16] Correct
[08:35:26] You don't even have to wait for it to finish
[08:35:32] madhuvishy: And, triggering a build doesn't break anything, right?
[08:35:37] You can kill it as soon as it starts
[08:35:46] ok makes sense
[08:35:49] trying that
[08:35:57] Nope, all it will do is mvn clean package, if it finishes all the way
[08:36:08] madhuvishy: I'll let it try to finish :)
[08:36:36] madhuvishy: Thanks a lot, things get clearer every time I use it :)
[08:36:50] It's just pointlessly packaging ;)
[08:37:05] In the first 5 seconds it will pull from gerrit and update the workspace
[08:37:10] * joal likes pointless stuff :)
[08:37:11] That's all you really need
[08:37:13] Ha ha
[08:37:16] Okay :)
[08:38:14] joal: oozie stopped again...
[08:38:15] :/
[08:38:18] madhuvishy: How is it going in labs? Making progress on notebooks for spark?
[08:38:26] elukey: mooooooh :(
[08:39:37] elukey: I think it's related to the number of requests - when there are 'too many' requests (even small ones), loading breaks
[08:39:56] the number of requests was gently growing last night
[08:40:22] madhuvishy: IT WORKED !!!
[08:40:25] it is definitely related to timeouts
[08:40:25] READ messages were dropped in last 5000 ms: 10 for internal timeout and 0 for cross node timeout
[08:40:28] madhuvishy: Thanks a mil
[08:40:45] but the req/s increase does not look big enough to cause timeouts
[08:41:06] elukey: old aqs is VERY sensitive to reads
[08:41:31] !log Deploying refinery-source using Jenkins
[08:41:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[08:41:53] com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses
[08:42:01] elukey: also about the kafka charts, you want to better understand GC on the JVM, right?
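The dropped-messages line quoted at 08:40:25 is easy to turn into numbers that could be graphed next to the oozie failures. A minimal sketch, assuming the wording is exactly as quoted above (other Cassandra versions may phrase it differently); `parse_dropped` is a hypothetical helper name.

```python
import re
from typing import Optional

# Hedged sketch: extract counters from Cassandra's periodic
# "messages were dropped" log line. The pattern is copied from the line
# quoted in this log and may not match other Cassandra versions.
DROPPED_RE = re.compile(
    r"(?P<verb>\w+) messages were dropped in last (?P<window_ms>\d+) ms: "
    r"(?P<internal>\d+) for internal timeout and "
    r"(?P<cross_node>\d+) for cross node timeout"
)


def parse_dropped(line: str) -> Optional[dict]:
    """Return the verb and integer counters, or None if the line doesn't match."""
    match = DROPPED_RE.search(line)
    if match is None:
        return None
    counts = {k: int(v) for k, v in match.groupdict().items() if k != "verb"}
    counts["verb"] = match.group("verb")
    return counts


if __name__ == "__main__":
    sample = ("READ messages were dropped in last 5000 ms: "
              "10 for internal timeout and 0 for cross node timeout")
    print(parse_dropped(sample))
```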
[08:42:15] well my first goal was to have graphs :P
[08:42:22] we didn't have G1 metrics before yesterday
[08:42:28] elukey: ok
[08:42:33] joal: awesome! In retrospect, running a build and making sure it succeeds might be a good pre-step to add to the deployment
[08:42:41] elukey: I like the motto: if it exists, graph it !
[08:42:45] but it would be great to see if we could improve the GC settings
[08:42:48] even in hadoop
[08:42:57] madhuvishy: I'll update the docs :)
[08:43:21] As for labs, going great - I'm learning about storage and NFS - haven't had more time to work on spark notebooks, should do next weekend sometime :)
[08:43:40] elukey: hmm, while I understand optimisation is good, I have not yet seen issues directly related to GC
[08:44:01] I'll see if I can make the maven command configurable at build time, with mvn clean package as the default, but if you don't wanna wait you can just make it mvn clean
[08:44:02] great madhuvishy :) I'm happy you seem happy :)
[08:44:11] :)
[08:44:26] Now to sleep :)
[08:44:35] bye madhuvishy, thanks again !
[08:44:40] elukey: o/
[08:44:51] np!
[08:46:18] joal: I know what you mean, I am not planning to make changes to disrupt the cluster, don't worry. What I would like to do is observe how, for example, G1 would perform on a single datanode, whether frequent G1 young collections are always expected, or whether there are simple gotchas we are missing that could improve overall performance
[08:46:24] that's all
[08:47:16] elukey: I have no problem with optimising, just thinking of another motto about premature optimisation :)
[08:48:28] well we had OOM errors on the data nodes (and before that with Yarn) resolved by increasing Xmx
[08:48:43] I am not saying that they could have been resolved with GC tuning of course
[08:48:51] but we also need to pay attention to those details
[08:49:11] it is keeping your house clean in my view, not premature optimization
[08:49:27] we have little time for these things but we shouldn't forget them
[08:50:38] elukey: I think those 'details' as you say are very interesting and important, however there is also some other 'cleaning' to do, so as you said, let's not forget this one, but not the others either (I basically mean let's make sure we prioritise as we really want :)
[08:52:33] elukey: But in fairness, I'd love it if you could share your learnings (I'm no expert in GC), I'd love to know more :)
[08:53:02] oh sure there are tons of things to do, I'll follow the team's priorities
[08:53:21] atm I am really ignorant
[08:53:27] elukey: Thanks for the graphs, they'll help to understand :)
[08:54:01] (same thing for AQS, I didn't know that we use G1 in there, maybe a good thing to watch once in a while)
[08:54:43] joal: one weird thing that I noticed in logstash
[08:55:04] elukey: ?
[08:55:07] Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.
at org.apache.cassandra.auth.Auth.selectUser(Auth.java:276) ~[apache-cassandra-2.1.13.jar:2.1.13]
[08:55:36] this is part of an error stack trace, second function call in the order
[08:55:48] looks weird
[08:56:41] precisely at org.apache.cassandra.auth.Auth.selectUser(Auth.java:276) ~[apache-cassandra-2.1.13.jar:2.1.13]
[08:58:21] permissions_validity_in_ms
[08:58:21] (Default: 2000) How long permissions in cache remain valid. Depending on the authorizer, fetching permissions can be resource intensive.
[09:00:39] elukey: interesting !
[09:02:23] elukey: However by default AllowAllAuthorizer is used. Do we use this default?
[09:03:16] I have no idea :P
[09:03:22] huhu
[09:03:24] but it looks like we might have a lead
[09:03:35] so I am going to help Giuseppe with some puppet patches for kafka
[09:03:41] then I'll get back to working on this
[09:03:46] k
[09:08:42] (PS1) Addshore: betafeatures, stop using temp table [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301078
[09:08:46] (CR) Addshore: [C: 2] betafeatures, stop using temp table [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301078 (owner: Addshore)
[09:08:50] (CR) Addshore: [C: 2] betafeatures, stop using temp table [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301074 (owner: Addshore)
[09:08:55] (Merged) jenkins-bot: betafeatures, stop using temp table [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301078 (owner: Addshore)
[09:08:58] (Merged) jenkins-bot: betafeatures, stop using temp table [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301074 (owner: Addshore)
[09:08:59] elukey: Just checked puppet: it seems we use CassandraAuthorizer --> permissions_validity_in_ms is a valid setting !
[09:09:07] elukey: Good catch!!!
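To make the `permissions_validity_in_ms` idea from 08:58:21 concrete, here is a minimal sketch of a time-bounded permissions cache: an entry is served from memory until it is older than the validity window, and only then is the expensive backend lookup repeated. This illustrates the concept only; it is not Cassandra's actual cache code, and `PermissionsCache` and its `fetch` callback are hypothetical names.

```python
import time
from typing import Callable, Dict, Hashable, Tuple

# Hedged sketch of a TTL permissions cache in the spirit of
# permissions_validity_in_ms (default 2000 ms). Not Cassandra's implementation.


class PermissionsCache:
    def __init__(self, fetch: Callable[[Hashable], frozenset],
                 validity_ms: int = 2000,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch                    # expensive lookup, e.g. a system_auth read
        self._validity_s = validity_ms / 1000.0
        self._clock = clock                    # injectable so the behavior can be tested
        self._entries: Dict[Hashable, Tuple[float, frozenset]] = {}

    def get(self, user: Hashable) -> frozenset:
        now = self._clock()
        cached = self._entries.get(user)
        if cached is not None and now - cached[0] < self._validity_s:
            return cached[1]                   # still valid: no backend read
        perms = self._fetch(user)              # missing or expired: reload
        self._entries[user] = (now, perms)
        return perms
```

The trade-off of bumping the setting is visible in the sketch: a longer window means fewer backend reads under load, but permission changes take longer to be noticed.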
[09:11:16] I found this on krypton
[09:11:17] Jul 20 21:21:06 krypton Burrow[819]: 1469049666114289868 [Critical] Cannot write PID file: open /var/log/burrow/burrow.pid: file exists
[09:11:24] burrow seemed down -.0
[09:11:35] elukey: hm, not good
[09:12:00] elukey: I'm completely burrow ignorant, but let me know if there's anything I can do
[09:14:16] addshore: Can you please provide a patch changing the jar version number in coordinator.properties --> 0.0.33
[09:21:36] ooh, joal yes!
[09:22:29] (PS1) Addshore: Use refinery-job-0.0.33.jar in articleplaceholder_metrics [analytics/refinery] - https://gerrit.wikimedia.org/r/301080
[09:22:32] joal: ^^
[09:23:12] # [*authorizor*]
[09:23:12] # Authorization backend, implementing IAuthorizer; used to limit access/provide permissions.
[09:23:15] # If false, AllowAllAuthorizer will be used.
[09:23:17] # If true, CassandraAuthorizer will be used.
[09:23:20] # Else, the value provided will be used.
[09:23:22] # Default: true
[09:23:25] joal: --^
[09:23:32] man I'm not yet used to the new gerrit UI
[09:23:43] elukey: yes, saw that (see my message earlier)
[09:24:38] elukey: I think trying to bump the permissions_validity_in_ms is a very interesting idea :)
[09:25:14] (CR) Joal: [C: 2 V: 2] "LGTM - Merging" [analytics/refinery] - https://gerrit.wikimedia.org/r/301080 (owner: Addshore)
[09:25:31] joal: neither am I, I keep tripping over it!
[09:25:35] Analytics-Kanban: Investigate why cassandra per-article-daily oozie jobs fail regularly - https://phabricator.wikimedia.org/T140869#2495075 (elukey) Noticed this error from logstash: ``` Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - receiv...
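The puppet parameter docs pasted at 09:23 describe a three-way rule for choosing the authorizer. A minimal sketch of that rule, with a hypothetical function name (the real logic lives in the puppet module, not in Python):

```python
# Hedged sketch of the documented `authorizor` resolution rule:
# false -> AllowAllAuthorizer, true -> CassandraAuthorizer,
# anything else is used as given. Default is true.


def resolve_authorizer(value=True) -> str:
    if value is False:
        return "AllowAllAuthorizer"
    if value is True:                 # documented default
        return "CassandraAuthorizer"
    return str(value)                 # e.g. a custom IAuthorizer class name
```

Since the default is `true`, a cluster that never sets the parameter gets CassandraAuthorizer, which matches joal's finding at 09:08:59 that `permissions_validity_in_ms` applies to AQS.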
[09:26:17] !log Deploying refinery
[09:26:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[09:27:04] joal: ah snap I didn't see the comments sorry :/
[09:27:16] elukey: no problemo :)
[09:27:45] elukey: That's a very good catch, it'd be great if it works !
[09:28:31] elukey: I think it involves some changes in puppet (the parameter is not defined, so quite a few changes)
[09:31:59] elukey: I checked system_auth keyspace: replication factor is 3 (ok)
[09:32:29] elukey: But under read pressure, reloading permissions costs a lot
[09:33:18] addshore: code deployed
[09:33:31] addshore: can you remind me when your job needs to start?
[09:33:39] *double checks*
[09:33:57] the 18th of this month please :)
[09:34:44] addshore: 2016-07-18 :)
[09:35:33] addshore: 0040683-160630131625562-oozie-oozi-C
[09:35:40] awesome! :)
[09:35:50] I'm looking forward to closing this ticket :D
[09:36:01] addshore: I can imagine :)
[10:04:29] joal: awesome, it looks like it has all run
[10:04:40] addshore: yes sir
[10:04:42] Will it run again for today once all data is in the table (i.e. tomorrow)?
[10:05:43] addshore: that's oozie's point
[10:06:05] Just making sure it still would, as it has already done a partial run for today and spat some data out!
[10:06:23] addshore: hm, that's not good
[10:06:30] addshore: thanks for noticing
[10:07:01] addshore: Ok, I have found the culprit
[10:07:07] :)
[10:07:40] addshore: originally, you were building an hourly job, then realised you wanted daily data, and we didn't change the data dependency in coordinator.xml
[10:07:59] ahh, okay *looks*
[10:08:24] addshore: You can look in oozie/last_access_uniques/daily/coordinator.properties as an example
[10:08:39] oops sorry addshore coordinator.xml, not properties
[10:08:58] sorry addshore for having missed that :(
[10:09:13] ahhh, in input-events ?
[10:09:18] correct
[10:09:34] the current settings make you wait for the current hour
[10:09:44] while you want to wait for a full day
[10:09:59] this is because the webrequest_text dataset's frequency is hourly
[10:10:23] (PS1) Addshore: Fix input-events for articleplaceholder_metrics [analytics/refinery] - https://gerrit.wikimedia.org/r/301091
[10:10:26] so something like that?
[10:11:23] addshore: yes (you don't need to change the dataset name :)
[10:11:38] (PS2) Addshore: Fix input-events for articleplaceholder_metrics [analytics/refinery] - https://gerrit.wikimedia.org/r/301091
[10:12:29] (CR) Joal: [C: 2 V: 2] "Merging for straight deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/301091 (owner: Addshore)
[10:13:28] !log Re-deploying refinery after bug fix
[10:13:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[10:16:56] addshore: I think your patch is the one I've had the most trouble getting right in the last year :)
[10:17:30] Hahahaa :D I'm sure my next one is going to go much smoother. Many mistakes mean I now have a much greater understanding than if it went right first time ;)
[10:17:41] addshore: Very true :)
[10:18:25] addshore: 0040752-160630131625562-oozie-oozi-C
[10:18:57] addshore: Today's job is waiting for webrequest data to be present (that is what we expect)
[10:19:08] great!
[10:26:23] You certainly have been a great help joal ! :)
[10:26:34] It always helps to have someone in the right timezone too! :D
[10:27:01] addshore: Thanks for saying so :) It is true that being in the same timezone helps :)
[10:29:52] addshore: I also think you've been doing a great job: oozie + spark are no easy systems :)
[10:30:24] hehe, well, I enjoy throwing myself in at the deep end and flailing around a bit ;)
[10:34:55] I created a code review but it's probably wrong, the puppet compiler output doesn't convince me. I'll have a chat with urandom when he has time :)
[10:35:29] elukey: great !
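joal's explanation (10:09:34 to 10:09:59) comes down to this: a daily job that reads an hourly dataset such as webrequest_text has to wait for all 24 hourly instances of its day, not only the current hour. In Oozie terms that is what the input-events range in coordinator.xml controls. A minimal sketch of the dependency check; the partition path layout and function names are hypothetical, loosely modeled on Hive-style year/month/day/hour partitions.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List

# Hedged sketch: the 24 hourly dataset instances a daily run depends on.
# The path format below is an assumption for illustration only.


def hourly_partitions(day: date) -> List[str]:
    start = datetime(day.year, day.month, day.day)
    partitions = []
    for hour in range(24):
        t = start + timedelta(hours=hour)
        partitions.append(f"year={t.year}/month={t.month}/day={t.day}/hour={t.hour}")
    return partitions


def daily_run_ready(day: date, available: Iterable[str]) -> bool:
    """True only when every hour of `day` has landed, not just the first one."""
    have = set(available)
    return all(p in have for p in hourly_partitions(day))
```

With an hourly-style dependency, the job would fire as soon as one hour was present, which matches the partial run addshore noticed at 10:06:05.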
[10:35:53] elukey: Looks like you can restart aqs100[456]
[10:36:18] all right will do
[10:36:19] :)
[10:36:30] thanks elukey !
[10:38:59] Analytics-Kanban: Eventbus POST event failures after kafka 0.9 upgrade - https://phabricator.wikimedia.org/T141336#2495227 (elukey) After restarting the eventbus service no issue has been registered, but I didn't try to send a fake test post to eventbus.
[10:59:52] joal: all instances restarted
[11:00:41] you are free to load the fourth month :)
[11:05:10] elukey: Great ! Thanks :)
[11:09:11] * elukey lunch!
[12:13:49] Analytics-Kanban: Eventbus POST event failures after kafka 0.9 upgrade - https://phabricator.wikimedia.org/T141336#2495340 (elukey) Got another alert now. The problem seems to be coming from: python-kafka: /usr/lib/python2.7/dist-packages/kafka/client.py
[12:24:37] Analytics, Pageviews-API: Count pageviews for all wikis/systems behind varnish - https://phabricator.wikimedia.org/T130249#2495359 (Sadads) Thanks @nuria thats good to know: community programs and events could really use the data coming off these wikis.
[12:25:54] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#2495363 (Sadads) Finally! Yay! Super excited!
[13:00:20] o/
[13:00:36] Hi halfak :)
[13:00:56] I just got back from vacation, so nothing much to talk about re. live systems.
[13:01:29] halfak: If you have 5 minutes, we can update you on the progress on history reconstruction (and say hello :)
[13:01:42] cool
[13:03:18] milimetric:
[13:03:20] here?
[13:04:21] sorry, forgot to say: at the doc this morning and then to Otto's
[13:04:59] no prob milimetric :)
[13:07:04] Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2495413 (BBlack) @Nuria - Thanks, sounds awesome :)
[13:19:17] (CR) Elukey: [C: 2 V: 2] Initial basic configuration for the Refinery Scap repository. [analytics/refinery/scap] - https://gerrit.wikimedia.org/r/299714 (https://phabricator.wikimedia.org/T129151) (owner: Elukey)
[13:35:18] Analytics-Kanban: Eventbus POST event failures after kafka 0.9 upgrade - https://phabricator.wikimedia.org/T141336#2495486 (Ottomata) Hmmmmm.....I'm not sure if this is new, but maybe just newly noticed? Yesterday Petr noticed this too. I went and looked at eqiad production eventlogging-service-eventbus lo...
[13:35:58] (CR) KartikMistry: [C: 2] Add a script for post-processing interlanguage links stats [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/291358 (https://phabricator.wikimedia.org/T139327) (owner: Amire80)
[13:36:04] (Merged) jenkins-bot: Add a script for post-processing interlanguage links stats [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/291358 (https://phabricator.wikimedia.org/T139327) (owner: Amire80)
[13:37:26] good morning ottomata :)
[13:37:41] (CR) KartikMistry: "I just used autopep8, so didn't check for long lines. It looks difficult for long queries." [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/292915 (owner: KartikMistry)
[13:37:47] hi!
[13:38:42] ayeeeee elukey can I explain the state of the kafka python clients to you in batcave?
[13:38:47] maybe you can help me make a decision on what to dooooo
[13:39:16] suuure! I was reviewing your mirror maker patch :)
[13:39:18] looks good
[13:39:38] ah danke!
[13:39:49] in bc, whenev you got a sec
[13:48:17] Analytics-Kanban: Page History: write scala for page history reconstruction algorithm - https://phabricator.wikimedia.org/T138853#2495550 (JAllemandou) Comments from a discussion with @Halfak : - Neil Quinn Could be a good tester for the data we are planning to release (he is currently taking ownership of e...
[13:55:56] (CR) Nikerabbit: [C: 2] pep8 cleanup [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/292915 (owner: KartikMistry)
[13:56:02] (Merged) jenkins-bot: pep8 cleanup [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/292915 (owner: KartikMistry)
[14:07:25] Analytics, Analytics-EventLogging, EventBus: Upgrade eventlogging kafka client used for producing - https://phabricator.wikimedia.org/T141285#2495615 (Ottomata) Ok, after a discussion with Luca today, and because T141336, I'm going to add an extra kafka handler for eventlogging that does not affect t...
[14:40:27] back
[14:55:12] Analytics: stat1004 doesn't show up in ganglia - https://phabricator.wikimedia.org/T141360#2495774 (JAllemandou)
[14:55:55] buuuu ganglia buuuuu
[14:56:53] we have it in https://grafana.wikimedia.org/dashboard/db/server-board
[14:56:58] huhu :)
[14:57:09] * joal likes ganglia :)
[15:15:30] urandom: aloha!
[15:15:57] whenever you have time I'd like to discuss a code review with you
[15:15:59] :)
[15:23:22] * elukey grabs a quick coffee before the meetingzz
[15:25:51] elukey: Yo.
[15:25:58] elukey: Sup?
[15:27:32] o/
[15:27:46] https://gerrit.wikimedia.org/r/#/c/301083/
[15:29:26] just wanted to know if it makes sense
[15:30:49] joal: ops sync?
[15:31:00] ottomata: joining !
[15:32:09] elukey: that changeset is just making a tunable configurable via puppet, which seems totally reasonable to me
[15:32:26] Hi urandom :)
[15:32:27] elukey: not 100% sure why changing it would be necessary
[15:32:30] joal: Hi!
[15:32:40] we're in a meeting but we have a couple of questions for you :)
[15:33:06] joal: Ok :)
[15:38:38] urandom: the rationale would be to reduce the number of auth-related timeouts, I think they are killing the aqs cluster sometimes :(
[15:39:23] elukey: what queries are resulting in those timeouts? and, are you using the Cassandra super-user for regular query authentication by any chance?
[15:41:02] urandom: not sure, I wanted your opinion to establish how to follow up.. basically we see write timeouts that are stopping our regular jobs, and at the same time I can see auth-related errors in logstash
[15:42:30] elukey: i'm just curious because i haven't seen this before.
[15:42:48] elukey: but i think any authentication of the super user requires a larger quorum
[15:42:55] elukey: and so might be slower
[15:43:52] elukey: though, in your case I see "Operation timed out - received only 1 responses.", so I guess you'd have a problem no matter what
[15:46:52] elukey: to be honest, all of this is new in 2.2 (the way authz works), and i haven't looked at it closely
[15:50:31] (PS7) Milimetric: [WIP] Process MediaWiki User history [analytics/refinery/source] - https://gerrit.wikimedia.org/r/297268 (https://phabricator.wikimedia.org/T138861) (owner: Mforns)
[15:51:24] (CR) jenkins-bot: [V: -1] [WIP] Process MediaWiki User history [analytics/refinery/source] - https://gerrit.wikimedia.org/r/297268 (https://phabricator.wikimedia.org/T138861) (owner: Mforns)
[15:52:58] urandom: one weird thing that I noticed from the puppet compiler is the difference between restbase* and aqs/maps - http://puppet-compiler.wmflabs.org/3471/
[15:53:22] that looks a bit weird
[15:55:26] is that weird?
[15:55:26] will also ask Filippo, I am a bit paranoid with the cassandra puppet stuff :)
[15:55:52] not sure, I expected either no diff for all... or the same for all, not half and half
[15:56:02] it feels like that I missed some config
[15:56:13] (scratch the "that")
[15:56:41] maybe it is me being paranoid
[15:57:10] i see, "why just those hosts?"
[15:57:13] yeah, i dunno
[15:57:39] the diff that is there seems to be saying that a new class property exists (true)
[15:57:52] and there doesn't seem to be a config diff (which is good/expected)
[15:58:15] so it's actually the ones that don't show a diff that are weird, isn't it?
[15:58:20] yeah
[15:58:33] and I've included both single and multi-instance aqs
[15:58:43] ya
[16:03:42] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2495990 (JAllemandou) a:JAllemandou>Ottomata
[16:04:31] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#2495992 (Milimetric) The data is accessible from stat1003, if you'd like access to the raw da...
[16:09:08] Analytics, Pageviews-API: Count pageviews for all wikis/systems behind varnish - https://phabricator.wikimedia.org/T130249#2495998 (Milimetric) \o/ (I don't often spam phabricator, but when I do, I'm really excited) :D
[16:13:03] Analytics-Kanban: Page History: write scala for page history reconstruction algorithm - https://phabricator.wikimedia.org/T138853#2496017 (Milimetric) @mforns would be interested in some of that archaeology, and sadly I re-created the regexes for page moves that @Halfak built. But that's ok, that was a smal...
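urandom's point at 15:43:52 follows from quorum arithmetic: QUORUM needs floor(RF / 2) + 1 replica responses, so with system_auth at replication factor 3 (joal's check at 09:31:59) a QUORUM read that hears back from only one replica times out with "received only 1 responses". A minimal sketch; the per-user consistency levels in `auth_read_level` are an assumption based on urandom's remark that super-user authentication needs a larger quorum, and the helper names are hypothetical.

```python
# Hedged sketch of Cassandra quorum arithmetic. Single-datacenter
# simplification: ONE and LOCAL_ONE are both treated as "1 response".


def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1


def required_responses(level: str, replication_factor: int) -> int:
    table = {"ONE": 1, "LOCAL_ONE": 1, "QUORUM": quorum(replication_factor)}
    return table[level]


def read_succeeds(responses: int, replication_factor: int, level: str = "QUORUM") -> bool:
    return responses >= required_responses(level, replication_factor)


def auth_read_level(username: str) -> str:
    # ASSUMPTION: the default "cassandra" super-user is looked up at QUORUM,
    # other roles at LOCAL_ONE; verify against the Cassandra version in use.
    return "QUORUM" if username == "cassandra" else "LOCAL_ONE"
```

Under that assumption, one slow replica is enough to fail a super-user auth lookup at RF 3, while a regular user's lookup would still succeed, which is why urandom asked whether the super-user is used for regular queries.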
[16:37:14] (PS8) Milimetric: [WIP] Process MediaWiki User history [analytics/refinery/source] - https://gerrit.wikimedia.org/r/297268 (https://phabricator.wikimedia.org/T138861) (owner: Mforns)
[16:37:32] joal: ok, that gerrit push was the latest cod
[16:37:34] *code
[16:37:39] it's got the new join in it
[16:37:52] and I filled milimetric.simplewiki_page_history with the result of the latest algorithm
[16:37:56] so you can check against that if you want
[16:37:56] (CR) jenkins-bot: [V: -1] [WIP] Process MediaWiki User history [analytics/refinery/source] - https://gerrit.wikimedia.org/r/297268 (https://phabricator.wikimedia.org/T138861) (owner: Mforns)
[16:49:00] Thanks a lot milimetric !
[16:49:13] milimetric: Will try to merge your changes with what I have
[16:58:41] ottomata: yt?
[17:01:34] a-team: short staff meeting?
[17:01:49] milimetric, ottomata: holaaa
[17:19:30] wikimedia/mediawiki-extensions-EventLogging#574 (wmf/1.28.0-wmf.12 - 750ce2f : thcipriani): The build has errored.
[17:19:30] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/750ce2fa59aa
[17:19:30] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/147535151
[17:23:06] logging off, byeeee o/
[17:26:55] logging off as well a-team
[17:27:00] tomorrow !
[17:39:03] Analytics, Collaboration-Team-Interested, Community-Tech, Editing-Analysis, and 6 others: statistics about edit conflicts according to page type - https://phabricator.wikimedia.org/T139019#2496351 (Neil_P._Quinn_WMF) p:Triage>Normal
[17:39:07] Analytics, Editing-Analysis: Improve performance of graphs visualizing the sessions metric - https://phabricator.wikimedia.org/T130864#2496352 (Neil_P._Quinn_WMF) p:Triage>Normal
[17:53:08] ottomata: yt?
[17:58:35] Analytics, Community-Tech, Pageviews-API, Tool-Labs-tools-Other, and 2 others: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2496435 (MusikAnimal)
[18:01:44] nuria_: ja talking to dan
[18:01:52] halfak: hiii, yt?
[18:02:15] ottomata: k, lemme know when you have a min
[18:06:14] nuria_: jaa wazzzup?
[18:06:39] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#2496479 (Samwalton9) It would be useful to just check the data in order to verify that it's w...
[18:06:50] ottomata: two questions: 1) change about sighup: we wnat it to work in case of kill -1 only, correct?
[18:07:05] Analytics, Editing-Analysis, Performance-Team, VisualEditor, Graphite: Statsv down, affects metrics from beacon/statsv (e.g. VisualEditor, mw-js-deprecate) - https://phabricator.wikimedia.org/T141054#2496480 (Jdforrester-WMF) p:High>Triage
[18:07:14] Analytics, Editing-Analysis, Performance-Team, VisualEditor, Graphite: Statsv down, affects metrics from beacon/statsv (e.g. VisualEditor, mw-js-deprecate) - https://phabricator.wikimedia.org/T141054#2485441 (Jdforrester-WMF) p:High>Triage
[18:07:25] Analytics, Editing-Analysis, Performance-Team, VisualEditor, Graphite: Statsv down, affects metrics from beacon/statsv (e.g. VisualEditor, mw-js-deprecate) - https://phabricator.wikimedia.org/T141054#2496485 (Neil_P._Quinn_WMF) p:High>Triage
[18:07:55] Analytics, Editing-Analysis, Performance-Team, VisualEditor, Graphite: Statsv down, affects metrics from beacon/statsv (e.g. VisualEditor, mw-js-deprecate) - https://phabricator.wikimedia.org/T141054#2496489 (Jdforrester-WMF) p:Triage>High
[18:08:51] Analytics, Editing-Analysis, Notifications, Collab-Team-Q1-July-Sep-2016: Numerous Notification Tracking Graphs Stopped Working at End of 2015 - https://phabricator.wikimedia.org/T132116#2496493 (Jdforrester-WMF) p:Triage>Normal
[18:09:06] Analytics, Editing-Analysis, Notifications, Collab-Team-Q1-July-Sep-2016: Numerous Notification Tracking Graphs Stopped Working at End of 2015 - https://phabricator.wikimedia.org/T132116#2188759 (Jdforrester-WMF) p:Normal>Low
[18:09:40] ottomata: *we want it to work
[18:15:23] o/ ottomata
[18:15:27] Was at lunch
[18:15:34] What's up?
[18:16:57] hey halfak, we had a new take on the schemas that we wanted to run by you
[18:17:02] nuria_: correct
[18:17:20] ottomata: ok, then let's merge the change, will do
[18:17:22] halfak: can do here or batcave if you like
[18:17:26] ottomata: needs to be rebased
[18:17:47] I have a meeting in 13 mins.
[18:17:52] Could batcave afterwards.
[18:18:07] It's a short one, so I'll be back in 45 mins.
[18:18:08] k, I'll type it out here and you decide if you wanna chat about it
[18:18:13] kk sounds good.
[18:19:51] each schema would also have metadata that handles any type of change to that entity like CREATE / UPDATE / DELETE [18:20:47] so for page, we could have: (create, A, false), (update, B | oldTitle: A, false), (delete, B, false) [18:21:32] so the schema would always be the same, but we could publish it to separate topics by the verb that represents the change or even more granular (as consumers need it) [18:21:49] it would always be backwards compatible by definition because entities don't lose properties in our world [18:22:42] halfak: let us know what you think after your meeting, "short" description over :) [18:22:55] nuria_: pushed rebased patch [18:23:44] milimetric, interesting idea. What does this gain? Also, I'm curious how PageRestoration will look. [18:24:36] Oh wait.. I thought about it differently and I can see a clear gain. All user-related events would have a "user" in the noun field. [18:24:50] yes [18:25:11] what it gains is it de-couples the data from the specifics of the database and mediawiki code [18:25:38] So, PageRestoration would look like {noun: "page", verb: "restoration", event: { ... event details ... }} [18:25:46] so we don't have to think about the fact that a revision updates a page_is_redirect property for example [18:25:49] yes [18:26:08] milimetric, all events also have a "user" who is the actor and a "comment". [18:26:11] though, we'd use "restore" no? [18:26:12] :) [18:26:17] Maybe we should break those out like noun and verb. [18:26:28] +1 for "restore" [18:26:34] yes, the user that's acting along with important metadata about the user would be included [18:26:41] like user_groups maybe [18:26:57] user_groups at the time of the action or as of current MediaWiki? 
[18:27:04] but the level of de-normalization is always up to the consumers because it's only important for performance
[18:27:16] time of the action is always king in these kinds of representations I think
[18:28:17] IMO having a standard schema of {object/noun, action/verb, user, comment, timestamp, event_data} is highly desirable from a programming/querying perspective.
[18:28:39] yeah, cool. I think it'll be really good especially for new folks who don't know all the ins and outs
[18:28:44] +1
[18:28:51] sweet, thanks very much for thinking it through
[18:29:05] No prob! Thanks for the ping :)
[18:30:48] this gerrit ui, ... *sigh*
[18:31:07] nuria_, but it's Better(TM) ;)
[18:32:07] halfak: sadness... as a designer once told me: "never complain when someone wants to remove an action from a ui". Well, this change is the total opposite of that.
[18:34:47] AFAICT, all of my favorite actions have merely moved somewhere new.
[18:38:55] halfak: in the same crammed-up screen but yes, sure, they are all there
[18:40:01] ottomata: merged, looking at tests to see if i can write one to test the case of the catch-it-all pokemon exception. The next change though (removal of child processes from start) makes some tests fail. i can look at that too
[18:41:02] Analytics, Analytics-EventLogging, Patch-For-Review: Convert EventLogging to use extension registration - https://phabricator.wikimedia.org/T87912#2496563 (Reedy)
[18:41:05] Analytics, Analytics-EventLogging, FileAnnotations, Multimedia, and 2 others: Move efSchemaValidate out of global scope - https://phabricator.wikimedia.org/T140908#2496561 (Reedy) Open>Resolved a:Reedy
[18:53:15] oh, interesting nuria_
[18:53:16] ok
[18:53:24] thanks, i'm busy with schemas today, if you have a sec to look it'd be much appreciated
[18:53:32] ottomata: will do.
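halfak's standard envelope at 18:28:17, {object/noun, action/verb, user, comment, timestamp, event_data}, can be sketched as a small dataclass. Field names and types here are assumptions for illustration, not a published EventBus schema; following the 18:27:16 message, the user snapshot is taken as of the time of the action.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

# Hedged sketch of a uniform event envelope. Not an official schema.


@dataclass(frozen=True)
class EntityEvent:
    noun: str                    # "page", "revision", "user", ...
    verb: str                    # "create", "update", "delete", "restore", ...
    user: Dict[str, Any]         # acting user as of the action, e.g. {"id": 1, "groups": [...]}
    comment: str
    timestamp: str               # ISO 8601, UTC
    event_data: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def now(cls, noun: str, verb: str, user: Dict[str, Any],
            comment: str = "", **event_data: Any) -> "EntityEvent":
        ts = datetime.now(timezone.utc).isoformat()
        return cls(noun, verb, user, comment, ts, dict(event_data))

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


if __name__ == "__main__":
    ev = EntityEvent.now("page", "restore", {"id": 42, "groups": ["sysop"]},
                         "undo vandalism", title="B")
    print(ev.to_dict())
```

Because every event shares the same top-level fields, a query like "all actions by user 42" does not need to know anything about the individual entity schemas.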
[18:53:39] nuria_: that is necessary for confluent-kafka-python, and probably wise in general
[18:53:52] since tornado is forking processes, we shouldn't do things like open connections to kafka in the parent process
[18:54:00] i'm not even really sure how that is working with kafka-python now :/
[18:54:03] ottomata: why not?
[18:54:17] ottomata: normally a fork will duplicate all those in teh child processes, right?
[18:54:25] ottomata: just like a unix fork will do
[18:54:49] ottomata: lemme review how taht worked that though
[18:54:55] *how that worked
[18:55:31] haha, yeah me too, i know the fds get copied, but, i'm not sure how kafka would respond to that
[18:55:53] nuria_: i'm not totally sure there, but i do know that confluent-kafka didn't work without forking first
[18:58:11] ottomata: ok, i can look at tests and we can talk more about this tomorrow, need to look at unix fork to see if connections are shared or rather recreated
[18:59:09] k
[19:00:29] ottomata: ah no, correct, connections need to be created in the child processes looks like
[20:29:44] ottomata: ok, tried for a while to write a test in which the writters throw an exception in send to test the pokemon catching code you committed and gave up as the way i had to set up things my test was not testing very much
[20:33:59] hmmmm
[20:34:10] hm, ok
[20:34:32] yeah that sounds a little hard. hm. you'd probably have to make a special handler that throws an exception.
[20:34:34] hmmmmMMMMMM
[20:34:35] hm
[20:34:40] nuria_: did you try that?
[20:34:43] you coud probably do that in the test.
[20:34:52] ottomata: ya, i did
[20:34:58] @writes('exception')
[20:34:59] def exception_writer:
[20:34:59] throw Exception...
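The test idea at 20:34 (a writer registered only for the test that always raises, used to exercise the catch-all "pokemon" handling) can be made concrete as below. Everything here is a hypothetical stand-in written for this sketch: `writes`, `get_writer` and `send_to_all` may not match the service's actual code. Writers are modelled as primed generator coroutines that receive events through `send()`, and `send_to_all` plays the role of the catch-all around each writer.

```python
import logging
import unittest

log = logging.getLogger("writer-sketch")

_writers = {}  # hypothetical registry: URI scheme -> writer factory


def writes(scheme):
    """Hypothetical decorator that registers a writer factory for a scheme."""
    def register(factory):
        _writers[scheme] = factory
        return factory
    return register


def get_writer(scheme, *args):
    writer = _writers[scheme](*args)
    next(writer)  # prime the generator so it is paused at its first `yield`
    return writer


@writes("exception")
def exception_writer():
    """Test-only writer: raises on every event it receives."""
    while True:
        event = yield
        raise RuntimeError("cannot write %r" % (event,))


@writes("memory")
def memory_writer(sink):
    """Test-only writer: appends every event to a list."""
    while True:
        sink.append((yield))


def send_to_all(writers, event):
    """Send `event` to every writer; count failures instead of propagating them."""
    failures = 0
    for writer in writers:
        try:
            writer.send(event)
        except Exception:  # deliberate catch-all: one bad writer must not stop the rest
            log.exception("writer failed")
            failures += 1
    return failures


class CatchAllTest(unittest.TestCase):
    def test_failing_writer_does_not_block_others(self):
        sink = []
        writers = [get_writer("exception"), get_writer("memory", sink)]
        self.assertEqual(send_to_all(writers, {"id": 1}), 1)
        self.assertEqual(sink, [{"id": 1}])
        # The failed generator is finished, so its next send() raises
        # StopIteration, which the catch-all also absorbs.
        self.assertEqual(send_to_all(writers, {"id": 2}), 1)
        self.assertEqual(sink, [{"id": 1}, {"id": 2}])
```

The test asserts on observable effects (the failure count and the healthy writer's output) rather than on the exception itself, which addresses the "not testing very much" concern from 20:29.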
[20:35:04] ok
[20:35:48] ottomata: i did it bit differently but test did not seem of much substance
[20:36:39] ottomata: regardless i was looking at the newer patchset for which tests fails and they are failing cause all child processes are bind-ed to the same port
[20:37:17] ottomata: they try to bind and only the 1st one succeeds (which makes sense)
[20:37:59] ottomata: ah no, waitttttt
[20:38:47] ottomata: no , no, I am mistaken
[20:39:21] ottomata: tests just need to be updated
[20:44:03] looking
[20:44:17] nuria_: uhhh eh?
[20:44:35] which part are you mistaken about?
[20:45:05] ottomata: that processes are fighting about the port, it just testing setup i think
[20:46:11] hmmmm
[20:46:11] ok
[20:46:24] do the tests specify to run more than one process
[20:46:24] ?
[20:47:02] ottomata: they should run a main unit tests process + http one in background
[20:47:37] ottomata: or http-like
[20:48:42] ottomata: difference is that now tests are running "start" method but before they were not , easy to fix but i'd like to understand a bit what is going on
[20:49:23] Ahhhhhhhh
[20:49:24] i understand
[20:49:36] because they instantiate the Service object, which now instantiates tornado stuff.
[20:49:36] hmm
[20:49:44] makes sense.
[20:50:06] ottomata: right
[20:50:21] ok nuria_ don't worry about it, that makes sense. i didn't run tests for that myself this morning because i was trying to get that pushed and running on beta fast so I could work with dan today
[20:50:25] ottomata: but i bet it can be made to work without doing code changes just for tests
[20:50:33] ja maybe
[20:51:07] ottomata: nah, i want to make sure to test your changes anyways so i can keep looking at this tomorrow
[20:53:27] ok
[20:53:29] cool thanks
[21:11:51] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2497020 (Ottomata) Hey yall, https://gerrit.wikimedia.org/r/301284 should be able capture all of the information that the A...
[21:15:18] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2497034 (Pchelolo) Hm.. I don't really understand the idea here. How are the new schemas better then extending the old ones...
[21:22:38] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#2497042 (Milimetric) Sure, but I can't output everything due to the capsule containing privat...
[21:33:49] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#2497049 (Samwalton9) Sounds great, done.
[21:38:23] (PS1) Milimetric: [WIP] Script loading of edit history [analytics/refinery] - https://gerrit.wikimedia.org/r/301293
[21:42:39] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2497077 (Ottomata) Its more than just DRYing them up. Repetition wasn't a consideration as Dan and I discussed this today....
[21:47:09] Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2497082 (ellery) @BBlack, @Nuria In order to run a randomized controlled experiment, you need to ensure that users are randomly assigned to treatment conditions at the start...
[22:32:50] Analytics, Operations: stat1004 doesn't show up in ganglia - https://phabricator.wikimedia.org/T141360#2497199 (Peachey88)
[22:45:43] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2497228 (Pchelolo) >>! In T134502#2497077, @Ottomata wrote: > With these schemas, you can tell if a particular property has...
[23:09:48] Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2497291 (BBlack) >>! In T135762#2497082, @ellery wrote: > As far as I can tell, the proposed method also violates the more important property that users need to be randomly a...
[23:13:11] Analytics-EventLogging, DBA, ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2497306 (Jdforrester-WMF)
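Two points from the forking discussion earlier in this log (18:53 to 19:00 and 20:36 to 20:50) can be illustrated together. First, client connections such as Kafka producers should be created in each child after the fork rather than inherited from the parent. Second, a listening port is normally bound once in the parent before forking, so children share the inherited socket instead of racing to bind it. Keeping the constructor free of I/O also lets tests instantiate a service without binding anything. This is a Unix-only, standard-library sketch (`os.fork`, `socket`); `Service`, `FakeProducer` and `check_worker` are hypothetical stand-ins, not the EventBus service. Tornado provides `tornado.netutil.bind_sockets` and `tornado.process.fork_processes` for the same bind-then-fork ordering.

```python
import os
import socket


class FakeProducer:
    """Hypothetical stand-in for a Kafka producer; remembers which process built it."""

    def __init__(self):
        self.owner_pid = os.getpid()


class Service:
    """Hypothetical service: __init__ does no I/O; start() binds, then forks."""

    def __init__(self, host="127.0.0.1", port=0, num_processes=2):
        self.host = host
        self.port = port              # 0 lets the OS pick a free port
        self.num_processes = num_processes
        self.sock = None              # nothing is bound until start()

    def start(self, worker):
        """Bind once in the parent, fork workers, wait; return their exit codes."""
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.bind((self.host, self.port))
        self.sock.listen()
        pids = []
        for _ in range(self.num_processes):
            pid = os.fork()
            if pid == 0:              # child process
                code = 1
                try:
                    # The producer is created here, after the fork, so each
                    # child owns its own client instead of a copied one.
                    worker(self.sock, FakeProducer())
                    code = 0
                finally:
                    os._exit(code)    # never fall back into the parent's code path
            pids.append(pid)
        codes = []
        for pid in pids:
            _, status = os.waitpid(pid, 0)
            codes.append(os.WEXITSTATUS(status))
        return codes

    def stop(self):
        if self.sock is not None:
            self.sock.close()
            self.sock = None


def check_worker(sock, producer):
    """Example worker: fails if the producer was built in another process."""
    if producer.owner_pid != os.getpid():
        raise RuntimeError("producer was created before the fork")
    if sock.getsockname()[1] == 0:
        raise RuntimeError("inherited socket is not bound")
```

With this split, a unit test can construct `Service()` and inspect it without opening a port, and only tests that call `start()` pay for binding and forking. Whether the real service can be restructured this way depends on how it wires up tornado; the sketch only shows the ordering.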