[02:34:55] Analytics, Operations: stat1004 doesn't show up in ganglia - https://phabricator.wikimedia.org/T141360#2497823 (Dzahn) p:Triage>Normal
[02:35:38] Analytics, Operations: stat1004 doesn't show up in ganglia - https://phabricator.wikimedia.org/T141360#2495588 (Dzahn) a:Dzahn
[06:35:19] Analytics-EventLogging, DBA, ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2497306 (jcrespo) This means dropping: ``` ImageMetricsLoadingTime_10078363 ImageMetricsCorsSupport_10884476 ImageMetricsCorsSupport...
[06:40:21] Analytics-EventLogging, DBA, ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2497306 (jcrespo)
[07:29:40] Analytics-EventLogging, DBA, ImageMetrics: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#2498084 (jcrespo) p:Triage>Normal
[10:26:48] joal: changed https://grafana.wikimedia.org/dashboard/db/aqs-cassandra-system, I switched the two sstable/disk size legends to show current values (and not avg) + I set the standard view to 24hrs
[10:26:53] hope that it is fine
[10:27:09] moreover I am going to merge the auth caching code review after lunch
[10:27:19] the first step will be to merge the cassandra class change
[10:27:29] then we'll need to update hiera
[10:27:40] I am wondering if this setting can be added on the fly
[10:28:42] mmmm not with nodetool apparently
[10:28:53] anyhow, we'll need to restart cassandra on aqs100[123] as a final step
[12:07:47] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinksChange - https://phabricator.wikimedia.org/T115119#2498526 (Samwalton9)
[12:12:22] milimetric: Hi :)
[12:13:06] elukey: sorry, I didn't notice the pings
[12:13:47] elukey: graphs are fine, yes, and great for the old-aqs puppet patch!
[12:17:27] :)
[12:17:52] elukey: as expected, loading this month took longer (hive computation in between loading)
[12:18:21] how much longer? (curious)
[12:18:37] Number of compactions is higher than in previous loads, so it seems preparing data is best practice for monthly loading (but I'd rather have another go with prepared-data loading to check :)
[12:19:24] joal: but are we still using hadoop to prepare data first or are we using the "old" way?
[12:19:28] I am missing something
[12:19:50] 3rd month load: ~18h - 4th month load: 23h
[12:20:00] elukey: data is always prepared in hadoop
[12:20:24] elukey: the difference is, for months 1/2/3, I had monthly data pre-prepared, so we were just loading cassandra
[12:20:35] ahhhhh okok
[12:20:46] for this 4th month, data was not pre-prepared, so I prepared it on the fly (daily) and then loaded
[12:21:05] I need to check that part, I am ignorant about it
[12:21:05] elukey: I have pre-prepared months 5 and 6 :)
[12:21:27] elukey: then we need to prepare and load months -1 to -6
[12:21:57] elukey: I started to load from jan-2016, so we'll need to backfill 2015 as well (hidden meaning of --^)
[12:22:57] sure sure
[12:25:41] Analytics-Cluster, Analytics-Kanban, Deployment-Systems, scap, and 2 others: Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2498547 (elukey) @thcipriani sorry to bother you again, but we were wondering what would be the best way to migrate the repo. Afaiu merging...
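For context on T141407 above: the drop would amount to DDL like the following, assuming the usual EventLogging MySQL log database. This is a hedged sketch, not the DBA runbook; the third ImageMetricsCorsSupport_* table name is truncated in the bot excerpt, so it is left out rather than guessed.

```sql
-- Sketch only: run against the EventLogging store, after a backup/archive.
DROP TABLE IF EXISTS ImageMetricsLoadingTime_10078363;
DROP TABLE IF EXISTS ImageMetricsCorsSupport_10884476;
-- A further ImageMetricsCorsSupport_* table is truncated in the excerpt above.
```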
[12:28:02] Analytics, Research Ideas: Wikipedia main content losts sources because too reverts, try to preserve them - https://phabricator.wikimedia.org/T141177#2498572 (Rical) Thanks for helping me, I'm a newbie with projects! You have my confidence to act for me if I make a mistake!
[12:31:46] Analytics, Analytics-Cluster, Operations: Audit Hadoop worker memory usage. - https://phabricator.wikimedia.org/T118501#2498581 (elukey) Open>Resolved We have taken several steps to improve the situation during these months: 1) better monitoring of the Hadoop JVMs - https://grafana.wikimedia...
[12:33:56] Analytics-Cluster, Graphite: kafka statsd metrics rate calculation yields double underflow - https://phabricator.wikimedia.org/T97277#2498592 (elukey)
[12:33:59] Analytics, Operations: Jmxtrans failures on Kafka hosts caused metric holes in grafana - https://phabricator.wikimedia.org/T136405#2498590 (elukey)
[12:37:23] Analytics-Cluster: Audit hyperthreading on analytics nodes. - https://phabricator.wikimedia.org/T90640#1063839 (elukey) @Ottomata can we close this task or do you want to follow up more? (Not sure if having an automated script that polls ht usage and alerts if not set would be worth the time)
[12:40:36] Analytics-Cluster, Operations: Install hadoop-lzo on cluster - https://phabricator.wikimedia.org/T89290#1032492 (elukey) @Ottomata still worth doing this or not?
[12:45:10] Analytics, Analytics-Cluster: Audit kernel version on analytics worker nodes - https://phabricator.wikimedia.org/T109834#2498626 (elukey) Open>Resolved > elukey@neodymium:~$ sudo -i salt -E 'analytics10(2[89]|3[0-9]|4[0-9]|5[0-7]).eqiad.wmnet' cmd.run 'uname -a' shows only > 3.13.0-91-generic #...
[12:54:43] hi team!
[12:55:20] Hey mforns !!
[12:55:21] Hi :)
[12:55:31] hello joal :]
[12:55:50] mforns: o/
[12:56:17] hi elukey !
[12:56:50] how are youuu
[13:03:53] hey joal
[13:04:04] wow, that was a quick mforns showup :)
[13:13:58] joal: permissions_validity_in_ms: 30s, what do you think? (currently 2s)
[13:14:29] elukey: I'd say at least that ... Maybe even more (I can't think of any reason it would be bad)
[13:14:52] yeah I was trying to think about why not something like 10 minutes
[13:15:11] let's use 30s for the moment to see what changes
[13:15:18] maybe 60s
[13:15:22] and then we'll increase if needed
[13:15:25] elukey: a value like that would be a good test to check if the thing changes a lot on loading
[13:15:26] sounds good?
[13:15:31] sounds good
[13:15:40] super
[13:17:33] milimetric: I have found a weird behaviour in the scala algo
[13:17:36] ok
[13:17:45] wanna chat?
[13:17:49] sure!
[13:17:56] can I too?
[13:18:08] Hey mforns, please come in :)
[13:18:11] :]
[13:27:52] joal: merged the 60s change, ready to restart cassandra one at a time
[13:35:42] all right, aqs1001 has been restarted
[13:35:59] I'll wait a bit to make sure that everything is ok and then I'll restart aqs100[23]
[13:58:04] Analytics-Cluster, Operations, ops-eqiad: kafka1013 hardware crash - https://phabricator.wikimedia.org/T135557#2498758 (elukey) Open>Resolved The nf_conntrack issue has been tracked in https://phabricator.wikimedia.org/T136094 We haven't seen any more recurrences of this strange issue and we...
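For reference, permissions_validity_in_ms is the Cassandra permissions-cache TTL, set in cassandra.yaml in milliseconds, and (per the morning discussion) apparently not adjustable at runtime via nodetool — hence the rolling restarts that follow. A minimal sketch of the change being merged here, assuming the value is rendered into cassandra.yaml through the puppet/hiera path elukey describes:

```yaml
# cassandra.yaml fragment (sketch). How long role/permission lookups are
# cached before Cassandra re-reads system_auth; the default is 2000 (2s).
# Raised to 60s in this change to cut auth-lookup load during AQS loading.
permissions_validity_in_ms: 60000
```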
[13:59:08] working on aqs1002 now
[14:05:33] aaand finally aqs1003
[14:06:51] Analytics-Cluster, Analytics-Kanban, Operations, Patch-For-Review: Build 0.8.2.1 Kafka package and upgrade Kafka brokers - https://phabricator.wikimedia.org/T106581#2498770 (Ottomata)
[14:06:53] Analytics-Cluster: Audit hyperthreading on analytics nodes. - https://phabricator.wikimedia.org/T90640#2498768 (Ottomata) Open>Resolved CLOSED! Thanks!
[14:07:32] Analytics-Cluster, Operations: Install hadoop-lzo on cluster - https://phabricator.wikimedia.org/T89290#2498771 (Ottomata) Open>declined I think not! Unless someone asks for it specifically. Declined.
[14:08:39] Analytics-Kanban: Investigate why cassandra per-article-daily oozie jobs fail regularly - https://phabricator.wikimedia.org/T140869#2498773 (elukey) Next step is to wait some rounds of loading jobs to see if this change had the desired impact.
[14:08:49] Analytics, Analytics-Cluster, Operations: Audit Hadoop worker memory usage. - https://phabricator.wikimedia.org/T118501#2498774 (Ottomata) I think we should close this one too. I think the work you did to up the heap size probably addressed this issue. If we see it again we can reopen. Ja?
[14:10:49] Analytics-Kanban: Capacity projections of pageview API document on wikitech - https://phabricator.wikimedia.org/T138318#2498776 (elukey) @Nuria shouldn't we use https://wikitech.wikimedia.org/wiki/Analytics/AQS ? Even if the pageviews are the bulk of data, the capacity prediction would fit better for AQS. Wh...
[14:13:15] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2498793 (Ottomata) > I don't agree with this argument at all. You can easily take it to extreme and state that since we are...
[14:22:40] milimetric: good morning
[14:22:41] you there?
[14:23:20] morning, yes
[14:23:27] hiiii, qq
[14:23:38] what's the full list of things that can change for a page during a move?
[14:23:46] title, namespace, head rev_id
[14:23:54] is_redirect can't, right?
[14:24:00] if a page is a redirect before a move, it will be after the move, ja?
[14:24:08] anything else?
[14:58:56] Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2498955 (Nuria) >For experiments on "readers", we need to think carefully about how to minimize this problem and talk to other organizations > that do AB testing without a use...
[15:15:15] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2498996 (Ottomata) Ok, I like it. I'm mostly concerned about enforcing consistency and backwards compatibility, and I thin...
[15:26:51] Analytics-Kanban: Capacity projections of pageview API document on wikitech - https://phabricator.wikimedia.org/T138318#2499052 (Nuria) Sounds good.
[16:04:25] Analytics-Kanban: Respawn the schema/field white-list for EL auto-purging {tick} - https://phabricator.wikimedia.org/T135190#2499146 (Nuria) Open>Resolved
[16:04:27] Analytics, DBA, Patch-For-Review: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2499147 (Nuria)
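The 14:23 exchange above pins down an invariant for the edit-history reconstruction work: a page move can change title, namespace, and head rev_id, while redirect status carries over. A hypothetical Python sketch of just that invariant (the PageState type and field names are illustrative, not the actual refinery code):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PageState:
    page_id: int
    title: str
    namespace: int
    head_rev_id: int
    is_redirect: bool  # per the discussion: a move does not change this


def apply_move(state: PageState, new_title: str, new_namespace: int,
               new_head_rev_id: int) -> PageState:
    # Only the three move-affected fields are rewritten; page_id and
    # is_redirect carry over. (A move may also create a redirect page at
    # the old title, but that would be a separate PageState entirely.)
    return replace(state, title=new_title, namespace=new_namespace,
                   head_rev_id=new_head_rev_id)
```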
[17:05:36] Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2499403 (Milimetric) >>! In T135762#2497291, @BBlack wrote: >>>! In T135762#2497082, @ellery wrote: >> As far as I can tell, the proposed method also violates the more importa...
[17:08:11] Analytics-Cluster, Analytics-Kanban, Deployment-Systems, scap, and 2 others: Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2499422 (thcipriani) >>! In T129151#2498547, @elukey wrote: > @thcipriani sorry to bother you again, but we were wondering what would be th...
[17:17:15] (PS3) Nuria: Normalize project parameter [analytics/aqs] - https://gerrit.wikimedia.org/r/299824 (https://phabricator.wikimedia.org/T136016)
[17:18:31] milimetric: my change on format was not due to jshint, sorry about that, I have nevertheless added the rule about line length
[17:31:09] milimetric, mforns: I have pinpointed the non-deterministic issue, but I can't explain it to myself
[17:34:55] a-team - I'm off for today, see yall tomorrow !
[17:35:11] nite
[17:35:56] laters!
[18:14:06] Analytics-Cluster, Analytics-Kanban, Deployment-Systems, scap, and 2 others: Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2499623 (elukey) >>! In T129151#2499422, @thcipriani wrote: > > Sorry for the wall of text, but it is to say: you can get back to your pre...
[18:30:07] Analytics-Cluster, Analytics-Kanban, Deployment-Systems, scap, and 2 others: Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2499656 (Ottomata) > 1. Run puppet on the targets. The main thing that changes here is that the ownership of /srv/deployment/analyti...
[19:01:23] ottomata: i went off your patch #3 and changed just the tests: https://gerrit.wikimedia.org/r/#/c/301148/6/tests/test_service.py
[19:01:44] Analytics-Kanban: User history: rewrite the user history script to use the new algorithm - https://phabricator.wikimedia.org/T141468#2499768 (mforns)
[19:07:37] Analytics-Kanban: EventBus Maintenace: Fork child processes before adding writers - https://phabricator.wikimedia.org/T141470#2499814 (Nuria)
[19:07:57] Analytics-Kanban: EventBus Maintenace: Fork child processes before adding writers - https://phabricator.wikimedia.org/T141470#2499829 (Nuria) a:Ottomata
[19:12:11] Analytics-Kanban: Load edit history data into Druid when data is ready for enwiki - https://phabricator.wikimedia.org/T131786#2499850 (Nuria)
[19:14:19] Analytics-Kanban: Load edit history data into Druid - https://phabricator.wikimedia.org/T131786#2499860 (Nuria)
[19:15:07] Analytics-Kanban: Research spike: load enwiki data into Druid to study whether we need lookup tables - https://phabricator.wikimedia.org/T141472#2499864 (Nuria)
[19:16:04] Analytics-Kanban: Productionize loading of edit data into Druid (contingent on success of research spike) - https://phabricator.wikimedia.org/T141473#2499884 (Nuria)
[19:17:02] Analytics-Kanban: Load edit history data into Druid - https://phabricator.wikimedia.org/T131786#2499907 (Nuria)
[19:17:04] Analytics: Host edit data on Druid for all wikis. - https://phabricator.wikimedia.org/T138269#2499904 (Nuria)
[19:18:00] Analytics-Kanban, Spike: Spike - Slowly Changing Dimensions on Druid - https://phabricator.wikimedia.org/T134792#2499924 (Nuria)
[19:19:56] Analytics-Kanban: Scale MySQL edit history reconstruction data extraction - https://phabricator.wikimedia.org/T134791#2499929 (Nuria)
[19:21:47] Analytics-Kanban, Patch-For-Review: Extract edit oriented data from MySQL for simplewiki - https://phabricator.wikimedia.org/T134790#2499931 (Nuria)
[19:21:51] oh nuria_ interesting
[19:21:59] do you think that's better than delaying the port bind?
[19:22:13] ottomata: yes, otherwise it doesn't work
[19:22:20] nuria_: even with my recent change?
[19:22:27] it worked for me... i thought
[19:22:44] ottomata: tests run, but not the spawning of n processes for me
[19:22:48] oh, really?
[19:22:52] hm
[19:22:58] ok then, i think yours is better
[19:23:04] ottomata: my idea is to fix the issue where the problem is, and your code was good
[19:23:15] ottomata: the problem was when setting up async test cases
[19:23:19] ok
[19:23:34] ottomata: but your 1st stab at it was fine
[19:23:35] nuria_: it'd be good to run some of those tests with multiple processes too
[19:23:50] ottomata: it can be done, i just looked at the docs
[19:24:05] ottomata: but that should be part of a different changeset
[19:24:10] ottomata: i think, no?
[19:25:44] yeah nuria_ that sounds fine
[19:29:35] Analytics-Kanban: Productionize edit history extraction for all wikis - https://phabricator.wikimedia.org/T141476#2499954 (Nuria)
[20:01:42] milimetric: where will we be computing metrics such as pages created, in hive? (seems that would be the case but not sure)
[20:02:21] ottomata: looks like jenkins is flaky https://integration.wikimedia.org/ci/job/tox-jessie/10192/console .. do you know how we can trigger another run?
[20:03:40] nuria_: reportupdater I think
[20:03:48] nuria_: i think it's not flaky
[20:03:49] 19:11:46 ./tests/test_service.py:241:5: E303 too many blank lines (2)
[20:03:50] 19:11:46 ERROR: InvocationError: '/home/jenkins/workspace/tox-jessie/.tox/flake8/bin/flake8'
[20:03:55] too many blank lines on line 241
[20:03:57] in test_service.py
[20:04:07] nuria_: because oozie seems like overkill and reportupdater is so much easier for outsiders to understand and contribute to
[20:04:41] (PS1) Milimetric: [WIP] Analyze external link insertion and deletion [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/301432 (https://phabricator.wikimedia.org/T115119)
[20:06:01] ottomata: ahem, you are right sir
[20:06:23] milimetric: aha, reportupdater on top of hive, correct?
[20:07:04] Analytics: Read mw databases AND dumps (separately) to fill the revision_create schema - https://phabricator.wikimedia.org/T131781#2500065 (Nuria)
[20:07:06] Analytics: Implement Pages Created & Count of Edits full vertical slice - https://phabricator.wikimedia.org/T131779#2500064 (Nuria)
[20:08:00] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinksChange - https://phabricator.wikimedia.org/T115119#2500072 (Milimetric) That patch I just pushed is the query to get you some basic data about...
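For the 19:21-19:25 exchange: the pattern under discussion (T141470, "fork child processes before adding writers") is to bind the listening socket once in the parent, fork the workers, and only then create per-process state, rather than delaying the port bind. A minimal Tornado sketch of that ordering; the handler, route, port, and process count are illustrative, not the actual eventlogging-service code:

```python
import tornado.ioloop
import tornado.web
from tornado.httpserver import HTTPServer
from tornado.netutil import bind_sockets
from tornado.process import fork_processes


class EventHandler(tornado.web.RequestHandler):
    # Hypothetical stand-in for the real event-intake endpoint.
    def post(self):
        self.write({"status": "queued"})


def main():
    sockets = bind_sockets(8085)  # bind once, in the parent process
    fork_processes(4)             # fork BEFORE creating writers/producers
    # From here on we are in a child: per-process state (writers, file
    # handles, the IOLoop) is created fresh and never shared across forks.
    app = tornado.web.Application([(r"/v1/events", EventHandler)])
    server = HTTPServer(app)
    server.add_sockets(sockets)
    tornado.ioloop.IOLoop.current().start()


if __name__ == "__main__":
    main()
```

This ordering is also what makes nuria_'s point about testing tractable: the async test cases exercise a single-process server, so multi-process spawning has to be covered separately, as they agree to do in a follow-up changeset.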
[20:08:30] Analytics: Reportupdater calculations for Pages Created and Edit counts - https://phabricator.wikimedia.org/T141479#2500073 (Nuria)
[20:08:35] nuria_: on top of hive or druid, druid might be more efficient as far as computational resources go, but we'll have to see
[20:09:22] Analytics: Reportupdater calculations for Pages Created and Edit counts - https://phabricator.wikimedia.org/T141479#2500073 (Nuria) We need to research whether actual metric computations will be done on hive or druid.
[20:10:06] milimetric: ok, i have created subtasks and renamed some and closed others, this is how things look now: https://phabricator.wikimedia.org/T130256
[20:11:38] great nuria, thanks for doing that, gives us a good roadmap
[20:11:52] milimetric: BTW, on the A/B testing ticket what brandon and I are saying is the same thing
[20:12:04] milimetric: https://phabricator.wikimedia.org/T135762#2499403
[20:12:14] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinksChange - https://phabricator.wikimedia.org/T115119#2500090 (Samwalton9) @Milimetric, @Sadads, and I took a look at the logs so far and they loo...
[20:12:20] milimetric: right? as far as i can see, it's just the bucket sizes which are different
[20:12:43] milimetric: let me know if you think i am missing something big time
[20:13:51] nuria_: I thought brandon meant there are two layers of buckets, one is used for randomizing
[20:14:05] so when a new experiment comes in, and we want to run it on 10% of the people
[20:14:19] we always have 1000 buckets which get assigned roughly the same way
[20:14:35] (some people fall out of the buckets but maybe they have a good chance of falling back into the bucket)
[20:15:20] then we take 10% of those 1000 buckets (100 buckets) randomly and run an experiment on them.
[20:15:54] oh wait... maybe I'm missing something
[20:16:56] no, that's right, so we have two layers of bucketing, one transient on top of one stable
[20:17:33] milimetric: mmm, i think his take is just a way to ensure not randomness but rather that two experiments are not run on user X at the same time. Randomness comes from bucket assignation already.
[20:17:55] milimetric: anyways, we will see replies that might clarify this further
[20:18:29] the way I understood it is to ensure that, for example, buckets 1-50 do not get used for multiple experiments together
[20:19:11] since we will never have 1000 experiments (for example), having a thousand buckets should do it
[20:19:14] so take bucket 5 for example. If it's used for experiment A with all the other buckets 1-50, then when experiment B comes, we should use 5, 60, 74, 90, 980, etc. instead of 1-50
[20:21:35] Analytics-Kanban: Wikistats 2.0. Edit Reports: Setting up a pipeline to source Historical Edit Data into hdfs {lama} - https://phabricator.wikimedia.org/T130256#2500111 (Nuria)
[20:21:49] anyway, yeah, we'll see other replies, but I appreciate the problem, it's important that we solve it somehow.
[20:24:34] milimetric: no, i just do not see the problem; with the number of experiments we do and the size of our user base, i do not think we need to go beyond the basics, especially since our bucketization algorithm depends on cookies and will be as sticky as cookies can be.
[20:24:43] milimetric: sorry
[20:24:46] milimetric: retrying
[20:25:54] milimetric: I do not see the problem of users being affected by different experiments, and thus experiments rendering biased results, given our userbase and the number of experiments.
[20:35:12] sure, we don't run too many experiments right now
[20:35:19] but once we give people this tool, there's no way to predict
[20:35:57] this solution is like building a highway system before you switch people from horses to cars. Sure, it's a bit premature, but it's still good planning
[20:36:08] and the system would be harder to change later
[20:37:03] but even if we run two experiments on the same set of buckets, say 1-50, then the same set might get a really great improvement followed by a really bad idea, or vice versa, and that definitely affects their behavior in the second experiment
[20:37:04] milimetric: our number of experiments is capped by developer time and engineers though, and given our architecture these are all FE web experiments
[20:37:37] yeah, there will be lots of them, I guarantee that
[20:37:53] at one point or another, everyone in the org has asked me if we have an A/B testing framework
[20:39:10] milimetric: sure, but that still doesn't put the number of experiments we run (or wanted to run) *ever* even close to 100
[20:39:28] milimetric: we do not have to take this decision now, of course
[20:51:19] Analytics, DBA, Patch-For-Review: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2500140 (Neil_P._Quinn_WMF) @Nuria, @mforns, I have updated the documentation at [wikitech:Analytics/EventLogging](https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Data_ret...
[20:53:53] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2500150 (Ottomata) In order to support nested entity based schemas without losing context, I've made a change to the way ev...
[20:54:23] nuria_: you have jenkins success!
[20:56:54] nuria_: you are my favorite eventlogging reviewer
[20:57:01] i have another one for ya, just added you as reviewer :)
[21:48:34] bye team, see you tomorrow!
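The two-layer bucketing idea from the 20:13-20:19 exchange is easy to pin down in code: a stable bottom layer of 1000 buckets keyed on a sticky cookie token, and a transient top layer that gives each experiment its own random subset of buckets, so that buckets 1-50 are not reused wholesale by consecutive experiments. A hypothetical Python sketch of that scheme (names, hash choice, and the cookie token are illustrative; this is not a proposed WMF implementation):

```python
import hashlib
import random

NUM_BUCKETS = 1000  # stable bottom layer, as in the discussion


def stable_bucket(cookie_token: str) -> int:
    """Hash a sticky per-user token into one of the 1000 stable buckets."""
    digest = hashlib.sha256(cookie_token.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def buckets_for_experiment(experiment_name: str, fraction: float) -> set:
    """Transient top layer: each experiment draws its own random subset of
    buckets (seeded by name, so the draw is reproducible), instead of every
    experiment grabbing the same contiguous range like 1-50."""
    rng = random.Random(experiment_name)
    k = round(NUM_BUCKETS * fraction)
    return set(rng.sample(range(NUM_BUCKETS), k))


def in_experiment(cookie_token: str, experiment_name: str, fraction: float) -> bool:
    return stable_bucket(cookie_token) in buckets_for_experiment(experiment_name, fraction)


# Example: a 10% experiment samples 100 of the 1000 stable buckets, e.g.
# {5, 60, 74, 90, 980, ...} rather than 1-50 again.
if __name__ == "__main__":
    print(in_experiment("some-cookie-value", "experiment-B", 0.10))
```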
[22:14:13] (PS1) Addshore: Add script to track wikidata dump downloads [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301504 (https://phabricator.wikimedia.org/T119070)
[22:24:51] (PS2) Addshore: Add script to track wikidata dump downloads [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301504 (https://phabricator.wikimedia.org/T119070)
[22:25:04] (PS1) Addshore: Add script to track wikidata dump downloads [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301506 (https://phabricator.wikimedia.org/T119070)
[22:25:54] (CR) Addshore: [C: 2] Add script to track wikidata dump downloads [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301504 (https://phabricator.wikimedia.org/T119070) (owner: Addshore)
[22:25:57] (CR) Addshore: [C: 2] Add script to track wikidata dump downloads [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301506 (https://phabricator.wikimedia.org/T119070) (owner: Addshore)
[22:26:10] (Merged) jenkins-bot: Add script to track wikidata dump downloads [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301504 (https://phabricator.wikimedia.org/T119070) (owner: Addshore)
[22:26:13] (Merged) jenkins-bot: Add script to track wikidata dump downloads [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301506 (https://phabricator.wikimedia.org/T119070) (owner: Addshore)
[22:33:37] (PS1) Addshore: dumpDownloads - use log dir from config [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301512 (https://phabricator.wikimedia.org/T119070)
[22:37:42] (PS1) Addshore: Throw exceptions on non existant config keys [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/301513
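The last change above ("Throw exceptions on non existant config keys") describes a common hardening pattern: fail loudly on a missing config key instead of silently returning a default. The language of the analytics/wmde/scripts repo is not visible in this log, so the following is only a minimal, hypothetical Python sketch of the pattern:

```python
class Config:
    """Wrap a parsed config mapping and refuse to serve unknown keys."""

    def __init__(self, values: dict):
        self._values = values

    def get(self, key: str):
        # Raising here surfaces typos and missing settings at startup,
        # rather than letting a silent None propagate into the scripts.
        if key not in self._values:
            raise KeyError("config key does not exist: %s" % key)
        return self._values[key]


# Usage: Config({"log_dir": "/var/log/wmde"}).get("log-dir") raises KeyError,
# flagging the misspelled key immediately.
```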