[01:03:57] Analytics-Kanban: Investigate duplicate EventLogging rows - https://phabricator.wikimedia.org/T142667#3100661 (Nuria) mmm.. I think these selects are going to work in the case of a major number of duplicates (like the ones we were seeing on popUp schema) but I certainly overcounted duplicates using a simila...
[01:24:15] (PS1) Gergő Tisza: Add test.wikipedia to the pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/342782 (https://phabricator.wikimedia.org/T160484)
[01:32:54] (PS1) Gergő Tisza: Do not filter test.wikipedia.org [analytics/refinery/source] - https://gerrit.wikimedia.org/r/342784 (https://phabricator.wikimedia.org/T160484)
[01:39:32] (CR) jerkins-bot: [V: -1] Do not filter test.wikipedia.org [analytics/refinery/source] - https://gerrit.wikimedia.org/r/342784 (https://phabricator.wikimedia.org/T160484) (owner: Gergő Tisza)
[02:05:03] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Change userAgent field to user_agent_map in EventCapsule - https://phabricator.wikimedia.org/T153207#3100705 (Krinkle)
[02:05:06] Analytics-Kanban, Performance-Team: Update webperf EventLogging consumers for userAgent schema change - https://phabricator.wikimedia.org/T156760#3100703 (Krinkle) Open>Resolved
[02:05:47] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review, Scap (Scap3-Adoption-Phase1): Stop using global python setup.py install for eventlogging deploy - https://phabricator.wikimedia.org/T131263#3100710 (Krinkle)
[02:05:49] Analytics, Analytics-EventLogging, Patch-For-Review, Scap (Scap3-Adoption-Phase1): Use scap3 to deploy eventlogging/eventlogging - https://phabricator.wikimedia.org/T118772#3100711 (Krinkle)
[02:05:53] Analytics, Analytics-EventLogging, Performance-Team: Stop using global eventlogging install on hafnium (and any other eventlogging lib user) - https://phabricator.wikimedia.org/T131977#3100708 (Krinkle) Open>Resolved
[02:09:19] (PS2) Gergő Tisza: Do not filter test[2].wikipedia.org [analytics/refinery/source] - https://gerrit.wikimedia.org/r/342784 (https://phabricator.wikimedia.org/T160484)
[02:14:23] (CR) jerkins-bot: [V: -1] Do not filter test[2].wikipedia.org [analytics/refinery/source] - https://gerrit.wikimedia.org/r/342784 (https://phabricator.wikimedia.org/T160484) (owner: Gergő Tisza)
[02:46:12] (PS3) Gergő Tisza: Do not filter test[2].wikipedia.org [analytics/refinery/source] - https://gerrit.wikimedia.org/r/342784 (https://phabricator.wikimedia.org/T160484)
[02:53:52] (CR) Nuria: "Adding @milimetric, were we asked to remove test.wikipedia before?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/342784 (https://phabricator.wikimedia.org/T160484) (owner: Gergő Tisza)
[02:59:09] Analytics, Pageviews-API: Enable Pageviews API for test.wikipedia.org - https://phabricator.wikimedia.org/T160484#3100464 (MZMcBride) Blacklisted where?
[05:25:34] Analytics, Analytics-EventLogging: Support third-party use by eliminating hard dependency on Varnish - https://phabricator.wikimedia.org/T45601#3100867 (bd808) It seems like this could be done with a SpecialPage in EventLogging itself. The event payload could be emitted as a structured PSR3 log event to...
[06:00:49] Analytics, Pageviews-API: Enable Pageviews API for test.wikipedia.org - https://phabricator.wikimedia.org/T160484#3100915 (Tgr) Links due whenever jenkinsbot sobers up.
[07:06:51] Analytics, Analytics-Cluster: Enable hyperthreading on analytics100[12] - https://phabricator.wikimedia.org/T159742#3100958 (elukey) Yes sure!
[07:09:31] Analytics, DBA, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3100960 (Marostegui) >>! In T156844#3099522, @Ottomata wrote: > Oh yeah, rats, I totally forgot to put this in our budget request. Hm. do db1046 and db1047 host just EL da...
[07:18:27] Analytics, Operations, Patch-For-Review: Remove cronspam from stat1002 to root@ - https://phabricator.wikimedia.org/T145606#3100964 (elukey) Open>Resolved
[08:27:34] * elukey afk for a bit
[08:28:00] joal: will be back in ~one hour, we can deploy / add more instances to aqs beta / other when I am back
[09:26:13] elukey: Hi
[09:36:00] Analytics, Pageviews-API: Enable Pageviews API for test.wikipedia.org - https://phabricator.wikimedia.org/T160484#3100464 (JAllemandou) Currently filtered out of pageviews by definition: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics...
[09:38:58] (CR) Joal: [C: -1] "Thanks for the patch, one nit (see comment inline). However this patch is not enough for test.wikipedia to appear in pageviews (See commen" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/342782 (https://phabricator.wikimedia.org/T160484) (owner: Gergő Tisza)
[09:40:07] (CR) Joal: [C: 2] Do not filter test[2].wikipedia.org [analytics/refinery/source] - https://gerrit.wikimedia.org/r/342784 (https://phabricator.wikimedia.org/T160484) (owner: Gergő Tisza)
[09:41:41] (CR) Joal: [C: -1] "Please disregard comment on pageview definition - I missed the patch you also provided in refinery-source. Comment on date is still valid " [analytics/refinery] - https://gerrit.wikimedia.org/r/342782 (https://phabricator.wikimedia.org/T160484) (owner: Gergő Tisza)
[09:48:10] (Merged) jenkins-bot: Do not filter test[2].wikipedia.org [analytics/refinery/source] - https://gerrit.wikimedia.org/r/342784 (https://phabricator.wikimedia.org/T160484) (owner: Gergő Tisza)
[09:48:28] o/
[09:50:44] elukey: do we add machines to AQS beta?
[09:51:52] joal: I have been thinking if it makes sense and what are the pros/cons
[09:52:41] it is good from the perspective of keeping beta as close as possible to prod
[09:52:59] the default keyspace replication is 3
[09:53:32] and there was also another problem related to having the read quorum that fdans mentioned (didn't get the time to investigate)
[09:54:02] on the other hand, we'll need to keep a "mini" cassandra cluster up and running, configuring puppet accordingly (do we need multiple instances too?)
[09:54:33] * elukey thinks out loud just for clarity
[09:54:42] elukey: I don't think we need multiple instances (as in multiple instances per machine)
[09:58:12] (I am checking the deployment-prep's config)
[09:58:56] (and the restbase config)
[10:08:41] ok I am creating aqs02
[10:08:46] let's see how it goes
[10:09:04] it will take a bit though, since we'll need to add the puppet config in labs for the new cassandra cluster
[10:10:00] (PS2) Gergő Tisza: Add test.wikipedia to the pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/342782 (https://phabricator.wikimedia.org/T160484)
[10:23:38] (CR) Joal: [C: 1] "LGTM - Waiting for the rest of the team to confirm and merge." [analytics/refinery] - https://gerrit.wikimedia.org/r/342782 (https://phabricator.wikimedia.org/T160484) (owner: Gergő Tisza)
[10:25:35] deployment-aqs02 and 03 are building :)
[10:28:54] awesome elukey :) Thanks for that
[10:42:54] hello team :]
[10:46:16] (CR) Mforns: "LGTM! Left 1 comment, but I'm OK with merging as is. Cheers" (1 comment) [analytics/aqs] - https://gerrit.wikimedia.org/r/342601 (https://phabricator.wikimedia.org/T160311) (owner: Joal)
[10:48:44] (CR) Mforns: [C: 1] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/341586 (https://phabricator.wikimedia.org/T160152) (owner: Joal)
[10:51:46] (CR) Mforns: [C: 1] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/342030 (https://phabricator.wikimedia.org/T160153) (owner: Joal)
[10:56:03] (CR) Mforns: [C: 1] Add mediawiki history spark jobs to refinery-job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/325312 (https://phabricator.wikimedia.org/T144717) (owner: Joal)
[10:58:40] (CR) Mforns: [C: 1] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/341030 (https://phabricator.wikimedia.org/T160074) (owner: Joal)
[11:11:52] (CR) Mforns: [C: 1] "LGTM!" (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/328154 (https://phabricator.wikimedia.org/T141473) (owner: Joal)
[11:21:08] (PS1) Elukey: Add new beta aqs instance hostnames to the scap config [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/342815
[11:21:44] (CR) Elukey: [V: 2 C: 2] Add new beta aqs instance hostnames to the scap config [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/342815 (owner: Elukey)
[11:23:08] deploying on aqs02 and aqs03
[11:23:25] cassandra is not working due to a missing jar (the prometheus jmx exporter)
[11:23:30] need to figure out how to deploy it :D
[11:35:10] still trying to get the new instances working, people
[11:35:35] elukey: I'm sure I'll hear from you when you have a working system :)
[11:35:40] there are some scap misunderstandings between me, tin and cassandra
[12:29:18] (PS2) Joal: [WIP] Add oozie job for standard metrics computation [analytics/refinery] - https://gerrit.wikimedia.org/r/342197 (https://phabricator.wikimedia.org/T160151)
[12:33:33] (PS7) Joal: Add oozie job loading MW history in druid [analytics/refinery] - https://gerrit.wikimedia.org/r/328154 (https://phabricator.wikimedia.org/T141473)
[12:54:22] Hi halfak and milimetric - I have nothing special today
[12:55:55] (PS3) Joal: [WIP] Add oozie job for standard metrics computation [analytics/refinery] - https://gerrit.wikimedia.org/r/342197 (https://phabricator.wikimedia.org/T160151)
[13:00:35] o/ joal
[13:01:08] Maybe just a quick chat about the status of sklearn in spark?
[13:01:38] halfak: didn't work on it :(
[13:01:42] I imagine that things might not have changed, but I'm closer to running a big spark job to get historic article quality scores.
[13:01:44] Gotcha.
[13:02:16] halfak: when you say closer, how long before you go for it?
[13:02:52] Hmm... I might be able to give it a try some time this week.
[13:03:02] Might just use multiprocessing and stat1003.
[13:03:18] k halfak
[13:08:58] (working on aqs, there seems to be a broken setting in deployment-prep..)
[13:10:48] I'm around, figured from the above we're not meeting though, right?
[13:12:36] I haven't joined the call, but I can!
[13:16:35] let me know if we actually do folks :)
[13:16:52] halfak, milimetric --^
[13:19:52] taking a break lads
[13:39:13] * elukey commutes to the office
[13:39:25] aqs beta status - still broken but less than this morning :D :D
[13:39:34] need to fix the cassandra seeds and we should be ok
[14:14:39] nice now aqs02 is dying because of OOMs of nodejs -.-
[14:17:01] - seeds: deployment-restbase01.deployment-prep.eqiad.wmflabs,deployment-restbase02.deployment-prep.eqiad.wmflabs
[14:17:04] grrrrrr
[14:19:03] cassandra::target_version: '2.1'
[14:19:06] * elukey cries in a corner
[14:19:37] this is for aqs01
[14:22:28] elukey thank you so much for taking care of this ^_^
[14:23:09] joal: whenever you want we can deploy aqs (I can do it myself if you feel comfortable about it)
[14:30:28] (PS4) Mforns: Add script to generate WSC abbrevs to domain map [analytics/refinery] - https://gerrit.wikimedia.org/r/338786 (https://phabricator.wikimedia.org/T158330)
[14:32:26] elukey: if the prometheus jar is still bugging you - I pinged Eric and pointed him to this channel
[14:32:33] (CR) Mforns: ">" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/338786 (https://phabricator.wikimedia.org/T158330) (owner: Mforns)
[14:33:02] fdans: we're in a meeting with elukey discussing cassandra
[14:33:13] fdans: depending on what goes on, we'll go for it after
[14:36:58] (CR) Mforns: Add spark job to aggregate historical projectviews (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/337593 (https://phabricator.wikimedia.org/T156388) (owner: Mforns)
[14:40:02] gwicke: thanks! Fixed in deployment-prep's puppet config, working now :)
[14:46:05] * urandom waves at elukey
[14:53:24] ottomata: does this minichange LGTY? https://gerrit.wikimedia.org/r/#/c/342205/
[14:56:31] ah fdans ! Found one interesting thing
[14:56:34] aqs::cassandra_default_consistency: localOne
[14:56:40] this is for prod
[14:57:18] but we probably don't set it for beta
[14:57:33] ohh nice
[14:57:46] where is that elukey?
[14:58:44] in puppet..
[14:58:49] so this might be one of the problems
[14:58:51] I'll correct it
[14:59:03] elukey: what's up?
[15:00:15] will be a couple of minutes late to standup, finishing other meeting
[15:00:27] fdans: change looks fine to me, but i don't have a lot of context there
[15:01:04] milimetric: standduppp
[15:01:28] urandom: o/ - a bit of a mess in deployment-prep and I am trying to fix it :)
[15:02:09] elukey: k, let me know if you need anything
[15:05:56] urandom: current status - http://giphy.com/gifs/hulu-mindy-kaling-the-project-lahiri-3oz8xAX8CrZuU91VLO
[15:15:34] Analytics, Analytics-Wikistats: Wikistats stalls with 'Out of memory!' - https://phabricator.wikimedia.org/T160533#3102378 (ezachte)
[15:26:33] Analytics, Analytics-Wikistats: Wikistats stalls with 'Out of memory!' - https://phabricator.wikimedia.org/T160533#3102452 (ezachte) Open>Resolved Both issues fixed in WikiCountsInput.pm and WikiCountsLog.pm
[15:29:38] joal: i gotta start working with services on the librdkafka upgrade
[15:29:46] let's talk version name later today or maybe tomorrow
[15:29:56] sure ottomata - is it a deal breaker?
[15:30:01] deal breaker?
[15:30:20] oh, is it already out there being used?
[15:30:22] this version name
[15:30:22] ?
[15:30:35] ottomata: not used yet - but on the verge of :)
[15:30:40] Present in many places
[15:37:29] Analytics-Kanban, DBA: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3102481 (Nuria) ping @jcrespo Is the solution of renaming tables easier on the database end? that would work great for us too. Please let us know.
[15:38:07] (CR) Nuria: [C: 2] Add spark job to aggregate historical projectviews [analytics/refinery/source] - https://gerrit.wikimedia.org/r/337593 (https://phabricator.wikimedia.org/T156388) (owner: Mforns)
[15:38:47] (CR) Nuria: [C: 2] "Merging these changes, let's make sure to document them in wikitech in case we need to rerun" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/337593 (https://phabricator.wikimedia.org/T156388) (owner: Mforns)
[15:38:49] (CR) Nuria: [V: 2 C: 2] Add spark job to aggregate historical projectviews [analytics/refinery/source] - https://gerrit.wikimedia.org/r/337593 (https://phabricator.wikimedia.org/T156388) (owner: Mforns)
[15:39:00] thanks nuria !
[15:39:05] Analytics-Kanban, DBA: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3102484 (jcrespo) Not sure about all the details of that, but yes, renaming a table is an instant operation, doing a schema change can take up to a week per table and server.
[15:43:37] (Merged) jenkins-bot: Add spark job to aggregate historical projectviews [analytics/refinery/source] - https://gerrit.wikimedia.org/r/337593 (https://phabricator.wikimedia.org/T156388) (owner: Mforns)
[15:44:50] * mforns restarts computer to fix audio
[15:45:56] (CR) Nuria: [V: 2 C: 2] "Merging, looks good. Thanks for doing changes." [analytics/refinery] - https://gerrit.wikimedia.org/r/338786 (https://phabricator.wikimedia.org/T158330) (owner: Mforns)
[15:47:45] (PS4) Joal: [WIP] Add oozie job for standard metrics computation [analytics/refinery] - https://gerrit.wikimedia.org/r/342197 (https://phabricator.wikimedia.org/T160151)
[15:49:13] (CR) Joal: [V: 2 C: 2] Add test.wikipedia to the pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/342782 (https://phabricator.wikimedia.org/T160484) (owner: Gergő Tisza)
[15:49:37] Analytics-Kanban: Enable Pageviews API for test.wikipedia.org - https://phabricator.wikimedia.org/T160484#3102515 (JAllemandou)
[15:51:16] Analytics-Kanban: Enable Pageviews API for test.wikipedia.org - https://phabricator.wikimedia.org/T160484#3100464 (JAllemandou) Let's not forget to update refinery jar version bump at deploy time for those two patches to be successful.
[15:52:44] Analytics-Kanban, DBA: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3102530 (Nuria) Excellent, let us know what you think is a good time on your end to do this and we will take an outage accordingly. For us, the sooner the better. Ideally (i think) we might...
[15:54:29] milimetric: do you have a minute for me?
[15:55:25] joal: I have 5
[15:55:35] batcave
[15:55:36] ?
[15:55:38] milimetric: you know me enough :)
[15:55:42] OMW !
[15:56:22] fdans: you there?
[15:56:41] elukey: oui!
[15:57:16] so I applied the localOne patch, but as I was saying in standup it might be better to wipe the whole cluster and re-create all the keyspaces (with proper replication) again
[15:57:24] do you have a lot of data loaded?
[15:57:51] no, I don't have a problem with wiping the cluster :)
[15:58:00] all right, wiping :)
[15:58:08] after this the cluster should be ready to go
[16:00:37] yissss
[16:02:07] Analytics, Analytics-General-or-Unknown: kafkatee not consuming for some partitions - https://phabricator.wikimedia.org/T73056#3102631 (Liuxinyu970226)
[16:03:49] Analytics-Kanban, DBA: Change length of userAgent column on EL tables - https://phabricator.wikimedia.org/T160454#3102668 (Ottomata) > change https://github.com/wikimedia/eventlogging/blob/master/eventlogging/jrm.py#L79 for length of varchars to what? (@jcrespo to advice) FYI, the comment on this line sa...
[16:05:02] (PS14) Joal: Port standard metrics to reconstructed history [analytics/refinery] - https://gerrit.wikimedia.org/r/322103 (https://phabricator.wikimedia.org/T160155) (owner: Milimetric)
[16:07:28] !log Wiped AQS Beta cassandra cluster
[16:07:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:10:58] elukey@deployment-aqs03:~$ nodetool status
[16:10:58] Datacenter: datacenter1
[16:10:58] =======================
[16:10:58] Status=Up/Down
[16:10:58] |/ State=Normal/Leaving/Joining/Moving
[16:11:00] -- Address Load Tokens Owns (effective) Host ID Rack
[16:11:03] UN 10.68.18.237 101.88 KB 256 62.3% 25af9396-9f12-4d84-9b24-b9b2d9742974 rack1
[16:11:06] UN 10.68.17.125 82.18 KB 256 67.8% ad17c86d-e842-4ada-b2c3-f8c3f8f7ac8d rack1
[16:11:09] UN 10.68.17.90 103.44 KB 256 69.9% d1eb626b-2a77-4969-9efc-ada6c16e8875 rack1
[16:11:12] \o/
[16:13:06] * mforns 1 vs 0 headset
[16:13:42] mforns: what user do you use to load data to aqs beta? cassandra?
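The replication discussion above comes down to simple arithmetic. The keyspaces were re-created with replication factor 3, prod reads at `localOne`, and beta had a read-quorum problem before aqs02 and aqs03 existed. A read at a given consistency level needs that many replicas to answer, and QUORUM is floor(RF/2)+1. The sketch below parses `nodetool status` output like the one pasted above and checks which levels the Up/Normal nodes can serve. It is a hypothetical helper, not part of AQS or its deploy tooling. It assumes a single datacenter in which every node holds a replica of every token range. That assumption holds when RF is at least the node count, as with 3 nodes and RF 3.

```python
import re

# Node rows in `nodetool status` start with a two-letter status/state code,
# e.g. "UN" = Up/Normal, "DN" = Down/Normal, "UJ" = Up/Joining.
NODE_ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def up_normal_nodes(status_output: str) -> list:
    """Return the addresses of nodes reported as Up/Normal ("UN")."""
    nodes = []
    for line in status_output.splitlines():
        match = NODE_ROW.match(line.strip())
        if match and match.group(1) == "U" and match.group(2) == "N":
            nodes.append(match.group(3))
    return nodes


def required_replicas(consistency: str, replication_factor: int) -> int:
    """Replicas that must answer a read at the given consistency level."""
    level = consistency.upper()
    if level in ("ONE", "LOCAL_ONE"):
        return 1
    if level in ("QUORUM", "LOCAL_QUORUM"):
        # Quorum is a strict majority of the replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: " + consistency)


def can_read(consistency: str, replication_factor: int, up_nodes: int) -> bool:
    # Simplification: assumes every up node holds a replica of every token
    # range (single datacenter, RF >= node count).
    return up_nodes >= required_replicas(consistency, replication_factor)


SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.68.18.237  101.88 KB  256     62.3%             25af9396-9f12-4d84-9b24-b9b2d9742974  rack1
UN  10.68.17.125  82.18 KB   256     67.8%             ad17c86d-e842-4ada-b2c3-f8c3f8f7ac8d  rack1
UN  10.68.17.90   103.44 KB  256     69.9%             d1eb626b-2a77-4969-9efc-ada6c16e8875  rack1
"""

if __name__ == "__main__":
    up = len(up_normal_nodes(SAMPLE))
    print(up, can_read("LOCAL_ONE", 3, up), can_read("QUORUM", 3, up))  # 3 True True
    # A single live node with RF 3 cannot satisfy QUORUM, which needs 2 replicas.
    print(can_read("LOCAL_ONE", 3, 1), can_read("QUORUM", 3, 1))  # True False
```

With RF 3, `LOCAL_ONE` keeps working with one live node while `QUORUM` needs two. That is one plausible reason quorum reads could fail on a one-node beta cluster whose keyspaces use RF 3.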
[16:14:08] elukey, I've never done it, but yes, in the docs it says cassandra
[16:14:24] okok :)
[16:14:28] elukey: can we move forward with fdans ?
[16:14:47] (PS9) Joal: Add oozie jobs for mw history denormalized [analytics/refinery] - https://gerrit.wikimedia.org/r/341030 (https://phabricator.wikimedia.org/T160074)
[16:16:07] elukey ok for me to test?
[16:16:17] joal: just deployed aqs/deploy to all the hosts, it went super fine
[16:16:24] elukey: awesome :)
[16:16:29] cluster is up and running and wiped, you are free to go
[16:16:34] let me know if there are any isuses
[16:16:36] *issues
[16:16:42] fdans: do you mind trying a beta deploy?
[16:17:14] joal that was the plan right?
[16:18:12] correct :)
[16:19:20] joal, elukey is this through the procedure described in aqs docs?
[16:20:57] fdans: the last aqs/deploy has been already deployed, not sure if you have other patches to merge first?
[16:23:36] oh in that case there is no need to redeploy. My only pending change is in refinery joal elukey
[16:24:10] (PS15) Joal: Port standard metrics to reconstructed history [analytics/refinery] - https://gerrit.wikimedia.org/r/322103 (https://phabricator.wikimedia.org/T160155) (owner: Milimetric)
[16:24:21] forgot about that fdans :)
[16:25:04] joal: I'm going to add data to the v2 keyspace and make sure the data is being pulled from there
[16:25:36] fdans: awesome
[16:26:08] (PS10) Joal: Add oozie jobs for mw history denormalized [analytics/refinery] - https://gerrit.wikimedia.org/r/341030 (https://phabricator.wikimedia.org/T160074)
[16:26:10] let's also check that the keyspaces are replicated
[16:26:19] Good call elukey u
[16:26:34] elukey: replicated?
[16:26:44] ahh across the instances
[16:26:54] yes sir
[16:27:16] see I start to understand some things :D
[16:27:36] you know more than me now about keyspaces :)
[16:28:24] the keyspacealist
[16:30:50] joal: I probably need to change the keyspace here https://github.com/wikimedia/analytics-aqs-deploy/blob/master/scripts/insert_monitoring_fake_data.cql#L14
[16:31:15] ohhh ! Well spotted fdans !
[16:31:22] I had forgotten about that one !
[16:31:42] haha just realised when thinking about fake data
[16:31:57] will send a change in a bit
[16:36:01] elukey: any change in configuration could be causing this?
[16:36:04] https://www.irccloud.com/pastebin/AODVO8mu/
[16:36:16] see https://issues.apache.org/jira/browse/CASSANDRA-11574
[16:37:58] never encountered this issue before
[16:38:50] me too
[16:39:38] but we are running 2.2.6 now
[16:39:45] and there is a fix in 2.2.7
[16:39:54] so before this you were testing on 2.1
[16:40:11] haha way to hit the bullseye
[16:40:42] In prod we have 2.2.6, deployment-prep was left behind
[16:42:05] so shall we upgrade to 2.2.7 elukey?
[16:45:56] fdans: well it is not that easy, we do it in a controlled way via the Services team
[16:46:03] there are tons of things to test etc..
[16:46:19] right, of course
[16:46:33] * fdans goes back to knowing nothing
[16:47:03] nono I hate to be the grumpy ops person that says no :)
[16:47:20] but a cassandra upgrade needs to be really really really tested
[16:47:30] me and Joseph had a lot of "fun" during the past months
[16:48:46] elukey: understood, that makes total sense :)
[16:51:19] hmm, for the purpose of testing that aqs is using the right keyspace I could manually insert a couple of rows into cassandra and fetch them with the api
[16:52:02] does this seem acceptable in absence of the COPY from csv option joal?
[16:54:49] fdans: this is very acceptable yes :)
[16:55:00] terrific
[16:56:41] bye team, see you tomorrow!
[16:57:15] bye mforns !
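To check that manually inserted rows come back through the API, you request the matching endpoint. The sketch below only builds the request URL for the public Pageviews API's per-project aggregate route, `/metrics/pageviews/aggregate/{project}/{access}/{agent}/{granularity}/{start}/{end}` under `https://wikimedia.org/api/rest_v1`, with timestamps in YYYYMMDDHH form. The log gives neither the beta AQS hostname and route prefix nor which route reads from the new `project_v2` keyspace. The base URL is therefore a parameter, and `aggregate_url` is a hypothetical helper.

```python
from datetime import datetime
from urllib.parse import quote

# Public REST entry point; a beta AQS host would need its own base URL/prefix.
PUBLIC_BASE = "https://wikimedia.org/api/rest_v1"
GRANULARITIES = ("hourly", "daily", "monthly")


def aggregate_url(project, start, end, base=PUBLIC_BASE,
                  access="all-access", agent="all-agents", granularity="daily"):
    """Build a per-project aggregate pageviews URL (timestamps as YYYYMMDDHH)."""
    if granularity not in GRANULARITIES:
        raise ValueError("unsupported granularity: " + granularity)
    if end < start:
        raise ValueError("end must not be before start")
    parts = [
        "metrics", "pageviews", "aggregate",
        quote(project, safe=""), access, agent, granularity,
        start.strftime("%Y%m%d%H"), end.strftime("%Y%m%d%H"),
    ]
    return base.rstrip("/") + "/" + "/".join(parts)


if __name__ == "__main__":
    print(aggregate_url("en.wikipedia.org", datetime(2017, 3, 1), datetime(2017, 3, 14)))
```

Fetching the result with any HTTP client and comparing it against the inserted rows would complete the check. Keeping URL construction separate makes it easy to point the same request at a different host.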
[16:57:25] milimetric: about traveling from Prague to Vienna
[16:57:42] milimetric: I asked for an afternoon train, since my train vendor didn't offer any night train
[16:58:23] milimetric: I suggested to travel the 16h52 - 20h49 train on thursday evening
[16:58:53] sounds great joal
[16:59:14] milimetric: wanted to let you know
[16:59:25] that's great, we can have lunch, relax, and leave
[16:59:36] I'm really looking forward to it
[17:00:08] milimetric: so do I :) Hack with team, then hack with volunteers :)
[17:00:17] fdans: I was talking with joal and we think maybe once this code is deployed you can move to help marcel with the FE of the reportcard? We probably need a short meeting to decide what we need for a 1st version
[17:00:19] HAAAAAAAACK !
[17:01:30] nuria: sure, but what do you mean by FE?
[17:01:39] frontend
[17:01:46] ooohhhh
[17:01:48] oh of course, silly me
[17:01:52] old fashion!
[17:14:39] Analytics-Kanban, Analytics-Wikistats: Visual prototype for community feedback for Wikistats 2.0 iteration 1. - https://phabricator.wikimedia.org/T157827#3017940 (Nuria) Ping @Erik_Zachte visuals are downloadable as pdf
[17:18:26] elukey: /srv/deployment/analytics/aqs/deploy/src in aqs-01 doesn't seem to contain the latest state of the aqs repo
[17:18:52] is this a sign that the deploy didn't do it, or am i missing something?
[17:19:10] fdans: possibly elukey didn't pull before deploy?
[17:19:51] fdans: And actually, merge != deploy, so maybe your patch hasn't even been deployed?
[17:20:42] yeah, I was working with the assumption that the latest patch had been pulled joal
[17:20:59] I will redeploy aqs beta with the latest changes if that's ok with you elukey
[17:26:54] taking 1/2h break, will be back
[17:28:43] Analytics-Kanban, User-Elukey: AQS: Verify that node not being able to restart logs locally to errorlog not to logstash - https://phabricator.wikimedia.org/T155791#3103077 (elukey) Verified today while working on AQS beta that we are not logging correctly when node fails to start, and this is not the per...
[17:32:54] fdans: I pulled the last changes, have you checked git log
[17:32:55] ?
[17:33:22] fdans: sometimes unix tricks you, do cd /srv/deployment/analytics/aqs/deploy/src
[17:33:25] and git log
[17:33:29] and see what you get
[17:34:10] elukey: shouldn't we update the submodule hash in aqs-deploy?
[17:34:30] fdans: let me check
[17:34:42] right now it's pointing to an old commit in aqs, afaics
[17:36:05] (I'm familiar with this because in carto we had a library with 3 sub-submodules and it was paaaainful to update anything)
[17:37:14] have you guys updated aqs-deploy after the changes to aqs-src?
[17:37:21] elukey: it should be pointing at b747875 , it's now pointing at dd519b8
[17:37:36] nope, sorry
[17:37:54] the last one in aqs-deploy is https://github.com/wikimedia/analytics-aqs-deploy/commit/e0da1bded35e803731d0442a293300ef98c76a27
[17:38:06] (plus stuff that I committed but related to scap)
[17:38:58] will push a CR now, if you'd be so kind as to verify it I'd be forever grateful elukey
[17:39:19] fdans: are you familiar with the docker process to update the aqs-deploy repo?
[17:39:31] oh
[17:39:31] oh [17:39:41] https://wikitech.wikimedia.org/wiki/Services/Deployment [17:40:07] it basically creates what joal committed in the link above [17:40:39] we use it to freeze all the node dependecies [17:40:50] to avoid using npm-update/install [17:42:01] and also updates the submodule's sha [17:42:20] I'd suggest to wait for joal tomorrow and do it with him [17:42:26] wdyt? [17:44:17] elukey sure, sounds good [17:45:19] elukey: is this done in your local machine or in tin? [17:45:59] local machine, then it creates a gerrit review after that [17:46:08] once merged, you pull it from tin [17:46:13] and deploy [17:47:22] 10Analytics-Tech-community-metrics: "git_top_authors" widget has a slightly confusing column "Projects" which is always "1" - https://phabricator.wikimedia.org/T160554#3103251 (10Aklapper) [17:53:30] 10Analytics, 10ChangeProp, 10Edit-Review-Improvements-ReviewStream, 10EventBus, and 4 others: Set up the foundation for the ReviewStream feed - https://phabricator.wikimedia.org/T143743#3103281 (10Ottomata) Status update? :) [17:56:29] back [17:56:34] 10Analytics, 10Analytics-Cluster, 06Operations, 10hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3103312 (10RobH) So there are not going to be a lot of server chassis that can accomodate an off the shelf GPU card. Additionally, it would then not have any kind of... [17:56:47] elukey:, fdans: given hour, I'd feel better doing that tomorrow morning [17:57:40] sure joal tomorrow I'll definitely start at a normal hour, sorry for dragging this [17:58:19] np fdans :) [17:58:46] fdans, elukey: tomorrow morning, 11am ? [17:58:55] 10Analytics, 10Analytics-Cluster, 06Operations, 10hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3103329 (10RobH) Once we get the GPU options hammered down, all the rest of the specs are easy in comparison. 
[17:58:57] sounds good
[17:59:05] cool fdans :)
[17:59:30] ottomata: any chance I might have some of your time?
[17:59:58] joal: sure, am async deploying scb stuff with services folks
[17:59:59] what's up?
[18:00:09] ottomata: VERSIONNNNNN !@
[18:00:14] :D
[18:00:21] haha
[18:00:26] bc real quick?
[18:00:46] i might have to block you on scb deploy here and there?
[18:00:47] heheh
[18:01:14] sure ottomata OMW
[18:01:37] joal: +1 for tomorrow morning :)
[18:01:41] * elukey goes afk!
[18:01:43] byyeee
[18:01:57] Analytics-Tech-community-metrics: Explain slightly different Git commit numbers for some authors between "git_top_authors" and "Authors" - https://phabricator.wikimedia.org/T160557#3103352 (Aklapper)
[18:02:39] do you guys like maps? https://www.oreilly.com/ideas/drawing-a-map-of-distributed-data-systems
[18:05:31] (PS1) Fdans: Change keyspace name to project_v2 in fake data script [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/342876 (https://phabricator.wikimedia.org/T156312)
[18:07:14] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3103386 (RobH) Please also note a concern was raised about the driver support of these GPU options: >>! In T148843#3075519, @Ladsgroup wrote: > Regarding GPU optio...
[18:08:01] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3080331 (MoritzMuehlenhoff) Note that the Nvidia OpenCL drivers are closed-source (as the other parts of the Nvidia drivers). Not sure about AMD, but they've becom...
[18:12:45] milimetric: ottomata was too persuasive :)
[18:13:06] heh, I don't have strong opinions, it's all good
[18:13:39] milimetric: Idea is to use a single field instead of two: snapshot=labs-2017-03
[18:13:49] milimetric: and use snapshot name
[18:14:03] nuria: any objection --^ ?
[18:14:18] hm, interesting, seems good
[18:15:00] just a thought, a good convention for snapshot partition would be helpful
[18:15:02] would something like
[18:15:11] 2017-03_public
[18:15:16] or maybe that is the usual case
[18:15:17] 2017-03
[18:15:22] and then for internal/private
[18:15:26] 2017-03_private
[18:15:29] or 2017-03_internal
[18:15:30] ?
[18:15:52] ottomata: I like the idea of having only the date for public
[18:15:53] the intention is to hopefully have the public/labs sourced history be the only one we use, right?
[18:15:54] if we can?
[18:16:14] ya, that would be a better interface for users, if we expect them to specify the snapshot partition
[18:16:14] ottomata: 2017-03_internal is nice
[18:16:29] cool +1 from me :)
[18:16:31] thanks yall
[18:26:27] hjoal: sounds good
[18:26:33] joal: sounds good
[18:26:56] joal: let's make sure to document as research would need to know all this when they help us vet metrics
[18:27:08] sure nuria
[18:27:25] nuria: first updating the patches tomorrow morning, then documenting :)
[18:27:55] joal: ya, no rush we can document after deploying once things have had a time to bake
[18:28:05] yes
[18:29:08] hey all, gonna leave early today, will work some hours on friday to make up
[18:29:11] laters!
[18:29:24] Bye ottomata, thanks again for the good idea :)
[18:35:11] fdans: yt? I can help you with getting the patch ready to aqs deploy
[18:35:27] nuria yeah!
[18:35:36] batcave?
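The convention agreed on above is easy to validate mechanically: a bare `YYYY-MM` marks the public, labs-sourced snapshot and `YYYY-MM_internal` marks the private one. The minimal sketch below uses a hypothetical helper, `parse_snapshot`. It encodes only the two forms settled on in the channel and is not existing refinery code. The earlier `labs-2017-03` proposal and `_private` suffixes are deliberately rejected.

```python
import re

# YYYY-MM, optionally followed by "_internal" for the private snapshot.
SNAPSHOT = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])(?:_(internal))?$")


def parse_snapshot(name: str) -> tuple:
    """Split a snapshot partition value into (month, visibility)."""
    match = SNAPSHOT.match(name)
    if match is None:
        raise ValueError("unexpected snapshot name: " + name)
    year, month, suffix = match.groups()
    visibility = "internal" if suffix else "public"
    return year + "-" + month, visibility


if __name__ == "__main__":
    print(parse_snapshot("2017-03"))           # ('2017-03', 'public')
    print(parse_snapshot("2017-03_internal"))  # ('2017-03', 'internal')
```

Keeping the public form suffix-free matches the point made above: if the public history is the one most users query, they only ever type the month.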
[18:36:20] fdans: ok
[19:09:41] gone for dinner a-team, see you tomorrow
[19:10:11] (PS1) Nuria: Update aqs to b747875 [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/342882
[19:18:26] (CR) Fdans: [V: 2 C: 2] Update aqs to b747875 [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/342882 (owner: Nuria)
[19:52:35] elukey: (no need to respond) i did do the workaround described here : https://issues.apache.org/jira/browse/CASSANDRA-11574 to get copy cmd working again in cassandra
[19:53:00] elukey: meaning that i modified root@deployment-aqs01:/usr/lib/pymodules# mv ./python2.7/cqlshlib/copyutil.py
[19:53:20] elukey: ejem... i hope this in place modification will let you sleep at nite
[19:53:36] elukey: on the plus side it will let us load data
[19:53:40] hopefully
[19:54:08] cmd works but files that i exported earlier fail with the same bogus error i had at first that urandom helped troubleshoot
[19:54:18] urandom: yt? with a bit of free time
[20:01:36] cc fdans re: see scroll about copy cmd
[20:02:14] oh awesome
[20:02:59] fdans: do try w your data, i think our cassandra upgrade might have affected how data has to be formatted
[20:03:34] nuria, the only thing is that scap is failing to deploy
[20:03:58] was going to address it tomorrow morning with luca
[20:04:09] 19:20:56 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'analytics/aqs/deploy', '-g', 'default', 'fetch', '--refresh-config'] on deployment-aqs03.deployment-prep.eqiad.wmflabs returned [255]: Host key verification failed.
[20:04:15] fdans: do try to fix your docker with pchelo in services chat
[20:04:45] fdans: maybe try to fix your docker with Pchelolo in services chat? the scap issue we can probably do after
[20:05:26] yeah I'm with that nuria
[21:19:55] Analytics-Tech-community-metrics: Clarify differences between similar widgets - https://phabricator.wikimedia.org/T160576#3104035 (Aklapper)
[21:20:03] Analytics-Tech-community-metrics: Explain slightly different Git commit numbers for some authors between "git_top_authors" and "Authors" - https://phabricator.wikimedia.org/T160557#3103352 (Aklapper) The same is true for [[ https://wikimedia.biterg.io/edit/app/kibana#/visualize/edit/git_commits_organization...
[21:20:32] Analytics-Tech-community-metrics: Explain slightly different Git commit numbers for some authors between "git_top_authors" and "Authors" - https://phabricator.wikimedia.org/T160557#3104053 (Aklapper)
[21:20:49] Analytics-Tech-community-metrics: Clarify differences between similar widgets - https://phabricator.wikimedia.org/T160576#3104035 (Aklapper)
[21:21:09] Analytics-Tech-community-metrics: Clarify differences between similar widgets - https://phabricator.wikimedia.org/T160576#3104035 (Aklapper) p: Triage>Low
[21:38:14] (updated log location)
[21:44:25] Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3091603 (Halfak) I'm also noticing that most queries stay queued now. Running queries directly against LabsDBs results in far better performance so it might be something in Quarry. Celery maybe.
[21:53:02] Analytics-Kanban, ChangeProp, Operations, Reading-Web-Trending-Service, Services (done): Upgrade librdkafka 0.9.4 on SCB and Varnishes - https://phabricator.wikimedia.org/T159379#3104126 (mobrovac) Open>Resolved This has been completed. Thank you @Pchelolo and @Ottomata
[22:11:52] (PS1) Catrope: Add beta feature graph for RCFilters [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/342946
[22:18:10] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2017): Go through default Kibana widgets; decide which ones are not relevant for us and remove them - https://phabricator.wikimedia.org/T147001#3104210 (Aklapper) p: Normal>High
[22:19:21] milimetric: looks like data in the Data Lake stops at this January 11?
[23:52:43] neilpquinn: I was about to set up a meeting, we have newer data but from labs (public by default)
[23:54:06] neilpquinn: we want to vet public data with private data and after put the data lake in labs so everyone can take advantage of it. We are finalizing all these changes by the end of quarter, will set up meeting to communicate
[23:54:22] nuria: so there will be no private data lake?
[23:54:35] neilpquinn: there will be 1 data lake, will be private
[23:54:47] neilpquinn: at 1st we thought we could not import data from labs
[23:54:58] neilpquinn: but the recent work by dbas made that possible
[23:55:06] sorry,
[23:55:21] neilpquinn :"there will be 1 data lake, will be public"
[23:55:33] neilpquinn: makes sense?
[23:56:15] neilpquinn: now, data will exist in hadoop and some other data storage on labs
[23:56:37] neilpquinn: but the data itself will be the same, if this makes sense
[23:58:29] neilpquinn: let us know if it doesn't
[23:59:51] nuria: I think I understand. I assume you mean the data will be the same except for private data like deleted pages?