[03:04:55] Hello, I noticed something strange! When you make a request with the API for a date without data, Google Chrome and Firefox react differently.
[03:05:09] On Chrome, it shows: {"type":"https://mediawiki.org/wiki/HyperSwitch/errors/not_found","title":"Not found.","method":"get","detail":"The date(s) you used are valid, but we either do not have data for those date(s), or the project you asked for is not loaded yet. Please check https://wikimedia.org/api/rest_v1/?doc for more information.","uri":"/analytics.wikimedia.org/v1/pageviews/top/fr.wikipedia.org/all-access/2018/01/08"}
[03:05:37] On Firefox: File not found
[03:05:37] Firefox cannot find the file at https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia.org/all-acces/2018/01/08.
[03:06:00] Does anyone know if this can be fixed? :)
[03:06:48] I think it only does that if you request it as HTML. Firefox formats it because it's a 404
[03:08:19] in other words, you need to use JavaScript or an add-on to make API requests
[03:09:26] That's what I do, but I think the mistake can be confusing
[08:16:07] o/ elukey - let me know when you're here if I can help
[08:45:33] joal: o/
[08:54:06] from the eventlogging side, I think that we are good to start the purging script again \o/
[08:54:36] \o/ elukey :) congratulations - It's been a massive amount of work done
[09:02:24] before cheering I'll wait for the first run to complete :D
[09:06:16] now I am trying to fix the last puppet issues for hadoop in labs
[09:06:25] and I'll spin up a new host to replicate an1003
[09:25:23] joal: today I'd need to reboot 5/6 hadoop workers to better test the new kernel, what do you think?
[09:25:42] sounds good elukey - it was planned IIRC :)
[09:39:02] helloooo a-team
[09:39:11] Morning fdans :)
[10:07:05] !log drain + reboot analytics1029,1031->1034 for kernel updates
[10:07:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:13:31] joal: do you have a few minutes to look at the aqs deployment shenanigans?
[10:14:19] fdans: I do !
[10:14:34] fdans: I think I'll need elukey though
[10:14:50] yessss team europe!
[10:16:41] fdans, elukey: IMO what is needed is some logging
[10:17:16] And actually fdans, given this deploy's modifications impact cassandra and not druid, you should be able to deploy in beta to make sure things work
[10:17:28] I completely forgot about that fdans
[10:18:08] joal: I did that before merging the aqs patch
[10:18:24] fdans: let me guess .... it worked !
[10:18:43] I can do it again but I tested it at several stages before the holiday break and it worked
[10:18:50] it did work yes :)
[10:20:28] mwarf :)
[10:21:17] joal: this command https://github.com/wikimedia/analytics-aqs-deploy/blob/master/scap/checks.yaml#L5
[10:21:23] where is it?
[10:21:45] or is it just running aqs's tests?
[10:22:20] fdans: this runs the AQS default-values checks
[10:22:39] defined in v1 yaml file when stating monitored true
[10:23:18] I seee, so I could have missed that in aqs unit tests + beta testing right?
[10:23:39] I'm going to retest in beta joal, I'll update in a bit
[10:23:40] hm, possible but not expected
[10:23:48] fdans: sure
[10:24:00] fdans: Can you remind me the name of our aqs machines in beta?
[10:24:21] joal: deployment-aqs01.deployment-prep.eqiad.wmflabs
[10:24:28] Thanks mate :)
[10:28:57] fdans: after a quick review, I think you're right, problem comes from fake data not being present
[10:29:36] joal: oh, I thought checks in scap were made against production data!
[10:30:07] fdans: I'll let you double check the dates for which we test data :)
[10:30:52] joal: I don't understand, is this in beta or in prod?
[10:30:58] fdans: prod
[10:31:05] oh in the defaults!
[10:31:30] fdans: we should actually call this "monitoring values" instead of default
[10:31:38] haha 1970
[10:31:45] the settings in x-amples
[10:32:05] hold on but other metrics also use 1970 joal
[10:32:43] they do fdans
[10:32:56] which is why I talk about fake data :)
[10:33:20] fdans: https://github.com/wikimedia/analytics-aqs-deploy/blob/master/scripts/insert_monitoring_fake_data.cql
[10:33:41] OOOOOO
[10:33:56] i seee, I see
[10:34:08] ok now all the dots connect in my head joal
[10:34:26] Yay :)
[10:35:07] so here's one question joal, can I straight send a patch to aqs-deploy adding the fake data patch or do I have to go again through the docker step?
[10:35:55] fdans: Given the patch involves no js, sending it manually is the way
[10:36:26] fdans: Once done, data is to be manually inserted
[10:36:57] cool, on it now, merci beaucoup joal :D
[10:39:24] fdans: I should have noticed that before - Many apologies
[10:47:27] (PS1) Fdans: Add fake data filling script for pageviews - top by country [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/402802
[10:58:37] (CR) Joal: [C: -1] Add fake data filling script for pageviews - top by country (1 comment) [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/402802 (owner: Fdans)
[11:01:51] joal: analytics1029,30-34 have the new kernel
[11:01:59] (PS2) Fdans: Add fake data filling script for pageviews - top by country [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/402802
[11:02:07] (28 and 35 are journal nodes so I prefer to do them as last step)
[11:03:43] fdans: I'm surprised deploy to beta worked --> no keyspace named local_group_default_T_top_bycountry
[11:04:09] elukey: ok !
[11:04:41] joal: that's weird, I may have dropped it
[11:05:09] I definitely used its DESCRIBE to create the keyspace in prod cassandra the other day
[11:05:25] I think that was the last time I tested
[11:05:45] fdans: maybe you used test_pageviews_bycountry?
[11:06:01] nono, it was the same keyspace name as in prod joal
[11:06:35] i only used test_pageviews_bycountry at the very first stage
[11:06:39] fdans: was it in labs?
[11:06:58] yep, in the machine I passed you earlier
[11:07:15] fdans: meaning, in labs and not in beta?
[11:07:46] in deployment-aqs01.deployment-prep.eqiad.wmflabs
[11:07:46] !log re-run webrequest-load-wf-text-2018-1-8-8 (failed after some reboots due to kernel updates)
[11:07:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:08:21] joal mystery solved - I did drop the keyspace for some reason
[11:08:34] just checked my last command in cqlsh
[11:08:44] cassandra@cqlsh> drop keyspace "local_group_default_T_top_bycountry";
[11:44:26] * elukey lunch!
[12:08:12] Analytics, Analytics-Cluster, Operations, Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#3882490 (dr0ptp4kt) @Shilad I just wanted to note that I'm back from the long period of family leave (everything's good, BTW) and saw your comment. We're not 10...
[12:12:40] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2018): Explain difference in number of repositories when trying to manually exclude imported third party repositories - https://phabricator.wikimedia.org/T184420#3882502 (Aklapper)
[12:14:02] joal: ok to merge + deploy?
[12:14:19] fdans: tested in beta?
[12:20:14] joal: but nothing in AQS has changed, just populating the example fake data right?
[12:20:49] the changes that will be deployed were tested in beta before merging to aqs
[12:21:10] fdans: Point is to try to have similar env in beta and prod
[12:21:38] fdans: if it wasn't for me trying to test your request against beta, we could have merged the incorrect one
[12:22:07] I know it's taking time, but testing in beta first (when possible) should really be done
[12:24:27] fdans: I also know this was not the answer you expected :S
[12:24:32] I get that joal: what you're saying is I should test the fake data in beta, but don't I need to have that patch merged to git pull in deployment-tin and run scap in beta?
[12:25:19] scap won't work with checked-out git heads
[12:25:46] fdans: I'll be super happy with the request manually run in cassandra beta
[12:26:14] ok
[12:26:41] restarting aqs on beta to create keyspace
[12:28:48] Thanks a lot fdans :)
[12:29:04] running query 👌
[12:29:15] Awesome :)
[12:30:15] fdans: Ready to merge then :0
[12:30:15] https://www.irccloud.com/pastebin/KPxWCnlj/
[12:30:17] :)(
[12:31:09] joal: sorry that I misunderstood what you meant when asking about testing in beta
[12:31:23] (CR) Fdans: [V: 2 C: 2] Add fake data filling script for pageviews - top by country [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/402802 (owner: Fdans)
[12:31:36] fdans: No worries - Thanks for having asked for clarifications :)
[12:31:56] fdans: you went faster than me in merging ;)
[12:32:11] fdans: same idea here: no need to redeploy
[12:32:38] fdans: I'll manually run the query, and then we'll deploy (and can actually do it with the updated aqs-deploy repo)
[12:33:02] ok
[12:33:37] !log Update fake-data in cassandra adding top-by-country needed row
[12:33:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:33:44] Done!
[12:34:44] fdans: you can deploy if elukey blesses that :)
[12:34:57] thank you joal elukey !!
[12:35:37] entering deployment machine :)
[12:36:22] !log Deploying AQS
[12:36:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:38:23] elukey: I'm guessing I'm still not in the group to deploy aqs? :D I forgot about that, sorry
[12:39:35] .away
[12:39:40] ufffff
[12:39:47] sorry that was meant for irssi :D
[12:39:52] :)
[12:40:01] fdans: I can do it
[12:40:07] fdans: you are not in the group, correct; we'd need to formally ask for it
[12:40:40] but since it involves a sudo grant, we'll need to wait for three days and then get the ops meeting ack
[12:40:58] so the earliest we can make it happen is next week
[12:41:51] elukey: I can open a task, thank you :)
[12:43:06] !log Deploy AQS from tin
[12:43:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:54:47] Analytics-Kanban, Operations, ops-eqiad: dbstore1002 possibly MEMORY issues - https://phabricator.wikimedia.org/T183771#3882607 (elukey) Now the BMC/IPMI doesn't seem to be happy: ``` elukey@dbstore1002:~$ sudo ipmi-chassis --get-chassis-status ipmi_cmd_get_chassis_status: BMC busy elukey@dbstore10...
[12:58:42] joal: I'd reboot kafka1020 now if you are ok
[12:58:51] ok for me elukey
[12:58:54] or kafka1012, better
[12:59:47] !log reboot kafka1012 for kernel updates
[12:59:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:02:40] Analytics-EventLogging, Analytics-Kanban, Tracking: Update client-side event validator to support (at least) draft 3 of JSON Schema - https://phabricator.wikimedia.org/T182094#3882610 (phuedx)
[13:07:26] fdans: AQS deployed !!
[13:07:44] awyisssssss thank you joal !!!
[13:08:01] ;) thanks for accepting all my not-fun comments fdans
[13:08:09] pushing PR to restbase
[13:08:26] thanks for your infinite patience with me joal
[13:14:16] ok kafka1012 back serving traffic at full power
[13:14:26] let's see how much the new kernel changes its perf
[13:14:30] cc: moritzm
[13:16:45] elukey: thanks, let's have it settle for an hour (so that caches etc. are fully warmed up) and assess the performance hit
[13:17:12] elukey: as of now, nothing really noticeable in hadoop nodes AFAICT
[13:18:50] joal: yep everything looks fine
[13:23:51] ftr joal https://github.com/wikimedia/restbase/pull/940
[13:25:37] Sounds correct to me fdans :)
[13:29:02] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2018): Explain decrease in number of volunteer patchset authors for same time span when accessed 3 months later - https://phabricator.wikimedia.org/T184427#3882640 (Aklapper) p:Triage>High
[13:37:54] for kafka1012 there "seems" to be (might be too early to say that) a small increase in cpu utilization (~5%)
[13:37:57] https://grafana.wikimedia.org/dashboard/db/prometheus-machine-stats?orgId=1&var-server=kafka1012&var-datasource=eqiad%20prometheus%2Fops&from=1515415701470&to=1515418523733
[13:38:22] but overall it is really nothing to worry about, a negligible regression
[13:40:59] !log reboot analytics10[36-39] for kernel updates
[13:41:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:41:11] elukey: it's interesting to notice that steal/system and user show some modification
[13:43:12] if this turns out to be the price to fix meltdown, I am happy :)
[13:43:23] I hear that
[13:45:11] Analytics-Kanban, Operations, ops-eqiad: dbstore1002 possibly MEMORY issues - https://phabricator.wikimedia.org/T183771#3882714 (Cmjohnson) @elukey yes, the server will need to be powered down for a minute to unlock the Idrac. Can we do this right after meeting today or do you want to schedule for to...
[13:46:25] Analytics-Kanban, Operations, ops-eqiad: dbstore1002 possibly MEMORY issues - https://phabricator.wikimedia.org/T183771#3882716 (elukey) @Cmjohnson Would it be fine tomorrow around this time? Or whenever you prefer, I'd need to send an email and announce the downtime, better to alert people :)
[13:46:51] need to schedule maintenance for dbstore1002
[13:48:32] Analytics-Kanban, Operations, ops-eqiad: dbstore1002 possibly MEMORY issues - https://phabricator.wikimedia.org/T183771#3882721 (Cmjohnson) @elukey. Let’s schedule for 1500UTC tomorrow.
[13:54:00] Analytics-Kanban, Operations, ops-eqiad: dbstore1002 possibly MEMORY issues - https://phabricator.wikimedia.org/T183771#3882726 (elukey) downtime announced to engineering@ and analytics@
[13:54:37] aaand announced
[14:11:45] first graph ported to prometheus - https://grafana.wikimedia.org/dashboard/db/prometheus-analytics-hadoop?panelId=4&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fanalytics
[14:11:49] \o/
[14:13:34] Yay elukey !
[14:13:45] taking a break before standup a-team
[14:13:54] k!
[14:30:13] mforns: o/ - the cleaner has restarted, this time should be the right one :)
[14:30:25] elukey, yes! :D
[14:30:47] mforns: EL took ~4h to catch up with the backlog on saturday, it was way quicker than I thought
[14:30:57] did you see my consistency checks in the task?
[14:31:03] do you think that we should run other ones?
[14:32:15] (PS8) Fdans: Map component and Pageviews by Country metric [analytics/wikistats2] - https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529)
[14:33:57] elukey: looking at kafka1012 there's in fact an approx. 5% increase, but we're still very comfortably within spare capacity, so it seems fully acceptable
[14:35:05] (CR) jerkins-bot: [V: -1] Map component and Pageviews by Country metric [analytics/wikistats2] - https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) (owner: Fdans)
[14:38:54] moritzm: yep exactly!
[14:46:08] in the meantime, I rebooted up to analytics1039 (28/35 excluded since they are more delicate)
[14:46:27] I'd proceed with 10 hosts per day
[14:51:27] sounds good!
[14:56:13] elukey, no sorry didn't, looking now
[15:02:02] elukey, looks good to me!
[15:02:25] maybe I would do one single query to NavigationTiming_17216284 to see if there were holes
[15:02:27] like:
[15:03:49] select left(timestamp, 9), count(*) from NavigationTiming_17216284 where timestamp > '20180101000000' group by 1;
[15:04:18] I don't think I can access master...
[15:05:16] Analytics, MediaWiki-Vagrant: role::spark should compile into a catalogue without dependency cycles - https://phabricator.wikimedia.org/T184151#3882952 (Ottomata) I think we should remove all hadoop/cdh related puppet from mediawiki-vagrant :)
[15:05:21] mforns: db1108 should have caught up if you want to check in there
[15:05:35] elukey, ok!
[15:07:47] morning
[15:08:32] elukey: i'm planning on deploying that hadoop proxyuser change soon and bouncing namenodes, if that's ok with you
[15:08:37] (and resourcemanagers i think)
[15:09:03] ottomata: o/ - sure, I completed my reboots
[15:09:20] ottomata: whenever you have time, can we chat about what kafka cluster to use for hadoop in labs?
[15:09:35] oh ya i have time!
[15:09:49] elukey, done, looks good!!
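(annotation) The "holes" check that mforns describes with the NavigationTiming_17216284 query can also be done on the client once the timestamps are fetched. What follows is only a minimal sketch, not the script anyone in the channel ran: it assumes timestamps are strings in the YYYYMMDDHHMMSS form seen in the query, it assumes hourly buckets (a 10-character prefix) rather than the 9-character prefix used in the query, and the function names `hourly_counts` and `missing_hours` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

HOUR_FMT = "%Y%m%d%H"  # assumed bucket format: YYYYMMDDHH


def hourly_counts(timestamps):
    """Count events per hour, keyed by the first 10 characters of each timestamp."""
    return Counter(ts[:10] for ts in timestamps)


def missing_hours(counts, start, end):
    """Return every hour bucket in [start, end] (inclusive) that has no events."""
    current = datetime.strptime(start, HOUR_FMT)
    last = datetime.strptime(end, HOUR_FMT)
    holes = []
    while current <= last:
        key = current.strftime(HOUR_FMT)
        if counts.get(key, 0) == 0:
            holes.append(key)
        current += timedelta(hours=1)
    return holes


if __name__ == "__main__":
    # Made-up sample timestamps, for illustration only.
    sample = ["20180101000500", "20180101003000", "20180101020000"]
    print(missing_hours(hourly_counts(sample), "2018010100", "2018010102"))
```

An empty result for the whole range since the purge restarted would match the "looks good" outcome reported above; any listed bucket is an hour worth inspecting by hand.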
[15:09:50] k3-1 and k3-2 would probably work fine
[15:09:52] it's kafka 1.0
[15:10:02] mforns: yyyeeeaaahhhhh
[15:10:08] unless, you think we should install a more analytics style 0.9 cluster for testing...
[15:10:08] :D
[15:10:09] hmmm
[15:10:11] i think it won't matter
[15:10:19] yeah
[15:10:22] especially since i'm running a test camus in prod from 1.0
[15:10:23] jumbo
[15:10:28] of the canary data
[15:10:37] we really just want to test running camus, and test that refine runs
[15:10:38] I just need to add to hiera in labs the definition of the cluster
[15:10:46] for what?
[15:11:04] camus does this $kafka_config = kafka_config('analytics')
[15:11:07] ahhh
[15:11:09] ahhh
[15:11:09] hm
[15:11:09] (I mean, its puppet code)
[15:11:21] role::analytics_cluster::refinery::job::camus
[15:11:34] we could do the profile hiera role parameter shuffle
[15:11:38] and make that a parameter
[15:11:51] the kafka cluster in labs is called 'k3'
[15:12:13] we can move all the refinery roles to profiles
[15:12:56] ah ottomata, I rebooted kafka1012 today. The new kernel seems to add a ~5% cpu usage, but the rest looks good
[15:13:06] if you are ok I'd reboot another host
[15:15:17] +1
[15:16:20] oh nice elukey you already have that coordinator role too
[15:16:33] mind if i take a moment and try to make a patch to move the jobs to profiles?
[15:16:36] (PS2) Mforns: [WIP] Improve WikiSelector [analytics/wikistats2] - https://gerrit.wikimedia.org/r/402387 (https://phabricator.wikimedia.org/T179530)
[15:17:19] hmm we should probably move analytics_cluster::refinery::job::data_check to analytics1003 i think
[15:17:32] hmm, different patch though..
[15:19:50] ottomata: please go ahead!
[15:20:20] cool, you already have a nice refinery profile too :)
[15:23:10] it should be ok buuut please double check it since IIRC we are not using it yet
[15:23:26] !log reboot kafka1013 for kernel updates
[15:23:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:28:23] elukey: no? oh ok
[15:28:39] looks pretty fine to me
[15:28:41] :)
[15:31:55] \o/
[15:41:54] ottomata: code review looks really good!
[15:42:51] cool, i'll merge this, run it, and then add a parameter for camus kafka cluster
[15:43:17] does pcc look good?
[15:43:27] ah yes
[15:44:48] it works!
[15:46:14] hmm, it'd be much nicer elukey if we could pull in the kafka cluster name for the actual data we need
[15:46:19] not a general 'camus' kafka cluster
[15:46:20] like
[15:46:23] webrequest is in jumbo
[15:46:26] eventlogging is in analytics
[15:46:27] etc.
[15:46:28] buuut hm
[15:46:34] i guess maybe those would be different profiles, dunno
[15:46:51] ah yes I got what you mean
[15:47:54] for now i'll just leave it
[15:48:00] but might refactor when we start moving clients
[15:48:01] we will see
[15:48:10] elukey: https://gerrit.wikimedia.org/r/#/c/402847/
[15:48:20] oops, need to use it
[15:48:21] but ya
[15:50:12] ooo i think i broke stat1005 puppet, oops
[15:50:16] data_check role is included there
[15:50:17] fixing
[15:50:25] hm, i'm going to move it to analytics1003.
[15:50:27] ottomata: I wouldn't use a default in here buuut I am +1 for the change
[15:50:30] it should be fine **
[15:50:36] elukey: yeah, it's just making it so you can proceed
[15:50:40] i think we need to refactor all of that more
[15:51:33] OH
[15:51:35] ottomata: do we have an analytics kafka cluster in labs? (I mean, defined in hiera like that)
[15:51:36] my change didn't puppet merge...
[15:51:37] huh?
[15:51:43] elukey: no, you need to set it in labs :)
[15:51:59] :P
[15:59:54] but, e.g. if there was one in deployment prep, it would work.
if we ever set up this in deployment-prep (with jumbo, etc.) then the default would work
[16:00:43] ottomata: I knowwww I was kidding
[16:00:52] hehe
[16:07:24] moritzm: interesting, https://grafana.wikimedia.org/dashboard/db/prometheus-machine-stats?orgId=1&var-server=kafka1013&var-datasource=eqiad%20prometheus%2Fops - kafka1013 doesn't show the 5% regression
[16:07:30] so it might not even be related to the kernel
[16:11:13] elukey: another nice one: https://gist.github.com/jobar/45274b8e0d383cf2ab791d4783d4cd8a
[16:12:09] ahahaah
[16:19:37] (CR) Fdans: "just one last tiny bit" (2 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/401814 (https://phabricator.wikimedia.org/T183192) (owner: Nuria)
[16:41:37] Analytics, Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-kafka03 due to full disk - https://phabricator.wikimedia.org/T184235#3883319 (fdans)
[16:41:39] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Provision new Kafka cluster(s) with security features - https://phabricator.wikimedia.org/T152015#3883318 (fdans)
[16:44:10] Analytics, Analytics-Wikimetrics: Problem opening Alfagems cohort - https://phabricator.wikimedia.org/T95530#3883356 (fdans) Open>declined
[16:44:39] Analytics-Kanban: Publishing project anomaly data for censorship researchers. Evaluate privacy threats - https://phabricator.wikimedia.org/T183990#3883357 (fdans)
[16:44:55] (CR) Milimetric: [V: 2] Fix bad input events [analytics/refinery] - https://gerrit.wikimedia.org/r/402379 (https://phabricator.wikimedia.org/T170764) (owner: Milimetric)
[16:49:19] Analytics, Analytics-Wikistats: Wikistats v2 has a weird localization issues on search - https://phabricator.wikimedia.org/T182987#3840462 (fdans) For now let's overwrite the header of the sitematrix request so that we always get English. We'll rethink this once we tackle internationalisation.
[16:49:32] Analytics-Kanban, Analytics-Wikistats: Wikistats v2 has a weird localization issues on search - https://phabricator.wikimedia.org/T182987#3883402 (fdans)
[16:52:36] Analytics-Kanban, Analytics-Wikistats: [Wikistats2] The detail page for tops metrics does not indicate time range - https://phabricator.wikimedia.org/T182990#3883417 (fdans)
[16:53:05] Analytics, Analytics-Wikistats: Alpha Release: Link on message box on stats.wikimedia.org to new wikistats 2.0 domain - https://phabricator.wikimedia.org/T177643#3883418 (fdans) Open>Resolved a:fdans
[16:54:02] elukey: ops q3 goal: * Upgrade Varnish from v4 to v5 on all remaining Traffic clusters
[16:54:02] ???
[16:54:03] is that ok?
[16:54:07] with varnishkafka?
[16:55:09] ottomata: yes yes it is, as long as they go up to 5.1
[16:55:20] (misc is already running v5)
[16:55:46] 5.2 contains breaking changes to the varnish log api
[16:56:08] ok cool
[16:57:06] Analytics-Kanban, Analytics-Wikistats: Beta Release: Wikistats: support annotations - https://phabricator.wikimedia.org/T178015#3883444 (fdans)
[16:58:14] Analytics, Analytics-Wikistats: Beta release: Wikistats: Corners of dashboard miniatures overflow when no data - https://phabricator.wikimedia.org/T178812#3883451 (fdans) Open>Resolved a:fdans
[17:01:13] Analytics, Analytics-EventLogging, User-Elukey: Run eventlogging purging script on beta labs to avoid disk getting full - https://phabricator.wikimedia.org/T171203#3883479 (fdans) a:elukey
[17:01:25] Analytics-EventLogging, Analytics-Kanban, User-Elukey: Run eventlogging purging script on beta labs to avoid disk getting full - https://phabricator.wikimedia.org/T171203#3457322 (fdans)
[17:03:21] ottomata: you ok with deprioritising this?
https://phabricator.wikimedia.org/T166937
[17:04:15] yes
[17:11:36] Analytics: Investigate whether we could calculate "hourly unique devices" - https://phabricator.wikimedia.org/T163789#3883521 (Milimetric) Open>declined We don't have time for this now, but I'm documenting this idea in the Unique Devices docs, so as to not forget about it: https://meta.wikimedia.org/...
[17:14:07] elukey: since i'm going to bounce namenodes, should I do server reboots for kernel too?
[17:14:52] oh yes that would be awesome
[17:15:37] 4.9.65-3+deb9u1~bpo8+2 is installed in an100[12]
[17:15:40] so we are fine
[17:15:55] please reboot whenever you like :)
[17:27:51] Gone for dinner, back after
[17:30:13] ok cool, elukey will do then
[17:35:15] ottomata: sorry to bother again :( there is still a screen session (SCREEN -S mysqlbackup) on an1003
[17:35:25] it is alarming in icinga
[17:36:08] hm ok!
[17:36:23] gone
[17:38:42] https://twitter.com/MaziyarPanahi/status/950319408710410240
[17:41:03] wow
[17:41:04] --> https://multivac.iscpif.fr/ ("Please do not ask for any DATABASE DUMP nor THE ENTIRE DATASETS! :-)")
[17:41:10] haha
[17:43:04] wow indeed!
[17:47:24] * elukey off (after the ops meeting)
[17:58:42] hey ottomata something's weird with git fat on stat1005:/srv/deployment/analytics/refinery
[17:58:51] when I do git status I get a bunch of errors
[17:59:42] checking
[18:01:52] milimetric: weird, i think it's ok though
[18:02:02] apparently git status needs to write temp files in the .git/fat/objects directory
[18:02:06] but you aren't allowed to write there
[18:02:16] but, it looks ok though
[18:02:16] i think
[18:02:17] k, cool, I'll just finish deploying then
[18:02:25] thanks for checking
[18:02:26] did you see that while deploying?
[18:02:35] from scap?
[18:02:41] no
[18:02:50] I just did a git status in addition to checking the latest was pushed
[18:11:48] hey joal, I left a note for you in the blog post draft re: updating the figshare entry and Meta page
[18:12:09] check it out and let’s think of possible options, I understand where you guys are coming from
[18:14:53] in meetings in 15 mins but should be around otherwise
[18:25:16] Analytics-Cluster, Analytics-Kanban: Support multi DC statsv - https://phabricator.wikimedia.org/T179093#3883770 (Krinkle)
[18:27:47] Analytics, Research: Make HTML dumps available - https://phabricator.wikimedia.org/T182351#3883799 (DarTar) @leila thanks for pushing this forward. I am supportive of this request, I also believe this might deserve a bit of a broader discussion to understand data consumer needs (rings a bell, @Capt_Swing...
[18:41:29] Hi DarTar, sorry I was away for dinner :)
[18:48:12] (PS7) Nuria: Replacing JSON download with CSV download [analytics/wikistats2] - https://gerrit.wikimedia.org/r/401814 (https://phabricator.wikimedia.org/T183192)
[18:48:46] (CR) Nuria: Replacing JSON download with CSV download (2 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/401814 (https://phabricator.wikimedia.org/T183192) (owner: Nuria)
[18:49:53] Analytics, Analytics-Wikimetrics, MediaWiki-Vagrant, Patch-For-Review: role::wikimetrics should compile into a catalogue without dependency cycles - https://phabricator.wikimedia.org/T184154#3883852 (bd808) Open>Resolved a:bd808 "Fixed" by dropping the role. It would be awesome to hav...
[18:50:34] DarTar: from a quick look at figshare, looks like no team-account exists
[18:51:25] DarTar: Maybe best is to leave it as is from an author perspective and add some content about Analytics Team maintaining the regular releases in the description?
[18:52:15] Analytics, Research: Make HTML dumps available - https://phabricator.wikimedia.org/T182351#3821170 (Nuria) @DarTar neither analytics nor cloud team would be working on this, see above. This work is on @ArielGlenn 's backlog who is the ops engineer that supports dumps.
[19:07:35] !log Deployed refinery and synced to hdfs
[19:07:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:14:16] Analytics, Analytics-Cluster, Operations, Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2734568 (Krenair) lucky you didn't go with nvidia: https://www.theregister.co.uk/2018/01/03/nvidia_server_gpus/
[19:28:53] milimetric, fdans: somehow funny http://www.commitstrip.com/en/2018/01/08/new-year-new-frameworks/?setLocale=1
[19:30:01] haha, 52 frameworks... yeah, I think I could come up with a list
[19:30:08] sails.js!!
[19:30:09] :)
[19:30:19] :)
[19:30:37] i’m going to make a framework called joal.js
[19:31:07] fdans: please put some malicious code in there ;)
[19:31:23] it’ll do the same things as Vue.js but all the components will have a beard
[19:31:32] :D
[19:31:36] fdans: I have a cold, I can't concentrate enough to do your review
[19:31:42] sorry :(
[19:31:47] I'll try tomorrow
[19:31:54] fair enough!
[19:32:27] joal: back from meetings, re: figshare it feels unfair but happy to oblige if that’s what you guys prefer. Shall I add a generic mention to the analytics team in the project template on Meta then?
[19:33:19] DarTar: Works for me :)
[19:33:33] nuria_: I’ll bring up the dump stuff briefly tomorrow during tech mgmt, it’s clear this is unplanned work, but I am trying to figure out where this could sit, specifically it should turn into a resource ask for FY19
[19:33:45] joal: cool
[19:33:50] DarTar: The amount of work we have put into this is not comparable to what Ellery or yourself have
[19:37:09] joal: we tend to have a pretty inclusive notion of contributorship to datasets we release in Research. While the author piece is the least relevant metadata bit for the registry I’m in general interested in crediting everyone who’s made a contribution that’s not just one-off consultation. I’d be happy to list all of you guys otherwise if that’s an option, it sucks that figshare doesn’t support group/team authorship tbh
[19:38:56] DarTar: It is a difference of view-points between research and engineering I think: it's not really important for us, and we don't feel we are authors, but I hear your point - The way you prefer is good :)
[19:39:11] kk cool
[19:40:05] milimetric: another interestin
[19:40:09] milimetric: https://hackernoon.com/im-harvesting-credit-card-numbers-and-passwords-from-your-site-here-s-how-9a8cb347c5b5
[19:43:29] joal: did you write the section in the blog post draft with the descriptive stats about the graph?
[19:43:43] or is that from bearloga?
[19:44:09] DarTar: I wrote that - This is why it's not as good as the rest :D
[19:44:15] lol
[19:44:35] I’ll ping you there with some comments then
[19:47:15] joal, this is amazing. This dude invented a new kind of hacker: gray hat. The attack is so good and thoroughly thought out to evade detection
[19:47:34] milimetric: the post is really good
[19:47:55] omg “I have to do a search for my credit card numbers and usernames in case I’ve captured myself.
Isn’t that funny!”
[19:48:26] milimetric: I loved it too and it kept me glued until the end where it says “This post is entirely fictional”
[19:48:40] This is something I’ve thought of, and it’s terrifying: “perfectly possible to ship one version of your code to GitHub and a different version to npm.”
[19:49:17] DarTar: this is why I love fiction, if done well it’s more real than reality
[20:10:55] Analytics, Analytics-Wikistats: Wikistats Bug - https://phabricator.wikimedia.org/T184475#3884167 (Atsme)
[20:31:22] Analytics, Analytics-Wikistats: Wikistats Bug - https://phabricator.wikimedia.org/T184475#3884167 (Milimetric) From the Wiki selector, you have to first type "Wikipedia" and then "English" (or press Enter to auto-complete when it's highlighting what you want). The selector is un-intuitive, we're working...
[20:33:08] Analytics, Analytics-Wikistats: Wikistats Bug - https://phabricator.wikimedia.org/T184475#3884221 (Aklapper) a:Atsme>None @Atsme: Removing you as assignee as I don't think that you plan to fix this task. Please correct me if I'm wrong. Also, it is unclear where exactly (URL) and why you are unabl...
[20:40:13] Gone for tonight a-team
[20:42:06] Analytics, Analytics-Cluster, Operations, Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#3884249 (Shilad) @dr0ptp4kt Thanks for chiming in, and I hope the leave was restful! My read on the AMD + tensorflow / keras situation is that support is movi...
[20:44:30] Analytics, Research: Make HTML dumps available - https://phabricator.wikimedia.org/T182351#3884250 (Nuria) Also, I think once dumps infrastructure is migrated to cloud "labs" probably @bmansurov can team up with @ArielGlenn and that might be the fastest way to get this done?
[21:20:49] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Port Kafka clients to new jumbo cluster - https://phabricator.wikimedia.org/T175461#3884344 (Ottomata)
[21:21:13] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Port Kafka clients to new jumbo cluster - https://phabricator.wikimedia.org/T175461#3594088 (Ottomata)
[21:31:26] Analytics, Analytics-Wikistats, ORES, Scoring-platform-team: Discuss Wikistats integration for ORES - https://phabricator.wikimedia.org/T184479#3884392 (awight)
[22:18:11] Analytics, Puppet: analytics VPS project puppet errors - https://phabricator.wikimedia.org/T184482#3884526 (Krenair)
[23:16:41] nuria_: if you are about https://phabricator.wikimedia.org/T184085
[23:16:58] this is asking for you to sign off on the access expiry extension by slaporte
[23:17:10] I'm just trying to push it through as part of ops clinic duty, so checking you were aware?
[23:17:11] =]
[23:20:58] Analytics, Puppet: analytics VPS project puppet errors - https://phabricator.wikimedia.org/T184482#3884526 (Andrew) I enabled puppet on druid-test02 and now it says: ``` Info: Loading facts Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error...