[07:09:11] hello yall
[07:10:36] hola!
[08:40:06] Analytics: Mediarequests Examples Giving Errors - https://phabricator.wikimedia.org/T241863 (fdans) Open→Resolved a: fdans Just updated it with working examples
[08:43:54] Analytics, Analytics-Wikistats: Wikistats New Feature - bot edits / new articles - https://phabricator.wikimedia.org/T241922 (fdans) @FocalPoint Thanks for asking! Have you checked out the edits metric? https://stats.wikimedia.org/#/all-projects/contributing/edits/normal|line|2-year|editor_type~anonymou...
[08:46:24] Analytics: Change link in wikis footer so that they point to stats.wikimedia.org - https://phabricator.wikimedia.org/T244961 (fdans) p: Triage→High
[08:47:36] Analytics, Analytics-Kanban, Patch-For-Review: Update the AMD ROCm prometheus metric exporter to take into account changes to rocm-smi - https://phabricator.wikimedia.org/T236007 (elukey) Open→Resolved
[08:48:14] Analytics, Analytics-Kanban, User-Elukey: Add request_bytes as measure in Druid's webrequest_sampled_128 - https://phabricator.wikimedia.org/T240681 (elukey) Open→Resolved
[08:49:01] Analytics: Statement of work for new designer in wikistats - https://phabricator.wikimedia.org/T223478 (fdans) This document is now in the Analytics Drive
[08:49:15] Analytics, Analytics-Kanban: Statement of work for new designer in wikistats - https://phabricator.wikimedia.org/T223478 (fdans) a: fdans
[08:51:11] Analytics, Operations, serviceops, vm-requests, and 2 others: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (elukey) ` elukey@ganeti2001:~$ sudo gnt-group list Group Nodes Instances AllocPolicy NDParams row_A 4 34 preferred ovs=False, ssh_po...
[08:51:18] Analytics, Analytics-Kanban, Patch-For-Review: Analytics datasets should be under a free license - https://phabricator.wikimedia.org/T244685 (fdans)
[08:58:41] Analytics, Operations, serviceops, vm-requests, and 2 others: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (MoritzMuehlenhoff) Does this really need 8 GB RAM and 8 CPUs? The machine that this will replace (kraz) uses a single CPU (and hardly uses it) and...
[09:00:55] Analytics, Operations, serviceops, vm-requests, and 2 others: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (elukey) >>! In T244719#5875487, @MoritzMuehlenhoff wrote: > Does this really need 8 GB RAM and 8 CPUs? The machine that this will replace (kraz) u...
[09:01:04] Analytics, Operations, serviceops, vm-requests, and 2 others: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (elukey)
[09:09:41] Analytics, User-Elukey: Redesign architecture of irc-recentchanges on top of Kafka - https://phabricator.wikimedia.org/T234234 (Krenair) >>! In T234234#5875356, @elukey wrote: > 4) About the low usage of irc.wikimedia.org - yes I agree that few bots are using it (~300) Am I going mad or isn't that actua...
[09:14:45] Analytics, User-Elukey: Redesign architecture of irc-recentchanges on top of Kafka - https://phabricator.wikimedia.org/T234234 (elukey) >>! In T234234#5875512, @Krenair wrote: >>>! In T234234#5875356, @elukey wrote: >> 4) About the low usage of irc.wikimedia.org - yes I agree that few bots are using it (...
[09:18:00] Analytics, Operations, serviceops, vm-requests, User-Elukey: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (elukey) ` elukey@cumin1001:~$ sudo cookbook sre.ganeti.makevm codfw_B --link public --memory 8 --disk 40 --vcpus 4 irc2001.wikimedia.org START...
[10:05:43] Analytics: Create a Kerberos identity for foks - https://phabricator.wikimedia.org/T244773 (elukey) ` elukey@krb1001:~$ sudo manage_principals.py create foks --email_address=jsutherland@wikimedia.org Principal successfully created. Make sure to update data.yaml in Puppet. Successfully sent email to jsutherla...
[10:11:22] Analytics: Create a Kerberos identity for foks - https://phabricator.wikimedia.org/T244773 (elukey) Open→Resolved a: elukey
[10:36:36] FYI, there's a disk space icinga warning for notebook1004 for /srv
[10:40:33] sigh
[10:40:37] thanks!
[10:40:40] will check in a sec
[11:27:20] Analytics, Operations, serviceops, vm-requests, User-Elukey: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (elukey) Ok current status: * irc2001.wikimedia.org is running * puppet is set to role::system::spare, waiting for a new role/cluster combinati...
[11:27:52] * elukey lunch!
[11:28:27] fdans: qq - are you doing something with oozie + mediarequest?
[11:30:27] anyway, nothing horribly urgent, will check later :)
[11:38:41] elukey: yes! backfilling of daily top mediarequests
[13:22:26] Hi team - I just joined as kids are asleep - There is something wrong with oozie
[13:26:13] the oozie lib referenced in jobs is different from the one present on HDFS
[13:26:27] /user/oozie/share/lib/lib_20200204183338
[13:26:46] in hdfs
[13:26:53] while jobs expect: /user/oozie/share/lib/lib_20191216144244
[13:27:15] I really have no clue why it's happening now
[13:36:07] !log Kill-restart webrequest bundle to see if it mitigates the error
[13:36:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:38:37] Restarting the jobs doesn't mitigate the problem - Waiting for elukey to try to shake oozie
[13:48:05] ottomata: sorry to rush you but I could do with some ops help
[13:48:14] oh ok, what's up?
[13:48:31] oozie is flipped - can't find its lib
[13:48:43] I have checked the folder, and indeed the expected one is not there
[13:49:05] the oozie sharelib?
[13:49:08] yes
[13:49:27] I have no clue why it started today
[13:49:48] the last lib was created 2020-02-04
[13:49:49] oozie admin -shareliblist seems ok?
[13:49:49] ok
[13:49:50] no?
[13:50:55] it does, but jobs fail complaining about files not present, and it seems to be because of folders not being present
[13:51:01] hello, just got back :)
[13:51:06] heya elukey
[13:51:17] hmmm
[13:51:25] there is one lib folder... but it says 20200204...
[13:51:26] Can we try to restart oozie?
[13:51:30] why is there a recent one from this month?
[13:51:46] sorry, don't get it --^
[13:51:55] there is a single oozie sharelib in hdfs
[13:51:59] created on feb 4
[13:51:59] yes
[13:52:03] I have seen that
[13:52:10] did we make a new sharelib this month?
[13:52:16] I can't recall!
[13:52:25] I actually don't think so
[13:52:41] yeah i wouldn't expect us to
[13:52:43] I restarted oozie for the spark-env changes, wondering if it made any change
[13:52:46] we usually only do that if we upgrade something
[13:52:47] when was it?
[13:53:00] elukey last tues feb 4
[13:53:24] that matches the date on the lib dir
[13:53:48] right
[13:54:02] heh, that's what i'm saying, a new one was created then
[13:54:10] we know because the sharelib dirs are named after their creation date
[13:54:15] lib_20200204183338
[13:54:20] yes yes
[13:54:25] This is weird
[13:54:33] but does oozie create a new sharelib when restarted?
[13:54:36] no
[13:54:44] puppet will do it if
[13:54:45] unless => '/usr/bin/hdfs dfs -ls /user/oozie | grep -q /user/oozie/share',
[13:55:00] yes I meant if puppet does it after oozie is restarted
[13:55:10] mmmm
[13:55:40] no it shouldn't
[13:56:11] I am not saying it should, but everything points in that direction
[13:56:19] also folks, from SAL, we restarted oozie on 2020-02-03, but not 04
[13:56:47] hm yeah
[13:56:48] File does not exist: hdfs://analytics-hadoop/user/oozie/share/lib/lib_20191216144244/hive2/libfb303-0.9.3.jar
[13:56:49] nothing seems to have happened particularly on 04
[13:56:51] very weird.
[13:58:14] and there is nothing on the 4th
[13:58:45] joal can I just rerun one of these webrequest load jobs to try stuff?
[13:58:47] i want to repro
[13:58:54] then i'm going to run sharelib update and see if anything changes
[13:59:10] sure ottomata, I tried to restart the webrequest bundle a while back, didn't work
[13:59:21] Do you want me to do it again? Or is a restart enough?
[13:59:46] hm
[13:59:49] i guess yeah hm
[13:59:54] does the job need a restart if the sharelib changes?
[14:00:05] ok
[14:00:09] i'll just run the sharelibupdate
[14:00:11] then you restart the bundle
[14:00:12] one sec
[14:00:17] sure
[14:00:58] hm, FYI something new
[14:01:04] This request requires HTTP authentication.
[14:01:05] when
[14:01:17] in the future we'll need to change how we update the sharelib
[14:01:23] whoa
[14:01:45] e.g. when we auto-add the spark2 sharelib after a spark upgrade, we use the REST api to update the sharelib, because the CLI had been flaky
[14:01:47] trying the CLI...
[14:02:06] yup
[14:02:09] [ShareLib update status]
[14:02:09] sharelibDirOld = hdfs://analytics-hadoop/user/oozie/share/lib/lib_20191216144244
[14:02:09] host = http://an-coord1001.eqiad.wmnet:11000/oozie
[14:02:09] sharelibDirNew = hdfs://analytics-hadoop/user/oozie/share/lib/lib_20200204183338
[14:02:09] status = Successful
[14:02:11] hm
[14:02:15] we might not even need a job restart?
[14:02:21] going to just rerun an individual hour
[14:03:31] one thing that I noticed now is that /usr/bin/hdfs dfs -ls /user/oozie | grep -q /user/oozie/share may run even if there is a temporary issue with the hdfs command, no?
[14:03:40] like a network timeout etc..
[14:03:50] yeah it might...
[14:03:54] hm
[14:04:44] maybe we could create a script that execs every time, doing the "unless" check in bash with some safeguards
[14:05:02] heh, we don't have syslogs from feb 4 to find out
[14:05:14] WAT?
[14:05:45] oh, we only have a week of them joal, that's all
[14:05:50] Ah ok
[14:06:05] looks better
[14:06:06] https://hue.wikimedia.org/oozie/list_oozie_workflow/0011413-200203112045319-oozie-oozi-W/?coordinator_job_id=0006212-200110143753542-oozie-oozi-C&bundle_job_id=0006211-200110143753542-oozie-oozi-B
[14:06:10] it's weird though that the thing only bites us now, isn't it?
[14:06:14] very weird
[14:06:31] ok, back in the game
[14:06:34] elukey: this command should only ever run the first time oozie is installed
[14:06:44] I'm gonna manually restart failed jobs
[14:07:13] ottomata: so you suggest to just move it into oozie's docs, rather than keeping it in puppet?
[14:07:21] yeah maybe
[14:07:34] could be an option yes
[14:07:58] although we also do the db create stuff too
[14:08:01] kinda nice to have it all done
[14:08:42] we could wrap those into some bash scripts, with more guards
[14:08:46] yeah
[14:09:02] I'd vote for this option, and then if it doesn't work we remove them
[14:09:07] hm we could also just add another command to the unless
[14:09:15] unless oozie admin -shareliblist | grep ...
[14:09:50] wonder what that retval is with no sharelib.
[14:10:30] I mentioned the bash script since we could use set -e and have it abort if the retval is not zero
[14:10:46] ottomata: you unintentionally restarted the coordinator I killed :)
[14:10:55] ?
[14:10:55] Will kill it anew, and rerun the new one
[14:11:04] I killed https://hue.wikimedia.org/oozie/list_oozie_coordinator/0006212-200110143753542-oozie-oozi-C/
[14:11:22] and restarted https://hue.wikimedia.org/oozie/list_oozie_bundle/0011398-200203112045319-oozie-oozi-B
[14:11:23] oh
[14:11:27] you killed the whole thing
[14:11:33] ok sorry i just went from an oozie hue url
[14:11:34] ok
[14:11:42] By rerunning an action from the killed bundle, it restarted the coord :0
[14:11:43] ya restart all new ones, they should work now
[14:11:47] ack
[14:11:49] sorry
[14:11:53] Will kill the old one again and rerun the new
[14:11:54] elukey: sure that sounds good too
[14:11:55] np :)
[14:12:49] ottomata: helloooo there's a lil issue with the v2 old link
[14:12:53] yes?
[14:13:05] this works https://stats.wikimedia.org/
[14:13:09] but this doesn't https://stats.wikimedia.org
[14:13:11] Analytics: Request for Kerberos identity for fsalutari - https://phabricator.wikimedia.org/T245024 (Fsalutari)
[14:13:15] sorry, add v2
[14:13:30] this works https://stats.wikimedia.org/v2/
[14:13:30] but this doesn't https://stats.wikimedia.org/v2
[14:16:29] hm
[14:21:49] good catch fdans
[14:21:49] https://gerrit.wikimedia.org/r/c/operations/puppet/+/571726/1/modules/statistics/templates/stats.wikimedia.org.erb
[14:21:51] should do it
[14:22:48] ottomata: oh cool, and that addition catches both cases then
[14:23:45] joal: if you need a hand with the jobs let me know
[14:25:08] almost done elukey
[14:25:39] Analytics: Request for Kerberos identity for fsalutari - https://phabricator.wikimedia.org/T245024 (elukey) ` elukey@krb1001:~$ sudo manage_principals.py create fsalutari --email_address=flavia.salutari@telecom-paristech.fr Principal successfully created. Make sure to update data.yaml in Puppet. Successfully...
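(Editor's note on the stats.wikimedia.org/v2 trailing-slash issue above: the gerrit patch itself is not quoted in the log, so the exact directive is an assumption. A typical Apache fix is a mod_alias rule along the lines of `RedirectMatch permanent ^/v2$ /v2/`. A minimal way to verify the behavior once something like that is deployed:)

```bash
# Illustrative checks only, not from the log: the slash-less URL should now
# redirect to the canonical trailing-slash form instead of failing.
curl -sI https://stats.wikimedia.org/v2 | head -1   # expect a 301/302 redirect
curl -sI https://stats.wikimedia.org/v2/ | head -1  # expect HTTP/1.1 200 OK
```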
[14:27:46] huh, the oozie admin -shareliblist just calls the REST API via java
[14:28:50] PROBLEM - yarn.wikimedia.org HTTPS on analytics-tool1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster
[14:30:01] wow
[14:30:29] I guess it's me overloading hue through restarts
[14:30:53] weird, lemme check
[14:31:00] fun day :D
[14:33:26] !log restart hue on analytics-tool1001
[14:33:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:34:00] better now
[14:34:19] Thanks elukey
[14:34:30] RECOVERY - yarn.wikimedia.org HTTPS on analytics-tool1001 is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.009 second response time https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster
[14:34:47] didn't check all the logs but requests were piling up due to hue
[14:34:58] elukey: my bad - sorry :(
[14:35:09] will try to go gently
[14:35:11] Analytics, Patch-For-Review: Request for Kerberos identity for fsalutari - https://phabricator.wikimedia.org/T245024 (elukey) Open→Resolved a: elukey Please re-open if anything is missing!
[14:37:12] PROBLEM - Hue CherryPy python server on analytics-tool1001 is CRITICAL: PROCS CRITICAL: 2 processes with command name python2.7, args /usr/lib/hue/build/env/bin/hue runcherrypyserver https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue/Administration
[14:37:41] hm i suspect oozie shareliblist always returns 0
[14:37:42] :/
[14:38:07] hmm maybe not
[14:38:41] what the hell hue
[14:39:08] ok I think I have restarted everything that was wrong - Will drop off as Lino is awake - See y'all at standup
[14:39:36] so hue's init scripts are so great that a restart leaves 2 processes running (old and new)
[14:39:39] sigh
[14:39:41] just killed the old one
[14:40:30] RECOVERY - Hue CherryPy python server on analytics-tool1001 is OK: PROCS OK: 1 process with command name python2.7, args /usr/lib/hue/build/env/bin/hue runcherrypyserver https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue/Administration
[14:40:46] heh yeah it does, need to parse it i guess.
[14:42:21] hm, actually... we can't run oozie shareliblist... the oozie server has to be running for that, and on install at this point it isn't yet.
[14:46:10] hmmm elukey i think our hypothesis of what went wrong isn't right.
[14:46:18] that exec does
[14:46:19] require => [Cdh::Hadoop::Directory['/user/oozie'], File['/usr/bin/oozie-setup']]
[14:46:26] and those always issue hdfs dfs -ls commands
[14:46:38] so at least right before the unless command ran
[14:46:43] hdfs dfs -ls succeeded
[14:47:00] so, unless there was a really quick issue in between those
[14:47:09] i don't see how that could happen
[14:47:11] and, if there was
[14:47:15] a bash wrapper wouldn't help
[14:47:21] same thing could happen there
[14:48:07] well a timeout could have happened with the hdfs dfs in the unless, not really impossible.. and with a bash script we should gather the output of hdfs -ls etc., and then check it, with set -e. In this way it shouldn't fail
[14:48:17] sorry, I mean the same thing shouldn't happen
[14:48:44] the other alternative is to restart oozie now in, say, hadoop test, and see what happens
[14:48:48] maybe we can repro there
[14:49:00] (just to rule out the restart event)
[14:49:30] i think a bash script would fail in the same way if the problem was hdfs dfs -ls failing very temporarily
[14:49:39] puppet ends up running successfully
[14:51:26] # for require
[14:51:26] hdfs dfs -test -e /user/oozie
[14:51:26] # for unless
[14:51:26] hdfs dfs -ls /user/oozie | grep -q /user/oozie/share
[14:51:26] # then if unless returns 1
[14:51:26] oozie-setup sharelib create
[14:51:28] hdfs dfs -ls would return a non-zero exit code, and set -e would abort. Maybe not using unless
[14:51:45] right, but so should puppet.
[14:52:09] yes ok we can change puppet as well
[14:52:15] no i mean right now as is.
[14:52:19] this exec should not run if
[14:52:27] hdfs dfs -test -e /user/oozie fails
[14:52:39] we grep afterwards, no?
[14:52:46] the first check is the require
[14:52:50] require => [Cdh::Hadoop::Directory['/user/oozie']
[14:52:58] which runs hdfs dfs -test
[14:53:02] if that fails, the exec won't run
[14:53:05] since its prereq fails
[14:53:25] yes but nothing prevents a timeout from happening in the exec after the require was ok
[14:53:37] true, but that is true for the bash script too
[14:53:38] they are separate things
[14:54:00] it is just a series of commands run in succession, checking retvals
[14:54:05] which is what a bash wrapper would do too
[14:54:44] if hdfs dfs etc. fails, its output does not contain any /user/oozie/share etc., and grep would then return 1
[14:54:50] it is not the same as a bash script
[14:55:11] because we'd need to do more checks in there, not a simple |
[14:55:14] this is my point
[14:56:30] hm, ah i think i see, you want to catch the potential failure of the exec's unless hdfs -ls before the grep, and prevent running the sharelib create if it fails... ok sorry i get it
[14:56:59] it seems so unlikely to me that this is what happened; the require succeeded but the hdfs dfs -ls right after failed
[14:57:31] yes I agree, mine was only an idea to rule out corner cases.. it didn't happen before, so it must be some weird corner case
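(Editor's note: a minimal sketch of the guarded bash wrapper elukey is arguing for, assuming the same three commands as the pseudo-script pasted at 14:51; the `oozie-setup sharelib create` arguments are elided here exactly as they are there. The point is that `set -e` plus an explicit capture makes a transient HDFS failure abort the script instead of being misread as "no sharelib, create a new one".)

```bash
#!/bin/bash
# Sketch only: mirrors the require/unless/create steps discussed above.
set -e  # abort on any non-zero exit code

# "require": fail hard if HDFS is unreachable or /user/oozie is missing
hdfs dfs -test -e /user/oozie

# "unless": capture the listing explicitly; a timeout in hdfs dfs -ls now
# kills the script instead of silently feeding grep empty output
listing="$(hdfs dfs -ls /user/oozie)"

# only bootstrap a new sharelib if one is genuinely absent
if ! grep -q '/user/oozie/share' <<<"$listing"; then
    oozie-setup sharelib create   # arguments elided, as in the paste above
fi
```

(After a new sharelib is created this way, a running server still needs `oozie admin -sharelibupdate` — the CLI ottomata ran at 14:02 — to start pointing at it.)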
[15:01:49] tried to restart oozie in test and run puppet, can't repro (just to rule this out)
[15:30:52] so in hadoop test I just moved the namenodes to 2.8.5 as part of the rolling upgrade procedure
[15:30:55] so far nothing explodes
[15:31:04] the datanodes are the next ones
[15:51:23] Analytics, Wikidata, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Add time limits to scripts executed on stat1007 as part of analytics/wmde/scripts - https://phabricator.wikimedia.org/T243894 (Rosalie_WMDE) a: Rosalie_WMDE
[15:56:09] ah lovely, if I upgrade yarn together with hdfs there is a problem
[15:56:35] PROBLEM - Zookeeper Alive Client Connections too high on an-conf1001 is CRITICAL: 1091 ge 1024 https://wikitech.wikimedia.org/wiki/Zookeeper https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=6&fullscreen
[15:56:49] ouch
[15:58:11] stopped all the new daemons
[15:58:30] oh oof
[15:58:31] what the hell
[15:59:04] ok it is going down
[15:59:32] RECOVERY - Zookeeper Alive Client Connections too high on an-conf1001 is OK: (C)1024 ge (W)512 ge 0 https://wikitech.wikimedia.org/wiki/Zookeeper https://grafana.wikimedia.org/dashboard/db/zookeeper?refresh=5m&orgId=1&panelId=6&fullscreen
[15:59:37] I have a meeting now, will check in a bit
[15:59:40] sigh
[16:03:56] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Patch-For-Review: Refining is failing to refine centralnoticeimpression events - https://phabricator.wikimedia.org/T244771 (Ottomata) > since the plan was to switch the datatype back to array eventually. FYI, if you need a new da...
[16:50:55] Analytics, Product-Analytics: Request for instructions for using DataGrip in the Kerberos paradigm - https://phabricator.wikimedia.org/T245040 (mpopov)
[16:52:22] Analytics, Product-Analytics: Request for instructions for using DataGrip in the Kerberos paradigm - https://phabricator.wikimedia.org/T245040 (mpopov)
[16:58:12] Analytics, Product-Analytics: Request for instructions for using DataGrip in the Kerberos paradigm - https://phabricator.wikimedia.org/T245040 (elukey) Joseph and I tried a lot to make this work, but we think that the solution might be to have kerberos auth where datagrip runs (so laptop or own pc), tha...
[17:01:19] :S
[17:23:42] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Patch-For-Review: Refining is failing to refine centralnoticeimpression events - https://phabricator.wikimedia.org/T244771 (AndyRussG) >>! In T244771#5877159, @Ottomata wrote: > FYI, if you need a new datatype, you should just ma...
[17:27:11] Analytics, Analytics-Cluster: Hadoop Hardware Orders FY2019-2020 - https://phabricator.wikimedia.org/T243521 (RobH) So to put some of the figures I just posted in IRC about this: In eqiad 10G racks, we have the following port totals using SFP-T (and thus using 1G in a 10G rack): row a: 64, row b: 33, ro...
[17:30:02] Analytics, Product-Analytics: Request for instructions for using DataGrip in the Kerberos paradigm - https://phabricator.wikimedia.org/T245040 (mpopov) Aw, bummer :( thank you so much for trying though!
[17:46:26] Analytics, Analytics-Kanban, Release Pipeline, Patch-For-Review, and 2 others: Migrate EventStreams to k8s deployment pipeline - https://phabricator.wikimedia.org/T238658 (akosiaris) >>! In T238658#5830499, @Ottomata wrote: > @akosiaris I just did a bit of benchmarking in staging. As I added mor...
[17:53:29] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Patch-For-Review: Refining is failing to refine centralnoticeimpression events - https://phabricator.wikimedia.org/T244771 (Nuria) @Ottomata we keep 90 days of raw data right? If so i vote for dropping all refined data and re-re...
[17:55:53] Analytics, Product-Analytics: Request for instructions for using DataGrip in the Kerberos paradigm - https://phabricator.wikimedia.org/T245040 (kzimmerman) >>! In T245040#5877446, @elukey wrote: > Joseph and I tried a lot to make this work, but we think that the solution might be to have kerberos auth w...
[18:33:54] going off, for the moment the hadoop test cluster is ok
[18:34:08] zookeeper seems quiet, and puppet is disabled on the hadoop nodes that I am working on
[18:34:24] BUT, in the unlikely event that anything explodes, stop all java daemons on
[18:34:29] ok!
[18:34:32] analytics1028/1029/1031
[18:34:55] ottomata: it was yarn causing that mess with zookeeper :(
[18:35:08] will try to sort it out tomorrow
[18:35:09] sigh
[18:35:16] u da best !
[18:35:52] o/
[18:42:27] Analytics, Operations, serviceops, vm-requests, User-Elukey: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (Dzahn) ` Debug: Augeas[ens5_v6_token](provider=augeas): sending command 'set' with params ["/files/etc/network/interfaces/iface[. = 'ens5']/pre...
[18:46:36] Analytics, Operations, serviceops, vm-requests, User-Elukey: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (Dzahn) The primary network interface is missing from /etc/network/interfaces. There is only loopback in there. Why that is is another question....
[18:46:45] Analytics, Analytics-Cluster: Hadoop Hardware Orders FY2019-2020 - https://phabricator.wikimedia.org/T243521 (Ottomata) What matters most for us in terms of row placement is an even-ish spread. Hm, 10 of the nodes we are replacing are in Row B. We also currently only have 9 hosts in row C anyway, so pe...
[18:48:58] Analytics, Analytics-Kanban, Release Pipeline, Patch-For-Review, and 2 others: Migrate EventStreams to k8s deployment pipeline - https://phabricator.wikimedia.org/T238658 (Ottomata) Ok thanks! Will try that!
[19:16:31] milimetric: Heya - would you have a minute?
[19:20:44] joal: in 2 min!
[19:20:51] sure milimetric :)
[19:23:57] Analytics: Spike [2019-2020]. GPU enabled computations. How to do that best - https://phabricator.wikimedia.org/T217367 (Nuria) Open→Declined
[19:24:56] ok all yours joal
[19:25:00] \o/
[19:25:05] To the cave :)
[19:39:58] Analytics, Operations, serviceops, vm-requests, User-Elukey: Create a replacement for kraz.wikimedia.org - https://phabricator.wikimedia.org/T244719 (MoritzMuehlenhoff) Given that Luca also had an error during initial setup related to name resolution, this sounds like some error related to th...
[19:52:58] Analytics, Product-Analytics: Request for instructions for using DataGrip in the Kerberos paradigm - https://phabricator.wikimedia.org/T245040 (JAllemandou) Thinking of the future, if we decide presto is the way to go for analysts, [[ https://github.com/airbnb/airpal | Airpal ]] seems a good candidate. I...
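(Editor's note: a hedged sketch of what "kerberos auth where datagrip runs" from the T245040 discussion above typically means in practice, assuming MIT Kerberos on the client machine and Hive's standard kerberized JDBC URL syntax. The realm, host, and port below are illustrative assumptions, not values from this log.)

```bash
# All names here are placeholders for illustration.
# 1) Obtain a Kerberos ticket on the machine where DataGrip runs:
kinit someuser@EXAMPLE.REALM      # ticket lands in the local credential cache

# 2) Point the Hive JDBC driver at a kerberized HiveServer2. This is the
#    standard Hive JDBC URL form for Kerberos (principal= names the *server*
#    principal, not the end user):
#    jdbc:hive2://hive-server.example.net:10000/default;principal=hive/_HOST@EXAMPLE.REALM
```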
[20:35:10] Analytics, Analytics-Kanban, serviceops, Patch-For-Review: Clarify multi-service instance concepts in helm charts and enable canary releases - https://phabricator.wikimedia.org/T242861 (Ottomata) > Should we use main_app.name instead of service.name? I think yes is the answer. I just updated [[...
[21:48:12] Analytics, Analytics-Wikistats: Wikistats New Feature - bot edits / new articles - https://phabricator.wikimedia.org/T241922 (FocalPoint) @fdans thank you, indeed, not exactly the same, but with a bit of processing, I may get something similar to what I was looking for. The tables of Wikistats 1 seem ted...
[22:22:58] Analytics, Multimedia, Tool-Pageviews: Allow users to query mediarequests using a file page link - https://phabricator.wikimedia.org/T244712 (Nuria) > Personally I think you should just use {project}/{filename} That is what I am proposing but I think "project" is not the right term. In the wikimedia e...
[22:33:29] Analytics, Analytics-Kanban, serviceops: Clarify multi-service instance concepts in helm charts and enable canary releases - https://phabricator.wikimedia.org/T242861 (Ottomata) Ok, applied for staging eventgate-analytics. I think it works! First, because the 'analytics' release already existed, I...
[23:29:30] Analytics, Product-Analytics: Request for instructions for using DataGrip in the Kerberos paradigm - https://phabricator.wikimedia.org/T245040 (Nuria) i think airpal is going to be the future hue, ya.
[23:32:14] thanks elukey for setting up my kerberos auth. still having trouble running hive queries though :(
[23:32:25] "The number of live datanodes 8 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached."
[23:32:41] oh, it's this "Cannot create directory /tmp/hive/foks. Name node is in safe mode."
[23:39:33] Analytics, Product-Analytics: Develop a consistent rule for which special pages count as pageviews - https://phabricator.wikimedia.org/T240676 (kzimmerman) p: High→Medium a: nshahquinn-wmf→None The scope of this is (increasingly) extensive and requires changing existing definitions; current...
[23:48:32] foks: mmm, what queries are you running?
[23:49:14] nuria: I'm running one against wmf.webrequest
[23:49:28] I'll be honest, I don't have a whole lot of idea what I am doing
[23:49:28] foks: can you paste the query here?
[23:49:55] It's Legal-related so I'll replace what I'm running it for with foobar, but sure
[23:50:13] foks: i think reading the docs before using the cluster might help, querying petabytes of data is not an intuitive thing
[23:50:20] foks: let me see
[23:50:20] yes that is fair
[23:50:23] hive -e "use wmf; select dt, ip, client_ip, uri_host, uri_path, uri_query, agent_type, pageview_info['page_title'] as page_title, page_id, namespace_id, year, month, day, hour from wmf.webrequest where year=2020 and month=2 and day=3 and is_pageview=true and uri_host in ("fr.wikipedia.org", "fr.m.wikipedia.org") and uri_path="/wiki/Foobar" order by dt, ip limit 1000000000" > ./stat-legal-fr-2020-02-10.tsv
[23:50:39] I will probably also narrow by hour
[23:50:52] foks: narrowing by hour 1st would help
[23:51:02] nod
[23:52:01] foks: let me rerun your query
[23:52:05] I actually only need ten minutes of data
[23:52:20] nuria: let me PM you
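(Editor's note: as pasted, the query's unescaped inner double quotes terminate the shell string early, so it would not run as shown. A hedged corrected sketch follows — shell quoting fixed, `use wmf;` dropped since the table is already fully qualified, and narrowed to a single hour as nuria suggests. The hour value and the Foobar title are placeholders, not values from the log; the limit is also reduced, since Hive's `order by` funnels everything through a single reducer.)

```bash
# Single-quote the HQL so the double quotes inside survive the shell;
# Hive accepts double-quoted string literals. The year/month/day/hour
# partition filters keep the scan to one hour of webrequest data.
hive -e '
select dt, ip, client_ip, uri_host, uri_path, uri_query, agent_type,
       pageview_info["page_title"] as page_title, page_id, namespace_id,
       year, month, day, hour
from wmf.webrequest
where year=2020 and month=2 and day=3 and hour=0
  and is_pageview=true
  and uri_host in ("fr.wikipedia.org", "fr.m.wikipedia.org")
  and uri_path="/wiki/Foobar"
order by dt, ip
limit 1000000
' > ./stat-legal-fr-2020-02-10.tsv
```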