[07:06:10] holaaa
[08:36:01] Hi nuria - Kids day for me, will be there at siesta time and this evening
[09:24:41] Analytics, Analytics-Kanban, Analytics-SWAP, Jupyter-Hub: Trouble accessing Jupyter Lab - https://phabricator.wikimedia.org/T231365 (Dzahn) Icinga told us that the systemd state on both notebook1003 and notebook1004 was degraded. Looking at the service that failed i saw: ` ● jupyter-iflorez-si...
[09:29:10] Analytics, Analytics-Kanban, Analytics-SWAP, Jupyter-Hub: Trouble accessing Jupyter Lab - https://phabricator.wikimedia.org/T231365 (Dzahn) on notebook1004 there was only 1 failed service, the one for @ebernhardson which i started. That made Icinga alerts recover. On notebook1003 though starting...
[10:45:55] hi nuria - would you be here by any chance?
[11:00:09] (PS1) Joal: Correct geoditors-yearly job [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655)
[11:03:51] joal: ya, jelou
[11:06:10] nuria: I'm wondering about how to provide CRs for SLAs and hive actions correction for queues - One per workflow (means a lot of CR), or one global (means a big one) ?
[11:09:15] nuria: I'd prefer do it all at once (1 CR for all SLAs, and 1 for all queue corrections) - Would that work for you?
[11:12:29] joal: i think so, more than CR issue would be re-starting jobs no?
[11:12:55] joal: but we can do a couple and let them bake, do another couple...
[11:12:58] restarting jobs will be cumbersome but not very hard
[11:13:27] (CR) Nuria: [C: +2] Correct geoditors-yearly job [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655) (owner: Joal)
[11:13:39] nuria: restarting by bundles can be done - Changing all of them (code) makes it easier in my mind (less follow up)
[11:14:02] (CR) Nuria: [C: +2] "Have we tested job (even with another time interval) to verify output?" [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655) (owner: Joal)
[11:14:19] nuria: currently doing that --^ :)
[11:18:25] (PS2) Joal: Correct geoditors-yearly job [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655)
[11:23:42] hellooo teammm
[11:23:46] Hi mforns :)
[11:24:08] hm - unexpected issue: we don't have a 'country_info' table
[11:25:00] Shall I write results with country code as provided in the original geoeditors table, or do we want to generate a country_info table ?
[11:25:35] IIRC there were discussions around creating such a table (Francisco and Dan mentioned it I think), but I have not found it in wmf database
[11:31:46] (CR) Mforns: [C: +1] "LGTM" [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655) (owner: Joal)
[12:00:52] joal: related to that, we are maintaining a table of country codes, names, regions, and Global South/North classification, as well as a similar table for wikis, in the canonical_data database in Hive (see also https://github.com/wikimedia-research/canonical-data). I've felt for a while that we should upstream those into datasets that y'all maintain; maybe we could help you write the logic :)
[12:05:48] Awesome neilpquinn :)
[12:05:54] I'll use that!
[12:16:45] Analytics, Analytics-Kanban: Set up automatic deletion for netflow datasource in Druid - https://phabricator.wikimedia.org/T229674 (mforns) @ayounsi Great, thanks. I'm not sure if we can change the granularity of the data within a single data set, say have the latest 3 months be minutely, and the rest b...
[12:19:29] (PS3) Joal: Refactor geoditors-yearly job [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655)
[12:21:01] Analytics: Set up automatic deletion for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (mforns) @Nuria, who would be responsible for migrating netflow data set to be ingested into the event pipeline? @ayounsi, I understand that once the netflow ingestion is migrated to the event pipel...
[12:30:00] (CR) Mforns: "LGTM, let's fix the jenkins problem, and merge!" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/531148 (https://phabricator.wikimedia.org/T230514) (owner: Fdans)
[12:32:21] Analytics, Analytics-Kanban: Turnilo: Remove count metric for edit_hourly data cube - https://phabricator.wikimedia.org/T230963 (mforns) @MNeisler the count metric was removed from Turnilo, let me know if there's any problems. Thanks!
[12:33:00] Analytics: Set up automatic deletion for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (Nuria) @mforns: We can do it but let's tackle that once we have done the dataset releases we have as high priority for this quarter. Does that sound good? @ayounsi on your end this means that any m...
[12:33:38] Analytics, Analytics-Kanban: Turnilo: Remove count metric for edit_hourly data cube - https://phabricator.wikimedia.org/T230963 (Nuria) Open→Resolved
[12:34:07] Analytics, Analytics-Kanban, Analytics-SWAP, Jupyter-Hub: Trouble accessing Jupyter Lab - https://phabricator.wikimedia.org/T231365 (Ottomata) Sorry, was talking with Iflorez in IRC last night and didn't update here. My upgrade (and downgrade) clearly had some problems. I'm working on this.
[12:34:13] Analytics, Analytics-Kanban, Analytics-SWAP, Jupyter-Hub: Trouble accessing Jupyter Lab - https://phabricator.wikimedia.org/T231365 (Ottomata) a:Ottomata
[12:34:22] Analytics: Set up automatic deletion for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (mforns) @Nuria > We can do it but let's tackle that once we have done the dataset releases we have as high priority for this quarter. Does that sound good? Sure! Makes sense.
[12:38:20] (PS4) Joal: Refactor geoditors-yearly job [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655)
[12:39:20] (CR) Joal: [V: +2] "Tested on cluster by changing dependencies and time interval." [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655) (owner: Joal)
[12:39:43] nuria, mforns - Sorry there have been some small modifications --^ :S
[12:39:52] k lookin!
[12:39:59] Thanks mforns :)
[12:41:47] Analytics, Patch-For-Review: Refactor quenename into HQL hive2 action oozie jobs - https://phabricator.wikimedia.org/T231002 (JAllemandou)
[12:47:02] Analytics, Analytics-Kanban, Analytics-SWAP, Jupyter-Hub: Trouble accessing Jupyter Lab - https://phabricator.wikimedia.org/T231365 (Ottomata) I think the issue had to do with the pip install command I was using to upgrade/rollback user venvs. I was using --ignore-installed instead of --force-re...
[13:05:24] (CR) Mforns: "LGTM, left one comment, but not critical." (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655) (owner: Joal)
[13:42:41] (PS3) Joal: Update oozie job for yarn queue to work [analytics/refinery] - https://gerrit.wikimedia.org/r/531682 (https://phabricator.wikimedia.org/T231002)
[13:44:25] (CR) Joal: [V: +2] Refactor geoditors-yearly job (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655) (owner: Joal)
[13:44:53] (CR) Joal: "Sorry for the awful CR :(" [analytics/refinery] - https://gerrit.wikimedia.org/r/531682 (https://phabricator.wikimedia.org/T231002) (owner: Joal)
[13:53:26] (CR) Mforns: Refactor geoditors-yearly job (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655) (owner: Joal)
[14:07:29] (PS5) Joal: Refactor geoditors-yearly job [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655)
[14:08:23] (CR) Joal: [V: +2] Refactor geoditors-yearly job (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655) (owner: Joal)
[14:19:20] (CR) Mforns: [V: +2 C: +2] "LGTM! I saw you Verified+2 before, so adding that as well. Will deploy today." (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/532974 (https://phabricator.wikimedia.org/T215655) (owner: Joal)
[14:19:58] Thanks mforns --^ :)
[14:20:16] thanks joal for the change :], I will deploy that today, right now I'm reviewing the other patch (queue_name)
[14:20:28] that one is really not nice :(
[14:20:51] I will deploy that today as well
[14:21:17] and there will be lots of oozie jobs to restart :SSSSS
[14:21:27] all of them almost :(
[14:21:47] mforns: Shall we wait for the SLAs patch, so that we restart all of them once, and not twice?
[14:21:58] oh.. yes
[14:22:04] so no deploy today for that?
[14:22:35] mforns: I won't have the CR for today - I probably can have it for tomorrow
[14:22:50] joal, should I shift the train to tomorrow?
[14:22:57] 2 solutions: deploy today without queue, or deploy tomorrow with queue and SLA
[14:23:02] I don't mind too much
[14:23:15] joal, I can help with the SLA patch
[14:25:00] mforns: and try to have it this evening?
[14:25:38] I think it's less stress if we do it for tomorrow, and deploy it next week (or tomorrow) with restarts
[14:25:52] mforns: If we rush, it's error prone
[14:26:01] joal, no need to hurry I think! I'm feeling a bit blraff today, so prefer shifting deployment to tomorrow and go slowly
[14:26:24] works for me mforns - I'll provide the patch with SLAs tomorrow end of morning so you can review it?
[14:26:24] I meant helping to have it tomorrow
[14:26:39] mforns: reviewing those is actually worse than writing them :D
[14:26:44] 'course, I can review
[14:26:56] xDDD no, it's OK
[14:27:20] OK, then shifting refinery deploy to tomorrow
[14:29:06] Thanks mforns - I'll help tomorrow after deploy for the jobs restarts
[14:29:49] ok :D
[14:34:33] Analytics, Analytics-SWAP, Product-Analytics: Provide Python 3.6 on SWAP - https://phabricator.wikimedia.org/T212591 (Neil_P._Quinn_WMF) Resolved→Open >>! In T212591#5432055, @elukey wrote: > As FYI we have now Python3.7 + libpython3.7 on notebooks: > > Caveat: since those nodes are still Str...
[14:38:55] (CR) Nuria: [C: +2] "So, all jobs are running on default queue now, right?" [analytics/refinery] - https://gerrit.wikimedia.org/r/531682 (https://phabricator.wikimedia.org/T231002) (owner: Joal)
[14:39:22] Analytics, Analytics-Kanban, Patch-For-Review: Refactor quenename into HQL hive2 action oozie jobs - https://phabricator.wikimedia.org/T231002 (Nuria)
[15:25:51] ottomata: hello, yt?
[15:26:09] hello yes!
[15:29:10] nuria: wassup?
[15:30:34] ottomata: me no compredou but "eventlogging_filter_is_allowed_hostname" (transform function in refine)
[15:30:51] ottomata: is only called per partition, not per record?
[15:31:03] ottomata: we can talk after standup
[15:31:07] should be per record
[15:31:12] it is a dataframe filter
[15:31:17] wait checking...
[15:31:21] (before responding..)
[15:31:48] yes dataframe filter
[15:31:54] just like an SQL where clause
[15:32:26] PartitionedDataFrame is just a wrapper we made up to carry hive partition info along with the dataframe
[15:33:19] (CR) Mforns: [C: +1] "LGTM as well! The ones in archive_job_output are removed because they were not supposed to be there anyway right? Because it's just a bash" [analytics/refinery] - https://gerrit.wikimedia.org/r/531682 (https://phabricator.wikimedia.org/T231002) (owner: Joal)
[15:36:50] ottomata: let me put more debugging and check
[15:41:32] (PS4) Fdans: Change partition structure to year/month/day/hour. [analytics/refinery] - https://gerrit.wikimedia.org/r/532725 (https://phabricator.wikimedia.org/T229817)
[15:44:13] (PS5) Fdans: Change partition structure to year/month/day/hour. [analytics/refinery] - https://gerrit.wikimedia.org/r/532725 (https://phabricator.wikimedia.org/T229817)
[15:47:30] Analytics, Analytics-SWAP, Product-Analytics, Patch-For-Review: Upgrade all SWAP users to JupyterLab 1.0 - https://phabricator.wikimedia.org/T230724 (Ottomata) So, after more investigation, I think the reason my upgrade to 1.0.9 failed was because I was upgrading user venvs with the --ignore-inst...
[15:49:55] Hello @ottomata, I'm online and ready to test
[15:50:14] ok iflorez
[15:50:23] so does jupyterlab work fine for you right now on 1004?
[15:53:19] yes, I'm on JupyterLab on notebook4. So far so good.
[15:53:31] ok, i'm going to upgrade your jupyterlab and restart your server
[15:53:41] ok
[15:54:27] ok try it now iflorez
[15:55:42] Analytics, Analytics-Kanban, Analytics-SWAP, Product-Analytics, Patch-For-Review: Upgrade all SWAP users to JupyterLab 1.0 - https://phabricator.wikimedia.org/T230724 (Ottomata)
[15:57:35] yes, I'm on JupyterLab on notebook4. So far so good.
[15:57:55] greattt
[15:58:52] hoorah!
[15:59:08] shall I test notebook3?
[15:59:38] (CR) Joal: "Thanks a lot for the review :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/531682 (https://phabricator.wikimedia.org/T231002) (owner: Joal)
[15:59:55] ok lemme upgrade there.
[16:00:51] ok try it on 1003 now iflorez
[16:01:33] nuria: mforns standup! :)
[16:02:40] On notebook3:
[16:02:40] "500 : Internal Server Error
[16:02:40] Failed to start your server. "
[16:03:47] Analytics, Analytics-Kanban, Patch-For-Review: Refactor quenename into HQL hive2 action oozie jobs - https://phabricator.wikimedia.org/T231002 (JAllemandou) a:JAllemandou
[16:05:03] @ottomata, I wonder if the 500 error is related to the 503 message I received when I last tested yesterday? Shall I restart the server as prompted?
[16:06:22] yes
[16:08:12] fdans: there currently are 2 mediarequests jobs running, one from prod and one from your user - I guess you're testing and the alert is from that?
[16:08:45] joal: yes, and it's making me miserable :)
[16:25:11] iflorez: is it working?
[16:26:34] no
[16:26:36] hmm same problem!
[16:27:23] yeah same issue with npm/node dependency parsing.
[16:27:29] interesting
[16:33:04] It is interesting that this time it is the opposite notebook that is having the issue. Yesterday, at the end of the day, and before the rollback, notebook3 was running but notebook4 was not. This morning, notebook4 is running and notebook3 is not.
[16:34:07] joal: found the GII e-mails, but they are not super clear, i have forwarded you the thread, i think only edits to wikipedia are counted but there is no talk of anonymous versus not
[16:34:33] ok nuria - thanks for the forward, will ask leila :)
[16:34:40] joal: so to your point about wikidata, it is not included
[16:35:20] and uses wikipedia only, looks like we're good on that side nuria :)
[16:35:47] joal: ok!
[16:45:25] Analytics, Analytics-Kanban, Patch-For-Review: Generate edit totals by country by month/year - https://phabricator.wikimedia.org/T215655 (JAllemandou) Ping @leila for a question on data definition. For the GII report we are counting edits: - By country - for all namespaces and namespace 0 only, - f...
[16:48:18] (PS6) Fdans: Change partition structure to year/month/day/hour. [analytics/refinery] - https://gerrit.wikimedia.org/r/532725 (https://phabricator.wikimedia.org/T229817)
[17:03:51] yeah very strange iflorez
[17:32:52] iflorez: i downgraded and restarted your server on 1003
[17:32:54] is it ok now?
[17:33:50] yes, JupyterLab on notebook3 is running
[17:33:55] very strange
[17:33:55] ok
[17:34:43] JupyterLab on notebook4 is running
[17:41:49] Analytics: Set up automatic deletion for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (ayounsi) Makes sense as well! Out of curiosity, what would happen if the schema is incorrect? (forgot to update, typoed, etc.)
[17:44:46] oh joal sorry forgot to ping you after meeting
[17:44:48] am here if you still are!
[17:57:56] Analytics, Analytics-Kanban, Analytics-SWAP, Product-Analytics, Patch-For-Review: Upgrade all SWAP users to JupyterLab 1.0 - https://phabricator.wikimedia.org/T230724 (Ottomata) After upgrading on notebook1003, iflorez still encountered the same strange node dependency issues from y...
[18:11:53] Analytics, Analytics-Kanban: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (Ottomata) I can successfully launch a mysqld process using one of these backups and read data from it.
[18:21:22] Analytics, Analytics-Kanban, Patch-For-Review: Generate edit totals by country by month/year - https://phabricator.wikimedia.org/T215655 (leila) @diego please review below as well since you worked with GII folks during the past iteration: >>! In T215655#5446486, @JAllemandou wrote: > Ping @leila fo...
[19:42:49] Analytics, Analytics-Kanban, Analytics-SWAP, Product-Analytics, Patch-For-Review: Upgrade all SWAP users to JupyterLab 1.0 - https://phabricator.wikimedia.org/T230724 (Neil_P._Quinn_WMF) >>! In T230724#5446663, @Ottomata wrote: > After upgrading on notebook1003, iflorez still encountered stil...
[19:55:21] Analytics: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (Ottomata) > Spark 2.4 comes with scala 2.12 that offers experimental support for Java 11 I don't see this one! https://github.com/apache/spark/blob/v2.4.3/pom.xml#L158 has scala 2.11.12 which according to https://docs....
[20:20:14] Analytics, Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Upgrade Spark to 2.4.x - https://phabricator.wikimedia.org/T222253 (Ottomata) Ah ha, my previous test didn't work because I hadn't distributed the pyspark 2.4.3 deps anywhere, and it was loading the old ones. My process of shippin...
[20:43:03] Analytics, Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Upgrade Spark to 2.4.x - https://phabricator.wikimedia.org/T222253 (Ottomata) Spark 2.4.3 also works just fine in YARN from stat1007 with Java 8 and Python 3.5. I'm going to build a new .deb that includes zipped up python dependen...
[21:10:45] Analytics, Analytics-Kanban, Patch-For-Review: Generate edit totals by country by month/year - https://phabricator.wikimedia.org/T215655 (diego) > @diego please review below as well since you worked with GII folks during the past iteration: Looks ok to me.
[22:14:38] Analytics, Analytics-Kanban, Operations, netops, ops-eqiad: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (Jclark-ctr) Host moved cmjohnson. advised to move out if row B in to 10G racks leave 1 in B ` host...
[22:30:14] Analytics, Analytics-Kanban, Operations, netops, ops-eqiad: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (Jclark-ctr) a:Jclark-ctr→Cmjohnson