[00:23:00] Analytics-Kanban: Check abnormal pageviews for XHamster - https://phabricator.wikimedia.org/T158071#3173880 (Nuria) Android Samsung tablets from EU countries and US make the bulk of pageviews on 2016 and 2017 . There is not even desktop browsers on the top 10 uas that browsed that site , Chrome appears spa...
[00:31:40] Analytics-Kanban: Check abnormal pageviews for XHamster - https://phabricator.wikimedia.org/T158071#3173884 (Nuria) >It seems the number of unique IPs that visited /wiki/XHamster is significantly less than articles that received around the same number of pageviews. Not sure what was the interval you looked a...
[00:36:04] Analytics-Kanban: Check abnormal pageviews for XHamster - https://phabricator.wikimedia.org/T158071#3173890 (Nuria) Maybe wikipedia page is the "launching" pad for some popular video? I could not find anything obvious on citations and such. Closing as there is nothing here that points to a bot, rather traf...
[00:38:51] (PS3) Bearloga: [cirrus] Distinguish morelike vs fulltext api search requests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/345863 (owner: DCausse)
[00:54:43] Analytics-Cluster, Analytics-Kanban: Provision new Kafka clusters in eqiad and codfw with security features - https://phabricator.wikimedia.org/T152015#3173926 (RobH)
[06:21:27] Analytics, DC-Ops, Operations, ops-eqdfw: SATA errors for stat1004 in the dmesg - https://phabricator.wikimedia.org/T162770#3174169 (elukey)
[07:52:58] Analytics-Tech-community-metrics: Author names that include commata or "and" are split into separate identities in the frontend - https://phabricator.wikimedia.org/T161241#3174254 (Aklapper) p:Triage>Low
[08:06:16] Analytics-Tech-community-metrics: Git code repository is listed but not all recent activity in it is shown on wikimedia.biterg.io - https://phabricator.wikimedia.org/T161211#3174308 (Aklapper) a:Aklapper I need to check this a bit more first,
assigning to me
[08:39:30] (CR) Joal: "@nuria: Right, the commit message you suggested would have been better, but original one has already been merged by ottomata :(" [analytics/refinery] - https://gerrit.wikimedia.org/r/347635 (owner: Joal)
[08:45:20] wow available space on hdfs is now ~1,7 PiB
[08:45:21] :D
[08:45:33] elukey: YAYYYYY !
[08:45:49] elukey: and available RAM for workers is now 2Tb :)
[08:46:06] * joal will boost some of the workers for text processing :-P
[08:46:11] joal: I am about to merge https://gerrit.wikimedia.org/r/#/c/347814/
[08:46:35] and will have to roll restart yarn nodemanagers and hdfs datanodes during the next couple of days
[08:46:37] elukey: commit message
[08:46:56] ??
[08:46:58] to have Xmx and Xmx settings --> Xms and Xmx ?
[08:47:16] uff yes you are right :)
[08:47:40] fixed thanks :)
[08:47:42] anyhowww
[08:47:45] elukey: except from reading the commit message, I don't even try to look at the code ;)
[08:47:55] elukey: no bother for rolling restart
[08:47:58] this will add Xms to datanodes and nodemanagers
[08:48:05] cool
[08:48:07] that hopefully will ease a bit GC
[08:49:31] Analytics-Kanban: Make sure oozie workflows sent e-mail if they fail - https://phabricator.wikimedia.org/T162742#3173163 (JAllemandou)
[08:49:34] Analytics-Kanban: Restart oozie jobs for email alerts correction - https://phabricator.wikimedia.org/T162715#3174436 (JAllemandou)
[08:54:01] elukey: https://ganglia.wikimedia.org/latest/?r=week&cs=&ce=&c=Analytics+cluster+eqiad&h=&tab=m&vn=&hide-hf=false&m=cpu_report&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name
[08:54:28] elukey: cluster is empty 10 minutes before a clock-hour -- That is so AWESOMEEEE !
[08:55:44] :)
[09:02:54] changes live on analytics1040
[09:43:02] elukey: I also think we could try https://github.com/Quantiply/grafana-plugins/tree/master/features/druid
[09:44:04] also elukey, I'd love to try https://github.com/metabase/metabase
[09:49:58] joal: nice!
[09:50:04] Just merged your change btw :)
[09:50:08] thanks
[09:50:14] elukey: Could you restart pivot?
[09:50:19] already done
[09:50:30] okeyyyy
[09:50:31] :0
[09:50:53] elukey: what would be an easy way to test metabase?
[09:51:29] I'd say labs..
[09:51:39] joal: another thing - are you currently using clickhouse?
[09:51:46] elukey: not currently no
[09:51:47] otherwise I'd like to shut it down
[09:51:50] please
[09:53:31] !log stop Clickhouse on druid100[123]
[09:53:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:53:56] elukey: Will you start some presto instead?
[09:53:58] :-P
[09:54:07] * joal hides far away
[09:54:10] ahhahah
[09:54:11] :)
[09:54:40] elukey: I'm more and more thinking of that, given for instance that metabase requests presto AND druid
[09:55:22] hopefully presto is debianized :
[09:55:22] :D
[09:56:47] elukey: I don't know ! from https://prestodb.io/docs/current/installation/deployment.html it seems to be a jar
[09:57:23] we can open a phab task to test it and maybe package it if we like it :)
[09:57:34] elukey: sounds awesome :)
[10:05:32] going afk for a bit (doctor + lunch!)
[10:05:33] ttl!
[10:30:07] taking a break a-team - later !
[12:03:51] joal: automatic yarn-node-manager restart with cumin
[12:03:51] cumin 'R:class = role::analytics_cluster::hadoop::worker and not analytics1030*' 'systemctl restart hadoop-yarn-nodemanager' -b 3 -s 10
[12:04:03] (analytics1030 is not reachable because of hw failures)
[12:04:18] -b == batch -s == seconds between each batch
[12:04:23] \o/
[12:05:27] so atm I am rolling restarting the yarn daemons to pick up Xms
[12:05:34] then I'll do the hdfs datanodes
[12:05:39] veeery slowly
[12:05:46] like 2 at a time every minute
[12:19:54] also updated https://grafana.wikimedia.org/dashboard/db/analytics-hadoop
[12:23:53] ok started the datanodes restart
[12:24:04] batches of two hosts with 60s pause between each of them
[12:53:33] Analytics-Tech-community-metrics, Regression: Only display organizations defined in Wikimedia's DB (disable assuming orgs via hostnames in email addresses) - https://phabricator.wikimedia.org/T161308#3175071 (Albertinisg) >>! In T161308#3172186, @Aklapper wrote: > @Albertinisg: Thanks for looking into th...
[12:54:11] elukey: This just seems almost too easy ;)
[12:54:42] all restarted and it seems with no impact
[12:55:53] elukey: I have not seen anything so far
[12:57:00] Analytics-Tech-community-metrics: Updated data in mediawiki-identities DB not deployed onto wikimedia.biterg.io? - https://phabricator.wikimedia.org/T157898#3175081 (Albertinisg) >>! In T157898#3172526, @Aklapper wrote: >>>! In T157898#3170220, @Albertinisg wrote: >>> I merged https://github.com/Bitergia/med...
[13:42:51] joal: do you run refinery scala unit tests in intellij?
[13:43:10] ottomata: I do
[13:47:08] help me?
:)
[13:47:53] ottomata: in meeting, after, for sure :)
[13:48:58] k
[14:10:07] ottomata: ready for yaaa
[14:10:22] yaa
[14:10:23] bc
[14:10:37] yup
[14:35:52] joal: http://stackoverflow.com/questions/4652095/why-does-the-scala-compiler-disallow-overloaded-methods-with-default-arguments
[15:08:23] Analytics-Kanban: Verfify MaxMind is updated regularly - https://phabricator.wikimedia.org/T162616#3175357 (Nuria) a:Milimetric>Ottomata
[15:15:05] Analytics-Kanban: Verfify MaxMind is updated regularly - https://phabricator.wikimedia.org/T162616#3175363 (Ottomata) > 1030 didn't See: T162046 > Isn't this regex wrong Totally, and this is so unused. I'll remove this view altogether. Dan, I think you're getting a little mixed up with what classes inclu...
[15:25:55] milimetric: it looks super good, but I have one little nit that is probably due to me not being in the target audience (that is people part of round 1)
[15:26:30] when you mention "Detail page" it takes a bit for the (dumb Luca) reader to figure out that you are talking about the Contributing page
[15:27:02] and a doubt arises - is a detail page contributing or reading or content?
[15:27:07] or all of them?
[15:27:29] the rest is very straightforward
[15:29:16] elukey: thanks very much, the Detail page is detail about whichever category you're in, so there's a Contributing Detail which uses the Blue color, a Content detail using the Yellow, and a Reading detail with Green. I was wondering about this too, maybe reinforcing it by underlining the selected category with its color on the top navigation. Have you
[15:29:17] clicked around the prototype?
https://analytics-prototype.wmflabs.org/
[15:29:47] I'll make this more obvious in the consultation page, but I'm curious if it makes sense in the prototype
[15:30:20] milimetric: oh yes I checked it, after a sec is super clear
[15:30:41] but only reading might be confusing in the beginning because my brain tried to look for "Detail" in the image
[15:34:35] ok, updated a little blurb under the Detail section, it says "The Detail page shows metric details for metrics in the Contributing, Reading, or Content categories. It uses color and category highlighting on the top navigation to indicate which category you are in. This is not reinforced too strongly because the metric selector and topic explorer both allow
[15:34:36] cross-category navigation."
[15:35:54] milimetric: i also added a link to talk page , as i did not know those existed until i got this job
[15:36:35] Oh yeah, good call Nuria, me neither but now I'm used to clicking Discussion
[15:37:10] k, making the list checking it twice now
[15:57:35] a-team: meeting in batcave?
[15:57:49] yeah
[16:00:15] (CR) Mforns: [C: 1] "LGTM" (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/347653 (https://phabricator.wikimedia.org/T159727) (owner: Joal)
[16:23:10] joal: scala q if you have a sec
[16:23:18] ottomata: in meeting again
[16:23:55] k
[16:56:53] (PS3) Milimetric: [WIP] Design thoughts for AQS edit history API [analytics/aqs] - https://gerrit.wikimedia.org/r/347637
[17:06:25] (PS4) Milimetric: [WIP] Design thoughts for AQS edit history API [analytics/aqs] - https://gerrit.wikimedia.org/r/347637
[17:07:19] * elukey goes afk!
[17:07:19] o/
[17:07:46] Analytics-Kanban: Design document for wikistast prototype backend - https://phabricator.wikimedia.org/T162817#3175684 (Nuria)
[17:16:49] hey ottomata, scal ?
[17:19:23] ya
[17:19:29] cave?
[17:19:30] bc?
[17:24:39] Analytics-Kanban: Design document for wikistats prototype backend - https://phabricator.wikimedia.org/T162817#3175786 (Nuria)
[17:32:59] (PS5) Ottomata: [WIP] Spark + JSON -> Hive [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346291 (https://phabricator.wikimedia.org/T161924) (owner: Joal)
[17:35:50] (CR) jerkins-bot: [V: -1] [WIP] Spark + JSON -> Hive [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346291 (https://phabricator.wikimedia.org/T161924) (owner: Joal)
[17:41:55] Analytics-Kanban: Design document for wikistats prototype backend - https://phabricator.wikimedia.org/T162817#3175863 (Nuria) Ping @milimetric please take a look, i started design document with some notes from meeting today
[17:44:53] halfak: Would you have a minute for me?
[17:48:45] halfak is not around, I'm gone !
[17:48:49] later a-team !
[17:48:57] bye joal !
[17:51:32] ebernhardson: yt?
[17:57:47] nuria: yup
[17:58:18] * ebernhardson guesses this is about the weird cirrus-specific UDF
[18:02:40] laters!
[18:28:46] ebernhardson: nahm, i wanted to ask where is the code you guys are developing for your machine learned added model for search results
[18:32:16] (CR) Nuria: [WIP] Update banner monthly job to reuse index (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/347653 (https://phabricator.wikimedia.org/T159727) (owner: Joal)
[18:38:18] nuria: well, i have a POC that works developed in December, but it was all quickly written, no tests and no review
[18:39:20] nuria: so right now i'm writing it a bit more carefully, adding tests, and going through review.
I have patches for generating click logs (merging webrequests and cirrus logs), sampling those click logs to # of queries per wiki, and using a statistical model for generating relevance labels from sessions on those clicks
[18:40:26] next things to work out are feature generation (shipping queries from analytics to prod to generate queries in elasticsearch), merging those features with the statistical labels, formatting all that for a training library, performing hyperparameter optimization with the training library, and then converting the library model output into something the elasticsearch plugin can use
[18:41:08] so the plans are fairly well laid out, some of it has patches, very little is merged
[18:43:57] the goal is to have most of this plumbing worked out by the end of may, so by june we can focus more directly on feature engineering
[18:53:36] ebernhardson: i see, still much in flux
[18:54:21] ebernhardson: where do you think the feature computation will happen? the cluster? (seems like it would be well suited, not sure if there are other plans)
[18:54:34] nuria: feature computation will happen inside elasticsearch
[18:55:04] nuria: for the most part at least, we may at some point come up with document specific features, like ores wp10, that are calculated externally and provided at index time
[18:55:11] ebernhardson: ohhh, ok i guess i know nothing about it cause i did not think that was possible
[18:55:38] ebernhardson:but i guess is like "index" calculations just a different pipeline
[18:56:09] nuria: most of the features are the same kinds of things elasticsearch already calculates, like term frequencies, bm25, etc. Some things like one-hot encoding of popular templates/categories can be done at indexing time so elasticsearch would have individual fields with booleans
[18:57:52] ebernhardson: i see, and the input for features comes from your data harvesting on cluster, is taht so?
[18:57:54] *that
[18:59:17] nuria: right.
So for example currently we generate lots of individual features, like bm25 query vs title, bm25 query vs title.plain, etc and then combine them in a fairly straightforward weighted sum. Switching to machine learning as the first step mostly takes all these things we already calculate, but letting a machine learning algo decide how to combine them into ranking based on user behaviour statistics that are done in the a
[19:00:07] and having machine learning figure out how to combine them opens us up to being able to add more features, currently the weighted sums are hand tuned equations with 20+ variables which is incredibly hard to adjust by hand
[19:00:18] k
[19:00:28] i think i got the big picture
[19:01:51] the analytics cluster mostly comes in by: collecting web requests, collecting search logs, merging them into click logs, running statistics to label the click logs, and training the ML algo
[19:09:23] Analytics-Tech-community-metrics, Developer-Relations, Differential: Make MetricsGrimoire/korma support gathering Code Review statistics from Phabricator's Differential - https://phabricator.wikimedia.org/T118753#3176185 (Aklapper) p:Normal>Low
[19:12:35] Analytics-Kanban: Check abnormal pageviews for XHamster - https://phabricator.wikimedia.org/T158071#3176191 (MusikAnimal) >>! In T158071#3173884, @Nuria wrote: > Not sure what was the interval you looked at but number of cities seems pretty spread out. For example see Germany for cities with motre than 100 v...
[19:22:42] Analytics-Kanban: Check abnormal pageviews for XHamster - https://phabricator.wikimedia.org/T158071#3176204 (Nuria) >Maybe it's wrong to assume any page in top 20 or so should have a high unique IP count, but it seems crazy to me that on a single day, only ~3500 unique IPs viewed XHamster, yet the total page...
[19:33:02] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017): Have "Last Attracted Developers" information for Gerrit (already exists for Git) - https://phabricator.wikimedia.org/T151161#3176235 (Aklapper) Very welcome surprise by Bitergia (thanks folks!): This should be possible soon. Admins can...
[19:42:49] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017): Have "Last Attracted Developers" information for Gerrit (already exists for Git) - https://phabricator.wikimedia.org/T151161#3176245 (Aklapper) Config: For Wikimedia, we can kill the "New Authors per First Project"/`C_Gerrit_Demo_Proje...
[19:44:30] Analytics-Kanban: Kill limn1 - https://phabricator.wikimedia.org/T146308#3176248 (Nuria)
[19:44:32] Analytics-Kanban: Document and publicize AQS legacy page counts endpoint - https://phabricator.wikimedia.org/T159959#3176247 (Nuria) Open>Resolved
[19:48:29] Analytics-Tech-community-metrics: Git's "Last Attracted Developers" lists established developers and developers without a First Commit Date - https://phabricator.wikimedia.org/T161309#3176255 (Aklapper) p:Normal>Low Bitergia folks said that this is a known bug (across all installations) and that they...
[20:19:23] (PS6) Ottomata: [WIP] Spark + JSON -> Hive [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346291 (https://phabricator.wikimedia.org/T161924) (owner: Joal)
[20:21:48] (CR) jerkins-bot: [V: -1] [WIP] Spark + JSON -> Hive [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346291 (https://phabricator.wikimedia.org/T161924) (owner: Joal)
[20:23:46] (PS7) Ottomata: [WIP] Spark + JSON -> Hive [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346291 (https://phabricator.wikimedia.org/T161924) (owner: Joal)
[20:25:05] (CR) jerkins-bot: [V: -1] [WIP] Spark + JSON -> Hive [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346291 (https://phabricator.wikimedia.org/T161924) (owner: Joal)
[21:00:11] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017), Regression: Only display organizations defined in Wikimedia's DB (disable assuming orgs via hostnames in email addresses) - https://phabricator.wikimedia.org/T161308#3176513 (Aklapper) Awesome! For the records (and for anyone who...
[21:03:59] Analytics-Tech-community-metrics: Provide equivalent of "SCR: Reviewers per month" in Kibana - https://phabricator.wikimedia.org/T151559#3176525 (Aklapper)
[21:04:11] Analytics-Tech-community-metrics: Provide equivalent of "SCR: Reviewers per month" in Kibana - https://phabricator.wikimedia.org/T151559#2821054 (Aklapper)
[21:04:13] Analytics-Tech-community-metrics: Provide equivalent of "SCR: Reviewers per month" in Kibana - https://phabricator.wikimedia.org/T151559#2821054 (Aklapper) p:Normal>Low > Leaves us with people performing reviews... We quickly discussed this in today's meeting and given other priorities this will like...
[21:30:32] Analytics-Tech-community-metrics: Updated data in mediawiki-identities DB not deployed onto wikimedia.biterg.io?
- https://phabricator.wikimedia.org/T157898#3176608 (Aklapper)
[21:30:34] Analytics-Tech-community-metrics: On the "Git" dashboard, filtering on one organization still lists authors who are with another organization - https://phabricator.wikimedia.org/T157709#3176609 (Aklapper)
[21:30:36] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017), Regression: Only display organizations defined in Wikimedia's DB (disable assuming orgs via hostnames in email addresses) - https://phabricator.wikimedia.org/T161308#3176607 (Aklapper)
[22:05:55] Analytics-Tech-community-metrics, Regression: Git repo blacklist config not applied on wikimedia.biterg.io - https://phabricator.wikimedia.org/T146135#3176714 (Aklapper) p:High>Normal **** 1) A trivial workaround exists. See next comment. 2) Having the blacklist //not// applied has an adva...
[22:06:39] Analytics-Tech-community-metrics, Regression: Git repo blacklist config not applied on wikimedia.biterg.io - https://phabricator.wikimedia.org/T146135#3176718 (Aklapper) == How to get the blacklist applied manually == **Git:** Go to https://wikimedia.biterg.io/app/kibana?#/dashboard/Git and enter the fo...
[22:51:23] Analytics-Tech-community-metrics, Regression: Git repo blacklist config not applied on wikimedia.biterg.io - https://phabricator.wikimedia.org/T146135#3176891 (Aklapper)
[22:55:44] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017): Updated data in mediawiki-identities DB not deployed onto wikimedia.biterg.io? - https://phabricator.wikimedia.org/T157898#3176902 (Aklapper) According to Bitergia, T157898 and T161235 (and to some extent T157709 though half of that is...
[22:55:46] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017): https://wikimedia.biterg.io shows 2017 contributors who are not listed in mediawiki-identities/wikimedia-affiliations.json - https://phabricator.wikimedia.org/T161235#3125956 (Aklapper) According to Bitergia, T157898 and T161235 (and t...
[22:55:48] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017): On the "Git" dashboard, filtering on one organization still lists authors who are with another organization - https://phabricator.wikimedia.org/T157709#3013972 (Aklapper) According to Bitergia, T157898 and T161235 (and to some extent T...
[23:15:38] Analytics-Tech-community-metrics, Developer-Relations, Differential: Make MetricsGrimoire/korma support gathering Code Review statistics from Phabricator's Differential - https://phabricator.wikimedia.org/T118753#3176994 (Aklapper) a:Lcanasdiaz>None Asked WMF's RelEng team for [[ https://www....