[06:57:36] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Improve Wikistats2 map zoom - https://phabricator.wikimedia.org/T198867 (sahil505)
[06:57:38] Analytics-Kanban, Outreach-Programs-Projects, Google-Summer-of-Code (2018): [Analytics] Improvements to Wikistats2 front-end - https://phabricator.wikimedia.org/T189210 (sahil505)
[09:12:29] Hi bearloga, I have goodies for you when you're up : https://gist.github.com/jobar/55394f0fdc5cc90875afc5da8826b573
[10:13:21] Analytics: Singapore does not appear on wikistats map - https://phabricator.wikimedia.org/T199571 (fdans) Yeah, there are a bunch of countries that do not appear in the map due to the topojson's resolution not being enough. This includes among others Singapore, Liechtenstein, Andorra, and a lot of island cou...
[10:31:54] Quarry, Cloud-Services: Enable ores_classification tables on labsDB - https://phabricator.wikimedia.org/T199987 (Halfak)
[10:31:56] Analytics, MinervaNeue, Readers-Web-Backlog, Design, Readers-Web-Kanbanana-Board: [Spike 8hrs] Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (ovasileva)
[10:36:12] * elukey lunch!
[10:59:26] joal: you about sir?
[11:16:33] Hi Seddon
[11:16:44] How may I help?
[11:21:19] Quarry, Cloud-Services: Enable ores_classification tables on labsDB - https://phabricator.wikimedia.org/T199987 (chasemp) After a bit of poking and feeling like things /should/ be working if the right scripts were run I issued: | ores_classification | | ores_model | > maintain-views...
[11:41:54] @joal I was wondering if I could possibly have access to Hue
[11:47:15] Seddon: I see no reason why not :)
[11:48:26] joal: do you need anything from myself?
[11:49:58] Seddon: do you have an access to stat1004 for instance?
[11:50:22] If so, it means you are in the correct LDAP group, and it's just about adding you in hue
[11:50:33] If not, I think it requires some more LDAP updates
[11:51:21] Seddon: elukey will be a lot more knowledgeable than me on the topic
[11:51:31] He's gone for lunch, but should be back soon
[11:51:48] Also Seddon, there
[11:52:07] are a couple important things when querying hive, and in particular partitions
[11:52:17] joal: I'm in analytics-privatedata-users
[11:52:35] Seddon: Then you're good to go only with hue update
[11:53:50] Seddon: Can you remind me your ldap username (seddon ?)
[11:54:50] Seddon: about using hive for querying, in case you've not already read that: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Queries
[11:55:51] joal: Yep, Seddon :) https://github.com/wikimedia/puppet/blob/production/modules/admin/data/data.yaml#L2770
[11:56:44] Seddon: You know puppet code base better than I do ;)
[11:57:17] Seddon: sync on hue done, you can try to login
[11:58:48] joal: Done!
[11:58:56] joal: Working well
[11:59:05] Great :)
[11:59:17] Seddon: Don't hesitate to ask if you have questions :)
[11:59:28] joal: Will do :)
[12:44:40] ah nice ops work done while I was away :)
[12:44:45] thanks joal !
[12:45:21] elukey: Seddon did the hard part of knowing if he was in the right group, syncing hue was easy ;)
[12:45:31] o/ elukey :)
[12:58:56] Quarry, Cloud-Services: Enable ores_classification tables on labsDB - https://phabricator.wikimedia.org/T199987 (Halfak) Open>Resolved a:Halfak Thank you!
[13:53:44] joal: fyi I am raising the Xmx/Xms settings for jumbo to 2g now (including a rolling restart)
[13:53:49] will start with jumbo1001
[13:54:41] elukey: ack!
[14:04:57] joal: interesting https://grafana.wikimedia.org/dashboard/db/kafka?refresh=5m&panelId=43&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-kafka_cluster=jumbo-eqiad&var-cluster=kafka_jumbo&var-kafka_broker=All&from=now-1h&to=now
[14:05:33] young gen collection counters went down
[14:08:30] elukey: in interview, will be back
[14:33:29] indeed elukey - Seems that some space makes kafka GC happier :)
[14:43:52] Analytics: Generate pagecounts-ez data back to 2008 - https://phabricator.wikimedia.org/T188041 (Milimetric) @CristianCantoro where do you have access? The files have to end up on terbium at some point, but I can move them to the right place if you put them anywhere.
[14:51:19] Analytics: Turn off old geowiki jobs - https://phabricator.wikimedia.org/T190059 (Milimetric) @fdans no the jobs are still running, ping me and we can go around together making sure everything's off and cleaned up.
[14:53:32] Analytics, Mobile-Content-Service, Reading-Infrastructure-Team-Backlog, Wikipedia-iOS-App-Backlog: As an end-user I shouldn't see non-articles in the list of top articles - https://phabricator.wikimedia.org/T124082 (Milimetric) We still want to improve our endpoints in the future, allowing users...
[15:00:08] (CR) Mforns: [V: 2 C: 2] "LGTM! Nice fix!" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/446610 (https://phabricator.wikimedia.org/T198867) (owner: Sahil505)
[15:03:23] fdans: standup
[15:03:30] sorry
[15:03:31] nvm
[15:05:17] Analytics, Analytics-EventLogging, Analytics-Kanban: EventLogging sanitization - https://phabricator.wikimedia.org/T199898 (Milimetric) p:Triage>Normal
[15:09:05] Analytics, Analytics-Kanban, Analytics-Wikistats: Improvements to Wikistats2 chart popups - https://phabricator.wikimedia.org/T192416 (sahil505)
[15:09:55] Analytics, Analytics-Kanban, Analytics-Wikistats: Improvements to Wikistats2 chart popups - https://phabricator.wikimedia.org/T192416 (sahil505)
[15:29:02] Analytics, Analytics-EventLogging, Wikimedia-log-errors: EventLogging-based extensions cause errors on test2.wikipedia.org - https://phabricator.wikimedia.org/T196309 (Milimetric) This is one of the things that bugs Andrew about EL, so I'm putting it in the list of things to work on for Modern Event...
[15:29:41] Analytics, Analytics-EventLogging, Wikimedia-log-errors: EventLogging-based extensions cause errors on test2.wikipedia.org - https://phabricator.wikimedia.org/T196309 (Milimetric) p:Triage>Normal
[15:32:34] Analytics, Analytics-Wikistats: Add annotation explaining this spike - https://phabricator.wikimedia.org/T200020 (Milimetric) p:Triage>Normal
[15:33:35] Analytics, Datasets-General-or-Unknown: Add checksums pageviews dataset - https://phabricator.wikimedia.org/T199461 (Milimetric) p:Triage>Normal
[15:33:49] Analytics, Datasets-General-or-Unknown: Add checksums pageviews dataset - https://phabricator.wikimedia.org/T199461 (Milimetric) p:Normal>Triage
[15:35:04] Analytics, Datasets-General-or-Unknown: Add checksums pageviews dataset - https://phabricator.wikimedia.org/T199461 (Milimetric) p:Triage>Low
[15:35:29] Analytics, Analytics-EventLogging: [EL sanitization] Retroactively sanitize (including hash and salt appInstallId fields) data in the events database - https://phabricator.wikimedia.org/T199902 (Milimetric) p:Triage>Normal
[15:35:31] Analytics, Analytics-EventLogging: [EL sanitization] Retroactively sanitize (including hash and salt appInstallId fields) data in the events database - https://phabricator.wikimedia.org/T199902 (Milimetric) p:Normal>High
[15:35:56] Analytics, Analytics-EventLogging: [EL sanitization] Store the old salt for 2 extra weeks - https://phabricator.wikimedia.org/T199900 (Milimetric) p:Triage>Normal
[15:36:10] Analytics, Analytics-EventLogging: [EL Sanitization] Set up salt creation and rotation - https://phabricator.wikimedia.org/T199899 (Milimetric) p:Triage>Normal
[15:36:12] Analytics, Analytics-EventLogging: [EL Sanitization] Set up salt creation and rotation - https://phabricator.wikimedia.org/T199899 (Milimetric) p:Normal>High
[15:36:39] Analytics: Table view of timely results in wikistats 2 should be ordered in time descending - https://phabricator.wikimedia.org/T199693 (Milimetric) p:Triage>High
[15:37:41] Analytics: Singapore does not appear on wikistats map - https://phabricator.wikimedia.org/T199571 (Milimetric) p:Triage>Normal
[15:38:31] Analytics: Increase topojson resolution: Singapore does not appear on wikistats map - https://phabricator.wikimedia.org/T199571 (Milimetric)
[15:39:13] Analytics, Analytics-Kanban, EventBus, Operations, and 5 others: EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813 (Milimetric)
[15:42:31] Analytics, Contributors-Analysis, Product-Analytics: Decommision edit analysis dashboard - https://phabricator.wikimedia.org/T199340 (Milimetric) p:Normal>High
[15:42:34] Analytics, Contributors-Analysis, Product-Analytics: Decommision edit analysis dashboard - https://phabricator.wikimedia.org/T199340 (Milimetric) a:Milimetric
[15:45:55] Analytics, Analytics-General-or-Unknown, Trust-and-Safety, Wikimedia-Extension-setup: enable Piwik on ru.wikimedia.org - https://phabricator.wikimedia.org/T91963 (Milimetric) Open>declined This looks outdated, please open with more info if needed.
[16:02:54] Analytics: Check EventBus doesn't need broker sync to enqueue - https://phabricator.wikimedia.org/T200025 (Milimetric)
[16:07:36] k elukey, I read it, super clear as your docs always are
[16:07:48] thanks!
[16:07:52] I was going to say that maybe mirror maker was making the problem worse and ask if that's set up in labs
[16:08:01] but then I saw that it didn't replicate anything
[16:08:05] cause the topics were empty
[16:08:22] yeah exactly, it saved us
[16:08:24] that was my only new thought
[16:09:28] cool, I'm gonna bike over to lunch and I'll let you know if I think of anything else
[16:11:50] ttl!
[16:17:19] Analytics: Count the number of video plays - https://phabricator.wikimedia.org/T198628 (Milimetric) Yes, almost. Clicking "play" and also deciding what it means that someone "viewed" the video. Like, does watching 1/2 of it count? What about skipping through it? Those questions have standard answers on s...
[16:19:00] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Make the colors used the line charts in Wikistats 2 more easy to recognize. - https://phabricator.wikimedia.org/T183184 (Milimetric) I like the new colors, nice job @sahil505
[16:20:33] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Make banner impression counts available somewhere public - https://phabricator.wikimedia.org/T115042 (Milimetric) @DStrine: ok, let us know when you have a solid plan for it and how it fits in with other work.
[16:24:07] Analytics: Piwik user account for Wikimedia.org.il - https://phabricator.wikimedia.org/T199046 (Milimetric) I see, @Itzike, it does sound like the server we have might be the best place. It might not be able to support the traffic though, let me do some quick checking to see. Moving this task to an actiona...
[16:24:32] Analytics: Piwik user account for Wikimedia.org.il - https://phabricator.wikimedia.org/T199046 (Milimetric) p:Triage>Normal
[16:24:45] Analytics: Piwik user account for Wikimedia.org.il - https://phabricator.wikimedia.org/T199046 (Milimetric) a:Milimetric
[16:30:00] Howdy, folks! I just submitted an indexing task for Druid using this ingestion schema: https://gist.github.com/bearloga/c311cdcd3a61f4435b4b006cf119c30e and the data at hdfs://analytics-hadoop/tmp/wmf_gsc.csv; can someone please verify that Druid is now ingesting the CSV or if it error'd?
[16:33:47] (I submitted it directly over http with curl and got a 200 response code so at least it got accepted.)
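Note on the submission above: a 200 only says the task was accepted, not that indexing succeeded, so the task state has to be checked separately. A minimal Python sketch of that, assuming the usual Druid overlord task endpoints and response shape; the overlord address and the helper names are hypothetical placeholders, and nothing is actually sent:

```python
import json
import urllib.request

# Hypothetical overlord address (assumption); replace with the real host and port.
OVERLORD = "http://druid-overlord.example:8090"


def build_submit_request(spec: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that submits an indexing task."""
    return urllib.request.Request(
        OVERLORD + "/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_url(task_id: str) -> str:
    """URL to poll for the task state once the overlord has returned a task id."""
    return f"{OVERLORD}/druid/indexer/v1/task/{task_id}/status"


def task_state(status_response: dict) -> str:
    """Pull the state (e.g. RUNNING, SUCCESS, FAILED) out of a status response.

    Assumes the response nests the state under "status" -> "status".
    """
    return status_response["status"]["status"]
```

Sending the request with urllib.request.urlopen and then polling status_url until task_state stops returning RUNNING would tell accepted apart from ingested.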
[16:48:11] Analytics, Analytics-Kanban, EventBus, Operations, and 5 others: EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813 (mobrovac) A little recap and current status. We tried obtaining heap dumps from memory-fat processes in codfw, but these wer...
[16:55:50] Analytics, Research: Provide data dumps in the Analytics Data Lake - https://phabricator.wikimedia.org/T186559 (EBernhardson) Parsing is one of the more expensive things in mediawiki. Due to the expense the ParserOutput is serialized into a multi-layer cache (memcached, then mysql) so for the most part y...
[17:02:58] I also have a JSON-formatted dataset and the appropriate ingestion spec on standby ready to go in case the CSV doesn't work.
[17:04:54] joal milimetric: would either of you have a moment today to help me with ^? (please and thank you) Nuria was helping me yesterday but isn't available today to continue
[17:49:28] hi bearloga! I am not a big expert but I can try to help
[17:50:00] so I am seeing in druid1001's middle manager an indexation for index_hadoop_gsc_alletc..
[17:50:33] that failed for Caused by: java.lang.IllegalStateException: Optional.get() cannot be called on an absent value
[17:50:49] that I think is indicating a configuration error in the json
[17:51:30] did you use as template one of our druid json indexation spec in the refinery?
[17:52:04] elukey: ah, thank you. and yup, I did. I'm going to make some changes and try again in a couple of minutes.
[17:52:19] not sure if needed but "type" : "hadoop", seems missing
[17:52:24] in ioConfig
[17:54:52] for the datasource name, if this is a test we have a convention of using test_ as prefix
[18:03:08] elukey: I'll prefix with "test_" in this next attempt. would you be able to delete the existing "gsc_all" datasource? I'm seeing it in turnilo right now and it's got 1 observation from yesterday's attempts so we should just get rid of it
[18:03:42] bearloga: this is the part that I don't know, we'll need to wait for master joal :)
[18:19:27] elukey: just submitted the indexing for "test_gsc_all" datasource. do you see anything good or bad in the logs?
[18:20:47] elukey: using this ingestion spec https://www.irccloud.com/pastebin/tsmrJ2jt/druid-json-spec_country-all.json
[18:24:09] checking
[18:25:00] seems failed again, Caused by: java.lang.IllegalStateException: Optional.get() cannot be called on an absent value
[18:27:34] elukey: crap crackers :( do you see anything wrong with the spec I posted? because it looks correct to me
[18:29:41] elukey: maybe it's an issue with the json-formatted data? I'm going to submit an indexing job that ingests the csv-formatted data just in case so at least we can narrow it down to being a problem with the spec
[18:30:25] elukey: submitted
[18:32:34] bearloga: mmm qq - why is the intervals array not specified in dataSchema ?
[18:32:49] it seems in inputSpec
[18:32:53] is it correct?
[18:34:44] elukey: oh you're right, that is in the wrong spot
[18:35:13] elukey: thanks for noticing! alright, let me fix and try again :)
[18:38:07] elukey: submitted. hopefully this is the one :D
[18:38:32] 2018-07-19T18:38:08,528 INFO org.apache.hadoop.mapreduce.Job: map 0% reduce 0%
[18:38:36] much better :)
[18:38:52] it is indexing now
[18:39:45] YES YES YES! elukey thank you very much!
[18:41:04] should be in druid now! I am seeing success
[18:42:18] elukey: AWESOME!!! and yup, there it is in turnilo.
[18:43:53] elukey: do you know if I need to add it manually to Superset's list of datasources?
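Note on the spec fixes above: the points raised in the conversation (no "type": "hadoop" in ioConfig, intervals sitting in inputSpec instead of dataSchema, and the test_ prefix convention) can be checked before submitting. A rough Python sketch, assuming the usual index_hadoop layout where intervals live under dataSchema.granularitySpec; check_hadoop_spec is a hypothetical helper, not part of Druid or the refinery, and the spec fragment and its interval are made up for illustration:

```python
def check_hadoop_spec(task: dict, is_test: bool = False) -> list:
    """Return a list of human-readable problems found in an index_hadoop task."""
    problems = []
    spec = task.get("spec", {})
    data_schema = spec.get("dataSchema", {})
    io_config = spec.get("ioConfig", {})

    if io_config.get("type") != "hadoop":
        problems.append('ioConfig should declare "type": "hadoop"')
    if "intervals" in io_config.get("inputSpec", {}):
        problems.append("intervals are in ioConfig.inputSpec, the wrong spot")
    if not data_schema.get("granularitySpec", {}).get("intervals"):
        problems.append("no intervals under dataSchema.granularitySpec")
    if is_test and not data_schema.get("dataSource", "").startswith("test_"):
        problems.append("test datasources should be prefixed with test_")
    return problems


# Illustrative fragment reproducing the mistakes discussed (interval is invented).
broken = {
    "type": "index_hadoop",
    "spec": {
        "dataSchema": {"dataSource": "gsc_all", "granularitySpec": {"type": "uniform"}},
        "ioConfig": {
            "inputSpec": {
                "type": "static",
                "paths": "hdfs://analytics-hadoop/tmp/wmf_gsc.csv",
                "intervals": ["2018-07-01/2018-07-19"],
            }
        },
    },
}

for problem in check_hadoop_spec(broken, is_test=True):
    print(problem)
```

This only catches the specific slips seen here; it does not validate the parser, metrics, or tuning sections of a spec.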
[18:46:44] IIRC in superset we already added druid as source
[18:46:49] so it should be there (in theory)
[18:53:30] elukey: ah yup, just needed to press "scan for new datasources" :) thanks so much for helping me through this!
[18:54:59] super happy that I was helpful :)
[19:23:28] sorry I missed your ping bearloga, I'll delete the gsc_all datasource
[19:23:33] thanks for covering Luca
[19:23:36] milimetric: thanks!
[19:25:26] the docs are here, but it's ok if you leave it to us to do: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid#Preparation_for_deletion
[19:27:38] k, it looks deleted but it'll still show up in the UI until it refreshes datasources: https://turnilo.wikimedia.org/#gsc_all
[19:36:39] Analytics, Research: Provide data dumps in the Analytics Data Lake - https://phabricator.wikimedia.org/T186559 (Milimetric) It may be monstrous to run, but that may be something we need to do to extract certain metrics historically, like how categories change. Over this quarter and the next we'll look i...
[19:40:46] ah milimetric didn't know about https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid#Preparation_for_deletion, nice!
[19:40:51] thanks :)
[19:51:50] Analytics: Table view of timely results in wikistats 2 should be ordered in time descending - https://phabricator.wikimedia.org/T199693 (sahil505) I feel that this option should be given to the user, as in there should be a toggle button for changing the order from ascending to descending or vice-versa. This...
[20:03:59] org.apache.spark.SparkException: Task not serializable :'((((((
[20:24:50] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Make banner impression counts available somewhere public - https://phabricator.wikimedia.org/T115042 (LilyOfTheWest) @Milimetric Hashing out the first step of what's needed from the community organizer end is relatively straightforwa...
[20:42:46] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Make banner impression counts available somewhere public - https://phabricator.wikimedia.org/T115042 (DStrine) There are a bunch of privacy issues. This requires a lengthy research project and some level of community involvement. @Js...
[20:50:19] Analytics: Table view of timely results in wikistats 2 should be ordered in time descending - https://phabricator.wikimedia.org/T199693 (mforns) Makes sense to me!
[22:01:06] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Make banner impression counts available somewhere public - https://phabricator.wikimedia.org/T115042 (LilyOfTheWest) @DStrine what are the timelines you're looking into for setting aside time for this? Knowing how long you expect to...