[03:46:58] Analytics: third party domain data is getting refined - https://phabricator.wikimedia.org/T228557 (Nuria) Unit tests work fine so there must be something about the way this change is applied to the df that makes it not work, pinging @Ottomata to work on this tomorrow
[08:38:30] Analytics, Analytics-Kanban: Frequent/Long GC old gen collections for an-master100[1,2] - https://phabricator.wikimedia.org/T228620 (elukey) p:Triage→High
[08:38:34] sigh
[08:38:50] Analytics, Analytics-Kanban: Frequent/Long GC old gen collections for HDFS namenodes on an-master100[1,2] - https://phabricator.wikimedia.org/T228620 (elukey)
[08:39:07] Analytics, Analytics-Kanban: Frequent/Long GC old gen collections for HDFS namenodes on an-master100[1,2] - https://phabricator.wikimedia.org/T228620 (elukey)
[08:51:34] Analytics, Analytics-Kanban: Frequent/Long GC old gen collections for HDFS namenodes on an-master100[1,2] - https://phabricator.wikimedia.org/T228620 (elukey) Some useful links: * https://yarn.wikimedia.org/cluster/apps/RUNNING doesn't show anything running since the 11/12th * https://tools.wmflabs.org/...
[09:17:53] Analytics, Analytics-Kanban: Frequent/Long GC old gen collections for HDFS namenodes on an-master100[1,2] - https://phabricator.wikimedia.org/T228620 (elukey) Found the explanation, and added a useful graph to the Hadoop dashboard: https://grafana.wikimedia.org/d/000000585/hadoop?panelId=87&fullscreen&o...
[09:34:56] just restarted the namenode on an-master1002 (the standby) for --^
[09:35:05] let's see if it fixes the issue for the moment
[09:53:34] seems so, restarting also the other namenode
[10:01:09] done!
[10:22:17] Analytics, Analytics-Kanban: Frequent/Long GC old gen collections for HDFS namenodes on an-master100[1,2] - https://phabricator.wikimedia.org/T228620 (elukey) The issue seems fixed!
I'd say that the last step is to create an alarm for https://grafana.wikimedia.org/d/000000585/hadoop?panelId=87&fullscreen...
[10:32:07] * elukey lunch!!
[11:18:19] (PS5) Fdans: Add UDF to get wiki project from referer string [analytics/refinery/source] - https://gerrit.wikimedia.org/r/523903 (https://phabricator.wikimedia.org/T228151)
[11:27:33] (PS4) Fdans: Add file extension and media classification to mediacounts job [analytics/refinery] - https://gerrit.wikimedia.org/r/522390 (https://phabricator.wikimedia.org/T225911)
[13:12:58] (PS5) Fdans: [wip]Add file extension and media classification to mediacounts job [analytics/refinery] - https://gerrit.wikimedia.org/r/522390 (https://phabricator.wikimedia.org/T225911)
[13:34:37] Analytics, Analytics-Kanban: Frequent/Long GC old gen collections for HDFS namenodes on an-master100[1,2] - https://phabricator.wikimedia.org/T228620 (elukey) Created https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration#HDFS_Namenode_Heap_settings
[13:45:25] Analytics, Analytics-Kanban: Refine JsonSchemaLoader should use JsonParser instead of YAMLParser to load JSON data - https://phabricator.wikimedia.org/T227484 (Ottomata) I only backfilled two recent hours that failed for this data.
[13:47:59] Analytics: third party domain data is getting refined - https://phabricator.wikimedia.org/T228557 (Ottomata) Hm, ok....
[14:04:21] hey a-team, can I ask a question about turnilo?
[14:04:53] dsaez: of course!
[14:05:01] last information about pageviews there is from June 30th, is that normal?
[14:07:09] dsaez: no, that's not normal, I'm checking what's going on, thanks for letting us know
[14:07:28] ok!
thanks milimetric
[14:08:14] so weird, coordinator's reporting success throughout July: https://hue.wikimedia.org/oozie/list_oozie_coordinator/0064444-190417151359684-oozie-oozi-C/
[14:15:59] milimetric: I can see segments for pageview_hourly up to today
[14:16:19] dsaez: did you use pageview hourly or daily?
[14:16:32] daily doesn't show data beyond June 30
[14:16:41] daily
[14:17:01] hourly shows it fine
[14:18:03] I recall something that joseph explained to me in the past, that pageviews daily is updated month by month
[14:18:17] meanwhile if you want "fresh" data you use hourly
[14:18:32] because IIRC daily is composed by monthly segments
[14:18:37] but I could be super wrong
[14:19:35] (so the monthly segments are created when the month is over, hence lagging)
[14:19:57] I am checking http://localhost:8081/#/datasources/pageviews_daily (the druid1001's coordinator uo)
[14:20:00] ui
[14:20:27] milimetric: --^
[14:20:34] hm, that would make sense except there's a daily coordinator
[14:20:47] for example, this was yesterday's run: https://hue.wikimedia.org/oozie/list_oozie_workflow/0006866-190715143115257-oozie-oozi-W/?coordinator_job_id=0064444-190417151359684-oozie-oozi-C
[14:21:11] I was also looking at the pageviews_daily datasource
[14:21:12] that is IIRC to aggregate hourly by day
[14:21:25] ok, I'll check the code
[14:21:29] this is confusing :)
[14:21:47] hahaha yes
[14:21:57] so in the druid coord ui, I can see
[14:22:16] 1) pageviews_hourly, that has segments for all the hours of the 22nd (today) but daily segments for older days
[14:22:25] you're right: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/pageview/druid/monthly/coordinator.properties#L61
[14:22:26] 2) pageviews_daily, that has monthly segments,
[14:22:30] monthly updates daily
[14:22:33] daily updates hourly
[14:22:35] hourly updates....
[14:22:38] lol
[14:22:41] ahahhaha
[14:22:46] inception
[14:22:59] also hourly...
[14:23:17] this doesn't make sense, we have to rename these things :)
[14:24:14] Analytics, Tool-Pageviews: Statistics for views of individual Wikimedia images - https://phabricator.wikimedia.org/T210313 (fdans) @MusikAnimal @Doc_James @Tgr the following are the endpoints we're planning to roll out from the webrequest-based data we have currently: https://wikitech.wikimedia.org/wiki...
[14:24:18] dsaez: so yes, it's "normal" in that the "daily" datasource in Druid updates monthly. If you need the most recent data, take a look at the pageview_hourly datasource and you can aggregate daily there
[14:24:54] oook, thanks!
[14:27:18] PROBLEM - yarn.wikimedia.org HTTPS on analytics-tool1004 is CRITICAL: connect to address 10.64.36.116 and port 443: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster
[14:27:45] this is expected!
[14:28:01] sorry but the nginx package for buster needs a rebuild etc...
[14:28:14] the message is really confusing though
[14:29:49] why does nginx in buster need a rebuild?
[14:34:24] Analytics, Tool-Pageviews: Statistics for views of individual Wikimedia images - https://phabricator.wikimedia.org/T210313 (Tnegrin) Adding the SDC folks as we probably want to think about how this integrates with structured data. Or not.
[14:38:43] moritzm: it seems missing a patch for ssl_dyn_rec_enable
[14:40:10] moritzm: on analytics-tool1004 I get nginx: [emerg] unknown directive "ssl_dyn_rec_enable" in /etc/nginx/nginx.conf:53
[14:40:22] I was checking with e*ma and it seems that we add a patch to nginx?
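Note on the 14:24:18 reply: rolling hourly data up to days is a plain group-by on the date part of the timestamp. A minimal Python sketch of that idea, assuming hypothetical rows of (ISO hour timestamp, view count); the row layout and the sample values are illustrative only and are not the actual pageview_hourly schema or data:

```python
from collections import defaultdict
from datetime import datetime


def aggregate_daily(hourly_rows):
    """Sum (iso_hour, views) pairs into per-day totals keyed by YYYY-MM-DD."""
    totals = defaultdict(int)
    for iso_hour, views in hourly_rows:
        # Drop the hour part so every row of the same day shares one key.
        day = datetime.fromisoformat(iso_hour).date().isoformat()
        totals[day] += views
    return dict(totals)


# Made-up sample rows, for illustration only.
sample = [
    ("2019-07-21T23:00:00", 10),
    ("2019-07-22T00:00:00", 5),
    ("2019-07-22T01:00:00", 7),
]
print(aggregate_daily(sample))
```

The same roll-up can usually be done in the query layer itself by choosing a daily granularity; the sketch only shows the arithmetic.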
[14:41:46] yeah, but we should use the opportunity to revisit which of our patches are still needed/sensible with 1.14
[14:42:19] most probably are still needed, but maybe some can get folded in favour of upstream development
[14:42:28] moritzm: I completely agree, I might have used the wrong terminology with "we need a rebuild", I was trying to quickly update my team :)
[14:42:45] ack :-)
[14:46:19] Analytics, Tool-Pageviews: Statistics for views of individual Wikimedia images - https://phabricator.wikimedia.org/T210313 (Tgr) Cool! Is the project domain in the URL the referer, or the hosting project? In the latter case, having similar endpoints (file, site, top) for querying Commons-hosted images...
[14:49:18] (CR) Nuria: [C: -1] "Couple nits, I think this is almost ready to go." (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/523903 (https://phabricator.wikimedia.org/T228151) (owner: Fdans)
[14:51:26] nuria: I ran a query to check max string length on referers (internal) and max was ~2500 chars
[14:51:38] fdans: that sounds spammy
[14:51:54] fdans: did you check what referrer was ?
[14:52:27] fdans: cause large UAS/headers indicate automated traffic and VERY LARGE ones indicate automated traffic of hostile nature
[14:52:41] fdans: we can take a look
[14:52:43] nuria: it was internal but i didn't see the string itself
[14:53:47] nuria: btw you want me to take this change?
https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/517641/
[14:54:11] fdans: then it feels like a tool in labs (or similar) sending a lot of unneeded info
[14:54:40] fdans: I HAVE to send the patch for the other media counts work with the renames, sorry about that
[14:54:49] arg, i got totally distracted with issues on refine
[14:55:16] (CR) Nuria: [C: +2] "Bumping up jar version, getting ready to re-start pageview jobs" [analytics/refinery] - https://gerrit.wikimedia.org/r/524298 (https://phabricator.wikimedia.org/T226730) (owner: Nuria)
[14:55:24] (CR) Nuria: [V: +2 C: +2] Bumping up jar version on webrequest load [analytics/refinery] - https://gerrit.wikimedia.org/r/524298 (https://phabricator.wikimedia.org/T226730) (owner: Nuria)
[14:56:51] nuria: fun fact, this is the longest-named multiword article in enwiki
[14:56:52] https://en.wikipedia.org/wiki/Instruction_Concerning_the_Criteria_for_the_Discernment_of_Vocations_with_regard_to_Persons_with_Homosexual_Tendencies_in_view_of_their_Admission_to_the_Seminary_and_to_Holy_Orders
[14:57:56] Analytics, Analytics-Kanban, EventBus, Core Platform Team (Modern Event Platform (TEC2)), and 3 others: Git Commit hook that adds a whole new file when a new version of schema is committed - https://phabricator.wikimedia.org/T206812 (Ottomata) Since we merged use of this into mediawiki/event-sche...
[14:59:04] fdans: jaja, but referrers from wiki only send top domain
[14:59:18] fdans: so referrer for this article will be just en.wikipedia.org
[14:59:25] fdans: what a great find
[14:59:26] nuria: hmmm i don't think so
[14:59:37] i think they send the whole thing in most cases
[14:59:53] let me double check
[15:00:13] fdans: let me see, we might send only top domain if we are linking outside
[15:01:36] ping ottomata standup?
[15:02:39] holaaa ottomata
[15:08:35] Analytics, Analytics-Kanban, Operations, netops, ops-eqiad: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (Ottomata)
[15:08:59] Analytics, Analytics-Kanban, Operations, netops, ops-eqiad: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. - https://phabricator.wikimedia.org/T225128 (Ottomata) OO, when we reimage these, let's use Buster! :)
[15:40:46] Analytics, Analytics-Kanban, Research-Backlog: Release edit data lake data as a public json dump /mysql dump, other? - https://phabricator.wikimedia.org/T208612 (Ottomata) p:Normal→High a:Milimetric
[15:43:21] Analytics, Analytics-Kanban, Discovery, Operations, Research-Backlog: Make oozie swift upload emit event to Kafka about swift object upload complete - https://phabricator.wikimedia.org/T227896 (Ottomata)
[15:45:54] Analytics, EventBus, MassMessage, Operations, and 3 others: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (CDanis) Resolved→Open >>! In T226109#5279992, @CDanis wrote: > Am I alone in feeling like this probably deserves an [[ https://wikitech.wikimed...
[15:46:32] Analytics, EventBus, MassMessage, Operations, and 3 others: Write incident report for jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (CDanis) p:Unbreak!→Normal
[15:48:33] Analytics, Analytics-Kanban: Page creation data stream died June 6 - https://phabricator.wikimedia.org/T228188 (Ottomata)
[15:48:37] Analytics, Analytics-EventLogging: Sunset MySQL data store for eventlogging - https://phabricator.wikimedia.org/T159170 (Ottomata)
[16:02:17] Analytics: third party domain data is getting refined - https://phabricator.wikimedia.org/T228557 (fdans) p:Triage→High
[16:03:24] Analytics, Analytics-Kanban: Better error message for refine monitor so it takes into account that backfilling might be happening - https://phabricator.wikimedia.org/T228522 (fdans) p:Triage→High
[16:04:58] Analytics, LDAP-Access-Requests, Operations, wikimediafoundation.org: Access to WikimediaFoundation.org analytics for Deb - https://phabricator.wikimedia.org/T227496 (fdans) a:Nuria
[16:05:06] Analytics, LDAP-Access-Requests, Operations, wikimediafoundation.org: Access to WikimediaFoundation.org analytics for Deb - https://phabricator.wikimedia.org/T227496 (fdans) p:Triage→High
[16:08:18] Analytics, Operations, Traffic: Fix geoip updaters for new MaxMind hashed keys by 2019-08-15 - https://phabricator.wikimedia.org/T228533 (Milimetric) @faidon: we don't have any updaters on our end, we just move the databases around and keep backups for historical use. But let us know if you run into...
[16:12:17] Analytics, Multimedia, Tool-Pageviews: Statistics for views of individual Wikimedia images - https://phabricator.wikimedia.org/T210313 (Ramsey-WMF)
[16:21:23] Analytics, EventBus, MassMessage, Operations, and 3 others: Write incident report for jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (jijiki)
[16:46:22] Analytics, LDAP-Access-Requests, Operations, wikimediafoundation.org: Access to WikimediaFoundation.org analytics for Deb - https://phabricator.wikimedia.org/T227496 (Nuria) @Varnent Deb needs to sign an nda, someone will verify is been so and she can be added to the group that has access to this...
[16:52:02] Analytics, LDAP-Access-Requests, Operations, wikimediafoundation.org: Access to WikimediaFoundation.org analytics for Deb - https://phabricator.wikimedia.org/T227496 (Varnent) @Nuria - the NDA was a part of her onboarding - so she should be all set. :)
[16:52:59] Analytics, WMDE-Analytics-Engineering: Public Data Review Needed - https://phabricator.wikimedia.org/T227905 (Nuria) @GoranSMilovanovic can you describe what fields does this data have as in a snippet and a description of every column?
[16:57:50] Analytics, ChangeProp, Core Platform Team, EventBus: RESTBase content rerenders sometimes don't pick up the newest changes - https://phabricator.wikimedia.org/T176412 (daniel)
[16:59:45] ottomata: i have a meeting in 2 mins but in half an hour I was thinking of restarting pageview jobs
[17:00:09] wait, i need to deploy the change to refinery that bumps up the jar first, will do that cc milimetric , fdans
[17:00:39] yes, lemme know if you want more eyes
[17:04:38] k
[17:33:20] !log finished deploying refinery (no refinery source deploy, just bumping up jars)
[17:33:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:38:37] (CR) Nuria: [V: +2 C: +2] Hash tokens from the EL Sanitization white-list for iOS app [analytics/refinery] - https://gerrit.wikimedia.org/r/520134 (https://phabricator.wikimedia.org/T226849) (owner: Chelsyx)
[17:39:18] ottomata: for this i kill the bundle right? https://hue.wikimedia.org/oozie/list_oozie_bundle/0002737-190626064919032-oozie-oozi-B
[17:39:51] cc milimetric , or just the coordinator?
[17:42:00] nuria: the bundle, because both coordinators need to be restarted
[17:42:05] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Oozie/Administration#Gotchas_when_restarting_webrequest_load_bundle
[17:42:18] (some quick info about the restart if needed)
[17:43:11] milimetric: ok, just killed https://hue.wikimedia.org/oozie/list_oozie_bundle/0002737-190626064919032-oozie-oozi-B
[17:48:34] Analytics, Research: Check home leftovers of ISI researchers - https://phabricator.wikimedia.org/T215775 (Isaac) My assessment of the files that were recommended for release from mtizzoni's home directory: * Dataset 1: aggregated pageview count for each day in 2016 across the whole US for 128 different...
[17:48:49] ottomata: restart command looks like:
[17:48:52] https://www.irccloud.com/pastebin/Z6i6FQb6/
[17:49:54] cc milimetric
[17:50:18] looking
[17:50:49] nuria: is the start time the earliest across the coordinators?
[17:51:01] milimetric: ah let me see upload, good call
[17:51:20] (it's from the link Luca sent)
[17:51:34] milimetric: yes it is
[17:51:45] milimetric: i just had realized that i had looked at text alone
[17:51:54] milimetric: ok, restarting
[17:52:05] nuria: also, why specify the launcher_memory? Isn't that set properly in bundle.properties?
[17:52:25] milimetric: ya
[17:52:31] milimetric: removing
[17:52:47] k, I don't see anything else
[17:53:08] I generally do the hdfs dfs -ls command to get the version and paste that in the command
[17:53:13] just in case there's any surprises
[17:54:58] milimetric: ok, restarted, will look at data in couple hours https://hue.wikimedia.org/oozie/list_oozie_bundle/0007610-190715143115257-oozie-oozi-B
[18:09:20] * elukey off!
[18:37:46] hip: milimetric , i think the first half of the design doc is stable. it looks like we are still iterating on the workflow process
[18:37:54] but the first half is good enough for me to send to SRE/REleng
[18:38:06] for comments
[18:38:10] going to go ahead and do that
[18:38:14] ottomata: yeah, sounds good
[18:42:42] sounds good
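Note on the 17:50:49 question: the check discussed there amounts to taking the minimum of the next pending hour across the bundle's coordinators and using that as the restart start time. A minimal Python sketch of that check, assuming the two coordinators mentioned in the chat (text and upload); the timestamps below are hypothetical and are not taken from the actual bundle or from the pasted restart command:

```python
from datetime import datetime

# Timestamp layout assumed for this sketch (UTC, minute precision).
TS_FORMAT = "%Y-%m-%dT%H:%MZ"


def earliest_start(next_pending):
    """Return the earliest pending timestamp across all coordinators."""
    parsed = [datetime.strptime(ts, TS_FORMAT) for ts in next_pending.values()]
    return min(parsed).strftime(TS_FORMAT)


# Hypothetical next-pending hours per coordinator, for illustration only.
pending = {"text": "2019-07-22T16:00Z", "upload": "2019-07-22T15:00Z"}
print(earliest_start(pending))
```

Looking at only one coordinator (text alone, as at 17:51:45) could pick a start time that skips hours still pending on the other one, which is why the minimum is used here.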