feat: add www.yomiuri.co.jp custom parser #381

Merged: 2 commits merged into postlight:master from feat-yomiuri-co-jp-extractor on Apr 24, 2019

Conversation

@kik0220 (Contributor) commented Apr 15, 2019

Adds a custom parser for www.yomiuri.co.jp.
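
For readers who have not seen a per-domain parser definition, here is a rough sketch of the general shape such a definition might take. It is not the code merged in this pull request. The field names mirror the keys of the Parsed JSON in the preview comment, `div.p-main-contents` is taken from the wrapper element visible in the parsed `content`, and every other selector, along with the names `ExtractorSketch` and `yomiuriExtractorSketch`, is a hypothetical placeholder.

```typescript
// Hypothetical sketch only: selector strings marked "assumed" are guesses,
// not the selectors used by the merged parser.
type Selector = string | [string, string]; // [CSS selector, attribute to read]

interface FieldRule {
  selectors: Selector[];
}

interface ContentRule extends FieldRule {
  clean: string[]; // selectors to strip from the extracted content
}

interface ExtractorSketch {
  domain: string;
  title: FieldRule;
  author: FieldRule | null;
  date_published: FieldRule;
  dek: FieldRule | null;
  lead_image_url: FieldRule;
  content: ContentRule;
}

const yomiuriExtractorSketch: ExtractorSketch = {
  domain: "www.yomiuri.co.jp",
  title: { selectors: ["h1"] }, // assumed
  author: null, // the preview reports author as null
  date_published: {
    selectors: [['meta[name="article:published_time"]', "value"]], // assumed
  },
  dek: null, // the preview reports dek as null
  lead_image_url: {
    selectors: [['meta[name="og:image"]', "value"]], // assumed
  },
  // Wrapper class seen in the parsed "content" field of the preview.
  content: { selectors: ["div.p-main-contents"], clean: [] },
};

console.log(yomiuriExtractorSketch.domain);
```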

@postlight-org (Collaborator)

🤖 Automated Parsing Preview 🤖

Commit: feat: add www.yomiuri.co.jp custom parser

Screenshot of fixture (this embed should work after repo is public)

Original Article | HTML Fixture | Parsed Content Preview

Parsed JSON
{
  "title": "バルセロナ、マンUに先勝…CL準々決勝第1戦",
  "content": "<div><div class=\"p-main-contents\">\n            <figure id=\"attachment_532857\" class=\"wp-caption none thumbnails-left\"><a href=\"https://www.yomiuri.co.jp/media/2019/04/20190411-OYT1I50076-1.jpg\"><img src=\"https://www.yomiuri.co.jp/media/2019/04/20190411-OYT1I50076-1.jpg?type=large\" alt=\"&#x5148;&#x5236;&#x70B9;&#x304C;&#x6C7A;&#x307E;&#x308A;&#x3001;&#x559C;&#x3076;&#x30D0;&#x30EB;&#x30BB;&#x30ED;&#x30CA;&#x306E;&#x30E1;&#x30C3;&#x30B7;&#xFF08;&#x5DE6;&#xFF09;&#x3068;&#x30B9;&#x30A2;&#x30EC;&#x30B9;&#xFF1D;&#x30ED;&#x30A4;&#x30BF;&#x30FC;\" width=\"563\" class=\"wp-image-532857 alignleft\"></a><figcaption class=\"wp-caption-text\">&#x5148;&#x5236;&#x70B9;&#x304C;&#x6C7A;&#x307E;&#x308A;&#x3001;&#x559C;&#x3076;&#x30D0;&#x30EB;&#x30BB;&#x30ED;&#x30CA;&#x306E;&#x30E1;&#x30C3;&#x30B7;&#xFF08;&#x5DE6;&#xFF09;&#x3068;&#x30B9;&#x30A2;&#x30EC;&#x30B9;&#xFF1D;&#x30ED;&#x30A4;&#x30BF;&#x30FC;</figcaption></figure><p class=\"par1\">&#x3000;&#x3010;&#x30ED;&#x30F3;&#x30C9;&#x30F3;&#xFF1D;&#x5CA1;&#x7530;&#x6D69;&#x5E78;&#x3011;&#xFF11;&#xFF10;&#x65E5;&#x306B;&#x884C;&#x308F;&#x308C;&#x305F;&#x30B5;&#x30C3;&#x30AB;&#x30FC;&#x30FB;&#x6B27;&#x5DDE;&#x30C1;&#x30E3;&#x30F3;&#x30D4;&#x30AA;&#x30F3;&#x30BA;&#x30EA;&#x30FC;&#x30B0;&#xFF08;&#xFF23;&#xFF2C;&#xFF09;&#x306E;&#x6E96;&#x3005;&#x6C7A;&#x52DD;&#x7B2C;&#xFF11;&#x6226;&#x3067;&#x3001;&#xFF14;&#x5B63;&#x3076;&#x308A;&#x306E;&#x512A;&#x52DD;&#x3092;&#x76EE;&#x6307;&#x3059;&#x30D0;&#x30EB;&#x30BB;&#x30ED;&#x30CA;&#xFF08;&#x30B9;&#x30DA;&#x30A4;&#x30F3;&#xFF09;&#x306F;&#x30A2;&#x30A6;&#x30A7;&#x30FC;&#x3067;&#x30DE;&#x30F3;&#x30C1;&#x30A7;&#x30B9;&#x30BF;&#x30FC;&#x30FB;&#x30E6;&#x30CA;&#x30A4;&#x30C6;&#x30C3;&#x30C9;&#xFF08;&#x30A4;&#x30F3;&#x30B0;&#x30E9;&#x30F3;&#x30C9;&#xFF09;&#x306B;&#xFF11;&#x2015;&#xFF10;&#x3067;&#x52DD;&#x3061;&#x3001;&#xFF14;&#x5F37;&#x5165;&#x308A;&#x306B;&#x524D;&#x9032;&#x3057;&#x305F;&#x3002;&#x30A2;&#x30E4;&#x30C3;&#x30AF;&#x30B9;&#xFF08;&#x30AA;&#x30E9;&#x30F3;&#x30C0;&#xFF09;&#x306F;&#x30DB;&#x30FC;&#x30E0;&#x3067;&#x30E6;&#x30D9;&#x30F3;&#x30C8;&#x30B9;&#xFF08;&#x30A4;&#x30BF;&#x30EA;&#x30A2;&#xFF09;&#x3068;&#xFF11;&#x2015;&#xFF11;&#x3067;&#x5F15;&#x304D;&#x5206;&#x3051;&#x305F;&#x3002;&#x30E6;&#x30D9;&#x30F3;&#x30C8;&#x30B9;&#x306F;&#x3001;&#x3051;&#x304C;&#x304B;&#x3089;&#x5FA9;&#x5E30;&#x3057;&#x305F;&#x30DD;&#x30EB;&#x30C8;&#x30AC;&#x30EB;&#x4EE3;&#x8868;&#xFF26;&#xFF37;&#x30ED;&#x30CA;&#x30EB;&#x30C9;&#x304C;&#x5148;&#x5236;&#x70B9;&#x3092;&#x6319;&#x3052;&#x305F;&#x3002;&#x4E21;&#x30AB;&#x30FC;&#x30C9;&#x306E;&#x7B2C;&#xFF12;&#x6226;&#x306F;&#xFF11;&#xFF16;&#x65E5;&#x306B;&#x884C;&#x308F;&#x308C;&#x308B;&#x3002;</p>\n\n\n\n                      </div></div>",
  "author": null,
  "date_published": "2019-04-11T13:22:00.000Z",
  "lead_image_url": "https://www.yomiuri.co.jp/media/2019/04/20190411-OYT1I50076-1.jpg?type=ogp",
  "dek": null,
  "next_page_url": null,
  "url": "https://www.yomiuri.co.jp/sports/soccer/20190411-OYT1T50287/",
  "domain": "www.yomiuri.co.jp",
  "word_count": 59,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}

null fields

  • author

  • dek

  • next_page_url
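
The null fields list above can be derived mechanically from the parsed result. The following is a minimal sketch of one way to do that, not the bot's actual implementation. The helper name `nullFields` and the `ParsedResult` type are hypothetical, and the sample object carries only a subset of the keys shown in the Parsed JSON.

```typescript
// Hypothetical helper: lists the keys of a parsed result whose value is null.
type ParsedResult = Record<string, string | number | null>;

function nullFields(result: ParsedResult): string[] {
  // Object.keys preserves insertion order for string keys,
  // so the report follows the order of the JSON.
  return Object.keys(result).filter((key) => result[key] === null);
}

// A subset of the preview JSON above, values copied verbatim.
const preview: ParsedResult = {
  author: null,
  date_published: "2019-04-11T13:22:00.000Z",
  dek: null,
  next_page_url: null,
  domain: "www.yomiuri.co.jp",
  word_count: 59,
  direction: "ltr",
};

console.log(nullFields(preview)); // [ 'author', 'dek', 'next_page_url' ]
```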

✅ All tests passed
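
The `content` field in the preview stores the Japanese article text as hexadecimal character references such as `&#x5148;`. To read it, the references can be decoded back into characters. This is a small sketch under the assumption that every code point lies in the Basic Multilingual Plane, which holds for the references in this preview. The function name `decodeHexEntities` is hypothetical.

```typescript
// Hypothetical helper: decodes hexadecimal character references (&#xNNNN;).
// Assumes every code point is at most 0xFFFF, so String.fromCharCode suffices.
function decodeHexEntities(html: string): string {
  return html.replace(/&#x([0-9a-fA-F]+);/g, (_match: string, hex: string) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// First three references of the image alt text in the preview.
const sample = "&#x5148;&#x5236;&#x70B9;";
console.log(decodeHexEntities(sample)); // 先制点
```

Text without character references passes through unchanged, so the function can be applied to the whole `content` string.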

@kik0220 force-pushed the feat-yomiuri-co-jp-extractor branch from f47eef9 to c535072 on April 21, 2019 08:52

@postlight-org (Collaborator)

🤖 Automated Parsing Preview 🤖

Commit: feat: add www.yomiuri.co.jp custom parser

✅ All tests passed

@postlight-org (Collaborator)

🤖 Automated Parsing Preview 🤖

Commit: Merge branch 'master' into feat-yomiuri-co-jp-extractor

✅ All tests passed

@toufic-m merged commit 7b07f88 into postlight:master on Apr 24, 2019
@kik0220 deleted the feat-yomiuri-co-jp-extractor branch on April 24, 2019 10:49