LINUX.ORG.RU

Need to quickly parse a site

 



There are 1000 links; if you go through them with wget, the HTML you get back contains none of the actual data.

<!DOCTYPE html><html ng-app=app ng-cloak><head><meta charset=utf-8><title ng-bind=MetaTags.title>PIR Expo | Главная</title><meta name=description ng-attr-content="{{ MetaTags.description }}" content="Международный выставочный проект PIR EXPO — главное профессиональное событие в индустрии гостеприимства в России и СНГ"><meta property=og:title ng-attr-content="{{ MetaTags.title }}" content="Pir Expo 2016"><meta property=og:description ng-attr-content="{{ MetaTags.description }}" content="Международный выставочный проект PIR EXPO — главное профессиональное событие в индустрии гостеприимства в России и СНГ"><meta property=og:type content=website><meta property=og:url content=https://pirexpo.com/ ><meta property=og:image content=https://pirexpo.com/og.png><meta property=og:image:width content=1200><meta property=og:image:height content=630><meta name=viewport content="width=1024"><meta name=yandex-verification content=e83390cccf1b78a2><meta name=fragment content=!><link id=favicon href=/favicon.ico rel=icon type=image/x-icon><script src=https://api.pir.ru/api.js></script><script src=/env.js></script><script>window.retina=window.devicePixelRatio>1,window.retina&&document.getElementById("favicon")&&document.getElementById("favicon").setAttribute("href",document.getElementById("favicon").getAttribute("href").replace(".ico","@2x.ico"))</script><script>!function(e,t,a){(t[a]=t[a]||[]).push(function(){try{t.yaCounter39210000=new Ya.Metrika({id:3921e4,clickmap:!0,trackLinks:!0,accurateTrackBounce:!0,webvisor:!0,trackHash:!0})}catch(e){}});var c=e.getElementsByTagName("script")[0],n=e.createElement("script"),r=function(){c.parentNode.insertBefore(n,c)};n.type="text/javascript",n.async=!0,n.src="https://mc.yandex.ru/metrika/watch.js","[object Opera]"==t.opera?e.addEventListener("DOMContentLoaded",r,!1):r()}(document,window,"yandex_metrika_callbacks")</script><link href="/main.fd2ce9f444e9f8849f2f.css" 
rel="stylesheet"></head><body><script>!function(e,t,a,n,r){e[n]=e[n]||[],e[n].push({"gtm.start":(new Date).getTime(),event:"gtm.js"});var g=t.getElementsByTagName(a)[0],m=t.createElement(a),s="dataLayer"!=n?"&l="+n:"";m.async=!0,m.src="//www.googletagmanager.com/gtm.js?id="+r+s,g.parentNode.insertBefore(m,g)}(window,document,"script","dataLayer","GTM-56HTZV")</script><div ng-class="{ 'page_main': $state.current.name === 'app.pages.main'|| $state.includes('app.pages.exhibitions') }" class=page><header header style=display:none class="header header-on-scroll"><div ng-controller=HeaderTanslateCtrl><div class=header__user-info><div class=header__inner><div ng-click=exhibitionChoiceToggle() class=user-interested><div class=user-interested__label>{{translate.interesting }}:</div><div ng-bind=interestedExhibitions() class=user-interested__text></div><svg class="icon icon_svg"><use xlink:href=#icon-edit></use></svg></div><nav class=user-menu><a ui-sref=app.pages.cart class="user-menu__item user-menu__item_basket"><svg ng-hide=basket.length class="icon icon_svg icon_basket ng-animate-disabled"><use xlink:href=#icon-basket_v2></use></svg><svg ng-show=basket.length class="icon icon_svg icon_basket_v3 ng-animate-disabled"><use xlink:href=#icon-basket_v3></use></svg><span class=user-menu__text-wrapper><span class=user-menu__text>{{translate.cart}}</span><span ng-show=basket.length class="user-menu__colon ng-animate-disabled">:</span><span ng-show=basket.length ng-bind=basket.length class="user-menu__number ng-animate-disabled"></span></span></a><a ui-sref=app.pages.favorites class="user-menu__item user-menu__item_liked"><svg ng-class="{ 'icon_star-liked' : favorites.length }" class="icon icon_svg icon_star ng-animate-disabled"><use xlink:href=#icon-star></use></svg><span class=user-menu__text-wrapper><span class=user-menu__text>{{translate.chosenOne}}</span><span ng-show=favorites.length class="user-menu__colon ng-animate-disabled">:</span><span ng-show=favorites.length 
ng-bind=favorites.length class="user-menu__number ng-animate-disabled"></span></span></a><a ng-hide=user.isLogin ui-sref=app.popups.signin class="user-menu__item user-menu__item_login ng-animate-disabled"><svg class="icon icon_svg icon_person"><use xlink:href=#icon-person></use></svg><span class=user-menu__text-wrapper><span class=user-menu__text>{{translate.login}}</span></span></a><a ng-show=user.isLogin ui-sref=app.pages.cabinet.profile class="user-menu__item user-menu__item_cabinet ng-animate-disabled"><svg class="icon icon_svg icon_person"><use xlink:href=#icon-person></use></svg><span class=user-menu__text-wrapper><span class=user-menu__text>{{translate.cabinet}}</span></span></a><a ng-show=user.isLogin ng-click=logout() class="user-menu__item user-menu__item_logout ng-animate-disabled"><svg class="icon icon_svg"><use xlink:href=#icon-logout></use></svg><span class=user-menu__text-wrapper><span class=user-menu__text>{{translate.logout}}</span></span></a></nav></div></div><div class=header__separator></div><div class=header__main><div class=header__main-background></div><div class=header__inner><a scroll-to-top=true href="{{ lang === 'ru' ? 
'/' : '/en'}}" class=header__logo></a><nav class="menu menu_header header__menu"><div ng-class="{active: isExhibitionsPage()}" class="menu__item menu__item_exhibitions">{{translate.exhibitions}}<exhibition-dropdown></exhibition-dropdown></div><a ng-class="{active: isEventsPage()}" ui-sref=app.pages.events.best class=menu__item>{{translate.program}}</a><a ng-class="{active: isExpositionPage()}" ui-sref=app.pages.exposition.companies class=menu__item>{{translate.exposition}}</a><a ui-sref=app.pages.get-ticket class="menu__item menu__item_important">{{translate.getTicket}}</a><a ui-sref-active=active ui-sref=app.pages.exhibitor.participation class=menu__item>{{translate.participate}}</a></nav></div></div></div><div ng-class="{ 'header__border_has': $state.current.name === 'app.pages.main'}" class=header__border></div></header><div additional-menu-toggle style=display:none class=additional-menu-toggle><div class=additional-menu-toggle__line></div><div class=additional-menu-toggle__line></div><div class=additional-menu-toggle__line></div></div><div additional-menu style=display:none class=additional-menu><div class=additional-menu__overlay></div><div class=additional-menu__bar><div class=additional-menu__top><div class=lang-switch><div ng-class="{ 'active': lang == 'ru' }" ng-click="changeLanguage('ru')" class=lang-switch__item>РУС</div><div class=lang-switch__separator></div><div ng-class="{ 'active': lang == 'en' }" ng-click="changeLanguage('en')" class=lang-switch__item>EN</div></div><nav ng-controller=MenuTanslateCtrl class="menu menu_additional additional-menu__menu"><div class=menu__item-wrapper><a ui-sref=app.pages.about.greetings ng-class="{active: isAboutPage()}" class=menu__item>{{ translate.about }}</a></div><div class=menu__item-wrapper><a ui-sref=app.pages.exhibitor.participation ng-class="{active: isExhibitorPage()}" class=menu__item>{{ translate.exhibitor }}</a></div><div class=menu__item-wrapper><a ui-sref=app.pages.press-center.releases 
ng-class="{active: isPressCenterPage()}" class=menu__item>{{ translate.pressCenter }}</a></div><div class=menu__item-wrapper><a ui-sref-active=active ui-sref=app.pages.contacts class=menu__item>{{ translate.contacts }}</a></div><div class=menu__item-wrapper><a ui-sref-active=active ui-sref=app.pages.exhibitor.hotels class=menu__item>{{ translate.hotels }}</a></div><div class=menu__item-wrapper><a ui-sref-active=active ui-sref=app.pages.location class=menu__item>{{ translate.location }}</a></div></nav></div><div class=additional-menu__contacts><div class=additional-menu__contacts-item><a href=mailto:info@pirexpo.com class="link-text-icon link-text-icon_dark link-text-icon_large link-text-icon_underline"><svg class="icon icon_svg icon_mail_v2"><use xlink:href=#icon-mail_v2></use></svg><span class=link-text-icon__text>info@pirexpo.com</span></a></div><div class=additional-menu__contacts-item><a href=tel:+7(495)637-94-40 class="link-text-icon link-text-icon_dark link-text-icon_large"><svg class="icon icon_svg icon_phone_v2"><use xlink:href=#icon-phone_v2></use></svg><span class=link-text-icon__text>8 495 637-94-40</span></a></div></div></div></div><div class=exhibition-dropdown-overlay></div><exhibition-choice></exhibition-choice><div ui-view class=content></div></div><footer ng-controller=FooterTanslateCtrl ng-cloak hide-till-load class=footer><div class=footer__inner><div class="footer__line footer__line_first"><a scroll-to-top=true href="{{ lang === 'ru' ? 
'/' : '/en'}}" class=footer__logo></a><div class=footer__contacts><a href=mailto:info@pirexpo.com class="link-text-icon link-text-icon_light"><svg class="icon icon_svg icon_mail"><use xlink:href=#icon-mail></use></svg><span class=link-text-icon__text>info@pirexpo.com</span></a><a href=tel:+7(495)637-94-40 class="link-text-icon link-text-icon_light"><svg class="icon icon_svg icon_phone"><use xlink:href=#icon-phone></use></svg><span class=link-text-icon__text>+7 (495) 637-94-40</span></a></div></div><div class="footer__line footer__line_second"><div class=footer__social><div class="icon icon_social"></div>{{ translate.social }}<social-dropdown></social-dropdown></div><nav class="menu menu_footer footer__menu"><a class="menu__item menu__item_exhibitions">{{ translate.exhibitions }}<exhibition-dropdown type=footer></exhibition-dropdown></a><a ui-sref=app.pages.events.best class=menu__item>{{ translate.program }}</a><a ui-sref=app.pages.exposition.companies class=menu__item>{{ translate.exposition }}</a><a ui-sref=app.pages.get-ticket class=menu__item>{{translate.getTicket}}</a><a ui-sref=app.pages.exhibitor.participation class=menu__item>{{translate.participate}}</a></nav></div><div class="footer__line footer__line_third"><nav class="menu menu_footer-secondary footer__menu"><a ui-sref=app.pages.about.greetings class=menu__item>{{ translate.about }}</a><a ui-sref=app.pages.exhibitor.participation class=menu__item>{{ translate.exhibitor }}</a><a ui-sref=app.pages.press-center.releases class=menu__item>{{ translate.pressCenter }}</a><a ui-sref=app.pages.contacts class=menu__item>{{ translate.contacts }}</a><a ui-sref=app.pages.exhibitor.hotels class=menu__item>{{ translate.hotels }}</a><a ui-sref=app.pages.location class=menu__item>{{ translate.location }}</a></nav></div><div class="footer__line footer__line_fourth"><p class=footer__copy>{{ translate.copy }}.</p><p class=footer__beta>{{ translate.madeIn }}&nbsp;<a href=http://betaagency.ru/ target=_blank>Beta Digital 
Production</a></p></div></div></footer><script src="/main.fd2ce9f444e9f8849f2f.js"></script></body></html>

Looks like it's some kind of JavaScript, and I don't know the first thing about it. Is there a way to get the output of these scripts as HTML, JSON, or something similar, and which tools can be used for that? Does anyone have experience scraping sites?

https://2016.pirexpo.com/exposition/companies/ - the site itself

Deleted

Last edited by: Romashev (1 edit in total)

In reply to: comment by Deleted

Well yes, Selenium really does the job, despite how slow browsers are. I scripted it in Perl some years ago.

sergej ★★★★★
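
For anyone landing here with the same problem, this is a rough sketch of the Selenium approach in Python (not the Perl script mentioned above, which is not shown in the thread). It assumes the `selenium` package, Firefox, and geckodriver are installed; the helper names `fetch_rendered_html`, `fetch_all`, and `make_firefox_driver`, and the fixed 2-second wait, are made up for illustration. The idea is to let a real browser run the page's scripts and then read back the resulting DOM instead of the empty template that wget sees.

```python
import time


def fetch_rendered_html(driver, url, settle_seconds=2.0):
    """Open url in the browser and return the DOM after the scripts have run."""
    driver.get(url)
    # Crude fixed wait so the AngularJS app can fill in the page (assumption:
    # 2 seconds is enough). An explicit wait on a known element is more robust.
    time.sleep(settle_seconds)
    return driver.page_source


def fetch_all(driver, urls, settle_seconds=2.0):
    """Reuse one browser for the whole list of links; returns {url: html}."""
    return {url: fetch_rendered_html(driver, url, settle_seconds) for url in urls}


def make_firefox_driver():
    """Start a headless Firefox. Needs selenium, Firefox and geckodriver."""
    from selenium import webdriver  # imported lazily, it is an optional dependency

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)


# Usage (requires a working browser setup, so it is left commented out):
# driver = make_firefox_driver()
# try:
#     pages = fetch_all(driver, ["https://2016.pirexpo.com/exposition/companies/"])
# finally:
#     driver.quit()
```

Reusing a single browser instance for all 1000 links avoids paying the browser start-up cost on every page, which is where most of the slowness comes from. The returned HTML can then be fed to any ordinary HTML parser.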