                              ==Phrack Inc.==

                Volume 0x0b, Issue 0x39, Phile #0x0a of 0x12

|=-------------=[ Against the System: Rise of the Robots ]=--------------=|
|=-----------------------------------------------------------------------=|
|=-=[ (C)Copyright 2001 by Michal Zalewski <lcamtuf@bos.bindview.com> ]=-=|

-- [1] Introduction -------------------------------------------------------

"[...] big difference between the web and traditional well controlled
collections is that there is virtually no control over what people can
put on the web. Couple this flexibility to publish anything with the
enormous influence of search engines to route traffic and companies
which deliberately manipulating search engines for profit become a
serious problem."

        -- Sergey Brin, Lawrence Page (see references, [A])

Consider a remote exploit that is able to compromise a remote system
without sending any attack code to its victim. Consider an exploit
which simply creates a local file to compromise thousands of computers,
and which does not involve any local resources in the attack. Welcome to
the world of zero-effort exploit techniques. Welcome to the world of
automation, welcome to the world of anonymous, dramatically difficult
to stop attacks resulting from increasing Internet complexity.

Zero-effort exploits create their 'wishlist' and leave it somewhere in
cyberspace - it can even be their own home host - in a place where
others can find it. Others - Internet workers (see references, [D]) -
hundreds of never-sleeping, endlessly browsing information crawlers,
intelligent agents, search engines... They come to pick up this
information, and - unknowingly - to attack victims. You can stop one of
them, but you can't stop them all. You can find out what their orders
are, but you can't guess what these orders will be tomorrow, hidden
somewhere in the abyss of not yet explored cyberspace.

Your private army, close at hand, picking up the orders you left for
them along their way. You exploit them without having to compromise
them. They do what they are designed for, and they do their best to
accomplish it. Welcome to the new reality, where our A.I. machines can
rise against us.

Consider a worm. Consider a worm which does nothing. It is carried and
injected by others - but not by infecting them. This worm creates a
wishlist - a wishlist of, for example, 10,000 random addresses. And
waits. Intelligent agents pick up this list, and with their united
forces they try to attack all of them. Imagine they are not lucky, with
a 0.1% success ratio. Ten new hosts infected. On every one of them, the
worm does exactly the same - and the agents come back, to infect 100
hosts. The story goes - or crawls, if you prefer.

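The arithmetic behind this thought experiment is simple exponential
growth: with a wishlist of 10,000 targets per infected host and a 0.1%
success ratio, every indexing cycle multiplies the population by roughly
ten. Below is a minimal sketch of that model; the numbers, the absence of
overlap between wishlists, and the unlimited pool of vulnerable hosts are
all assumptions made purely for illustration.

  # Toy model of crawler-propagated worm growth. Assumes every infected
  # host publishes a fresh wishlist of `targets_per_host` addresses, that
  # wishlists never overlap, and that the supply of vulnerable hosts is
  # unlimited - simplifications for illustration only.

  def generations(initial_hosts=1, targets_per_host=10_000,
                  success_ratio=0.001, rounds=5):
      infected = initial_hosts
      history = [infected]
      for _ in range(rounds):
          # each host's wishlist yields ~targets * ratio new victims
          new_victims = int(infected * targets_per_host * success_ratio)
          infected += new_victims
          history.append(infected)
      return history

  if __name__ == "__main__":
      # prints 1 -> 11 -> 121 -> ... roughly tenfold per indexing cycle
      print(generations())
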
Agents work virtually invisibly; people get used to their presence
everywhere. And the crawlers just slowly go ahead, in a never-ending
loop. They work systematically, they do not choke on excessive data -
they crawl, there's no "boom" effect. Week after week after week, they
try new hosts, carefully, not overloading network uplinks, not
generating suspicious traffic; the recurrent exploration never ends. Can
you notice they carry a worm? Possibly...

-- [2] An example ---------------------------------------------------------

When this idea came to my mind, I tried the simplest possible test, just
to see if I was right. I targeted, if that's the right word,
general-purpose web indexing crawlers. I created a very short HTML
document and put it somewhere. Then I waited a few weeks. And then they
came. Altavista, Lycos and dozens of others. They found the new links
and picked them up enthusiastically, then disappeared for days.

bigip1-snat.sv.av.com:
GET /indexme.html HTTP/1.0

sjc-fe5-1.sjc.lycos.com:
GET /indexme.html HTTP/1.0

[...]

They came back later, to see what I gave them to parse:

http://somehost/cgi-bin/script.pl?p1=../../../../attack
http://somehost/cgi-bin/script.pl?p1=;attack
http://somehost/cgi-bin/script.pl?p1=|attack
http://somehost/cgi-bin/script.pl?p1=`attack`
http://somehost/cgi-bin/script.pl?p1=$(attack)
http://somehost:54321/attack?`id`
http://somehost/AAAAAAAAAAAAAAAAAAAAA...

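The planted page itself is not reproduced here; hypothetically, such a
'wishlist' can be as plain as an HTML document listing the links above
for robots to pick up. The fragment below is an illustrative
reconstruction, not the original file - the file name and URLs simply
reuse the placeholders already shown:

  <!-- indexme.html: illustrative reconstruction, not the original file -->
  <html><body>
  <a href="http://somehost/cgi-bin/script.pl?p1=;attack">entry 1</a>
  <a href="http://somehost/cgi-bin/script.pl?p1=`attack`">entry 2</a>
  <a href="http://somehost:54321/attack?`id`">entry 3</a>
  <!-- ...one link per wishlist entry... -->
  </body></html>
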
Our bots followed these links, exploiting hypothetical vulnerabilities
and compromising remote servers:

sjc-fe6-1.sjc.lycos.com:
GET /cgi-bin/script.pl?p1=;attack HTTP/1.0

212.135.14.10:
GET /cgi-bin/script.pl?p1=$(attack) HTTP/1.0

bigip1-snat.sv.av.com:
GET /cgi-bin/script.pl?p1=../../../../attack HTTP/1.0

[...]

(BigIP is one of the famous "I observe you" load balancers from F5 Labs.)

The bots also happily connected to the non-HTTP ports I prepared for them:

GET /attack?`id` HTTP/1.0
Host: somehost
Pragma: no-cache
Accept: text/*
User-Agent: Scooter/1.0
From: scooter@pa.dec.com

GET /attack?`id` HTTP/1.0
User-agent: Lycos_Spider_(T-Rex)
From: spider@lycos.com
Accept: */*
Connection: close
Host: somehost:54321

GET /attack?`id` HTTP/1.0
Host: somehost:54321
From: crawler@fast.no
Accept: */*
User-Agent: FAST-WebCrawler/2.2.6 (crawler@fast.no; [...])
Connection: close

[...]

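Request dumps like the ones above can be captured with nothing more than
a process listening on the advertised port and logging whatever a
visiting robot sends. The sketch below is an assumption about how this
could be done, not the tool actually used in the experiment; the port
number and log file name are arbitrary placeholders.

  #!/usr/bin/env python3
  # Minimal logging listener - a sketch, not the experiment's actual tool.
  # Accepts connections on an arbitrary non-HTTP port and records the raw
  # request a crawler sends, so dumps like the above can be collected.

  import socket

  PORT = 54321            # arbitrary port advertised in the planted links
  LOGFILE = "robots.log"  # hypothetical log file name

  def serve():
      srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
      srv.bind(("", PORT))
      srv.listen(5)
      while True:
          conn, peer = srv.accept()
          conn.settimeout(10)
          try:
              data = conn.recv(4096)   # enough for one request header
          except socket.timeout:
              data = b""
          with open(LOGFILE, "ab") as log:
              log.write(b"--- %s ---\n%s\n" % (peer[0].encode(), data))
          conn.close()                 # never answer; just observe

  if __name__ == "__main__":
      serve()
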
But publicly available crawlbot engines are not the only possible
targets. Crawlbots from alexa.com, ecn.purdue.edu, visual.com, poly.edu,
inria.fr, powerinter.net, xyleme.com, and even more unidentified crawl
engines found this page and enjoyed it. Some robots didn't pick up all
the URLs. For example, some crawlers do not index scripts at all, and
others won't use non-standard ports. But the majority of the most
powerful bots will - and even if not, trivial filtering is not the
answer. Many IIS vulnerabilities, and so on, can be triggered without
invoking any scripts.

What if this server list was randomly generated - 10,000 IPs or 10,000
.com domains? What if script.pl is replaced with invocations of the
three, four, five or ten most popular IIS vulnerabilities or buggy Unix
scripts? What if one host out of 2,000 is actually exploited?

What if somehost:54321 points to a vulnerable service which can be
exploited with the partially user-dependent contents of HTTP requests
(I consider the majority of fool-proof services that do not drop the
connection after the first invalid command to be vulnerable)? What if...

There is an army of robots - different species, different functions,
different levels of intelligence. And these robots will do whatever you
tell them to do. It is scary.

-- [3] Social considerations ----------------------------------------------

Who is guilty if a webcrawler compromises your system? The most obvious
answer is: the author of the original webpage the crawler visited. But
webpage authors are hard to trace, and a web crawler's indexing cycle
takes weeks. It is hard to determine when a specific page was put on the
net - pages can be delivered in so many ways, or processed by other
robots earlier; there is no tracking mechanism like the one we can find
in the SMTP protocol and many others. Moreover, many crawlers don't
remember where they "learned" new URLs. Additional problems are caused
by indexing flags, like "noindex" used without the "nofollow" option. In
many cases, the author's identity and the attack origin could not be
determined, while compromises would still take place.

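To illustrate the indexing-flag problem mentioned above (an illustrative
HTML fragment, not taken from the original experiment): a page marked
"noindex" but not "nofollow" is kept out of the search index, yet the
links it carries - including a hostile wishlist - are still followed.

  <!-- page stays out of the index, but its links are still crawled -->
  <meta name="robots" content="noindex">

  <!-- links on this page are neither indexed nor followed -->
  <meta name="robots" content="noindex,nofollow">
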
And, finally, what if having a particular link followed by bots wasn't
what the author meant? Consider "educational" papers, etc. - bots won't
read the disclaimer and the big fat warning "DO NOT TRY THESE LINKS"...

By analogy to other cases - e.g. Napster being forced to filter its
contents (or shut down its services) because of copyrighted information
exchanged by its users and causing losses - it is reasonable to expect
that the developers of intelligent bots would be forced to implement
specific filters, or to pay enormous compensation to victims suffering
because of bot abuse.

On the other hand, it seems almost impossible to successfully filter
contents to eliminate malicious code, if you consider the number and
wide variety of known vulnerabilities. Not to mention targeted attacks
(see references, [B], for more information on proprietary solutions and
their insecurities). So the problem persists. An additional issue is
that not all crawler bots are under U.S. jurisdiction, which makes the
whole problem more complicated (in many countries, the U.S. approach is
considered controversial at best).

-- [4] Defense ------------------------------------------------------------

As discussed above, the webcrawler itself has very limited defense and
avoidance possibilities, due to the wide variety of web-based
vulnerabilities. One of the more reasonable defense ideas is to use
secure and up-to-date software, but - obviously - this concept is
extremely unpopular for some reason: www.google.com, with the unique
documents filter enabled, returns 62,100 matches for a "cgi
vulnerability" query (see also references, [E]).

Another line of defense against bots is the /robots.txt standard robot
exclusion mechanism (see references, [C], for the specification). The
price you have to pay is partial or complete exclusion of your site from
search engines, which, in most cases, is undesirable. Also, some robots
are broken and do not respect /robots.txt when following a direct link
to a new website.

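A middle ground between full exclusion and no exclusion is to keep
well-behaved robots away only from the parts of a site that actually
execute code, while leaving static content indexable. The /robots.txt
fragment below is an illustrative example; the path names are
placeholders, and broken robots will of course ignore it anyway.

  User-agent: *
  Disallow: /cgi-bin/
  Disallow: /scripts/
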
-- [5] References ---------------------------------------------------------

[A] "The Anatomy of a Large-Scale Hypertextual Web Search Engine"
    (the Googlebot concept), Sergey Brin, Lawrence Page, Stanford University
    URL: http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm

[B] "Proprietary web solutions security", Michal Zalewski
    URL: http://lcamtuf.coredump.cx/milpap.txt

[C] "A Standard for Robot Exclusion", Martijn Koster
    URL: http://info.webcrawler.com/mak/projects/robots/norobots.html

[D] "The Web Robots Database"
    URL: http://www.robotstxt.org/wc/active.html
    URL: http://www.robotstxt.org/wc/active/html/type.html

[E] "Web Security FAQ", Lincoln D. Stein
    URL: http://www.w3.org/Security/Faq/www-security-faq.html

|=[ EOF ]=---------------------------------------------------------------=|