August 23, 1999
Designing a growing back end
Veterans of high-volume Web sites weigh in on how to plan to
expand
By Lynda Radosevich
These days, if your business strategy does not
include running a dot-com with hypergrowth traffic rates, you are
considered passé. But increased traffic is destined to make Web sites
operate more slowly and crash -- unless IT professionals build the
behind-the-scenes servers, software, and networks to handle heavy-duty
use before the crowds come.
In a recent panel discussion, four technology officers at companies
with dynamic, high-volume Web sites discussed what they have learned
about scaling back-end systems. Back-end systems comprise the core
server components -- including applications, databases, and supporting
technologies, such as load-balancing software -- rather than static HTML
Web servers.
The panel included Marc Hansen, chief technology officer (CTO) at
iEngineer.com, a San Francisco-based engineering portal, and former vice
president of systems architecture at J. Crew; Rick Lamb, chief operating
officer at Bottle Rocket, a New York-based online game site; Jon Tulk,
CTO at HealthAxis.com, an online health insurance company in Norristown,
Penn., and former CTO at music site N2K; and Dan Woods, CTO at
TheStreet.com, a New York-based financial news site. InfoWorld
Deputy Features Editor Lynda Radosevich moderated the forum.
The panelists described the lessons and best practices they picked up
while preparing their companies' Web sites for high-volume traffic.
Expect Web site traffic demands and bottlenecks to be moving
targets
Jon Tulk: "The traffic goes up, and you see that you're
starting to get a little slow in one area. You figure out where [the
bottleneck is] coming from, whether it's a network issue, whether it's
CPU capacity, [or whether it's related to the] memory disc, database, or
whatever. You go in and resolve that problem; then the next day, the
system's getting slow again. You're almost never chasing the same
problem twice."
Rick Lamb: Referring to a former job at an online game
company: "We had no idea what the spikes and surges would be against our
system."
Focus on
the back-end database and overall architecture
Marc Hansen: "Performance isn't any one thing; it's a
good fundamental architecture, and then it's hundreds of things. We
fine-tuned our cache on our database so well -- using a main cache and
gigabytes of memory -- [that] our disk activity was extraordinarily low,
except for the disk log. Eventually our bottleneck became the rotation
to the hard disk. As time goes on, the bottlenecks will change ... but
if you keep all the historical data, you can go back to it and predict
what's going to happen."
Dan Woods: "Segment your architecture so you have lots
of smaller machines ... handling one application."
Marc Hansen: "Think of [structuring your architecture]
like constructing a building. We could order a big pile of lumber and
start nailing it together, or we can create a clear set of plans before
we start. When it comes to creating the plans, there are home builders
and there are architects who work on large-scale buildings. There is a
continuum of skills available. Matching the skills to the size of the
project is important. Choosing the correct initial architecture is also
important. If you start with a wood-frame house, it will be difficult to
grow into a 40-story office building."
Select a high-quality application server, and buy scalable
database servers
Jon Tulk: "[The application server] should provide
application-server load balancing, session-state management,
database-connection pooling, thread pooling, data or content caching,
and, ideally, some monitoring, measurement, and management facilities.
We have selected Bluestone Sapphire/Web, which is an excellent product."
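As a rough illustration of one item on that list -- database-connection
pooling -- the minimal Python sketch below keeps a small set of open
connections and hands them out to request handlers. It is not Bluestone
Sapphire/Web code; sqlite3 stands in for a real database server, and the
class and file names are invented.

    # Minimal, generic database-connection pool (illustrative only;
    # sqlite3 stands in for a production database server).
    import queue
    import sqlite3
    from contextlib import contextmanager

    class ConnectionPool:
        """Keep a fixed set of open connections and reuse them across requests."""
        def __init__(self, db_path, size=5):
            self._pool = queue.Queue(maxsize=size)
            for _ in range(size):
                # check_same_thread=False lets pooled connections be shared by threads
                self._pool.put(sqlite3.connect(db_path, check_same_thread=False))

        @contextmanager
        def connection(self):
            conn = self._pool.get()      # block until a connection is free
            try:
                yield conn
            finally:
                self._pool.put(conn)     # return it to the pool for reuse

    # Usage: many request-handling threads share a handful of connections
    # instead of opening a new one per page view.
    pool = ConnectionPool("site.db", size=5)
    with pool.connection() as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS hits (page TEXT)")
        conn.execute("INSERT INTO hits VALUES ('home')")
        conn.commit()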
Marc Hansen: "We're talking about real databases here,
not PCs with [Microsoft] SQL Server on them. Put in place all the
properties you would put in place for a mainframe. ... In fact, I would
recommend you hire yourself an IBM data center manager."
Rick Lamb: "[This] is absolutely true for
application-server and database-server layers: At the Web-server layer,
PCs running Linux can be a perfectly acceptable solution and slightly
more cost-effective."
Marc Hansen: "I agree with Rick that Linux works fine
for HTTP servers, but I would differ in that I believe that the salary
cost of maintaining skills on another OS [such as Linux] and hardware
cost more than the slight savings in hardware [that Linux can
provide]."
Build a good data model up front
Jon Tulk: "The data model we had was not designed that
well up front, basically because those of us who first built the site
didn't have database backgrounds. We didn't realize ... how hard it is
to change a data model once you're on your way. And we carried that
history with us for the next five years. It affected the performance of
the site. It affected the ease of integrating new features and adding
data fields and columns to the data tables. It affected the ability to
replicate the database, and that affected the ability to do
distribution."
Rick Lamb: "Design your database into subject areas and
entities -- don't rush the design. Review all application requirements
and loads prior to committing to a database implementation. Test the
system on paper, in diagrams. Find and productively employ experienced
data modelers."
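As a paper-stage sketch of what designing "into subject areas and
entities" can look like, the hypothetical model below groups invented
entities by business area before any tables, indexes, or replication are
committed to.

    # Hypothetical entity model reviewed "on paper"; all names are invented.
    from dataclasses import dataclass

    # Subject area: customers
    @dataclass
    class Customer:
        customer_id: int
        name: str
        email: str

    # Subject area: orders
    @dataclass
    class Order:
        order_id: int
        customer_id: int   # relationship back to the customers subject area
        placed_on: str

    @dataclass
    class OrderLine:
        order_id: int      # each line belongs to exactly one order
        sku: str
        quantity: int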
Get good database administrators, and pay them well
Jon Tulk: "With help from a couple of consulting DBAs
[database administrators], we added a lot of indexes [to the database],
and we saw a huge difference in performance. The load on our server
dropped by almost half even as our traffic nearly tripled --
this was going into last Christmas season -- and it was all due to
optimizations. It's a lot easier to fix this up front by finding
good data modelers and starting with a good design."
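The indexing work Tulk describes can be illustrated with a small,
hypothetical example; the table, column, and index names below are
invented, and sqlite3 stands in for the production database.

    # Before-and-after sketch of adding an index to speed a common lookup.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(i, i % 1000, 9.99) for i in range(100000)])

    # Without an index, finding one customer's orders scans the whole table.
    print(conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchall())

    # With an index, the same query becomes a direct lookup -- the kind of
    # change that can cut server load sharply as traffic grows.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    print(conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchall())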
Marc Hansen: "We were shocked to discover the market
price [for a DBA], which was about double what we budgeted -- about six
figures." However, "It turned out to be the best investment we ever made
in terms of solving scalability problems. A good DBA pays [a company
back in hardware savings]."
Dan Woods: "If you can find really smart people, I
would pay them above market rates. You should never have them thinking
about their compensation. They get 15 to 20 headhunter calls a month,
and [compensation] should be a nonissue.
"I've got a playbook on [hiring good DBAs]. First, the person must be
personally secure. They must not brag about how much they know and what
they did. They must be interested in your business and what you want to
do. Second, they must understand the role of technology -- to help
business, not to be an end in itself. Third is, are they smart? And the
last is whether they have experience. I find if you run that playbook
over and over, you get an environment where you have really smart people
who attract other really smart people and nobody is a bully or
fearmonger."
Cache data and pieces of static content that will be used to
construct a page
Dan Woods: "People get drunk on dynamic content with
unproven business value, such as push and personalization. ... Keep
everything, if possible, on static pages; avoid dynamic content as much
as possible. The way you handle that for a large-scale Web site is, you
have a high-speed templating agent that is working against application
components that are cached."
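A generic sketch of that pattern follows: a templating step assembles
each page from cached component output rather than recomputing everything
per request. The function names and cache policy are illustrative, not
drawn from any product mentioned here.

    # Fragment caching: expensive components are rendered once and reused.
    import time

    _cache = {}   # component name -> (rendered HTML, timestamp)
    TTL = 60      # seconds a cached fragment stays fresh

    def render_component(name):
        """Stand-in for an expensive call into an application component."""
        time.sleep(0.1)   # pretend this hits a database or application server
        return f"<div class='{name}'>...{name} content...</div>"

    def cached_component(name):
        html, stamp = _cache.get(name, (None, 0))
        if html is None or time.time() - stamp > TTL:
            html = render_component(name)        # recompute only when stale
            _cache[name] = (html, time.time())
        return html

    def render_page():
        # The "template" here is simple concatenation; the point is that the
        # expensive pieces come from the cache on almost every request.
        return "\n".join([cached_component("header"),
                          cached_component("story_list"),
                          cached_component("footer")])

    print(render_page())   # first call is slow; later calls hit the cache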
Marc Hansen: "Dan's comments about static content are
true, but there are some kinds of Web businesses that don't lend
themselves to static content, like e-commerce. For example, at J. Crew,
when a customer wanted to buy something, we actually verified inventory
in real time in our inventory management system, which ran on CICS on [an
IBM] System/390."
Dan Woods: "I would much rather do something like take
the screen of updates from your real-time inventory and update all the
static pages in the background, then run the site using Thunderstone
[dynamic publishing software], so you're delivering flat pages all the
time. Then you've got the maximum scalable system."
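A minimal sketch of that background-publishing approach follows, under
the assumption of a simple inventory snapshot and flat-file layout (both
invented here; Thunderstone's actual tools are not shown).

    # Regenerate one flat HTML page per product from an inventory snapshot,
    # so Web servers only ever serve static files.
    from pathlib import Path

    def publish_static_pages(inventory, out_dir="static"):
        Path(out_dir).mkdir(exist_ok=True)
        for sku, in_stock in inventory.items():
            status = "In stock" if in_stock else "Sold out"
            page = f"<html><body><h1>{sku}</h1><p>{status}</p></body></html>"
            Path(out_dir, f"{sku}.html").write_text(page)

    # A scheduled job would call this with the latest inventory snapshot;
    # the live site never queries inventory per request.
    publish_static_pages({"rollneck-sweater": True, "barn-jacket": False})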
Do capacity planning for traffic spikes and ongoing
growth
Jon Tulk: "Traffic planning is an absolute must, and
it's hard to do when you start, because you don't have enough data to
predict off of. After building some data, you can use a spreadsheet to
create a simple traffic prediction model based on the historical data.
Get more sophisticated later as the need demands and time permits. Also,
it is good to choose some performance metrics that you will measure over
time. One useful metric is the number of Web pages [served] per minute,
per CPU. This can be used to predict hardware requirements against the
traffic model or to monitor system performance changes over time."
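A spreadsheet-style version of that calculation might look like the
sketch below; the traffic history, growth projection, and per-CPU
throughput are made-up inputs, not figures from the panel.

    # Naive traffic projection plus the pages-per-minute-per-CPU metric.
    import math

    monthly_page_views = [9.0e6, 10.5e6, 12.5e6, 14.8e6]      # historical data
    growth = monthly_page_views[-1] / monthly_page_views[-2]   # month-over-month ratio
    next_month = monthly_page_views[-1] * growth               # simple projection

    # Assume the peak minute carries 3x the average minute's traffic.
    peak_pages_per_minute = next_month / (30 * 24 * 60) * 3

    pages_per_minute_per_cpu = 400    # measured from the running system
    cpus_needed = math.ceil(peak_pages_per_minute / pages_per_minute_per_cpu)
    print(f"Projected peak: {peak_pages_per_minute:.0f} pages/min -> {cpus_needed} CPUs")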
Dan Woods: "We figure that [building capacity for] six
to eight times the average load is what we need to not have to turn
anybody away. ... Our big days are around a million page views. We think
that on the biggest day, we'll probably get 2.5 million to 3 million
[page views]."
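Worked as rough arithmetic, using the six-to-eight-times multiplier from
Woods's rule and his 3 million page-view upper estimate for the biggest
day; the average-day figure below is an assumption, since his quote does
not state the average directly.

    # Headroom rule of thumb: provision for 6x-8x the average load.
    average_day_page_views = 500_000    # assumed average; not stated in the article
    expected_peak = 3_000_000           # Woods's upper estimate for the biggest day

    for factor in (6, 8):
        capacity = factor * average_day_page_views
        print(f"{factor}x average -> {capacity:,} page views of capacity, "
              f"{capacity / expected_peak:.1f}x the expected biggest day")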
Marc Hansen: "Measure everything that's going on, and
keep the data. Because your business people will decide to run a
promotion, and they'll say, 'Remember that promotion we did last week?
We're going to run one like it but with double the traffic.' Now you can
go back and look at all your instrumented data and see what's going to
happen."
Use outsourcing to host sites more reliably and less expensively
than you can in-house, but develop and prototype new applications
in-house, where you can make changes more rapidly
Rick Lamb: "You simply can't take upon yourself the
complete responsibility of putting together a bulletproof design,
reacting to what are usually pretty aggressive schedule demands. We use
consultants on the design side. We use consultants on the program
management side. ... There is tremendous value to capturing and
preserving the intelligence from consultants. ... It's often difficult
to enable a high degree of cooperation between outsource consultants and
in-house staff, but you've got to overcome that with human management."
Dan Woods: "It is very difficult to duplicate
internally what you can attain from the first-tier colocation providers.
I mean, these organizations have separate power feeds from different
grids, multiple UPSes [uninterruptible power supplies], multiple
generators, and full-time staff to maintain the power and environment
infrastructure. It would never be cost-effective for us to do that on an
individual basis."
Jon Tulk: "At N2K, we sourced our own machines ...
because we already had a computer facility. And we preferred having the
system right there in the next room. Likewise, at HealthAxis, we're
bringing the site home, because for the next few months, while we're
really building the thing and changing hardware, architecture, and
software, it's going to be advantageous to have really tight access to
[the system]."
If your company is going public, establish the focus of the
business and the stability of the Web site before the initial public
offering
Dan Woods: "Once you become public, you have a whole
new set of people talking to your senior management. The Wall Street
analysts are going to ask, 'What are you guys doing about disaster
recovery?' ... A lot of hard questions are being asked about
environments."
Deputy Features Editor Lynda Radosevich thinks proper planning can
keep growing back ends from adding too much fat to the IT
department.