How many lines of code is Candy Japan?
Here in Japan there are many unique sweets which are not sold in other countries, so I started a site called Candy Japan to send those abroad on a twice-monthly basis.
When I started there were no good platforms to start such a service, as "subscription boxes" were not a thing yet. So I had to code everything myself. This post is about how much work that turned out to be.
If you would rather listen to me speak, this is also available as a 14 minute YouTube version. If you'd like to additionally see me uncomfortable on a stage, there is also a video when I gave this as a presentation at a Hacker News meetup.
Back in 2011 there were no readymade subscription box platforms to use. Even the term "subscription box" wasn't in popular use yet. To write the site I used Python on top of Google App Engine. The site has integrations with PayPal and Recurly (credit card middleware). No other major dependencies, just MixPanel and Google Analytics.
The landing page has 104 lines of code backing it. How many lines of Python code do you think the entire codebase of Candy Japan has? Write your guess down somewhere so you can see in the end how close you got. Don't include the HTML & CSS templates in your guess, just the Python code.
Besides the landing page, there are other pages which the users can see, such as the FAQ, list of candies that were sent before and the thumbnail generation for them. The images are in the App Engine BlobStore. These add a total of 337 lines of code.
Under the surface
If the landing page and other customer-facing pages are the tip of the iceberg, what lies underneath?
There is some code to talk to PayPal. When I originally did this integration, the API I could use was one called the PayPal Payments Standard NVP API. The acronym NVP stands for Name-Value Pair, which is a way to transmit responses back from PayPal in a way which is a bit more painful than JSON. This integration added 712 lines of code.
Recurly is middleware which sits between you and credit card payment gateways. It is designed to be easy to integrate with, so that only ended up adding 222 lines of code.
I thought it would be nice if people could buy each other gift cards. These would be prepaid cards you could buy and send to friends as a special link. The person purchasing the gift would not even need to know the shipping address of the recipient.
This turned out to be dangerous and is currently not enabled. I'll tell you why next. In any case, I already wrote 420 lines of code for it.
There are bad guys on the internet. Sometimes there are credit card leaks, some retailer gets their database hacked and as a result a bunch of valid numbers are released into the wild. Bad guys will then use the stolen card numbers to place purchases in online stores.
Gift cards are doubly attractive fraud targets, because not only do the criminals get to check if their card numbers work, but also get the card which possibly has some resale value. The fraudster could potentially sell the cards on eBay, leaving me with a chargeback and a sad situation where the eBay customer thought they were buying a legit card.
For the time being gift cards are disabled, but before the year-end holidays I would like to re-enable them, probably with PayPal only, as it is much more fraud-resistant.
To try to tell the bad guys from the good guys, I have 587 lines of fraud detection code.
After experiencing the problem with fraud, I had to stop using the Recurly shopping cart widget, because it wasn't fraud-resistant enough.
Instead I had to implement my own shopping cart flow to gather all possible signals which could help detect fraud more accurately. This added 510 lines of code.
After I get orders, I have to actually ship them too. With only a few subscribers, it would be easy to just deal with them manually. After you have hundreds of subscribers with thousands of past accounts also in your database, you want some kind of system to help you out.
I wrote some code to go through all the accounts and try to figure out who I should actually be sending candy to. This is not always so straightforward, as there are some edge cases too. For instance accounts can be paused, there may be manual adjustments to them. In some cases I might want to send stuff to people even if their payment hasn't gone through yet, if I have good reason to believe that it will come in soon.
When you make a thousand shipping labels a month, you end up spending a lot of money on the physical labels themselves. I wrote some more code to try to make a compact PDF file that can be printed in one go on a smalle number of sticker sheets. Compared to printing one at a time, this is both more economical and faster.
Shipping-related code adds 1165 lines.
Sometimes the post office returns packages to us. The address may have been wrong, or maybe the customer moved or wasn't home to accept the package.
It's extra work to manually find out which account returned boxes are for, emailing the customer and adjusting their accounts. To cut down on this labor, I print barcodes on the boxes so we can just scan them when they get returned. Now it takes about 10 seconds to process a returned package, compared to 10 - 15 minutes digging through the database and writing emails manually.
This does add 505 lines to the codebase.
There are all kinds of administrative tools as well. While you could look at the database directly, it's much more convenient to have a tool for viewing account information. You also want to be able to search for accounts based on name, address or subscription ID number.
Additionally reports need to be produced, for example for tax reasons. These admin tools add 634 lines of code.
If you put a website online, people will not just magically find you. In the beginning it's a bit easier to get visitors, as you can post a new site to places like Product Hunt. Some bloggers might also cover your site because it's new.
After you've been around longer, your site isn't news anymore and people won't hear about it unless you somehow keep promoting it. This costs money and effort. You end up writing code to try to measure and improve your marketing.
First of all, to know who my target audience is, I added a questionnaire step to the subscription flow.
You also need to know how much you can spend on ads, so you end up calculating things like retention rates. There are some ways to get more of visitors to convert, such as A/B testing and sending reminders to people who abandoned their carts.
Another way to market is to send free sample boxes to bloggers, but adding those orders manually to the database gets tiresome. I get around a dozen such requests each week. To make dealing with those take less time, there is an alternative sign-up flow only for bloggers. For this flow, there is no payment step, but instead I will look at each application and deny or accept it.
All this marketing stuff adds 1219 lines of code.
When someone joins, you want to send them an automatic welcome email. When a reviewer is accepted, you want to send an email telling them that. Also a reminder so they wouldn't forget to write the review when the box arrives.
I do sometimes just contact people manually for fun, but there are also automatic emails that go out.
These add 253 lines of code.
Total 8341 lines
How did you do? I'll rate your guess as "excellent" if you guessed between 5000 - 15000 lines. "Pretty good" if 3000 - 30000. "Meh" if 1000 - 50000. I'm flattered if you thought I could pull it off with less than 1000 lines. More than 50000? This isn't assembler.
I spent a few months actively developing the codebase and about 5 years gradually adding stuff to it.
It would have made sense to split my site into a CMS (maybe WordPress) and have the subscription part be separate. Everything visible on the site (FAQ, candy images, landing page, blog) could have been managed through the CMS. This would have reduced some code that tries to be a CMS, but does a poor job at it while adding unnecessary complexity.
The app engine NoSQL data store is not good for running reports. You end up writing Python code which would be better expressed in SQL. Otherwise I am happy with choosing App Engine.
I don't know how I could have expected it, but somehow from the start I should have prepared for fraud. You want to at least keep an eye on any suspicious activity and react quickly if you start getting many chargebacks.
Thanks to Artem Olegovich for the illustration.