Monday, July 20, 2015

Avoiding EEPROM and Flash Memory Wearout


Summary: If you're periodically updating a particular EEPROM value every few minutes (or every few seconds) you could be in danger of EEPROM wearout. Avoiding this requires reducing the per-cell write frequency. For some EEPROM technology anything more frequent than about once per hour could be a problem. (Flash memory has similar issues.)

Time Flies When You're Recording Data:

EEPROM is commonly used to store configuration parameters and operating history information in embedded processors. For example, you might have a rolling "flight recorder" function to record the most recent operating data in case there is a system failure or power loss. I've seen specifications for this sort of thing require recording data every few seconds.

The problem is that EEPROM only works for a limited number of write cycles. After perhaps 100,000 to 1,000,000 writes (depending on the particular chip you are using), some of your deployed systems will start exhibiting EEPROM wearout and you'll get a field failure. (Look at your data sheet to find the number. If you are deploying a large number of units, "worst case" is probably more important to you than "typical.") A million writes sounds like a lot, but they go by pretty quickly. Let's work an example, assuming that a voltage reading is being recorded to the same byte in EEPROM every 15 seconds.

One write per 15 seconds is 4 writes per minute, so 1,000,000 writes last:
  1,000,000 writes / ( 4 writes/minute * 60 minutes/hr * 24 hours/day ) = 173.6 days.
In other words, your EEPROM will use up its million-cycle wearout budget in less than 6 months.

Below is a table showing the time to wearout (in years) based on the period used to update any particular EEPROM cell. The crossover value for a 10 year product life is one update every 5 minutes 15 seconds for an EEPROM with a million cycle life. For a 100K life EEPROM you can only update a particular cell every 52 minutes 36 seconds. This means any hope of updates every few seconds just isn't going to work out if you expect your product to last years instead of months. Things scale linearly, although in real products secondary factors such as operating temperature and access mode can play a role.
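
The arithmetic above is easy to script if you want to try other endurance ratings or update periods. Here is a minimal sketch in C. The function names are made up for illustration, and it assumes a 365.25-day year and strictly linear scaling, ignoring temperature and other secondary effects.

```c
/* Seconds in one year, assuming 365.25 days per year. */
#define SECONDS_PER_YEAR (365.25 * 24.0 * 3600.0)

/* Years until a cell that is written once every update_period_s seconds
 * has used up its rated endurance (given in write cycles). */
double years_to_wearout(double endurance_cycles, double update_period_s)
{
    return endurance_cycles * update_period_s / SECONDS_PER_YEAR;
}

/* Shortest per-cell update period (in seconds) that still reaches the
 * target product life without exceeding the rated endurance. */
double min_update_period_s(double endurance_cycles, double life_years)
{
    return life_years * SECONDS_PER_YEAR / endurance_cycles;
}
```

With these, years_to_wearout(1e6, 15.0) comes out to about 0.475 years (173.6 days), and min_update_period_s(1e5, 10.0) comes out to about 3156 seconds (52 minutes 36 seconds).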



Reduce Frequency
The least painful way to resolve this problem is to simply record the data less often. In some cases that might be OK to meet your system requirements.

Or you might be able to record only when things change more than a small amount, with a minimum delay between successive data points. However, with event-based recording be mindful of value jitter or scenarios in which a burst of events can wear out EEPROM.
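
One way to picture the "change threshold plus minimum delay" idea is the sketch below. It is only an illustration: the type and function names are hypothetical, time is assumed to come from some seconds counter, and the actual EEPROM write is left to the caller. The minimum interval is what bounds the worst-case write rate when the value jitters or events arrive in a burst.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int32_t  last_logged_value;   /* value most recently written to EEPROM */
    uint32_t last_logged_time_s;  /* time of that write, in seconds */
    int32_t  deadband;            /* changes this small or smaller are ignored */
    uint32_t min_interval_s;      /* minimum spacing between writes */
    bool     has_logged;          /* false until the first write */
} log_filter_t;

/* Returns true if the caller should write `value` to EEPROM now.
 * When it returns true, the filter state is updated to match. */
bool should_log(log_filter_t *f, int32_t value, uint32_t now_s)
{
    if (f->has_logged) {
        int64_t delta = (int64_t)value - f->last_logged_value;
        if (delta < 0) {
            delta = -delta;
        }
        if (delta <= f->deadband) {
            return false;  /* change too small to be worth a write cycle */
        }
        /* Unsigned subtraction stays correct if the seconds counter wraps. */
        if ((uint32_t)(now_s - f->last_logged_time_s) < f->min_interval_s) {
            return false;  /* too soon after the previous write */
        }
    }
    f->last_logged_value = value;
    f->last_logged_time_s = now_s;
    f->has_logged = true;
    return true;
}
```

With a 300-second minimum interval, for example, the worst case is capped at roughly 1.05 million writes over 10 years no matter how noisy the input is.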

(It would be nice if you could track how often EEPROM has been written. But that requires a counter that's kept in EEPROM ... so that idea just pushes the problem into the counter wearing out.)

Low Power Interrupt
In some processors there is a low power interrupt that can be used to record one last data value in EEPROM as the system shuts down due to loss of power. In general you keep the value of interest in a RAM location, and push it out to EEPROM only when you lose power. Or, perhaps, you record it to EEPROM once in a while and push another copy out to EEPROM as part of shut-down to make sure you record the most up-to-date value available.

It's important to make sure that there is a hold-up capacitor that will keep the system above the EEPROM programming voltage requirement for long enough.  This can work if you only need to record a value or two rather than a large block of data. But it is easy to get this wrong, so be careful!
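
A rough first-pass estimate of hold-up time can be sketched as below. This is a back-of-the-envelope model, not a substitute for measuring your actual hardware: it assumes the load draws a roughly constant current while the capacitor droops (so t = C * dV / I), and the function names and the margin parameter are my own inventions for illustration.

```c
/* Approximate hold-up time in seconds for a capacitor of capacitance_f
 * farads drooping from v_start to v_min volts while supplying a roughly
 * constant load current of load_current_a amperes. */
double holdup_time_s(double capacitance_f, double v_start,
                     double v_min, double load_current_a)
{
    return capacitance_f * (v_start - v_min) / load_current_a;
}

/* Number of back-to-back EEPROM writes that fit into the hold-up time
 * after reserving margin_fraction (0.0 to 1.0) of it as safety margin. */
unsigned writes_that_fit(double holdup_s, double write_time_s,
                         double margin_fraction)
{
    double usable_s = holdup_s * (1.0 - margin_fraction);
    if (usable_s <= 0.0 || write_time_s <= 0.0) {
        return 0u;
    }
    return (unsigned)(usable_s / write_time_s);  /* truncates toward zero */
}
```

As a purely illustrative set of numbers: 470 uF drooping from 5.0 V to 4.0 V at 20 mA gives about 23.5 ms. With a 5 ms write time and half of the hold-up time held back as margin, that leaves room for 2 writes, which matches the "a value or two" rule of thumb.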

Rotating Buffer
The classical solution for EEPROM wearout is to use a rotating buffer (sometimes called a circular FIFO) of the last N recorded values. This reduces EEPROM wearout proportionally to the number of copies of the data in the buffer. For example, if you rotate through 10 different locations that take turns recording a single monitored value, each location gets modified 1/10th as often, so EEPROM wearout is improved by a factor of 10.

You also need to keep a separate counter or timestamp in EEPROM for each of the 10 copies so that after a power cycle you can sort out which entry in the buffer holds the most recent copy. In other words, you need two rotating buffers: one for the value, and one to keep track of the counter. (If you keep only one counter location in EEPROM, that counter wears out since it has to be incremented on every update.)

The disadvantage of this approach is that it requires 10 times as many bytes of EEPROM storage to get 10 times the life, plus 10 copies of the counter value. You can be a bit clever by packing the counter in with the data. And if you are recording a large record in EEPROM then an additional few bytes for the counter copies aren't as big a deal as the replicated data memory. But any way you slice it, this is going to use a lot of EEPROM.

Atmel has an application note that goes through the gory details:
AVR101: High Endurance EEPROM Storage:  http://www.atmel.com/images/doc2526.pdf
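
To make the rotating buffer idea concrete, here is a minimal sketch that packs a sequence counter in with each copy of the data. It is not the scheme from the Atmel application note. It simulates EEPROM with a RAM array, all names are hypothetical, and it assumes a 32-bit sequence number that never wraps in practice, with 0xFFFFFFFF reserved to mean "never written." It does not protect against a power loss in the middle of a multi-byte write; a real design would want a checksum or similar for that.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS  10u
#define SEQ_ERASED 0xFFFFFFFFu  /* sequence field of a never-written slot */

typedef struct {
    uint32_t seq;    /* goes up by one for every record written */
    uint16_t value;  /* the monitored value */
} slot_t;

/* Stand-in for EEPROM. A real port would replace accesses to this
 * array with the part's own read and write routines. */
static slot_t eeprom_sim[NUM_SLOTS];

/* Put every slot into the "never written" state. */
void ring_format(void)
{
    for (unsigned i = 0; i < NUM_SLOTS; i++) {
        eeprom_sim[i].seq = SEQ_ERASED;
        eeprom_sim[i].value = 0xFFFFu;
    }
}

/* Index of the slot with the highest sequence number, or -1 if empty. */
int ring_find_newest(void)
{
    int newest = -1;
    for (unsigned i = 0; i < NUM_SLOTS; i++) {
        if (eeprom_sim[i].seq == SEQ_ERASED) {
            continue;
        }
        if (newest < 0 || eeprom_sim[i].seq > eeprom_sim[newest].seq) {
            newest = (int)i;
        }
    }
    return newest;
}

/* Store a new value in the slot after the newest one, so each slot
 * sees only 1/NUM_SLOTS of the writes. */
void ring_write(uint16_t value)
{
    int newest = ring_find_newest();
    unsigned next = (newest < 0) ? 0u : ((unsigned)newest + 1u) % NUM_SLOTS;
    uint32_t seq = (newest < 0) ? 0u : eeprom_sim[newest].seq + 1u;

    eeprom_sim[next].value = value;  /* data first ... */
    eeprom_sim[next].seq = seq;      /* ... sequence number last */
}

/* Fetch the most recent value. Returns false if nothing was ever stored. */
bool ring_read_latest(uint16_t *value)
{
    int newest = ring_find_newest();
    if (newest < 0) {
        return false;
    }
    *value = eeprom_sim[newest].value;
    return true;
}
```

In a real system you would probably scan for the newest slot once at power-up and keep its index in RAM, rather than rescanning on every write as this sketch does.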

Special Case For Remembering A Counter Value
Sometimes you want to keep a count rather than record arbitrary values. For example, you might want to count the number of times a piece of equipment has cycled, or the number of operating minutes for some device.  The worst part of counters is that the bottom bit of the counter changes on every single count, wearing out the bottom count byte in EEPROM.

But there are special tricks you can play. An application note from Microchip has some clever ideas, such as using a Gray code so that only one byte out of a multi-byte counter has to be updated on each count. They also recommend using error correcting codes to compensate for wear-out. (I don't know how effective ECC will be against wear-out, because it will depend upon whether bit failures are independent within the counter data bytes -- so be careful of using that idea.) See this application note:   http://ww1.microchip.com/downloads/en/AppNotes/01449A.pdf
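
To see why a Gray code touches only one byte per count, here is a small sketch using the standard reflected binary Gray code. I have not checked that this is the exact encoding used in the Microchip note, so treat it as an illustration of the property rather than their method. The function names are made up.

```c
#include <stdint.h>

/* Convert a plain binary count to reflected binary Gray code. */
uint32_t binary_to_gray(uint32_t n)
{
    return n ^ (n >> 1);
}

/* Convert a Gray-coded value back to the plain binary count. */
uint32_t gray_to_binary(uint32_t g)
{
    g ^= g >> 16;
    g ^= g >> 8;
    g ^= g >> 4;
    g ^= g >> 2;
    g ^= g >> 1;
    return g;
}

/* How many of the four bytes differ between two 32-bit words,
 * i.e. how many EEPROM bytes would need to be rewritten. */
unsigned bytes_changed(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    unsigned count = 0;
    for (unsigned i = 0; i < 4u; i++) {
        if ((diff >> (8u * i)) & 0xFFu) {
            count++;
        }
    }
    return count;
}
```

Going from 255 to 256 in plain binary changes two bytes (0x00FF to 0x0100), while the Gray-coded versions (0x0080 and 0x0180) differ in a single bit and therefore a single byte.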

Note: For those who want to know more, Microchip has a tutorial on the details of wearout with some nice diagrams of how EEPROM cells are designed:
ftp://ftp.microchip.com/tools/memory/total50/tutorial.html

Don't Re-Write Unchanging Values
Another way to reduce wearout is to read the current value in a memory location before updating. If the value is the same, skip the update, and eliminate the wearout cycle associated with an update that has no effect on the data value. Make sure you account for the worst case (how often can you expect values to be the same?). But even if the worst case is bad, this technique will give you a little extra margin of safety if you get lucky once in a while and can skip writes.
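
A read-before-write helper can be sketched as follows. The EEPROM here is simulated with a RAM array plus a per-cell write counter so the effect is visible, and all of the names are hypothetical. Some toolchains already ship something similar; avr-libc, for example, provides eeprom_update_byte() for this purpose.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_EEPROM_SIZE 64u

static uint8_t  sim_cells[SIM_EEPROM_SIZE];         /* simulated contents */
static uint32_t sim_write_cycles[SIM_EEPROM_SIZE];  /* wear per cell */

/* Unconditional write: always costs one write cycle. */
static void sim_eeprom_write(size_t addr, uint8_t value)
{
    sim_cells[addr] = value;
    sim_write_cycles[addr]++;
}

/* Write only if the stored value differs. Returns true if a write
 * cycle was actually spent. The caller must keep addr in range. */
bool sim_eeprom_update(size_t addr, uint8_t value)
{
    if (sim_cells[addr] == value) {
        return false;  /* unchanged: skip the write and the wear */
    }
    sim_eeprom_write(addr, value);
    return true;
}
```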

If you've run into any other clever ideas for EEPROM wearout mitigation please let me know.

Learning More
Nash Reilly has a nice series of tutorial postings on how Flash/EEPROM technology works. (I found out about these via Jack Ganssle's newsletter.)
http://cushychicken.github.io/nand-pt1-transistors/
http://cushychicken.github.io/nand-pt2-floating/
http://cushychicken.github.io/nand-pt3-arrays/
http://cushychicken.github.io/nand-pt4-pages-blocks/
http://cushychicken.github.io/nand-pt5-how-nand-breaks/
http://cushychicken.github.io/nand-pt6-dealing-with-flaws/ 
http://cushychicken.github.io/inconvenient-truths/

Oct 2019: Tesla is said to have a flash wearout problem for its SSDs.  https://insideevs.com/news/376037/tesla-mcu-emmc-memory-issue/


12 comments:

  1. If you ever faced the problem that EEPROM was too expensive for your application, then you have probably heard about "EEPROM emulation" / "Flash EEPROM emulation" / "FEE".
    With flash technology you have 1 to 3 problems:
    1. allowed erase cycles of 1k (sometimes 10k)
    2. larger erasable units, so-called pages (1 kByte or more)
    3. larger writeable units (typically 4 Bytes, but even 64 Bytes are possible)

    This means any concept that works with cyclically storing data in flash memory should perfectly work with a real EEPROM.

    I once worked on developing such a FEE-module.
    Our concept was as follows:
    Two or more pages are used. Only one page is active and holds valid data. This is marked in the page header. Data is stored together with a unique ID in the same page. If data needs to be updated, it is stored into free memory of the same page. Searching for the ID starts at the end of the page, so the newest data is always found first. If the page is full, all the newest data is copied to the other page and the old one is erased.

  2. Thanks for sharing your experience. This sounds like a variation of the rotating buffers suitable for large pages. It will help to use hardware that, while erased in bulk, is written a byte at a time, so that the hardware doesn't erase and re-write the whole page on every update.

    I've also seen some tricky ways to encode bits for counters by erasing once and then doing multiple writes to the same byte that carefully change only one bit at a time to reduce wearout. (I used to know the name for that technique but I lost it a long time ago.) There are lots of tricky things you can do if you are sure you understand your hardware. The techniques in this posting are just the ones most likely to work for any EEPROM.

  3. Perhaps now it would be prudent to avoid the wearout altogether by changing the memory technology from EEPROM to FRAM or MRAM. Considering the costs of extra development time and getting it wrong, adding an extra memory chip is not too audacious a solution. There are also MCUs with FRAM inside enabling a simple single-chip unified memory design.

  4. One manufacturer indicated that a "cycle" only occurs when the byte is erased, and that bits may be written from 1 to 0 without using a "cycle". This means that byte use may be extended by:
    - Counting by setting successive bits from 1 to 0. This extends the byte life by 8x.
    - Counting down. Counting up requires a byte erase every cycle. Counting down requires a byte erase only every other cycle. This extends counter life by 2x with very little pain.


    1. Part 1: the successive bit trick is an advanced technique I was referring to. IF your hardware supports this (which you need to be careful to verify), you can replace the low 3 bits of a counter with:
      0xFF = 000 (binary)
      0xFE = 001
      0xFC = 010
      0xF8 = 011
      0xF0 = 100
      0xE0 = 101
      0xC0 = 110
      0x80 = 111
      0x00 (unused)

      Part 2 -- Well OK, but 2x isn't a lot of gain.
      If you really want to play this game I'd suggest looking into a gray code. But I haven't checked the details of that approach.

      In both cases you have to ask questions like if your chip provider changes that aspect of your part will all your devices with the new version of the component start failing? So I try to avoid tricks like this unless there is really no choice.
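
      For anyone who wants to experiment with the successive bit trick from Part 1, the table above can be sketched in code like this. It only makes sense if your part really lets you clear bits from 1 to 0 without an erase, which is an assumption you must verify against the data sheet. The function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode a 3-bit count (0..7) as in the table: 0 -> 0xFF, 1 -> 0xFE,
 * 2 -> 0xFC, ... 7 -> 0x80. Each step clears one more low-order bit. */
uint8_t unary_encode(unsigned count)
{
    return (uint8_t)(0xFFu << count);
}

/* Decode a stored byte back to its 3-bit count. Returns -1 for the
 * unused pattern 0x00 and for any byte that is not a valid code. */
int unary_decode(uint8_t cell)
{
    for (unsigned n = 0; n < 8u; n++) {
        if (cell == (uint8_t)(0xFFu << n)) {
            return (int)n;
        }
    }
    return -1;
}

/* True if going from old_cell to new_cell only changes bits from 1 to 0,
 * i.e. the update needs no erase on hardware that supports this. */
bool only_clears_bits(uint8_t old_cell, uint8_t new_cell)
{
    return (new_cell & (uint8_t)~old_cell) == 0u;
}
```

      The byte only needs an erase when the 3-bit count wraps from 7 back to 0, so the low-order cell sees one erase per 8 counts instead of one per count.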

  5. Also, for recent EEPROMs the data sheet specifies PAGE endurance, not cell endurance, and a page can be up to 256 bytes in size. (I guess their internals are more like flash devices than true EEPROM: you can write and read any particular cell freely, but can erase only a bunch of them at a time.) So if you use a device like the 25LC1024 with 256-byte pages, even if you implement a 10x FIFO you still get the same wearout as without it as long as your data remains in the same page. You have to scatter it across pages (with HUGE overhead). I learned this the hard way...
    http://www.microchip.com/ParamChartSearch/chart.aspx?branchID=70038&mid=&lang=en

  6. Another, maybe obvious idea. Say you need to store and often update two 1-byte variables, A and B. You use the first two 2-byte words of EEPROM to store the offsets where variables A and B are located (initially they will be 0004 and 0005, since locations 0...3 are occupied by the offset words). To update a variable (say B) you take its current offset from word 2 (bytes 2 & 3) of EEPROM (initially it will be "0005") and write there. Then you check whether your data was written correctly. If not (EEPROM is worn out at offset 0005) you try the next unused offset (say 0006). If it works, you write 0006 into bytes 2 & 3, and do your next 100000 writes of variable B at this new offset. Then you repeat these steps until you wear out the whole EEPROM.

  7. Switch to FRAM and avoid the problem entirely. We calculated that we could write to a location in FRAM every millisecond for 31 years before wearing out the location.

  8. Thank you for the useful information. The Atmel and Microchip application notes are especially welcome.

    I'd like to add that a side effect of avoiding same-value writes is saving time. For example, in a Microchip PIC a single EEPROM byte write takes 5 ms. Saving this time is not only beneficial for normal-operation CPU load, but also gives a safety margin in the case of an emergency data save during a brown-out event. Win-win.

  9. Here is another idea that combines some of the others. Let's say you have a 32-bit counter. You make a union in RAM with 4 bytes, a 32-bit uint as the counter, and a 4-byte array, something like: union{ struct{ uint8_t b1;uint8_t b2;uint8_t b3;uint8_t b4;}b;uint32_t counter; uint8_t ca[4];};. You will also need a single EEPROM byte for recording the offset. Now you increment the counter and compare the 4 bytes starting from b1 to see which bytes changed, and write those bytes to EEPROM using ca[0-n]. Next you read the counter back and compare it to the one you just incremented and wrote to the EEPROM. If they are not equal, you increment the offset, write it to EEPROM, and try to write the union to EEPROM again (using the new offset) until you can read back what you wrote. This way, by allocating 8 bytes you can get almost 4 times the endurance of the EEPROM.

  10. Hi, This is Ram
    I am using Flowcode software for microcontroller programming, and the IC is a PIC18F46K40. We have a problem with the EEPROM: if we store a value at a specific address, after 3 or 4 months it changes by itself,
    and if we set it again it works for 4 or 5 months and then the same problem happens again. There is no power fluctuation, and I cannot understand why this happens, so please advise on how to solve it.

  11. See this other page for the usual suspects:
    https://betterembsw.blogspot.com/2011/11/avoiding-eeprom-corruption-problems.html


