The MongoDB Outage That Woke Me Up at 3 a.m. (and What I Learned from It)

A Day I'd Rather Forget: The Server Went Down at 3 a.m.

I still remember that night. I was sound asleep when my phone rang right next to my ear: “The server is down! MongoDB has stopped responding!” 😱

I jumped out of bed, opened my laptop, checked the MongoDB logs, and found errors like these:

MongoError: connection pool destroyed
MongoError: server selection timeout after 30000ms
MongoError: cursor id not found

This was a production site with a lot of users on it. Stress level: 11/10 😰

The cause: I had never set connection pool limits, and there were no indexes

My First Story with MongoDB

I started using MongoDB because I thought “NoSQL is easy, no schema design needed”

My early assumptions:

Just throw the data in, no tables to create
No need to think about relations
The JSON structure is easy to work with

But the reality was:

You still have to design a schema (it's just a flexible one)
Performance still depends on careful indexing
Aggregation pipelines can get as complex as SQL

My First Mistakes

1. Running Big Queries Without an Index

My first code looked like this:

// ❌ Bad - querying without an index
const users = await User.find({ 
  createdAt: { $gte: new Date('2023-01-01') },
  status: 'active'
});

As the collection grew, response time went from 50 ms to 5 seconds 🐌

I fixed it by adding a compound index:

// Create the index (equality field first, then the range field)
db.users.createIndex({ status: 1, createdAt: 1 });

// Or in Mongoose
userSchema.index({ status: 1, createdAt: 1 });
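
The field order in a compound index matters. A common guideline is ESR: equality fields first, then sort fields, then range fields. Here is a minimal sketch of that ordering; buildEsrIndex is a hypothetical helper I made up for illustration, not a MongoDB or Mongoose API.

```javascript
// Sketch: order compound-index fields by the ESR guideline
// (Equality fields, then Sort fields, then Range fields).
// buildEsrIndex is a hypothetical helper, not a MongoDB API.
function buildEsrIndex({ equality = [], sort = {}, range = [] }) {
  const index = {};
  // Equality fields go first, ascending by default
  for (const field of equality) index[field] = 1;
  // Sort fields keep the direction the query sorts by
  for (const [field, direction] of Object.entries(sort)) index[field] = direction;
  // Range fields go last, unless they were already added above
  for (const field of range) {
    if (!(field in index)) index[field] = 1;
  }
  return index;
}

// The query above: equality on status, range on createdAt
const spec = buildEsrIndex({ equality: ['status'], range: ['createdAt'] });
console.log(JSON.stringify(spec)); // {"status":1,"createdAt":1}
```

The resulting object could be passed to createIndex() or schema.index(). It is a guideline rather than a law, so checking the real query plan with explain() is still worth it.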

2. The N+1 Query Problem

I had run into this often in relational databases, but it never occurred to me that MongoDB has it too:

// ❌ N+1 queries - very slow
const posts = await Post.find();
for (let post of posts) {
  post.author = await User.findById(post.authorId); // 100 queries if there are 100 posts
}

I fixed it with an aggregation pipeline:

// ✅ Better - a single query
const postsWithAuthor = await Post.aggregate([
  {
    $lookup: {
      from: 'users',
      localField: 'authorId',
      foreignField: '_id',
      as: 'author'
    }
  },
  {
    $unwind: '$author'
  }
]);
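
If $lookup feels heavy, another option is to batch: collect the author ids, fetch all authors with one $in query, and join in memory. This is roughly what Mongoose's populate() does for you. The sketch below uses plain arrays so it runs without a database; uniqueAuthorIds and attachAuthors are hypothetical helper names, and the ids are plain strings for simplicity.

```javascript
// Sketch: replace N lookups with one batched lookup plus an in-memory join.
// uniqueAuthorIds and attachAuthors are hypothetical helpers.
function uniqueAuthorIds(posts) {
  return [...new Set(posts.map((post) => String(post.authorId)))];
}

function attachAuthors(posts, users) {
  // Index users by id once, so each post is matched in O(1)
  const usersById = new Map(users.map((user) => [String(user._id), user]));
  return posts.map((post) => ({
    ...post,
    author: usersById.get(String(post.authorId)) ?? null,
  }));
}

// With Mongoose this would be roughly:
//   const posts = await Post.find().lean();
//   const users = await User.find({ _id: { $in: uniqueAuthorIds(posts) } }).lean();
const posts = [
  { title: 'A', authorId: 'u1' },
  { title: 'B', authorId: 'u2' },
  { title: 'C', authorId: 'u1' },
];
const users = [
  { _id: 'u1', name: 'Ann' },
  { _id: 'u2', name: 'Bob' },
];

console.log(uniqueAuthorIds(posts)); // [ 'u1', 'u2' ]
console.log(attachAuthors(posts, users).map((p) => p.author.name)); // [ 'Ann', 'Bob', 'Ann' ]
```

Either way the point is the same: two queries in total instead of one query per post.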

3. Not Configuring the Connection Pool

I relied on the default connection pool, and once traffic grew we ran out of connections:

// ❌ Default connection pool (often too small: Mongoose 5 and earlier default to 5 connections, Mongoose 6+ to 100)
mongoose.connect('mongodb://localhost:27017/mydb');

// ✅ Set an explicit pool size that fits the workload
mongoose.connect('mongodb://localhost:27017/mydb', {
  maxPoolSize: 50,        // Maximum number of connections
  minPoolSize: 5,         // Minimum number of connections
  maxIdleTimeMS: 30000,   // Close connections after 30s of inactivity
  serverSelectionTimeoutMS: 5000, // Keep trying to send operations for 5s
  socketTimeoutMS: 45000, // Close sockets after 45s of inactivity
});
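
Pool sizes usually differ between environments, so it can help to read them from configuration and validate them before connecting. A minimal sketch, assuming two environment variables named MONGO_MAX_POOL and MONGO_MIN_POOL (both names are my own invention) and the same timeout values as above:

```javascript
// Sketch: build pool options from environment variables and validate them.
// MONGO_MAX_POOL and MONGO_MIN_POOL are hypothetical variable names.
function poolOptionsFromEnv(env = process.env) {
  const maxPoolSize = Number.parseInt(env.MONGO_MAX_POOL ?? '50', 10);
  const minPoolSize = Number.parseInt(env.MONGO_MIN_POOL ?? '5', 10);

  if (!Number.isInteger(maxPoolSize) || maxPoolSize < 1) {
    throw new RangeError('MONGO_MAX_POOL must be a positive integer');
  }
  if (!Number.isInteger(minPoolSize) || minPoolSize < 0 || minPoolSize > maxPoolSize) {
    throw new RangeError('MONGO_MIN_POOL must be between 0 and MONGO_MAX_POOL');
  }

  return {
    maxPoolSize,
    minPoolSize,
    maxIdleTimeMS: 30000,
    serverSelectionTimeoutMS: 5000,
    socketTimeoutMS: 45000,
  };
}

console.log(poolOptionsFromEnv({ MONGO_MAX_POOL: '20' }));
// Usage would be roughly: mongoose.connect(uri, poolOptionsFromEnv());
```

Failing fast on a bad value at startup is a lot nicer than discovering it at 3 a.m.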

A Real Case: The Aggregation Pipeline from Hell

The requirement: “We want a monthly sales report, broken down by category and region”

My first aggregation pipeline came out like this:

// Version 1: very slow and memory-hungry
const salesReport = await Order.aggregate([
  {
    $unwind: '$items' // ❌ emits one document per item, multiplying the document count
  },
  {
    $lookup: {
      from: 'products',
      localField: 'items.productId',
      foreignField: '_id',
      as: 'product'
    }
  },
  {
    $unwind: '$product'
  },
  {
    $lookup: {
      from: 'categories',
      localField: 'product.categoryId', 
      foreignField: '_id',
      as: 'category'
    }
  },
  {
    $unwind: '$category'
  },
  {
    $group: {
      _id: {
        month: { $month: '$createdAt' },
        category: '$category.name',
        region: '$region'
      },
      totalSales: { $sum: '$items.price' }
    }
  }
]);

Problems:

It took 30+ seconds
It used almost all of the RAM
It blocked other operations
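
To see why an early $unwind hurts, here is a small in-memory imitation of what the stage does to the document count. The unwind function is a hypothetical helper written for illustration only, and the sample orders are made up.

```javascript
// Sketch: an in-memory imitation of $unwind for a top-level array field.
// unwind() is a hypothetical helper, not part of MongoDB.
function unwind(docs, field) {
  // Each array element becomes its own copy of the parent document;
  // documents with a missing or empty array produce nothing.
  return docs.flatMap((doc) =>
    (doc[field] ?? []).map((element) => ({ ...doc, [field]: element }))
  );
}

const orders = [
  { _id: 1, region: 'north', items: [{ price: 10 }, { price: 20 }, { price: 30 }] },
  { _id: 2, region: 'south', items: [{ price: 5 }, { price: 15 }] },
];

const unwound = unwind(orders, 'items');
console.log(orders.length, '->', unwound.length); // 2 -> 5
```

Every stage after the $unwind has to process the larger set, so each $lookup in Version 1 ran once per item across the whole collection, not once per order.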

I fixed it by adding indexes and restructuring the pipeline:

// Version 2: faster and uses less memory

// 1. Add the necessary indexes
db.orders.createIndex({ createdAt: 1, region: 1 });
db.products.createIndex({ categoryId: 1 });

// 2. Put $match first to filter the data early
// 3. Use $lookup with a pipeline to join only the data that is needed
const salesReport = await Order.aggregate([
  {
    $match: {
      createdAt: { 
        $gte: new Date('2023-01-01'),
        $lt: new Date('2024-01-01')
      }
    }
  },
  {
    $unwind: '$items'
  },
  {
    $lookup: {
      from: 'products',
      let: { productId: '$items.productId' },
      pipeline: [
        { $match: { $expr: { $eq: ['$_id', '$$productId'] } } },
        {
          $lookup: {
            from: 'categories',
            localField: 'categoryId',
            foreignField: '_id',
            as: 'category'
          }
        },
        { $unwind: '$category' },
        { $project: { categoryName: '$category.name' } }
      ],
      as: 'productInfo'
    }
  },
  {
    $unwind: '$productInfo'
  },
  {
    $group: {
      _id: {
        month: { $month: '$createdAt' },
        year: { $year: '$createdAt' },
        category: '$productInfo.categoryName',
        region: '$region'
      },
      totalSales: { $sum: '$items.price' },
      itemCount: { $sum: 1 } // counts line items, since $unwind emits one document per item
    }
  },
  {
    $sort: { '_id.year': 1, '_id.month': 1, '_id.category': 1 }
  }
]);

Run time dropped from 30 seconds to 2 seconds! 🚀

What I Learned About Schema Design

1. Embed vs. Reference: The Art of Deciding

At first I thought: just put everything in a single document, because “NoSQL”

// ❌ Over-embedding
const userSchema = {
  name: String,
  email: String,
  orders: [
    {
      items: [
        {
          product: {
            name: String,
            price: Number,
            category: {
              name: String,
              description: String
            },
            reviews: [
              {
                user: { name: String, email: String },
                rating: Number,
                comment: String
              }
            ]
          },
          quantity: Number
        }
      ],
      total: Number
    }
  ]
}

Problems I ran into:

Documents got huge (and there is a 16 MB limit per document)
Updates were hard (several levels of nesting to update)
Query performance was poor

Principles I learned:

One-to-Few: use embedding

const userSchema = {
  name: String,
  email: String,
  addresses: [ // no more than 10-20 addresses
    {
      type: { type: String }, // home, work, etc. (nested because `type` is a reserved key in Mongoose schemas)
      street: String,
      city: String
    }
  ]
}

One-to-Many: use references

const userSchema = {
  name: String,
  email: String
};

const orderSchema = {
  userId: ObjectId, // reference
  items: Array,
  total: Number
};

One-to-Squillions: use a parent reference

const blogPostSchema = {
  title: String,
  content: String
};

const commentSchema = {
  postId: ObjectId, // reference back to the parent post
  author: String,
  message: String
};

2. Designing the Schema for Performance

Key rules I learned:

Query patterns determine the schema

// If you often run a query like this
User.find({ 'profile.age': { $gte: 18 }, 'profile.location': 'Bangkok' })

// you need an index like this (equality field first, range field last)
db.users.createIndex({ 'profile.location': 1, 'profile.age': 1 })

Denormalize to reduce $lookup

// Instead of storing only the productId
const orderSchema = {
  items: [
    {
      productId: ObjectId,
      // denormalize the data that is read most often
      productName: String,
      price: Number,
      category: String
    }
  ]
};
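
In practice the copy happens at the moment the order is created. A minimal sketch, assuming a product document with name, price, and category fields; toOrderItem is a hypothetical helper and the sample product is made up.

```javascript
// Sketch: copy frequently read product fields into the order item at purchase time.
// toOrderItem is a hypothetical helper; field names follow the schema sketched above.
function toOrderItem(product) {
  return {
    productId: product._id,
    productName: product.name,
    price: product.price, // price at purchase time
    category: product.category,
  };
}

const product = { _id: 'p1', name: 'Keyboard', price: 1200, category: 'accessories', stock: 10 };
const item = toOrderItem(product);

product.price = 1500; // a later price change on the product
console.log(item.price); // 1200
```

The trade-off is that the copies can go stale. For an order that is usually what you want for price, but a renamed product or category will not show up in old orders unless you update them yourself.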

Advanced Techniques I Learned from Real-World Use

1. Partial Indexes to Save Space

// Index only the documents that match the condition
db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { status: 'active' } }
);

// This index is only used by queries that include the filter condition, like this one
db.users.find({ email: 'test@test.com', status: 'active' });
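
The rule that tripped me up: a query can only use a partial index if it is guaranteed to match a subset of the indexed documents. Here is a deliberately simplified sketch of that idea that only understands equality conditions. queryCoversPartialFilter is a hypothetical helper, and the real query planner handles more cases than this.

```javascript
// Sketch: a simplified eligibility check for a partial index.
// Only handles equality conditions; the real query planner handles more
// (for example range conditions that imply the filter).
function queryCoversPartialFilter(query, partialFilterExpression) {
  return Object.entries(partialFilterExpression).every(
    ([field, value]) => query[field] === value
  );
}

const filter = { status: 'active' };

console.log(queryCoversPartialFilter({ email: 'test@test.com', status: 'active' }, filter)); // true
console.log(queryCoversPartialFilter({ email: 'test@test.com' }, filter)); // false
```

The second query omits status, so it could match inactive users that are not in the index, and the index cannot be used for it.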

2. Text Search That Actually Works

// Create a text index
db.articles.createIndex({
  title: 'text',
  content: 'text',
  tags: 'text'
}, {
  weights: {
    title: 10,    // title matters most
    content: 5,   // content comes next
    tags: 1       // tags get the default weight
  }
});

// Search
const articles = await Article.find(
  { $text: { $search: 'mongodb performance' } },
  { score: { $meta: 'textScore' } }
).sort({ score: { $meta: 'textScore' } });

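3. Change Streams for Real-Time Features

// Watch for changes in the collection (change streams need a replica set or sharded cluster).
// fullDocument: 'updateLookup' makes update events carry the full document;
// without it, the $match on fullDocument.status never matches an update.
const changeStream = Order.watch(
  [{ $match: { 'fullDocument.status': 'completed' } }],
  { fullDocument: 'updateLookup' }
);

changeStream.on('change', (change) => {
  if (change.operationType === 'update') {
    // Send a notification to the user
    sendOrderCompletedEmail(change.fullDocument);
    
    // Update analytics
    updateSalesReport(change.fullDocument);
  }
});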

4. Transactions When Necessary (Use Sparingly)

const session = await mongoose.startSession();
const total = price * quantity;

try {
  await session.withTransaction(async () => {
    // Decrement stock
    await Product.updateOne(
      { _id: productId },
      { $inc: { stock: -quantity } },
      { session }
    );
    
    // Create the order
    const order = new Order({
      userId,
      items: [{ productId, quantity, price }],
      total
    });
    await order.save({ session });
    
    // Award points to the user
    await User.updateOne(
      { _id: userId },
      { $inc: { points: Math.floor(total * 0.01) } },
      { session }
    );
  });
} finally {
  await session.endSession();
}

Production Problems I Actually Hit

1. Memory Usage Spikes

Cause: queries that return a huge amount of data

// ❌ Loads everything
const allUsers = await User.find(); // 1 million records

// ✅ Use pagination + projection
const users = await User.find()
  .select('name email createdAt') // select only the fields you need
  .limit(20)
  .skip(page * 20)
  .sort({ createdAt: -1 });
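
One caveat with skip(): the server still has to walk past every skipped document, so deep pages get slower and slower. A common alternative is keyset (range-based) pagination, where each page starts after the last document of the previous one. A minimal sketch, assuming results are sorted by createdAt and then _id, both descending; buildPageFilter is a hypothetical helper.

```javascript
// Sketch: keyset (range-based) pagination as an alternative to skip().
// buildPageFilter is a hypothetical helper; it assumes the query is sorted
// by { createdAt: -1, _id: -1 }.
function buildPageFilter(lastSeen) {
  if (!lastSeen) return {}; // first page: no lower bound yet
  return {
    $or: [
      // strictly older documents
      { createdAt: { $lt: lastSeen.createdAt } },
      // same timestamp: fall back to _id as a tie-breaker
      { createdAt: lastSeen.createdAt, _id: { $lt: lastSeen._id } },
    ],
  };
}

// Usage with Mongoose would be roughly:
//   User.find(buildPageFilter(lastSeen)).sort({ createdAt: -1, _id: -1 }).limit(20)
const lastSeen = { createdAt: new Date('2023-06-01T00:00:00Z'), _id: 'abc' };
console.log(JSON.stringify(buildPageFilter(lastSeen), null, 2));
```

The _id tie-breaker matters because several documents can share the same createdAt. The downside is that you can only go to the next page, not jump to page 57.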

2. Slow Queries

I use the MongoDB profiler to find slow queries:

// Enable profiler
db.setProfilingLevel(1, { slowms: 100 });

// Look at the most recent slow queries
db.system.profile.find().sort({ ts: -1 }).limit(5);

How I fix them:

Add appropriate indexes
Use explain() to inspect the query plan
Optimize the aggregation pipeline

// Inspect the query plan
db.users.find({ age: { $gte: 18 } }).explain('executionStats');
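
The explain output is long, but a few numbers in executionStats tell most of the story: how many documents were returned versus how many keys and documents were examined. A small sketch that summarizes them; summarizeExplain is a hypothetical helper, and the sample object is hand-written in the shape of explain output, not real data.

```javascript
// Sketch: summarize the executionStats section of an explain() result.
// summarizeExplain is a hypothetical helper; nReturned, totalKeysExamined and
// totalDocsExamined are fields of executionStats.
function summarizeExplain(explain) {
  const { nReturned, totalKeysExamined, totalDocsExamined } = explain.executionStats;
  return {
    nReturned,
    totalKeysExamined,
    totalDocsExamined,
    // Close to 1 is good; a large number means most examined documents were thrown away
    docsExaminedPerReturned: nReturned === 0 ? totalDocsExamined : totalDocsExamined / nReturned,
    // No index keys touched but documents read: most likely a full collection scan
    likelyCollectionScan: totalKeysExamined === 0 && totalDocsExamined > 0,
  };
}

// Hand-written sample shaped like explain('executionStats') output (not real data)
const sample = {
  executionStats: { nReturned: 20, totalKeysExamined: 0, totalDocsExamined: 100000 },
};
console.log(summarizeExplain(sample));
```

When the examined-to-returned ratio is in the thousands, as in this sample, a missing or badly ordered index is the first thing I check.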

3. Backups: Lessons Learned the Hard Way

The first time: I ran mongodump at peak time

The result:

Performance dropped sharply
Parts of the database were effectively locked up
Users complained

Now: I run mongodump against a replica set

# Back up from a secondary node
mongodump --host secondary-server:27017 --out /backup/$(date +%Y%m%d)

# Or use --oplog to capture a consistent point-in-time snapshot
mongodump --oplog --out /backup/$(date +%Y%m%d)

Performance Tuning Checklist

After several years of experience, here is my checklist:

1. Indexes

Do the frequently used query patterns have indexes?
Are compound index fields in the right order (Equality, Sort, Range)?
Are there unused indexes? (Drop them)
Use partial indexes for sparse data

2. Schema Design

Keep document size under 1-2 MB
Use embedding and references appropriately for the use case
Denormalize to avoid unnecessary $lookup
Keep array fields from growing too large

3. Queries

Use projection to limit fields
Use limit and skip with care
Put $match first in aggregation pipelines
Use $sample instead of a random sort

4. Connection & Infrastructure

Configure connection pooling appropriately
Set read/write concerns according to requirements
Use monitoring tools (MongoDB Compass, Ops Manager)
Take regular backups and test restoring them

Tools That Make Life Easier

1. MongoDB Compass

An easy-to-use GUI for inspecting schemas, queries, and indexes

2. mongoose-profiler

const mongoose = require('mongoose');
require('mongoose-profiler')(mongoose);

// Prints query times to the console

3. MongoDB Atlas Performance Advisor

Suggests indexes to add based on your query patterns

4. Studio 3T

An advanced GUI with a SQL-to-MongoDB query converter

Lessons from the Past Year

MongoDB strengths I actually felt:

Flexible Schema - adding new fields is easy
JSON Native - a natural fit for Node.js
Horizontal Scaling - scales out with traffic
Rich Query Language - the aggregation framework is great
Change Streams - real-time features are easy to build

Downsides you have to accept:

Memory Hungry - in my experience it uses more RAM than SQL databases
Complex Aggregations - complex pipelines are hard to read
Limited Transactions - multi-document ACID transactions exist (since 4.0), but they need a replica set and add overhead
Learning Curve - you have to learn NoSQL patterns
Storage Size - in my experience the same data takes more disk space

Advice for People Starting Out with MongoDB

1. Design the Schema Around Query Patterns

Don't just think about how you will store the data; think about how you will read it

2. Learn the Aggregation Framework

It is extremely powerful but complex. Start with the basics:

$match - filter
$group - group by
$project - select fields
$sort - order by
$lookup - join
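
To make the mapping concrete, here is a small pipeline that chains four of those stages, with the rough SQL equivalent next to each one. The field names (status, region, total) are hypothetical, chosen only for illustration.

```javascript
// Sketch: a small pipeline using the basic stages, with rough SQL equivalents.
// Field names (status, region, total) are hypothetical.
const pipeline = [
  { $match: { status: 'completed' } },                         // WHERE status = 'completed'
  { $group: { _id: '$region', revenue: { $sum: '$total' } } }, // GROUP BY region, SUM(total)
  { $project: { _id: 0, region: '$_id', revenue: 1 } },        // SELECT region, revenue
  { $sort: { revenue: -1 } },                                  // ORDER BY revenue DESC
];

console.log(pipeline.map((stage) => Object.keys(stage)[0]).join(' -> '));
// $match -> $group -> $project -> $sort
```

With Mongoose this array would be passed to something like Order.aggregate(pipeline). Unlike SQL, the order of stages is the order of execution, which is why $match belongs at the top.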

3. Monitor from Day One

Don't put MongoDB into production without monitoring:

Slow query logs
Memory usage
Index usage stats
Connection pool status

4. Backups You Can Test

A backup you can't restore is not a backup

5. Learn from Others

MongoDB University (free courses)
MongoDB Community forums
Follow best practices guides

Summary: MongoDB and the Lessons from a Night I'd Rather Forget

That night when MongoDB went down and woke me at 3 a.m. taught me that:

MongoDB is not a silver bullet that solves every problem

But if you know how to use it, it is a very powerful tool:

More flexible than SQL databases
Easier to scale out
A good fit for modern applications

The most important thing is to understand the trade-offs and use it deliberately:

Need strict ACID transactions → use a SQL database
Need lots of complex joins → use a SQL database
Need flexibility and scale → MongoDB is a good choice

And above all: don't forget monitoring and backups!

Because waking up at 3 a.m. to a dead database is no fun at all 😅

But having made it through and learned from my mistakes, MongoDB has become a friend I can trust 🍃